# Problem
Explore Brooklyn and figure which neighbourhood could be ripe for a high-end tiered gym facility. Think Lifetime fitness.

# Background of Brooklyn
Brooklyn is an enormous coastal borough (i.e. on the ocean, a bay, or inlet) located in the state of New York. With a population of 2,648,771 people and 760 constituent neighborhoods, Brooklyn is the largest community in New York. Following are some of the reasons why a high-end tiered gym could be just the right thing for the neighbourhood:

* Brooklyn home prices are not only among the most expensive in New York, but Brooklyn real estate also consistently ranks among the most expensive in America. They can afford a gym membership.

* Brooklyn is a decidedly white-collar borough, with fully 85.21% of the workforce employed in white-collar jobs, well above the national average. Overall, Brooklyn is a borough of professionals, service providers, and sales and office workers. There are especially a lot of people living in Brooklyn who work in office and administrative support (12.49%), sales jobs (9.49%), and management occupations (9.05%). However, one downside of living in Brooklyn is that it can take a long time to commute to work. In Brooklyn, the average commute to work is 40.47 minutes, which is quite a bit higher than the national average. On the other hand, local public transit is widely used in the borough, so leaving the car at home and taking transit is often a viable alternative. In addition, it is also a pedestrian-friendly borough. Many of Brooklyn’s neighborhoods are dense enough and have amenities close enough together that people find it feasible to get around on foot.
This indicates if the gyms are sparse in the neighbourhood than perhaps a lot of these people will skip a workout to be with their families. Hitting the gym that's somewhat at a spitting distance from their homes might just be the key.

* The citizens of Brooklyn are very well educated compared to the average community in the nation: 34.08% of adults in Brooklyn have a bachelor's degree or even advanced degree. Its a known fact the more educated you are the more likely you are going to be health conscious.

SOURCE:https://www.neighborhoodscout.com/ny/brooklyn

# Data that will identify the perfect location to open the gym.

### Data procurement
* A listing of neighborhoods by ZIP Code obtained from:
(https://geo.nyu.edu/catalog/nyu_2451_34572)
* Latitude/Longitude data provided via Cognitive Class (https://cocl.us/Geospatial_data)
* New York city data accessible through the FourSquare API 

### What will we do with the data?

* The data will identify the list of the businesses/venues in the neighbourhood.
* List neighborhoods along with the top 5 most common venues.
* Perhaps use clustering method to determine which neighbourood is best for opening a gym.

### Target Audience
* Attracting people (artists) who are looking for holistic gym environment (think sauna, Yoga studios, weights).
* People who have a busy work schedule (white/blue collar jobs) and therefore can only go to gyms that are closest to their residence.

# METHODOLOGY
We will begin by importing libraries and resources that we will need for the purposes of this project.

In [158]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [160]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


### Lets load and explore the data.

In [159]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [6]:
#newyork_data uncomment to make sure the data is loaded.

Notice how all the relevant data is in the features key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [161]:
neighborhoods_data = newyork_data['features']

In [162]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

Tranform the data into a pandas dataframe
The next task is essentially transforming this data of nested Python dictionaries into a pandas dataframe. So let's start by creating an empty dataframe.

In [163]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [164]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [165]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

Quickly examine the resulting dataframe.

In [166]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


And make sure that the dataset has all 5 boroughs and 306 neighborhoods.

In [167]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


Folium is a great visualization library.

However, for our current purpose, let's segment and cluster only the neighborhoods in Brooklyn. So let's slice the original dataframe and create a new dataframe of the Brooklyn data.

In [168]:
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Brooklyn,Bay Ridge,40.625801,-74.030621
1,Brooklyn,Bensonhurst,40.611009,-73.99518
2,Brooklyn,Sunset Park,40.645103,-74.010316
3,Brooklyn,Greenpoint,40.730201,-73.954241
4,Brooklyn,Gravesend,40.59526,-73.973471


### Use geopy library to get the latitude and longitude values of Brooklyn.
In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent ny_explorer, as shown below.

In [169]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Brooklyn are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Brooklyn are 40.6501038, -73.9495823.


Lets visualize the Brooklyn neighbourhood.

In [170]:
# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

### Define Foursquare Credentials and Version

In [171]:
CLIENT_ID = 'GXKGLDVIN2WEQTPHXYSGQ0XSATZ5KRVAGFF5R5GIGE3AISUH' # your Foursquare ID
CLIENT_SECRET = 'CRVI4I0FHNZPZ4FZTEL12SQVAGESQ2YQORRYVYEOJL5HWPUF' # your Foursquare Secret
VERSION = '20190115' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: GXKGLDVIN2WEQTPHXYSGQ0XSATZ5KRVAGFF5R5GIGE3AISUH
CLIENT_SECRET:CRVI4I0FHNZPZ4FZTEL12SQVAGESQ2YQORRYVYEOJL5HWPUF


#### Let's explore the first neighborhood in our dataframe.
Get the neighborhood's name.

In [172]:
brooklyn_data.loc[0, 'Neighborhood']

'Bay Ridge'

Get the neighbourhoods Lat and Long values

In [173]:
neighborhood_latitude = brooklyn_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = brooklyn_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = brooklyn_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Bay Ridge are 40.625801065010656, -74.03062069353813.


### Now, let's get the top 100 venues that are in Marble Hill within a radius of 500 meters.
First, let's create the GET request URL. Name your URL url.

In [174]:
# type your answer here
LIMIT=100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=GXKGLDVIN2WEQTPHXYSGQ0XSATZ5KRVAGFF5R5GIGE3AISUH&client_secret=CRVI4I0FHNZPZ4FZTEL12SQVAGESQ2YQORRYVYEOJL5HWPUF&v=20190115&ll=40.625801065010656,-74.03062069353813&radius=500&limit=100'

Send the GET request and examine the resutls

In [175]:
results = requests.get(url).json()
#results

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [176]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [177]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Pilo Arts Day Spa and Salon,Spa,40.624748,-74.030591
1,Cocoa Grinder,Juice Bar,40.623967,-74.030863
2,Bagel Boy,Bagel Shop,40.627896,-74.029335
3,Pegasus Cafe,Breakfast Spot,40.623168,-74.031186
4,Ho' Brah Taco Joint,Taco Place,40.62296,-74.031371


And how many venues were returned by Foursquare?

In [178]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

86 venues were returned by Foursquare.


#### Let's create a function to repeat the same process to all the neighborhoods in Brooklyn

In [179]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *brooklyn_venues*.

In [180]:
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude']
                                  )


Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


In [181]:
brooklyn_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Brooklyn,Bay Ridge,40.625801,-74.030621
1,Brooklyn,Bensonhurst,40.611009,-73.99518
2,Brooklyn,Sunset Park,40.645103,-74.010316
3,Brooklyn,Greenpoint,40.730201,-73.954241
4,Brooklyn,Gravesend,40.59526,-73.973471


#### Let's check the size of the resulting dataframe

In [182]:
print(brooklyn_venues.shape)
brooklyn_venues.head()

(2786, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
1,Bay Ridge,40.625801,-74.030621,Cocoa Grinder,40.623967,-74.030863,Juice Bar
2,Bay Ridge,40.625801,-74.030621,Bagel Boy,40.627896,-74.029335,Bagel Shop
3,Bay Ridge,40.625801,-74.030621,Pegasus Cafe,40.623168,-74.031186,Breakfast Spot
4,Bay Ridge,40.625801,-74.030621,Ho' Brah Taco Joint,40.62296,-74.031371,Taco Place


#### Let's group similar venues. Start with a group I am going to call 'Eatery/Bar/Shops'

In [183]:
targets = ['Juice', 'Bagel','Breakfast','Bar','Bagel','Restaurant','Taco','Donut','Bakery','Bistro',
          'Food','Burger','Deli','Pizza','Ice Cream','Café','Cafe','Coffee','Wine','Shop','Joint','Bodega','Dessert']
pattern = '|'.join(targets)
brooklyn_venues.loc[brooklyn_venues['Venue Category'].str.contains(pattern), 'Venue Category'] = 'Eatery/Bar/Shop'


#### The second group is going to be the 'Gym' category with any kind of gym fitness yoga grouped into one.

In [184]:
target = ['Gym', 'Fitness','Yoga','Box','Judo','Karate']
pattern = '|'.join(target)
brooklyn_venues.loc[brooklyn_venues['Venue Category'].str.contains(pattern), 'Venue Category'] = 'Gym'

brooklyn_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
1,Bay Ridge,40.625801,-74.030621,Cocoa Grinder,40.623967,-74.030863,Eatery/Bar/Shop
2,Bay Ridge,40.625801,-74.030621,Bagel Boy,40.627896,-74.029335,Eatery/Bar/Shop
3,Bay Ridge,40.625801,-74.030621,Pegasus Cafe,40.623168,-74.031186,Eatery/Bar/Shop
4,Bay Ridge,40.625801,-74.030621,Ho' Brah Taco Joint,40.62296,-74.031371,Eatery/Bar/Shop


#### Let's put that into a *pandas* dataframe
First, let's write a function to sort the venues in descending order.

In [185]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [186]:
neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bath Beach,Eatery/Bar/Shop,Pharmacy,Women's Store,Kids Store,Shoe Store,Surf Spot,Park,Diner,Video Game Store,Rental Car Location
1,Bay Ridge,Eatery/Bar/Shop,Spa,Playground,Grocery Store,Pharmacy,Sandwich Place,Diner,Department Store,Hotel,Clothing Store
2,Bedford Stuyvesant,Eatery/Bar/Shop,Boutique,Bus Station,Bus Stop,Thrift / Vintage Store,Park,Fruit & Vegetable Store,Waterfront,Fish Market,Garden Center
3,Bensonhurst,Eatery/Bar/Shop,Spa,Grocery Store,Park,Liquor Store,Noodle House,Factory,Butcher,Fish Market,Garden
4,Bergen Beach,Harbor / Marina,Playground,Eatery/Bar/Shop,Athletics & Sports,Baseball Field,Waterfront,Fish Market,Gas Station,Garden Center,Garden
5,Boerum Hill,Eatery/Bar/Shop,Dance Studio,Gym,Spa,Arts & Crafts Store,Athletics & Sports,Kids Store,Sandwich Place,Grocery Store,Martial Arts Dojo
6,Borough Park,Eatery/Bar/Shop,Bank,Pharmacy,Hostel,Hotel,Farmers Market,Garden Center,Garden,Furniture / Home Store,Fruit & Vegetable Store
7,Brighton Beach,Eatery/Bar/Shop,Beach,Pharmacy,Bank,Diner,Convenience Store,Other Great Outdoors,Non-Profit,Grocery Store,Bookstore
8,Broadway Junction,Eatery/Bar/Shop,Diner,Sandwich Place,Bus Station,Moving Target,Garden,Furniture / Home Store,Fruit & Vegetable Store,Fish Market,Field
9,Brooklyn Heights,Eatery/Bar/Shop,Gym,Park,Grocery Store,Diner,Pet Store,Plaza,Scenic Lookout,Mattress Store,Garden


Let's create a new dataframe that includes the cluster as well as the top 5 venues for each neighborhood.

In [103]:
brooklyn_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bath Beach,50,50,50,50,50,50
Bay Ridge,86,86,86,86,86,86
Bedford Stuyvesant,28,28,28,28,28,28
Bensonhurst,31,31,31,31,31,31
Bergen Beach,6,6,6,6,6,6
Boerum Hill,80,80,80,80,80,80
Borough Park,19,19,19,19,19,19
Brighton Beach,43,43,43,43,43,43
Broadway Junction,16,16,16,16,16,16
Brooklyn Heights,100,100,100,100,100,100


#### Let's find out how many unique categories can be curated from all the returned venues

In [187]:
print('There are {} uniques categories.'.format(len(brooklyn_venues['Venue Category'].unique())))

There are 154 uniques categories.


## Analyze Each Neighborhood

In [188]:
# one hot encoding
brooklyn_onehot = pd.get_dummies(brooklyn_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
brooklyn_onehot['Neighborhood'] = brooklyn_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [brooklyn_onehot.columns[-1]] + list(brooklyn_onehot.columns[:-1])
brooklyn_onehot = brooklyn_onehot[fixed_columns]

brooklyn_onehot.head()

Unnamed: 0,Women's Store,Adult Boutique,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Athletics & Sports,Bank,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Garden,Beer Store,Bike Rental / Bike Share,Boat or Ferry,Bookstore,Boutique,Brewery,Bridge,Buffet,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Candy Store,Check Cashing Service,Child Care Service,Church,Clothing Store,Concert Hall,Convenience Store,Coworking Space,Creperie,Cycle Studio,Dance Studio,Department Store,Diner,Discount Store,Distillery,Dog Run,Eatery/Bar/Shop,Electronics Store,Event Service,Event Space,Factory,Farm,Farmers Market,Field,Fish Market,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Golf Course,Grocery Store,Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,Herbs & Spices Store,History Museum,Home Service,Hostel,Hotel,Indie Movie Theater,Indie Theater,Jazz Club,Jewelry Store,Kids Store,Lake,Laser Tag,Laundry Service,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Men's Store,Metro Station,Monument / Landmark,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,Neighborhood,Nightclub,Non-Profit,Noodle House,Opera House,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pilates Studio,Playground,Plaza,Poke Place,Pool,Pool Hall,Pub,Racetrack,Rental Car Location,Residential Building (Apartment / Condo),Rock Club,Roller Rink,Roof Deck,Salad Place,Sandwich Place,Scenic Lookout,Sculpture Garden,Shipping Store,Shoe Store,Skating Rink,Ski Area,Snack Place,Soccer Field,Spa,Speakeasy,Sports Club,Stadium,Steakhouse,Supermarket,Surf Spot,Tattoo Parlor,Tea Room,Tennis Court,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Trail,Used Bookstore,Vape Store,Varenyky restaurant,Video Game Store,Video Store,Waterfront
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size

In [189]:
brooklyn_onehot.shape

(2786, 154)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [190]:
brooklyn_grouped = brooklyn_onehot.groupby('Neighborhood').mean().reset_index()
brooklyn_grouped

Unnamed: 0,Neighborhood,Women's Store,Adult Boutique,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Athletics & Sports,Bank,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Garden,Beer Store,Bike Rental / Bike Share,Boat or Ferry,Bookstore,Boutique,Brewery,Bridge,Buffet,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Candy Store,Check Cashing Service,Child Care Service,Church,Clothing Store,Concert Hall,Convenience Store,Coworking Space,Creperie,Cycle Studio,Dance Studio,Department Store,Diner,Discount Store,Distillery,Dog Run,Eatery/Bar/Shop,Electronics Store,Event Service,Event Space,Factory,Farm,Farmers Market,Field,Fish Market,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Golf Course,Grocery Store,Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,Herbs & Spices Store,History Museum,Home Service,Hostel,Hotel,Indie Movie Theater,Indie Theater,Jazz Club,Jewelry Store,Kids Store,Lake,Laser Tag,Laundry Service,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Men's Store,Metro Station,Monument / Landmark,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,Nightclub,Non-Profit,Noodle House,Opera House,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pilates Studio,Playground,Plaza,Poke Place,Pool,Pool Hall,Pub,Racetrack,Rental Car Location,Residential Building (Apartment / Condo),Rock Club,Roller Rink,Roof Deck,Salad Place,Sandwich Place,Scenic Lookout,Sculpture Garden,Shipping Store,Shoe Store,Skating Rink,Ski Area,Snack Place,Soccer Field,Spa,Speakeasy,Sports Club,Stadium,Steakhouse,Supermarket,Surf Spot,Tattoo Parlor,Tea Room,Tennis Court,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Trail,Used Bookstore,Vape Store,Varenyky restaurant,Video Game Store,Video Store,Waterfront
0,Bath Beach,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.64,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0
1,Bay Ridge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.023256,0.0,0.0,0.0,0.616279,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.011628,0.0,0.011628,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.011628,0.0,0.0,0.011628,0.0,0.05814,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.011628,0.0,0.0
2,Bedford Stuyvesant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.785714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bensonhurst,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.677419,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bergen Beach,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Boerum Hill,0.0,0.0,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0375,0.0125,0.0,0.0,0.0,0.0,0.525,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0125,0.025,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Borough Park,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.631579,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Brighton Beach,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.069767,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.55814,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0
8,Broadway Junction,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.5625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Brooklyn Heights,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.59,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.1,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.02,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Lets confirm the new size.

In [191]:
brooklyn_grouped.shape

(70, 154)

### Let's print each neighbourhood along with the top 5 most common values.

In [192]:
num_top_venues = 5

for hood in brooklyn_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = brooklyn_grouped[brooklyn_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bath Beach----
             venue  freq
0  Eatery/Bar/Shop  0.64
1         Pharmacy  0.06
2    Women's Store  0.04
3       Kids Store  0.04
4       Shoe Store  0.02


----Bay Ridge----
             venue  freq
0  Eatery/Bar/Shop  0.62
1              Spa  0.06
2         Pharmacy  0.02
3    Grocery Store  0.02
4       Playground  0.02


----Bedford Stuyvesant----
                    venue  freq
0         Eatery/Bar/Shop  0.79
1  Thrift / Vintage Store  0.04
2                Boutique  0.04
3                    Park  0.04
4             Bus Station  0.04


----Bensonhurst----
             venue  freq
0  Eatery/Bar/Shop  0.68
1              Spa  0.06
2    Grocery Store  0.06
3             Park  0.06
4     Noodle House  0.03


----Bergen Beach----
                venue  freq
0     Harbor / Marina  0.33
1  Athletics & Sports  0.17
2          Playground  0.17
3      Baseball Field  0.17
4     Eatery/Bar/Shop  0.17


----Boerum Hill----
             venue  freq
0  Eatery/Bar/Shop  0.52
1    

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [193]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [194]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = brooklyn_grouped['Neighborhood']

for ind in np.arange(brooklyn_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(brooklyn_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bath Beach,Eatery/Bar/Shop,Pharmacy,Women's Store,Kids Store,Shoe Store,Surf Spot,Park,Diner,Video Game Store,Rental Car Location
1,Bay Ridge,Eatery/Bar/Shop,Spa,Playground,Grocery Store,Pharmacy,Sandwich Place,Diner,Department Store,Hotel,Clothing Store
2,Bedford Stuyvesant,Eatery/Bar/Shop,Boutique,Bus Station,Bus Stop,Thrift / Vintage Store,Park,Fruit & Vegetable Store,Waterfront,Fish Market,Garden Center
3,Bensonhurst,Eatery/Bar/Shop,Spa,Grocery Store,Park,Liquor Store,Noodle House,Factory,Butcher,Fish Market,Garden
4,Bergen Beach,Harbor / Marina,Playground,Eatery/Bar/Shop,Athletics & Sports,Baseball Field,Waterfront,Fish Market,Gas Station,Garden Center,Garden


## Cluster Neighborhoods

Run *k*-means to cluster the neighborhood into 5 clusters.

In [195]:
# set number of clusters
kclusters =2

brooklyn_grouped_clustering = brooklyn_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(brooklyn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 0, 1, 1, 1, 1, 1], dtype=int32)

In [196]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bath Beach,Eatery/Bar/Shop,Pharmacy,Women's Store,Kids Store,Shoe Store,Surf Spot,Park,Diner,Video Game Store,Rental Car Location
1,Bay Ridge,Eatery/Bar/Shop,Spa,Playground,Grocery Store,Pharmacy,Sandwich Place,Diner,Department Store,Hotel,Clothing Store
2,Bedford Stuyvesant,Eatery/Bar/Shop,Boutique,Bus Station,Bus Stop,Thrift / Vintage Store,Park,Fruit & Vegetable Store,Waterfront,Fish Market,Garden Center
3,Bensonhurst,Eatery/Bar/Shop,Spa,Grocery Store,Park,Liquor Store,Noodle House,Factory,Butcher,Fish Market,Garden
4,Bergen Beach,Harbor / Marina,Playground,Eatery/Bar/Shop,Athletics & Sports,Baseball Field,Waterfront,Fish Market,Gas Station,Garden Center,Garden


Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [197]:
brooklyn_merged = brooklyn_data

# add clustering labels
brooklyn_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
brooklyn_merged = brooklyn_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

brooklyn_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Brooklyn,Bay Ridge,40.625801,-74.030621,1,Eatery/Bar/Shop,Spa,Playground,Grocery Store,Pharmacy,Sandwich Place,Diner,Department Store,Hotel,Clothing Store
1,Brooklyn,Bensonhurst,40.611009,-73.99518,1,Eatery/Bar/Shop,Spa,Grocery Store,Park,Liquor Store,Noodle House,Factory,Butcher,Fish Market,Garden
2,Brooklyn,Sunset Park,40.645103,-74.010316,1,Eatery/Bar/Shop,Bank,Pharmacy,Gym,Women's Store,Grocery Store,Stadium,Sandwich Place,Shoe Store,Video Game Store
3,Brooklyn,Greenpoint,40.730201,-73.954241,1,Eatery/Bar/Shop,Gym,Sandwich Place,Furniture / Home Store,Grocery Store,Spa,Boutique,Playground,Coworking Space,Organic Grocery
4,Brooklyn,Gravesend,40.59526,-73.973471,0,Eatery/Bar/Shop,Lounge,Spa,Bus Station,Pharmacy,Music Venue,Baseball Field,Grocery Store,Furniture / Home Store,Fish Market


#### Finally, let's visualize the resulting clusters

In [198]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'], brooklyn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.

#### Cluster 1

In [199]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Gravesend,Eatery/Bar/Shop,Lounge,Spa,Bus Station,Pharmacy,Music Venue,Baseball Field,Grocery Store,Furniture / Home Store,Fish Market
10,East Flatbush,Eatery/Bar/Shop,Check Cashing Service,Department Store,Pharmacy,Moving Target,Park,Supermarket,Hardware Store,Waterfront,Furniture / Home Store
14,Brownsville,Eatery/Bar/Shop,Moving Target,Playground,Pool,Discount Store,Plaza,Pharmacy,Trail,Performing Arts Venue,Park
17,Bedford Stuyvesant,Eatery/Bar/Shop,Boutique,Bus Station,Bus Stop,Thrift / Vintage Store,Park,Fruit & Vegetable Store,Waterfront,Fish Market,Garden Center
23,Fort Greene,Eatery/Bar/Shop,Gym,Performing Arts Venue,Theater,Spa,Opera House,Pet Store,Museum,Furniture / Home Store,Farmers Market
24,Park Slope,Eatery/Bar/Shop,Pet Store,Spa,Bookstore,Pub,Gym,Pilates Studio,Hardware Store,Organic Grocery,Boutique
29,Flatlands,Eatery/Bar/Shop,Pharmacy,Discount Store,Nightclub,Bus Stop,Sandwich Place,Paper / Office Supplies Store,Park,Lounge,Video Store
31,Manhattan Beach,Eatery/Bar/Shop,Sandwich Place,Playground,Bus Stop,Harbor / Marina,Beach,Waterfront,Garden,Furniture / Home Store,Fruit & Vegetable Store
32,Coney Island,Eatery/Bar/Shop,Beach,Theme Park Ride / Attraction,Brewery,Monument / Landmark,Sandwich Place,Other Great Outdoors,Skating Rink,Baseball Stadium,Music Venue
38,Clinton Hill,Eatery/Bar/Shop,Gym,Grocery Store,Sandwich Place,Diner,Arts & Crafts Store,Pet Store,Lounge,Convenience Store,Clothing Store


#### Cluster 2

In [200]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 1, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bay Ridge,Eatery/Bar/Shop,Spa,Playground,Grocery Store,Pharmacy,Sandwich Place,Diner,Department Store,Hotel,Clothing Store
1,Bensonhurst,Eatery/Bar/Shop,Spa,Grocery Store,Park,Liquor Store,Noodle House,Factory,Butcher,Fish Market,Garden
2,Sunset Park,Eatery/Bar/Shop,Bank,Pharmacy,Gym,Women's Store,Grocery Store,Stadium,Sandwich Place,Shoe Store,Video Game Store
3,Greenpoint,Eatery/Bar/Shop,Gym,Sandwich Place,Furniture / Home Store,Grocery Store,Spa,Boutique,Playground,Coworking Space,Organic Grocery
5,Brighton Beach,Eatery/Bar/Shop,Beach,Pharmacy,Bank,Diner,Convenience Store,Other Great Outdoors,Non-Profit,Grocery Store,Bookstore
6,Sheepshead Bay,Eatery/Bar/Shop,Buffet,Sandwich Place,Beach,Diner,Creperie,Park,Grocery Store,Gym,Harbor / Marina
7,Manhattan Terrace,Eatery/Bar/Shop,Nightclub,Jazz Club,Grocery Store,Bank,Convenience Store,Bus Station,Steakhouse,Metro Station,Fruit & Vegetable Store
8,Flatbush,Eatery/Bar/Shop,Pharmacy,Sandwich Place,Plaza,Liquor Store,Lounge,Bank,Waterfront,Garden,Furniture / Home Store
9,Crown Heights,Eatery/Bar/Shop,Museum,Playground,Metro Station,Supermarket,Candy Store,Farmers Market,Pharmacy,Event Space,Factory
11,Kensington,Eatery/Bar/Shop,Grocery Store,Sandwich Place,Racetrack,Music Venue,Furniture / Home Store,Spa,Metro Station,Supermarket,Outdoors & Recreation


#### Now we find out how many neighborhoods have gyms.

##### Cluster 1

In [201]:
Cluster1=brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]
Cluster1.iloc[ np.flatnonzero((Cluster1=='Gym').values)//Cluster1.shape[1], 
    np.unique(np.flatnonzero((Cluster1=='Gym').values)%Cluster1.shape[1]) ]

Unnamed: 0,2nd Most Common Venue,4th Most Common Venue,6th Most Common Venue,7th Most Common Venue
23,Gym,Theater,Opera House,Pet Store
24,Pet Store,Bookstore,Gym,Pilates Studio
38,Gym,Sandwich Place,Arts & Crafts Store,Pet Store
40,Sandwich Place,Gym,Performing Arts Venue,Diner
51,Gym,Steakhouse,Music Store,Sandwich Place
52,Gym,Bus Station,General Entertainment,Steakhouse
60,Pharmacy,Bus Station,Grocery Store,Gym
69,Gym,Music Venue,Grocery Store,Convenience Store


In [202]:
#Number of neighborhoods with gym in Cluster 1
ClusterGym1 = Cluster1.iloc[ np.flatnonzero((Cluster1=='Gym').values)//Cluster1.shape[1], 
              np.unique(np.flatnonzero((Cluster1=='Gym').values)%Cluster1.shape[1]) ].shape[0]
ClusterGym1

8

##### Cluster 2

In [203]:
Cluster2=brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 1, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]
Cluster2.iloc[ np.flatnonzero((Cluster2=='Gym').values)//Cluster2.shape[1], 
    np.unique(np.flatnonzero((Cluster2=='Gym').values)%Cluster2.shape[1]) ]

Unnamed: 0,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,6th Most Common Venue,8th Most Common Venue,9th Most Common Venue
2,Bank,Pharmacy,Gym,Grocery Store,Sandwich Place,Shoe Store
3,Gym,Sandwich Place,Furniture / Home Store,Spa,Playground,Coworking Space
6,Buffet,Sandwich Place,Beach,Creperie,Grocery Store,Gym
13,Gym,Diner,Garden Center,Clothing Store,Sandwich Place,Fish Market
15,Gym,Diner,Pet Store,Steakhouse,Event Space,Park
16,Discount Store,Thrift / Vintage Store,Shoe Store,Grocery Store,Gym,Playground
18,Gym,Park,Grocery Store,Pet Store,Scenic Lookout,Mattress Store
19,Gym,Playground,Boutique,Park,Diner,Pilates Studio
20,Gym,Spa,Bank,Bookstore,Playground,Diner
22,Furniture / Home Store,Gym,Art Gallery,Steakhouse,Dance Studio,Rock Club


In [204]:
#Number of neighborhoods with gym in Cluster 2
ClusterGym2 = Cluster2.iloc[ np.flatnonzero((Cluster2=='Gym').values)//Cluster2.shape[1], 
              np.unique(np.flatnonzero((Cluster2=='Gym').values)%Cluster2.shape[1]) ].shape[0]
ClusterGym2

24

### Results and Recommendations

In [205]:
#Number of neighborhoods with gym in Cluster 1
ClusterGym1 = Cluster1.iloc[ np.flatnonzero((Cluster1=='Gym').values)//Cluster1.shape[1], 
              np.unique(np.flatnonzero((Cluster1=='Gym').values)%Cluster1.shape[1]) ].shape[0]
#Total Number of neighborhoods in Cluster 1
Cluster1Tot = Cluster1.shape[0]

#Number of neighborhoods with gym in Cluster 2
ClusterGym2 = Cluster2.iloc[ np.flatnonzero((Cluster2=='Gym').values)//Cluster2.shape[1], 
              np.unique(np.flatnonzero((Cluster2=='Gym').values)%Cluster2.shape[1]) ].shape[0]
#Total Number of neighborhoods in Cluster 2
Cluster2Tot = Cluster2.shape[0]

###Results
print('{0:.2f}%'.format(((1-(ClusterGym1 /Cluster1Tot)) * 100)),'neighbourhoods in Cluster 1 donot have' 
      ' a gym that falls into the top ten most frequented places which is definitely an opportunity compared to',
       '{0:.2f}%'.format(((1-(ClusterGym2  /Cluster2Tot)) * 100)),'in Cluster 2 neighborhoods')


61.90% neighbourhoods in Cluster 1 donot have a gym that falls into the top ten most frequented places which is definitely an opportunity compared to 51.02% in Cluster 2 neighborhoods


# DISCUSSION

We have come up with lot of neighbourhoods in the Brooklyn borough where we can open a gym. Some addition information that could possibly help further narrow down the neighbourhood would be household income in each neighbourhood of the borough. 

The clustering technique used for the most frequently visited spots in neighbourhoods helped find potential neighbourhoods which would be good for a new Gym facility.

# CONCLUSION

The neighborhoods in Cluster 1 give us a higher probability of opening successful gym facility because of the dearth of gyms in those areas.