# Introduction

In this project, we will help PO LLC find the best neighborhood in either Toronto or New York City in which to open a new, modern bar. The business problem is: how can readily available online data be used to determine the best location for such a bar? Because both cities are densely populated and diverse, we want to find where within them there is the least competition for a bar (or the greatest unmet demand for one).

After initial research, PO LLC has narrowed its search to two boroughs: Central Toronto and the Bronx. Our goal is to determine which specific neighborhood in these boroughs is best suited for the new bar. The firm wants the location with the greatest profit potential, so that it is easier to convince potential stakeholders to invest in the new establishment.

As discussed above, financial backing from stakeholders will be essential for PO LLC. We must also consider the people the new bar will affect most: the residents of the neighborhood where it opens. This raises several questions. Will the neighborhood accept a new, modern bar? Will the bar cause problems with the local authorities? Will the local government allow the establishment to open? We will attempt to answer these questions using data and research.

# Data

The data we will use to answer this question will come primarily from:

 - List of Postal Codes of Canada (Wikipedia)
     - This page lists Toronto's postal codes with their boroughs and neighbourhoods. Paired with a geospatial CSV of postal-code coordinates, it lets us combine Toronto's geography with the Foursquare API data and look for potential bar locations in Central Toronto.
 - JSON file of NYC boroughs
     - This GeoJSON file provides the names and coordinates of New York City's boroughs and neighborhoods. Like the Toronto data, it will help us look for potential bar locations in the Bronx.
 - Foursquare API data
     - The Foursquare API returns the venues (name, category, and location) around each neighborhood's coordinates. It will be essential for finding spots where demand for a modern bar may be high, since it shows which types of venues exist in our areas of interest.
 - NYC demographic data
     - The NYC demographic data lets us look more closely at our target market in the Bronx. In particular, we need the population of individual neighborhoods to make an informed estimate of where profit would be highest once the search has been narrowed down.
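The demographic file's column names are not shown in this section, so the sketch below assumes the columns `Borough`, `Year`, `NTA Name` and `Population`; check them against `NYC_Demographics.columns` before relying on it. The hypothetical helper `bronx_population` keeps the Bronx rows for one year and ranks them by population. The rows are toy figures, not values from the real file. Note also that Neighborhood Tabulation Areas often combine several neighborhoods, so their names may not line up one-to-one with the GeoJSON neighborhood names.

```python
import pandas as pd

def bronx_population(demographics: pd.DataFrame, year: int = 2010) -> pd.DataFrame:
    """Bronx NTAs for one year, most populous first (assumed column names)."""
    mask = (demographics["Borough"] == "Bronx") & (demographics["Year"] == year)
    return (demographics.loc[mask, ["NTA Name", "Population"]]
            .sort_values("Population", ascending=False)
            .reset_index(drop=True))

# Toy rows with hypothetical names and figures
toy = pd.DataFrame({
    "Borough": ["Bronx", "Bronx", "Bronx", "Brooklyn"],
    "Year": [2010, 2010, 2000, 2010],
    "NTA Name": ["Area A", "Area B", "Area A", "Area C"],
    "Population": [30000, 45000, 28000, 50000],
})
result = bronx_population(toy)
print(result)
```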

# Methodology

###### Libraries 

```python
# Important libraries that will be used for the project.
# geopy and folium are installed once from conda-forge, e.g. in a notebook cell:
#   !conda install -c conda-forge geopy --yes
#   !conda install -c conda-forge folium=0.5.0 --yes
import json

import numpy as np
import pandas as pd
from pandas import json_normalize  # formerly pandas.io.json.json_normalize
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

# Show every row and column when displaying DataFrames
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print('All the libraries are now imported.')
```

```text
All the libraries are now imported.
```


### Data Prepping and Wrangling

###### Bronx Dataset

```python
# Get NYC data from the GeoJSON file
with open('ny-geojson.json') as json_data:
    newyork_data = json.load(json_data)

# Demographics (used later)
NYC_Demographics = pd.read_csv('New_York_City_Population_By_Neighborhood_Tabulation_Areas.csv')

# Each neighborhood is one entry under the 'features' key; view the first one
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]
```

```text
{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
```
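The printed feature shows that `coordinates` lists longitude first and latitude second, which is the GeoJSON convention and the reason latitude is read from the second element. As an alternative to looping over the features, `pd.json_normalize` can flatten the whole list at once into dotted column names. A minimal sketch on two features (Wakefield copied from the output above, Co-op City from the rounded table below), trimmed to the fields we use:

```python
import pandas as pd

features = [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-73.84720052054902, 40.89470517661]},
     "properties": {"name": "Wakefield", "borough": "Bronx"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-73.829939, 40.874294]},
     "properties": {"name": "Co-op City", "borough": "Bronx"}},
]

# Nested keys become columns such as 'properties.name'; lists stay as values
flat = pd.json_normalize(features)
coords = flat["geometry.coordinates"]

neighborhoods = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    # GeoJSON positions are [longitude, latitude]
    "Latitude": coords.str[1],
    "Longitude": coords.str[0],
})
print(neighborhoods)
```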

```python
# Define df columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# Collect one record per GeoJSON feature, then build the DataFrame once.
# (DataFrame.append was deprecated and then removed in pandas 2.0.)
rows = []
for data in neighborhoods_data:
    # GeoJSON Point coordinates are stored as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates'][:2]
    rows.append({'Borough': data['properties']['borough'],
                 'Neighborhood': data['properties']['name'],
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

NYC_neighborhoods = pd.DataFrame(rows, columns=column_names)
NYC_neighborhoods.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


```python
# Create df with only Bronx neighborhoods
Bronx_data = NYC_neighborhoods[NYC_neighborhoods['Borough'] == 'Bronx'].reset_index(drop=True)
Bronx_data.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


###### Central Toronto Dataset

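```python
from io import StringIO

# Get the postal-code table from Wikipedia using BeautifulSoup
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]

# Turn the HTML table into a df (newer pandas expects a file-like object here)
df = pd.read_html(StringIO(str(table)))[0]

# One row per postal code: keep the first borough, join the neighbourhoods
df = df.groupby('Postal Code').agg({'Borough': 'first', 'Neighbourhood': ','.join}).reset_index()

# Drop unassigned entries; .copy() makes dframe independent of df, so adding
# columns to it later does not trigger SettingWithCopyWarning
dframe = df[(df['Borough'] != 'Not assigned') & (df['Neighbourhood'] != 'Not assigned')].copy()
dframe.head()
```

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 1 | M1B | Scarborough | Malvern, Rouge |
| 2 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 3 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 4 | M1G | Scarborough | Woburn |
| 5 | M1H | Scarborough | Cedarbrae |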
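The `groupby(...).agg(...)` step collapses rows that share a postal code into a single row, keeping the first borough and joining the neighbourhood names. The toy rows below are hypothetical (not taken from the Wikipedia table), and the sketch uses `', '.join` for readability where the notebook uses `','.join`:

```python
import pandas as pd

raw = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M7A", "M9Z"],
    "Borough": ["Borough X", "Borough X", "Borough Y", "Not assigned"],
    "Neighbourhood": ["Neighbourhood A", "Neighbourhood B", "Neighbourhood C", "Not assigned"],
})

# Rows within each group keep their original order, so names join in order
combined = (raw.groupby("Postal Code")
               .agg({"Borough": "first", "Neighbourhood": ", ".join})
               .reset_index())

# Drop postal codes that have no borough assigned
combined = combined[combined["Borough"] != "Not assigned"].reset_index(drop=True)
print(combined)
```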


```python
# Combine the Toronto df with the geospatial data by joining on 'Postal Code'.
# Unlike appending values in loop order, a key-based join does not rely on both
# tables being sorted the same way, and it keeps dframe's existing index.
gsdata_df = pd.read_csv('https://cocl.us/Geospatial_data')
postal_coords = gsdata_df.set_index('Postal Code')[['Latitude', 'Longitude']]
dframe = dframe.join(postal_coords, on='Postal Code')
dframe.head()
```

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 1 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 2 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 3 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 4 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 5 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
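Joining on `Postal Code` also makes it easy to check for postal codes with no match in the geospatial CSV. A minimal sketch with hypothetical stand-ins for the two tables (`M0X` is invented): `indicator=True` adds a `_merge` column marking rows found only on the left, and `validate='one_to_one'` raises an error if a postal code appears twice on either side.

```python
import pandas as pd

postal = pd.DataFrame({"Postal Code": ["M4N", "M4P", "M0X"],
                       "Borough": ["Central Toronto", "Central Toronto", "Example"]})
coords = pd.DataFrame({"Postal Code": ["M4P", "M4N"],
                       "Latitude": [43.712751, 43.72802],
                       "Longitude": [-79.390197, -79.38879]})

# Left join keeps every postal code; order of the right table is irrelevant
merged = postal.merge(coords, on="Postal Code", how="left",
                      validate="one_to_one", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()

print(merged.drop(columns="_merge"))
print("Postal codes without coordinates:", unmatched)
```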


```python
# Rename 'Neighbourhood' to 'Neighborhood' because I am American :)
# Assigning the result avoids the SettingWithCopyWarning that inplace=True raised here
dframe = dframe.rename(columns={'Neighbourhood': 'Neighborhood'})
dframe.head()
```

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 1 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 2 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 3 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 4 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 5 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |


```python
# Return only Central Toronto neighborhoods
CentralToronto = dframe[dframe['Borough'] == 'Central Toronto'].reset_index(drop=True)
CentralToronto.head()
```

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 |
| 1 | M4P | Central Toronto | Davisville North | 43.712751 | -79.390197 |
| 2 | M4R | Central Toronto | North Toronto West, Lawrence Park | 43.715383 | -79.405678 |
| 3 | M4S | Central Toronto | Davisville | 43.704324 | -79.38879 |
| 4 | M4T | Central Toronto | Moore Park, Summerhill East | 43.689574 | -79.38316 |


###### Foursquare geolocation data for the Bronx

```python
import os

# Foursquare credentials are read from environment variables (the variable
# names here are placeholders) instead of being hard-coded and printed, so the
# secrets are not exposed when the notebook is shared.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'  # API version date, sent as the 'v' parameter
```


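```python
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    """Return one row per Foursquare venue found within `radius` metres of
    each (latitude, longitude) point, up to LIMIT venues per point."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # Request URL for the Foursquare 'explore' endpoint
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # The recommended venues sit under response -> groups[0] -> items
        results = requests.get(url).json()['response']['groups'][0]['items']

        # Keep the neighborhood's name and coordinates plus each venue's
        # name, coordinates and first-listed category
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-neighborhood lists into a single DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    return nearby_venues
```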
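Because `getNearbyVenues` depends on a live API call, it can help to test the parsing step offline. The sketch below follows the same path the function reads (`['response']['groups'][0]['items']`, then each item's `venue`) on a hypothetical, trimmed payload; the venue "Unnamed Spot" is invented. The fallback to `None` when a venue has no categories is our own assumption: the original indexes `categories[0]` directly and would raise an `IndexError` if such a venue were returned.

```python
# Hypothetical, trimmed payload in the shape getNearbyVenues reads
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Lollipops Gelato",
               "location": {"lat": 40.894123, "lng": -73.845892},
               "categories": [{"name": "Dessert Shop"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 40.8945, "lng": -73.8460},
               "categories": []}},
]}]}}

def parse_items(payload, name, lat, lng):
    """Turn one explore response into rows like those built in getNearbyVenues."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        cats = venue.get("categories") or []
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     cats[0]["name"] if cats else None))  # None if uncategorized
    return rows

rows = parse_items(sample, "Wakefield", 40.894705, -73.847201)
for row in rows:
    print(row)
```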

```python
# Get venues in the Bronx
Bronx_venues = getNearbyVenues(names=Bronx_data['Neighborhood'],
                               latitudes=Bronx_data['Latitude'],
                               longitudes=Bronx_data['Longitude'])

# View the df
Bronx_venues.head()
```

```text
Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Claremont Village
Concourse Village
Mount Eden
Mount Hope
Bronxdale
Allerton
Kingsbridge Heights
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop |
| 1 | Wakefield | 40.894705 | -73.847201 | Carvel Ice Cream | 40.890487 | -73.848568 | Ice Cream Shop |
| 2 | Wakefield | 40.894705 | -73.847201 | Walgreens | 40.896528 | -73.8447 | Pharmacy |
| 3 | Wakefield | 40.894705 | -73.847201 | Rite Aid | 40.896649 | -73.844846 | Pharmacy |
| 4 | Wakefield | 40.894705 | -73.847201 | Shell | 40.894187 | -73.845862 | Gas Station |


###### Foursquare geolocation data for Central Toronto

```python
# Get venues in Central Toronto using the 'getNearbyVenues' function defined above
CentralToronto_venues = getNearbyVenues(names=CentralToronto['Neighborhood'],
                                        latitudes=CentralToronto['Latitude'],
                                        longitudes=CentralToronto['Longitude'])

# View the df
CentralToronto_venues.head()
```

```text
Lawrence Park
Davisville North
North Toronto West, Lawrence Park
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Roselawn
Forest Hill North & West, Forest Hill Road Park
The Annex, North Midtown, Yorkville
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Lawrence Park | 43.72802 | -79.38879 | Lawrence Park Ravine | 43.726963 | -79.394382 | Park |
| 1 | Lawrence Park | 43.72802 | -79.38879 | Zodiac Swim School | 43.728532 | -79.38286 | Swim School |
| 2 | Lawrence Park | 43.72802 | -79.38879 | TTC Bus #162 - Lawrence-Donway | 43.728026 | -79.382805 | Bus Line |
| 3 | Davisville North | 43.712751 | -79.390197 | Sherwood Park | 43.716551 | -79.387776 | Park |
| 4 | Davisville North | 43.712751 | -79.390197 | Summerhill Market North | 43.715499 | -79.392881 | Food & Drink Shop |


### Exploratory Data Analysis

```python
# Bronx EDA
print(Bronx_venues.shape)
print('There are {} unique venue categories.'.format(len(Bronx_venues['Venue Category'].unique())))
print('There are {} unique venues.'.format(len(Bronx_venues['Venue'].unique())))
print('There are {} unique Neighborhoods.'.format(len(Bronx_venues['Neighborhood'].unique())))
Bronx_venues.groupby('Neighborhood').count().head()
```

```text
(1214, 7)
There are 172 unique venue categories.
There are 879 unique venues.
There are 52 unique Neighborhoods.
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Allerton | 28 | 28 | 28 | 28 | 28 | 28 |
| Baychester | 20 | 20 | 20 | 20 | 20 | 20 |
| Bedford Park | 34 | 34 | 34 | 34 | 34 | 34 |
| Belmont | 100 | 100 | 100 | 100 | 100 | 100 |
| Bronxdale | 12 | 12 | 12 | 12 | 12 | 12 |


```python
# Toronto EDA
print(CentralToronto_venues.shape)
print('There are {} unique venue categories.'.format(len(CentralToronto_venues['Venue Category'].unique())))
print('There are {} unique venues.'.format(len(CentralToronto_venues['Venue'].unique())))
print('There are {} unique Neighborhoods.'.format(len(CentralToronto_venues['Neighborhood'].unique())))
CentralToronto_venues.groupby('Neighborhood').count().head()
```

```text
(111, 7)
There are 64 unique venue categories.
There are 99 unique venues.
There are 9 unique Neighborhoods.
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Davisville | 33 | 33 | 33 | 33 | 33 | 33 |
| Davisville North | 7 | 7 | 7 | 7 | 7 | 7 |
| Forest Hill North & West, Forest Hill Road Park | 4 | 4 | 4 | 4 | 4 | 4 |
| Lawrence Park | 3 | 3 | 3 | 3 | 3 | 3 |
| Moore Park, Summerhill East | 4 | 4 | 4 | 4 | 4 | 4 |


Our exploratory data analysis shows that the Bronx venues dataset is much larger than the Central Toronto one (1,214 rows versus 111). The main reason is coverage: the Bronx data spans 52 neighborhoods, whereas Central Toronto has only 9. The Bronx data is also far more varied, with 879 unique venues in 172 categories compared with 99 venues in 64 categories for Central Toronto. This may partly reflect the Bronx's larger population and greater diversity, although venue counts alone cannot confirm that.
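A quick calculation with the counts printed above helps separate the two effects. Normalizing by neighborhood count gives roughly 23 venues per Bronx neighborhood versus 12 in Central Toronto, and the Bronx has about 5.8 times as many neighborhoods but about 10.9 times as many venue rows, so neighborhood count explains only part of the gap. Keep in mind that each query is capped by `LIMIT=100` within a 500 m radius: Belmont returned exactly 100 venues, so its true count may be higher.

```python
# Counts copied from the EDA output above
bronx_rows, bronx_hoods = 1214, 52
toronto_rows, toronto_hoods = 111, 9

bronx_avg = bronx_rows / bronx_hoods        # about 23.3 venues per neighborhood
toronto_avg = toronto_rows / toronto_hoods  # about 12.3 venues per neighborhood

# Compare the overall size gap with the gap in neighborhood counts
rows_ratio = bronx_rows / toronto_rows      # about 10.9x more venue rows
hoods_ratio = bronx_hoods / toronto_hoods   # about 5.8x more neighborhoods

print(f"Venues per neighborhood: Bronx {bronx_avg:.1f}, Central Toronto {toronto_avg:.1f}")
print(f"Row ratio {rows_ratio:.1f}x, neighborhood ratio {hoods_ratio:.1f}x")
```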

Let's continue our analysis with a more statistical, in-depth look at the two datasets.

### Statistical Analysis (Exploration)

```python
# Using one-hot encoding to analyze each neighborhood in the Bronx:
# one column per venue category, 1 where a venue belongs to it
Bronx_onehot = pd.get_dummies(Bronx_venues[['Venue Category']], prefix="", prefix_sep="")
Bronx_onehot['Neighborhood'] = Bronx_venues['Neighborhood']
# Move the Neighborhood column to the front
fixed_columns = [Bronx_onehot.columns[-1]] + list(Bronx_onehot.columns[:-1])
Bronx_onehot = Bronx_onehot[fixed_columns]

# Group by neighborhood; the mean is each category's share of its venues
Bronx_grouped = Bronx_onehot.groupby('Neighborhood').mean().reset_index()
print(Bronx_grouped.shape)
Bronx_grouped.head()
```

```text
(52, 173)
```


Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,American Restaurant,Arcade,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Boat or Ferry,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Café,Candy Store,Caribbean Restaurant,Check Cashing Service,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distillery,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Fish Market,Food,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Home Service,Hookah Bar,Hotel,IT Services,Ice Cream Shop,Indian Restaurant,Indie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Lawyer,Liquor Store,Lounge,Market,Martial Arts Dojo,Mattress Store,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Moving Target,Music Venue,Nail Salon,Nightclub,Office,Optical Shop,Other Great Outdoors,Outdoors & Recreation,Outlet Store,Paella Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pizza Place,Platform,Playground,Plaza,Pool,Post Office,Pub,Recreation Center,Rental Car Location,Restaurant,River,Salon / Barbershop,Sandwich Place,School,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Smoke Shop,Social Club,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Tattoo Parlor,Tennis Court,Tennis Stadium,Thai Restaurant,Thrift / Vintage Store,Track,Trail,Train Station,Used Auto Dealership,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waste Facility,Weight Loss Center,Wings Joint,Women's Store
0,Allerton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.107143,0.035714,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.142857,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Baychester,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.1,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bedford Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.117647,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.088235,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Belmont,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.08,0.01,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.01,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.18,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.09,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
4,Bronxdale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
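To make the encoding concrete, here is the same one-hot-then-average pattern on a toy venue list (hypothetical neighborhoods "A" and "B"). Each grouped row gives the share of that neighborhood's venues falling in each category, so every row sums to 1. The sketch passes `dtype=int` to `get_dummies` so the indicators are numeric regardless of pandas version (recent versions default to booleans).

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Pizza Place", "Bar", "Pizza Place", "Park"],
})

# One indicator column per category, named after the category itself
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the indicators = fraction of each neighborhood's venues per category
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```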


Let's do the same for Central Toronto.

```python
# Using one-hot encoding to analyze each neighborhood in Central Toronto
CentralToronto_onehot = pd.get_dummies(CentralToronto_venues[['Venue Category']], prefix="", prefix_sep="")
CentralToronto_onehot['Neighborhood'] = CentralToronto_venues['Neighborhood']
# Move the Neighborhood column to the front
fixed_columns = [CentralToronto_onehot.columns[-1]] + list(CentralToronto_onehot.columns[:-1])
CentralToronto_onehot = CentralToronto_onehot[fixed_columns]

# Group by neighborhood; the mean is each category's share of its venues
CentralToronto_grouped = CentralToronto_onehot.groupby('Neighborhood').mean().reset_index()
print(CentralToronto_grouped.shape)
CentralToronto_grouped.head()
```

```text
(9, 65)
```


Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bank,Breakfast Spot,Brewery,Burger Joint,Bus Line,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Cosmetics Shop,Department Store,Dessert Shop,Diner,Donut Shop,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Fried Chicken Joint,Garden,Gas Station,Gourmet Shop,Greek Restaurant,Gym,Gym / Fitness Center,History Museum,Hotel,Indian Restaurant,Indoor Play Area,Italian Restaurant,Jewelry Store,Lawyer,Light Rail Station,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Music Venue,Park,Pet Store,Pharmacy,Pizza Place,Pub,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shoe Store,Spa,Sporting Goods Shop,Sports Bar,Supermarket,Sushi Restaurant,Swim School,Tennis Court,Thai Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Yoga Studio
0,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.060606,0.0,0.0,0.060606,0.0,0.0,0.090909,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.030303,0.060606,0.0,0.0,0.0,0.030303,0.030303,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.030303,0.090909,0.0,0.030303,0.0,0.090909,0.030303,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0
1,Davisville North,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Forest Hill North & West, Forest Hill Road Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
3,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Moore Park, Summerhill East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0


Let's look at the top 10 venues for all the neighborhoods in both the Bronx and Central Toronto.  We define a new function and create new dataframes for both the Bronx and Central Toronto to help us visualize the top venues.
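The top-10 table is built by sorting each neighborhood's frequency row and keeping the first ten category names, with 1st, 2nd, 3rd and then "th" suffixes in the column headers. The sketch below expresses the same idea with `Series.nlargest` and a small ordinal helper; `ordinal`, `top_venues` and the toy frequencies are hypothetical. One caveat: for a neighborhood with only a few venues (Lawrence Park returned 3), every category after the third has frequency 0, so the later "most common" columns are effectively arbitrary ties.

```python
import pandas as pd

def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:  # 11th, 12th, 13th are exceptions
        return f"{n}th"
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def top_venues(freq_row: pd.Series, n: int) -> list:
    # nlargest keeps the n highest frequencies (ties resolved by original order)
    return freq_row.nlargest(n).index.tolist()

row = pd.Series({"Bar": 0.1, "Pizza Place": 0.4, "Park": 0.2, "Deli / Bodega": 0.3})
cols = [f"{ordinal(i)} Most Common Venue" for i in range(1, 4)]
print(dict(zip(cols, top_venues(row, 3))))
```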

Let's start with the Bronx:

```python
# Return the names of the num_top_venues most frequent categories in a row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the Neighborhood column
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# table
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# Column names: 1st, 2nd, 3rd, then 4th through 10th Most Common Venue
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind + 1, suffix))

# One row per neighborhood, filled with its top venue categories
Bronx_venues_sorted = pd.DataFrame(columns=columns)
Bronx_venues_sorted['Neighborhood'] = Bronx_grouped['Neighborhood']

for ind in np.arange(Bronx_grouped.shape[0]):
    Bronx_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Bronx_grouped.iloc[ind, :], num_top_venues)

Bronx_venues_sorted.head()
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allerton | Pizza Place | Deli / Bodega | Supermarket | Cosmetics Shop | Donut Shop | Fried Chicken Joint | Food | Fast Food Restaurant | Bus Station | Chinese Restaurant |
| 1 | Baychester | Donut Shop | Men's Store | Supermarket | Pizza Place | Convenience Store | Discount Store | Other Great Outdoors | Sandwich Place | Electronics Store | Fast Food Restaurant |
| 2 | Bedford Park | Diner | Deli / Bodega | Pizza Place | Sandwich Place | Chinese Restaurant | Mexican Restaurant | Park | Bus Station | Pub | Pharmacy |
| 3 | Belmont | Italian Restaurant | Pizza Place | Deli / Bodega | Bakery | Donut Shop | Grocery Store | Bank | Mexican Restaurant | Dessert Shop | Coffee Shop |
| 4 | Bronxdale | Italian Restaurant | Gym | Performing Arts Venue | Chinese Restaurant | Paper / Office Supplies Store | Eastern European Restaurant | Spanish Restaurant | Mexican Restaurant | Supermarket | Bank |


Now, Central Toronto:

```python
# Same table for Central Toronto, reusing num_top_venues and the column
# names built for the Bronx table above
CentralToronto_venues_sorted = pd.DataFrame(columns=columns)
CentralToronto_venues_sorted['Neighborhood'] = CentralToronto_grouped['Neighborhood']

for ind in np.arange(CentralToronto_grouped.shape[0]):
    CentralToronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(CentralToronto_grouped.iloc[ind, :], num_top_venues)

CentralToronto_venues_sorted.head()
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Davisville | Pizza Place | Dessert Shop | Sandwich Place | Gym | Italian Restaurant | Sushi Restaurant | Coffee Shop | Café | Restaurant | Greek Restaurant |
| 1 | Davisville North | Department Store | Food & Drink Shop | Gym / Fitness Center | Hotel | Sandwich Place | Breakfast Spot | Park | Farmers Market | Fast Food Restaurant | Flower Shop |
| 2 | Forest Hill North & West, Forest Hill Road Park | Trail | Park | Jewelry Store | Sushi Restaurant | Yoga Studio | Food & Drink Shop | Donut Shop | Farmers Market | Fast Food Restaurant | Flower Shop |
| 3 | Lawrence Park | Park | Swim School | Bus Line | Yoga Studio | Fried Chicken Joint | Farmers Market | Fast Food Restaurant | Flower Shop | Food & Drink Shop | Garden |
| 4 | Moore Park, Summerhill East | Trail | Park | Tennis Court | Lawyer | Yoga Studio | Food & Drink Shop | Donut Shop | Farmers Market | Fast Food Restaurant | Flower Shop |


### Statistical Analysis using K-means

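We will now use K-means, an unsupervised machine learning algorithm, to cluster our neighborhoods. K-means splits the neighborhoods into a chosen number of clusters (k) by repeatedly assigning each neighborhood to the nearest cluster center and recomputing each center as the mean of its members. Because each neighborhood is described by its venue-category frequencies, neighborhoods in the same cluster have similar mixes of venues, while neighborhoods in different clusters differ.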
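As a minimal illustration of what `KMeans` does with frequency vectors like these, the sketch below clusters four hypothetical neighborhoods described by made-up shares of three categories. With `n_clusters=2`, the two park-heavy rows land in one cluster and the two bar/pizza rows in the other. `n_init=10` is set explicitly here as our own choice, because recent scikit-learn releases changed its default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy venue-category frequencies; columns are Bar, Park, Pizza Place
X = np.array([
    [0.0, 0.9, 0.1],
    [0.0, 0.8, 0.2],
    [0.5, 0.0, 0.5],
    [0.6, 0.0, 0.4],
])

km = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
print(km.labels_)           # cluster label per row
print(km.cluster_centers_)  # mean frequency profile of each cluster
```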

###### Bronx Clustering

```python
kclusters = 5

# Cluster on the venue-category frequencies only (drop the name column)
Bronx_grouped_clustering = Bronx_grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Bronx_grouped_clustering)

# Add each neighborhood's cluster label, then join on Neighborhood to bring in
# the latitude and longitude from Bronx_data
Bronx_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
Bronx_merged = Bronx_data.join(Bronx_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
Bronx_merged.head()
```

| | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 | 1 | Pharmacy | Donut Shop | Ice Cream Shop | Deli / Bodega | Dessert Shop | Laundromat | Sandwich Place | Gas Station | Women's Store | Fish Market |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 | 2 | Bus Station | Fast Food Restaurant | Accessories Store | Bagel Shop | Pharmacy | Post Office | Park | Discount Store | Restaurant | Pizza Place |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 | 1 | Caribbean Restaurant | Deli / Bodega | Diner | Cosmetics Shop | Metro Station | Convenience Store | Donut Shop | Seafood Restaurant | Bus Stop | Bus Station |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 | 0 | Music Venue | River | Bus Station | Plaza | Donut Shop | Food & Drink Shop | Food | Fish Market | Fast Food Restaurant | Farmers Market |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 | 2 | Bus Station | Park | Gym | Plaza | Baseball Field | Bank | Food Truck | Women's Store | Food | Fish Market |


Cluster 1:

In [102]:
Bronx_merged.loc[Bronx_merged['Cluster Labels'] == 0, Bronx_merged.columns[[1] + list(range(5, Bronx_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Fieldston,Music Venue,River,Bus Station,Plaza,Donut Shop,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant,Farmers Market


Cluster 2:

In [103]:
Bronx_merged.loc[Bronx_merged['Cluster Labels'] == 1, Bronx_merged.columns[[1] + list(range(5, Bronx_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Wakefield,Pharmacy,Donut Shop,Ice Cream Shop,Deli / Bodega,Dessert Shop,Laundromat,Sandwich Place,Gas Station,Women's Store,Fish Market
2,Eastchester,Caribbean Restaurant,Deli / Bodega,Diner,Cosmetics Shop,Metro Station,Convenience Store,Donut Shop,Seafood Restaurant,Bus Stop,Bus Station
5,Kingsbridge,Pizza Place,Deli / Bodega,Bar,Mexican Restaurant,Sandwich Place,Bakery,Latin American Restaurant,Supermarket,Pharmacy,Donut Shop
6,Woodlawn,Deli / Bodega,Pizza Place,Playground,Pub,Food & Drink Shop,Bar,Italian Restaurant,Bakery,Cosmetics Shop,Park
7,Norwood,Pizza Place,Park,Chinese Restaurant,Bank,Pharmacy,Liquor Store,Deli / Bodega,Coffee Shop,Restaurant,Sandwich Place
9,Baychester,Donut Shop,Men's Store,Supermarket,Pizza Place,Convenience Store,Discount Store,Other Great Outdoors,Sandwich Place,Electronics Store,Fast Food Restaurant
10,Pelham Parkway,Italian Restaurant,Frozen Yogurt Shop,Deli / Bodega,Pizza Place,Chinese Restaurant,Sushi Restaurant,Home Service,Liquor Store,Gas Station,Metro Station
11,City Island,Seafood Restaurant,Harbor / Marina,Thrift / Vintage Store,Deli / Bodega,Boat or Ferry,Italian Restaurant,Diner,Smoke Shop,Pizza Place,Music Venue
12,Bedford Park,Diner,Deli / Bodega,Pizza Place,Sandwich Place,Chinese Restaurant,Mexican Restaurant,Park,Bus Station,Pub,Pharmacy
13,University Heights,Pizza Place,Convenience Store,Bakery,Burrito Place,Pharmacy,Fried Chicken Joint,Sandwich Place,Optical Shop,Donut Shop,Supermarket


Cluster 3:

In [104]:
Bronx_merged.loc[Bronx_merged['Cluster Labels'] == 2, Bronx_merged.columns[[1] + list(range(5, Bronx_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Co-op City,Bus Station,Fast Food Restaurant,Accessories Store,Bagel Shop,Pharmacy,Post Office,Park,Discount Store,Restaurant,Pizza Place
4,Riverdale,Bus Station,Park,Gym,Plaza,Baseball Field,Bank,Food Truck,Women's Store,Food,Fish Market
17,West Farms,Bus Station,Chinese Restaurant,Park,Coffee Shop,Outdoors & Recreation,Sandwich Place,Donut Shop,Basketball Court,Lounge,Bank
24,Morrisania,Bus Station,Discount Store,Donut Shop,Pizza Place,Fast Food Restaurant,Grocery Store,Fish Market,Ice Cream Shop,Pharmacy,Bowling Alley
25,Soundview,Chinese Restaurant,Grocery Store,Fried Chicken Joint,Latin American Restaurant,Bus Station,Breakfast Spot,Bus Stop,Basketball Court,Burger Joint,Liquor Store
26,Clason Point,Park,Convenience Store,Grocery Store,Pool,South American Restaurant,Boat or Ferry,Bus Stop,Business Service,Fish Market,Fast Food Restaurant
34,Spuyten Duyvil,Park,Pizza Place,Bank,Tennis Court,Tennis Stadium,Grocery Store,Thai Restaurant,Intersection,Pharmacy,Diner
42,Concourse,Grocery Store,Pizza Place,Bus Station,Bakery,Italian Restaurant,Supermarket,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Donut Shop
44,Edenwald,Fried Chicken Joint,Supermarket,Fish Market,Bus Station,Grocery Store,Eastern European Restaurant,Food & Drink Shop,Food,Fast Food Restaurant,Farmers Market
45,Claremont Village,Grocery Store,Bus Station,Chinese Restaurant,Bakery,Deli / Bodega,Gym,Food,Liquor Store,Park,Caribbean Restaurant


Cluster 4:

In [105]:
Bronx_merged.loc[Bronx_merged['Cluster Labels'] == 3, Bronx_merged.columns[[1] + list(range(5, Bronx_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Williamsbridge,Convenience Store,Nightclub,Soup Place,Bar,Caribbean Restaurant,Eastern European Restaurant,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant


Cluster 5:

In [106]:
Bronx_merged.loc[Bronx_merged['Cluster Labels'] == 4, Bronx_merged.columns[[1] + list(range(5, Bronx_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
39,Castle Hill,Market,Pizza Place,Diner,Pharmacy,Bank,Women's Store,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant


###### Central Toronto Clustering

In [110]:
kclusters = 5
# KMeans needs numeric features only, so drop the label column
CentralToronto_grouped_clustering = CentralToronto_grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(CentralToronto_grouped_clustering)
kmeans.labels_[0:10]

# Merge tables to get latitude and longitude
CentralToronto_venues_sorted.insert(0, 'Cluster', kmeans.labels_)
CentralToronto_merged = CentralToronto
CentralToronto_merged = CentralToronto_merged.join(CentralToronto_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
CentralToronto_merged.head() 

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,0,Park,Swim School,Bus Line,Yoga Studio,Fried Chicken Joint,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Garden
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197,1,1,Department Store,Food & Drink Shop,Gym / Fitness Center,Hotel,Sandwich Place,Breakfast Spot,Park,Farmers Market,Fast Food Restaurant,Flower Shop
2,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678,1,1,Clothing Store,Coffee Shop,Café,Gym / Fitness Center,Fast Food Restaurant,Diner,Mexican Restaurant,Park,Pet Store,Cosmetics Shop
3,M4S,Central Toronto,Davisville,43.704324,-79.38879,1,1,Pizza Place,Dessert Shop,Sandwich Place,Gym,Italian Restaurant,Sushi Restaurant,Coffee Shop,Café,Restaurant,Greek Restaurant
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,3,3,Trail,Park,Tennis Court,Lawyer,Yoga Studio,Food & Drink Shop,Donut Shop,Farmers Market,Fast Food Restaurant,Flower Shop
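Because this Central Toronto frame has an extra Postal Code column, positional index 1 selects Borough rather than Neighborhood, which is why the cluster tables below list Borough instead of the neighborhood names. Selecting columns by name avoids that shift. A sketch, using a hypothetical helper `cluster_table` and a toy frame built from rows shown above:

```python
import pandas as pd


def cluster_table(merged, cluster, label_col='Cluster Labels'):
    """Hypothetical helper: Neighborhood plus the ranked-venue columns for one cluster.

    Columns are picked by name, so extra columns such as 'Postal Code'
    cannot shift which columns come back.
    """
    venue_cols = [col for col in merged.columns if col.endswith('Most Common Venue')]
    return merged.loc[merged[label_col] == cluster, ['Neighborhood'] + venue_cols]


# Toy frame laid out like CentralToronto_merged (only two venue columns)
toy = pd.DataFrame({
    'Postal Code': ['M4N', 'M4P', 'M4S'],
    'Borough': ['Central Toronto'] * 3,
    'Neighborhood': ['Lawrence Park', 'Davisville North', 'Davisville'],
    'Cluster Labels': [0, 1, 1],
    '1st Most Common Venue': ['Park', 'Department Store', 'Pizza Place'],
    '2nd Most Common Venue': ['Swim School', 'Food & Drink Shop', 'Dessert Shop'],
})

table = cluster_table(toy, 1)
```

In the notebook this would be called as `cluster_table(CentralToronto_merged, 0)`, and the same helper works unchanged for `Bronx_merged`.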


Cluster 1:

In [111]:
CentralToronto_merged.loc[CentralToronto_merged['Cluster Labels'] == 0, CentralToronto_merged.columns[[1] + list(range(5, CentralToronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,0,0,Park,Swim School,Bus Line,Yoga Studio,Fried Chicken Joint,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Garden


Cluster 2:

In [112]:
CentralToronto_merged.loc[CentralToronto_merged['Cluster Labels'] == 1, CentralToronto_merged.columns[[1] + list(range(5, CentralToronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Central Toronto,1,1,Department Store,Food & Drink Shop,Gym / Fitness Center,Hotel,Sandwich Place,Breakfast Spot,Park,Farmers Market,Fast Food Restaurant,Flower Shop
2,Central Toronto,1,1,Clothing Store,Coffee Shop,Café,Gym / Fitness Center,Fast Food Restaurant,Diner,Mexican Restaurant,Park,Pet Store,Cosmetics Shop
3,Central Toronto,1,1,Pizza Place,Dessert Shop,Sandwich Place,Gym,Italian Restaurant,Sushi Restaurant,Coffee Shop,Café,Restaurant,Greek Restaurant
5,Central Toronto,1,1,Pub,Coffee Shop,Sports Bar,Vietnamese Restaurant,Fried Chicken Joint,Light Rail Station,Liquor Store,Pizza Place,Restaurant,American Restaurant
8,Central Toronto,1,1,Sandwich Place,Café,Coffee Shop,Indian Restaurant,History Museum,Flower Shop,Liquor Store,Donut Shop,Middle Eastern Restaurant,Park


Cluster 3:

In [113]:
CentralToronto_merged.loc[CentralToronto_merged['Cluster Labels'] == 2, CentralToronto_merged.columns[[1] + list(range(5, CentralToronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Central Toronto,2,2,Garden,Music Venue,Yoga Studio,Hotel,Gym / Fitness Center,Gym,Greek Restaurant,Gourmet Shop,Gas Station,Fried Chicken Joint


Cluster 4:

In [114]:
CentralToronto_merged.loc[CentralToronto_merged['Cluster Labels'] == 3, CentralToronto_merged.columns[[1] + list(range(5, CentralToronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,3,3,Trail,Park,Tennis Court,Lawyer,Yoga Studio,Food & Drink Shop,Donut Shop,Farmers Market,Fast Food Restaurant,Flower Shop


Cluster 5:

In [115]:
CentralToronto_merged.loc[CentralToronto_merged['Cluster Labels'] == 4, CentralToronto_merged.columns[[1] + list(range(5, CentralToronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Central Toronto,4,4,Trail,Park,Jewelry Store,Sushi Restaurant,Yoga Studio,Food & Drink Shop,Donut Shop,Farmers Market,Fast Food Restaurant,Flower Shop


###### Visualization (Bronx)

In [116]:
# Bronx Coordinates
bronx_address = 'Bronx, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(bronx_address)
latitude1 = location.latitude
longitude1 = location.longitude
print('The geographical coordinates of the Bronx are: {}, {}.'.format(latitude1, longitude1))

The geographical coordinates of the Bronx are: 40.8466508, -73.8785937.


In [117]:
map_Bronx = folium.Map(location=[latitude1, longitude1], zoom_start=11)

# add markers to map
for lat, lng, label in zip(Bronx_data['Latitude'], Bronx_data['Longitude'], Bronx_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Bronx)  
    
map_Bronx

###### Visualization (Central Toronto)

In [118]:
# Central Toronto Coordinates
CentralToronto_address = 'Central Toronto, Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(CentralToronto_address)
latitude2 = location.latitude
longitude2 = location.longitude
print('The geographical coordinates of Central Toronto are: {}, {}.'.format(latitude2, longitude2))

The geographical coordinates of Central Toronto are: 43.6534817, -79.3839347.


In [119]:
map_CentralToronto = folium.Map(location=[latitude2, longitude2], zoom_start=11)

# add markers to map
for lat, lng, label in zip(CentralToronto['Latitude'], CentralToronto['Longitude'], CentralToronto['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_CentralToronto)  
    
map_CentralToronto

In [120]:
# Count the venue categories found in the Bronx and Central Toronto (every column after 'Neighborhood' is one category)
Bronx_column_list=Bronx_grouped.columns
list_of_Bronx_venues=Bronx_column_list.tolist()

CentralToronto_column_list=CentralToronto_grouped.columns
list_of_CentralToronto_venues=CentralToronto_column_list.tolist()

print(len(list_of_Bronx_venues[1:]))
print(len(list_of_CentralToronto_venues[1:]))

172
64
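The grouped frames are built earlier in the notebook, outside this section. Assuming they follow the usual pattern (one-hot encode each venue's category, then average per neighborhood), every column after `Neighborhood` is one venue category, so the two numbers above are the counts of distinct categories. A toy version with made-up venue rows:

```python
import pandas as pd

# Made-up venue rows: one row per venue, as returned by the venue search
venues = pd.DataFrame({
    'Neighborhood': ['Wakefield', 'Wakefield', 'Wakefield', 'Kingsbridge', 'Kingsbridge'],
    'Venue Category': ['Pharmacy', 'Pharmacy', 'Donut Shop', 'Bar', 'Pizza Place'],
})

# One column per venue category, 1 where the venue has that category
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean per neighborhood = share of that neighborhood's venues in each category
grouped = onehot.groupby('Neighborhood').mean().reset_index()

# Skip the 'Neighborhood' column, as in the cell above
n_categories = len(grouped.columns.tolist()[1:])
```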


Visualizing clusters in the Bronx:

In [121]:
map_clusters1 = folium.Map(location=[latitude1, longitude1], zoom_start=11)

# One distinct colour per cluster, sampled evenly from matplotlib's rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# One marker per neighborhood, coloured by its cluster label (0 to kclusters - 1)
for lat, lon, poi, cluster in zip(Bronx_merged['Latitude'], Bronx_merged['Longitude'], Bronx_merged['Neighborhood'], Bronx_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters1)

map_clusters1

Visualizing clusters in Central Toronto:

In [122]:
map_clusters2 = folium.Map(location=[latitude2, longitude2], zoom_start=11)

# One distinct colour per cluster, sampled evenly from matplotlib's rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# One marker per neighborhood, coloured by its cluster label (0 to kclusters - 1)
for lat, lon, poi, cluster in zip(CentralToronto_merged['Latitude'], CentralToronto_merged['Longitude'], CentralToronto_merged['Neighborhood'], CentralToronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters2)

map_clusters2

Now, we will go ahead and stack the two datasets to further help us decide where it makes the most sense to put our bar.  

Let's do some data cleaning so that the columns of the two dataframes line up:

In [124]:
Bronx_merged2 = Bronx_merged.drop(columns=['Cluster Labels'])
CentralToronto_merged2 = CentralToronto_merged.drop(columns=['Postal Code', 'Cluster Labels'])

Stack the dfs:

In [125]:
# ignore_index=True renumbers the rows 0..n-1 instead of keeping each frame's own index
stack = pd.concat([Bronx_merged2, CentralToronto_merged2], axis=0, ignore_index=True)
stack

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
0,Bronx,Wakefield,40.894705,-73.847201,Pharmacy,Donut Shop,Ice Cream Shop,Deli / Bodega,Dessert Shop,Laundromat,Sandwich Place,Gas Station,Women's Store,Fish Market,
1,Bronx,Co-op City,40.874294,-73.829939,Bus Station,Fast Food Restaurant,Accessories Store,Bagel Shop,Pharmacy,Post Office,Park,Discount Store,Restaurant,Pizza Place,
2,Bronx,Eastchester,40.887556,-73.827806,Caribbean Restaurant,Deli / Bodega,Diner,Cosmetics Shop,Metro Station,Convenience Store,Donut Shop,Seafood Restaurant,Bus Stop,Bus Station,
3,Bronx,Fieldston,40.895437,-73.905643,Music Venue,River,Bus Station,Plaza,Donut Shop,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant,Farmers Market,
4,Bronx,Riverdale,40.890834,-73.912585,Bus Station,Park,Gym,Plaza,Baseball Field,Bank,Food Truck,Women's Store,Food,Fish Market,
5,Bronx,Kingsbridge,40.881687,-73.902818,Pizza Place,Deli / Bodega,Bar,Mexican Restaurant,Sandwich Place,Bakery,Latin American Restaurant,Supermarket,Pharmacy,Donut Shop,
6,Bronx,Woodlawn,40.898273,-73.867315,Deli / Bodega,Pizza Place,Playground,Pub,Food & Drink Shop,Bar,Italian Restaurant,Bakery,Cosmetics Shop,Park,
7,Bronx,Norwood,40.877224,-73.879391,Pizza Place,Park,Chinese Restaurant,Bank,Pharmacy,Liquor Store,Deli / Bodega,Coffee Shop,Restaurant,Sandwich Place,
8,Bronx,Williamsbridge,40.881039,-73.857446,Convenience Store,Nightclub,Soup Place,Bar,Caribbean Restaurant,Eastern European Restaurant,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant,
9,Bronx,Baychester,40.866858,-73.835798,Donut Shop,Men's Store,Supermarket,Pizza Place,Convenience Store,Discount Store,Other Great Outdoors,Sandwich Place,Electronics Store,Fast Food Restaurant,
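`pd.concat` aligns frames by column name, not by position, and fills any column that one frame lacks with NaN. That is why the Cluster column, which only the Central Toronto frame still carries, is blank for the Bronx rows above. A toy illustration with two one-row frames:

```python
import pandas as pd

bronx = pd.DataFrame({'Borough': ['Bronx'], 'Neighborhood': ['Wakefield']})
toronto = pd.DataFrame({'Borough': ['Central Toronto'], 'Neighborhood': ['Davisville'], 'Cluster': [1]})

# Columns are matched by name; 'Cluster' is missing from `bronx`, so its Bronx row gets NaN
stacked = pd.concat([bronx, toronto], axis=0, ignore_index=True)
```

The union of columns keeps the first frame's order and appends the new ones, so Cluster ends up as the last column, as in the output above.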


Now that we have the stacked dataframe, our goal is to find areas with the least competition from other bars and pubs. We can do this in a couple of ways. In this next step, we will keep only the neighborhoods that do not have 'Bar' or 'Pub' in any of their 1st to 10th most common venues.

In [126]:
# The ten ranked-venue columns, '1st Most Common Venue' to '10th Most Common Venue'
venue_cols = [col for col in stack.columns if col.endswith('Most Common Venue')]

# True for a neighborhood if any of its top-10 venues mentions 'Bar' or 'Pub'
# ('Sports Bar' is already matched by 'Bar')
has_bar = stack[venue_cols].apply(lambda col: col.str.contains('Bar|Pub', na=False)).any(axis=1)

# Keep neighborhoods without a bar or pub; reset_index(drop=True) returns a new frame,
# avoiding in-place edits on a filtered slice (which trigger SettingWithCopyWarning)
Stack_NoBars = stack[~has_bar].reset_index(drop=True)
Stack_NoBars



Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
0,Bronx,Wakefield,40.894705,-73.847201,Pharmacy,Donut Shop,Ice Cream Shop,Deli / Bodega,Dessert Shop,Laundromat,Sandwich Place,Gas Station,Women's Store,Fish Market,
1,Bronx,Co-op City,40.874294,-73.829939,Bus Station,Fast Food Restaurant,Accessories Store,Bagel Shop,Pharmacy,Post Office,Park,Discount Store,Restaurant,Pizza Place,
2,Bronx,Eastchester,40.887556,-73.827806,Caribbean Restaurant,Deli / Bodega,Diner,Cosmetics Shop,Metro Station,Convenience Store,Donut Shop,Seafood Restaurant,Bus Stop,Bus Station,
3,Bronx,Fieldston,40.895437,-73.905643,Music Venue,River,Bus Station,Plaza,Donut Shop,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant,Farmers Market,
4,Bronx,Riverdale,40.890834,-73.912585,Bus Station,Park,Gym,Plaza,Baseball Field,Bank,Food Truck,Women's Store,Food,Fish Market,
5,Bronx,Norwood,40.877224,-73.879391,Pizza Place,Park,Chinese Restaurant,Bank,Pharmacy,Liquor Store,Deli / Bodega,Coffee Shop,Restaurant,Sandwich Place,
6,Bronx,Baychester,40.866858,-73.835798,Donut Shop,Men's Store,Supermarket,Pizza Place,Convenience Store,Discount Store,Other Great Outdoors,Sandwich Place,Electronics Store,Fast Food Restaurant,
7,Bronx,Pelham Parkway,40.857413,-73.854756,Italian Restaurant,Frozen Yogurt Shop,Deli / Bodega,Pizza Place,Chinese Restaurant,Sushi Restaurant,Home Service,Liquor Store,Gas Station,Metro Station,
8,Bronx,City Island,40.847247,-73.786488,Seafood Restaurant,Harbor / Marina,Thrift / Vintage Store,Deli / Bodega,Boat or Ferry,Italian Restaurant,Diner,Smoke Shop,Pizza Place,Music Venue,
9,Bronx,University Heights,40.855727,-73.910416,Pizza Place,Convenience Store,Bakery,Burrito Place,Pharmacy,Fried Chicken Joint,Sandwich Place,Optical Shop,Donut Shop,Supermarket,


Of the remaining 51 neighborhoods, let's use the full venue data to count the bars and pubs in each.

Let's start by counting the bars in each Bronx neighborhood:

In [127]:
BronxBars = Bronx_venues[['Neighborhood', 'Venue Category']]
BronxBars.head()

Unnamed: 0,Neighborhood,Venue Category
0,Wakefield,Dessert Shop
1,Wakefield,Ice Cream Shop
2,Wakefield,Pharmacy
3,Wakefield,Pharmacy
4,Wakefield,Gas Station


In [130]:
BronxBars2 = BronxBars[BronxBars['Venue Category'].str.contains('Bar|Pub|Sports Bar', na=False)]
BronxBars2

Unnamed: 0,Neighborhood,Venue Category
44,Eastchester,Juice Bar
66,Kingsbridge,Beer Bar
71,Kingsbridge,Pub
75,Kingsbridge,Bar
92,Kingsbridge,Bar
101,Kingsbridge,Sports Bar
108,Kingsbridge,Bar
120,Kingsbridge,Bar
130,Woodlawn,Pub
136,Woodlawn,Bar


In [131]:
# Drop venues that str.contains matched but that are not bars (Hookah Bars, Juice Bars, Barber Shops):
BronxBars2 = BronxBars2.drop([44, 333, 556, 705, 818, 885, 905, 921, 947, 1010, 1080, 1207], axis = 0)

# Let's group by the neighborhood to count the number of bars in each neighborhood
BronxBars_Count = BronxBars2.groupby('Neighborhood').count()
BronxBars_Count


Neighborhood,Venue Category
Bedford Park,1
Belmont,2
City Island,1
Edgewater Park,3
Kingsbridge,7
Morris Park,2
Mount Eden,1
Pelham Bay,2
Schuylerville,1
Throgs Neck,2
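Dropping rows by hard-coded index only works for this exact pull of Foursquare data. A sturdier sketch filters on the category names instead and counts with `size()`. It assumes the unwanted matches are exactly the categories listed in the `NOT_BARS` set below (a hypothetical list based on the comment above and the Toronto output); the toy rows are taken from the outputs in this section.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Eastchester', 'Kingsbridge', 'Kingsbridge', 'Kingsbridge',
                     'Woodlawn', 'Woodlawn', 'North Toronto West, Lawrence Park'],
    'Venue Category': ['Juice Bar', 'Beer Bar', 'Pub', 'Sports Bar',
                       'Pub', 'Bar', 'Salon / Barbershop'],
})

# Categories that contain 'Bar' but are not competitors (assumed list)
NOT_BARS = {'Hookah Bar', 'Juice Bar', 'Barbershop', 'Salon / Barbershop'}

is_bar = venues['Venue Category'].str.contains('Bar|Pub', na=False)
competitors = venues[is_bar & ~venues['Venue Category'].isin(NOT_BARS)]

# Number of competing venues per neighborhood
bar_counts = competitors.groupby('Neighborhood').size().rename('# of Bars')
```

`size()` counts rows per group directly, while `count()` counts non-null values per column; here they agree because the category column has no missing values.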


Now let's do the same for Central Toronto:

In [132]:
CentralToronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,43.726963,-79.394382,Park
1,Lawrence Park,43.72802,-79.38879,Zodiac Swim School,43.728532,-79.38286,Swim School
2,Lawrence Park,43.72802,-79.38879,TTC Bus #162 - Lawrence-Donway,43.728026,-79.382805,Bus Line
3,Davisville North,43.712751,-79.390197,Sherwood Park,43.716551,-79.387776,Park
4,Davisville North,43.712751,-79.390197,Summerhill Market North,43.715499,-79.392881,Food & Drink Shop


In [133]:
CentralTorontoBars = CentralToronto_venues[['Neighborhood', 'Venue Category']]
CentralTorontoBars.head()

Unnamed: 0,Neighborhood,Venue Category
0,Lawrence Park,Park
1,Lawrence Park,Swim School
2,Lawrence Park,Bus Line
3,Davisville North,Park
4,Davisville North,Food & Drink Shop


In [139]:
CentralTorontoBars = CentralTorontoBars[CentralTorontoBars['Venue Category'].str.contains('Bar|Pub|Sports Bar', na=False)]
CentralTorontoBars

Unnamed: 0,Neighborhood,Venue Category
13,"North Toronto West, Lawrence Park",Salon / Barbershop
74,"Summerhill West, Rathnelly, South Hill, Forest...",Sports Bar
75,"Summerhill West, Rathnelly, South Hill, Forest...",Pub
77,"Summerhill West, Rathnelly, South Hill, Forest...",Pub
101,"The Annex, North Midtown, Yorkville",Pub


In [140]:
# Drop venues that str.contains matched but that are not bars (here, only the Salon / Barbershop at index 13):
CentralTorontoBars = CentralTorontoBars.drop([13], axis = 0)

# Let's group by the neighborhood to count the number of bars in each neighborhood
CentralTorontoBars_Count = CentralTorontoBars.groupby('Neighborhood').count()
CentralTorontoBars_Count


Neighborhood,Venue Category
"Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park",3
"The Annex, North Midtown, Yorkville",1


In [141]:
# Stack the 2 dfs
stackedbars = pd.concat([BronxBars_Count, CentralTorontoBars_Count], axis=0).reset_index()
stackedbars = stackedbars.rename(columns ={'Venue Category': '# of Bars'})
stackedbars

Unnamed: 0,Neighborhood,# of Bars
0,Bedford Park,1
1,Belmont,2
2,City Island,1
3,Edgewater Park,3
4,Kingsbridge,7
5,Morris Park,2
6,Mount Eden,1
7,Pelham Bay,2
8,Schuylerville,1
9,Throgs Neck,2


All of these bars and pubs are potential competition for the bar that we are trying to open, so we do not want to open our bar in these neighborhoods. Let's remove them from our dataset of the remaining 51 neighborhoods.

Let's perform an exclusive left join to get the neighborhoods where there are no bars:

In [142]:
# Left join with indicator=True, then keep only rows with no match in stackedbars (an anti-join)
Remaining_Neighborhoods = pd.merge(Stack_NoBars, stackedbars, on='Neighborhood', how='left', indicator=True
              ).query('_merge == "left_only"').reset_index(drop=True)
Remaining_Neighborhoods = Remaining_Neighborhoods.drop(columns=['# of Bars', '_merge'])
len(Remaining_Neighborhoods)
# 45 neighborhoods remaining!

Remaining_Neighborhoods = Remaining_Neighborhoods[['Borough', 'Neighborhood', 'Latitude', 'Longitude']]
Bronx_Remaining = Remaining_Neighborhoods[(Remaining_Neighborhoods.Borough == 'Bronx')]
CentralToronto_Remaining = Remaining_Neighborhoods[(Remaining_Neighborhoods.Borough == 'Central Toronto')]
Bronx_Remaining.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
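The exclusive left join above is an anti-join: keep the left rows that have no match on the right. A toy version using neighborhoods and bar counts from this section shows both the `indicator=True` form and an equivalent `isin` shortcut, which is enough when only the key column matters.

```python
import pandas as pd

candidates = pd.DataFrame({'Neighborhood': ['Wakefield', 'Norwood', 'City Island', 'University Heights']})
bar_counts = pd.DataFrame({'Neighborhood': ['City Island', 'Kingsbridge'], '# of Bars': [1, 7]})

# Left join with indicator=True tags each row 'both' or 'left_only'
merged = candidates.merge(bar_counts, on='Neighborhood', how='left', indicator=True)

# Keep only the candidates with no match on the right: the anti-join
no_bars = merged.loc[merged['_merge'] == 'left_only', ['Neighborhood']].reset_index(drop=True)

# Equivalent shorter form using membership in the right-hand key column
no_bars_isin = candidates[~candidates['Neighborhood'].isin(bar_counts['Neighborhood'])].reset_index(drop=True)
```

City Island drops out because it has one bar, even though none of its top-10 venues was a bar.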


In [143]:
NYC_Demographics = NYC_Demographics[['Borough', 'NTA Name','NTA Code','Year', 'Population']]
NYC_Demographics2010 = NYC_Demographics[(NYC_Demographics.Year == 2010) & (NYC_Demographics.Borough == 'Bronx')]
NYC_Demographics2010 = NYC_Demographics2010.rename(columns = {'NTA Name': 'Neighborhood'})
NYC_Demographics2010 = NYC_Demographics2010[['Borough', 'Neighborhood', 'Population']].reset_index()
NYC_Demographics2010 = NYC_Demographics2010.drop(columns = ['index'])
NYC_Demographics2010.head()

Unnamed: 0,Borough,Neighborhood,Population
0,Bronx,Claremont-Bathgate,31078
1,Bronx,Eastchester-Edenwald-Baychester,34517
2,Bronx,Bedford Park-Fordham North,54415
3,Bronx,Belmont,27378
4,Bronx,Bronxdale,35538


Left-join the tables to see which neighborhood names match and which we need to combine:

In [144]:
Bronx_Join = pd.merge(Bronx_Remaining[['Borough', 'Neighborhood']], NYC_Demographics2010[['Population', 'Neighborhood']], on = 'Neighborhood', how ='left')
Bronx_Join.head()
# The neighborhood names do not match, so Population comes back empty (NaN)

Unnamed: 0,Borough,Neighborhood,Population
0,Bronx,Wakefield,
1,Bronx,Co-op City,
2,Bronx,Eastchester,
3,Bronx,Fieldston,
4,Bronx,Riverdale,


Unfortunately, I am not an experienced data scientist, so I will fix this error manually.
The error arises because the demographics dataset groups some of the neighborhoods together into combined areas (its NTA Name column, e.g. Eastchester-Edenwald-Baychester), most likely by location and proximity.
As a result, the neighborhood names in our two dfs do not match and the join fails, BUT it is fixable!

We will now take these dataframes into Excel for some manual cleaning. This will involve:
 - Grouping the Bronx_Remaining neighborhoods into the same form as the NYC_Demographics neighborhood names
 - Performing a correct join to get the populations of the grouped neighborhoods
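The manual Excel step amounts to a lookup table from the Foursquare neighborhood names to the demographics names. A sketch of doing the same in pandas, so the mapping lives in the notebook and can be rerun; the `TO_NTA` mapping is hypothetical and covers only a few names inferred from the combined names and populations shown in this section.

```python
import pandas as pd

remaining = pd.DataFrame({
    'Borough': ['Bronx'] * 3,
    'Neighborhood': ['Eastchester', 'Baychester', 'Co-op City'],
})
demographics = pd.DataFrame({
    'Neighborhood': ['Eastchester-Edenwald-Baychester', 'Co-Op City'],
    'Population': [34517, 43752],
})

# Hypothetical hand-built lookup from Foursquare name to demographics (NTA) name;
# names not listed keep their original value
TO_NTA = {
    'Eastchester': 'Eastchester-Edenwald-Baychester',
    'Baychester': 'Eastchester-Edenwald-Baychester',
    'Co-op City': 'Co-Op City',
}

remaining['Neighborhood'] = remaining['Neighborhood'].replace(TO_NTA)
# Several neighborhoods can collapse into one combined area, so keep one row per name
remaining = remaining.drop_duplicates(subset='Neighborhood')

joined = remaining.merge(demographics, on='Neighborhood', how='left')
```

Any name still missing from the mapping shows up as NaN in `Population` after the join, which makes the gaps easy to find.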

In [145]:
Bronx_Remaining.to_csv('Bronx_Remaining.csv')
NYC_Demographics2010.to_csv('NYC_Demographics2010.csv')
print("DataFrames to csv success!")

DataFrames to csv success!


Finished cleaning the Bronx_Remaining df!  Let's bring it back and try another left join.

In [146]:
Bronx_RemainingCleaned = pd.read_csv('Bronx_Remaining(cleaned).csv')
Bronx_RemainingCleaned = Bronx_RemainingCleaned.drop(columns = ['Unnamed: 5', 'Index'])
Bronx_RemainingCleaned.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Allerton-Pelham Gardens,40.865788,-73.859319
1,Bronx,Eastchester-Edenwald-Baychester,40.866858,-73.835798
2,Bronx,Bronxdale,40.852723,-73.861726
3,Bronx,Co-Op City,40.874294,-73.829939
4,Bronx,Concourse-Concourse Village,40.834284,-73.915589


Fixed! Let's perform the left join:

In [154]:
BronxJoin = pd.merge(Bronx_RemainingCleaned[['Borough', 'Neighborhood']], 
                     NYC_Demographics2010[['Neighborhood', 'Population']],
                     on = 'Neighborhood', how = 'left')
BronxJoin=BronxJoin.drop(BronxJoin.index[4], axis = 0).reset_index()
BronxJoin = BronxJoin.drop(columns= ['index'])
BronxJoin.sort_values(by = ['Population'], ascending = False, inplace = True)
BronxJoin.head()

Unnamed: 0,Borough,Neighborhood,Population
11,Bronx,University Heights-Morris Heights,54188.0
12,Bronx,Mott Haven-Port Morris,52413.0
13,Bronx,Mount Hope,51807.0
3,Bronx,Co-Op City,43752.0
4,Bronx,East Tremont,43423.0


# Results

We will select University Heights-Morris Heights in the Bronx as the neighborhood for our new bar! We selected this neighborhood because the Foursquare API data lists no bars in it, and it has the highest 2010 population (54,188) of the remaining neighborhoods. These observations lead us to believe that there is significant potential for profit in this area, so we will establish our new, state-of-the-art bar in University Heights-Morris Heights!
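The selection rule itself is simple: among the neighborhoods with no bars, take the most populous one. A minimal sketch using the top rows of the `BronxJoin` output above:

```python
import pandas as pd

# Top of the BronxJoin output: bar-free Bronx neighborhoods and their 2010 population
candidates = pd.DataFrame({
    'Neighborhood': ['University Heights-Morris Heights', 'Mott Haven-Port Morris',
                     'Mount Hope', 'Co-Op City', 'East Tremont'],
    'Population': [54188.0, 52413.0, 51807.0, 43752.0, 43423.0],
})

# idxmax skips NaN populations by default, so unmatched neighborhoods are ignored
best = candidates.loc[candidates['Population'].idxmax(), 'Neighborhood']
```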

# Discussion

Although PO LLC. is generally satisfied with the decision to select University Heights-Morris Heights in the Bronx as the location for our new bar, the firm noticed some limitations in the selection process. Primarily, it was difficult to find essential data on the internet. For instance, the firm could not find socioeconomic data for the neighborhoods. Such a dataset would have strengthened the firm's selection process, since it could have looked at significant factors such as mean income, median income, and the GDP of the neighborhoods.

Given these limitations, PO LLC. decided to settle for the remaining neighborhood with the largest population. This made the most sense to the firm because, in general, more people in an area means more potential customers and more money overall. This is a controversial decision for the firm, but they believe it will pay off and hope for success at their new bar.

# Conclusion

Although the decision to bring the hottest bar in NYC to University Heights-Morris Heights is a controversial one, PO LLC. is incredibly happy with their team of data scientists for finding the best possible spot to open their new bar. 

Thank you for reading this report, and we at PO LLC. hope to see you at our new bar soon!