# Final Report
- The Battle of Neighborhoods

## 1. Introduction

The purpose of this project is to help people explore the facilities available in and around their neighborhoods. It aims to support a well-informed, efficient choice of neighborhood among the many areas of Scarborough, Toronto.

Many people move between provinces in Canada, and they need reliable information on matters such as housing prices and well-regarded schools for their children. This project is for people looking for a better neighborhood, one with easy access to cafes, schools, supermarkets, medical stores, grocery stores, shopping malls, theaters, hospitals, and like-minded people.

The goal of this project is to build a feature analysis that helps people migrating to Scarborough find the neighborhood that suits them best, through a comparative analysis across areas. Candidate features include median housing prices, school quality and ratings, crime rates in each area, road connectivity, weather conditions, emergency management, water resources (fresh water, wastewater, and sewage), and recreational facilities.

Before someone moves to a new city, province, country, or other place, this analysis can help them get to know the region and its neighborhoods and start a new life there.

Scarborough is a popular destination for new immigrants to Canada. As a result, it is one of the most diverse and multicultural areas in the Greater Toronto Area, with a variety of religious groups and places of worship. Immigration has been a hot topic over the past few years as more governments seek tighter restrictions on immigrants and refugees, but the general trend of immigration to Canada is upward.

## 2. Data Section

I assembled two data sets from multiple online sources; they contain postal code and neighborhood data, with latitude and longitude, for each city.

The project uses the Foursquare API as its main data collection source. It is a location API that supports venue search, location sharing, and business details over a database of millions of places.

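Venue data for the area around each neighborhood is retrieved by calling the Foursquare API with developer credentials. Because of the HTTP request limit, the number of places per neighborhood is capped at 100 and the radius parameter is set to 500 meters (note that the `getNearbyVenues` call shown in section 2-3 passes `LIMIT=200` and relies on the function default of `radius=1000`).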

To compare the two cities, we decided to explore, segment, and cluster neighborhoods in order to find similar areas in big cities such as New York and Toronto. This requires clustering, a form of unsupervised machine learning.
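
To make the clustering idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is illustrative only: the function name `kmeans`, the toy points, and the choice of the first k points as initial centroids are assumptions made for this example, not the project's method. The notebook itself imports `KMeans` from scikit-learn.

```python
from math import dist  # available in Python 3.8+

def kmeans(points, k, iterations=20):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm.

    Initial centroids are simply the first k points, an assumption made to
    keep this sketch deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical feature vectors forming two tight groups
toy_points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
labels, centroids = kmeans(toy_points, k=2)
print(labels)
```

In a real analysis each point would be a neighborhood's feature vector, for example its venue category frequencies.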

We use a Scarborough dataset that we scraped from Wikipedia.
- https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

For the Foursquare API data, we need data on venues in the different areas of the borough, so we use Foursquare location information to obtain it. Foursquare is a location data provider with information about venues and events within a region of interest, including the venue name, location, menu, and pictures. The Foursquare location platform is therefore our only source of venue data, because all of this information can be obtained through its API.

Our process is to build a list of neighborhoods and then connect to the Foursquare API to gather information about the venues within each neighborhood. For each neighborhood, we search a fixed radius around its coordinates, set through the radius parameter described above.
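
The per-neighborhood request can be sketched as follows. The code only assembles the URL and sends nothing. The endpoint and parameter names match the `venues/explore` call used later in this notebook, while the helper name `build_explore_url`, its default values, and the placeholder credentials are assumptions for illustration.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL for one neighborhood center (radius in meters)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes values, so the comma in 'll' becomes %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20181218', 40.876551, -73.91066)
print(url)
```

Building the query from a dictionary keeps the parameters readable and avoids hand-formatting mistakes in a long URL string.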

### 2-1. Downloading and exploring the New York City geographical coordinates dataset
- https://geo.nyu.edu/catalog/nyu_2451_34572

In [3]:
import numpy as np 

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 
import requests 
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes 
import folium
import csv 

print('Libraries imported.')


In [5]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [6]:
neighborhoods_data = newyork_data['features']

neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [7]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|


In [10]:
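# Collect one record per neighborhood, then build the dataframe in a single step.
# Starting from a fresh list also keeps a re-run of this cell from adding the rows twice.
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so the frame is created from the list
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()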

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [11]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 612 neighborhoods.
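
A row count alone does not show whether every neighborhood appears exactly once. As a hedged sanity check, the sketch below counts repeated (borough, neighborhood) pairs, which matters when rows are appended in a cell that may have been run more than once. The function name `find_repeated_pairs` and the sample rows are hypothetical.

```python
from collections import Counter

def find_repeated_pairs(rows):
    """Return {(borough, neighborhood): count} for pairs seen more than once."""
    counts = Counter((row['Borough'], row['Neighborhood']) for row in rows)
    return {pair: n for pair, n in counts.items() if n > 1}

sample_rows = [
    {'Borough': 'Bronx', 'Neighborhood': 'Wakefield'},
    {'Borough': 'Bronx', 'Neighborhood': 'Co-op City'},
    {'Borough': 'Bronx', 'Neighborhood': 'Wakefield'},  # accidental repeat
]
print(find_repeated_pairs(sample_rows))
```

With a pandas dataframe, `neighborhoods.duplicated(subset=['Borough', 'Neighborhood']).sum()` gives the same kind of signal.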


In [12]:
neighborhoods.to_csv('BON1_NYC_GEO.csv',index=False)

address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [13]:
map_NewYork = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_NewYork)  
    
map_NewYork

### 2-2. Web scraping the population data of NYC
- https://en.wikipedia.org/wiki/Demographics_of_New_York_City

In [15]:
import numpy as np 

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 
import requests 
from pandas.io.json import json_normalize 
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

from bs4 import BeautifulSoup 

import csv 

print('Libraries imported.')


# All requested packages already installed.

Libraries imported.


In [16]:
# Download the page HTML and locate the first sortable wikitable
page_html = requests.get('https://en.wikipedia.org/wiki/Demographics_of_New_York_City').text
soup = BeautifulSoup(page_html,'lxml')
table = soup.find('table',{'class':'wikitable sortable'})

# Column headers come from the <th> cells
headers = [header.text for header in table.find_all('th')]

# Collect the text of every <td> cell, one list per table row
table_rows = table.find_all('tr')
rows = []
for table_row in table_rows:
    cells = table_row.find_all('td')
    rows.append([cell.text for cell in cells])

# Write the headers and all non-empty rows to a CSV file
with open('BON2_POPULATION1.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)
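
Cell text scraped from HTML often carries trailing line breaks, and on Windows a CSV file opened without `newline=''` can pick up extra carriage returns. The following sketch shows one way to avoid both. It uses hypothetical in-memory rows rather than the live page, and the helper name `write_clean_csv` is an assumption.

```python
import csv
import io

def write_clean_csv(handle, headers, rows):
    """Write headers and rows, stripping surrounding whitespace from every cell."""
    writer = csv.writer(handle)
    writer.writerow([h.strip() for h in headers])
    for row in rows:
        if row:  # skip rows without data cells, such as the header row
            writer.writerow([cell.strip() for cell in row])

# Hypothetical scraped values with trailing newlines
headers = ['Borough\n', 'County\n', 'Population\n']
rows = [[], ['The Bronx\n', 'Bronx\n', '1,418,207\n']]

buffer = io.StringIO(newline='')  # newline='' leaves the csv line endings untouched
write_clean_csv(buffer, headers, rows)
print(buffer.getvalue())
```

When writing to disk, the equivalent is `open('BON2_POPULATION1.csv', 'w', newline='', encoding='utf-8')`.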

In [22]:
Pop_data=pd.read_csv('BON2_POPULATION1.csv')
Pop_data.drop(Pop_data.columns[[7,8,9,10,11]], axis=1,inplace=True)
print('Data downloaded!')

Data downloaded!


In [45]:
Pop_data.columns = Pop_data.columns.str.replace(' ', '')
Pop_data.columns = Pop_data.columns.str.replace('\'','')
Pop_data.columns = Pop_data.columns.str.replace("/", '_')
Pop_data.rename(columns={'persons_sq_mi' : 'Borough', 'persons_sq_km' : 'County'}, inplace=True)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\r\n,square_miles,square_km,Borough.1,squarekm,persons_sq.mi,persons_km2\r\n
0,The Bronx\r,\r Bronx\r,"1,418,207\r",42.695\r\n,"30,100\r",42.10\r,109.04\r,,,
1,Brooklyn\r,\r Kings\r,"2,559,903\r",91.559\r\n,"35,800\r",70.82\r,183.42\r,,,
2,Manhattan\r,\r New York\r,"1,628,706\r",600.244\r\n,"368,500\r",22.83\r,59.13\r,,,
3,Queens\r,\r Queens\r,"2,253,858\r",93.310\r\n,"41,400\r",108.53\r,281.09\r,,,
4,Staten Island\r,\r Richmond\r,"476,143\r",14.514\r\n,"30,500\r",58.37\r,151.18\r,,,
5,City of New York,8336817,842.343,101000,302.64,783.83,27547,,,
6,State of New York,19453561,1731.910,89000,47214,122284,412,,,
7,Sources:[14] and see individual borough articl...,,,,,,,,,


In [46]:
Pop_data.rename(columns = {"New York City's five boroughsvte\r\n" : 'Borough',
                   'Jurisdiction\r\n':'County',
                   'Population\r\n':'Estimate_2017', 
                   "Land area\r\n" : 'square_miles',
                    'Density\r\n':'square_km'}, inplace=True)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\r\n,square_miles,square_km,Borough.1,squarekm,persons_sq.mi,persons_km2\r\n
0,The Bronx\r,\r Bronx\r,"1,418,207\r",42.695\r\n,"30,100\r",42.10\r,109.04\r,,,
1,Brooklyn\r,\r Kings\r,"2,559,903\r",91.559\r\n,"35,800\r",70.82\r,183.42\r,,,
2,Manhattan\r,\r New York\r,"1,628,706\r",600.244\r\n,"368,500\r",22.83\r,59.13\r,,,
3,Queens\r,\r Queens\r,"2,253,858\r",93.310\r\n,"41,400\r",108.53\r,281.09\r,,,
4,Staten Island\r,\r Richmond\r,"476,143\r",14.514\r\n,"30,500\r",58.37\r,151.18\r,,,
5,City of New York,8336817,842.343,101000,302.64,783.83,27547,,,
6,State of New York,19453561,1731.910,89000,47214,122284,412,,,
7,Sources:[14] and see individual borough articl...,,,,,,,,,


In [47]:
Pop_data['Borough'] = Pop_data['Borough'].replace(to_replace='\n', value='', regex=True)
Pop_data['County'] = Pop_data['County'].replace(to_replace='\n', value='', regex=True)
Pop_data['Estimate_2017'] = Pop_data['Estimate_2017'].replace(to_replace='\n', value='', regex=True)
Pop_data['square_miles'] = Pop_data['square_miles'].replace(to_replace='\n', value='', regex=True)
Pop_data['square_km'] = Pop_data['square_km'].replace(to_replace='\n', value='', regex=True)
Pop_data['persons_sq.mi'] = Pop_data['persons_sq.mi'].replace(to_replace='\n', value='', regex=True)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\r\n,square_miles,square_km,Borough.1,squarekm,persons_sq.mi,persons_km2\r\n
0,The Bronx\r,\r Bronx\r,"1,418,207\r",42.695\r\n,"30,100\r",42.10\r,109.04\r,,,
1,Brooklyn\r,\r Kings\r,"2,559,903\r",91.559\r\n,"35,800\r",70.82\r,183.42\r,,,
2,Manhattan\r,\r New York\r,"1,628,706\r",600.244\r\n,"368,500\r",22.83\r,59.13\r,,,
3,Queens\r,\r Queens\r,"2,253,858\r",93.310\r\n,"41,400\r",108.53\r,281.09\r,,,
4,Staten Island\r,\r Richmond\r,"476,143\r",14.514\r\n,"30,500\r",58.37\r,151.18\r,,,
5,City of New York,8336817,842.343,101000,302.64,783.83,27547,,,
6,State of New York,19453561,1731.910,89000,47214,122284,412,,,
7,Sources:[14] and see individual borough articl...,,,,,,,,,
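
The population figures above are still strings such as `"1,418,207\r"`, so they cannot be used in arithmetic yet. A small converter like the one sketched below strips the control characters and the thousands separators. The name `parse_count` is hypothetical, and the sketch assumes whole-number values.

```python
def parse_count(text):
    """Convert a string like '1,418,207\\r' to an int; return None if it is blank."""
    cleaned = text.strip().replace(',', '')
    return int(cleaned) if cleaned else None

print(parse_count('1,418,207\r'))  # value taken from the table above
print(parse_count('  \r\n'))       # blank cell
```

For a column that holds only strings, this could be applied as `Pop_data['Estimate_2017'].map(parse_count)`.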


In [48]:
Pop_data.loc[5:,['square_miles','square_km']] = Pop_data.loc[2:,['square_miles','square_km']].shift(1,axis=1)
Pop_data.loc[5:,['Estimate_2017','square_miles']] = Pop_data.loc[2:,['Estimate_2017','square_miles']].shift(1,axis=1)
Pop_data.loc[5:,['County','Estimate_2017']] = Pop_data.loc[2:,['County','Estimate_2017']].shift(1,axis=1)
Pop_data.loc[5:,['Borough','County']] = Pop_data.loc[2:,['Borough','County']].shift(1,axis=1)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\r\n,square_miles,square_km,Borough.1,squarekm,persons_sq.mi,persons_km2\r\n
0,The Bronx\r,\r Bronx\r,"1,418,207\r",42.695\r\n,"30,100\r",42.10\r,109.04\r,,,
1,Brooklyn\r,\r Kings\r,"2,559,903\r",91.559\r\n,"35,800\r",70.82\r,183.42\r,,,
2,Manhattan\r,\r New York\r,"1,628,706\r",600.244\r\n,"368,500\r",22.83\r,59.13\r,,,
3,Queens\r,\r Queens\r,"2,253,858\r",93.310\r\n,"41,400\r",108.53\r,281.09\r,,,
4,Staten Island\r,\r Richmond\r,"476,143\r",14.514\r\n,"30,500\r",58.37\r,151.18\r,,,
5,,27547,8336817,101000,842.343,302.64,City of New York,,,
6,,412,19453561,89000,1731.910,47214,State of New York,,,
7,,,,,,,Sources:[14] and see individual borough articl...,,,


In [49]:
Pop_data = Pop_data.fillna('')
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\r\n,square_miles,square_km,Borough.1,squarekm,persons_sq.mi,persons_km2\r\n
0,The Bronx\r,\r Bronx\r,"1,418,207\r",42.695\r\n,"30,100\r",42.10\r,109.04\r,,,
1,Brooklyn\r,\r Kings\r,"2,559,903\r",91.559\r\n,"35,800\r",70.82\r,183.42\r,,,
2,Manhattan\r,\r New York\r,"1,628,706\r",600.244\r\n,"368,500\r",22.83\r,59.13\r,,,
3,Queens\r,\r Queens\r,"2,253,858\r",93.310\r\n,"41,400\r",108.53\r,281.09\r,,,
4,Staten Island\r,\r Richmond\r,"476,143\r",14.514\r\n,"30,500\r",58.37\r,151.18\r,,,
5,,27547,8336817,101000,842.343,302.64,City of New York,,,
6,,412,19453561,89000,1731.910,47214,State of New York,,,
7,,,,,,,Sources:[14] and see individual borough articl...,,,


In [50]:
# Drop the footnote row. Its text starts with "Sources:" in whichever column it ended up,
# so every column is checked, and the result is assigned back (drop is not in-place).
is_footnote = Pop_data.astype(str).apply(lambda col: col.str.startswith('Sources:')).any(axis=1)
Pop_data = Pop_data[~is_footnote]
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,GrossDomesticProduct\r\n,square_miles,square_km,Borough.1,squarekm,persons_sq.mi,persons_km2\r\n
0,The Bronx\r,\r Bronx\r,"1,418,207\r",42.695\r\n,"30,100\r",42.10\r,109.04\r,,,
1,Brooklyn\r,\r Kings\r,"2,559,903\r",91.559\r\n,"35,800\r",70.82\r,183.42\r,,,
2,Manhattan\r,\r New York\r,"1,628,706\r",600.244\r\n,"368,500\r",22.83\r,59.13\r,,,
3,Queens\r,\r Queens\r,"2,253,858\r",93.310\r\n,"41,400\r",108.53\r,281.09\r,,,
4,Staten Island\r,\r Richmond\r,"476,143\r",14.514\r\n,"30,500\r",58.37\r,151.18\r,,,
5,,27547,8336817,101000,842.343,302.64,City of New York,,,
6,,412,19453561,89000,1731.910,47214,State of New York,,,


### 2-3. Segmenting and Clustering
- Neighborhoods - Brooklyn and Manhattan

In [51]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# Uncomment the next line if geopy is not installed yet. Keep this note on its own line:
# on Windows, text after a "!" shell command is passed to conda as extra arguments.
# !conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import k-means from clustering stage
from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score

# Uncomment the next line if folium is not installed yet
# !conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [52]:
NYC_Geo=pd.read_csv('BON1_NYC_GEO.csv')
print('Data downloaded!')

Data downloaded!


In [53]:
NYC_Geo.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [55]:
NYC_Geo['Borough'].value_counts().to_frame()

| | Borough |
|---|---|
| Queens | 162 |
| Brooklyn | 140 |
| Staten Island | 126 |
| Bronx | 104 |
| Manhattan | 80 |


In [56]:
BM_Geo = NYC_Geo.loc[(NYC_Geo['Borough'] == 'Brooklyn')|(NYC_Geo['Borough'] == 'Manhattan')]
BM_Geo = BM_Geo.reset_index(drop=True)
BM_Geo.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Brooklyn | Bay Ridge | 40.625801 | -74.030621 |
| 2 | Brooklyn | Bensonhurst | 40.611009 | -73.99518 |
| 3 | Brooklyn | Sunset Park | 40.645103 | -74.010316 |
| 4 | Brooklyn | Greenpoint | 40.730201 | -73.954241 |


In [57]:
import time
start_time = time.time()

address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

print("--- %s seconds ---" % round((time.time() - start_time), 2))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
--- 0.9 seconds ---


In [59]:
map_BM = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(BM_Geo['Latitude'], BM_Geo['Longitude'], BM_Geo['Borough'], BM_Geo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_BM)  
    
map_BM

In [61]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder; never publish real keys)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20181218' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET


In [62]:
def getNearbyVenues(names, latitudes, longitudes, LIMIT=200, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
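
`getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` for any venue returned without a category. The sketch below shows one defensive way to flatten a response. The helper name `extract_venues`, the `'Unknown'` fallback, and the sample payload (shaped like the fields the function above reads) are assumptions.

```python
def extract_venues(neighborhood, items):
    """Flatten explore items into (neighborhood, venue, lat, lng, category) tuples."""
    venues = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a label instead of failing when no category is present
        category = categories[0]['name'] if categories else 'Unknown'
        venues.append((neighborhood,
                       venue['name'],
                       venue['location']['lat'],
                       venue['location']['lng'],
                       category))
    return venues

# Hypothetical payload mirroring response['groups'][0]['items']
sample_items = [
    {'venue': {'name': "Arturo's", 'location': {'lat': 40.874412, 'lng': -73.910271},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Unnamed Spot', 'location': {'lat': 40.0, 'lng': -73.0},
               'categories': []}},
]
for row in extract_venues('Marble Hill', sample_items):
    print(row)
```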

In [64]:
BM_venues = getNearbyVenues(names=BM_Geo['Neighborhood'],
                                  latitudes=BM_Geo['Latitude'],
                                  longitudes=BM_Geo['Longitude'],
                                  LIMIT=200)

print('The "BM_venues" dataframe has {} venues and {} unique venue types.'.format(
      len(BM_venues['Venue Category']),
      len(BM_venues['Venue Category'].unique())))

BM_venues.to_csv('BM_venues.csv', sep=',', encoding='UTF8')
BM_venues.head()

Marble Hill
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 1 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 2 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Marble Hill | 40.876551 | -73.91066 | Sam's Pizza | 40.879435 | -73.905859 | Pizza Place |
| 4 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |


In [65]:

colnames = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
BM_venues = pd.read_csv('BM_venues.csv', skiprows=1, names=colnames)
BM_venues.columns = BM_venues.columns.str.replace(' ', '')
BM_venues.head()

| | Neighborhood | NeighborhoodLatitude | NeighborhoodLongitude | Venue | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 1 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 2 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Marble Hill | 40.876551 | -73.91066 | Sam's Pizza | 40.879435 | -73.905859 | Pizza Place |
| 4 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |


In [66]:
def Venues_Map(Borough_name, Borough_neighborhoods):
    
    # Use geopy library to get the latitude and longitude values 
    geolocator = Nominatim(user_agent="Jupyter")
    Borough_location = geolocator.geocode(Borough_name) #'Brooklyn, NY'
    Borough_latitude = Borough_location.latitude
    Borough_longitude = Borough_location.longitude
    print('The geographical coordinates of "{}" are {}, {}.'.format(Borough_name, Borough_latitude, Borough_longitude))
    
    # To verify the number of Boroughs and Neighborhoods in the extracted data
    print('The "{}" dataframe has {} different venue types and {} neighborhoods.'.format(
          Borough_name,
          len(Borough_neighborhoods['VenueCategory'].unique()),
          len(Borough_neighborhoods['Neighborhood'].unique())))
    
    # create map of city using latitude and longitude values
    map_Borough = folium.Map(location=[Borough_latitude, Borough_longitude], zoom_start=10)

    # add markers to map
    for lat, lng, venue, category in zip(Borough_neighborhoods['VenueLatitude'], Borough_neighborhoods['VenueLongitude'], Borough_neighborhoods['Venue'], Borough_neighborhoods['VenueCategory']):
        label = '{}, {}'.format(category, venue)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=0.1,
            popup=label,
            color='red',
            fill=True,
            fill_color='#FF0000',
            fill_opacity=0.3).add_to(map_Borough)  

    return map_Borough

In [67]:
Venues_Map('New York City, NY', BM_venues)

GeocoderUnavailable: HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): Max retries exceeded with url: /search?q=New+York+City%2C+NY&format=json&limit=1 (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x0000015482DEE9C8>, 'Connection to nominatim.openstreetmap.org timed out. (connect timeout=1)'))
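
The `GeocoderUnavailable` error above comes from a connection timeout of one second. geopy lets you raise that limit, for example with `Nominatim(user_agent="Jupyter", timeout=10)`, and a retry loop can absorb transient failures. The wrapper below is a generic sketch: `call_with_retries` and the stand-in `flaky_lookup` function are hypothetical, and no real geocoding request is made.

```python
import time

def call_with_retries(func, attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Call func(); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)

# Stand-in for geolocator.geocode(...): fails twice, then succeeds
calls = {'count': 0}

def flaky_lookup():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('timed out')
    return (40.7127281, -74.0060152)

print(call_with_retries(flaky_lookup, base_delay=0.01))
```

In the notebook this might be used as `call_with_retries(lambda: geolocator.geocode(address), exceptions=(GeocoderUnavailable,))`, with `GeocoderUnavailable` imported from `geopy.exc`.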

In [None]:
BM_venues.groupby('VenueCategory')['Venue'].count().sort_values(ascending=False)

In [None]:
BM_venues.groupby('Neighborhood').count()

In [None]:
print('There are {} unique categories.'.format(len(BM_venues['VenueCategory'].unique())))

In [None]:
BM_onehot = pd.get_dummies(BM_venues[['VenueCategory']], prefix="", prefix_sep="")

#column lists before adding neighborhood
column_names = ['Neighborhood'] + list(BM_onehot.columns)

# add neighborhood column back to dataframe
BM_onehot['Neighborhood'] = BM_venues['Neighborhood'] 

# move neighborhood column to the first column
BM_onehot = BM_onehot[column_names]

BM_onehot.head()

In [None]:
# Collect every one-hot column whose category name contains 'Restaurant'
restaurant_List = []
search = 'Restaurant'
for i in BM_onehot.columns :
    if search in i:
        restaurant_List.append(i)

In [None]:
# Keep the neighborhood label plus the restaurant category columns.
# 'Neighborhood' must stay in the frame because the next cell groups by it.
col_name = ['Neighborhood'] + restaurant_List
BM_restaurant = BM_onehot[col_name]

In [None]:
BM_restaurant_grouped = BM_restaurant.groupby('Neighborhood').sum().reset_index()

In [None]:
# Sum only the numeric category columns; the 'Neighborhood' label column is text
BM_restaurant_grouped['Total'] = BM_restaurant_grouped.sum(axis=1, numeric_only=True)
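
The one-hot encoding, filtering, and `groupby(...).sum()` steps above amount to counting venues per restaurant category in each neighborhood. For readers who want to see that logic without pandas, here is a dependency-free sketch. The function name `restaurant_counts` and the sample venues are hypothetical.

```python
from collections import Counter, defaultdict

def restaurant_counts(venues, keyword='Restaurant'):
    """Map neighborhood -> Counter of categories whose name contains keyword."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        if keyword in category:
            counts[neighborhood][category] += 1
    return counts

sample_venues = [
    ('Marble Hill', 'Pizza Place'),
    ('Marble Hill', 'Mexican Restaurant'),
    ('Chinatown', 'Chinese Restaurant'),
    ('Chinatown', 'Chinese Restaurant'),
    ('Chinatown', 'Thai Restaurant'),
]
counts = restaurant_counts(sample_venues)
for neighborhood, per_category in counts.items():
    # The sum over categories corresponds to the 'Total' column
    print(neighborhood, dict(per_category), 'Total:', sum(per_category.values()))
```

One difference from the pandas version: a neighborhood with no matching venues is simply absent here, whereas the grouped dataframe lists it with zeros.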

## 3. Conclusion
As anticipated, the data analysis above suggests that market competition is ongoing, especially in New York. Restaurant owners should take into account their customers' varied tastes and preferences, which come from different cultural backgrounds.

Once again, the data analysis supports our expectations about the market environment and venue conditions in New York. The data can offer restaurant owners guidance on menus and locations.