# Capstone Project: Opening a New Restaurant in NYC  
#### Applied Data Science Capstone

### Table of Contents  
1. [Background](#background)
2. [Business Problem](#business_problem)
3. [Target Audience](#audience)
4. [Data Requirements & Sources](#data)
5. [Data Collection & Preprocessing](#data_preprocess)
6. [Modeling](#modeling)
7. [Analysis](#analysis)
8. [Conclusion](#conclusion)


### 1. Background <a name="background"> </a>

New York City (NYC) is the most populous city in the United States (US), with <b>a population of over 8 million</b> spread across five boroughs: Brooklyn, Queens, Manhattan, the Bronx, and Staten Island. It is well known as a global capital of finance, media, and immigration. NYC is also a global leader in entertainment, fashion, tourism, technology, education, arts, sports, politics, research, and many other industries. Many of the world’s most visited tourist attractions are located in NYC, Times Square and the Broadway Theater District among them. <b>More than 62 million tourists visited in 2017</b>. Communities from all over the world live here; in fact, more than 800 languages are spoken in NYC.
![image.png](attachment:image.png)
<p>NYC is also home to the world’s largest stock exchanges. Many of the world’s largest financial and media companies are based in NYC, and in recent years technology and biotechnology companies have been growing quickly. In brief, NYC is a powerhouse for almost every business sector. The restaurant industry therefore has a huge opportunity in NYC: the 8+ million New Yorkers and 62+ million annual tourists need to eat every day, and together they create demand for nearly every cuisine on earth. Few places offer a larger market for a new restaurant than the neighborhoods of NYC.</p>


### 2. Business Problem <a name="business_problem"> </a>

<p>NYC is the most populous city in the US and one of the most visited cities in the world, but is it easy to open a successful new restaurant there? Although the restaurant market in NYC is huge, popular neighborhoods already have plenty of restaurants serving cuisines from every corner of the world. Finding the best place for a new business is not as trivial as it may seem. It takes careful, thorough research to find the real estate that gives the new eatery the best chance to succeed, and that research requires time, domain expertise, and capital.</p>
<p>So, the problem every new entrepreneur may face is <b>how to find the best location to open a new restaurant in NYC easily, precisely, and quickly, so that the new business has the best chance to succeed.</b></p>
<p>Machine learning models such as clustering offer a better approach for new restaurant entrepreneurs and for current restaurant owners who want to expand. The main objective of this capstone is to develop a k-means clustering model and visualize the clusters on a map of NYC, using NYC neighborhood datasets together with the Foursquare API, to help prospective restaurant owners select the best possible NYC neighborhood for their new business quickly and with greater precision. So, <b>this project will help simplify the tedious process of opening a new restaurant, saving the restaurant owner precious time.</b></p>


### 3. Target Audience <a name="audience"></a>

### 4. Data Requirements & Sources <a name="data"></a>

#### Requirements:
<p>To build the machine learning models, we first need datasets that are representative of the business problem. This project uses the following datasets:</p>  

*  New York City neighborhood dataset, which contains the neighborhood names and their geographic coordinates in the five boroughs: Brooklyn, the Bronx, Manhattan, Queens, and Staten Island. This dataset defines the scope of the project, namely the neighborhoods in the five boroughs of NYC. Its latitude and longitude values are also used to plot the neighborhoods on the NYC map.  

* Census Demographics at the NYC Neighborhood Tabulation Area (NTA) level. This dataset contains the 2010 census population of NYC neighborhoods and will be used to plot the population density of the neighborhoods on the NYC map. Population matters when deciding where to start a new restaurant: other things being equal, a larger population means more potential customers.  

* NYC crime dataset based on the NYPD complaint dataset. This dataset contains all valid felony, misdemeanor, and violation crimes reported to the New York City Police Department in 2019, together with the geographic location of each crime. We will plot the crime locations on the NYC map, which adds another factor for judging whether a neighborhood is suitable for the new restaurant.

* Venue dataset returned by the Foursquare API for the neighborhoods in the NYC neighborhood dataset. This data will be used to cluster the NYC neighborhoods based on their most popular venues and restaurants. The clusters will help decide whether a neighborhood is a good fit for food-related venues or is already saturated with them. 
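
Clustering on "most popular venues" needs each neighborhood expressed as a numeric vector. The sketch below shows one common way to build it: one-hot encode the venue categories, then average per neighborhood to get category frequencies. The neighborhood names `A` and `B` and the venue rows are made-up toy data, and the column names are assumptions, not the notebook's final schema.

```python
import pandas as pd

# Toy venue list (hypothetical): one row per venue returned for a neighborhood
venues = pd.DataFrame({
    'neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'venue_category': ['Pizza Place', 'Pizza Place', 'Park', 'Gym', 'Pizza Place'],
})

# One column per category, 1.0 where the venue belongs to that category
onehot = pd.get_dummies(venues['venue_category'], dtype=float)
onehot.insert(0, 'neighborhood', venues['neighborhood'])

# The mean of the one-hot columns is the share of each category per neighborhood
profile = onehot.groupby('neighborhood').mean()
print(profile)
```

Each row of `profile` sums to 1 and could serve as the feature vector for a k-means model; a large share of food-related categories would hint at a saturated neighborhood.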

#### Sources:  

* https://cocl.us/new_york_dataset contains 306 New York City neighborhoods with their geographic coordinates, spread over the five boroughs. The data will be retrieved as JSON using the Python package requests and converted into a pandas DataFrame with the borough and neighborhood names and their latitude and longitude as columns.

* https://data.ny.gov/resource/rnsn-acs2.json contains the 2010 census population of NYC neighborhoods in the five boroughs. This data will also be retrieved using the requests package and converted into a pandas DataFrame.

* https://data.cityofnewyork.us/api/views/93vf-i5bz/rows.json?accessType=DOWNLOAD contains the area of NYC neighborhoods (NTAs) in all boroughs.

* https://data.cityofnewyork.us/api/views/5uac-w243/rows.json?accessType=DOWNLOAD contains the crimes reported to the New York City Police Department in 2019. This data will be retrieved and converted in the same way as the other datasets.

* https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={} is the Foursquare API endpoint; it will be called for every NYC neighborhood. 


### 5. Data Collection & Preprocessing <a name="data_preprocess"></a>

In [67]:
import pandas as pd
import numpy as np
import requests
import folium

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [None]:
#Haversine formula
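
The cell above is only a placeholder. As a minimal sketch of what it might hold, the function below computes the great-circle distance between two latitude/longitude points with the haversine formula. The function name `haversine_km` and the Earth radius constant are assumptions introduced here for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lng2 - lng1)
    # haversine of the central angle between the two points
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates of Wakefield and Co-op City from the neighborhood dataset
wakefield = (40.894705, -73.847201)
coop_city = (40.874294, -73.829939)
print(round(haversine_km(*wakefield, *coop_city), 2), 'km')
```

A distance function like this could be used, for example, to relate a crime location or a venue to the nearest neighborhood center.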

#### New York City neighborhood dataset

In [2]:
nyc = requests.get('https://cocl.us/new_york_dataset').json()

In [3]:
nyc.keys()

dict_keys(['type', 'totalFeatures', 'features', 'crs', 'bbox'])

In [40]:
nyc['features'][0].keys()

dict_keys(['type', 'id', 'geometry', 'geometry_name', 'properties'])

In [45]:
nyc['features'][0]['properties']

{'name': 'Wakefield',
 'stacked': 1,
 'annoline1': 'Wakefield',
 'annoline2': None,
 'annoline3': None,
 'annoangle': 0.0,
 'borough': 'Bronx',
 'bbox': [-73.84720052054902,
  40.89470517661,
  -73.84720052054902,
  40.89470517661]}

In [3]:
dict_list = []
for data in nyc['features']:
    borough = data['properties']['borough']
    n_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    n_latlng = data['geometry']['coordinates']
    n_lat = n_latlng[1]
    n_lng = n_latlng[0]
    # for consistency, all column names will be lowercase.
    dict_list.append({'borough': borough,
                      'neighborhood': n_name,
                      'latitude': n_lat,
                      'longitude': n_lng})
nyc_df = pd.DataFrame(dict_list)

In [4]:
nyc_df.head()

| | borough | latitude | longitude | neighborhood |
|---|---|---|---|---|
| 0 | Bronx | 40.894705 | -73.847201 | Wakefield |
| 1 | Bronx | 40.874294 | -73.829939 | Co-op City |
| 2 | Bronx | 40.887556 | -73.827806 | Eastchester |
| 3 | Bronx | 40.895437 | -73.905643 | Fieldston |
| 4 | Bronx | 40.890834 | -73.912585 | Riverdale |


In [5]:
print('Total Borough: {}  & Total neighborhoods: {}'.format(
        len(nyc_df.borough.unique()), len(nyc_df.neighborhood.unique())))

Total Borough: 5  & Total neighborhoods: 302


In [6]:
gp=nyc_df.groupby('neighborhood').count()
gp.loc[gp.borough == 2]

| neighborhood | borough | latitude | longitude |
|---|---|---|---|
| Bay Terrace | 2 | 2 | 2 |
| Chelsea | 2 | 2 | 2 |
| Murray Hill | 2 | 2 | 2 |
| Sunnyside | 2 | 2 | 2 |

#### NYC Neighborhoods 2010 population

In [9]:
nyc_pop_json = requests.get('https://data.ny.gov/resource/rnsn-acs2.json').json()

In [10]:
# pd.json_normalize replaces the deprecated pd.io.json.json_normalize (pandas >= 1.0)
nyc_pop = pd.json_normalize(nyc_pop_json)

In [11]:
nyc_pop.columns

Index(['geographic_area_2010_census_fips_county_code',
       'geographic_area_borough',
       'geographic_area_neighborhood_tabulation_area_nta_code',
       'geographic_area_neighborhood_tabulation_area_nta_name',
       'total_population_2000_number', 'total_population_2010_number',
       'total_population_change_2000_2010_number',
       'total_population_change_2000_2010_percent'],
      dtype='object')

In [46]:
# .copy() avoids a SettingWithCopyWarning when the columns are renamed
nyc_pop_df = nyc_pop.iloc[:,[0,1,2,3,5]].copy()
nyc_pop_df.columns = ['fips_code', 'borough','nta_code', 'neighborhood', 'population']
nyc_pop_df.head()

| | fips_code | borough | nta_code | neighborhood | population |
|---|---|---|---|---|---|
| 0 | 5 | Bronx | BX01 | Claremont-Bathgate | 31078 |
| 1 | 5 | Bronx | BX03 | Eastchester-Edenwald-Baychester | 34517 |
| 2 | 5 | Bronx | BX05 | Bedford Park-Fordham North | 54415 |
| 3 | 5 | Bronx | BX06 | Belmont | 27378 |
| 4 | 5 | Bronx | BX07 | Bronxdale | 35538 |


In [49]:
len(nyc_pop_df.neighborhood.unique()), len(nyc_pop_df.nta_code.unique())

(196, 196)

In [95]:
nyc_pop_df.borough.unique()

array(['Bronx', 'Brooklyn', 'Manhattan', 'Queens', 'Staten Island',
       "*Neighborhood Tabulation Areas, or NTAs, are aggregations of census tracts that are subsets of New York City's 55 Public Use Microdata Areas (PUMAs). ",
       'Primarily due to these constraints, NTA boundaries and their associated names may not definitively represent neighborhoods.'],
      dtype=object)

In [77]:
nyc_pop_df.fips_code.unique()

array(['5', '47', '61', '81', '85', nan], dtype=object)
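
The two outputs above show that the population data carries footnote rows: their `borough` field holds explanatory text and their `fips_code` is missing. A minimal cleaning sketch follows, using toy rows modeled on that output. It assumes the population values arrive as strings, which the notebook does not confirm.

```python
import numpy as np
import pandas as pd

# Toy frame: two real rows from the table above plus one footnote-style row
sample_pop = pd.DataFrame({
    'fips_code': ['5', '5', np.nan],
    'borough': ['Bronx', 'Bronx', 'footnote text'],
    'nta_code': ['BX01', 'BX03', np.nan],
    'neighborhood': ['Claremont-Bathgate', 'Eastchester-Edenwald-Baychester', np.nan],
    'population': ['31078', '34517', np.nan],  # assumed to be strings
})

# Footnote rows have no FIPS code, so dropping on that column removes them
clean = sample_pop.dropna(subset=['fips_code']).copy()

# With the footnotes gone, the population column can be cast to integers
clean['population'] = clean['population'].astype(int)
print(clean)
```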

In [72]:
len(nyc_pop_df.loc[nyc_pop_df.borough == 'Manhattan'])

29

In [71]:
len(nyc_df.loc[nyc_df.borough == 'Manhattan'])

40

#### NYC Neighborhoods area

In [50]:
nyc_area_json = requests.get('https://data.cityofnewyork.us/api/views/93vf-i5bz/rows.json?accessType=DOWNLOAD').json()

In [58]:
# keep only the geometry (index 8) and the NTA code (index 12) of each record
rows = [[record[8], record[12]] for record in nyc_area_json['data']]

nyc_area_df = pd.DataFrame(rows, columns = ['geom','nta_code'])

In [68]:
# nyc_area_json['data'][0]

In [120]:
len(nyc_pop_df.loc[nyc_pop_df.borough == 'Bronx'])

38

In [62]:
nyc_area_df.head()

| | geom | nta_code |
|---|---|---|
| 0 | MULTIPOLYGON (((-73.94732672160586 40.62916656... | BK43 |
| 1 | MULTIPOLYGON (((-73.94193078816201 40.70072523... | BK75 |
| 2 | MULTIPOLYGON (((-73.89138023380268 40.86170058... | BX40 |
| 3 | MULTIPOLYGON (((-73.9760493559142 40.631275905... | BK88 |
| 4 | MULTIPOLYGON (((-73.90855790522774 40.65209593... | BK96 |


In [63]:
len(nyc_area_df), len(nyc_area_df.nta_code.unique())

(195, 195)

In [64]:
nyc_area_df.iloc[0,0].split('(((')[1].split(',')[0]

'-73.94732672160586 40.62916656720947'
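
The string split above pulls out only the first vertex of a geometry. As a rough sketch of going one step further, the function below reads every `longitude latitude` pair from a WKT string with a regular expression and averages them to get an approximate center. The name `rough_center` is hypothetical, and a vertex average is not a true centroid; a geometry library such as shapely would be the more rigorous choice.

```python
import re

# Matches "lng lat" pairs such as "-73.947 40.629" inside a WKT string
COORD_PAIR = re.compile(r'(-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)')


def rough_center(wkt):
    """Average all vertices of a WKT geometry; returns (latitude, longitude)."""
    pairs = [(float(lng), float(lat)) for lng, lat in COORD_PAIR.findall(wkt)]
    if not pairs:
        raise ValueError('no coordinate pairs found in WKT string')
    mean_lng = sum(p[0] for p in pairs) / len(pairs)
    mean_lat = sum(p[1] for p in pairs) / len(pairs)
    return mean_lat, mean_lng


# Toy square; the closing vertex repeats the first one, as in real WKT rings
square = 'MULTIPOLYGON (((-74.0 40.0, -73.0 40.0, -73.0 41.0, -74.0 41.0, -74.0 40.0)))'
print(rough_center(square))
```

Because the repeated closing vertex is counted twice, the result is pulled slightly toward that corner, which is one reason to treat it only as an approximation.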

In [65]:
nyc_pop = pd.merge(nyc_pop_df, nyc_area_df, on = 'nta_code')

In [66]:
nyc_pop.head()

| | fips_code | borough | nta_code | neighborhood | population | geom |
|---|---|---|---|---|---|---|
| 0 | 5 | Bronx | BX01 | Claremont-Bathgate | 31078 | MULTIPOLYGON (((-73.89038954009592 40.85468905... |
| 1 | 5 | Bronx | BX03 | Eastchester-Edenwald-Baychester | 34517 | MULTIPOLYGON (((-73.79322870948383 40.88282259... |
| 2 | 5 | Bronx | BX05 | Bedford Park-Fordham North | 54415 | MULTIPOLYGON (((-73.88362518063384 40.86725758... |
| 3 | 5 | Bronx | BX06 | Belmont | 27378 | MULTIPOLYGON (((-73.8830938237215 40.866602185... |
| 4 | 5 | Bronx | BX07 | Bronxdale | 35538 | MULTIPOLYGON (((-73.86137924069509 40.87133651... |
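
The population table has 196 distinct NTA codes and the area table has 195, so the default inner merge may drop rows without warning. The sketch below shows how an outer merge with `indicator=True` can reveal which codes fail to match. The frames are toy data and the code `XX99` is made up for the example.

```python
import pandas as pd

# Toy stand-ins for nyc_pop_df and nyc_area_df; 'XX99' is a made-up unmatched code
pop = pd.DataFrame({'nta_code': ['BX01', 'BX03', 'XX99'],
                    'population': [31078, 34517, 100]})
area = pd.DataFrame({'nta_code': ['BX01', 'BX03'],
                     'geom': ['MULTIPOLYGON (...)', 'MULTIPOLYGON (...)']})

# how='outer' keeps every row; the _merge column records where each row came from
check = pd.merge(pop, area, on='nta_code', how='outer', indicator=True)

# Rows present in only one of the two tables
unmatched = check.loc[check['_merge'] != 'both', ['nta_code', '_merge']]
print(unmatched)
```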


In [69]:
lat = nyc_df.loc[0, 'latitude']
lng = nyc_df.loc[0, 'longitude']

In [70]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[lat, lng], zoom_start=10)

# add markers to map
for lat, lng, b, n in zip(nyc_df['latitude'], nyc_df['longitude'], 
                                           nyc_df['borough'], nyc_df['neighborhood']):
    label = '{}, {}'.format(n, b)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

#### NYC Crime data reported to NYPD in 2019

In [166]:
crime_json = requests.get('https://data.cityofnewyork.us/api/views/5uac-w243/rows.json?accessType=DOWNLOAD').json()

In [236]:
[d['name'] for d in crime_json['meta']['view']['columns'][8:]]

['CMPLNT_NUM',
 'ADDR_PCT_CD',
 'BORO_NM',
 'CMPLNT_FR_DT',
 'CMPLNT_FR_TM',
 'CMPLNT_TO_DT',
 'CMPLNT_TO_TM',
 'CRM_ATPT_CPTD_CD',
 'HADEVELOPT',
 'HOUSING_PSA',
 'JURISDICTION_CODE',
 'JURIS_DESC',
 'KY_CD',
 'LAW_CAT_CD',
 'LOC_OF_OCCUR_DESC',
 'OFNS_DESC',
 'PARKS_NM',
 'PATROL_BORO',
 'PD_CD',
 'PD_DESC',
 'PREM_TYP_DESC',
 'RPT_DT',
 'STATION_NAME',
 'SUSP_AGE_GROUP',
 'SUSP_RACE',
 'SUSP_SEX',
 'TRANSIT_DISTRICT',
 'VIC_AGE_GROUP',
 'VIC_RACE',
 'VIC_SEX',
 'X_COORD_CD',
 'Y_COORD_CD',
 'Latitude',
 'Longitude',
 'Lat_Lon',
 'Zip Codes',
 'Community Districts',
 'Borough Boundaries',
 'City Council Districts',
 'Police Precincts']

In [227]:
# drop the 8 metadata fields at the start of each record
rows = [record[8:] for record in crime_json['data']]
crime_df = pd.DataFrame(rows)
crime_df = crime_df.iloc[:,[0, 2,13, 15,32,33,35]]
crime_df.columns = ['crime_id','borough', 'crime_level', 'crime_desc', 'latitude', 'longitude', 'zip_code']

In [228]:
crime_df.head()

| | crime_id | borough | crime_level | crime_desc | latitude | longitude | zip_code |
|---|---|---|---|---|---|---|---|
| 0 | 314773184 | BRONX | FELONY | ROBBERY | 40.83802626900008 | -73.88168118799997 | 11269 |
| 1 | 289837961 | MANHATTAN | MISDEMEANOR | PETIT LARCENY | 40.800334261000046 | -73.94565697199994 | 13093 |
| 2 | 535744284 | BROOKLYN | FELONY | FELONY ASSAULT | 40.66983179600004 | -73.93937555099996 | 17615 |
| 3 | 895678119 | BRONX | MISDEMEANOR | PETIT LARCENY | 40.87367103500002 | -73.90801364899994 | 11272 |
| 4 | 299841674 | MANHATTAN | MISDEMEANOR | PETIT LARCENY | 40.76093528000007 | -73.99452906599998 | 13094 |
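
The raw records store the coordinates as strings, so they need converting before they can be plotted or summarized. The sketch below converts them, drops records without a location, and counts crimes per borough and severity. The first five rows reuse values from the table above; the sixth row and the column names belong to this toy frame only.

```python
import pandas as pd

# Toy frame modeled on crime_df; the last row is a made-up record with no location
sample_crime = pd.DataFrame({
    'borough': ['BRONX', 'MANHATTAN', 'BROOKLYN', 'BRONX', 'MANHATTAN', 'QUEENS'],
    'crime_level': ['FELONY', 'MISDEMEANOR', 'FELONY', 'MISDEMEANOR',
                    'MISDEMEANOR', 'VIOLATION'],
    'latitude': ['40.83802626900008', '40.800334261000046', '40.66983179600004',
                 '40.87367103500002', '40.76093528000007', None],
    'longitude': ['-73.88168118799997', '-73.94565697199994', '-73.93937555099996',
                  '-73.90801364899994', '-73.99452906599998', None],
})

# errors='coerce' turns values that cannot be parsed into NaN instead of raising
for col in ['latitude', 'longitude']:
    sample_crime[col] = pd.to_numeric(sample_crime[col], errors='coerce')

# Only records with both coordinates can be placed on the map
located = sample_crime.dropna(subset=['latitude', 'longitude'])

# Number of located crimes per borough and severity level
counts = pd.crosstab(located['borough'], located['crime_level'])
print(counts)
```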


#### Foursquare API for venues data

In [241]:
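import os

# Read the credentials from environment variables instead of hard-coding them
# (the variable names below are placeholders; use whatever names you export)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret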
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius


In [239]:
lat = nyc_df.loc[0, 'latitude'] # neighborhood latitude value
lng = nyc_df.loc[0, 'longitude'] # neighborhood longitude value

In [242]:
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    lat, 
    lng, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=40.89470517661,-73.84720052054902&radius=500&limit=100'

In [245]:
results = requests.get(url).json()
results['response'].keys()

dict_keys(['suggestedFilters', 'headerLocation', 'headerFullLocation', 'headerLocationGranularity', 'totalResults', 'suggestedBounds', 'groups'])

In [43]:
for k in results['response'].keys():
    print(k, '---> ', results['response'][k])
    print('****************')

suggestedFilters --->  {'header': 'Tap to show:', 'filters': [{'name': 'Open now', 'key': 'openNow'}]}
****************
headerLocation --->  Wakefield
****************
headerFullLocation --->  Wakefield, Bronx
****************
headerLocationGranularity --->  neighborhood
****************
totalResults --->  10
****************
suggestedBounds --->  {'ne': {'lat': 40.899205181110005, 'lng': -73.84125857127495}, 'sw': {'lat': 40.89020517211, 'lng': -73.8531424698231}}
****************
groups --->  [{'type': 'Recommended Places', 'name': 'recommended', 'items': [{'reasons': {'count': 0, 'items': [{'summary': 'This spot is popular', 'type': 'general', 'reasonName': 'globalInteractionReason'}]}, 'venue': {'id': '4c537892fd2ea593cb077a28', 'name': 'Lollipops Gelato', 'location': {'address': '4120 Baychester Ave', 'crossStreet': 'Edenwald & Bussing Ave', 'lat': 40.894123150205274, 'lng': -73.84589162362325, 'labeledLatLngs': [{'label': 'display', 'lat': 40.894123150205274, 'lng': -73.845891623

In [247]:
len(results['response']['groups'])

1

In [246]:
results['response']['groups'][0].keys()

dict_keys(['type', 'name', 'items'])

In [63]:
len(results['response']['groups'][0]["type"])

18

In [65]:
results['response']['groups'][0]["name"]

'recommended'

In [67]:
results['response']['groups'][0]["type"]

'Recommended Places'

In [55]:
results['response']['groups'][0]["items"][0].keys()

dict_keys(['reasons', 'venue', 'referralId'])

In [248]:
len(results['response']['groups'][0]["items"])

10

In [56]:
results['response']['groups'][0]["items"][0]['reasons']

{'count': 0,
 'items': [{'summary': 'This spot is popular',
   'type': 'general',
   'reasonName': 'globalInteractionReason'}]}

In [59]:
results['response']['groups'][0]["items"][0]['venue'].keys()

dict_keys(['id', 'name', 'location', 'categories', 'photos'])

In [60]:
for k in results['response']['groups'][0]["items"][0]['venue'].keys():
    print(k, '=== ', results['response']['groups'][0]["items"][0]['venue'][k])

id ===  4c537892fd2ea593cb077a28
name ===  Lollipops Gelato
location ===  {'address': '4120 Baychester Ave', 'crossStreet': 'Edenwald & Bussing Ave', 'lat': 40.894123150205274, 'lng': -73.84589162362325, 'labeledLatLngs': [{'label': 'display', 'lat': 40.894123150205274, 'lng': -73.84589162362325}], 'distance': 127, 'postalCode': '10466', 'cc': 'US', 'city': 'Bronx', 'state': 'NY', 'country': 'United States', 'formattedAddress': ['4120 Baychester Ave (Edenwald & Bussing Ave)', 'Bronx, NY 10466', 'United States']}
categories ===  [{'id': '4bf58dd8d48988d1d0941735', 'name': 'Dessert Shop', 'pluralName': 'Dessert Shops', 'shortName': 'Desserts', 'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/dessert_', 'suffix': '.png'}, 'primary': True}]
photos ===  {'count': 0, 'groups': []}


In [None]:
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    lat, 
    lng, 
    radius, 
    LIMIT)
url # display URL
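
The exploration above shows the nesting of the response: `response` → `groups` → `items` → `venue`. As a sketch of how one response could be flattened into rows for a DataFrame, the helper below walks that structure. The function name `venues_from_response` and the output column names are assumptions, and the sample dictionary is a trimmed copy of the Wakefield result shown earlier.

```python
def venues_from_response(results):
    """Flatten a venues/explore response into a list of row dictionaries."""
    rows = []
    for group in results.get('response', {}).get('groups', []):
        for item in group.get('items', []):
            venue = item['venue']
            categories = venue.get('categories') or []
            rows.append({
                'venue': venue['name'],
                'venue_latitude': venue['location']['lat'],
                'venue_longitude': venue['location']['lng'],
                # a venue may have no category; fall back to None in that case
                'venue_category': categories[0]['name'] if categories else None,
            })
    return rows


# Trimmed sample in the shape of the response explored above
sample_response = {'response': {'groups': [{
    'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'venue': {
        'id': '4c537892fd2ea593cb077a28',
        'name': 'Lollipops Gelato',
        'location': {'lat': 40.894123150205274, 'lng': -73.84589162362325},
        'categories': [{'name': 'Dessert Shop', 'primary': True}],
    }}],
}]}}

rows = venues_from_response(sample_response)
print(rows)
```

Calling a helper like this once per neighborhood and concatenating the rows would give the venue dataset described in the requirements.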

#### NYC NTA

In [128]:
nta = requests.get('https://data.cityofnewyork.us/api/views/93vf-i5bz/rows.json?accessType=DOWNLOAD').json()

In [129]:
nta.keys()


dict_keys(['meta', 'data'])

In [15]:
rows = []
rows_num = len(nta['data'])
for row in range(rows_num):
    rows.append(nta['data'][row][8:])
nta_area = pd.DataFrame(rows, columns = ['geom', 'BoroCode', 'Borough', 'CountyFIPS', 'NTACode', 'Neighborhood',
       'Shape_Leng', 'Shape_Area'])

In [131]:
nta['data'][0]

['row-fdbg-rsfm-b28b',
 '00000000-0000-0000-D9DD-2BE071F8978C',
 0,
 1450733148,
 None,
 1450733148,
 None,
 '{ }',
 'MULTIPOLYGON (((-73.94732672160586 40.62916656720947, -73.94687439946745 40.626773668128536, -73.94642266064325 40.62438556728645, -73.94542442897415 40.62449560959823, -73.94497031348955 40.62210451293764, -73.94451833370505 40.61971364709508, -73.94551663574792 40.61960561195174, -73.94651373193835 40.61949401914997, -73.94606699612437 40.61710373918019, -73.945763594919 40.61550406422126, -73.94573691912035 40.61536340675665, -73.94657366828146 40.614789131137535, -73.94742608780814 40.61422460055368, -73.94772682569703 40.61416971618289, -73.94834517370126 40.614136496631644, -73.94930527478502 40.61403073451604, -73.95026514882969 40.61392435179741, -73.94999598103782 40.612491461663254, -73.94996296695332 40.612315727849385, -73.94993049319113 40.6121446508932, -73.94978380450684 40.611371750477694, -73.95070512926125 40.61125045222408, -73.95163235049947 40.61112

In [16]:
nta_area.head()

| | geom | BoroCode | Borough | CountyFIPS | NTACode | Neighborhood | Shape_Leng | Shape_Area |
|---|---|---|---|---|---|---|---|---|
| 0 | MULTIPOLYGON (((-73.94732672160586 40.62916656... | 3 | Brooklyn | 47 | BK43 | Midwood | 27996.5912736 | 35799637.8103 |
| 1 | MULTIPOLYGON (((-73.94193078816201 40.70072523... | 3 | Brooklyn | 47 | BK75 | Bedford | 29992.9191744 | 32629833.1149 |
| 2 | MULTIPOLYGON (((-73.89138023380268 40.86170058... | 2 | Bronx | 5 | BX40 | Fordham South | 15878.2729212 | 6307283.62202 |
| 3 | MULTIPOLYGON (((-73.9760493559142 40.631275905... | 3 | Brooklyn | 47 | BK88 | Borough Park | 39247.227722 | 54005019.2286 |
| 4 | MULTIPOLYGON (((-73.90855790522774 40.65209593... | 3 | Brooklyn | 47 | BK96 | Rugby-Remsen Village | 30957.8533949 | 32706946.4035 |
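
Population density needs an area for each NTA, and `Shape_Area` is the obvious candidate. The sketch below assumes that `Shape_Area` is measured in square feet, which the notebook does not state, and that the values arrive as strings. The areas are taken from the table above; the population figures are hypothetical placeholders.

```python
import pandas as pd

SQ_FT_PER_SQ_MILE = 5280 ** 2  # 27,878,400 square feet in a square mile

# Areas from the table above (assumed to be square feet); populations are made up
sample = pd.DataFrame({
    'nta_code': ['BK43', 'BX40'],
    'shape_area': ['35799637.8103', '6307283.62202'],
    'population': [50000, 25000],  # hypothetical values
})

# Convert the string areas to numbers, then to square miles
sample['area_sq_mi'] = pd.to_numeric(sample['shape_area']) / SQ_FT_PER_SQ_MILE

# Residents per square mile
sample['density_per_sq_mi'] = sample['population'] / sample['area_sq_mi']
print(sample[['nta_code', 'area_sq_mi', 'density_per_sq_mi']])
```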


In [132]:
# expects a local copy of the NTA data saved as nynta.csv in the working directory
dff = pd.read_csv('nynta.csv')

FileNotFoundError: [Errno 2] File b'nynta.csv' does not exist: b'nynta.csv'

In [53]:
len(dff), len(nta_area)

(195, 195)

In [58]:
dff.head()


| | the_geom | BoroCode | BoroName | CountyFIPS | NTACode | NTAName | Shape_Leng | Shape_Area |
|---|---|---|---|---|---|---|---|---|
| 0 | MULTIPOLYGON (((-73.94732672160586 40.62916656... | 3 | Brooklyn | 47 | BK43 | Midwood | 27996.5912736 | 35799637.8103 |
| 1 | MULTIPOLYGON (((-73.94193078816201 40.70072523... | 3 | Brooklyn | 47 | BK75 | Bedford | 29992.9191744 | 32629833.1149 |
| 2 | MULTIPOLYGON (((-73.89138023380268 40.86170058... | 2 | Bronx | 5 | BX40 | Fordham South | 15878.2729212 | 6307283.62202 |
| 3 | MULTIPOLYGON (((-73.9760493559142 40.631275905... | 3 | Brooklyn | 47 | BK88 | Borough Park | 39247.227722 | 54005019.2286 |
| 4 | MULTIPOLYGON (((-73.90855790522774 40.65209593... | 3 | Brooklyn | 47 | BK96 | Rugby-Remsen Village | 30957.8533949 | 32706946.4035 |


In [88]:
dff[dff['CountyFIPS']== 25]

| | the_geom | BoroCode | BoroName | CountyFIPS | NTACode | NTAName | Shape_Leng | Shape_Area |
|---|---|---|---|---|---|---|---|---|


#### Crime Dataset

In [2]:
crime = requests.get('https://data.cityofnewyork.us/api/views/5uac-w243/rows.json?accessType=DOWNLOAD').json()

In [4]:
crime['data']

[['row-jqjg_evtp-tn4b',
  '00000000-0000-0000-0620-6B6BF1A1C9F4',
  0,
  1579297663,
  None,
  1579297833,
  None,
  '{ }',
  '314773184',
  '48',
  'BRONX',
  '2019-12-31T00:00:00',
  '18:00:00',
  None,
  None,
  'COMPLETED',
  None,
  None,
  '0.0',
  'N.Y. POLICE DEPT',
  '105',
  'FELONY',
  None,
  'ROBBERY',
  None,
  'PATROL BORO BRONX',
  '386.0',
  'ROBBERY,PERSONAL ELECTRONIC DEVICE',
  'STREET',
  '2019-12-31T00:00:00',
  None,
  'UNKNOWN',
  'UNKNOWN',
  'U',
  None,
  '45-64',
  'WHITE HISPANIC',
  'F',
  '1016990',
  '244612',
  '40.838026269000075',
  '-73.88168118799997',
  [None, '40.838026269000075', '-73.88168118799997', None, False],
  '11269',
  '35',
  '5',
  '43',
  '31'],
 ['row-tedt.nvuf~r8u3',
  '00000000-0000-0000-9756-1909A41F0A3A',
  0,
  1579297663,
  None,
  1579297833,
  None,
  '{ }',
  '289837961',
  '25',
  'MANHATTAN',
  '2019-12-30T00:00:00',
  '20:30:00',
  '2019-12-31T00:00:00',
  '10:00:00',
  'COMPLETED',
  None,
  None,
  '0.0',
  'N.Y. POLICE

In [78]:
rows = []
crime_num = len(crime['data'])
for row in range(crime_num):
    rows.append(crime['data'][row][8:-6])
crime_df = pd.DataFrame(rows)

In [89]:
crime_df.head()

| | 0 | 2 | 13 | 15 | 32 | 33 |
|---|---|---|---|---|---|---|
| 0 | 314773184 | BRONX | FELONY | ROBBERY | 40.83802626900008 | -73.88168118799997 |
| 1 | 289837961 | MANHATTAN | MISDEMEANOR | PETIT LARCENY | 40.800334261000046 | -73.94565697199994 |
| 2 | 535744284 | BROOKLYN | FELONY | FELONY ASSAULT | 40.66983179600004 | -73.93937555099996 |
| 3 | 895678119 | BRONX | MISDEMEANOR | PETIT LARCENY | 40.87367103500002 | -73.90801364899994 |
| 4 | 299841674 | MANHATTAN | MISDEMEANOR | PETIT LARCENY | 40.76093528000007 | -73.99452906599998 |


In [80]:
crime_df = crime_df.iloc[:,[0, 2,13, 15,32,33]]

In [82]:
crime_df.head(), len(crime_df)

(          0          2            13              15                  32  \
 0  314773184      BRONX       FELONY         ROBBERY  40.838026269000075   
 1  289837961  MANHATTAN  MISDEMEANOR   PETIT LARCENY  40.800334261000046   
 2  535744284   BROOKLYN       FELONY  FELONY ASSAULT   40.66983179600004   
 3  895678119      BRONX  MISDEMEANOR   PETIT LARCENY   40.87367103500002   
 4  299841674  MANHATTAN  MISDEMEANOR   PETIT LARCENY   40.76093528000007   
 
                    33  
 0  -73.88168118799997  
 1  -73.94565697199994  
 2  -73.93937555099996  
 3  -73.90801364899994  
 4  -73.99452906599998  , 461711)

In [84]:
crime['data'][0]

['row-jqjg_evtp-tn4b',
 '00000000-0000-0000-0620-6B6BF1A1C9F4',
 0,
 1579297663,
 None,
 1579297833,
 None,
 '{ }',
 '314773184',
 '48',
 'BRONX',
 '2019-12-31T00:00:00',
 '18:00:00',
 None,
 None,
 'COMPLETED',
 None,
 None,
 '0.0',
 'N.Y. POLICE DEPT',
 '105',
 'FELONY',
 None,
 'ROBBERY',
 None,
 'PATROL BORO BRONX',
 '386.0',
 'ROBBERY,PERSONAL ELECTRONIC DEVICE',
 'STREET',
 '2019-12-31T00:00:00',
 None,
 'UNKNOWN',
 'UNKNOWN',
 'U',
 None,
 '45-64',
 'WHITE HISPANIC',
 'F',
 '1016990',
 '244612',
 '40.838026269000075',
 '-73.88168118799997',
 [None, '40.838026269000075', '-73.88168118799997', None, False],
 '11269',
 '35',
 '5',
 '43',
 '31']