## New York City Data: Restaurants with Operating Licenses

### Here, we import, cleanse and form a Pandas dataframe ('manhattan_restaurants') for further analysis of the New York City data on street cafes in Manhattan with operating licenses. We use the street cafe data because the City has not published data on restaurants in general in a readily available form. In so doing, we implicitly assume that areas in Manhattan that experience rapid development of restaurants of the type envisioned in this project also experience rapid development of sidewalk cafes (usually, if not always, as an integral part of the restaurant).


In [1]:
import pandas as pd
import numpy as np
import requests
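# pandas >= 1.0 exposes json_normalize at the top level;
# on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize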

In [2]:
df=pd.read_csv('/users/richardkornblith/Data_Science/NYCHR/Data_for_NYCHR/mansc_lic_csv.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1633 entries, 0 to 1632
Data columns (total 18 columns):
License Type                     1633 non-null object
License Expiration Date          1633 non-null object
License Status                   1633 non-null object
License Creation Date            1633 non-null object
Industry                         1633 non-null object
Business Name                    1633 non-null object
Address Building                 1633 non-null object
Address Street Name              1633 non-null object
Secondary Address Street Name    16 non-null object
Address City                     1633 non-null object
Address ZIP                      1633 non-null int64
Address Borough                  1633 non-null object
Community Board                  1609 non-null float64
Council District                 1609 non-null float64
Census Tract                     1586 non-null float64
Longitude                        1626 non-null float64
Latitude                    

In [3]:
# We clean up the building numbers to facilitate using the USCB Geocode API.
# Column position 6 is 'Address Building'.
df.iloc[1599,6] = '54'
df.iloc[1631,6] = '83'
df.iloc[1632,6] = '176'
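
The three row positions above were presumably found by inspection. As a rough sketch of how such rows could be located programmatically, the snippet below flags building numbers that are not purely numeric. The sample frame and the helper name `flag_irregular_buildings` are hypothetical, and the digits-only rule is an assumption: the geocoder may well accept other forms.

```python
import pandas as pd

def flag_irregular_buildings(frame, column='Address Building'):
    """Return the rows whose building number is not made up of digits only."""
    as_text = frame[column].astype(str).str.strip()
    return frame[~as_text.str.isdigit()]

# Hypothetical sample data, not taken from the license file
sample = pd.DataFrame({
    'Address Building': ['189', '54A', '83 1/2', '176'],
    'Address Street Name': ['E BROADWAY', 'PEARL ST', 'WEST ST', 'CANAL ST'],
})

irregular = flag_irregular_buildings(sample)
print(irregular)
```

Because the filter keeps the original row labels, the flagged rows can be corrected afterwards with `.loc` rather than by position.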


#### We need to obtain any missing census tracts in df, using the geocoding API provided by the U.S. Census Bureau (USCB). We first isolate the rows of 'df' whose tracts are still to be found into a new dataframe, 'tract_tbf'. We preserve the original index numbers to facilitate finalization.

In [4]:
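# Chaining reset_index avoids modifying a slice of df in place;
# drop=False keeps the original row labels in an 'index' column
tract_tbf = df[df['Census Tract'].isnull()].reset_index(drop=False)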
tract_tbf.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 19 columns):
index                            47 non-null int64
License Type                     47 non-null object
License Expiration Date          47 non-null object
License Status                   47 non-null object
License Creation Date            47 non-null object
Industry                         47 non-null object
Business Name                    47 non-null object
Address Building                 47 non-null object
Address Street Name              47 non-null object
Secondary Address Street Name    1 non-null object
Address City                     47 non-null object
Address ZIP                      47 non-null int64
Address Borough                  47 non-null object
Community Board                  46 non-null float64
Council District                 46 non-null float64
Census Tract                     0 non-null float64
Longitude                        47 non-null float64
Latitude     
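
To illustrate why the original index is preserved, here is a minimal sketch on a made-up four-row frame. The names `toy`, `missing` and `found`, and all values in them, are hypothetical. The pattern mirrors 'tract_tbf': the old row labels travel along in an 'index' column and are later used to write results back to the right rows.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of df: two of the four tracts are missing
toy = pd.DataFrame({
    'Business Name': ['A', 'B', 'C', 'D'],
    'Census Tract': [6.0, np.nan, 9.0, np.nan],
})

# Filter the rows with missing tracts and keep the old labels in an 'index' column
missing = toy[toy['Census Tract'].isnull()].reset_index(drop=False)
print(missing)

# Made-up tract numbers keyed by the original row label
found = {1: 13.0, 3: 16.0}

# The saved labels let us write each value back to the matching row of toy
for label in missing['index']:
    toy.loc[label, 'Census Tract'] = found[label]

print(toy)
```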


#### To find the missing census tracts, we use the USCB geocoding API for geographies. Documentation may be found at 'https://geocoding.geo.census.gov/geocoder/Geocoding_Services_API.pdf'. For convenience, we isolate the three columns of tract_tbf needed for this process into 'slimmed_tbf'. We then iterate through slimmed_tbf to obtain the missing tracts and insert them into a cleansed dataframe, 'manhattan_restaurants'.
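
Street addresses contain spaces, so the query string sent to the geocoder needs URL encoding. The following is a minimal sketch of assembling such a request URL with the standard library, without touching the network. The helper name `build_geocoder_url` is hypothetical; the endpoint and the benchmark, vintage and layers values are the ones used in this notebook.

```python
from urllib.parse import urlencode

BASE_URL = 'https://geocoding.geo.census.gov/geocoder/geographies/address'

def build_geocoder_url(building, street_name, city='New York', state='NY'):
    """Assemble a geocoder request URL for one street address."""
    params = {
        'street': f'{building} {street_name}',
        'city': city,
        'state': state,
        'benchmark': 'Public_AR_Census2010',
        'vintage': 'Census2010_Census2010',
        'layers': '14',
        'format': 'json',
    }
    # urlencode escapes spaces and other reserved characters in each value
    return BASE_URL + '?' + urlencode(params)

example_url = build_geocoder_url('54', 'PEARL ST')
print(example_url)
```

Building the query from a dictionary also avoids stray whitespace creeping into the URL when a long string literal is split across lines.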


In [5]:
slimmed_tbf = tract_tbf[['index', 'Address Building', 'Address Street Name']]
slimmed_tbf.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 3 columns):
index                  47 non-null int64
Address Building       47 non-null object
Address Street Name    47 non-null object
dtypes: int64(1), object(2)
memory usage: 1.2+ KB


In [14]:
# This iteration takes some time: be patient!

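manhattan_restaurants = df.copy()  # work on a copy so that df itself is left unchanged
url = 'https://geocoding.geo.census.gov/geocoder/geographies/address'
for i in range(len(slimmed_tbf)):
    index = slimmed_tbf.loc[i, 'index']
    print(index)
    Street = slimmed_tbf.loc[i, 'Address Building'] + ' ' + slimmed_tbf.loc[i, 'Address Street Name']
    # Passing the query as params lets requests URL-encode the address and
    # keeps stray whitespace out of the URL
    params = {'street': Street, 'city': 'New York', 'state': 'NY',
              'benchmark': 'Public_AR_Census2010', 'vintage': 'Census2010_Census2010',
              'layers': '14', 'format': 'json'}
    tract = requests.get(url, params=params).json()
    geog = tract['result']['addressMatches'][0]['geographies']['Census Blocks']
    census_tr = json_normalize(geog)
    # TRACT is a string of digits with two implied decimal places
    tract_found = float(census_tr['TRACT'].iloc[0]) / 100
    print(tract_found)
    manhattan_restaurants.loc[index, 'Census Tract'] = tract_found
    # Show the updated row as a progress check
    print(manhattan_restaurants.loc[[index], ['Census Tract']])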


1586
165.0
      Census Tract
1586         165.0
1587
52.0
      Census Tract
1587          52.0
1588
7.0
      Census Tract
1588           7.0
1589
18.0
      Census Tract
1589          18.0
1590
73.0
      Census Tract
1590          73.0
1591
37.0
      Census Tract
1591          37.0
1592
67.0
      Census Tract
1592          67.0
1593
81.0
      Census Tract
1593          81.0
1594
133.0
      Census Tract
1594         133.0
1595
245.0
      Census Tract
1595         245.0
1596
137.0
      Census Tract
1596         137.0
1597
67.0
      Census Tract
1597          67.0
1598
157.0
      Census Tract
1598         157.0
1599
79.0
      Census Tract
1599          79.0
1600
135.0
      Census Tract
1600         135.0
1601
71.0
      Census Tract
1601          71.0
1602
163.0
      Census Tract
1602         163.0
1603
41.0
      Census Tract
1603          41.0
1604
18.0
      Census Tract
1604          18.0
1605
138.0
      Census Tract
1605         138.0
1606
195.0
      Census Tract
160
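
The loop above assumes that every address yields at least one match. As a hedged sketch of a more defensive parser, the function below returns `None` when the geocoder finds nothing. The helper name `extract_tract` and the two sample responses are hypothetical and heavily abbreviated; the key path is the one used in the loop, and the sketch assumes 'Census Blocks' holds a list of records.

```python
def extract_tract(response_json):
    """Return the tract number as a float, or None when the address had no match."""
    matches = response_json.get('result', {}).get('addressMatches', [])
    if not matches:
        return None
    blocks = matches[0].get('geographies', {}).get('Census Blocks', [])
    if not blocks:
        return None
    # TRACT arrives as a string of digits; dividing by 100 restores the decimal point
    return float(blocks[0]['TRACT']) / 100

# Hand-written stand-ins for API responses (hypothetical, abbreviated)
matched = {'result': {'addressMatches': [
    {'geographies': {'Census Blocks': [{'TRACT': '016500'}]}}]}}
unmatched = {'result': {'addressMatches': []}}

print(extract_tract(matched))
print(extract_tract(unmatched))
```

Rows for which the function returns `None` could be collected and reviewed by hand instead of stopping the whole iteration with an `IndexError`.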

In [15]:
# info() prints its summary itself; wrapping it in print() would also print 'None'
manhattan_restaurants.info()
manhattan_restaurants.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1633 entries, 0 to 1632
Data columns (total 18 columns):
License Type                     1633 non-null object
License Expiration Date          1633 non-null object
License Status                   1633 non-null object
License Creation Date            1633 non-null object
Industry                         1633 non-null object
Business Name                    1633 non-null object
Address Building                 1633 non-null object
Address Street Name              1633 non-null object
Secondary Address Street Name    16 non-null object
Address City                     1633 non-null object
Address ZIP                      1633 non-null int64
Address Borough                  1633 non-null object
Community Board                  1609 non-null float64
Council District                 1609 non-null float64
Census Tract                     1633 non-null float64
Longitude                        1626 non-null float64
Latitude                    

| | License Type | License Expiration Date | License Status | License Creation Date | Industry | Business Name | Address Building | Address Street Name | Secondary Address Street Name | Address City | Address ZIP | Address Borough | Community Board | Council District | Census Tract | Longitude | Latitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Business | 12/15/19 | Active | 2/17/15 | Sidewalk Cafe | DISHFUL INC | 189 | E BROADWAY | | NEW YORK | 10002 | Manhattan | 103.0 | 1.0 | 6.00 | -73.988796 | 40.714170 | (40.71416977953143, -73.98879597867962) |
| 1 | Business | 12/15/19 | Inactive | 3/21/17 | Sidewalk Cafe | OBBM, LLC | 88 | BROAD ST | | NEW YORK | 10004 | Manhattan | 101.0 | 1.0 | 9.00 | -74.011588 | 40.704055 | (40.70405528945349, -74.01158823524035) |
| 2 | Business | 4/15/19 | Active | 5/3/13 | Sidewalk Cafe | BROADWATER & PEARL ASSOCIATES LLC | 54 | PEARL ST | | NEW YORK | 10004 | Manhattan | 101.0 | 1.0 | 9.00 | -74.011375 | 40.703495 | (40.7034953786261, -74.01137534608392) |
| 3 | Business | 6/30/15 | Inactive | 4/3/08 | Sidewalk Cafe | PEARLSTONE BURGER CORPORATION | 77 | PEARL ST | | NEW YORK | 10004 | Manhattan | 101.0 | 1.0 | 9.00 | -74.010297 | 40.703954 | (40.70395385590203, -74.01029703120159) |
| 4 | Business | 12/15/19 | Active | 5/29/13 | Sidewalk Cafe | BILL'S DTM NY, LLC | 85 | WEST ST | | NEW YORK | 10006 | Manhattan | 101.0 | 1.0 | 13.00 | -74.014875 | 40.709646 | (40.70964599574448, -74.01487516042008) |
| 5 | Business | 4/15/19 | Active | 5/1/09 | Sidewalk Cafe | RECTOR STREET FOOD ENTERPRISES LTD. | 11 | RECTOR ST | | NEW YORK | 10006 | Manhattan | 101.0 | 1.0 | 13.00 | -74.013504 | 40.707931 | (40.70793068772063, -74.01350416281929) |
| 6 | Business | 4/15/19 | Active | 3/24/09 | Sidewalk Cafe | CAFE CASANO LLC | 38 | WEST ST | | NEW YORK | 10004 | Manhattan | 101.0 | 1.0 | 13.00 | -74.015967 | 40.706907 | (40.70690657551593, -74.01596741732033) |
| 7 | Business | 12/15/19 | Active | 3/24/09 | Sidewalk Cafe | OSTERIA CASANO LLC | 28 | WEST ST | | NEW YORK | 10004 | Manhattan | 101.0 | 1.0 | 13.00 | -74.015946 | 40.706961 | (40.706961473792255, -74.01594578949806) |
| 8 | Business | 4/15/20 | Active | 4/18/18 | Sidewalk Cafe | ZVAH INC. | 37 | CANAL ST | | NEW YORK | 10002 | Manhattan | 103.0 | 1.0 | 16.00 | -73.991072 | 40.714659 | (40.71465854603709, -73.99107206822782) |
| 9 | Business | 9/15/19 | Active | 3/4/13 | Sidewalk Cafe | PLAN A GROUP, LLC | 138 | DIVISION ST | | NEW YORK | 10002 | Manhattan | 103.0 | 1.0 | 16.00 | -73.991577 | 40.714469 | (40.71446919557813, -73.99157710648002) |
