## New York City Data: Restaurants with Operating Licenses

### Here, we import, cleanse and form a Pandas dataframe ('manhattan_rstrs') for further analysis of the New York City data on sidewalk cafes in Manhattan with operating licenses. We use the sidewalk cafe data because the City has not published data on restaurants in general in a readily available form. In so doing, we implicitly assume that areas of Manhattan that experience rapid development of restaurants of the type envisioned in this project also experience rapid development of sidewalk cafes (usually, if not always, operated as an integral part of a restaurant).


In [4]:
import pandas as pd
import numpy as np
import requests
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

In [5]:
df=pd.read_csv('/users/richardkornblith/Data_Science/NYCHR/Data_for_NYCHR/mansc_lic_csv.csv')
print(df.info())
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1633 entries, 0 to 1632
Data columns (total 18 columns):
License Type                     1633 non-null object
License Expiration Date          1633 non-null object
License Status                   1633 non-null object
License Creation Date            1633 non-null object
Industry                         1633 non-null object
Business Name                    1633 non-null object
Address Building                 1633 non-null object
Address Street Name              1633 non-null object
Secondary Address Street Name    16 non-null object
Address City                     1633 non-null object
Address ZIP                      1633 non-null int64
Address Borough                  1633 non-null object
Community Board                  1609 non-null float64
Council District                 1609 non-null float64
Census Tract                     1586 non-null float64
Longitude                        1626 non-null float64
Latitude                    

Unnamed: 0,License Type,License Expiration Date,License Status,License Creation Date,Industry,Business Name,Address Building,Address Street Name,Secondary Address Street Name,Address City,Address ZIP,Address Borough,Community Board,Council District,Census Tract,Longitude,Latitude,Location
0,Business,12/15/19,Active,2/17/15,Sidewalk Cafe,DISHFUL INC,189,E BROADWAY,,NEW YORK,10002,Manhattan,103.0,1.0,6.0,-73.988796,40.71417,"(40.71416977953143, -73.98879597867962)"
1,Business,12/15/19,Inactive,3/21/17,Sidewalk Cafe,"OBBM, LLC",88,BROAD ST,,NEW YORK,10004,Manhattan,101.0,1.0,9.0,-74.011588,40.704055,"(40.70405528945349, -74.01158823524035)"
2,Business,4/15/19,Active,5/3/13,Sidewalk Cafe,BROADWATER & PEARL ASSOCIATES LLC,54,PEARL ST,,NEW YORK,10004,Manhattan,101.0,1.0,9.0,-74.011375,40.703495,"(40.7034953786261, -74.01137534608392)"
3,Business,6/30/15,Inactive,4/3/08,Sidewalk Cafe,PEARLSTONE BURGER CORPORATION,77,PEARL ST,,NEW YORK,10004,Manhattan,101.0,1.0,9.0,-74.010297,40.703954,"(40.70395385590203, -74.01029703120159)"
4,Business,12/15/19,Active,5/29/13,Sidewalk Cafe,"BILL'S DTM NY, LLC",85,WEST ST,,NEW YORK,10006,Manhattan,101.0,1.0,13.0,-74.014875,40.709646,"(40.70964599574448, -74.01487516042008)"


In [6]:
# We clean up three building numbers so that the U.S. Census Bureau (USCB) geocoder can match the addresses
# (column position 6 is 'Address Building')
df.iloc[1599,6] = '54'
df.iloc[1631,6] = '83'
df.iloc[1632,6] = '176'
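The three rows above are fixed by position. A minimal sketch for locating which 'Address Building' entries are not plain house numbers (and so may trip up the geocoder) follows. The helper name `suspect_buildings`, the sample rows, and the regular expression (digits, optionally a hyphenated second number such as '12-14' or one trailing letter) are illustrative assumptions, not a USCB rule.

```python
import re
import pandas as pd

# Assumed shape of a "clean" building number: digits, optionally "-digits"
# (e.g. '12-14') or a single trailing letter (e.g. '54A'). Illustrative only.
CLEAN_BUILDING = re.compile(r'\d+(-\d+)?[A-Za-z]?')

def suspect_buildings(frame, col='Address Building'):
    """Return the rows whose building number does not look like a clean house number."""
    values = frame[col].astype(str).str.strip()
    ok = values.map(lambda v: CLEAN_BUILDING.fullmatch(v) is not None).astype(bool)
    return frame[~ok]

# Hypothetical sample rows, not taken from the NYC data
sample = pd.DataFrame({'Address Building': ['189', '54 ', '12-14', '83-85 REAR', 'N/A'],
                       'Address Street Name': ['E BROADWAY', 'PEARL ST', 'A ST', 'B ST', 'C ST']})
print(suspect_buildings(sample))
```

Printing the flagged rows with their index labels gives the positions to correct, rather than finding them by eye.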



#### We need to obtain any missing census tracts in df. To do so, we will use the geocoding API provided by the USCB. First, we isolate the rows of 'df' whose tracts are to be found into a new dataframe, 'tract_tbf', preserving the original index numbers so that the results can be written back to the right rows.

In [7]:
# Select the rows lacking a tract; reset_index() keeps the original row labels in a new 'index' column
tract_tbf = df[df['Census Tract'].isnull()].reset_index()
print(tract_tbf.info())
tract_tbf.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 19 columns):
index                            47 non-null int64
License Type                     47 non-null object
License Expiration Date          47 non-null object
License Status                   47 non-null object
License Creation Date            47 non-null object
Industry                         47 non-null object
Business Name                    47 non-null object
Address Building                 47 non-null object
Address Street Name              47 non-null object
Secondary Address Street Name    1 non-null object
Address City                     47 non-null object
Address ZIP                      47 non-null int64
Address Borough                  47 non-null object
Community Board                  46 non-null float64
Council District                 46 non-null float64
Census Tract                     0 non-null float64
Longitude                        47 non-null float64
Latitude     

Unnamed: 0,index,License Type,License Expiration Date,License Status,License Creation Date,Industry,Business Name,Address Building,Address Street Name,Secondary Address Street Name,Address City,Address ZIP,Address Borough,Community Board,Council District,Census Tract,Longitude,Latitude,Location
0,1586,Business,12/15/21,Inactive,2/21/19,Sidewalk Cafe,"FRANK MAC'S PLACE, LLC",425,AMSTERDAM AVE,,NEW YORK,10024,Manhattan,107.0,6.0,,-73.977651,40.784055,"(40.78405492776559, -73.977651491007)"
1,1587,Business,12/15/21,Inactive,2/21/19,Sidewalk Cafe,Thessabul LLC,250,PARK AVE S,,NEW YORK,10003,Manhattan,105.0,2.0,,-73.987818,40.73823,"(40.73823021846895, -73.98781763204387)"
2,1588,Business,4/15/21,Inactive,2/21/19,Sidewalk Cafe,WB CAFE INC.,134,W BROADWAY,,NEW YORK,10013,Manhattan,101.0,1.0,,-74.008254,40.716767,"(40.716766572576006, -74.00825363796221)"
3,1589,Business,9/15/21,Inactive,2/21/19,Sidewalk Cafe,"BANTER NOLITA, LLC",65,RIVINGTON ST,,NEW YORK,10002,Manhattan,103.0,1.0,,-73.990151,40.720606,"(40.720606365884876, -73.99015134185616)"
4,1590,Business,12/15/21,Inactive,2/6/19,Sidewalk Cafe,HUDSON & CHARLES DINETTE INC,522,HUDSON ST,,NEW YORK,10014,Manhattan,102.0,3.0,,-74.006257,40.733908,"(40.73390770672981, -74.00625677004092)"
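To see why the original index numbers survive, here is a toy example (hypothetical values) of the same pattern: select the rows with a missing 'Census Tract' and call `reset_index()` with its default `drop=False`, which moves the old row labels into a column named 'index'. Those labels are then used to write results back to the original frame.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df: rows 1 and 3 lack a census tract
toy = pd.DataFrame({'Business Name': ['A', 'B', 'C', 'D'],
                    'Census Tract': [6.0, np.nan, 9.0, np.nan]})

# Select the rows to be filled, then move their original labels into an 'index' column
missing = toy[toy['Census Tract'].isnull()].reset_index()
print(missing)

# Later, each looked-up value is written back to the original row by its label
for label in missing['index']:
    toy.loc[label, 'Census Tract'] = 99.0  # placeholder for a geocoded tract
```

Building the subset and resetting it in one chained expression also avoids calling `inplace=True` on a slice of `df`, which pandas flags with a `SettingWithCopyWarning`.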



#### To find the missing census tracts, we use the USCB API for geocoding addresses. Documentation may be found at 'https://geocoding.geo.census.gov/geocoder/Geocoding_Services_API.pdf'. For convenience, we isolate the three columns of tract_tbf needed for this process into 'slimmed_tbf'. We then iterate through slimmed_tbf to obtain the missing tracts and insert them into a cleansed dataframe, 'manhattan_restaurants'.

N.B.: Occasionally, and unpredictably, the USCB geocoder API fails to return a result. In that case, you may try geocoding by latitude/longitude rather than by street address, or run the module several times, wrapping each request in try/except/else, saving the results obtained on each run and removing those rows from tract_tbf, until all census tracts have been obtained.
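The multiple-run strategy in the N.B. can be sketched as a loop that makes repeated passes over the still-unresolved rows, keeping whatever each pass manages to geocode. To keep the sketch runnable offline, `flaky_geocode` is a hypothetical stand-in for the USCB request that fails at random; in practice it would wrap the `requests.get(...)` call. The number of passes, the pause between passes, and the helper names are our own choices.

```python
import random
import time

def flaky_geocode(address, rng):
    """Hypothetical stand-in for one USCB geocoder call: fails about 30% of the time."""
    if rng.random() < 0.3:
        raise ConnectionError('no result returned')
    return float(len(address))  # dummy "tract", deterministic per address

def geocode_in_passes(pending, geocode, max_passes=10, pause=0.0):
    """Retry only the unresolved addresses on each pass, keeping every success.

    pending: dict mapping original row label -> street address.
    Returns (found, still_pending).
    """
    found = {}
    pending = dict(pending)  # work on a copy; the caller's dict is left unchanged
    for _ in range(max_passes):
        if not pending:
            break
        for label, address in list(pending.items()):
            try:
                result = geocode(address)
            except (ConnectionError, KeyError, IndexError):
                # an empty 'addressMatches' list surfaces as IndexError on [0]
                continue  # leave it for the next pass
            else:
                found[label] = result
                del pending[label]
        time.sleep(pause)  # optional courtesy delay between passes
    return found, pending

rng = random.Random(0)
todo = {1586: '425 AMSTERDAM AVE', 1587: '250 PARK AVE S', 1588: '134 W BROADWAY'}
found, left = geocode_in_passes(todo, lambda a: flaky_geocode(a, rng))
print(found, left)
```

Anything still in `left` after the last pass can be retried later or geocoded by coordinates instead.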


In [8]:
slimmed_tbf = tract_tbf[['index', 'Address Building', 'Address Street Name']]
slimmed_tbf.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 3 columns):
index                  47 non-null int64
Address Building       47 non-null object
Address Street Name    47 non-null object
dtypes: int64(1), object(2)
memory usage: 1.2+ KB


In [9]:
# # This loop calls the geocoder once per address and takes some time: be patient!

# manhattan_restaurants = df.copy()  # a copy, so that df itself is left unchanged
# geocoder_url = 'https://geocoding.geo.census.gov/geocoder/geographies/address'
# for i in range(len(slimmed_tbf)):
#     index = slimmed_tbf.loc[i,'index']  # row label in the original df
#     params = {'street': slimmed_tbf.loc[i,'Address Building']+' '+slimmed_tbf.loc[i,'Address Street Name'],
#               'city': 'New York',
#               'state': 'NY',
#               'benchmark': 'Public_AR_Census2010',
#               'vintage': 'Census2010_Census2010',
#               'layers': '14',
#               'format': 'json'}
#     # Let requests build and encode the query string (no stray spaces in the URL)
#     tract = requests.get(geocoder_url, params=params).json()
#     geog = tract['result']['addressMatches'][0]['geographies']['Census Blocks']
#     census_tr = json_normalize(geog)
#     # TRACT is a six-digit code with two implied decimals, e.g. '003100' -> 31.0
#     tract_found = float(census_tr['TRACT'].iloc[0])/100
#     manhattan_restaurants.loc[index,'Census Tract'] = tract_found
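The request and the parsing step can be separated so that each can be checked without a network call. The sketch below uses the standard library's `urllib.parse.urlencode` to build a query with the same parameter names and values as the loop above, plus a helper that pulls the tract out of a response dictionary. The `sample` dictionary is a hypothetical, heavily trimmed response that follows only the path the loop reads (`result`, `addressMatches`, `geographies`, `Census Blocks`, `TRACT`); a real response carries many more fields.

```python
from urllib.parse import urlencode

BASE = 'https://geocoding.geo.census.gov/geocoder/geographies/address'

def build_url(building, street, city='New York', state='NY'):
    """Build the geocoder query URL with properly encoded parameters."""
    params = {'street': f'{building} {street}', 'city': city, 'state': state,
              'benchmark': 'Public_AR_Census2010', 'vintage': 'Census2010_Census2010',
              'layers': '14', 'format': 'json'}
    return BASE + '?' + urlencode(params)

def tract_from_response(resp):
    """Return the tract as a float (e.g. '003100' -> 31.0), or None if no address matched."""
    matches = resp.get('result', {}).get('addressMatches', [])
    if not matches:
        return None
    block = matches[0]['geographies']['Census Blocks'][0]
    # TRACT is a six-digit code with two implied decimal places
    return int(block['TRACT']) / 100

# Hypothetical, trimmed response for illustration only
sample = {'result': {'addressMatches': [
    {'geographies': {'Census Blocks': [{'TRACT': '003100'}]}}]}}
print(build_url('425', 'AMSTERDAM AVE'))
print(tract_from_response(sample))  # 31.0
```

Returning `None` for an empty match list, instead of raising, makes the "no result" case easy to count and retry.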


In [11]:
# manhattan_restaurants.info()

In [12]:
# Save the result now, while all of the census tracts are filled in, so that the geocoding need not be repeated:
# manhattan_restaurants.to_csv('/users/richardkornblith/Data_Science/NYCHR/Data_for_NYCHR/manhattan_restaurants_csv.csv')


In [30]:
# Import the saved file and work from it; we also take this occasion to convert the license creation date column to datetime objects:
man_restaurants = pd.read_csv('/users/richardkornblith/Data_Science/NYCHR/Data_for_NYCHR/manhattan_restaurants_csv.csv',\
                              parse_dates=['License Creation Date'])
man_restaurants.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1633 entries, 0 to 1632
Data columns (total 19 columns):
Unnamed: 0                       1633 non-null int64
License Type                     1633 non-null object
License Expiration Date          1633 non-null object
License Status                   1633 non-null object
License Creation Date            1633 non-null datetime64[ns]
Industry                         1633 non-null object
Business Name                    1633 non-null object
Address Building                 1633 non-null object
Address Street Name              1633 non-null object
Secondary Address Street Name    16 non-null object
Address City                     1633 non-null object
Address ZIP                      1633 non-null int64
Address Borough                  1633 non-null object
Community Board                  1609 non-null float64
Council District                 1609 non-null float64
Census Tract                     1633 non-null float64
Longitude             
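The raw dates are stored as month/day/two-digit-year strings (e.g. '2/17/15' in the first cell's output). As a sketch, an explicit format string gives the same parse as `parse_dates` for such values and makes any malformed entry visible; the sample strings are copied from that output, except for the deliberately bad 'not a date'.

```python
import pandas as pd

# Sample strings copied from the first cell's output
raw = pd.Series(['2/17/15', '3/21/17', '5/3/13', '4/3/08'])

# %y maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999
parsed = pd.to_datetime(raw, format='%m/%d/%y')
print(parsed.dt.year.tolist())  # [2015, 2017, 2013, 2008]

# errors='coerce' turns unparseable entries into NaT instead of raising
checked = pd.to_datetime(pd.Series(['12/15/19', 'not a date']), format='%m/%d/%y', errors='coerce')
print(checked.isna().tolist())  # [False, True]
```

Counting `NaT` values after a coerced parse is a quick way to confirm that every license date was read.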

In [31]:
man_restaurants.drop(['Unnamed: 0'], axis=1, inplace=True)

In [32]:
#Noting that there are at least seven instances lacking longitude and/or latitude, let's see which they are and whether they are material:
mrs_null_latlng = man_restaurants[man_restaurants['Latitude'].isnull()]
mrs_null_latlng

Unnamed: 0,License Type,License Expiration Date,License Status,License Creation Date,Industry,Business Name,Address Building,Address Street Name,Secondary Address Street Name,Address City,Address ZIP,Address Borough,Community Board,Council District,Census Tract,Longitude,Latitude,Location
46,Business,7/15/19,Active,2018-02-12,Sidewalk Cafe,ELEVEN FOOD AND BEVERAGE INC.,11,6TH AVE,,NEW YORK,10013,Manhattan,,,33.0,,,
474,Business,9/15/20,Active,2014-02-27,Sidewalk Cafe,"E & E RESTAURANT 2, LLC",581,2ND AVENUE,,NEW YORK,10016,Manhattan,,,70.0,,,
586,Business,3/15/19,Active,2013-11-01,Sidewalk Cafe,WEST 12TH STREET RESTAURANT GROUP LLC,235,WEST 12TH STREET,,NEW YORK,10014,Manhattan,,,77.0,,,
606,Business,4/15/18,Active,2014-02-27,Sidewalk Cafe,SLJ BAR LLC,63,GANSEVOORT STREET,,NEW YORK,10014,Manhattan,102.0,3.0,79.0,,,
909,Business,4/15/20,Active,2014-01-16,Sidewalk Cafe,1462 SECOND RESTAURANT LLC,1462,2ND AVENUE,,NEW YORK,10075,Manhattan,108.0,5.0,134.0,,,
1050,Business,4/1/19,Active,2014-12-16,Sidewalk Cafe,PARM UPPER WEST LLC,235,COLUMBUS AVENUE,,NEW YORK,10023,Manhattan,,,157.0,,,
1444,Business,12/15/15,Inactive,2005-02-22,Sidewalk Cafe,STANTON RESTAURANT CORP.,82,STANTON STREET,,NEW YORK,10002,Manhattan,,,3001.0,,,
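Missing coordinates are easy to spot; coordinates that are present but wrong are not. A minimal sanity check flags any row whose latitude/longitude falls outside a rough box around Manhattan. The bounds (about 40.68 to 40.89 N and 74.03 to 73.90 W) are our approximation of the island's extent, not an official boundary, and the last two sample rows are hypothetical.

```python
import pandas as pd

# Rough, assumed bounding box around Manhattan (not an official boundary)
LAT_MIN, LAT_MAX = 40.68, 40.89
LNG_MIN, LNG_MAX = -74.03, -73.90

def outside_manhattan(frame):
    """Rows whose coordinates are missing or fall outside the rough Manhattan box."""
    lat, lng = frame['Latitude'], frame['Longitude']
    inside = lat.between(LAT_MIN, LAT_MAX) & lng.between(LNG_MIN, LNG_MAX)
    return frame[~inside]  # between() is False for NaN, so missing values are flagged too

# Two rows from the first cell's output plus two hypothetical rows
pts = pd.DataFrame({'Business Name': ['DISHFUL INC', 'OBBM, LLC', 'UPSTATE EXAMPLE', 'MISSING EXAMPLE'],
                    'Latitude': [40.71417, 40.704055, 42.708223, float('nan')],
                    'Longitude': [-73.988796, -74.011588, -73.712640, float('nan')]})
print(outside_manhattan(pts)['Business Name'].tolist())  # ['UPSTATE EXAMPLE', 'MISSING EXAMPLE']
```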


In [35]:

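# Eleven Food and Beverage is the only one of these restaurants first licensed within the time period of interest,
# so we fill in its missing coordinates and drop the others.
# Its coordinates are looked up with the USCB geocoder ('x' is longitude, 'y' is latitude)
geocoder_url = 'https://geocoding.geo.census.gov/geocoder/locations/address'
params = {'street': '{} {}'.format(man_restaurants.loc[46, 'Address Building'], man_restaurants.loc[46, 'Address Street Name']),
          'city': 'New York', 'state': 'NY', 'benchmark': 'Public_AR_Current', 'format': 'json'}
coords = requests.get(geocoder_url, params=params).json()['result']['addressMatches'][0]['coordinates']
Eleven_longitude, Eleven_latitude = coords['x'], coords['y']
# Match the '(latitude, longitude)' layout of the existing Location column
Eleven_Location = '({}, {})'.format(Eleven_latitude, Eleven_longitude)
man_restaurants.loc[46, ['Longitude','Latitude','Location']] = [Eleven_longitude, Eleven_latitude, Eleven_Location]
man_restaurants_slim = man_restaurants.drop([474,586,606,909,1050,1444])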
print(man_restaurants_slim.columns)
man_restaurants_slim.info()


Index(['License Type', 'License Expiration Date', 'License Status',
       'License Creation Date', 'Industry', 'Business Name',
       'Address Building', 'Address Street Name',
       'Secondary Address Street Name', 'Address City', 'Address ZIP',
       'Address Borough', 'Community Board', 'Council District',
       'Census Tract', 'Longitude', 'Latitude', 'Location'],
      dtype='object')
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1627 entries, 0 to 1632
Data columns (total 18 columns):
License Type                     1627 non-null object
License Expiration Date          1627 non-null object
License Status                   1627 non-null object
License Creation Date            1627 non-null datetime64[ns]
Industry                         1627 non-null object
Business Name                    1627 non-null object
Address Building                 1627 non-null object
Address Street Name              1627 non-null object
Secondary Address Street Name    16 non-null object
Add
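Because 'Location' duplicates the two coordinate columns as a '(latitude, longitude)' string, a small consistency check can confirm that a hand-filled row uses the same order and values. The helpers below are a hypothetical sketch; the tolerance of 1e-5 degrees (roughly a metre) is our choice, allowing for 'Latitude'/'Longitude' being shown rounded to six decimal places while 'Location' carries more digits.

```python
import math

def parse_location(text):
    """Parse '(lat, lng)' into a (lat, lng) tuple of floats."""
    lat_str, lng_str = text.strip().strip('()').split(',')
    return float(lat_str), float(lng_str)

def location_consistent(location, lat, lng, tol=1e-5):
    """True if the Location string agrees with the Latitude/Longitude columns."""
    p_lat, p_lng = parse_location(location)
    return math.isclose(p_lat, lat, abs_tol=tol) and math.isclose(p_lng, lng, abs_tol=tol)

# Values from the first row of the first cell's output
row_loc = '(40.71416977953143, -73.98879597867962)'
print(location_consistent(row_loc, 40.71417, -73.988796))  # True
# The same numbers in (longitude, latitude) order are caught
print(location_consistent('-73.988796, 40.71417', 40.71417, -73.988796))  # False
```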

In [39]:
# The rows are gone. Now, let's isolate the desired columns into a new dataframe
# (.copy() makes it independent of man_restaurants_slim, so later edits raise no SettingWithCopyWarning):

manhattan_rstrs = man_restaurants_slim[['License Type',
       'License Creation Date', 'Business Name', 'Address Building', 'Address Street Name', 'Address ZIP',
       'Census Tract', 'Longitude', 'Latitude', 'Location']].copy()

In [40]:
print(manhattan_rstrs.info())
manhattan_rstrs.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1627 entries, 0 to 1632
Data columns (total 10 columns):
License Type             1627 non-null object
License Creation Date    1627 non-null datetime64[ns]
Business Name            1627 non-null object
Address Building         1627 non-null object
Address Street Name      1627 non-null object
Address ZIP              1627 non-null int64
Census Tract             1627 non-null float64
Longitude                1627 non-null float64
Latitude                 1627 non-null float64
Location                 1627 non-null object
dtypes: datetime64[ns](1), float64(3), int64(1), object(5)
memory usage: 139.8+ KB
None


Unnamed: 0,License Type,License Creation Date,Business Name,Address Building,Address Street Name,Address ZIP,Census Tract,Longitude,Latitude,Location
0,Business,2015-02-17,DISHFUL INC,189,E BROADWAY,10002,6.0,-73.988796,40.71417,"(40.71416977953143, -73.98879597867962)"
1,Business,2017-03-21,"OBBM, LLC",88,BROAD ST,10004,9.0,-74.011588,40.704055,"(40.70405528945349, -74.01158823524035)"
2,Business,2013-05-03,BROADWATER & PEARL ASSOCIATES LLC,54,PEARL ST,10004,9.0,-74.011375,40.703495,"(40.7034953786261, -74.01137534608392)"
3,Business,2008-04-03,PEARLSTONE BURGER CORPORATION,77,PEARL ST,10004,9.0,-74.010297,40.703954,"(40.70395385590203, -74.01029703120159)"
4,Business,2013-05-29,"BILL'S DTM NY, LLC",85,WEST ST,10006,13.0,-74.014875,40.709646,"(40.70964599574448, -74.01487516042008)"
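As a pointer toward the further analysis this dataframe is meant for, here is a hedged sketch of one way to see where sidewalk-cafe licensing grew: count licenses by census tract and by year of 'License Creation Date'. The toy rows and the cutoff `START_YEAR` are hypothetical; the time period of interest is not specified in this section.

```python
import pandas as pd

# Toy rows in the shape of manhattan_rstrs (hypothetical values)
toy = pd.DataFrame({
    'License Creation Date': pd.to_datetime(['2013-05-03', '2015-02-17', '2017-03-21',
                                             '2018-02-12', '2018-06-01']),
    'Census Tract': [9.0, 6.0, 9.0, 33.0, 33.0]})

START_YEAR = 2015  # hypothetical start of the period of interest

recent = toy[toy['License Creation Date'].dt.year >= START_YEAR]
years = recent['License Creation Date'].dt.year.rename('Year')
# Rows: census tracts; columns: years; cells: number of new licenses
counts = recent.groupby(['Census Tract', years]).size().unstack(fill_value=0)
print(counts)
```

The same grouping applied to `manhattan_rstrs` would give a tract-by-year table of new licenses.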
