## README
The goal of this notebook is to map each station to its borough via reverse geocoding. The result is already saved in 'station_to_boro.csv', so you do NOT need to run this notebook.
If you do want to run it:
1). install geopy beforehand
2). expect a runtime of about 30 minutes, since there are about 1800 stations and the API can only be called once per second
3). be prepared for timed-out errors, after which you have to either start over or resume from the station where the request timed out
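
Resuming after a timeout is easier if the results are written to disk as they arrive. The following is only a minimal sketch of that idea, not what this notebook does: `geocode_with_checkpoint` and the checkpoint file are hypothetical, and `reverse_fn` stands for any callable that turns a "lat, lng" string into an address dict (for example a thin wrapper around the rate-limited `reverse` defined in this notebook).

```python
import json
import os


def geocode_with_checkpoint(coords, reverse_fn, checkpoint_path):
    """Reverse-geocode (lat, lng) pairs, resuming from a JSON checkpoint.

    coords          -- list of (lat, lng) tuples, in a fixed order
    reverse_fn      -- hypothetical callable: "lat, lng" string -> address dict
    checkpoint_path -- hypothetical file that stores the results so far
    """
    results = []
    if os.path.exists(checkpoint_path):
        # Reload whatever an earlier, interrupted run already fetched.
        with open(checkpoint_path) as f:
            results = json.load(f)

    # Skip the coordinates that already have a saved result.
    for lat, lng in coords[len(results):]:
        try:
            address = reverse_fn("{}, {}".format(lat, lng))
        except Exception:
            # Assumption: any error (e.g. a timeout) ends this run.
            # Calling the function again picks up at the same station.
            break
        results.append(address)
        with open(checkpoint_path, "w") as f:
            json.dump(results, f)

    return results
```

If the returned list is shorter than `coords`, the run was interrupted and calling the function again with the same arguments continues from the first station without a saved result.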

In [1]:
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

In [5]:
geolocator = Nominatim(user_agent="bikeshare")
reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1, max_retries=0)

In [96]:
stations = pd.read_csv('stations.csv')

In [97]:
locations=[]
for index, row in stations.iterrows():
    locations.append(reverse("{}, {}".format(row['lat'],row['lng'])).raw['address'])
pd.DataFrame(locations[:10])

Unnamed: 0,house_number,road,suburb,city,state,ISO3166-2-lvl4,postcode,country,country_code,amenity,neighbourhood,county,building
0,356.0,Bergen Street,Brooklyn,City of New York,New York,US-NY,11217,United States,us,,,,
1,,Bedford Avenue,Brooklyn,City of New York,New York,US-NY,11226,United States,us,Citi Bike,,,
2,,West 20th Street,Manhattan,City of New York,New York,US-NY,10011,United States,us,Citi Bike - West 20th Street & 8th Avenue,Chelsea District,New York County,
3,529.0,2nd Avenue,Manhattan,City of New York,New York,US-NY,10016,United States,us,,Manhattan Community Board 6,New York County,
4,,Cumberland Street,Brooklyn,City of New York,New York,US-NY,11238,United States,us,Citi Bike - Cumberland Street & Lafayette Avenue,,,
5,,Pearl Street,Manhattan,City of New York,New York,US-NY,10038,United States,us,Citi Bike - Saint James Place & Pearl Street,Manhattan Community Board 3,New York County,
6,,West 85th Street,Manhattan,City of New York,New York,US-NY,10024,United States,us,,Manhattan Community Board 7,New York County,Rossleigh Court
7,,East 2nd Street,Manhattan,City of New York,New York,US-NY,10009,United States,us,Citi Bike - East 2nd Street & Avenue B,East Village,New York County,
8,1003.0,East 174th Street,The Bronx,City of New York,New York,US-NY,10460,United States,us,,Charlotte Gardens,,
9,,Hanson Place,Brooklyn,City of New York,New York,US-NY,11217,United States,us,Citi Bike - Hanson Place & Atlantic Terminal,,,
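
The geocoding loop assumes that every lookup returns a location whose `raw` dict has an 'address' entry. To my understanding geopy returns `None` when nothing is found, which would make `.raw` fail. Below is a hedged sketch of a more defensive extraction that also keeps only the three fields used later on; `address_fields` and `FIELDS` are hypothetical names, not part of this notebook or of geopy.

```python
# Address keys that the later cells actually use.
FIELDS = ("suburb", "postcode", "neighbourhood")


def address_fields(location, fields=FIELDS):
    """Return the selected address fields, or None for each missing one.

    `location` is expected to look like a geopy Location (an object with a
    `raw` dict) or to be None when the lookup found nothing.
    """
    if location is None:
        return {name: None for name in fields}
    address = location.raw.get("address", {})
    return {name: address.get(name) for name in fields}
```

Collecting these small dicts in a list and passing the list to `pd.DataFrame` would give a three-column frame directly, instead of one column per address key that Nominatim happens to return.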


In [99]:
coord_to_boro = pd.DataFrame(locations)
len(coord_to_boro)

1814

In [100]:
coord_to_boro.head()

Unnamed: 0,house_number,road,suburb,city,state,ISO3166-2-lvl4,postcode,country,country_code,amenity,...,residential,historic,commercial,administrative,city_block,craft,industrial,emergency,landuse,office
0,356.0,Bergen Street,Brooklyn,City of New York,New York,US-NY,11217,United States,us,,...,,,,,,,,,,
1,,Bedford Avenue,Brooklyn,City of New York,New York,US-NY,11226,United States,us,Citi Bike,...,,,,,,,,,,
2,,West 20th Street,Manhattan,City of New York,New York,US-NY,10011,United States,us,Citi Bike - West 20th Street & 8th Avenue,...,,,,,,,,,,
3,529.0,2nd Avenue,Manhattan,City of New York,New York,US-NY,10016,United States,us,,...,,,,,,,,,,
4,,Cumberland Street,Brooklyn,City of New York,New York,US-NY,11238,United States,us,Citi Bike - Cumberland Street & Lafayette Avenue,...,,,,,,,,,,


In [101]:
selected = coord_to_boro[['suburb', 'postcode', 'neighbourhood']]

In [102]:
selected.columns = ['boro', 'zipcode', 'neighborhood']

In [103]:
stations.reset_index(inplace = True, drop = True)
station_to_boro = pd.concat([stations, selected], axis = 1)
station_to_boro.head()

Unnamed: 0.1,Unnamed: 0,station_id,station_name,lat,lng,boro,zipcode,neighborhood
0,542945,4322.06,Bergen St & 4 Ave,40.682564,-73.979898,Brooklyn,11217,
1,793837,4066.15,Bedford Ave & Bergen St,40.676368,-73.952918,Brooklyn,11226,
2,815335,6224.05,W 20 St & 8 Ave,40.743453,-74.00004,Manhattan,10011,Chelsea District
3,847416,6122.09,2 Ave & E 29 St,40.741724,-73.978093,Manhattan,10016,Manhattan Community Board 6
4,15398,4428.02,Cumberland St & Lafayette Ave,40.687534,-73.972652,Brooklyn,11238,
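
The `reset_index` call matters because `pd.concat(..., axis=1)` aligns rows by index label rather than by position. A small illustration with made-up data (the station names and index values here are hypothetical):

```python
import pandas as pd

# Left frame carries a non-default index, as a filtered or re-read frame might.
left = pd.DataFrame({"station_name": ["A", "B"]}, index=[10, 11])
# Right frame has the default 0..n-1 index, like a frame built from a list.
right = pd.DataFrame({"boro": ["Brooklyn", "Manhattan"]})

# Labels 10, 11 and 0, 1 do not match, so the result has four half-empty rows.
misaligned = pd.concat([left, right], axis=1)

# After resetting the index, rows are paired by position as intended.
aligned = pd.concat([left.reset_index(drop=True), right], axis=1)

print(misaligned.shape, aligned.shape)
```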


In [104]:
station_to_boro = station_to_boro.drop(columns = "Unnamed: 0")

In [105]:
station_to_boro.boro.isna().sum()

257

In [106]:
station_to_boro[station_to_boro.boro.isna()].zipcode.unique()

array(['10458', '10451', '10452', '10454', '10457', '10453', '10455',
       '10456', '10468', '10459', '10039', '10474', '10467', '10463',
       '10472', '10460'], dtype=object)

In [107]:
station_to_boro.boro.unique()

array(['Brooklyn', 'Manhattan', 'The Bronx', 'Queens', 'Kings County',
       nan, 'Queens County'], dtype=object)

In [108]:
station_to_boro.boro = station_to_boro.boro.replace('Queens County', 'Queens')
station_to_boro.boro = station_to_boro.boro.replace('Kings County', 'Brooklyn')
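
The two replacements can also be written as one dict-based `replace`, which keeps the county-to-borough mapping in a single place. This is a sketch on a made-up Series; `COUNTY_TO_BORO` is a hypothetical name, and only the Kings and Queens entries are needed for the values seen in this notebook.

```python
import pandas as pd

# County names as they appear in the geocoder output, mapped to boroughs.
COUNTY_TO_BORO = {
    "Kings County": "Brooklyn",
    "Queens County": "Queens",
}

boro = pd.Series(["Brooklyn", "Kings County", "Queens County", "The Bronx"])

# Values that are not keys of the dict are left unchanged.
normalized = boro.replace(COUNTY_TO_BORO)
print(normalized.tolist())
```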

In [109]:
station_to_boro.zipcode = pd.to_numeric(station_to_boro.zipcode)

In [110]:
station_to_boro.zipcode.describe()

count     1809.000000
mean     10663.470978
std        553.987517
min      10000.000000
25%      10032.000000
50%      10460.000000
75%      11218.000000
max      11415.000000
Name: zipcode, dtype: float64

In [111]:
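# Assign only the boro column; indexing with the mask alone would overwrite every column of the matching rows
station_to_boro.loc[(station_to_boro.zipcode >= 10451) & (station_to_boro.zipcode <= 10475), 'boro'] = 'The Bronx'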

In [112]:
station_to_boro[station_to_boro.boro.isna()].zipcode.unique()

array([10039.0], dtype=object)

In [113]:
# Assign only the boro column so that the other columns of these rows are kept
station_to_boro.loc[station_to_boro.zipcode == 10039, 'boro'] = 'Manhattan'
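
When filling boroughs from zip codes, it helps to restrict the assignment to the boro column and to the rows where it is actually missing. A minimal sketch on made-up rows, using the same zip code rules as this notebook (10451 to 10475 for The Bronx, 10039 for Manhattan); the frame `df` and the station names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "station_name": ["A", "B", "C", "D"],
    "boro": ["Brooklyn", None, None, "Manhattan"],
    "zipcode": [11217, 10458, 10039, 10011],
})

# Rows whose borough the geocoder did not provide.
missing = df["boro"].isna()
# between() is inclusive on both ends by default.
bronx_zip = df["zipcode"].between(10451, 10475)

# .loc[row_mask, column] changes one column and leaves the rest untouched.
df.loc[missing & bronx_zip, "boro"] = "The Bronx"
df.loc[missing & (df["zipcode"] == 10039), "boro"] = "Manhattan"

print(df)
```

Because `missing` is part of every mask, boroughs that the geocoder already provided are never overwritten.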

In [114]:
station_to_boro.to_csv('station_to_boro.csv')

## References:
* https://towardsdatascience.com/reverse-geocoding-with-nyc-bike-share-data-cdef427987f8
* https://stackoverflow.com/questions/35491223/inverting-a-dictionary-with-list-values
* https://bklyndesigns.com/new-york-city-zip-code/