# Understanding locations

Each taxi zone is identified by a numeric `LocationID`.

In [2]:
import pandas as pd
from geopy.geocoders import Nominatim
import numpy as np

In [5]:
location_data = pd.read_csv('/Users/haekim/dev/taxis-and-ubers/data/metadata/taxi_zone_lookup.csv')
location_data

Unnamed: 0,LocationID,Borough,Zone,service_zone
0,1,EWR,Newark Airport,EWR
1,2,Queens,Jamaica Bay,Boro Zone
2,3,Bronx,Allerton/Pelham Gardens,Boro Zone
3,4,Manhattan,Alphabet City,Yellow Zone
4,5,Staten Island,Arden Heights,Boro Zone
...,...,...,...,...
260,261,Manhattan,World Trade Center,Yellow Zone
261,262,Manhattan,Yorkville East,Yellow Zone
262,263,Manhattan,Yorkville West,Yellow Zone
263,264,Unknown,,


The lookup table only has boroughs and zone names, with no longitude or latitude. Ideally we'd have coordinates, so that we can draw heatmaps of pick-ups and drop-offs. We therefore use `geopy` to add longitude and latitude to the metadata.

The geocoding API (Nominatim, via `geopy`) doesn't handle zone names containing `/` well, so for those zones we pass only the first of the names to the geolocator.

In [6]:
location_data['Zone'] = location_data['Zone'].apply(
    # Keep only the part before the first '/'; missing values (NaN) pass through unchanged
    lambda x: x.split('/')[0] if isinstance(x, str) else x
)
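As a small illustration of the rule above, the same logic can be written as a named function and tried on a couple of values. The name `first_zone_name` is hypothetical and not part of the original notebook.

```python
def first_zone_name(name):
    """Return the text before the first '/' in a zone name.

    Non-string values (for example the NaN that pandas uses for a
    missing zone) are returned unchanged.
    """
    if not isinstance(name, str):
        return name
    return name.split('/')[0]


# Zone names taken from the table above
print(first_zone_name('Allerton/Pelham Gardens'))  # Allerton
print(first_zone_name('Alphabet City'))            # Alphabet City

# Possible use on the DataFrame (assumes `location_data` from the cell above):
# location_data['Zone'] = location_data['Zone'].apply(first_zone_name)
```

Taking the first name is a simplification: the second name is discarded, so a zone covering two neighbourhoods is located by only one of them.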

Some of these locations may not be geolocated; 'Outside of NYC', for example, will not be. This is not a large concern for us, since we want local answers.

Next we add a `Coordinate` column holding each zone's `(longitude, latitude)` pair.

In [30]:
geolocator = Nominatim(user_agent='taxis-and-ubers')

In [32]:
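def get_coords(zone):
    # `zone` is one row with the Zone and Borough fields. A missing Zone is NaN,
    # not None, so this check does not catch it.
    if zone is None:
        return None
    loc = geolocator.geocode(f"{zone['Zone']}, {zone['Borough']}")
    # Stored as (longitude, latitude), in that order
    return (float(loc.longitude), float(loc.latitude)) if loc is not None else None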

# Apply row-wise
location_data['Coordinate'] = location_data[['Zone', 'Borough']].apply(get_coords, axis=1)
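A row whose `Zone` is missing is still sent to the geocoder by the function above, because the f-string formats NaN as the text `nan`. The following is a minimal sketch of a guarded variant, not the notebook's own method. The names `is_missing` and `make_get_coords` are hypothetical, and the geocoding callable is passed in so that the logic can be tried without network access.

```python
import math


def is_missing(value):
    """True for None or a float NaN (how pandas marks an empty cell)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def make_get_coords(geocode):
    """Build a row-wise function around any `geocode(query)` callable."""
    def get_coords(zone):
        # Skip rows without a zone name instead of querying "nan, <borough>"
        if is_missing(zone['Zone']):
            return None
        loc = geocode(f"{zone['Zone']}, {zone['Borough']}")
        if loc is None:
            return None
        # Same order as above: (longitude, latitude)
        return (float(loc.longitude), float(loc.latitude))
    return get_coords


# Possible wiring with geopy (assumes `geolocator` and `location_data` from above).
# RateLimiter spaces out calls; the public Nominatim service asks for at most
# one request per second.
# from geopy.extra.rate_limiter import RateLimiter
# geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
# location_data['Coordinate'] = location_data[['Zone', 'Borough']].apply(
#     make_get_coords(geocode), axis=1
# )
```

Passing the geocoder in as an argument is a design choice for this sketch only; it keeps the missing-value handling separate from the network call.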

Now we have coordinates! Each entry is a `(longitude, latitude)` tuple, so index `1` is the latitude.

In [33]:
location_data['Coordinate'].iloc[0][1]

40.68906405

In [34]:
location_data

Unnamed: 0,LocationID,Borough,Zone,service_zone,Coordinate
0,1,EWR,Newark Airport,EWR,"(-74.17725485035348, 40.68906405)"
1,2,Queens,Jamaica Bay,Boro Zone,"(-73.8354124, 40.6039936)"
2,3,Bronx,Allerton,Boro Zone,"(-73.8673652, 40.8654299)"
3,4,Manhattan,Alphabet City,Yellow Zone,"(-73.9795833, 40.7251022)"
4,5,Staten Island,Arden Heights,Boro Zone,"(-74.1916031653169, 40.563699850000006)"
...,...,...,...,...,...
260,261,Manhattan,World Trade Center,Yellow Zone,"(-74.012527, 40.7119004)"
261,262,Manhattan,Yorkville East,Yellow Zone,"(-73.96129216673927, 40.776654449999995)"
262,263,Manhattan,Yorkville West,Yellow Zone,
263,264,Unknown,,,"(46.7981241, -19.6650426)"
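
Two rows in this output stand out: Yorkville West has no coordinate, and the `Unknown` row (`LocationID` 264, which has no zone name) comes back at roughly (46.80, -19.67), far from New York. Since we only want local answers, a simple range check can flag such results. The sketch below uses a rough bounding box whose limits are an assumption chosen for illustration, not an official boundary, and `looks_local` is a hypothetical helper name.

```python
# Rough box around New York City, wide enough to include Newark Airport.
# These limits are illustrative assumptions, not official boundaries.
LON_MIN, LON_MAX = -74.3, -73.6
LAT_MIN, LAT_MAX = 40.4, 41.0


def looks_local(coord):
    """True if coord is a (longitude, latitude) pair inside the rough box."""
    if coord is None:
        return False
    lon, lat = coord
    return LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX


# Values copied from the output above
print(looks_local((-74.17725485035348, 40.68906405)))  # True  (Newark Airport)
print(looks_local((46.7981241, -19.6650426)))          # False (Unknown row)
print(looks_local(None))                               # False (no match)

# Possible use on the DataFrame (assumes `location_data` from above):
# suspicious = location_data[~location_data['Coordinate'].apply(looks_local)]
```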


In [35]:
location_data.to_parquet('/Users/haekim/dev/taxis-and-ubers/data/metadata/taxi_zone_lookup_extra.parquet')