## Python 2.7 

### This file reads the pickup and dropoff longitude and latitude in the green taxi data and classifies whether each trip is airport related

* The green taxi pickup and dropoff coordinates are longitude/latitude values, while the taxi zone shapefile is stored in its own coordinate reference system (`shapes.crs`), which differs from the EPSG:4326 standard.
* To classify trips as airport related or not, the zone shapes must therefore be reprojected to EPSG:4326 before testing whether a point falls inside them.
* I intended to use the `pyproj` package for this projection. However, I could not get `pyproj` to work under Python 3 at the time, so I took a detour through Python 2.7 here.
* ***Download the "taxi_zones" shapefile from the bottom of http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml before running this classification.***
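
A quick sanity check can reveal a coordinate-system mismatch before any containment test is run. The sketch below is not part of the original notebook: `looks_like_lonlat` is a hypothetical helper, and both sample coordinates are illustrative assumptions rather than values read from the files.

```python
def looks_like_lonlat(x, y):
    """Return True if (x, y) lies within the valid degree ranges for lon/lat."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


# Illustrative values only (assumptions, not taken from the data files)
degrees_pair = (-73.87, 40.77)          # plausible longitude/latitude in degrees
projected_pair = (1018000.0, 219000.0)  # plausible projected coordinate in linear units

print(looks_like_lonlat(*degrees_pair))
print(looks_like_lonlat(*projected_pair))
```

If one dataset passes this check and the other does not, one of them has to be reprojected before a point-in-polygon test can give meaningful answers.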

#### Projection

In [5]:
import fiona 
import pyproj
from functools import partial
from shapely.ops import transform
from shapely.geometry import Point,shape
import pandas as pd
import numpy as np

df = pd.read_csv("green_tripdata_2015-09.csv",header=0)
# build airport geolocation dictionary
airports = {}
with fiona.open('taxi_zones/taxi_zones.shp') as shapes:
    # define a projection
    project = partial(pyproj.transform, pyproj.Proj(shapes.crs), pyproj.Proj("+init=EPSG:4326"))
    # OBJECTID of each airport zone in the taxi_zones shapefile
    airportIDs = {138: 'LGA', 1: 'EWR', 132: 'JFK'}
    for s in shapes:
        shapeID = s['properties']['OBJECTID']
        if shapeID in airportIDs:
            # reproject the zone polygon to EPSG:4326 and store it under its airport code
            airports[airportIDs[shapeID]] = transform(project,shape(s['geometry']))
        # stop scanning once all three airport zones are found
        if len(airports) == len(airportIDs):
            break

#### Classify pickup locations

In [9]:

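# one shapely Point per pickup location (longitude first, then latitude)
points = [Point(lo,la) for lo, la in zip(df['Pickup_longitude'],df['Pickup_latitude'])]
# dtype=object keeps the full airport code; dtype=str allocates 1-character strings and would truncate "LGA" to "L"
pickZone = np.empty(df.shape[0],dtype=object)
for i,point in enumerate(points):
    pickZone[i] = "N"  # default: not an airport pickup
    for airport,zone in airports.items():
        if zone.contains(point):
            pickZone[i] = airport
            break  # a point can match at most one airport zone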
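
The containment test itself is delegated to shapely. To make the idea concrete, here is a minimal pure-Python sketch of one common point-in-polygon technique, ray casting. It is an illustration under simplifying assumptions (a single ring, no holes, no special handling of points exactly on the boundary), not shapely's implementation; `point_in_polygon` and the square ring are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) lies inside the ring given as a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            # Count crossings strictly to the right of the point
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square zone expressed in longitude/latitude degrees
square = [(-74.0, 40.0), (-73.0, 40.0), (-73.0, 41.0), (-74.0, 41.0)]

print(point_in_polygon(-73.5, 40.5, square))  # point inside the square
print(point_in_polygon(-72.5, 40.5, square))  # point east of the square
```

An odd number of crossings means the point is inside. The test only makes sense when the point and the ring use the same coordinate system, which is why the zones are reprojected first.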

#### Classify dropoff locations

In [8]:
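# one shapely Point per dropoff location (longitude first, then latitude)
points = [Point(lo,la) for lo, la in zip(df['Dropoff_longitude'],df['Dropoff_latitude'])]
# dtype=object keeps the full airport code; dtype=str allocates 1-character strings and would truncate "LGA" to "L"
dropZone = np.empty(df.shape[0],dtype=object)
for i,point in enumerate(points):
    dropZone[i] = "N"  # default: not an airport dropoff
    for airport,zone in airports.items():
        if zone.contains(point):
            dropZone[i] = airport
            break  # a point can match at most one airport zone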
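
The pickup and dropoff cells repeat the same loop, so it could be factored into one helper. The following is a sketch, not the notebook's code: `classify_points` is a hypothetical function, and `BoxZone` is a hypothetical stand-in for a shapely polygon so that the example runs without any geometry library. With the real data, the `airports` dictionary and the list of `Point` objects would be passed in instead.

```python
def classify_points(points, zones, default="N"):
    """Label each point with the first zone that contains it, else `default`.

    `zones` maps a label to any object exposing a contains(point) method.
    """
    labels = []
    for point in points:
        label = default
        for name, zone in zones.items():
            if zone.contains(point):
                label = name
                break  # stop at the first matching zone
        labels.append(label)
    return labels


class BoxZone(object):
    """Hypothetical stand-in for a polygon: an axis-aligned box."""

    def __init__(self, minx, miny, maxx, maxy):
        self.bounds = (minx, miny, maxx, maxy)

    def contains(self, point):
        x, y = point
        minx, miny, maxx, maxy = self.bounds
        return minx < x < maxx and miny < y < maxy


demo_zones = {"AAA": BoxZone(0, 0, 1, 1), "BBB": BoxZone(2, 2, 3, 3)}
demo_labels = classify_points([(0.5, 0.5), (2.5, 2.5), (5, 5)], demo_zones)
print(demo_labels)
```

Returning a plain list of labels also sidesteps any fixed-width string dtype, so multi-character codes are stored in full.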

#### Save to CSV

In [23]:
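# one row per trip: column 0 holds the pickup code, column 1 the dropoff code ("N" means not an airport)
dfw = pd.DataFrame(np.array([pickZone,dropZone])).T
dfw.to_csv("airportCode.csv")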