## Proximity Ratio
To evaluate the basket algorithm, I am proposing a new feature called the "proximity ratio": for each origin blockgroup, the weighted number of trips shorter than 2 miles divided by the weighted number of trips between 2 and 10 miles (weights come from `trip_wt_final`). Trips over 10 miles are excluded because they end outside the city, so their volume shouldn't vary much based on the origin.
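As a small worked example of the definition, the sketch below computes the ratio for a single hypothetical blockgroup. The distances and weights are made up, and the weights stand in for `trip_wt_final`.

```python
import numpy as np

# Hypothetical trips from one origin blockgroup (made-up values)
distances = np.array([0.5, 1.8, 2.0, 4.5, 9.9, 12.0])  # path distance, miles
weights = np.array([10.0, 20.0, 5.0, 5.0, 5.0, 40.0])   # survey weight per trip

# Weighted trip counts per band; the 12-mile trip falls in neither band
short_trips = weights[distances < 2].sum()                         # 10 + 20 = 30
medium_trips = weights[(distances >= 2) & (distances < 10)].sum()  # 5 + 5 + 5 = 15

proximity_ratio = short_trips / medium_trips
print(proximity_ratio)  # 2.0
```

A ratio above 1 means short trips outweigh 2 to 10 mile trips from that origin.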

This notebook computes the proximity ratio for each origin blockgroup and writes the results to `Proximity_Ratio.csv`.

In [1]:
import numpy as np
import pandas as pd

In [3]:
# load the trip data set
data_dir = './data/'
df_trip = pd.read_csv(data_dir + 'Trip_Household_Merged.csv', low_memory=False)

# drop trips that originate outside Seattle
df_trip = df_trip[df_trip['uv_origin'] != "Outside Seattle"]

# bin trips by path distance: each bin column holds the trip's weight if it falls in that bin, else 0
df_trip['trip_path_distance'] = df_trip['trip_path_distance'].astype(float)
df_trip['dist_under_2'] = np.where(df_trip['trip_path_distance']<2, df_trip['trip_wt_final'], 0)
df_trip['dist_2_to_10'] = np.where((df_trip['trip_path_distance']>=2) & 
                                   (df_trip['trip_path_distance']<10), df_trip['trip_wt_final'], 0)

# aggregate by origin blockgroup: weighted trips per bin, plus the unweighted trip count
df_proximity = df_trip.groupby(['bg_origin'], as_index=False).agg({'dist_under_2': 'sum',
                                                                   'dist_2_to_10': 'sum',
                                                                   'bg_dest': 'count'})
# drop blockgroups with fewer than 50 trips
df_proximity = df_proximity[df_proximity['bg_dest'] >= 50].copy()
df_proximity['proximity_ratio'] = df_proximity['dist_under_2'] / df_proximity['dist_2_to_10']

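# remove outliers (ratio of 10 or more); this also drops blockgroups with no 2-10 mile trips,
# whose ratio is inf or NaN
df_proximity = df_proximity[df_proximity['proximity_ratio'] < 10]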

df_proximity = df_proximity.reset_index(drop=True)
print(df_proximity)
df_proximity.to_csv(data_dir + 'Proximity_Ratio.csv', mode='w', header=True, index=False)


        bg_origin  dist_under_2  dist_2_to_10  bg_dest  proximity_ratio
0    530330001003   8636.736578   4049.570709       61         2.132754
1    530330001005   1079.773483   7744.964000       63         0.139416
2    530330004011   3201.180992   3355.850350      193         0.953911
3    530330004013   1427.187521   2681.835932       97         0.532168
4    530330006001   3010.226331   7991.339601       78         0.376686
5    530330007001   1514.094519   5981.283601      344         0.253139
6    530330012003   4452.666913   4594.651998      139         0.969098
7    530330012004   2424.525530  16768.834479      174         0.144585
8    530330012005   3972.589899   4958.457237       98         0.801175
9    530330013002   1728.767936    881.783022       83         1.960537
10   530330013003   1002.233416   1092.355963       55         0.917497
11   530330016003   3070.480493   2092.303582       57         1.467512
12   530330017011   5068.655348   6979.477215      162         0
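The cell above could be packaged as a reusable function so the trip threshold and outlier cutoff are easy to vary. This is a sketch, not the notebook's code: `proximity_ratios`, `min_trips`, and `max_ratio` are hypothetical names, and it assumes the same columns as `Trip_Household_Merged.csv`. Unlike the cell above, it explicitly drops blockgroups with no weighted 2 to 10 mile trips before dividing, instead of relying on the `< 10` filter to discard `inf`/`NaN` ratios.

```python
import pandas as pd


def proximity_ratios(trips: pd.DataFrame, min_trips: int = 50,
                     max_ratio: float = 10.0) -> pd.DataFrame:
    """Weighted ratio of <2 mile trips to 2-10 mile trips per origin blockgroup."""
    # Keep only trips that start inside Seattle
    trips = trips[trips["uv_origin"] != "Outside Seattle"]
    dist = trips["trip_path_distance"].astype(float)
    wt = trips["trip_wt_final"]

    # Each trip adds its weight to at most one band; 10+ mile trips count toward neither
    banded = pd.DataFrame({
        "bg_origin": trips["bg_origin"],
        "bg_dest": trips["bg_dest"],
        "dist_under_2": wt.where(dist < 2, 0),
        "dist_2_to_10": wt.where((dist >= 2) & (dist < 10), 0),
    })

    out = banded.groupby("bg_origin", as_index=False).agg(
        dist_under_2=("dist_under_2", "sum"),
        dist_2_to_10=("dist_2_to_10", "sum"),
        bg_dest=("bg_dest", "count"),  # unweighted trip count
    )

    # Require enough trips and a non-zero denominator so the ratio is finite
    out = out[(out["bg_dest"] >= min_trips) & (out["dist_2_to_10"] > 0)]
    out = out.assign(proximity_ratio=out["dist_under_2"] / out["dist_2_to_10"])
    return out[out["proximity_ratio"] < max_ratio].reset_index(drop=True)


# Made-up trips; "Downtown" is a placeholder uv_origin value
toy = pd.DataFrame({
    "bg_origin": ["A", "A", "A", "B", "B", "C"],
    "bg_dest": ["X", "Y", "Z", "X", "Y", "X"],
    "uv_origin": ["Downtown"] * 5 + ["Outside Seattle"],
    "trip_path_distance": ["1.0", "3.0", "12.0", "0.5", "1.5", "1.0"],
    "trip_wt_final": [4.0, 2.0, 7.0, 1.0, 1.0, 3.0],
})
print(proximity_ratios(toy, min_trips=2))
```

On the toy data only blockgroup `A` survives: `B` has no 2 to 10 mile trips, and `C` starts outside Seattle. With the default `min_trips=50`, every toy blockgroup would be filtered out.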