# Searching for Extreme Prontoers

This is a work in progress analyzing the open data from [Pronto Cycle Share's Data Challenge](http://www.prontocycleshare.com/datachallenge).

See [ProntoData.ipynb](ProntoData.ipynb) for an intro to the data; this notebook uses pieces that were derived there.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns; sns.set()

In [None]:
trips = pd.read_csv('2015_trip_data.csv',
                    parse_dates=['starttime', 'stoptime'])
# start date normalized to midnight, kept as datetime64 for later arithmetic
trips['date'] = trips.starttime.dt.normalize()
trips['minutes'] = trips.tripduration / 60  # tripduration is in seconds

In [None]:
stations = pd.read_csv('2015_station_data.csv')
pronto_shop = dict(id=54, name="Pronto shop",
                   terminal="Pronto shop",
                   lat=47.6173156, long=-122.3414776,
                   dockcount=100, online='10/13/2014')
stations = pd.concat([stations, pd.DataFrame([pronto_shop])],
                     ignore_index=True)
distances = pd.read_csv('station_distances.csv', index_col='terminal')

In [None]:
trips['distance'] = [distances.loc[ind] for ind in
                     zip(trips.from_station_id, trips.to_station_id)]
trips['distance'] /= 1609.34  # convert meters to miles
trips['speed'] = trips.distance * 60 / trips.minutes
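
As a sanity check on the unit handling in the cell above, here is a minimal standalone sketch of the same arithmetic for a single ride. The helper name `speed_mph` and the sample numbers are hypothetical; the only assumptions carried over from the notebook are that distances are stored in meters and `tripduration` in seconds.

```python
METERS_PER_MILE = 1609.34  # same conversion factor used in the cell above

def speed_mph(distance_m, duration_s):
    """Average speed in miles per hour for one ride (hypothetical helper)."""
    miles = distance_m / METERS_PER_MILE
    minutes = duration_s / 60
    return miles * 60 / minutes  # miles per minute, scaled to miles per hour

# A made-up ride: five miles (8046.7 m) covered in half an hour (1800 s)
example_speed = speed_mph(8046.7, 1800)
print(round(example_speed, 2))
```

A ride of one mile that takes one hour should come out at 1 mph, which is a quick way to confirm the factors of 60 are the right way round.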

## Stations

In [None]:
g = sns.FacetGrid(trips, col="usertype", hue='usertype', height=6)
g.map(plt.scatter, "distance", "speed", s=4, alpha=0.2)
g.axes[0, 0].axis([0, 10, 0, 25])

In [None]:
long_trips = trips[(trips.distance > 6.5) & (trips.distance < 7.0)]
long_trips['to_station_id'].value_counts().head()

In [None]:
ride_distance = distances.loc['UD-01', 'BT-03']
subset = trips[trips.distance == ride_distance / 1609.34].copy()

In [None]:
subset['starttime'] = subset.starttime.map(lambda t: t.time())
subset['inbound'] = subset.to_station_id == 'UD-01'

plt.scatter(subset.starttime.values,
            subset.minutes.values,
            c=subset.inbound);

In [None]:
subset = subset[subset.minutes < 50]
subset.date.min(), subset.date.max()

Just for reference, here is Google's suggested route between these stations:

![Google Maps route between the two stations](gmap-blakely.png)

<small>[(source)](https://www.google.com/maps/dir/47.666145%09-122.301491/47.615829%09-122.348564/@47.6410289,-122.3619496,13z/data=!3m1!4b1!4m10!4m9!1m3!2m2!1d-122.301491!2d47.666145!1m3!2m2!1d-122.348564!2d47.615829!3e1)</small>

Given the pattern here, my guess is that a UW student got a summer internship downtown and rode Pronto to and from work most days.
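
One hedged way to probe this guess is to check how many of the rides start on a weekday, since a commuter would rarely ride this route on weekends. The sketch below uses made-up timestamps (not Pronto data) and a hypothetical helper `weekday_share`; it only illustrates the tally, not the actual result.

```python
from collections import Counter
from datetime import datetime

# Hypothetical ride start times (illustrative only, not from the Pronto data)
rides = [
    datetime(2015, 7, 6, 8, 5),    # Monday morning
    datetime(2015, 7, 6, 17, 40),  # Monday evening
    datetime(2015, 7, 7, 8, 10),   # Tuesday morning
    datetime(2015, 7, 8, 17, 35),  # Wednesday evening
    datetime(2015, 7, 11, 13, 0),  # Saturday afternoon
]

def weekday_share(starts):
    """Fraction of rides that start Monday through Friday."""
    counts = Counter(t.weekday() for t in starts)  # Monday=0 ... Sunday=6
    weekday_rides = sum(n for day, n in counts.items() if day < 5)
    return weekday_rides / len(starts)

share = weekday_share(rides)
print(f"{share:.0%} of rides start on a weekday")
```

On the real data, something like `subset.date.dt.dayofweek.value_counts()` should give the equivalent per-day tally, assuming `date` is still a datetime column.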

In [None]:
plt.scatter(subset.starttime.values,
            subset.minutes.values,
            c=subset.inbound);

In [None]:
# days elapsed since the date of the first row in the subset
subset['daynumber'] = (subset.date - subset.date.iloc[0]).dt.days

g = sns.FacetGrid(subset, col="inbound", hue='inbound', height=6)
g.map(sns.regplot, "daynumber", "minutes");

In [None]:
groups = trips.groupby(['from_station_id', 'to_station_id'])
paired = groups.aggregate({'distance':'mean', 'trip_id':'count'})
paired.rename(columns={'trip_id': 'count'}, inplace=True)

In [None]:
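countmat = paired['count'].unstack()
# sum both directions, treating a missing reverse direction as zero trips
total = countmat.add(countmat.T, fill_value=0)
# round trips sit on the diagonal and were just added to themselves; halve them
vals = total.to_numpy(dtype=float, copy=True)
np.fill_diagonal(vals, vals.diagonal() / 2)
total = pd.DataFrame(vals, index=total.index, columns=total.columns)
paired['total'] = total.stack()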
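
To make the both-directions total in the cell above concrete, here is a toy sketch in plain Python. The station names and counts are made up, `both_directions` is a hypothetical helper, and it assumes a missing reverse direction should count as zero trips. Note that adding two pandas frames with `+` yields NaN wherever either side is missing, so how one-directional pairs are handled depends on how missing entries are filled.

```python
# Toy directed trip counts between three made-up stations (not Pronto data)
counts = {
    ('A', 'A'): 4,   # round trips starting and ending at A
    ('A', 'B'): 10,
    ('B', 'A'): 6,
    ('A', 'C'): 3,   # no recorded trips from C back to A
}

def both_directions(counts):
    """Total trips per station pair, summing the two directions."""
    total = {}
    for (start, end), n in counts.items():
        if start == end:
            # a round trip is its own reverse, so count it only once
            total[(start, end)] = n
        else:
            # assumption: a missing reverse direction contributes zero
            total[(start, end)] = n + counts.get((end, start), 0)
    return total

total = both_directions(counts)
for pair, n in sorted(total.items()):
    print(pair, n)
```

The round-trip case is why the matrix version halves the diagonal: adding the matrix to its transpose counts each round trip twice.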

In [None]:
paired.head()

In [None]:
ax = plt.axes(yscale='log')
ax.plot(paired['distance'], paired['total'], '.k')
plt.xlabel('distance between stations (miles)')
plt.ylabel('number of trips');

In [None]:
paired.query('distance > 4 & total > 90')