### Load NYC Taxi data
We're going to look at the distribution of trip distances for a portion of the NYC taxi data set for January 2015.

In [None]:
import os
import pandas as pd
import numpy as np
import plotly.graph_objs as go
from ipywidgets import VBox, HBox

In [None]:
pkl_path = 'data/nyc_taxi.pkl'
if os.path.exists(pkl_path):
    print('Loading saved dataset file... ', end='')
    df = pd.read_pickle(pkl_path)
    print('done')
else:
    print('Downloading and saving dataset (thanks to the datashader project for making this example dataset available!)... ', end='')
    df = pd.read_csv('http://s3.amazonaws.com/datashader-data/nyc_taxi.zip', compression='zip')
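    # Create the data directory if it is missing so the pickle can be written.
    os.makedirs(os.path.dirname(pkl_path), exist_ok=True)
    df.to_pickle(pkl_path)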
    print('done')

In [None]:
df.head()

Extract only those trips that carried at least 4 passengers. Also discard zero-distance trips and outlying trips of 10 miles or more for visualization purposes.

In [None]:
df_cleaned = df.loc[np.logical_and(
    df.passenger_count >= 4,
    # Exclude both endpoints; recent pandas versions expect a string here rather than False.
    df.trip_distance.between(0, 10, inclusive='neither')
)]
len(df_cleaned)

We're left with almost 1.2 million data points.
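
To see what the filter keeps, here is a minimal sketch on a small made-up frame. The values are illustrative and not taken from the data set; only the column names match. It writes the strict distance bounds as explicit comparisons, which should select the same rows as a `between` call that excludes both endpoints.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the notebook.
toy = pd.DataFrame({
    'passenger_count': [1, 4, 5, 6, 4],
    'trip_distance':   [2.0, 0.0, 3.5, 10.0, 9.9],
})

# Keep rows with at least 4 passengers and 0 < distance < 10 (both bounds excluded).
mask = (
    (toy.passenger_count >= 4)
    & (toy.trip_distance > 0)
    & (toy.trip_distance < 10)
)
toy_cleaned = toy.loc[mask]

print(toy_cleaned)
```

Note that the surviving rows keep their original index labels, which matters later when rows are picked by position.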

## Distribution of trip distance

Initialize an empty figure with a fixed x-axis range

In [None]:
fig1 = go.FigureWidget(layout={
    'xaxis': {'range': [-0.1, 10]}
})
fig1

Add a histogram trace with predefined bins (this takes a couple of seconds)

In [None]:
hist = fig1.add_histogram(x=df_cleaned['trip_distance'], 
                          xbins={'start': -0.05, 'size': 0.1, 'end': 10})
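
The choice of `start=-0.05` with `size=0.1` places each bin symmetrically around a multiple of 0.1 miles. The binning itself happens inside Plotly, so the following is only a sketch of the arithmetic, assuming half-open bins of the form `[start + i*size, start + (i+1)*size)`. The helper names `bin_index` and `bin_center` are made up for this illustration.

```python
import math

# Bin parameters copied from the xbins argument above.
START, SIZE = -0.05, 0.1

def bin_index(distance):
    """Index of the assumed half-open bin that contains `distance`."""
    return math.floor((distance - START) / SIZE)

def bin_center(index):
    """Midpoint of bin `index`, rounded to suppress floating-point noise."""
    return round(START + (index + 0.5) * SIZE, 10)

for d in (0.02, 0.12, 2.48):
    i = bin_index(d)
    print(d, '-> bin', i, 'centered at', bin_center(i))
```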

Update the axis titles (This happens immediately)

In [None]:
fig1.layout.xaxis.title = 'Trip distance (mi.)'

In [None]:
fig1.layout.yaxis.title = 'Frequency'

### Plot pickup locations

Create an empty figure with hidden axes

In [None]:
fig2 = go.FigureWidget(
    layout={'width': 400, 'height': 400, 'hovermode': False,
            'xaxis': {'tickvals': []},
            'yaxis': {'tickvals': []},
            'margin': {'b': 0, 't': 0, 'l': 0, 'r': 0}
           })

fig2

Add a `scattergl` trace of the `x` and `y` coordinates of all 1.2 million pickup locations

In [None]:
scatter = fig2.add_scattergl(
    x=df_cleaned['pickup_x'],
    y=df_cleaned['pickup_y'],
    mode='markers', 
    marker={'size': 4, 'opacity': 0.1})

Constrain the aspect ratio so that the view isn't distorted on zoom

In [None]:
fig2.layout.yaxis.scaleanchor = 'x'

## Install Selection Callback

Install a selection callback function that updates the trip distance histogram using only the selected trips

In [None]:
def update_hist(trace, points, state):
    if points.point_inds:
        hist.x = df_cleaned['trip_distance'].iloc[points.point_inds]
    else:
        hist.x = df_cleaned['trip_distance']

scatter.on_selection(update_hist)
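
The callback receives `points.point_inds` as positions within the trace's data, which is why it indexes with `.iloc` rather than by label: the filtered frame keeps its original index labels. The sketch below exercises the same branching without a widget, using stand-in objects built from `SimpleNamespace`. The names `distances`, `fake_hist`, and `update_fake_hist` and the sample values are hypothetical.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical filtered data: the index labels are not 0..n-1.
distances = pd.Series([1.2, 3.4, 0.7, 5.6],
                      index=[10, 42, 57, 99], name='trip_distance')

# Stand-in for the histogram trace; only the `x` attribute matters here.
fake_hist = SimpleNamespace(x=distances)

def update_fake_hist(trace, points, state):
    # Same branching as update_hist: an empty selection restores the full data.
    if points.point_inds:
        fake_hist.x = distances.iloc[points.point_inds]
    else:
        fake_hist.x = distances

# Simulate selecting the 2nd and 4th markers, then clearing the selection.
update_fake_hist(None, SimpleNamespace(point_inds=[1, 3]), None)
selected = list(fake_hist.x)
update_fake_hist(None, SimpleNamespace(point_inds=[]), None)
restored = list(fake_hist.x)

print(selected, restored)
```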