The goal of this kernel is to build an interactive plot of the dropoff locations in the data. I'm using the [PyViz](http://pyviz.org/) tools, which are designed for large-scale plotting and interactivity in the web browser. I'll also use Dask as a substitute for Pandas to take advantage of parallel processing and, if needed, out-of-core computation for data that doesn't fit in memory.

The payoff of this setup is that you can explore all 55M+ data points. The libraries dynamically adjust the plot to show the right amount of detail as you zoom in and out, and you can put the training data beside the testing data and the views will stay synced. The interactivity doesn't fully work once the notebook is committed, since re-rendering needs a live kernel, but it should work in edit mode if you fork it.

Here is an example of what you can see. The first picture is the bird's-eye view, and the second is zoomed in on Chelsea Piers.

![BirdsEye](https://s3.amazonaws.com/nonwebstorage/NYbird.jpg) 
![Zoomed](https://s3.amazonaws.com/nonwebstorage/NYzoomed.jpg)

First some imports...

In [None]:
import dask.dataframe as dd
import colorcet as cc
import geoviews as gv
import holoviews as hv
from holoviews.operation.datashader import datashade, dynspread
hv.extension('bokeh')

### Data Prep
We can load the data with Dask. I'll bring it into memory now since we have enough RAM here. There are some missing and bad coordinate values, but we can filter those out when we plot. Note that it only takes about 30s to load all the data!
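
The notebook itself leaves the filtering to the plot ranges, but if you want to drop bad coordinates explicitly, a bounding-box mask is one way to do it. This is a minimal sketch on a toy Pandas frame, assuming the same longitude/latitude bounds that the plotting cell uses; `in_nyc_box` and the toy rows are hypothetical names and data for illustration. The same expression should also work on a Dask dataframe, since Dask mirrors this part of the Pandas API.

```python
import pandas as pd

# Bounds copied from the x_range / y_range used in the plotting cell of this notebook.
LON_RANGE = (-74.3, -73.5)
LAT_RANGE = (40.5, 41.2)

def in_nyc_box(df, lon="dropoff_longitude", lat="dropoff_latitude"):
    """Hypothetical helper: keep rows whose dropoff lies inside the box.

    Rows with missing (NaN) coordinates are dropped as well, because
    comparisons against NaN evaluate to False.
    """
    mask = df[lon].between(*LON_RANGE) & df[lat].between(*LAT_RANGE)
    return df[mask]

# Toy data (made up): one valid point, one (0, 0) point,
# one missing longitude, and one more valid point.
toy = pd.DataFrame({
    "dropoff_longitude": [-73.98, 0.0, None, -73.75],
    "dropoff_latitude": [40.75, 0.0, 40.70, 40.64],
})
clean = in_nyc_box(toy)
print(clean)
```

Only the first and last toy rows survive; the (0, 0) point and the row with the missing longitude are removed.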

In [None]:
%%time
# Read only the two dropoff coordinate columns, as float32 to save memory.
vartypes = {'dropoff_latitude': 'float32', 'dropoff_longitude': 'float32'}  # optionally add 'passenger_count': 'int8'
# persist() triggers the read and keeps the result in memory.
ddf = dd.read_csv('../input/train.csv', usecols=list(vartypes.keys()), dtype=vartypes).persist()
ddf['set'] = "Train"
ddtest = dd.read_csv('../input/test.csv', usecols=list(vartypes.keys()), dtype=vartypes).persist()
ddtest['set'] = "Test"

### Viz

Now we can plot our 55.4M dropoff locations for both train and test. This time I'm using a light-themed map instead of the dark/fire theme above. In the committed version of the notebook, the pixels don't re-render as you zoom in the way they do in the pictures above. Fork the notebook or run it locally and it should work.
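
To see why zooming needs a live kernel, it helps to look at the kind of per-pixel counting that happens on every render. The sketch below uses NumPy's `histogram2d` as a stand-in for that aggregation step; it is not Datashader's actual implementation, and `aggregate_counts`, the 4x4 "screen", and the toy points are all hypothetical.

```python
import numpy as np

def aggregate_counts(lon, lat, x_range, y_range, width, height):
    """Hypothetical helper: count points per pixel inside the viewport.

    Returns an integer array of shape (height, width); row 0 is the lowest
    latitude band. Points outside the viewport are ignored.
    """
    counts, _, _ = np.histogram2d(
        lat, lon, bins=(height, width), range=[list(y_range), list(x_range)]
    )
    return counts.astype(np.int64)

# Toy points (made up): three close together, one farther out, one bad (0, 0) value.
lon = np.array([-74.0, -74.0, -73.95, -73.6, 0.0])
lat = np.array([40.7, 40.7, 40.75, 41.0, 0.0])

# Full view, using the same ranges as the plotting cell, on a tiny 4x4 "screen".
full = aggregate_counts(lon, lat, (-74.3, -73.5), (40.5, 41.2), width=4, height=4)

# Zoomed view: the same data re-aggregated over a much smaller viewport.
zoomed = aggregate_counts(lon, lat, (-74.02, -73.92), (40.68, 40.78), width=4, height=4)

print(full)
print(zoomed)
```

In the full view the three nearby points share a single pixel. After zooming, they are re-binned into separate pixels, and that recomputation is what a static, committed notebook cannot do.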

In [None]:
%%opts RGB [width=600 height=550 xaxis=None yaxis=None] (alpha=0.3)
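# Wrap each Dask dataframe as geographic points (longitude, latitude).
points = gv.Points(ddf, kdims=['dropoff_longitude', 'dropoff_latitude'], label='Train Set')
# Rasterize over the NYC bounding box; swap cc.kbc for cc.fire to get the dark/fire look.
spots = datashade(points, cmap=cc.kbc, normalization='eq_hist', x_range=(-74.3, -73.5), y_range=(40.5, 41.2))
points2 = gv.Points(ddtest, kdims=['dropoff_longitude', 'dropoff_latitude'], label='Test Set')
# min_alpha=200 raises the minimum opacity so that sparse points stay visible.
spots2 = datashade(points2, cmap=cc.kbc, normalization='eq_hist', x_range=(-74.3, -73.5), y_range=(40.5, 41.2), min_alpha=200)
tiles = gv.tile_sources.CartoLight  # or gv.tile_sources.CartoDark
# Stack the train and test maps in a single column.
hv.Layout(tiles*dynspread(spots) + tiles*dynspread(spots2)).cols(1)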

From here you can zoom in and find things that might help (or hurt) your predictive model, like dropoffs that appear to be in the water or in Central Park. I find it easiest to use the wheel zoom and pan tools together to move around quickly. Good luck!