# Plotting non-geographic data

Most of the datashader examples use geographic data because it is so easily interpreted, but datashading can help with exploring any pair of data dimensions. Here we plot `fare_amount` against `trip_distance` for the 12-million-point NYC taxi dataset from nyc_taxi.ipynb.

## Load NYC Taxi data

(takes a dozen seconds or so...)

In [None]:
import pandas as pd

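# Load only the columns used in this notebook
df = pd.read_csv('data/nyc_taxi.csv',
                 usecols=['trip_distance', 'fare_amount', 'tip_amount', 'passenger_count'])
# Default axis ranges for the plots below: distance on x, fare on y
x_range = (0.0, 20.0)
y_range = (0.0, 40.0)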
df.tail()

## Define a simple plot

In [None]:
from bokeh.plotting import figure, output_notebook, show

output_notebook()

def base_plot():
    p = figure(tools='pan,wheel_zoom,box_zoom,reset', 
               plot_width=800, plot_height=500, 
               x_range=x_range, y_range=y_range)
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None
    p.xaxis.axis_label="distance"
    p.yaxis.axis_label="fare"
    return p
    
options = dict(line_color='black', fill_color='blue', size=5)

## 1000 points reveal the expected linear relationship

In [None]:
samples = df.sample(n=1000)
p = base_plot()
p.circle(x=samples['trip_distance'], y=samples['fare_amount'], **options)
show(p)

## 10,000 points show more detailed, systematic patterns in fares and distances

Perhaps there are different metering options, along with granularity in how distances and fares are recorded; in any case, the points do not uniformly populate any region of this space:

In [None]:
options = dict(line_color='blue', fill_color='blue', size=1, alpha=0.05)
samples = df.sample(n=10000)
p = base_plot()
p.circle(x=samples['trip_distance'], y=samples['fare_amount'], **options)
show(p)

## Datashader reveals additional detail, especially when zooming in

With the full dataset, many points are visible below the linear boundary, representing long trips for very little cost (presumably GPS errors?).

In [None]:
import datashader as ds
from datashader.callbacks import InteractiveImage

In [None]:
p = base_plot()
pipeline = ds.Pipeline(df, ds.Point("trip_distance", "fare_amount"))
InteractiveImage(p, pipeline)
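
To give a rough feel for why the full dataset can be shown at once, the sketch below counts points per cell of a pixel-sized grid instead of drawing one glyph per point. It is a conceptual illustration in plain Python, not datashader's actual implementation; the function name `bin_points` and the toy values are made up for this example.

```python
def bin_points(xs, ys, x_range, y_range, width, height):
    """Count how many (x, y) points fall into each cell of a width x height grid."""
    x0, x1 = x_range
    y0, y1 = y_range
    counts = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Skip points outside the visible ranges (NaN values are skipped too)
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        # Map each coordinate to a cell index; clamp so the upper edge
        # of the range lands in the last cell rather than out of bounds
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        counts[row][col] += 1
    return counts

# Toy values for illustration only (not taken from the taxi dataset)
grid = bin_points(xs=[1.0, 1.2, 19.9, 25.0], ys=[5.0, 5.5, 39.0, 10.0],
                  x_range=(0.0, 20.0), y_range=(0.0, 40.0),
                  width=4, height=4)
```

The resulting grid of counts can then be mapped to colors. Zooming in amounts to repeating the count with narrower ranges, so each cell covers a smaller part of the data space.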

Fares are discretized to the nearest 50 cents, which makes patterns less visible, but there is both an upward trend in tips as fares increase (as expected) and a large number of tips higher than the fare itself, which is surprising:

In [None]:
p = base_plot()
p.xaxis.axis_label="fare"
p.yaxis.axis_label="tip"
pipeline = ds.Pipeline(df, ds.Point("fare_amount", "tip_amount"))
InteractiveImage(p, pipeline)
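
One way to put a number on the tips that exceed the fare is to compute their share directly with pandas. This is a minimal sketch: the helper name `fraction_tip_exceeds_fare` is hypothetical, the choice to ignore rows with non-positive fares is an assumption, and the toy frame only exists to show the call.

```python
import pandas as pd

def fraction_tip_exceeds_fare(frame):
    """Return the fraction of positive-fare rows where the tip is larger than the fare."""
    # Assumption: rows with a zero or negative fare are ignored
    valid = frame[frame['fare_amount'] > 0]
    if len(valid) == 0:
        return 0.0
    return float((valid['tip_amount'] > valid['fare_amount']).mean())

# Toy values for illustration only (not taken from the taxi dataset)
toy = pd.DataFrame({'fare_amount': [10.0, 5.5, 8.0, 0.0],
                    'tip_amount':  [2.0, 6.0, 1.5, 3.0]})
share = fraction_tip_exceeds_fare(toy)
```

On the real data the same call would be `fraction_tip_exceeds_fare(df)`.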

Interestingly, tips go down when there are more passengers:

In [None]:
# datashader and InteractiveImage were already imported above
from bokeh.models import Range1d

p = base_plot()
p.xaxis.axis_label="passengers"
p.yaxis.axis_label="tip"
p.x_range = Range1d(0.0,7.0)
p.y_range = Range1d(0.0,60.0)

pipeline = ds.Pipeline(df, ds.Point("passenger_count", "tip_amount"))
InteractiveImage(p, pipeline)
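
A visual impression like this can be cross-checked with a per-group summary. The sketch below assumes a pandas groupby is an acceptable check; the helper name `tip_summary_by_passengers` and the toy frame are made up for illustration. The row count is included because groups of very different sizes can look different in a plot even when their typical values are similar.

```python
import pandas as pd

def tip_summary_by_passengers(frame):
    """Row count, median tip, and mean tip for each passenger count."""
    return (frame.groupby('passenger_count')['tip_amount']
                 .agg(['count', 'median', 'mean']))

# Toy values for illustration only (not taken from the taxi dataset)
toy = pd.DataFrame({'passenger_count': [1, 1, 2, 2, 2],
                    'tip_amount': [1.0, 3.0, 0.0, 2.0, 4.0]})
summary = tip_summary_by_passengers(toy)
```

On the real data the same call would be `tip_summary_by_passengers(df)`.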

For inherently discrete data like this, each value might be more visible as a horizontal line segment than as a point. Currently only points are supported, but additional glyphs will be added in later versions of the library.