## Datashader

Datashader is a graphics pipeline system for creating meaningful representations of large datasets quickly and flexibly. Datashader breaks the creation of images into a series of explicit steps that allow computations to be done on intermediate representations. This approach allows accurate and effective visualizations to be produced automatically without trial-and-error parameter tuning, and also makes it simple for data scientists to focus on particular data and relationships of interest in a principled way.

The computation-intensive steps in this process are written in Python but transparently compiled to machine code using Numba and flexibly distributed across cores and processors using Dask, providing a highly optimized rendering pipeline that makes it practical to work with extremely large datasets even on standard hardware.
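
To make the idea of explicit steps with an intermediate representation more concrete, here is a minimal pure-Python sketch of two such steps: binning points into a grid of counts, then mapping those counts to pixel intensities. It is only a conceptual illustration, not how Datashader is implemented (the real pipeline uses Numba-compiled kernels). The helper names `aggregate_counts` and `shade_log` are hypothetical, and log scaling is chosen here for simplicity.

```python
import math

def aggregate_counts(points, x_range, y_range, width, height):
    """Bin (x, y) points into a height x width grid of per-cell counts."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # points outside the viewport are dropped
        # Clamp to the last cell to guard against floating-point round-up
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid

def shade_log(grid):
    """Map counts to 0-255 intensities on a log scale; empty cells stay 0."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    scale = 255 / math.log1p(peak)
    return [[round(math.log1p(c) * scale) for c in row] for row in grid]

# Tiny made-up dataset; the last point falls outside the viewport
points = [(0.1, 0.1), (0.15, 0.2), (0.9, 0.9), (2.0, 2.0)]
counts = aggregate_counts(points, (0, 1), (0, 1), width=2, height=2)
pixels = shade_log(counts)
```

Because `counts` is an ordinary grid of numbers, it can be inspected or transformed before any colors are assigned, which is the point of keeping the steps separate.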


### Requirements

`pip install datashader`

`pip install holoviews`

`pip install Cython`

`pip install geoviews`

### Example: NYC taxi trips 

To illustrate how this process works, we will demonstrate some of the key features of Datashader using a standard "big-data" example: millions of taxi trips from New York City, USA. First, let's import the libraries we are going to use and then read the dataset.

The dataset can be downloaded here: 

In [None]:
import datashader as ds
import pandas as pd
from colorcet import fire
from datashader import transfer_functions as tf

df = pd.read_csv('data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y'])
df.head()

In [None]:
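# Aggregate the drop-off points into a grid of per-pixel counts
agg = ds.Canvas().points(df, 'dropoff_x', 'dropoff_y')
# Colormap the counts and place the resulting image on a black background
tf.set_background(tf.shade(agg, cmap=fire), "black")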

In [None]:
import holoviews as hv
import geoviews as gv
from holoviews.operation.datashader import datashade
hv.extension('bokeh')

# Satellite imagery tiles to display underneath the datashaded points
url = 'https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{Z}/{Y}/{X}.jpg'
tile_opts  = dict(width=1000, height=600, xaxis=None, yaxis=None, bgcolor='black', show_grid=False)
map_tiles  = gv.WMTS(url).opts(alpha=0.5, **tile_opts)
points     = hv.Points(df, ['dropoff_x', 'dropoff_y'])
# datashade re-aggregates and re-shades the points whenever the plot is zoomed or panned
taxi_trips = datashade(points, x_sampling=1, y_sampling=1, cmap=fire, width=1000, height=600)

map_tiles * taxi_trips