# cuDataShader vs Datashader comparison for simple point plotting
This notebook lets you experience cuDataShader, a GPU-accelerated version of Datashader.  Other cuDataShader notebooks can be found in the cuDataShader [repository]().  We hope to merge cuDataShader into the Datashader project itself.  Hooray for open-source cross-collaboration!

This notebook installs cuDataShader from source because, at the time of writing, it doesn't have a conda package.  This may change in the future, and the notebook will be updated.  The notebook is also available in [Colab](https://colab.research.google.com/drive/1bFIBg54zS9RmU58VwjJMAaqJ1xP27BXj).

In [None]:
!nvidia-smi

## Install cuDataShader and other dependencies

In [None]:
!git clone https://github.com/rapidsai/cuDataShader.git

In [None]:
!ls
%cd cuDataShader
!pip install -e .

In [None]:
!pip install pyproj
!pip install datashader
## Ignore the restart warning

## Let's get the Taxi Data

In [None]:
## You can change the data set used by changing the year and the month: "yellow_tripdata_<YYYY>-<MM>.csv".  See working example below
!wget -O nyc_taxi.csv https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2015-01.csv 
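
To switch to another month without editing the URL by hand, the file name pattern can be built programmatically. This is a minimal sketch: `taxi_csv_url` is a hypothetical helper, not part of any library, and it only reproduces the naming pattern of the command above. It does not check that the file exists, and other months may not have the same columns, so check the CSV header before relying on the column names used later in this notebook.

```python
def taxi_csv_url(year: int, month: int) -> str:
    """Build a yellow-taxi CSV URL for a given year and month (hypothetical helper)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # Same base URL and naming pattern as the wget command above.
    base = "https://s3.amazonaws.com/nyc-tlc/trip+data"
    return f"{base}/yellow_tripdata_{year:04d}-{month:02d}.csv"


url = taxi_csv_url(2015, 1)
print(url)
```

The printed URL can then be pasted into the `wget` command in place of the hard-coded one.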

## Let's begin!

In [None]:
import matplotlib.pyplot as plt
import numpy as np

import pandas as pd
import cudf

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Choose GPU

pdf = pd.read_csv('nyc_taxi.csv', usecols=['dropoff_latitude','dropoff_longitude', 'passenger_count']) # Load into a Pandas DataFrame, keeping only the columns we need
pdf['passenger_count'] = pdf['passenger_count'].astype(np.float64) # Convert aggregation column
pdf.tail()

Next, transform the data points from GPS coordinates (longitude, latitude) to projected 2D points (Web Mercator, in meters) that can be plotted:

In [None]:
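from pyproj import Transformer

# Proj(init=...) and pyproj.transform are deprecated in pyproj 2+; Transformer is the current API
# always_xy=True keeps the (longitude, latitude) argument order used below
transformer = Transformer.from_crs('EPSG:4326', 'EPSG:3857', always_xy=True) # GPS (lon, lat) -> Web Mercator (x, y)

x, y = transformer.transform(pdf['dropoff_longitude'].values, pdf['dropoff_latitude'].values) # Apply transformation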

pdf['dropoff_x'] = x
pdf['dropoff_y'] = y

pdf.drop(['dropoff_latitude', 'dropoff_longitude'], axis=1, inplace=True)
pdf = pdf[~pdf.isin([np.nan, np.inf, -np.inf]).any(axis=1)] # Drop rows containing NaN or infinite values

pdf = pdf[(pdf.dropoff_x > -8239910.23) & (pdf.dropoff_x < -8229529.24) & (pdf.dropoff_y > 4968481.34) & (pdf.dropoff_y < 4983152.92)] # Filter over Manhattan
#pdf = pdf.sample(frac=0.1) # Sample a fraction of the dataset

pdf.tail()
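
To make the projection step less of a black box, here is a minimal NumPy sketch of the spherical Web Mercator forward formula that EPSG:3857 is based on. It is an illustration only, not pyproj's implementation, and `lonlat_to_web_mercator` and `EARTH_RADIUS_M` are names made up for this example.

```python
import numpy as np

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, the sphere radius used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Spherical Web Mercator forward projection (illustrative helper)."""
    lon = np.radians(np.asarray(lon_deg, dtype=np.float64))
    lat = np.radians(np.asarray(lat_deg, dtype=np.float64))
    x = EARTH_RADIUS_M * lon                                      # Easting in meters
    y = EARTH_RADIUS_M * np.log(np.tan(np.pi / 4.0 + lat / 2.0))  # Northing in meters
    return x, y


# One point in Manhattan and the origin (0, 0) as a sanity check
sample_x, sample_y = lonlat_to_web_mercator([-74.0, 0.0], [40.75, 0.0])
print(sample_x, sample_y)
```

The Manhattan sample point lands inside the bounding box used by the filter above, which is why that filter is expressed in meters rather than degrees.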

In [None]:
pdf.to_pickle('pdf.pkl') # Back up the dataset so the steps above don't have to be rerun

If you have already run the computations above once, you can start from here from now on.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

import pandas as pd
import cudf

import time

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Choose GPU

pdf = pd.read_pickle('pdf.pkl') # Load backup
gdf = cudf.from_pandas(pdf) # Convert to cuDF DataFrame

In [None]:
print("Dataframe has {:,} rows".format(pdf.shape[0]))

## Render image with regular Datashader

In [None]:
import datashader as ds
from datashader import transfer_functions as tf
from datashader.colors import Hot

t0 = time.time() # Save start time
cvs = ds.Canvas(plot_width=750, plot_height=625, x_range=(-8239910.23,-8229529.24), y_range=(4968481.34,4983152.92)) # Create canvas
agg = cvs.points(pdf, 'dropoff_x', 'dropoff_y', ds.count('passenger_count')) # Perform aggregation
img = tf.shade(agg, cmap=Hot, how='eq_hist') # Produce image from aggregation
ds_time = time.time()-t0 # Compute elapsed time
print("{} ms".format(round(ds_time * 1000))) # Display elapsed time

img # Display image
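
Conceptually, the aggregation step bins every point into the pixel grid defined by the canvas and counts how many points fall in each pixel. The sketch below reproduces that idea with `np.histogram2d`. It is not how Datashader is implemented, and `count_points` is a hypothetical helper used only for illustration.

```python
import numpy as np


def count_points(x, y, x_range, y_range, plot_width, plot_height):
    """Count points per pixel; returns an array of shape (plot_height, plot_width)."""
    counts, _, _ = np.histogram2d(
        y, x,                            # Rows follow y, columns follow x
        bins=(plot_height, plot_width),
        range=(y_range, x_range),
    )
    return counts.astype(np.uint32)


# Four points on a 2x2 "canvas" covering the unit square
xs = np.array([0.1, 0.2, 0.9, 0.9])
ys = np.array([0.1, 0.2, 0.9, 0.95])
grid = count_points(xs, ys, (0.0, 1.0), (0.0, 1.0), plot_width=2, plot_height=2)
print(grid)
```

Two points fall in the low-x, low-y pixel and two in the high-x, high-y pixel. Shading then maps such counts to colors, and `how='eq_hist'` applies histogram equalization so that sparse and dense areas both remain visible.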

## Render image with GPU-accelerated cuDataShader (exact same usage/syntax)

In [None]:
# Rebind ds and tf to the cuDataShader modules (same API as Datashader)
import cudatashader as ds
from cudatashader import transfer_functions as tf
from cudatashader.colors import Hot

t0 = time.time() # Save start time
cvs = ds.Canvas(plot_width=750, plot_height=625, x_range=(-8239910.23,-8229529.24), y_range=(4968481.34,4983152.92)) # Create canvas
agg = cvs.points(gdf, 'dropoff_x', 'dropoff_y', ds.count('passenger_count')) # Perform aggregation
img2 = tf.shade(agg, cmap=Hot, how='eq_hist') # Produce image from aggregation
cuviz_time = time.time()-t0 # Compute elapsed time
print("{} ms".format(round(cuviz_time * 1000))) # Display elapsed time

img2 # Display image

In [None]:
print('GPU speedup: {:.2f}'.format(ds_time/cuviz_time))
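
A single timed run can be misleading: Datashader compiles its kernels with Numba on first use, and GPU code paths typically pay a similar one-time start-up cost. A minimal sketch of a fairer measurement is shown below, assuming the rendering code is wrapped in a function that takes no arguments. `best_time` is a hypothetical helper, not part of Datashader or cuDataShader.

```python
import time


def best_time(fn, repeats=5, warmup=1):
    """Return the fastest wall-clock time of fn() in seconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # Warm-up runs absorb one-time costs such as JIT compilation
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return min(timings)


# Stand-in workload; in this notebook it would be the aggregation + shading calls
elapsed = best_time(lambda: sum(range(100_000)))
print("{:.3f} ms".format(elapsed * 1000))
```

Timing both the CPU and GPU pipelines this way and dividing the two results gives a speedup figure that is less sensitive to first-call overhead.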