# Progressive Loading and Visualization

This notebook shows the simplest code to progressively load New York Yellow Taxi trips from 2015 (here, the January file). All trips are geolocated, and the trip data is stored in multiple CSV files.
We progressively visualize the pickup locations (where the taxis picked people up).

First, we define a few constants: the location of the file and the desired resolution.

In [1]:
# We make sure the libraries are reloaded when modified
%load_ext autoreload
%autoreload 2

In [1]:
import warnings
warnings.filterwarnings("ignore")
LARGE_TAXI_FILE = "https://www.aviz.fr/nyc-taxi/yellow_tripdata_2015-01.csv.bz2"
RESOLUTION = 512
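
To make "loading progressively" concrete, the sketch below reads a bz2-compressed CSV in small chunks using only the Python standard library. It is not how ProgressiVis loads data: the helper `iter_chunks`, the chunk size, and the synthetic rows are all assumptions made for illustration. The `csv` module is imported as `csvlib` so it does not clash with the `csv` loader variable used later in this notebook.

```python
import bz2
import csv as csvlib  # aliased: the notebook uses `csv` for its loader
import io
from itertools import islice


def iter_chunks(stream, usecols, chunk_size):
    """Yield lists of rows, each row a tuple of floats for the requested columns."""
    reader = csvlib.DictReader(stream)
    while True:
        rows = [
            tuple(float(row[col]) for col in usecols)
            for row in islice(reader, chunk_size)
        ]
        if not rows:
            return
        yield rows


# Tiny synthetic stand-in for the taxi file (made-up values, not real trips)
raw = "pickup_longitude,pickup_latitude,fare\n" + "".join(
    f"{-74.0 + i * 0.001},{40.7 + i * 0.001},{5 + i}\n" for i in range(10)
)
compressed = bz2.compress(raw.encode("utf-8"))

# Decompress lazily and consume the rows four at a time
with bz2.open(io.BytesIO(compressed), mode="rt", newline="") as stream:
    chunk_sizes = [
        len(chunk)
        for chunk in iter_chunks(stream, ["pickup_longitude", "pickup_latitude"], 4)
    ]

print(chunk_sizes)  # [4, 4, 2]
```

Each chunk becomes available as soon as it is decoded, so downstream computations can start before the whole file has been read.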

In [2]:
import progressivis as pv

# Create a CSV loader for the pickup coordinates (the filter dropping data outside NYC is disabled)
csv = pv.ThreadedCSVLoader(LARGE_TAXI_FILE, usecols=['pickup_longitude', 'pickup_latitude'])  # , filter_=filter_)
csv.on_ending(lambda mod, rn : print(f"Finished in {round(csv.scheduler().timer())} second(s)"))

# Create a Quantiles module to discard the 3% of outliers on each side
quantiles = pv.Quantiles()
quantiles.input.table = csv.output.result
# Create a module to compute the 2D histogram of the two columns specified
# with the given resolution
histogram2d = pv.Histogram2D('pickup_longitude', 'pickup_latitude', xbins=RESOLUTION, ybins=RESOLUTION)
# Connect the module to the CSV results and to the min/max bounds used for rescaling
histogram2d.input.table = csv.output.result
histogram2d.input.min = quantiles.output.result[0.03]
histogram2d.input.max = quantiles.output.result[0.97]
# Create a module to create a heatmap image from the histogram2d
heatmap = pv.Heatmap()
# Connect it to the histogram2d
heatmap.input.array = histogram2d.output.result

Unexpected slot hint 0.03 for Slot(quantiles_1[result]->histogram2_d_1[min])
Unexpected slot hint 0.97 for Slot(quantiles_1[result]->histogram2_d_1[max])
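
The pipeline above bounds the histogram with the 3% and 97% quantiles and fills it as data arrives. The NumPy sketch below shows the same idea outside ProgressiVis, under simplifying assumptions: the points are synthetic, the grid is 8×8 instead of 512×512, and the quantiles are computed once on the full data, whereas in the notebook the `Quantiles` module supplies the bounds.

```python
import numpy as np

SKETCH_RESOLUTION = 8  # small grid for the sketch; the notebook uses 512
rng = np.random.default_rng(0)

# Synthetic (longitude, latitude) pairs standing in for pickup locations
points = rng.normal(loc=[-73.97, 40.75], scale=[0.05, 0.04], size=(10_000, 2))

# Bounds at the 3% and 97% quantiles of each column (computed up front here)
lo = np.quantile(points, 0.03, axis=0)
hi = np.quantile(points, 0.97, axis=0)
bounds = [[lo[0], hi[0]], [lo[1], hi[1]]]

# Accumulate the 2D histogram chunk by chunk
counts = np.zeros((SKETCH_RESOLUTION, SKETCH_RESOLUTION))
for chunk in np.array_split(points, 10):
    partial, _, _ = np.histogram2d(
        chunk[:, 0], chunk[:, 1], bins=SKETCH_RESOLUTION, range=bounds
    )
    counts += partial  # a display could be refreshed after each chunk

# One-shot histogram over all the data, for comparison
full, _, _ = np.histogram2d(
    points[:, 0], points[:, 1], bins=SKETCH_RESOLUTION, range=bounds
)
print(np.array_equal(counts, full))  # True
```

Because the bin edges are fixed, summing the per-chunk histograms gives exactly the histogram of the full data, which is why an intermediate heatmap can be shown at any time. Points outside the quantile bounds are simply not counted.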


In [3]:
heatmap.display_notebook()

VBox(children=(HBox(children=(IntProgress(value=0, description='0/0', max=1000), Button(description='Save', ic…

In [4]:
# Start the scheduler
csv.scheduler().task_start()

<Task pending name='Task-5' coro=<Scheduler.start() running at /home/poli/JDF2016/github/progressivis/progressivis/core/scheduler.py:277>>

Starting scheduler
# Scheduler added module(s): ['heatmap_1', 'histogram2_d_1', 'quantiles_1', 'threaded_csv_loader_1']


In [6]:
csv.scheduler()

Id,Class,State,Last Update,Order
threaded_csv_loader_1,threaded_csv_loader,state_ready,0,0
quantiles_1,quantiles,state_ready,0,1
histogram2_d_1,histogram2_d,state_ready,0,2
heatmap_1,heatmap,state_ready,0,3


In [7]:
# csv.scheduler().task_stop()