# Progressive Loading and Visualization

This notebook shows the simplest code to download and visualize all the New York City Yellow Taxi trips from January 2015.
The trip data is stored in multiple CSV files of geolocated taxi trips; this notebook loads the compressed file for January 2015.
We progressively visualize the pickup locations (where passengers were picked up by the taxis).
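"Progressive" means partial results are published while the data is still loading, instead of only once the whole file has been read. The sketch below illustrates the idea with the standard library only: it reads CSV rows and yields a running count of valid pickups after every chunk. The function `progressive_pickup_counts`, the chunk size, and the sample rows are hypothetical; this is not how `CSVLoader` is implemented.

```python
import csv
import io
from typing import Iterable, Iterator


def progressive_pickup_counts(lines: Iterable[str], chunk_size: int) -> Iterator[int]:
    """Yield the running number of valid pickups after each chunk of rows."""
    reader = csv.DictReader(lines)
    total = 0
    in_chunk = 0
    for row in reader:
        # Count rows whose pickup coordinates are non-zero (0,0 marks a missing position)
        if float(row["pickup_longitude"]) != 0.0 and float(row["pickup_latitude"]) != 0.0:
            total += 1
        in_chunk += 1
        if in_chunk == chunk_size:
            yield total  # partial result, available before the whole file is read
            in_chunk = 0
    if in_chunk:
        yield total  # result after the last (possibly shorter) chunk


# Hypothetical sample rows, not real taxi data
SAMPLE = """pickup_longitude,pickup_latitude
-73.99,40.75
-73.98,40.76
0,0
-73.95,40.78
-73.97,40.74
"""

partial = list(progressive_pickup_counts(io.StringIO(SAMPLE), chunk_size=2))
print(partial)  # [2, 3, 4]
```

Each yielded value could drive a display update, which is what the heatmap below does with a much richer result.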

In [1]:
# To reload the libraries automatically when they are modified, uncomment the two lines below;
# the warnings filter hides warning messages
# %load_ext autoreload
# %autoreload 2
import warnings
warnings.filterwarnings("ignore")

In [2]:
# Some constants we'll need: the data file to download and final image size
LARGE_TAXI_FILE = "https://www.aviz.fr/nyc-taxi/yellow_tripdata_2015-01.csv.bz2"
RESOLUTION = 512

## Create Modules
First, create the four modules we need.

In [3]:
from progressivis import CSVLoader, Histogram2D, Quantiles, Heatmap

# Create a CSVLoader module, a Quantiles module, a Histogram2D module, and a Heatmap module.

# The CSVLoader only loads the two columns of interest, selected with the 'usecols' keyword.
csv = CSVLoader(LARGE_TAXI_FILE, usecols=['pickup_longitude', 'pickup_latitude'])
quantiles = Quantiles()
# The Histogram2D module computes a 2D histogram of the two columns, with RESOLUTION bins along each axis
histogram2d = Histogram2D('pickup_longitude', 'pickup_latitude', xbins=RESOLUTION, ybins=RESOLUTION)
heatmap = Heatmap()
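To make the role of `Histogram2D` concrete, here is a minimal pure-Python sketch of 2D binning between given bounds. It assumes that points outside the `[min, max]` bounds are dropped (they could also be clamped into the border bins) and that `grid[j][i]` holds row `j` (y) and column `i` (x); both are assumptions for illustration, not the module's documented behavior.

```python
from typing import List, Sequence


def histogram2d(xs: Sequence[float], ys: Sequence[float],
                xmin: float, xmax: float, ymin: float, ymax: float,
                xbins: int, ybins: int) -> List[List[int]]:
    """Count (x, y) points on a ybins x xbins grid; grid[j][i] is row j (y), column i (x)."""
    grid = [[0] * xbins for _ in range(ybins)]
    for x, y in zip(xs, ys):
        # Assumption: points outside the bounds are dropped rather than clamped
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue
        # Scale each coordinate to a bin index; the max value goes into the last bin
        i = min(int((x - xmin) / (xmax - xmin) * xbins), xbins - 1)
        j = min(int((y - ymin) / (ymax - ymin) * ybins), ybins - 1)
        grid[j][i] += 1
    return grid


h = histogram2d([0.0, 0.5, 1.0, 2.0], [0.0, 0.5, 1.0, 0.0],
                xmin=0.0, xmax=1.0, ymin=0.0, ymax=1.0, xbins=2, ybins=2)
print(h)  # [[1, 0], [0, 2]]
```

The grid size depends only on the bin counts, not on the number of trips: with `RESOLUTION = 512` it always has 512 x 512 = 262,144 cells, so it can be updated chunk by chunk as rows arrive.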

## Connect Modules

Then, connect the modules.

In [4]:
# Now, connect the modules to create the dataflow graph.
# Quantiles inputs a table and outputs the quantiles of all its numeric columns
quantiles.input.table = csv.output.result

# Histogram2D inputs a table plus the minimum and maximum values of its columns, used as the histogram bounds.
# Taking the 3% and 97% quantiles instead of the true min/max keeps outlying coordinates from stretching the range.
histogram2d.input.table = csv.output.result
histogram2d.input.min = quantiles.output.result[0.03]  # 0.03 quantile
histogram2d.input.max = quantiles.output.result[0.97]  # 0.97 quantile
# Heatmap inputs the 2D histogram array and renders it as an image
heatmap.input.array = histogram2d.output.result
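Why use the 0.03 and 0.97 quantiles as bounds? A single erroneous coordinate (for example a longitude of 0.0) would make the raw maximum useless as a histogram bound. The sketch below uses an exact nearest-rank quantile computed by sorting, on hypothetical longitudes; the `Quantiles` module may well use a different (for instance approximate, streaming) algorithm.

```python
import math
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Exact nearest-rank quantile: the ceil(q * n)-th smallest value (1-based)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Hypothetical longitudes: 49 plausible Manhattan values plus one bad reading at 0.0
longitudes = [-73.9 - 0.001 * k for k in range(49)] + [0.0]

lo = quantile(longitudes, 0.03)  # 2nd smallest of the 50 values
hi = quantile(longitudes, 0.97)  # 49th smallest of the 50 values
print(max(longitudes), hi)  # 0.0 -73.9
```

With the raw max (0.0) as the upper bound, the range would span about 74 degrees of longitude and all of Manhattan would fall into a handful of bins; the 0.97 quantile stays at -73.9.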

## Display the Heatmap

In [5]:
heatmap.display_notebook()
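Taxi pickups are extremely concentrated, so a linear mapping from counts to pixel intensity would leave most bins nearly black. A common remedy is a logarithmic scale; the sketch below maps counts to 8-bit gray levels with `log1p`. Whether `Heatmap` applies exactly this transform (or a colormap) is an assumption made for illustration.

```python
import math
from typing import List


def to_grayscale(grid: List[List[int]]) -> List[List[int]]:
    """Map histogram counts to 0..255 on a log scale (log1p keeps empty bins at 0)."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]  # empty histogram: black image
    scale = math.log1p(peak)
    return [[round(255 * math.log1p(c) / scale) for c in row] for row in grid]


img = to_grayscale([[0, 1], [9, 99]])
print(img)
```

A bin with a single pickup still gets a visible gray level, while the busiest bin maps to 255.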


## Start the scheduler

In [6]:
csv.scheduler.task_start()

<Task pending name='Task-9' coro=<Scheduler.start() running at /home/fekete/src/progressivis/progressivis/core/scheduler.py:277>>

Starting scheduler
# Scheduler added module(s): ['csv_loader_1', 'heatmap_1', 'histogram2_d_1', 'quantiles_1']


## Show the modules
Displaying the scheduler lists all the modules with their state, last update, and execution order.

In [7]:
csv.scheduler

| Id | Class | State | Last Update | Order |
|---|---|---|---|---|
| csv_loader_1 | csv_loader | state_ready | 73 | 0 |
| quantiles_1 | quantiles | state_blocked | 74 | 1 |
| histogram2_d_1 | histogram2_d | state_blocked | 75 | 2 |
| heatmap_1 | heatmap | state_blocked | 72 | 3 |
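One plausible reading of these states is that a module is blocked while its inputs have no new data, and the scheduler runs the ready ones in dataflow order. The toy scheduler below models that reading; `ToyModule`, `run`, the version counters, and the "done" state are all hypothetical and not the progressivis scheduler's actual logic.

```python
from typing import Dict, List, Sequence


class ToyModule:
    """Hypothetical stand-in for a dataflow module (not progressivis' own class)."""

    def __init__(self, name: str, inputs: Sequence["ToyModule"] = (), chunks: int = 0):
        self.name = name
        self.inputs = list(inputs)
        self.chunks_left = chunks        # only used by source modules (no inputs)
        self.version = 0                 # bumped every time the module produces output
        self.seen: Dict[str, int] = {}   # input name -> input version already consumed

    def state(self) -> str:
        if not self.inputs:
            # A source is ready while it has data left; "done" is a made-up state name
            return "ready" if self.chunks_left > 0 else "done"
        fresh = any(m.version > self.seen.get(m.name, 0) for m in self.inputs)
        return "ready" if fresh else "blocked"

    def run_step(self) -> None:
        if not self.inputs:
            self.chunks_left -= 1        # a source consumes one chunk of its file
        for m in self.inputs:
            self.seen[m.name] = m.version  # mark the current input data as consumed
        self.version += 1


def run(order: List[ToyModule], max_steps: int = 100) -> List[List[str]]:
    """Run ready modules in dataflow order until no module can make progress."""
    trace: List[List[str]] = []
    for _ in range(max_steps):
        ran: List[str] = []
        for m in order:
            if m.state() == "ready":
                m.run_step()
                ran.append(m.name)
        if not ran:
            break                        # every module is blocked or done
        trace.append(ran)
    return trace


csv_mod = ToyModule("csv_loader", chunks=2)
quant = ToyModule("quantiles", [csv_mod])
hist = ToyModule("histogram2d", [csv_mod, quant])
heat = ToyModule("heatmap", [hist])
order = [csv_mod, quant, hist, heat]

initial_states = [m.state() for m in order]
print(initial_states)  # ['ready', 'blocked', 'blocked', 'blocked']
trace = run(order)
print(trace)
print([m.state() for m in order])
```

Before the first step only the loader is ready, which matches the states in the table; each chunk it loads then unblocks the downstream modules in turn.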


## Stop the scheduler
To stop the scheduler, uncomment the line in the next cell and run it.

In [9]:

# csv.scheduler.task_stop()

<Task pending name='Task-12' coro=<Scheduler.stop() running at /home/fekete/miniforge3/envs/pvenv/lib/python3.13/site-packages/progressivis/core/scheduler.py:616>>
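The outputs above show that `task_start()` and `task_stop()` return pending asyncio `Task` objects, so the scheduler runs in the background while the notebook stays interactive. The sketch below shows a generic cooperative start/stop pattern with plain `asyncio`; the loop, the `asyncio.Event` used as a stop flag, and all names are hypothetical, and this is not a description of how `Scheduler.stop()` works. It runs as a script; inside Jupyter, where an event loop is already running, use `await main()` instead of `asyncio.run(main())`.

```python
import asyncio
from typing import List


async def progressive_loop(stop: asyncio.Event, results: List[int]) -> None:
    """Hypothetical background loop: runs one step per iteration until asked to stop."""
    step = 0
    while not stop.is_set():
        step += 1
        results.append(step)     # publish a new partial result
        await asyncio.sleep(0)   # yield so other tasks (e.g. the notebook UI) can run


async def main() -> List[int]:
    stop = asyncio.Event()
    results: List[int] = []
    # create_task returns immediately with a pending Task, like the task_start() output above
    task = asyncio.create_task(progressive_loop(stop, results))
    await asyncio.sleep(0.01)    # let the loop make some progress
    stop.set()                   # request a cooperative stop
    await task                   # wait for the loop to finish its current step and exit
    return results


results = asyncio.run(main())
print(len(results), "steps before stopping")
```

Stopping through a flag checked between steps, rather than cancelling the task outright, lets the loop finish its current step and leave its results in a consistent state.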