# Progressive Loading and Visualization

This notebook shows how to download and visualize all the New York Yellow Taxi trips from January 2015, given the known bounds of NYC.
The trip data is stored in multiple CSV files containing geolocated taxi trips.
We progressively visualize the pickup locations (where taxis picked people up).

In [1]:
# We make sure the libraries are reloaded when modified, and avoid warning messages
# %load_ext autoreload
# %autoreload 2
import warnings
warnings.filterwarnings("ignore")

In [2]:
# Some constants we'll need: the data file to download and final image size
LARGE_TAXI_FILE = "https://www.aviz.fr/nyc-taxi/yellow_tripdata_2015-01.csv.bz2"
RESOLUTION = 512
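
Because the data file is a bz2-compressed CSV, it can be consumed incrementally rather than all at once. The following is a minimal sketch of that idea using only the Python standard library; it is not how `CSVLoader` is implemented. The helper name `iter_chunks`, the `chunk_size` parameter, and the tiny in-memory sample (including its `fare` column) are assumptions made for illustration.

```python
import bz2
import csv
import io


def iter_chunks(raw_stream, usecols, chunk_size=1000):
    """Yield lists of rows (dicts restricted to usecols) from a bz2-compressed CSV stream."""
    # bz2.open accepts a binary file object and decompresses lazily as we read.
    with bz2.open(raw_stream, mode="rt", newline="") as text:
        reader = csv.DictReader(text)
        chunk = []
        for row in reader:
            chunk.append({col: float(row[col]) for col in usecols})
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:  # last, possibly shorter, chunk
            yield chunk


# Hypothetical in-memory stand-in for the real file (5 rows, made-up values).
sample_csv = "pickup_longitude,pickup_latitude,fare\n" + "".join(
    f"-73.9{i},40.7{i},5.0\n" for i in range(5)
)
compressed = io.BytesIO(bz2.compress(sample_csv.encode("utf-8")))

chunks = list(
    iter_chunks(compressed, ["pickup_longitude", "pickup_latitude"], chunk_size=2)
)
print([len(c) for c in chunks])
```

Each yielded chunk can be processed as soon as it is available, which is the property progressive visualization relies on.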

## Define NYC Bounds
Knowing the bounds in advance simplifies the code: the histogram range can be given as constants.
See https://en.wikipedia.org/wiki/Module:Location_map/data/USA_New_York_City

In [3]:
from dataclasses import dataclass
@dataclass
class Bounds:
    top: float = 40.92
    bottom: float = 40.49
    left: float = -74.27
    right: float = -73.68

bounds = Bounds()
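
To see why fixed bounds help, here is a small sketch of how a longitude/latitude pair could be mapped to a cell of a `RESOLUTION` x `RESOLUTION` grid once the bounds are known. The function `to_bin` is a hypothetical helper written for this illustration and is not part of ProgressiVis; the `Bounds` class is repeated so the sketch runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    # Same values as in the cell above, repeated to keep the sketch self-contained.
    top: float = 40.92
    bottom: float = 40.49
    left: float = -74.27
    right: float = -73.68


def to_bin(lon, lat, bounds, resolution=512):
    """Return (column, row) of the grid cell containing the point, or None if it is outside."""
    inside = bounds.left <= lon <= bounds.right and bounds.bottom <= lat <= bounds.top
    if not inside:
        return None
    # Normalize each coordinate to [0, 1] within the bounds.
    fx = (lon - bounds.left) / (bounds.right - bounds.left)
    fy = (lat - bounds.bottom) / (bounds.top - bounds.bottom)
    # Points exactly on the upper edge are clamped into the last cell.
    col = min(int(fx * resolution), resolution - 1)
    row = min(int(fy * resolution), resolution - 1)
    return col, row


nyc = Bounds()
print(to_bin(nyc.left, nyc.bottom, nyc))   # lower-left corner
print(to_bin(nyc.right, nyc.top, nyc))     # upper-right corner
print(to_bin(0.0, 0.0, nyc))               # far outside NYC
```

With constant bounds, every point can be binned immediately without first scanning the data for its minimum and maximum.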

## Create Modules
First, create the five modules we need.

In [4]:
from progressivis import CSVLoader, Histogram2D, ConstDict, Heatmap, PDict

# Create a CSVLoader module, two min/max constant modules, a Histogram2D module, and a Heatmap module.

csv = CSVLoader(LARGE_TAXI_FILE, usecols=['pickup_longitude', 'pickup_latitude'])
min = ConstDict(PDict({'pickup_longitude': bounds.left, 'pickup_latitude': bounds.bottom}))
max = ConstDict(PDict({'pickup_longitude': bounds.right, 'pickup_latitude': bounds.top}))
histogram2d = Histogram2D('pickup_longitude', 'pickup_latitude', xbins=RESOLUTION, ybins=RESOLUTION)
heatmap = Heatmap()

## Connect Modules

Then, connect the modules.

In [5]:
histogram2d.input.table = csv.output.result
histogram2d.input.min = min.output.result
histogram2d.input.max = max.output.result
heatmap.input.array = histogram2d.output.result
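
Conceptually, the connections above form a dataflow: the loader emits chunks of rows, the histogram accumulates them within the fixed range, and the heatmap redraws from each partial result. The sketch below mimics that flow with plain Python generators. It is an illustration of the idea only and does not use or reproduce the ProgressiVis API; `chunk_source` and `progressive_histogram` are hypothetical names, and the sample points are made up.

```python
def chunk_source(points, chunk_size):
    """Emit the points a few at a time, like a loader reading a large file."""
    for start in range(0, len(points), chunk_size):
        yield points[start:start + chunk_size]


def progressive_histogram(chunks, xmin, xmax, ymin, ymax, bins):
    """Accumulate a bins x bins count grid and yield it after every chunk."""
    counts = [[0] * bins for _ in range(bins)]
    for chunk in chunks:
        for x, y in chunk:
            if xmin <= x <= xmax and ymin <= y <= ymax:
                col = min(int((x - xmin) / (xmax - xmin) * bins), bins - 1)
                row = min(int((y - ymin) / (ymax - ymin) * bins), bins - 1)
                counts[row][col] += 1
        yield counts  # partial result; a heatmap could be redrawn from it here


# Made-up points; the last one falls outside the range and is ignored.
points = [(0.1, 0.1), (0.9, 0.9), (0.1, 0.9), (0.15, 0.2), (2.0, 2.0)]

totals = []
for partial in progressive_histogram(chunk_source(points, 2), 0.0, 1.0, 0.0, 1.0, bins=2):
    totals.append(sum(map(sum, partial)))  # number of points binned so far

print(totals)
print(partial)
```

The consumer sees a usable (if incomplete) histogram after every chunk, which is what lets the display refine itself while the download is still running.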

## Display the Heatmap

In [6]:
heatmap.display_notebook()

VBox(children=(HBox(children=(IntProgress(value=0, description='0/0', max=1000), Button(description='Save', ic…

## Start the Scheduler

In [7]:
csv.scheduler.task_start()

<Task pending name='Task-9' coro=<Scheduler.start() running at /home/fekete/src/progressivis/progressivis/core/scheduler.py:277>>

Starting scheduler
# Scheduler added module(s): ['const_dict_1', 'const_dict_2', 'csv_loader_1', 'heatmap_1', 'histogram2_d_1']
Leaving run loop
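
The `<Task pending ...>` output above indicates that the scheduler runs as an asyncio task: the call returns immediately while the work continues in the background. The following sketch reproduces that behavior with a stand-in coroutine. `run_loop` and `main` are hypothetical names, and the logged messages only imitate the ones shown above.

```python
import asyncio


async def run_loop(log):
    """Stand-in for a scheduler loop that yields control between steps."""
    log.append("Starting scheduler")
    for step in range(3):
        await asyncio.sleep(0)  # let other tasks run
        log.append(f"step {step}")
    log.append("Leaving run loop")


async def main():
    log = []
    task = asyncio.create_task(run_loop(log))  # returns at once, task is pending
    pending_at_start = not task.done()
    await task  # wait for the loop to finish
    return pending_at_start, log


pending_at_start, log = asyncio.run(main())
print(pending_at_start)
print(log)
```

Inside a notebook, where an event loop is already running, one would use `await main()` instead of `asyncio.run(main())`.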


## Show the Modules
Printing the scheduler shows all the modules and their states.

In [8]:
csv.scheduler

| Id | Class | State | Last Update | Order |
|---|---|---|---|---|
| csv_loader_1 | csv_loader | state_ready | 47 | 0 |
| histogram2_d_1 | histogram2_d | state_blocked | 48 | 3 |
| heatmap_1 | heatmap | state_blocked | 49 | 4 |


## Stop the Scheduler
To stop the scheduler, uncomment the next cell and run it.

In [10]:

# csv.scheduler.task_stop()

<Task pending name='Task-12' coro=<Scheduler.stop() running at /home/fekete/src/progressivis/progressivis/core/scheduler.py:616>>