# Progressive Loading and Visualization

This notebook shows the simplest code to download and visualize the New York Yellow Taxi trips from 2015. All trips are geolocated, and the trip data is stored in multiple CSV files; the file loaded here covers January 2015.
We progressively visualize the pickup locations (where passengers were picked up by the taxis).

First, we define a few constants: the location of the file, the desired resolution, and the bounds of New York City.

In [1]:
%load_ext autoreload
%autoreload 2

LARGE_TAXI_FILE = "https://www.aviz.fr/nyc-taxi/yellow_tripdata_2015-01.csv.bz2"
RESOLUTION = 512

# See https://en.wikipedia.org/wiki/Module:Location_map/data/USA_New_York_City
bounds = {
    "top": 40.92,
    "bottom": 40.49,
    "left": -74.27,
    "right": -73.68,
}
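
To make the role of these constants concrete, here is a minimal sketch of how a longitude/latitude pair could be mapped to a cell of a RESOLUTION x RESOLUTION grid. The helper `to_bin` and the sample coordinates are hypothetical and not part of ProgressiVis; the sketch only illustrates the arithmetic a 2D histogram performs, not how the library implements it.

```python
RESOLUTION = 512
bounds = {"top": 40.92, "bottom": 40.49, "left": -74.27, "right": -73.68}


def to_bin(lon, lat, bounds=bounds, resolution=RESOLUTION):
    """Return (column, row) of the grid cell containing the point.

    Hypothetical helper for illustration; returns None for points
    outside the bounding box.
    """
    inside = (bounds["left"] <= lon < bounds["right"]
              and bounds["bottom"] <= lat < bounds["top"])
    if not inside:
        return None
    # Normalize each coordinate to [0, 1) and scale to the grid size.
    x = (lon - bounds["left"]) / (bounds["right"] - bounds["left"])
    y = (lat - bounds["bottom"]) / (bounds["top"] - bounds["bottom"])
    return int(x * resolution), int(y * resolution)


# Illustrative coordinates: a point in Manhattan and an obvious outlier.
manhattan = to_bin(-73.9855, 40.7580)
outlier = to_bin(0.0, 0.0)
```

With a resolution of 512, each cell spans roughly 0.0012 degrees of longitude, so the resolution directly controls how fine the resulting heatmap is.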

In [2]:
from progressivis.io import CSVLoader
from progressivis.stats import Histogram2D, Min, Max, Quantiles
from progressivis.vis import Heatmap

# Filter out trips outside of NYC,
# since the files contain outliers.
def filter_(df):
    lon = df['pickup_longitude']
    lat = df['pickup_latitude']
    return df[
        (lon>bounds["left"]) &
        (lon<bounds["right"]) &
        (lat>bounds["bottom"]) &
        (lat<bounds["top"])
    ]

# Create a CSV loader that reads only the two pickup columns.
# The NYC filter is currently disabled; pass filter_=filter_ to enable it.
csv = CSVLoader(LARGE_TAXI_FILE, index_col=False, usecols=['pickup_longitude', 'pickup_latitude'])  # , filter_=filter_)

# Create a module to compute the min value progressively
# min = Min()
# Connect it to the output of the csv module
# min.input.table = csv.output.result
# Create a module to compute the max value progressively
# max = Max()
# Connect it to the output of the csv module
# max.input.table = csv.output.result

# Create a module to compute quantiles progressively
quantiles = Quantiles()
# Connect it to the output of the csv module
quantiles.input.table = csv.output.result
# Create a module to compute the 2D histogram of the two columns specified
# with the given resolution
histogram2d = Histogram2D('pickup_longitude', 'pickup_latitude', xbins=RESOLUTION, ybins=RESOLUTION)
# Connect the module to the csv results, and use the 0.03 and 0.97 quantiles
# (in place of min and max) as the bounds to rescale
histogram2d.input.table = csv.output.result
histogram2d.input.min = quantiles.output.result[0.03]  # min.output.result
histogram2d.input.max = quantiles.output.result[0.97]  # max.output.result
# Create a module to create a heatmap image from the histogram2d
heatmap = Heatmap()
# Connect it to the histogram2d
heatmap.input.array = histogram2d.output.result

Unexpected slot hint 0.03 for Slot(quantiles_1[result]->histogram2_d_1[min])
Unexpected slot hint 0.97 for Slot(quantiles_1[result]->histogram2_d_1[max])
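
The cell above uses the 0.03 and 0.97 quantiles where the commented-out code used `Min` and `Max`. A plausible motivation (an assumption on our part, consistent with the comment about outliers in the files) is that a handful of erroneous coordinates would otherwise stretch the histogram bounds far beyond the city. The sketch below illustrates the effect with the standard library's `statistics.quantiles` on made-up longitudes; it does not use ProgressiVis.

```python
import statistics

# Made-up longitudes: 98 plausible NYC values plus two erroneous records.
longitudes = [-74.0 + 0.002 * i for i in range(98)] + [0.0, -120.0]

# Min/max bounds are dominated by the two bad records.
full_range = max(longitudes) - min(longitudes)

# With n=100, cuts[k - 1] approximates the k-th percentile.
cuts = statistics.quantiles(longitudes, n=100, method="inclusive")
low, high = cuts[2], cuts[96]  # 3rd and 97th percentiles
trimmed_range = high - low

print(f"min/max range:  {full_range:.3f} degrees")
print(f"quantile range: {trimmed_range:.3f} degrees")
```

With min/max bounds, nearly all of the grid would be spent on empty space between the outliers; the quantile bounds keep the grid focused on where the data actually lies.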


In [3]:
heatmap.display_notebook()

Image(value=b'\x00', height='512', width='512')

In [4]:
# Start the scheduler
csv.scheduler().task_start()

<Task pending name='Task-5' coro=<Scheduler.start() running at /home/fekete/src/progressivis/progressivis/core/scheduler.py:273>>

Starting scheduler
# Scheduler added module(s): ['csv_loader_1', 'heatmap_1', 'histogram2_d_1', 'quantiles_1']
Leaving run loop
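
Once the scheduler runs, the heatmap is refined as more data is loaded. The general idea of progressive computation can be sketched in plain Python: process the data in chunks, update the histogram after each chunk, and treat every intermediate state as a usable partial result. The function `update_histogram`, the `chunks` list, and the sample points are all made up for illustration; this is not how `Histogram2D` is implemented.

```python
from collections import Counter


def update_histogram(hist, chunk, xmin, xmax, ymin, ymax, bins):
    """Add one chunk of (x, y) points to hist, skipping out-of-bounds points."""
    for x, y in chunk:
        if xmin <= x < xmax and ymin <= y < ymax:
            col = int((x - xmin) / (xmax - xmin) * bins)
            row = int((y - ymin) / (ymax - ymin) * bins)
            hist[(col, row)] += 1
    return hist


# Made-up data arriving in three chunks.
chunks = [
    [(0.1, 0.1), (0.9, 0.9)],
    [(0.15, 0.12), (5.0, 5.0)],  # the second point is an outlier
    [(0.95, 0.85)],
]

hist = Counter()
snapshots = []
for chunk in chunks:
    update_histogram(hist, chunk, 0.0, 1.0, 0.0, 1.0, bins=2)
    # The histogram is usable here, before all chunks have been seen.
    snapshots.append(sum(hist.values()))
```

Each entry of `snapshots` records how many points the partial histogram held after a chunk, which is the kind of intermediate state a progressive visualization can display while loading continues.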


In [5]:
csv.scheduler()

Id,Class,State,Last Update,Order
csv_loader_1,csv_loader,state_ready,813,0
quantiles_1,quantiles,state_blocked,814,1
histogram2_d_1,histogram2_d,state_ready,811,2
heatmap_1,heatmap,state_blocked,812,3


In [6]:
csv.scheduler().task_stop()

<Task pending name='Task-8' coro=<Scheduler.stop() running at /home/fekete/src/progressivis/progressivis/core/scheduler.py:610>>