# Yellow Cab Pickups - NYC in January 2010
We're going to use DuckDB with the H3 extension to bin the pickups into hexagonal cells, then visualize them with Kepler.gl.

This notebook is inspired by [this Fused DuckDB UDF](https://github.com/fusedio/udfs/blob/main/public/DuckDB_H3_Example_Tile/DuckDB_H3_Example_Tile.py). It's one in a series; if you want to learn more about DuckDB with H3, look at the public and community UDFs Fused has on GitHub.

In [19]:
! pip install keplergl pandas duckdb -q

In [10]:
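import duckdb

# In-memory DuckDB connection
con = duckdb.connect()

# h3: H3 hexagon functions (community extension)
# spatial: geometry types and functions
# httpfs: read remote files over HTTP(S)
con.sql("""
    INSTALL h3 FROM community;
    LOAD h3;
    INSTALL spatial;
    LOAD spatial;
    INSTALL httpfs;
    LOAD httpfs;
""")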


Find this URL and others on the TLC trip record data page: [https://home.nyc.gov/site/tlc/about/tlc-trip-record-data.page](https://home.nyc.gov/site/tlc/about/tlc-trip-record-data.page). Keep in mind that recent TLC files don't include pickup and dropoff latitude/longitude. To see what's in the Parquet file, run the cell below.
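The file used in this notebook lives at `https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2010-01.parquet`. If you want to try another month, here is a minimal sketch that builds the URL, assuming other monthly files follow the same `yellow_tripdata_YYYY-MM.parquet` pattern. `yellow_trip_url` is a hypothetical helper, so check the TLC page before relying on it.

```python
# Base URL and file-name pattern copied from the January 2010 file used here;
# ASSUMPTION: other months follow the same pattern (hypothetical helper).
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def yellow_trip_url(year: int, month: int) -> str:
    """Build the (assumed) URL of a monthly yellow-taxi Parquet file."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # month:02d zero-pads single-digit months, e.g. 1 -> "01"
    return f"{BASE_URL}/yellow_tripdata_{year}-{month:02d}.parquet"


print(yellow_trip_url(2010, 1))
```

Only older files carry pickup coordinates, so a recent month won't work with the H3 query in this notebook.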

In [11]:
con.sql("from read_parquet('https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2010-01.parquet')")


| column | type |
|---|---|
| vendor_id | varchar |
| pickup_datetime | varchar |
| dropoff_datetime | varchar |
| passenger_count | int64 |
| trip_distance | double |
| pickup_longitude | double |
| pickup_latitude | double |
| rate_code | varchar |
| store_and_fwd_flag | varchar |
| dropoff_longitude | double |
| dropoff_latitude | double |
| payment_type | varchar |
| fare_amount | double |
| surcharge | double |
| mta_tax | double |
| tip_amount | double |
| tolls_amount | double |
| total_amount | double |

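The DuckDB SQL query below converts each pickup's coordinates to an H3 cell at the chosen resolution, counts pickups per cell, drops cells below a minimum count, and returns each remaining cell's boundary polygon (as WKT) with its count in a pandas DataFrame for visualization.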
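Conceptually, the `GROUP BY`/`HAVING` part of the query is a count-then-filter over cell IDs. Here is a plain-Python sketch of that step, using made-up cell ID strings in place of real H3 indexes; `count_cells` is a hypothetical helper, not part of DuckDB or H3.

```python
from collections import Counter


def count_cells(cell_ids, min_cnt):
    """Count occurrences of each cell ID and keep cells with at least min_cnt hits.

    Mirrors `GROUP BY cell_id ... HAVING cnt >= min_cnt` in the SQL query.
    """
    counts = Counter(cell_ids)
    return {cell: cnt for cell, cnt in counts.items() if cnt >= min_cnt}


# Hypothetical cell IDs, one entry per pickup (not real H3 indexes).
pickups = ["cell_a"] * 12 + ["cell_b"] * 3 + ["cell_c"] * 10
print(count_cells(pickups, min_cnt=10))  # {'cell_a': 12, 'cell_c': 10}
```

In the real query DuckDB does this in a single pass over the Parquet file, so the per-pickup rows never have to be loaded into pandas.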

In [15]:
url = 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2010-01.parquet'
min_cnt = 10     # Drop sparse cells; there's a lot of noise if you include everything
resolution = 11  # H3 resolution (0 = coarsest, 15 = finest)

query = f"""
WITH to_cells AS (
SELECT
    h3_h3_to_string(h3_latlng_to_cell(pickup_latitude, pickup_longitude, {resolution})) AS cell_id,
    h3_cell_to_boundary_wkt(cell_id) AS boundary,
    count(1) AS cnt
FROM read_parquet('{url}')
WHERE pickup_latitude <> 0 AND pickup_longitude <> 0  -- skip records with missing (0, 0) coordinates
GROUP BY 1
HAVING cnt >= {min_cnt})
SELECT
    boundary, -- Kepler in Jupyter doesn't like the H3 string, so we use the polygon boundary
    cnt
FROM to_cells
"""

df = con.sql(query).df()
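The query keeps `boundary`, each cell's outline as WKT, instead of the H3 string. To show what such a value looks like, here is a sketch that turns a list of `(lat, lng)` vertices into a WKT `POLYGON`. Two WKT rules matter: coordinates are written `x y`, i.e. longitude first, and the ring is closed by repeating the first vertex. The vertices below are illustrative, not a real H3 cell, and the exact formatting `h3_cell_to_boundary_wkt` produces (precision, spacing) may differ.

```python
def boundary_to_wkt(vertices):
    """Convert [(lat, lng), ...] vertices into a closed WKT POLYGON string.

    WKT orders coordinates as "x y" (longitude, latitude), so each pair is swapped.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    ring = list(vertices)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring by repeating the first vertex
    coords = ", ".join(f"{lng} {lat}" for lat, lng in ring)
    return f"POLYGON (({coords}))"


# Illustrative triangle near Midtown Manhattan (not an actual H3 boundary).
print(boundary_to_wkt([(40.75, -73.99), (40.76, -73.98), (40.75, -73.97)]))
# POLYGON ((-73.99 40.75, -73.98 40.76, -73.97 40.75, -73.99 40.75))
```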


Run the next two cells to load my Kepler.gl configuration and display the map.

In [23]:
kepler_config = {'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 'twpodtg', 'type': 'geojson', 'config': {'dataId': 'Yellow Cab Pickups', 'label': 'Yellow Cab Pickups', 'color': [221, 178, 124], 'highlightColor': [252, 242, 26, 255], 'columns': {'geojson': 'boundary'}, 'isVisible': True, 'visConfig': {'opacity': 0.8, 'strokeOpacity': 0.8, 'thickness': 0.5, 'strokeColor': [136, 87, 44], 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'strokeColorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'radius': 10, 'sizeRange': [0, 10], 'radiusRange': [0, 50], 'heightRange': [0, 500], 'elevationScale': 5, 'enableElevationZoomFactor': True, 'stroked': False, 'filled': True, 'enable3d': False, 'wireframe': False}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': {'name': 'cnt', 'type': 'integer'}, 'colorScale': 'quantile', 'strokeColorField': None, 'strokeColorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear', 'heightField': None, 'heightScale': 'linear', 'radiusField': None, 'radiusScale': 'linear'}}], 'interactionConfig': {'tooltip': {'fieldsToShow': {'Yellow Cab Pickups': [{'name': 'cnt', 'format': None}]}, 'compareMode': False, 'compareType': 'absolute', 'enabled': True}, 'brush': {'size': 0.5, 'enabled': False}, 'geocoder': {'enabled': False}, 'coordinate': {'enabled': False}}, 'layerBlending': 'normal', 'splitMaps': [], 'animationConfig': {'currentTime': None, 'speed': 1}}, 'mapState': {'bearing': 0, 'dragRotate': False, 'latitude': 40.727807886704745, 'longitude': -73.88863872660393, 'pitch': 0, 'zoom': 10.032621379351138, 'isSplit': False}, 'mapStyle': {'styleType': 'dark', 'topLayerGroups': {}, 'visibleLayerGroups': {'label': False, 'road': False, 'border': False, 'building': True, 'water': True, 'land': True, '3d building': False}, 'threeDBuildingColor': [9.665468314072013, 17.18305478057247, 31.1442867897876], 'mapStyles': {}}}}

In [24]:
from keplergl import KeplerGl

# Create an empty KeplerGl map instance
map_1 = KeplerGl()

# Add the aggregated cells; the name must match the layer's dataId in kepler_config
map_1.add_data(data=df, name='Yellow Cab Pickups')

map_1.config = kepler_config
# Display the map
map_1
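Kepler.gl links a layer to its data through `dataId`, which has to match the `name` passed to `add_data`, and the tooltip's `fieldsToShow` is keyed by the same name. If you rename the dataset, a small helper can keep a saved config in sync. `rename_dataset` below is a hypothetical sketch, not a Kepler.gl API, and it only touches the keys that appear in this notebook's config (layers and tooltip; filters are empty here).

```python
import copy


def rename_dataset(config, old, new):
    """Return a copy of a Kepler.gl config dict with dataset `old` renamed to `new`.

    Updates each matching layer's dataId (and its label, if the label was the
    old name) plus the tooltip's fieldsToShow key. The input is not modified.
    """
    cfg = copy.deepcopy(config)
    vis_state = cfg["config"]["visState"]
    for layer in vis_state.get("layers", []):
        layer_cfg = layer["config"]
        if layer_cfg.get("dataId") == old:
            layer_cfg["dataId"] = new
            if layer_cfg.get("label") == old:
                layer_cfg["label"] = new
    tooltip_fields = (
        vis_state.get("interactionConfig", {}).get("tooltip", {}).get("fieldsToShow", {})
    )
    if old in tooltip_fields:
        tooltip_fields[new] = tooltip_fields.pop(old)
    return cfg


# Minimal config with the same nesting as kepler_config (hypothetical names).
demo_config = {"config": {"visState": {
    "layers": [{"config": {"dataId": "old name", "label": "old name"}}],
    "interactionConfig": {"tooltip": {"fieldsToShow": {"old name": [{"name": "cnt", "format": None}]}}},
}}}
renamed = rename_dataset(demo_config, "old name", "new name")
print(renamed["config"]["visState"]["layers"][0]["config"]["dataId"])  # new name
```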

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter

If you change the map interactively and want to keep those changes, run the cell below and use the printed dictionary in place of my `kepler_config`.

In [None]:
print(map_1.config)
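Rather than pasting the printed dictionary back into the notebook, you could write it to a JSON file and load it next time. Here is a minimal sketch using only the standard library. The helper names and the file name `kepler_config.json` are assumptions. The `json` module also converts Python's `True`/`False`/`None` to JSON's `true`/`false`/`null` and back.

```python
import json
from pathlib import Path


def save_config(config: dict, path: str = "kepler_config.json") -> None:
    """Write a Kepler.gl config dict to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(config, indent=2))


def load_config(path: str = "kepler_config.json") -> dict:
    """Read a Kepler.gl config dict back from a JSON file (hypothetical helper)."""
    return json.loads(Path(path).read_text())


# Usage in this notebook (assumes map_1 from the cells above):
# save_config(map_1.config)
# map_1.config = load_config()
```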