# Plotting massive data sets

**This notebook is experimental in pydeck's beta version. It may not work on all devices.**

This notebook plots 1.6 million LIDAR points collected around the Carnegie Mellon University campus ([Source](https://github.com/ajduberstein/oakland_point_cloud)). Each point carries a class label. With pydeck, we can render these points and interact with them.

### Cleaning the data

First we need to import the data. We should expect about 1.6M points.

In [None]:
import pandas as pd

URL = 'https://raw.githubusercontent.com/ajduberstein/oakland_point_cloud/master/%s'
DATA_URL_1 = URL % 'lidar_chunks_1.csv'
DATA_URL_2 = URL % 'lidar_chunks_2.csv'
LOOKUP_URL = URL % 'ground_truth_label.csv'
lidar = pd.concat([pd.read_csv(DATA_URL_1), pd.read_csv(DATA_URL_2)])
lookup = pd.read_csv(LOOKUP_URL)
lidar = lidar.merge(lookup)

In [None]:
# len() counts rows directly and avoids positional indexing into the Series returned by count()
print('Number of points:', len(lidar))

Number of points: 1614395
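
Because `merge` defaults to an inner join, any point whose label is missing from the lookup table would be dropped silently. The sketch below shows one way to check for that on toy data. The join key `label_id` and the toy frames `points` and `labels` are assumptions for illustration, not taken from the real files.

```python
import pandas as pd

# Toy stand-ins for the LIDAR points and the label lookup (hypothetical data).
points = pd.DataFrame({'x': [0.1, 0.2, 0.3], 'label_id': [1400, 1200, 9999]})
labels = pd.DataFrame({'label_id': [1200, 1400], 'label_name': ['ground', 'facade']})

# Inner join: the point with the unknown label 9999 is dropped.
inner = points.merge(labels, on='label_id', validate='many_to_one')

# Left join with an indicator column keeps every point and flags the unmatched ones.
checked = points.merge(labels, on='label_id', how='left',
                       validate='many_to_one', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']

print('points:', len(points), 'after inner join:', len(inner), 'unmatched:', len(unmatched))
```

`validate='many_to_one'` additionally raises an error if the lookup table contains duplicate keys, which would otherwise multiply rows.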


The data does not appear to be in a standard coordinate format, so we'll scale it to make it easy to plot on a map. We'll also color objects by label type. The `data_utils.assign_random_colors` function assigns a random RGB value to each distinct label in a vector of data labels.
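
To illustrate the idea of a label-to-color lookup, here is a minimal sketch in plain Python. `random_color_lookup` is a hypothetical stand-in written for this example; it is not pydeck's implementation and may differ from it in detail.

```python
import random

import pandas as pd


def random_color_lookup(labels, seed=0):
    """Hypothetical helper: map each distinct label to a random [R, G, B] list."""
    rng = random.Random(seed)
    return {label: [rng.randint(0, 255) for _ in range(3)]
            for label in sorted(set(labels))}


label_names = pd.Series(['facade', 'ground', 'facade', 'vegetation'])
toy_lookup = random_color_lookup(label_names)

# Every row with the same label receives the same color.
toy_colors = label_names.map(toy_lookup.get)
print(toy_colors.tolist())
```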

In [None]:
from pydeck import data_utils

color_lookup = data_utils.assign_random_colors(lidar['label_name'])
lidar['rgb'] = lidar.apply(lambda row: color_lookup.get(row['label_name']), axis=1)
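# Scale each axis into [0, 1]. Subtracting the max and dividing by the
# (negative) shifted min flips each axis: the largest value maps to 0
# and the smallest to 1.
lidar[['x', 'y', 'z']] -= lidar[['x', 'y', 'z']].max()
lidar[['x', 'y', 'z']] /= lidar[['x', 'y', 'z']].min()
# Shrink x and y by a further factor of 1000
lidar[['x', 'y']] /= 1000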
lidar.head()

Unnamed: 0,x,y,z,label_id,confidence,label_name,rgb
0,0.000402,0.000472,0.680353,1400,2,facade,"[114, 92, 116]"
1,0.000402,0.000472,0.684557,1400,2,facade,"[114, 92, 116]"
2,0.000402,0.000472,0.680213,1400,2,facade,"[114, 92, 116]"
3,0.000402,0.000472,0.684557,1400,2,facade,"[114, 92, 116]"
4,0.000402,0.000472,0.688481,1400,2,facade,"[114, 92, 116]"
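
For reference, the sketch below compares conventional min-max scaling with the subtract-max, divide-by-min variant on toy data. The frame `xyz` is made up for illustration. Both map each axis into [0, 1], and the two results are mirror images of each other.

```python
import pandas as pd

# Toy coordinates (hypothetical values).
xyz = pd.DataFrame({
    'x': [10.0, 20.0, 30.0],
    'y': [-5.0, 0.0, 5.0],
    'z': [100.0, 150.0, 200.0],
})

# Conventional min-max scaling: smallest value -> 0, largest -> 1.
conventional = (xyz - xyz.min()) / (xyz.max() - xyz.min())

# Subtract the max, then divide by the shifted min (which equals min - max):
# smallest value -> 1, largest -> 0.
flipped = (xyz - xyz.max()) / (xyz.min() - xyz.max())

print(conventional)
print(flipped)
```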


### Plotting the data

We'll define a single `PointCloudLayer` over a random sample of 10,000 points and plot it.

By default, pydeck expects `get_position` to be a string naming a single field that holds each point's position. For convenience, you can instead pass a string that lists the X/Y/Z coordinate columns, here `get_position='[x, y, z]'`.
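
As a sketch of the single-field alternative, the coordinates can be packed into one column of `[x, y, z]` lists with pandas. The column name `position` and the toy frame are assumptions for illustration; such a column could then be referenced by name.

```python
import pandas as pd

# Toy points (hypothetical values).
df = pd.DataFrame({'x': [0.0, 1.0], 'y': [2.0, 3.0], 'z': [4.0, 5.0]})

# Pack the three coordinate columns into a single column of [x, y, z] lists.
df['position'] = df[['x', 'y', 'z']].values.tolist()

print(df['position'].tolist())
```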

We'll zoom to the approximate center of the data by taking a mean of a few hundred points in pandas.
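
A minimal sketch of estimating the center from a sample, using synthetic points in place of the LIDAR data. The sample size of 500 and the fixed seeds are arbitrary choices for this example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the scaled point cloud (hypothetical data).
rng = np.random.default_rng(0)
cloud = pd.DataFrame(rng.uniform(0, 1, size=(5000, 3)), columns=['x', 'y', 'z'])

# The mean of a few hundred sampled rows approximates the center of the full cloud.
sample = cloud.sample(n=500, random_state=0)
center = sample[['x', 'y', 'z']].mean()
full_center = cloud[['x', 'y', 'z']].mean()

print(center)
```

Sampling keeps the estimate cheap, and for a view target the small sampling error is unlikely to matter.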

This example may take 10-15 seconds to render.

In [None]:
from pydeck import (
    Deck,
    Layer,
    ViewState
)

point_cloud = Layer(
    'PointCloudLayer',
    # The color column created above is named 'rgb'
    lidar[['x', 'y', 'z', 'label_name', 'rgb']].sample(10000),
    # You can specify the XYZ coordinate in a list as a string
    get_position='[x, y, z]',
    coordinate_system='COORDINATE_SYSTEM.METERS',
    get_normal=[0, 0, 1],
    get_color='rgb',
    radius_pixels=4)

r = Deck(
    point_cloud,
    initial_view_state=ViewState(
        fov=2,
        rotation_x=0,
        max_zoom=100,
        rotation_orbit=0,
        orbit_axis='Y',
        zoom=1,
        distance=10,
        min_distance=1,
        max_distance=100
    ),
    map_style=None
)
r.show()

DeckGLWidget(json_input='{"initialViewState": {"bearing": 60, "latitude": 0.00048716936362052445, "longitude":…

#### Citations:

Contextual Classification with Functional Max-Margin Markov Networks. 
Daniel Munoz, J. Andrew (Drew) Bagnell, Nicolas Vandapel, and Martial Hebert. 
IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June, 2009.