## Big Data Visualization
Valkyrie and ICESat-2 data are, by nature, big data sets that require some special considerations. If you don't have a supercomputer, the main constraint is memory: the average granule size is in the tens of megabytes for ICESat-2 and can reach gigabytes for Valkyrie, depending on the order and subsetting.

This is where libraries like Dask and Vaex come into play. This notebook shows some basic plotting techniques with matplotlib, cartopy, geopandas and Vaex for working effectively with lidar data from Valkyrie and ATL data products from ICESat-2.
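The core idea behind out-of-core tools can be sketched with plain h5py and NumPy: stream a reduction over an HDF5 dataset in fixed-size slices, so only one slice is in memory at a time. This is a minimal illustration under assumptions, not how Vaex or Dask are implemented; they automate roughly this pattern and add lazy expressions and parallelism. The file, the dataset name `elevation` and the chunk size are hypothetical, and the data are synthetic.

```python
import os
import tempfile

import h5py
import numpy as np


def chunked_mean(path, dataset, chunk_size=1_000_000):
    """Mean of a 1-D HDF5 dataset, reading at most chunk_size values at a time."""
    total = 0.0
    count = 0
    with h5py.File(path, "r") as f:
        dset = f[dataset]
        for start in range(0, dset.shape[0], chunk_size):
            block = dset[start:start + chunk_size]  # only this slice is loaded
            total += block.sum(dtype=np.float64)
            count += block.size
    return total / count


# Synthetic stand-in for a granule: 2.5 million fake elevations (hypothetical data).
rng = np.random.default_rng(0)
elevation = rng.uniform(20.0, 1050.0, size=2_500_000)

path = os.path.join(tempfile.mkdtemp(), "synthetic_granule.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("elevation", data=elevation)

print(chunked_mean(path, "elevation", chunk_size=500_000))
print(elevation.mean())  # in-memory reference value for comparison
```

Peak memory here is set by `chunk_size`, not by the length of the dataset, which is what makes the approach scale to granules larger than RAM.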


In [2]:
import warnings
warnings.filterwarnings("ignore")
import glob
import geopandas
import pandas as pd
import h5py
import vaex
import dask.dataframe as dd
import dask.array as da
import numpy as np

gibs = 'https://gibs.earthdata.nasa.gov/wmts/epsg3413/best/1.0.0/WMTSCapabilities.xml'

INFO:MainThread:numexpr.utils:NumExpr defaulting to 8 threads.


## Loading the data

In [3]:
%%time

# Read the granule with h5py + pandas; every selected column is loaded fully into memory.
with h5py.File('data/atm1b_data_2020-07-11T20-39.hdf5', 'r') as f:
    print(list(f.keys()))
    df_data = {
        'latitude': f['latitude'][:],
        'longitude': f['longitude'][:],
        'elevation': f['elevation'][:],
        'time': pd.to_datetime(f['utc_datetime'][:])
    }
df = pd.DataFrame(data=df_data)
df.describe()

['azimuth', 'elevation', 'gps_pdop', 'gps_time', 'latitude', 'longitude', 'passive_footprint_latitude', 'passive_footprint_longitude', 'passive_footprint_synthesized_elevation', 'passive_signal', 'pitch', 'pulse_width', 'rcv_sigstr', 'rel_time', 'roll', 'utc_datetime', 'xmt_sigstr']
CPU times: user 12.8 s, sys: 1.22 s, total: 14 s
Wall time: 14.6 s


|       | latitude   | longitude  | elevation |
|-------|------------|------------|-----------|
| count | 6676308.0  | 6676308.0  | 6676308.0 |
| mean  | 69.07662   | -49.59268  | 469.678   |
| std   | 0.1055781  | 0.3483853  | 329.8578  |
| min   | 68.84219   | -50.65171  | 20.784    |
| 25%   | 68.99714   | -49.73917  | 167.037   |
| 50%   | 69.10129   | -49.50787  | 470.706   |
| 75%   | 69.16428   | -49.38291  | 755.904   |
| max   | 69.26839   | -48.96453  | 1052.697  |
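The count row shows 6,676,308 points. A back-of-the-envelope estimate of what the pandas DataFrame holds is rows × columns × bytes per value; the sketch below assumes four 8-byte columns (three `float64` plus one `datetime64[ns]`) and ignores the index and any per-object overhead. The helper name `estimate_bytes` is hypothetical.

```python
def estimate_bytes(n_rows, n_columns, bytes_per_value=8):
    """Rough size of a dense numeric table: rows x columns x bytes per value."""
    return n_rows * n_columns * bytes_per_value


n_rows = 6_676_308  # point count from the describe() output above
# latitude, longitude and elevation (float64) plus time (datetime64[ns]), 8 bytes each
size = estimate_bytes(n_rows, n_columns=4)
print(f"{size / 1e6:.1f} MB")
```

The estimate grows linearly with both rows and columns, so a multi-gigabyte Valkyrie order, or loading all 17 datasets in this file, can quickly exceed a laptop's RAM. On the real DataFrame, `df.memory_usage(deep=True).sum()` reports the actual figure, including the index.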


In [3]:
%%time
df = vaex.open('data/atm1b_data_2020-07-11T20-39.hdf5')  # opens lazily; columns are read on demand
# Parse the utc_datetime strings from Valkyrie into datetime64[ns], a type Vaex understands.
# Note that .values materializes this one column in memory.
df['date'] = df.utc_datetime.values.astype('datetime64[ns]')
my_df = df[['longitude', 'latitude', 'elevation', 'date']]
# vaex.vrange() is like numpy.arange but uses virtually no memory, whatever the length.
df.add_column('index', vaex.vrange(0, len(df)))
# Create a "decimated" DataFrame that keeps every 100th point (1% of the original) to plot the big picture faster.
df_decimated = df[df.index % 100 == 0]
my_df.describe()

CPU times: user 3.19 s, sys: 318 ms, total: 3.51 s
Wall time: 2.87 s


|       | longitude          | latitude          | elevation         | date                          |
|-------|--------------------|-------------------|-------------------|-------------------------------|
| dtype | float64            | float64           | float64           | datetime64[ns]                |
| count | 6676308            | 6676308           | 6676308           | 6676308                       |
| NA    | 0                  | 0                 | 0                 | 0                             |
| mean  | -49.59267675460983 | 69.07661954488486 | 469.6780485657682 | 1970-01-01T00:12:36.724270279 |
| std   | 0.348385           | 0.105578          | 329.858           | 2.69286e+12                   |
| min   | -50.6517           | 68.8422           | 20.784            | 2016-05-16T12:51:49.886722048 |
| max   | -48.9645           | 69.2684           | 1052.7            | 2016-05-16T15:27:35.735558144 |
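The notebook converts `utc_datetime` twice: with `pd.to_datetime` in the h5py cell and with NumPy's `astype('datetime64[ns]')` in the Vaex cell. The sketch below assumes the field holds ISO-8601 strings (the raw on-disk format is not shown here, so check a few values first) and uses the min and max timestamps from the table, truncated to microseconds. The `mean` and `std` rows for `date` above are not meaningful (the mean even falls outside the min to max range), so the flight's time span, max minus min, is a more useful summary.

```python
import numpy as np
import pandas as pd

# Assumed ISO-8601 strings in the style of utc_datetime (min and max from the table).
raw = np.array(["2016-05-16T12:51:49.886722", "2016-05-16T15:27:35.735558"])

# NumPy route, as in the Vaex cell: parse straight to datetime64[ns].
parsed_np = raw.astype("datetime64[ns]")

# pandas route, as in the h5py cell: the same instants as a DatetimeIndex.
parsed_pd = pd.to_datetime(raw)

# Time span covered by the data: a timedelta, unlike an average of timestamps.
span = parsed_np.max() - parsed_np.min()
print(span.astype("timedelta64[s]"))
print(pd.Timedelta(span))
```

With Vaex, the same span can be obtained from the min and max of the `date` column without loading it again.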


## Visualizing the big picture

In [6]:
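# Bin lon/lat into a 512 x 512 grid and color each cell by its mean elevation (computed by Vaex out of core).
my_df.widget.heatmap(my_df.longitude,
                     my_df.latitude,
                     what=vaex.stat.mean(my_df.elevation),
                     shape=512,
                     figsize=(10, 6),
                     limits='minmax',
                     colormap='inferno')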

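Conceptually, the heatmap is a binned statistic: the points are dropped into a regular longitude/latitude grid and each cell shows the mean elevation of the points inside it, that is, the sum of elevations per cell divided by the point count per cell. The NumPy sketch below computes that statistic on in-memory arrays to make the idea explicit; Vaex does the equivalent out of core. The helper `binned_mean` and the toy data are hypothetical.

```python
import numpy as np


def binned_mean(x, y, values, bins, limits):
    """Per-cell mean of `values` on a regular (x, y) grid; empty cells are NaN."""
    # Sum of values falling in each cell.
    sums, xedges, yedges = np.histogram2d(x, y, bins=bins, range=limits, weights=values)
    # Number of points falling in each cell.
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=limits)
    with np.errstate(invalid="ignore"):
        means = sums / counts  # 0 / 0 -> NaN for empty cells
    return means, xedges, yedges


# Made-up points spanning roughly the lon/lat range in the describe() output.
rng = np.random.default_rng(42)
lon = rng.uniform(-50.65, -48.96, size=100_000)
lat = rng.uniform(68.84, 69.27, size=100_000)
elev = 20.0 + 600.0 * (lon + 50.65)  # toy elevation that increases eastward

limits = [[lon.min(), lon.max()], [lat.min(), lat.max()]]  # analogous to limits='minmax'
grid, lon_edges, lat_edges = binned_mean(lon, lat, elev, bins=64, limits=limits)
print(grid.shape)
```

`np.histogram2d` puts `x` along the first axis, so to display the grid with `plt.imshow` pass `grid.T` with `origin='lower'`.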

In [6]:
%matplotlib widget
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

# Map the decimated points on a north polar stereographic projection.
plt.figure(figsize=(10, 8), dpi=90)
ax = plt.axes(projection=ccrs.NorthPolarStereo(central_longitude=0))
ax.coastlines(resolution='50m', color='black', linewidth=1)
ax.set_extent([-50, -40, 60, 90], ccrs.PlateCarree())
# The coordinates are plain lon/lat, so cartopy must transform them from PlateCarree.
plt.scatter(df_decimated.longitude.values,
            df_decimated.latitude.values,
            c=df_decimated.elevation.values,
            cmap='viridis',
            vmin=100, vmax=1000,
            transform=ccrs.PlateCarree())
plt.colorbar(label='elevation (m)', shrink=0.5, extend='both')

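The map plots `df_decimated`, which keeps the rows whose running `index` is a multiple of 100. On a plain NumPy array this is the same selection as a `[::100]` stride; the sketch checks that equivalence on an index array with the granule's row count, which leaves about 1% of the points to draw.

```python
import numpy as np

n_rows = 6_676_308  # point count reported by describe()
index = np.arange(n_rows)

# The notebook's filter: keep rows whose running index is a multiple of 100.
mask = index % 100 == 0
kept_by_mask = index[mask]

# Equivalent strided slice on a plain NumPy array.
kept_by_slice = index[::100]

print(mask.sum(), len(kept_by_slice))
```

In Vaex the mask form is used because the filter is evaluated lazily on the virtual `index` column, without materializing a copy of the data.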

In [7]:
%matplotlib widget
from ipywidgets import widgets, interact
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
import matplotlib.pyplot as plt


fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111, projection='3d')
ax.view_init(70, 70)  # camera elevation and azimuth, in degrees

def plot_func(alongtrack):
    step = 5000  # number of points drawn per window
    m = int(alongtrack * step)  # first row of the window
    # Slice the rows first so only this window is evaluated, not the whole column.
    window = df[m:m + step]
    elevation = window.elevation.values
    ax.clear()
    ax.scatter(window.longitude.values,
               window.latitude.values,
               elevation,
               c=elevation,
               cmap='viridis', s=1)
    ax.axis('tight')
    fig.canvas.draw_idle()  # redraw the widget canvas after changing the axes


interact(plot_func, alongtrack=widgets.FloatSlider(value=0,
                                                   description='Along Track Steps',
                                                   min=0,
                                                   max=90,
                                                   step=0.3,
                                                   layout={'width': '100%'}))

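In `plot_func`, slider position `p` draws rows `int(p * 5000)` up to `int(p * 5000) + 5000`, so with `max=90` the widget never goes past row 455,000 of the 6,676,308 points. The hypothetical helpers below (`window_bounds` and `slider_max` are not part of the notebook) make that mapping explicit, clamp the window to the data, and compute the slider maximum that would reach the end of the track.

```python
def window_bounds(position, step, n_rows):
    """Rows [start, stop) drawn for a slider position measured in units of `step` rows."""
    start = int(position * step)
    # Clamp so the window stays inside the data.
    start = min(max(start, 0), max(n_rows - step, 0))
    stop = min(start + step, n_rows)
    return start, stop


def slider_max(step, n_rows):
    """Slider position whose window ends exactly at the last row."""
    return (n_rows - step) / step


n_rows = 6_676_308  # point count reported by describe()
step = 5000  # points per window, as in plot_func

print(window_bounds(90, step, n_rows))  # window at the notebook's max=90
print(slider_max(step, n_rows))         # maximum needed to reach the end of the track
```

If the goal is to browse the whole flight line, the slider's `max` could be set from `slider_max(step, len(df))` instead of a fixed 90.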

## Time Series