# Visualizing Flight Test Data Interactively With Open Source Tools
---
## Society of Flight Test Engineers 49th Annual International Symposium
### 9 October 2018, Savannah GA
### Luke Starnes (GTRI)

# Agenda
* OSS Value Proposition
* ADS-B Background
* Tooling Overview
* Examples

# OSS Value Proposition


* Proprietary data analysis tools are expensive and create “vendor lock”
* Walled Garden

 ![](images/walled_garden.jpg)

* Open source tools are a superior choice for today’s flight test analysis problems
* open interfaces
* widespread compatibility (community of interoperable tools)
* seamless migration between tools (no “vendor lock”)
* flexibility and agility

# Open Flight Data as Lens for Talking OSS Tooling


<div align="center"><table><tr><td><img src='images/adsb.png'></td><td><img src='images/lens.png'></td><td><img src='images/osi_logo.png'></td></tr></table></div>


# ADS-B Background
* Automatic Dependent Surveillance-Broadcast
* Aircraft system for broadcasting identification and position data
* Facilitated by ubiquity of GPS
* Driven by cost of maintaining ATC radars
* ADS-B mandated in the US starting Jan 1, 2020
 * required for aircraft operating above 10,000 ft, around airports, or off the Gulf of Mexico
* European mandate starts Jan 1, 2019

* ADS-B is line of sight: it requires a network of ground stations to receive reports (min ~100 NM)

<div align="center"><img src="images/adsb_ground_stations.png"></div>

* Transmissions are unencrypted
* Thus a preponderance of...

<div align="center"><img src="images/prostick.jpg"></div>

<div align="center"><img src="images/planefinder.png"></div>
<div align="center"><sup>Source: [planefinder.net](https://planefinder.net/)</sup></div>

Other similar sites include [flightradar24.com](https://www.flightradar24.com/), [flightaware.com](https://flightaware.com/), and [adsbexchange.com](https://www.adsbexchange.com/).

* ADS-B Exchange ([adsbexchange.com](https://www.adsbexchange.com/)) provides public access to its worldwide dataset (begins June 9, 2016)
<div align="center"><img src="images/adsbexchange_logo_full.png"></div>
* Data made available as JSON
* Each day is a single zip file with 1,440 JSON files (1 file per minute)
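
One file per minute gives 24 × 60 = 1,440 files per day. A minimal sketch of walking one day's archive without unpacking it to disk follows. It assumes each per-minute JSON file keeps its aircraft reports in a top-level `acList` array, which is an assumption about the file layout rather than something stated above. The function name `iter_reports` is made up for illustration.

```python
import json
import zipfile


def iter_reports(zip_source, list_key="acList"):
    """Yield (member_name, report_dict) for every aircraft report in a daily archive.

    zip_source is a path or a file-like object.
    list_key is an ASSUMED name for the array of reports inside each JSON file.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Sort member names so the per-minute files are visited in time order
        for name in sorted(archive.namelist()):
            if not name.endswith(".json"):
                continue
            with archive.open(name) as member:
                payload = json.load(member)
            for report in payload.get(list_key, []):
                yield name, report
```

The yielded dictionaries can be passed straight to `pd.DataFrame(...)` to build one table per day, for example `pd.DataFrame(r for _, r in iter_reports(path))` with `path` pointing at a downloaded archive.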

# OSS Tool Stack
* __Hierarchical Data Format 5 (HDF5)__ - multiplatform, efficient, fast data storage format, metadata support
* __Pandas__ - robust tool for accessing, transforming, and analyzing tabular data
* __Luigi__ - pipelining tool for managing complex pipelines with inter-dependent steps
* __Jupyter__ - (*this*) web tool for integrating code, documentation, and visualization into narrative notebook
* __Bokeh__ - browser-based interactive visualization tool
* __Datashader__ - plotting tool for visualizing large datasets (points >> pixels)
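
The "points >> pixels" case can be illustrated without Datashader itself. When there are far more points than pixels, each pixel is colored by an aggregate (here a simple count) of the points that land in it. The sketch below uses NumPy's `histogram2d` as a stand-in for that aggregation step. It is not Datashader's API, and the function name and synthetic coordinates are made up for illustration.

```python
import numpy as np


def rasterize_counts(x, y, width, height):
    """Count points per pixel on a width x height grid spanning the data extent."""
    counts, _, _ = np.histogram2d(
        y, x,
        bins=(height, width),
        range=[[y.min(), y.max()], [x.min(), x.max()]],
    )
    return counts  # shape (height, width), indexed as counts[row, col]


rng = np.random.default_rng(0)
lon = rng.uniform(-125.0, -65.0, size=1_000_000)  # synthetic longitudes
lat = rng.uniform(25.0, 50.0, size=1_000_000)     # synthetic latitudes

# One million points reduced to a 300 x 200 image of per-pixel counts
grid = rasterize_counts(lon, lat, width=300, height=200)
```

The resulting grid has a fixed size regardless of how many points went in, which is why this approach scales to datasets that would overwhelm a scatter plot.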

# Pandas

In [4]:
import pandas as pd
import os

# THIS ASSUMES YOU PRE MAKE FILE... BUT DISCONNECT CURRENTLY

In [31]:
# %%time
# row_len = 500_000
# h5_dir = r'c:\adsb'
# h5_files = [os.path.join(h5_dir, f) for f in os.listdir(h5_dir) if f.endswith('.h5')]
# h5_file = h5_files[2]
# pickle_name = f'{os.path.basename(h5_file)}-{row_len}.p'
# pickle_path = os.path.join(os.getcwd(), 'data', pickle_name)
# if os.path.exists(pickle_path):
#     df = pd.read_pickle(pickle_path)
# else:
#     with pd.HDFStore(h5_file) as store:
#         df = store.select('data', stop = row_len)
#         df.to_pickle(pickle_path)
df = pd.read_pickle('data/2018-06-11.h5-500000.p')
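
The commented-out lines in the cell above follow a load-or-build caching pattern: read the pickle if it exists, otherwise build the frame from the slower source and save it. A small sketch of that pattern as a reusable helper is shown here. The name `load_cached` and the `loader` callback are hypothetical and not part of the original notebook.

```python
import os

import pandas as pd


def load_cached(pickle_path, loader):
    """Return the DataFrame cached at pickle_path, building it with loader() if missing."""
    if os.path.exists(pickle_path):
        return pd.read_pickle(pickle_path)
    df = loader()
    # Create the cache directory on first use so to_pickle does not fail
    os.makedirs(os.path.dirname(pickle_path) or ".", exist_ok=True)
    df.to_pickle(pickle_path)
    return df
```

In the notebook's setting, `loader` would wrap the `pd.HDFStore(...).select('data', stop=row_len)` call, so the HDF5 file is only needed the first time.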

In [32]:
print(df.shape)
df.head()

(500000, 62)


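| | Alt | AltT | Bad | CMsgs | CNum | Call | CallSus | Cos | Cou | EngMount | ... | To | Trak | TrkH | Trt | Type | Vsi | VsiT | WTC | Year | Cot |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40000.0 | 0 | False | 4 | | | False | | United States | 0 | ... | | | False | 1 | | | 0 | 0 | | 0 |
| 1 | 12728.0 | 0 | False | 3 | | DAL66 | False | | United States | 0 | ... | LSZH | 72.0 | False | 2 | | 0.0 | 0 | 0 | | 0 |
| 2 | 29300.0 | 0 | False | 3 | 1110.0 | FE001 | False | | Taiwan | 0 | ... | | 230.5 | False | 2 | AT76 | 0.0 | 0 | 2 | 2013.0 | 0 |
| 3 | 33000.0 | 0 | False | 3 | 17000658.0 | QXE2544 | False | | United States | 0 | ... | | 70.0 | False | 2 | | 64.0 | 0 | 0 | 2017.0 | 0 |
| 4 | 36000.0 | 0 | False | 3 | | DAL20 | False | | United States | 0 | ... | LFPG | 72.0 | False | 2 | | 0.0 | 0 | 0 | | 0 |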


In [33]:
df['Man'].value_counts()[:10]

Boeing                  126646
Airbus                   96326
Embraer                  28717
Bombardier               26020
Cessna                    7452
McDonnell Douglas         5128
Textron Aviation          3024
Gulfstream Aerospace      2825
Piper                     2565
Dassault                  1867
Name: Man, dtype: int64

In [34]:
df.groupby('Op').agg({'Icao': pd.Series.nunique}).sort_values('Icao', ascending=False)[:10]

| Op | Icao |
|---|---|
| American Airlines | 448 |
| Southwest Airlines | 441 |
| United Airlines | 336 |
| Delta Air Lines | 317 |
| WELLS FARGO BANK NORTHWEST NA TRUSTEE - S | 308 |
| DELTA AIR LINES INC - ATLANTA, GA | 210 |
| Private | 201 |
| JetBlue Airways | 133 |
| LATAM Airlines | 108 |
| Air Canada | 107 |
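
The `Icao` column here counts distinct aircraft per operator, not the number of position reports. A toy illustration of the difference, using made-up operators and addresses rather than real data:

```python
import pandas as pd

# Five position reports from three distinct airframes (made-up values)
reports = pd.DataFrame({
    "Op":   ["Airline A", "Airline A", "Airline A", "Airline B", "Airline B"],
    "Icao": ["A00001",    "A00001",    "A00002",    "B00001",    "B00001"],
})

rows_per_op = reports.groupby("Op").size()                 # reports received
aircraft_per_op = reports.groupby("Op")["Icao"].nunique()  # distinct airframes
```

Here "Airline A" has three reports but only two aircraft, and "Airline B" has two reports from a single aircraft.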


In [37]:
# Operator names must match the 'Op' values exactly (see the operator table above).
# 'Spirit Airlines' and 'Frontier Airlines' are assumed spellings; check df['Op'].unique().
airlines = ['Southwest Airlines', 'American Airlines', 'Delta Air Lines', 'United Airlines',
            'JetBlue Airways', 'Spirit Airlines', 'Frontier Airlines']
airlines_filter = df['Op'].isin(airlines)
# Group by a list of keys (newer pandas treats a tuple as a single key), then pivot operators into columns
table = df[airlines_filter].groupby(['Op', 'Type'])['Icao'].nunique().unstack('Op')
table['Total'] = table.sum(skipna=True, axis=1).map(int)
table.fillna('').sort_values('Total', ascending=False)[:10]

# Bokeh

# Why Visualization is important
<div align="center"><img src="images/anscombe's_quartet.png"></div>
<div align="center">Anscombe's quartet - dataset consisting of four sets of points which are all statistically similar, but visually varied.</div>
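
The "statistically similar" claim can be checked directly. The sketch below computes the mean of x, the mean of y, and the Pearson correlation for each of the four sets, using the standard published quartet values and only the Python standard library. The helper name `pearson_r` is made up for illustration.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by sets I, II and III
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# (mean of x, mean of y, correlation) per set
summary = {
    name: (mean(xs), mean(ys), pearson_r(xs, ys))
    for name, (xs, ys) in QUARTET.items()
}
```

All four sets come out with a mean x of 9, a mean y of about 7.5, and a correlation of about 0.82, even though their scatter plots look nothing alike.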


# Bokeh Examples
* [Heat Maps](/notebooks/GitHub/sfte2018-adsb/Bokeh - Heat Map.ipynb)
* [Flight Data with Map Tiles](/notebooks/GitHub/sfte2018-adsb/Flight Data with Map Tiles.ipynb)
* [Flight Data with Google Maps](/notebooks/GitHub/sfte2018-adsb/Flight Data with Google Maps.ipynb)

# Datashader

# Datashader Examples
* [Worldwide Viz with Datashader](/notebooks/GitHub/sfte2018-adsb/Worldwide Viz with Datashader.ipynb)
* [Interactive Datashader](/notebooks/GitHub/sfte2018-adsb/Interactive Datashader.ipynb)