# Visualizing Flight Test Data Interactively With Open Source Tools
---
## Society of Flight Test Engineers 49th Annual International Symposium
### 9 October 2018, Savannah GA
### Luke Starnes (GTRI)

# Agenda
* OSS Value Proposition
* ADS-B Background
* Tooling Overview
* Examples

# OSS Value Proposition


* Proprietary data analysis tools are expensive and create “vendor lock”
* Walled Garden

 ![](images/walled_garden.jpg)

# OSS Value Proposition

* open source tools are a superior choice for today’s flight test analysis problems
* open interfaces
* widespread compatibility (community of interoperable tools)
* seamless migration between tools (no “vendor lock”)
* flexibility and agility

# Open Flight Data as a Lens for Discussing OSS Tooling


<div align="center"><table><tr><td><img src='images/adsb.png'></td><td><img src='images/lens.png'></td><td><img src='images/osi_logo.png'></td></tr></table></div>


# ADS-B Background
* Automatic Dependent Surveillance-Broadcast
* Aircraft system for broadcasting identification and position data
* Facilitated by ubiquity of GPS
* Driven by cost of maintaining ATC radars
* ADS-B mandated in US starting Jan 1, 2020
 * required for aircraft operating above 10k', around airports, or over the Gulf of Mexico
* European mandate starts Jan 1, 2019

* ADS-B is line of sight - requires a network of ground stations to receive reports (min ~100NM)

<div align="center"><img src="images/adsb_ground_stations.png"></div>

* Transmissions are unencrypted
* Thus a preponderance of...

<div align="center"><img src="images/prostick.jpg" height=30% width=30%></div>

* ADS-B is line of sight - requires a network of ground stations to receive reports (min ~100NM)
* Transmissions are unencrypted
* Thus a preponderance of... 

<div align="center"><img src="images/planefinder.png"></div>
<div align="center"><sup>Source: [planefinder.net](https://planefinder.net/)</sup></div>

Other similar sites include [flightradar24.com](https://www.flightradar24.com/), [flightaware.com](https://flightaware.com/), and [adsbexchange.com](https://www.adsbexchange.com/).

* ADS-B Exchange ([adsbexchange.com](https://www.adsbexchange.com/)) provides public access to its worldwide dataset (begins June 9, 2016)
<div align="center"><img src="images/adsbexchange_logo_full.png"></div>
* Data made available as JSON
* Each day is a single zip file with 1,440 JSON files (1 file per minute)
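
A minimal sketch of walking one daily archive with the standard library is shown below. It assumes each per-minute JSON file keeps its aircraft reports in a top-level `acList` list; that key, the file names, and the demo records are illustrative assumptions, not a confirmed description of the ADS-B Exchange format.

```python
import io
import json
import zipfile


def iter_aircraft_records(zip_source):
    """Yield one dict per aircraft report from every JSON file in a daily archive.

    Assumption: each JSON file stores its reports in a top-level "acList" list.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith('.json'):
                continue
            with archive.open(name) as fh:
                snapshot = json.load(fh)
            for record in snapshot.get('acList', []):
                yield record


# Build a tiny stand-in archive in memory: two "minutes" of made-up reports.
demo_zip = io.BytesIO()
with zipfile.ZipFile(demo_zip, 'w') as archive:
    archive.writestr('minute-0000.json', json.dumps(
        {'acList': [{'Icao': 'AB1FFE', 'Op': 'Delta'}]}))
    archive.writestr('minute-0001.json', json.dumps(
        {'acList': [{'Icao': 'AB1FFE', 'Op': 'Delta'},
                    {'Icao': 'A56D30', 'Op': 'Robinson Helicopter Company'}]}))
demo_zip.seek(0)

records = list(iter_aircraft_records(demo_zip))
print(len(records), 'reports read')
```

The resulting list of dicts can be passed straight to `pd.DataFrame(records)`; a generator keeps memory use low when only a few fields per report are needed.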

# OSS Tool Stack
* __Hierarchical Data Format 5 (HDF5)__ - multiplatform, efficient, fast data storage format with metadata support
* __Pandas__ - robust tool for accessing, transforming, and analyzing tabular data
* __Luigi__ - pipelining tool for managing complex pipelines with inter-dependent steps
* __Jupyter__ - (*this*) web tool for integrating code, documentation, and visualization into narrative notebook
* __Bokeh__ - browser-based interactive visualization tool
* __Datashader__ - plotting tool for visualizing large datasets (points >> pixels)

# Pandas

In [2]:
import pandas as pd
import os

In [3]:
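%%time
# Work on the first 500,000 rows of one day's data; cache the subset as a pickle for fast reloads.
row_len = 500_000
h5_dir = r'c:\adsb'
h5_file = os.path.join(h5_dir, '2018-06-16.h5')
pickle_name = f'{os.path.basename(h5_file)}-{row_len}.p'
pickle_path = os.path.join(os.getcwd(), 'data', pickle_name)
if os.path.exists(pickle_path):
    df = pd.read_pickle(pickle_path)
else:
    # Read only the needed rows and columns from the HDF5 store.
    with pd.HDFStore(h5_file, mode='r') as store:
        df = store.select('data', stop=row_len, columns=['Man', 'Icao', 'Type', 'Op'])
    # Collapse operator-name variants into one short name per major operator.
    main_ops = ['Southwest', 'American', 'Delta', 'SkyWest', 'Air Canada', 'Alaska',
                'Virgin', 'United', 'JetBlue', 'Spirit', 'Frontier', 'Wells Fargo']
    for o in main_ops:
        df.loc[df.Op.fillna('-').str.lower().str.contains(o.lower()), 'Op'] = o
    # Make sure the cache directory exists before writing the pickle.
    os.makedirs(os.path.dirname(pickle_path), exist_ok=True)
    df.to_pickle(pickle_path)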

Wall time: 644 ms
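
The operator clean-up in the cell above is a case-insensitive substring match. The sketch below shows the same idea on a small made-up Series; `normalize_operators` is a hypothetical helper, and it lowercases the column once up front rather than on every loop iteration.

```python
import pandas as pd


def normalize_operators(ops, main_ops):
    """Replace any operator string that contains a known name with that name."""
    ops = ops.copy()
    # Missing operators become '-' so the string methods never see NaN.
    lowered = ops.fillna('-').str.lower()
    for name in main_ops:
        ops[lowered.str.contains(name.lower(), regex=False)] = name
    return ops


# Made-up operator strings, including a missing value.
raw = pd.Series(['Delta Air Lines', 'DELTA AIR LINES INC', None,
                 'Southwest Airlines', 'Robinson Helicopter Company'])
clean = normalize_operators(raw, ['Southwest', 'Delta'])
print(clean)
```

Operators that match none of the names, and missing values, pass through unchanged.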


In [4]:
print(df.shape)
df.dropna(how='any').head()

(500000, 4)


| | Man | Icao | Type | Op |
|---|---|---|---|---|
| 1 | Raytheon Aircraft Company | A3286B | BE40 | MOSER AVIATION LLC - ENGLEWOOD, CO |
| 3 | Boeing | AB1FFE | B739 | Delta |
| 6 | Robinson | A56D30 | R44 | Robinson Helicopter Company |
| 10 | Airbus | 424356 | A320 | Aeroflot Russian Airlines |
| 13 | McDonnell Douglas | AD8563 | MD83 | Wells Fargo |


In [51]:
df['Man'].value_counts()[:10]

Boeing                          120932
Airbus                           92420
Embraer                          27713
Bombardier                       26085
Cessna                            9963
McDonnell Douglas                 5777
Gulfstream Aerospace              2578
Beech                             2491
Piper                             2365
Avions de Transport Regional      2350
Name: Man, dtype: int64

In [53]:
df.groupby('Op').agg({'Icao': pd.Series.nunique}).sort_values('Icao', ascending=False)[:10]

| Op | Icao |
|---|---|
| American | 649 |
| Delta | 573 |
| United | 537 |
| Southwest | 483 |
| Wells Fargo | 327 |
| Private | 234 |
| Air Canada | 226 |
| JetBlue | 150 |
| Virgin | 132 |
| SkyWest | 126 |


In [47]:
airlines_filter = df['Op'].isin(df.Op.value_counts().index[:10])
table = df[airlines_filter].groupby(['Op','Type']).agg({'Icao': pd.Series.nunique}).unstack().T
table['Total'] = table.sum(skipna=True, axis=1).map(int)
table.sort_values('Total', ascending=False).fillna('')[:10]

Unique `Icao` count per aircraft type (rows) and operator (columns):

| Type | Air Canada | American | Delta | JetBlue | Private | SkyWest | Southwest | United | Virgin | Wells Fargo | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| B738 | | 172.0 | 52.0 | | | | 142.0 | 48.0 | 53.0 | 21 | 488 |
| B737 | | | 4.0 | | | | 317.0 | 17.0 | 2.0 | 18 | 358 |
| A320 | 36.0 | 24.0 | 37.0 | 83.0 | | | | 57.0 | 35.0 | 10 | 282 |
| A321 | 12.0 | 141.0 | 40.0 | 37.0 | | | | | | 4 | 234 |
| A319 | 22.0 | 55.0 | 33.0 | | | | | 45.0 | 7.0 | 13 | 175 |
| B739 | | | 64.0 | | | | | 80.0 | | 1 | 145 |
| CRJ9 | 11.0 | 7.0 | 60.0 | | | 10.0 | | | | 25 | 113 |
| E145 | | 50.0 | | | | | | 7.0 | | 36 | 93 |
| B752 | | 9.0 | 50.0 | | | | | 29.0 | | 4 | 92 |
| CRJ7 | | 13.0 | 10.0 | | | 38.0 | | 9.0 | | 21 | 91 |
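
The type-by-operator table can also be built with `pivot_table`, which avoids the `unstack().T` step and the extra `Icao` index level. The sketch below uses a small made-up frame with the same column names; it is one possible variation, not the notebook's own code.

```python
import pandas as pd

# Made-up reports: the same aircraft (Icao) can appear in many rows.
reports = pd.DataFrame({
    'Icao': ['A1', 'A1', 'A2', 'B1', 'B2', 'B3'],
    'Type': ['B738', 'B738', 'B738', 'B738', 'A320', 'A320'],
    'Op':   ['Delta', 'Delta', 'Delta', 'United', 'United', 'United'],
})

# Count distinct airframes per (Type, Op); combinations with no aircraft stay NaN.
fleet = reports.pivot_table(index='Type', columns='Op', values='Icao',
                            aggfunc=pd.Series.nunique)
fleet['Total'] = fleet.sum(axis=1).astype(int)
fleet = fleet.sort_values('Total', ascending=False)
print(fleet.fillna(''))
```

Counting with `nunique` rather than `count` matters here because each aircraft broadcasts many reports per day.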


# Bokeh

# Why Visualization is important
<div align="center"><img src="images/anscombe's_quartet.png"></div>
<div align="center">Anscombe's quartet - four datasets with nearly identical summary statistics (mean, variance, correlation) that look very different when plotted.</div>
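
The claim is easy to check numerically. The sketch below computes the summary statistics for the four published Anscombe datasets using only the standard library; `pearson` is a small helper written for this example.

```python
from statistics import mean, variance

# Anscombe's quartet: sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    'I':   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    'II':  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    'III': (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    'IV':  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


summary = {
    name: {'mean_x': mean(xs), 'var_x': variance(xs),
           'mean_y': mean(ys), 'var_y': variance(ys), 'r': pearson(xs, ys)}
    for name, (xs, ys) in ANSCOMBE.items()
}
for name, stats in summary.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

All four rows print nearly the same numbers, which is why the differences only become apparent in a plot.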


# Bokeh Examples
* [Heat Maps](/notebooks/GitHub/sfte2018-adsb/Bokeh - Heat Map.ipynb)
* [Flight Data with Map Tiles](/notebooks/GitHub/sfte2018-adsb/Flight Data with Map Tiles.ipynb)
* [Flight Data with Google Maps](/notebooks/GitHub/sfte2018-adsb/Flight Data with Google Maps.ipynb)

# Datashader - The Why

<div align="center"><img src="images/datashader-plotting-pitfalls.png"></div>


* overplotting - points drawn later occlude the points beneath them
* oversaturation - even after adding transparency, details may still not be fully discernible; oversaturation obscures spatial differences in density
* undersampling - making the dots very small reduces overlap, but can hide important detail
* undersaturation - in trying to fix the problems above, it is easy to leave sparse regions too faint to see
* underutilized range - color must be applied as a function of count in a way that conveys the most information
* nonuniform colormapping - the importance of using a color palette that shows detail
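
The general remedy for these pitfalls is to aggregate points into per-pixel counts first and only then map counts to color. The sketch below illustrates that idea with NumPy alone. It is not Datashader's API, and `aggregate_points` and `eq_hist` are hypothetical helper names.

```python
import numpy as np


def aggregate_points(x, y, width=300, height=200):
    """Bin points into a (height, width) grid of per-pixel counts."""
    counts, _, _ = np.histogram2d(y, x, bins=(height, width))
    return counts


def eq_hist(counts):
    """Map counts to (0, 1] by rank among distinct nonzero counts.

    Rank-based scaling (histogram equalization) spreads the color range
    evenly over the counts that actually occur; empty pixels stay at 0.
    """
    shaded = np.zeros_like(counts, dtype=float)
    nonzero = counts > 0
    # levels: sorted distinct counts; inverse: index of each pixel's count in levels.
    levels, inverse = np.unique(counts[nonzero], return_inverse=True)
    shaded[nonzero] = (inverse + 1) / len(levels)
    return shaded


# Synthetic stand-in for position reports: far more points than pixels.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

counts = aggregate_points(x, y)
shaded = eq_hist(counts)
print(counts.shape, int(counts.sum()), shaded.max())
```

Because every point lands in exactly one pixel, nothing is occluded, and the rank-based mapping keeps both sparse and dense regions visible in the same image.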

# Datashader Examples
* [Worldwide Viz with Datashader](/notebooks/GitHub/sfte2018-adsb/Worldwide Viz with Datashader.ipynb)
* [Interactive Datashader](/notebooks/GitHub/sfte2018-adsb/Interactive Datashader.ipynb)

# Conclusion

* open source tools are a superior choice for today’s flight test analysis problems
* open interfaces
* widespread compatibility (community of interoperable tools)
* seamless migration between tools (no “vendor lock”)
* flexibility and agility

### Slides / Notebooks available here:

* https://github.com/slstarnes/sfte2018-adsb