# OpenSky flight trajectories

Flight path information for commercial flights is available for some regions of the USA and Europe from the crowd-sourced [OpenSky Network](https://opensky-network.org/). OpenSky collects data from a large number of users who monitor public air-traffic control information. Here we will use a subset of the data, polled from the OpenSky REST API at one-minute intervals over 4 days between September 5 and 13, 2016, using the scripts shown at the end of this notebook. Unfortunately, we are not allowed to redistribute this data (1.1GB as a database, 600MB in HDF5), but you can run those scripts to collect some yourself, or you can contact OpenSky to ask for a copy of the dataset.

We'll use only some of the fields provided by OpenSky, which are: *icao24, callsign, origin, time_position, time_velocity, longitude, latitude, altitude, on_ground, velocity, heading, vertical_rate, sensors, timestamp*

If you are able to get a copy of the data, you can create an environment with all the packages required to run this notebook using `conda env create opensky.ipynb`, then switch to it with `source activate opensky` and launch Jupyter Notebook from there.

Here, we load the data and declare that some fields are categorical (the HDF5 file does not preserve this information):

In [None]:
%%time
import pandas as pd

flightpaths = pd.read_hdf('data/opensky.h5', 'flights')
flightpaths['origin']    = flightpaths.origin.astype('category')
flightpaths['on_ground'] = flightpaths.on_ground.astype('category')
flightpaths['ascending'] = flightpaths.ascending.astype('category')
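Datashader's `count_cat` reduction, used later in this notebook, expects a pandas categorical column, which is why the dtype is restored after loading. A minimal sketch on a toy frame (the values are invented, not taken from the OpenSky data) showing what the conversion does:

```python
import pandas as pd

# Toy stand-in for the flightpaths table; the values are made up for illustration
toy = pd.DataFrame({'origin': ['Germany', 'France', 'Germany', 'United Kingdom'],
                    'ascending': [True, False, False, True]})

# Each distinct value becomes a category; the column then stores small
# integer codes that index into the (sorted) list of categories
toy['origin'] = toy.origin.astype('category')
toy['ascending'] = toy.ascending.astype('category')

print(list(toy.origin.cat.categories))  # ['France', 'Germany', 'United Kingdom']
print(list(toy.origin.cat.codes))       # [1, 0, 1, 2]
```

The category order matters later on: it determines which entry of a color key each category receives.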

In [None]:
flightpaths.tail()

The default database has about 10 million points, with some metadata for each.  

Now let's define a datashader-based processing pipeline to render images:

In [None]:
import datashader as ds
import datashader.transfer_functions as tf
from colorcet import fire
from matplotlib.colors import rgb2hex
from matplotlib.pyplot import get_cmap  # matplotlib.cm.get_cmap was removed in Matplotlib 3.9

import numpy as np
from cartopy import crs

# Output size in pixels and the default view (Web Mercator coordinates, roughly covering Europe)
plot_width  = 850
plot_height = 600
x_range = (-2.0e6, 2.5e6)
y_range = (4.1e6, 7.8e6)

def categorical_color_key(ncats, cmap):
    """Generate a color key from the given colormap with the requested number of colors"""
    mapper = get_cmap(cmap)
    return [str(rgb2hex(mapper(i))) for i in np.linspace(0, 1, ncats)]

def create_image(x_range=x_range, y_range=y_range, w=plot_width, h=plot_height,
                 aggregator=ds.count(), categorical=None, black=False, cmap="blue"):
    """Aggregate all flight paths as lines onto a w x h canvas and shade the result"""
    opts = {}
    if categorical and cmap:
        # One color per category, sampled evenly from the named colormap
        ncats = len(flightpaths[aggregator.column].unique())
        opts['color_key'] = categorical_color_key(ncats, cmap)

    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    agg = cvs.line(flightpaths, 'longitude', 'latitude', aggregator)
    img = tf.shade(agg, cmap=cmap, **opts)

    if black:
        img = tf.set_background(img, 'black')
    return img
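`categorical_color_key` picks `ncats` evenly spaced positions along a colormap and converts the color at each position to a hex string. The sketch below shows the same idea without matplotlib, using a hypothetical two-color linear ramp (blue to red) in place of a real colormap; `ramp_color_key` is a name introduced here:

```python
import numpy as np

def ramp_color_key(ncats, start=(0, 0, 255), end=(255, 0, 0)):
    """Sample ncats evenly spaced colors from a linear RGB ramp (a hypothetical
    stand-in for a matplotlib colormap) and return them as hex strings."""
    start, end = np.array(start, dtype=float), np.array(end, dtype=float)
    keys = []
    for t in np.linspace(0, 1, ncats):  # same positions categorical_color_key samples
        r, g, b = (int(v) for v in np.rint(start + t * (end - start)))
        keys.append('#{:02x}{:02x}{:02x}'.format(r, g, b))
    return keys

print(ramp_color_key(3))  # ['#0000ff', '#800080', '#ff0000']
```

Because the positions are spread over the whole colormap, adding categories changes every color in the key, not just the last one.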

We can use this function to render all of the trajectory information at once:

In [None]:
%%time
create_image(aggregator=ds.count(), cmap=fire, black=True)

This plot shows all of the trajectories in this database, overlaid in a way that avoids [overplotting](https://anaconda.org/jbednar/plotting_pitfalls/notebook).  With this "fire" color map, a single trajectory shows up as black, while increasing levels of overlap show up as brighter colors.  
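Overlap levels remain distinguishable because `tf.shade` by default maps counts to colors with histogram equalization (`how='eq_hist'`) rather than linearly. A simplified sketch of the idea, not datashader's actual implementation: each nonzero count is mapped to its position in the distribution of nonzero pixel values, so every color level covers roughly the same number of pixels:

```python
import numpy as np

def eq_hist(counts):
    """Simplified histogram equalization: map each nonzero count to its position in the
    empirical CDF of nonzero pixel values, rescaled to [0, 1]; empty pixels become NaN."""
    counts = np.asarray(counts, dtype=float)
    out = np.full(counts.shape, np.nan)        # NaN = nothing drawn in this pixel
    mask = counts > 0
    if not mask.any():
        return out
    values, inverse = np.unique(counts[mask], return_inverse=True)
    # fraction of nonzero pixels whose count is <= each distinct value
    cdf = np.cumsum(np.bincount(inverse)) / mask.sum()
    if len(values) > 1:
        cdf = (cdf - cdf[0]) / (1 - cdf[0])    # lowest count -> 0, highest -> 1
    out[mask] = cdf[inverse]
    return out

# A linear scale would squeeze counts of 1 and 2 into a fraction of a percent of the
# range next to 1000; equalization gives the 2-count pixel the middle of the range.
levels = eq_hist([0, 1, 1, 2, 1000])  # values: nan, 0.0, 0.0, 0.5, 1.0
```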

A static image on its own like this is difficult to interpret, but if we overlay it on a map we can see where these flights originate, and can zoom in to see detail in specific regions:

In [None]:
from datashader.bokeh_ext import InteractiveImage
from bokeh.plotting import figure, output_notebook
from bokeh.tile_providers import WMTSTileSource

output_notebook()

def base_plot(tools='pan,wheel_zoom,reset',plot_width=plot_width, plot_height=plot_height,**plot_args):
    p = figure(tools=tools, plot_width=plot_width, plot_height=plot_height,
        x_range=x_range, y_range=y_range, outline_line_color=None,
        min_border=0, min_border_left=0, min_border_right=0,
        min_border_top=0, min_border_bottom=0, **plot_args)
    
    p.axis.visible = False
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None
    
    return p

ArcGIS=WMTSTileSource(url='http://server.arcgisonline.com/ArcGIS/rest/services/'
                      'World_Street_Map/MapServer/tile/{Z}/{Y}/{X}.png')

In [None]:
p = base_plot()
p.add_tile(ArcGIS)
InteractiveImage(p, create_image, aggregator=ds.count())

For example, try zooming in on London in the figure above; it has a lot of structure that is not visible in the initial rendering but appears once you zoom in. Note that zooming in reveals more detail in the datashader plot only if you are working with a live server; a static HTML view (e.g. on Anaconda Cloud) will update the underlying map tiles as you zoom, but not the data.

We can use the metadata associated with each trajectory to show additional information.  For instance, we can color each flight by its country of origin, using the key:

* **UK** - Orange
* **Germany** - Blue
* **Netherlands** - Teal
* **Switzerland** - Yellow
* **France** - Purple
* **Norway** - Green
* **USA** - Red

(There are actually more than a hundred different origins, so this key is only approximate.)

In [None]:
p = base_plot()
p.add_tile(ArcGIS)
InteractiveImage(p, create_image, categorical=True, aggregator=ds.count_cat('origin'), cmap='hsv_r')

Or we can color ascending (blue) vs. descending (red) flights, which is particularly informative when zooming in on specific airports:

In [None]:
p = base_plot()
p.add_tile(ArcGIS)
InteractiveImage(p, create_image, aggregator=ds.count_cat('ascending'), cmap=None)
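`ds.count_cat('ascending')` produces a separate count for each category in every pixel, and `tf.shade` then mixes the category colors according to each category's share of the pixel (with the total count roughly controlling opacity). The numpy sketch below illustrates the aggregation and color mixing for points rather than lines, and ignores opacity; `count_cat_sketch` and `blend` are hypothetical helpers, not datashader functions:

```python
import numpy as np

def count_cat_sketch(xs, ys, cats, ncats, w, h, x_range, y_range):
    """Count points per pixel and per category -> integer array of shape (h, w, ncats)."""
    xs, ys, cats = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float), np.asarray(cats)
    # drop points that fall outside the canvas
    inside = (xs >= x_range[0]) & (xs < x_range[1]) & (ys >= y_range[0]) & (ys < y_range[1])
    # map data coordinates to integer pixel indices (row 0 = lowest y)
    col = ((xs[inside] - x_range[0]) / (x_range[1] - x_range[0]) * w).astype(int)
    row = ((ys[inside] - y_range[0]) / (y_range[1] - y_range[0]) * h).astype(int)
    agg = np.zeros((h, w, ncats), dtype=int)
    np.add.at(agg, (row, col, cats[inside]), 1)  # unbuffered add: repeated pixels accumulate
    return agg

def blend(agg, colors):
    """Mix per-category RGB colors by each category's share of the pixel's total count."""
    colors = np.asarray(colors, dtype=float)     # shape (ncats, 3)
    total = agg.sum(axis=-1, keepdims=True)
    with np.errstate(invalid='ignore', divide='ignore'):
        weights = agg / total                    # NaN for empty pixels
    return weights @ colors                      # shape (h, w, 3)

# Hypothetical data: four points on a 2x1 canvas; category 0 = descending, 1 = ascending
agg = count_cat_sketch([0.1, 0.2, 0.6, 0.7], [0.5] * 4, [0, 1, 1, 1], ncats=2,
                       w=2, h=1, x_range=(0, 1), y_range=(0, 1))
rgb = blend(agg, [(255, 0, 0), (0, 0, 255)])  # red for descending, blue for ascending
# left pixel has one point of each kind -> purple; right pixel is all ascending -> blue
```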

Or we can show velocity, which of course decreases (dark colors) when approaching or leaving airports:

In [None]:
p = base_plot()
p.add_tile(ArcGIS)
InteractiveImage(p, create_image, aggregator=ds.mean('velocity'), cmap=fire)
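`ds.mean('velocity')` reduces all the samples that land in a pixel to their average, which is why slow segments near airports stand out. A rough numpy equivalent for points (datashader rasterizes lines instead, and `mean_agg` is a name introduced here) sums the values and the counts per pixel and divides, leaving empty pixels as NaN and skipping NaN values:

```python
import numpy as np

def mean_agg(xs, ys, values, w, h, x_range, y_range):
    """Per-pixel mean of `values` for points; NaN where a pixel receives no data."""
    xs, ys, values = (np.asarray(a, dtype=float) for a in (xs, ys, values))
    # keep only points inside the canvas that carry a valid value
    keep = ((xs >= x_range[0]) & (xs < x_range[1]) &
            (ys >= y_range[0]) & (ys < y_range[1]) & ~np.isnan(values))
    col = ((xs[keep] - x_range[0]) / (x_range[1] - x_range[0]) * w).astype(int)
    row = ((ys[keep] - y_range[0]) / (y_range[1] - y_range[0]) * h).astype(int)
    flat = row * w + col                                   # linear pixel index
    sums = np.bincount(flat, weights=values[keep], minlength=w * h)
    counts = np.bincount(flat, minlength=w * h)
    with np.errstate(invalid='ignore', divide='ignore'):
        means = sums / counts                              # 0/0 -> NaN for empty pixels
    return means.reshape(h, w)

# Hypothetical velocities on a 3x1 canvas; the last point has no velocity reported
m = mean_agg([0.1, 0.2, 0.6, 0.7], [0.5] * 4, [100.0, 200.0, 50.0, np.nan],
             w=3, h=1, x_range=(0, 1), y_range=(0, 1))
# m[0] is approximately [150.0, 50.0, nan]
```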

The flight patterns associated with each airport stand out in these close-ups of various cities; for instance, the circular holding patterns for landings (red) are clearly visible around the various airports in London:

In [None]:
import holoviews as hv
hv.notebook_extension()

In [None]:
%%output dpi=140
%%opts RGB [xaxis=None yaxis=None] Layout [hspace=0.1 vspace=0 sublabel_format=None]

def to_rgb(img):
    return np.flipud(img.view(dtype=np.uint8).reshape(img.shape[:2] + (4,)))

cities = {'Frankfurt' : (8.6821, 50.1109),
          'London'    : (-0.1278, 51.5074), 
          'Paris'     : (2.3522, 48.8566),
          'Amsterdam' : (4.8952, 52.3702),
          'Zurich'    : (8.5417, 47.3769),
          'Munich'    : (11.5820, 48.1351)}

radius = 150000  # half-width of each close-up, in Web Mercator units (nominally meters)

mercator_cities = {city: crs.GOOGLE_MERCATOR.transform_point(lon, lat, crs.PlateCarree())
                   for city, (lon, lat) in cities.items()}
city_ranges = {city: dict(x_range=(x-radius, x+radius), y_range=(y-radius, y+radius))
               for city, (x, y) in mercator_cities.items()}

hv.Layout([hv.RGB(to_rgb(create_image(aggregator=ds.count_cat('ascending'), black=True,
                                      categorical=True, w=300, h=300, cmap=None, **ranges).data), group=city)
                    for city, ranges in sorted(city_ranges.items())]).cols(3)
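Each close-up above is a square window centred on a city: the city's longitude/latitude is projected to Web Mercator with cartopy, and the window extends `radius` = 150,000 projected units in each direction (nominally meters, although Web Mercator stretches distances away from the equator, so the ground distance at these latitudes is smaller). As a dependency-free sketch, the spherical Web Mercator formulas (EPSG:3857, assuming the standard sphere radius of 6,378,137 m) should give essentially the same coordinates as cartopy's `GOOGLE_MERCATOR`; `to_web_mercator` and `square_range` are names introduced here:

```python
import math

EARTH_RADIUS = 6378137.0  # sphere radius assumed by Web Mercator (EPSG:3857), in meters

def to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator x/y."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

def square_range(lon, lat, radius=150000):
    """x/y ranges of a square extending +/- radius projected units around a point."""
    x, y = to_web_mercator(lon, lat)
    return dict(x_range=(x - radius, x + radius), y_range=(y - radius, y + radius))

london = square_range(-0.1278, 51.5074)  # x is about -14,227; y is about 6.71e6
```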

Or colorized by flight origin:

In [None]:
%%output dpi=140
%%opts RGB [xaxis=None yaxis=None] Layout [hspace=0.1 vspace=0 sublabel_format=None]

hv.Layout([hv.RGB(to_rgb(create_image(aggregator=ds.count_cat('origin'), black=True,
                                      categorical=True, w=300, h=300, cmap='hsv_r', **ranges).data), group=city)
                    for city, ranges in sorted(city_ranges.items())]).display('all').cols(3)
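The `to_rgb` helper used in these layouts relies on datashader images storing each pixel as one 32-bit integer with the RGBA bytes packed inside. Viewing the array as `uint8` exposes the four channels, and `np.flipud` reverses the row order, since datashader's row 0 corresponds to the bottom of the plot (lowest y) while an image array passed to `hv.RGB` is drawn with row 0 at the top. A small sketch, assuming a little-endian machine (where the red byte comes first in memory); `pack_rgba` is a helper introduced here:

```python
import sys
import numpy as np

def to_rgb(img):
    """Unpack an (h, w) uint32 RGBA image into an (h, w, 4) uint8 array, top row first."""
    return np.flipud(img.view(dtype=np.uint8).reshape(img.shape[:2] + (4,)))

def pack_rgba(r, g, b, a=255):
    """Pack one RGBA pixel into a uint32 with red in the lowest byte."""
    return np.uint32(r | (g << 8) | (b << 16) | (a << 24))

# The byte order of the unpacked channels depends on the platform
assert sys.byteorder == 'little', "this sketch assumes little-endian byte order"

# 2x1 image: row 0 (bottom of the plot) is red, row 1 (top) is blue
packed = np.array([[pack_rgba(255, 0, 0)], [pack_rgba(0, 0, 255)]], dtype=np.uint32)
rgb = to_rgb(packed)  # shape (2, 1, 4); the blue row now comes first
```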

The patterns for a single city can make a nice wallpaper for your desktop if you wish:

In [None]:
ranges = city_ranges["Zurich"]
create_image(aggregator=ds.count_cat('origin'), black=False,
             categorical=True, w=800, h=800, cmap='hsv_r', **ranges)

As you can see, datashader makes it quite easy to explore even large databases of trajectory information, without trial-and-error parameter tuning. These examples use about 10 million data points, but the same approach could work with [billions](http://anaconda.org/jbednar/osm/notebook) just as easily, covering long time ranges or large geographic areas. Check out the other [datashader notebooks](http://anaconda.org/jbednar/notebooks) for more examples!


## Downloading and preparing the data

The data was collected by a cron job that ran the following script once a minute over a four-day period:

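```python
import sqlite3
import requests
import pandas as pd

DB = 'data/flight.db'
api_url = 'https://opensky-network.org/api/states/all'

# Names for the leading fields of each OpenSky state vector
cols = ['icao24', 'callsign', 'origin', 'time_position',
        'time_velocity', 'longitude', 'latitude',
        'altitude', 'on_ground', 'velocity', 'heading',
        'vertical_rate', 'sensors']

# Poll the current state of all tracked aircraft
req = requests.get(api_url, timeout=30)
req.raise_for_status()
content = req.json()
states = content['states'] or []  # 'states' is null when no aircraft are reported

# Keep only the fields named above and tag every row with the poll time
df = pd.DataFrame([state[:len(cols)] for state in states], columns=cols)
df['timestamp'] = content['time']

# Append this snapshot to the SQLite database
with sqlite3.connect(DB) as conn:
    df.to_sql('flights', conn, index=False, if_exists='append')
```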
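Each entry of the `states` list returned by `/api/states/all` is a positional list of values, so the column names have to be supplied by hand. The sketch below runs the same conversion on a hypothetical, hand-written response instead of a live request; since the current API documentation lists more fields per state vector than the 13 named here, it keeps only as many leading fields as there are column names:

```python
import json
import pandas as pd

cols = ['icao24', 'callsign', 'origin', 'time_position',
        'time_velocity', 'longitude', 'latitude',
        'altitude', 'on_ground', 'velocity', 'heading',
        'vertical_rate', 'sensors']

# Hypothetical response body with one state vector (all values invented);
# the two trailing values stand in for extra fields a newer API may append
raw = json.dumps({'time': 1473120000,
                  'states': [['4b1805', 'SWR123  ', 'Switzerland', 1473119990,
                              1473119995, 8.55, 47.45, 1200.0, False, 110.0,
                              270.0, 5.2, None, 1250.0, '1000']]})

content = json.loads(raw)
# Truncate each state vector to the known columns; treat a null 'states' as empty
states = [state[:len(cols)] for state in (content['states'] or [])]
df = pd.DataFrame(states, columns=cols)
df['timestamp'] = content['time']  # the same poll time for every row
```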

The resulting `flight.db` file was then transformed into Web Mercator coordinates, split into individual flights, and exported to HDF5 format using the code below. This process took about 7 minutes on a MacBook Pro laptop.

```python
import sqlite3
import pandas as pd
import numpy as np
import holoviews as hv
from cartopy import crs

def transform_coords(df):
    """Project longitude/latitude (degrees) into Web Mercator x/y."""
    df = df.copy()
    lons = np.array(df['longitude'])
    lats = np.array(df['latitude'])
    coords = crs.GOOGLE_MERCATOR.transform_points(crs.PlateCarree(), lons, lats)
    df['longitude'] = coords[:, 0]
    df['latitude']  = coords[:, 1]
    return df

def split_flights(dataset):
    """Split each aircraft's track at gaps > 600 s, keep segments with more than
    20 points, and separate them with NaN rows so that line rendering breaks there."""
    df = dataset.data.copy().reset_index(drop=True)
    df = df[df.time_position.notnull()]
    empty = pd.DataFrame(np.nan, index=[0], columns=df.columns)  # separator row
    paths = []
    for gid, group in df.groupby('icao24'):
        group = group.reset_index(drop=True)
        # Start a new segment wherever consecutive reports are more than 10 minutes apart
        segment_id = (group.time_position.diff() > 600).cumsum()
        for _, split_df in group.groupby(segment_id):
            if len(split_df) > 20:
                paths += [split_df, empty]
    split = pd.concat(paths, ignore_index=True)
    split['ascending'] = split.vertical_rate > 0
    return split

# Load the data from a SQLite database and apply the transforms
DB = 'data/flight.db'
conn = sqlite3.connect(DB)
dataset = hv.Dataset(transform_coords(pd.read_sql("SELECT * from flights", conn)))
flightpaths = split_flights(dataset)

# Keep only the columns used in this notebook (copy to avoid chained-assignment warnings)
flightpaths = flightpaths[['longitude', 'latitude', 'origin', 'on_ground', 'ascending', 'velocity']].copy()
flightpaths['origin'] = flightpaths.origin.astype(str)

# Save in HDF5 format
flightpaths.to_hdf("data/opensky.h5", "flights")
```
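`split_flights` treats a gap of more than 600 seconds between consecutive position reports from the same `icao24` transponder as the end of one flight, drops segments with 20 or fewer points, and appends a row of NaNs after each remaining segment so that datashader's line glyph does not draw a connecting line between separate flights. A minimal pandas sketch of the same logic on a toy table, with the thresholds exposed as parameters (`split_tracks`, `max_gap`, and `min_points` are names introduced here):

```python
import numpy as np
import pandas as pd

def split_tracks(df, max_gap=600, min_points=20):
    """Split each aircraft's reports at time gaps larger than max_gap seconds, keep
    segments with more than min_points rows, and follow each with a row of NaNs."""
    separator = pd.DataFrame(np.nan, index=[0], columns=df.columns)
    pieces = []
    ordered = df.sort_values('time_position', kind='stable')
    for _, group in ordered.groupby('icao24'):
        # a new segment starts wherever the gap since the previous report exceeds max_gap
        segment_id = (group['time_position'].diff() > max_gap).cumsum()
        for _, segment in group.groupby(segment_id):
            if len(segment) > min_points:
                pieces += [segment, separator]
    if not pieces:
        return df.iloc[:0]
    return pd.concat(pieces, ignore_index=True)

# Toy data: aircraft 'a' has a 880 s gap (two flights), aircraft 'b' has one short flight
toy = pd.DataFrame({'icao24': ['a'] * 5 + ['b'] * 2,
                    'time_position': [0, 60, 120, 1000, 1060, 0, 60],
                    'longitude': np.arange(7.0)})
paths = split_tracks(toy, min_points=1)  # 3 segments -> 7 data rows + 3 NaN separators
```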