# Exploring AIS vessel-traffic data

This [Jupyter](https://jupyter.org) notebook demonstrates how to use the [Datashader](https://datashader.org)-based rendering in [HoloViews](https://holoviews.org) to explore and analyze US Coast Guard [Automatic Identification System (AIS)](https://en.wikipedia.org/wiki/Automatic_identification_system) vessel-location data. Vessels are identified by their [Maritime Mobile Service Identity](https://en.wikipedia.org/wiki/Maritime_Mobile_Service_Identity) (MMSI) numbers, and other data about each vessel is typically included as well. Data is provided here for January 2020; additional months and years of data for US coastal areas can be downloaded from [marinecadastre.gov](https://marinecadastre.gov/ais), and similar approaches should work for AIS data from other regions.

In [None]:
import os, numpy as np, pandas as pd, panel as pn, colorcet as cc, datashader as ds, holoviews as hv
import spatialpandas as sp, spatialpandas.io, spatialpandas.geometry, spatialpandas.dask, dask.dataframe as dd

from glob import glob
from holoviews.util.transform import lon_lat_to_easting_northing as ll2en
from holoviews.operation.datashader import rasterize, datashade, dynspread, inspect_points
from dask.diagnostics import ProgressBar

hv.extension('bokeh', width=100)

## Vessel categories 

AIS pings come with an associated integer `VesselType`, which broadly labels what sort of vessel it is. Different types of vessels are used for different purposes and behave differently, as we can see if we use Datashader to color-code the location of each ping by its vessel type.

Type names are defined in a separate file, `AIS_categories.csv`, constructed from the list of 100+ [AIS Vessel Types](https://api.vtexplorer.com/docs/ref-aistypes.html); that file also collapses the types into a smaller number of broad vessel categories:

In [None]:
vessel_types=pd.read_csv("AIS_categories.csv")
vessel_types.iloc[34:37]

We can reduce these categories further to the 6 most common (with the rest grouped as `Other`) by building a dictionary that maps each raw `VesselType` number to one of those categories:

In [None]:
categories = {r.num: r.category if r.category in [0,2,3,19,12,18] else 21 for i, r in vessel_types.iterrows()}
categories[np.nan] = 0  # pings with a missing VesselType go to category 0 (np.NaN was removed in NumPy 2.0)

def category_desc(val):
    """Return description for the category with the indicated integer value"""
    return vessel_types[vessel_types.category==val].iloc[0].category_desc

vessel_mapping = dict(zip(vessel_types.num.to_list(), vessel_types.category.to_list()))
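As a toy illustration of this collapsing step, the sketch below builds the same kind of dictionary from a hypothetical three-row type table (the numbers are made up, not taken from `AIS_categories.csv`) and applies it with `Series.replace`, as `convert_partition` does later. Raw types whose category is not in the keep-list become 21, and a missing `VesselType` becomes 0:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for AIS_categories.csv: each raw type number (num)
# belongs to one broad category.
toy_types = pd.DataFrame({'num': [30, 70, 99], 'category': [2, 12, 7]})
keep = [0, 2, 3, 19, 12, 18]   # categories kept as-is
OTHER = 21                     # every other category collapses to "Other"

toy_categories = {r.num: r.category if r.category in keep else OTHER
                  for r in toy_types.itertuples()}
toy_categories[np.nan] = 0     # missing VesselType -> category 0

# Raw VesselType values as they arrive from the CSVs (floats, possibly NaN).
toy_pings = pd.Series([30.0, 99.0, np.nan, 70.0])
collapsed = toy_pings.replace(toy_categories).astype('int32')
print(collapsed.tolist())      # [2, 21, 0, 12]
```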

Now let us look at the categories:

In [None]:
groups = {categories[i]: category_desc(categories[i]) for i in vessel_types.num.unique()}
print(" ".join([f"{k}:{v}" for k,v in sorted(groups.items())]))

Given a set of colors, let's construct a color key for Datashader to use later, along with a visible legend we can add to such a plot:

In [None]:
colors    = cc.glasbey_bw_minc_20_minl_30
color_key = {list(groups.keys())[i]:tuple(int(e*255.) for e in v) for i,v in 
              enumerate(colors[:(len(groups))][::-1])}
legend    = hv.NdOverlay({groups[k]: hv.Points([0,0], label=str(groups[k])).opts(
                                         color=cc.rgb_to_hex(*v), size=0) 
                          for k, v in color_key.items()})
#legend #uncomment to see legend alone
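The key construction assumes each palette entry is an RGB triple of floats between 0 and 1; it scales those to the 0-255 integer tuples passed to Datashader's `color_key`, and the legend formats the same triples as hex strings. A standalone sketch with a hypothetical three-color palette and hypothetical category numbers:

```python
# Hypothetical palette of float RGB triples in [0, 1], standing in for the colorcet list.
palette = [(1.0, 0.0, 0.0), (0.0, 0.5, 1.0), (0.2, 0.8, 0.2)]
group_ids = [2, 12, 21]  # hypothetical category numbers

# Like the notebook: take as many colors as there are groups, reverse them,
# and scale each float component to an integer in 0-255.
key = {cat: tuple(int(c * 255.) for c in rgb)
       for cat, rgb in zip(group_ids, palette[:len(group_ids)][::-1])}

def to_hex(r, g, b):
    """Format 0-255 integer RGB as a '#rrggbb' string, e.g. for a Bokeh legend."""
    return '#%02x%02x%02x' % (r, g, b)

hex_key = {cat: to_hex(*rgb) for cat, rgb in key.items()}
print(key)      # {2: (51, 204, 51), 12: (0, 127, 255), 21: (255, 0, 0)}
print(hex_key)  # {2: '#33cc33', 12: '#007fff', 21: '#ff0000'}
```

Note that `int()` truncates, so a component of 0.5 becomes 127 rather than 128; rounding instead would make no visible difference on the map.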

## Load AIS pings

Next we will load the data from disk, either directly from a spatially indexed Parquet file (if previously cached) or from the raw CSV files. We'll also project the data to the coordinate system we will use later for plotting.
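HoloViews documents `lon_lat_to_easting_northing` (imported above as `ll2en`) as projecting longitude/latitude into Web Mercator coordinates, i.e. meters east of Greenwich and meters north of the Equator. The standard spherical Web Mercator formulas are sketched below in NumPy for illustration; `lonlat_to_web_mercator` is a hypothetical name, not the HoloViews source. Because easting grows with longitude, a range passed through it keeps its west-to-east order:

```python
import numpy as np

EARTH_RADIUS = 6378137.0               # sphere radius used by Web Mercator, meters
ORIGIN_SHIFT = np.pi * EARTH_RADIUS    # easting at longitude 180 degrees

def lonlat_to_web_mercator(lon, lat):
    """Spherical Web Mercator (EPSG:3857): degrees in, meters out."""
    lon = np.asarray(lon, dtype='float64')
    lat = np.asarray(lat, dtype='float64')
    easting = lon * ORIGIN_SHIFT / 180.0
    northing = np.log(np.tan((90.0 + lat) * np.pi / 360.0)) * EARTH_RADIUS
    return easting, northing

# Western and eastern edges of a rough continental-US box.
xs, ys = lonlat_to_web_mercator([-128, -54], [15, 56])
print(xs, ys)
```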

Because the raw data in particular is large, we will use the `map_partitions` functionality of a dask.DataFrame. To do this we define a function that performs the conversion on one partition, plus an empty example DataFrame with the required output structure:

In [None]:
def convert_partition(df):
    east, north = ll2en(df.LON.astype('float32'), df.LAT.astype('float32'))
    return sp.GeoDataFrame({
        'geometry': sp.geometry.PointArray((east, north)),
        'MMSI':     df.MMSI.fillna(0).astype('int32'),
        'category': df.VesselType.replace(categories).astype('int32')})

example = sp.GeoDataFrame({
    'geometry': sp.geometry.PointArray([], dtype='float32'),
    'MMSI':     np.array([], dtype='int32'),
    'category': np.array([], dtype='int32')})
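Conceptually, `map_partitions` calls the function once per pandas chunk of the Dask DataFrame and stitches the results together, while `meta` tells Dask the output columns and dtypes without running the function. The pandas-only sketch below imitates that with two hand-made chunks; `convert_chunk` is a simplified, hypothetical stand-in for `convert_partition` that skips the geometry column and the category mapping:

```python
import pandas as pd

def convert_chunk(df):
    """Hypothetical per-partition conversion: select, clean and retype columns."""
    return pd.DataFrame({
        'MMSI': df['MMSI'].fillna(0).astype('int32'),
        'category': df['VesselType'].fillna(0).astype('int32'),
    })

# Empty frame with the output columns and dtypes, playing the role of `meta`.
meta = pd.DataFrame({'MMSI': pd.Series([], dtype='int32'),
                     'category': pd.Series([], dtype='int32')})

raw = pd.DataFrame({'MMSI': [1.0, None, 3.0, 4.0],
                    'VesselType': [30.0, 70.0, None, 99.0]})
chunks = [raw.iloc[i:i + 2] for i in range(0, len(raw), 2)]  # two "partitions"
result = pd.concat([convert_chunk(c) for c in chunks], ignore_index=True)

# Every chunk's output should match the declared meta.
print(list(result.dtypes) == list(meta.dtypes))
```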

Next we will define the function that loads our data, reading a much smaller (and much faster to load) cached Parquet file from disk if one is available:

In [None]:
basedir = './2020/'
basename = 'AIS_2020_01'
index = 'MMSI'
dfcols = ['MMSI', 'LON', 'LAT', 'BaseDateTime', 'VesselType']
vesselcols = ['MMSI', 'IMO', 'CallSign', 'VesselName', 'VesselType', 'Length', 'Width']

def load_data():
    cache_file = basedir+basename+'_broadcast.parq'
    vessels_file = basedir+basename+'_vessels.parq'
    
    if (os.path.exists(cache_file) and os.path.exists(vessels_file)):
        print('Reading vessel info file')
        vessels = dd.read_parquet(vessels_file)

        print('Reading parquet file')
        gdf = sp.io.read_parquet_dask(cache_file).persist()
        gdf['category'] = gdf['category'].astype('category').cat.as_known()
        
    else:
        csvs = basedir+basename+'*.csv'
        df = dd.read_csv(csvs, usecols=vesselcols, assume_missing=True)
        gdf = dd.read_csv(csvs, usecols=dfcols, assume_missing=True)
        with ProgressBar():
            print('Writing vessel info file')
            vessels = df.groupby(index).last().reset_index().compute()
            vessels[index] = vessels[index].astype('int32')
            vessels.to_parquet(vessels_file)
            vessels = dd.read_parquet(vessels_file)  # return a Dask frame, as the cached branch does

            print('Reading CSV files')
            gdf = gdf.map_partitions(convert_partition, meta=example).persist()

            print('Writing parquet file')
            gdf = gdf.pack_partitions_to_parquet(cache_file, npartitions=64).persist()
            gdf['category'] = gdf['category'].astype('category').cat.as_known()
         
    return gdf, vessels
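The branching in `load_data` is a compute-once, read-later disk cache. Stripped of Dask and Parquet, the pattern looks roughly like the sketch below, which uses `pickle` and a temporary directory purely for illustration; `load_or_build` is a hypothetical helper, not part of any library used here:

```python
import os
import pickle
import tempfile

def load_or_build(cache_file, build):
    """Return (data, from_cache): read the cache if present, else build and write it."""
    if os.path.exists(cache_file):
        with open(cache_file, 'rb') as f:
            return pickle.load(f), True
    data = build()                      # the expensive step, e.g. parsing raw CSVs
    with open(cache_file, 'wb') as f:
        pickle.dump(data, f)
    return data, False

calls = []
def expensive():
    calls.append(1)
    return {'rows': 3}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cache.pkl')
    first, hit1 = load_or_build(path, expensive)
    second, hit2 = load_or_build(path, expensive)

print(first == second, hit1, hit2, len(calls))  # True False True 1
```

The notebook checks that *both* cache files exist before taking the fast path, so a run that stopped after writing only the vessel file rebuilds both.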

Now actually load the data, using the on-disk cache and Panel's in-memory cache when available:

In [None]:
%%time
df, vessels = pn.state.as_cached('df', load_data)
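`pn.state.as_cached` memoizes the result of the function under the given key (and any keyword arguments) for the lifetime of the Python process, so re-running this cell, or serving several sessions from one server, does not reload the data. A much simpler plain-Python analogue, hypothetical and without Panel's extras such as expiry, is:

```python
_cache = {}

def as_cached_sketch(key, fn, **kwargs):
    """Hypothetical process-wide memoizer keyed on a name plus keyword arguments."""
    full_key = (key, tuple(sorted(kwargs.items())))
    if full_key not in _cache:
        _cache[full_key] = fn(**kwargs)
    return _cache[full_key]

loads = []
def load():
    loads.append(1)          # record how often the expensive loader really runs
    return 'data'

a = as_cached_sketch('df', load)
b = as_cached_sketch('df', load)
print(a, b, len(loads))      # data data 1
```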

## Plot categorical data

We can now plot the data colored by category, with a color key.

To zoom in and interact with the plot, click the “Wheel zoom” tool in the toolbar beside the plot and scroll; click and drag to pan around. As you zoom in, finer-grained detail will emerge and fill in, as long as a live Python process is running to render the data dynamically. Depending on the size of the dataset and your machine, updating the plot may take a few seconds.

In [None]:
x_range, y_range = ll2en([-128,-54], [15,56])  # longitudes west-to-east so the x range is ascending
bounds = dict(x=tuple(x_range), y=tuple(y_range))

pts    = hv.Points(df, vdims=['category']).redim.range(**bounds)
points = dynspread(datashade(pts, aggregator=ds.count_cat('category'), color_key=color_key))

tiles  = hv.element.tiles.ESRI().opts(alpha=0.4, bgcolor="black").opts(responsive=True, min_height=600)
labels = hv.element.tiles.StamenLabels().opts(alpha=0.7, level='glyph')

tiles * points * labels * legend
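`ds.count_cat('category')` aggregates to one count per category in every pixel, and Datashader's categorical shading then colors each pixel by mixing the category colors in proportion to those counts, with opacity driven by the total count. The NumPy sketch below reproduces only the aggregation and color-mixing steps on a hypothetical 1×2 pixel grid with hypothetical colors; it ignores Datashader's alpha scaling, `min_alpha` and spreading:

```python
import numpy as np

# Hypothetical pings already binned to a 1-row, 2-column pixel grid.
col = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])   # pixel column of each ping
cat = np.array([0, 0, 1, 1, 2, 2, 2, 2, 2])   # category index of each ping

# count_cat-style aggregation: one count per (row, column, category).
counts = np.zeros((1, 2, 3), dtype=int)
np.add.at(counts, (0, col, cat), 1)

# Hypothetical 0-255 RGB color for each category index.
cat_colors = np.array([[255, 0, 0],
                       [0, 0, 255],
                       [0, 255, 0]], dtype=float)

# Weight each category's color by its share of the pixel's pings;
# empty pixels keep zero weight (Datashader would leave them transparent).
totals = counts.sum(axis=-1, keepdims=True)
weights = np.divide(counts, totals, out=np.zeros(counts.shape), where=totals > 0)
rgb = weights @ cat_colors
print(rgb)
```

The first pixel, with equal numbers of category-0 and category-1 pings, comes out as an even red/blue mix, while the second is pure green.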

Clearly, vessel behavior depends strongly on category, with very different patterns of motion between these categories (and presumably among the categories not shown separately). For example, passenger vessels tend to travel _across_ narrow waterways, while towing and cargo vessels travel _along_ them. Fishing vessels, as one would expect, travel out to open water and then cover a wide area around their initial destination. Zooming and panning (using the [Bokeh](https://docs.bokeh.org/en/latest/docs/user_guide/tools.html) tools at the right) reveals other patterns at different locations and scales.

## Selecting specific datapoints

To help understand individual datapoints or clusters of them, we can use the x,y location of a tap to query the dataset for a ping near that location, then highlight it on top of the main plot.
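A tap arrives as a single (x, y) position in the plot's Web Mercator coordinates, so the query has to search a small neighborhood whose size in meters depends on the current zoom. The NumPy sketch below shows one simple way to do that: keep the pings within a radius of the tap and return the closest. `nearest_ping` and its `radius` are hypothetical; HoloViews' `inspect_points` has its own implementation and parameters:

```python
import numpy as np

def nearest_ping(xs, ys, tap_x, tap_y, radius):
    """Index of the ping nearest the tap within `radius`, or None if none is close."""
    d2 = (xs - tap_x) ** 2 + (ys - tap_y) ** 2      # squared distances avoid sqrt
    inside = np.flatnonzero(d2 <= radius ** 2)
    if inside.size == 0:
        return None
    return inside[np.argmin(d2[inside])]

xs = np.array([0.0, 10.0, 12.0, 50.0])
ys = np.array([0.0, 10.0, 11.0, 50.0])
print(nearest_ping(xs, ys, 11.0, 10.0, radius=5.0))     # 1
print(nearest_ping(xs, ys, 100.0, 100.0, radius=5.0))   # None
```

In a live plot, the radius would typically be a few pixels converted to data units, so it shrinks as you zoom in; that is also why a point may only become selectable after zooming.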

In [None]:
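vessels_df = vessels.compute()
# One description per category, so the merge below doesn't repeat each ping
# once for every raw VesselType number that shares its category.
category_descs = vessel_types[['category', 'category_desc']].drop_duplicates('category')

def points_transformer(df):
    """Attach vessel metadata and the category description to the tapped pings."""
    return df.merge(vessels_df, on='MMSI').merge(category_descs, on='category')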
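The transformer enriches the handful of pings returned by a tap with per-vessel metadata and a readable category name. Because `vessel_types` has one row per raw type number, several rows can share a `category`, and merging on `category` against the full table repeats each ping once per matching row. The toy tables below (all values hypothetical) show the effect and how deduplicating to one description per category avoids it:

```python
import pandas as pd

toy_pings = pd.DataFrame({'MMSI': [111, 222], 'category': [12, 12]})
toy_vessels = pd.DataFrame({'MMSI': [111, 222], 'VesselName': ['A', 'B']})
toy_types = pd.DataFrame({'num': [31, 32], 'category': [12, 12],
                          'category_desc': ['Desc 12', 'Desc 12']})

# Merging on a non-unique key multiplies rows: 2 pings x 2 type rows.
naive = toy_pings.merge(toy_vessels, on='MMSI').merge(toy_types, on='category')

# Keep a single description per category before merging.
descs = toy_types[['category', 'category_desc']].drop_duplicates('category')
deduped = toy_pings.merge(toy_vessels, on='MMSI').merge(descs, on='category')
print(len(naive), len(deduped))   # 4 2
```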

In [None]:
xr, yr   = ll2en([-126,-120.7], [47.5,49.5])
pts2     = hv.Points(df, vdims=['category']).redim.range(x=tuple(xr), y=tuple(yr))
pointsp  = dynspread(datashade(pts2, color_key=color_key, aggregator=ds.count_cat('category'), min_alpha=90))

vdims = ['MMSI', 'VesselName', 'Length', 'Width', 'category_desc']
highlight = inspect_points(pointsp, streams=[hv.streams.Tap], points_transformer=points_transformer)
highlight = highlight.opts(color='white', tools=["hover"], marker='square', size=10, fill_alpha=0)

#tiles * pointsp * highlight * legend

We could view the result above by uncommenting the last line, but instead let's build a small app that lets the user choose whether map labels are visible:

In [None]:
def label_fn(enable=True):
    return hv.element.tiles.StamenLabels().opts(level='glyph', alpha=0.9 if enable else 0)
show_labels = pn.widgets.Checkbox(name="Show labels", value=True)
labels = hv.DynamicMap(pn.bind(label_fn, enable=show_labels))

overlay = tiles * pointsp * highlight * labels * legend
                 
pn.Column("# Categorical plot of AIS data by type",
          "Zoom or pan to explore the data, then click to select "
          "a particular data point to see more information about it (after a delay). ",
          "You may need to zoom in before a point is selectable.",
          show_labels, overlay, sizing_mode='stretch_width').servable()
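In the app, `pn.bind(label_fn, enable=show_labels)` produces a function whose `enable` argument tracks the checkbox's current value, and wrapping it in `hv.DynamicMap` makes HoloViews call it again whenever the checkbox changes. The plain-Python sketch below mimics that reactive binding; the `Widget` class and `bind` helper are hypothetical stand-ins, far simpler than Panel's real widgets and `pn.bind`:

```python
class Widget:
    """Hypothetical stand-in for a Panel widget: holds a value and notifies watchers."""
    def __init__(self, value):
        self._value = value
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        self._value = new
        for callback in self._watchers:
            callback(new)

def bind(fn, **widgets):
    """Return a zero-argument callable that calls fn with the widgets' current values."""
    return lambda: fn(**{name: w.value for name, w in widgets.items()})

def label_alpha(enable=True):
    return 0.9 if enable else 0

checkbox = Widget(True)
bound = bind(label_alpha, enable=checkbox)
rendered = []
checkbox.watch(lambda _: rendered.append(bound()))   # "re-render" on every change

checkbox.value = False
checkbox.value = True
print(bound(), rendered)   # 0.9 [0, 0.9]
```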