# AIS Sequence building

This example shows two different methods of building temporal sequences: assembling and expanding.  
It uses AIS data from a CSV file containing a full day of observations provided by the Danish Maritime Authority.  
It is divided into two sections, each corresponding to one MEOS example:

- [Assemble](https://github.com/estebanzimanyi/MobilityDB/blob/develop/meos/examples/ais_assemble_full.c)
- [Expand](https://github.com/estebanzimanyi/MobilityDB/blob/develop/meos/examples/ais_expand_full.c)


Note that, due to its size, the data file is not provided. It can be downloaded from the following URL:

[https://web.ais.dk/aisdata/aisdk-2023-08-01.zip](https://web.ais.dk/aisdata/aisdk-2023-08-01.zip)

Store the file in the [data](./data) directory. There is no need to decompress it.
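Python's `zipfile` module can open a member of an archive as a binary stream, so the CSV never has to be extracted to disk. Below is a minimal sketch of that pattern. It uses a small in-memory archive with hypothetical contents (`sample.csv`) in place of the real download, and `read_first_lines` is a hypothetical helper name:

```python
import io
import itertools
import zipfile
from typing import List


def read_first_lines(archive: bytes, member: str, n: int) -> List[bytes]:
    """Return the first n raw lines of `member` without extracting the archive."""
    with zipfile.ZipFile(io.BytesIO(archive), 'r') as zf:
        with zf.open(member, 'r') as fh:
            # ZipFile.open returns a binary stream; iterating it yields bytes lines
            return list(itertools.islice(fh, n))


# Build a tiny archive in memory (hypothetical content, for illustration only)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('sample.csv',
                '# Timestamp,Type of mobile,MMSI\n'
                '01/08/2023 00:00:00,Class A,219000873\n')

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as zf:
    print(zf.namelist())  # ['sample.csv']

lines = read_first_lines(buffer.getvalue(), 'sample.csv', 2)
print(lines)
```

With the real download, listing the archive with `namelist()` is a quick way to confirm the member name that the processing code opens.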

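```python
import zipfile as zip  # note: this alias shadows Python's built-in zip()
from datetime import datetime
from typing import Tuple, Optional, Dict, List

from pymeos import *
import pandas as pd

# Initialize the underlying MEOS library before creating any temporal object
pymeos_initialize()
```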

## Assemble

In this section, sequences are assembled from lists of instants. We first read every instant in the CSV, and only after all of them have been read do we create one sequence per ship.
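The assemble strategy can be sketched in plain Python: buffer every observation per ship, and build each sequence only after the whole input has been read. In this sketch, `(timestamp, value)` tuples stand in for the PyMEOS instants, and `assemble` is a hypothetical helper. The sort by timestamp is an addition that the notebook code does not perform. It is included here because MEOS sequences need strictly increasing timestamps.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, datetime, float]  # (mmsi, timestamp, value)


def assemble(records: Iterable[Record]) -> Dict[int, List[Tuple[datetime, float]]]:
    """Buffer every record first, then build one time-ordered list per ship."""
    # Phase 1: read everything, keeping the first value seen per (mmsi, timestamp)
    buffered: Dict[int, Dict[datetime, float]] = defaultdict(dict)
    for mmsi, ts, value in records:
        buffered[mmsi].setdefault(ts, value)
    # Phase 2: only after the input is exhausted, build each sequence in one go
    return {mmsi: sorted(obs.items()) for mmsi, obs in buffered.items()}


records = [
    (1, datetime(2023, 8, 1, 0, 0, 10), 2.0),
    (1, datetime(2023, 8, 1, 0, 0, 0), 1.0),
    (2, datetime(2023, 8, 1, 0, 0, 5), 0.0),
    (1, datetime(2023, 8, 1, 0, 0, 0), 9.9),  # same ship and time: ignored
]
sequences = assemble(records)
print(sequences[1])
```

The memory cost of this design is that every instant of every ship is held until the end of the input. That is the trade-off the expanding approach is meant to address.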

```python
# For each ship (MMSI), map every timestamp to its (position, SOG) instants
ships: Dict[int, Dict[datetime, Tuple[TGeogPointInst, TFloatInst]]] = {}
```

We now create two functions. The first extracts the information we need from a line of the observations CSV. The second creates the two instants (position and SOG) from the extracted information.

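```python
def get_ais_data(line: bytes) -> Optional[Tuple[int, datetime, float, float, float]]:
    """Parse one raw CSV line; return None if it is malformed or fails a sanity check."""
    fields = line.split(b',')
    try:
        # DMA timestamps are day-first (DD/MM/YYYY)
        timestamp = datetime.strptime(fields[0].decode(), "%d/%m/%Y %H:%M:%S")
        mmsi = int(fields[2])
        lat = float(fields[3])
        lon = float(fields[4])
        sog = float(fields[7])
    except (ValueError, IndexError):
        # Missing or unparsable fields (UnicodeDecodeError is a ValueError subclass)
        return None

    # Discard records without a valid ship identifier
    if mmsi == 0:
        return None
    # Discard positions outside the expected latitude/longitude range
    if lat < 40.18 or lat > 84.17:
        return None
    if lon < -16.1 or lon > 32.88:
        return None
    # Discard implausible speed-over-ground values
    if sog < 0 or sog > 1022:
        return None
    return mmsi, timestamp, lat, lon, sog
```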
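`get_ais_data` splits each raw line on commas, which is correct only if no field contains a quoted comma. If that assumption does not hold for a given file, the standard `csv` module handles quoting. The sketch below assumes UTF-8 input and the same column positions as above. `iter_selected_fields` and the sample line are hypothetical:

```python
import csv
import io
from typing import IO, Iterator, List

# Column positions assumed to match get_ais_data: timestamp, MMSI, lat, lon, SOG
COLUMNS = (0, 2, 3, 4, 7)


def iter_selected_fields(binary_file: IO[bytes]) -> Iterator[List[str]]:
    """Yield the selected columns of each data row, honouring CSV quoting."""
    # Wrap the binary stream (e.g. the object returned by ZipFile.open) as text
    text = io.TextIOWrapper(binary_file, encoding='utf-8', errors='replace', newline='')
    reader = csv.reader(text)
    next(reader, None)  # skip the header row
    for fields in reader:
        if len(fields) <= max(COLUMNS):
            continue  # short or malformed row
        yield [fields[i] for i in COLUMNS]


# Hypothetical input whose second field contains a quoted comma
sample = io.BytesIO(
    b'# Timestamp,Type of mobile,MMSI,Latitude,Longitude,Status,ROT,SOG\n'
    b'01/08/2023 00:00:00,"Class A, test",219000873,56.99,10.30,Moored,0,0.1\n'
)
rows = list(iter_selected_fields(sample))
print(rows)  # [['01/08/2023 00:00:00', '219000873', '56.99', '10.30', '0.1']]
```

A plain `split(b',')` on the sample line would shift every column after the quoted field by one. The numeric conversions and sanity checks of `get_ais_data` would still apply to the selected strings.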


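```python
def add_instants(ships: Dict[int, Dict[datetime, Tuple[TGeogPointInst, TFloatInst]]],
                 mmsi: int,
                 timestamp: datetime, lat: float, lon: float, sog: float) -> None:
    """Create the position and SOG instants for one observation and store them."""
    if mmsi not in ships:
        ships[mmsi] = {}
    # Keep only the first observation for a given ship and timestamp
    if timestamp in ships[mmsi]:
        return
    # Geographic point in (longitude, latitude) order, WGS 84 (SRID 4326)
    tgi = TGeogPointInst(point=(lon, lat), timestamp=timestamp, srid=4326)
    tfi = TFloatInst(value=sog, timestamp=timestamp)
    ships[mmsi][timestamp] = (tgi, tfi)
```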


Using these functions, we process the CSV file, reading it directly from the ZIP archive.

```python
%%time
with zip.ZipFile('./data/aisdk-2023-08-01.zip', 'r') as ais_zip:
    with ais_zip.open('aisdk-2023-08-01.csv', 'r') as ais_file:
        header = ais_file.readline()[:-1]  # skip the header line
        for line in ais_file:
            data = get_ais_data(line)
            if data is None:
                continue  # malformed or filtered-out record
            add_instants(ships, *data)
```

```
CPU times: user 5min 59s, sys: 2.25 s, total: 6min 1s
Wall time: 6min 1s
```
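Most of these six minutes are spent parsing lines one at a time in Python. One possible alternative is to let pandas parse only the needed columns and then apply the same sanity checks as vectorized operations. This has not been benchmarked here. The sketch below assumes the column positions used by `get_ais_data` and day-first timestamps. `load_ais` and the sample rows are hypothetical:

```python
import io
import pandas as pd

# Assumed column positions, matching get_ais_data: timestamp, MMSI, lat, lon, SOG
USECOLS = [0, 2, 3, 4, 7]
NAMES = ['timestamp', 'mmsi', 'lat', 'lon', 'sog']
TS_FORMAT = '%d/%m/%Y %H:%M:%S'  # assumption: day-first timestamps


def load_ais(source) -> pd.DataFrame:
    """Read, parse and filter AIS records with vectorized pandas operations."""
    df = pd.read_csv(source, usecols=USECOLS)
    df.columns = NAMES  # usecols keeps the file's column order
    # Unparsable values become NaT/NaN and are dropped, like get_ais_data's None
    df['timestamp'] = pd.to_datetime(df['timestamp'], format=TS_FORMAT, errors='coerce')
    for col in ('mmsi', 'lat', 'lon', 'sog'):
        df[col] = pd.to_numeric(df[col], errors='coerce')
    df = df.dropna()
    # Same sanity checks as get_ais_data (between() is inclusive on both ends)
    mask = ((df['mmsi'] != 0)
            & df['lat'].between(40.18, 84.17)
            & df['lon'].between(-16.1, 32.88)
            & df['sog'].between(0, 1022))
    df = df[mask].astype({'mmsi': 'int64'})
    # Keep the first observation per ship and timestamp, like add_instants
    return df.drop_duplicates(subset=['mmsi', 'timestamp'], keep='first')


# Hypothetical sample rows exercising each filter
sample = io.StringIO(
    '# Timestamp,Type of mobile,MMSI,Latitude,Longitude,Status,ROT,SOG\n'
    '01/08/2023 00:00:00,Class A,219000873,56.99,10.30,Moored,0,0.1\n'
    '01/08/2023 00:00:00,Class A,219000873,56.98,10.31,Moored,0,0.2\n'  # duplicate time
    '01/08/2023 00:00:05,Class A,0,56.99,10.30,Moored,0,0.1\n'          # MMSI 0
    '01/08/2023 00:00:06,Class A,219000873,91.00,10.30,Moored,0,0.1\n'  # bad latitude
    '01/08/2023 00:00:07,Class A,219000873,56.99,10.30,Moored,0,n/a\n'  # missing SOG
)
result = load_ais(sample)
print(result)
```

For the real data, `load_ais('./data/aisdk-2023-08-01.zip')` should work if the archive holds a single CSV, because pandas infers ZIP compression from the extension. The per-ship instants could then be built from `result.groupby('mmsi')`.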


Now we create the sequences from the lists of instants.

```python
%%time
ship_sequences: List[Tuple[int, int, int, int, TGeogPointSeq, TFloatSeq, float, float]] = []
for mmsi, instants in ships.items():
    # Instants are in file order; MEOS requires strictly increasing timestamps
    geog_point_list = [item[0] for item in instants.values()]
    float_inst_list = [item[1] for item in instants.values()]

    # upper_inc=True includes the last instant in the sequence's time span
    tg = TGeogPointSeq(instant_list=geog_point_list, upper_inc=True)
    tf = TFloatSeq(instant_list=float_inst_list, upper_inc=True)

    distance = tg.length()                # length of the trajectory
    average = tf.time_weighted_average()  # time-weighted average SOG
    ship_sequences.append(
        (mmsi, len(geog_point_list), tg.num_instants(), tf.num_instants(), tg, tf, distance, average))
```

```
CPU times: user 36.1 s, sys: 320 ms, total: 36.5 s
Wall time: 36.5 s
```
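The two aggregates can be illustrated as follows. `length()` sums the distances between consecutive positions of the trajectory. `time_weighted_average()` of a linearly interpolated float sequence is the integral of the value over time divided by the sequence's duration. The pure-Python sketch below is only an illustration. Its distance uses the haversine formula on a sphere of radius 6,371,008.8 m, which is a swapped-in approximation. MEOS's own computation for geography points may use an ellipsoidal model, so real results can differ slightly. All function names are hypothetical:

```python
import math
from datetime import datetime, timedelta
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two (lon, lat) points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trajectory_length_m(points: List[Tuple[float, float]]) -> float:
    """Sum of the distances between consecutive (lon, lat) positions."""
    return sum(haversine_m(*points[i - 1], *points[i]) for i in range(1, len(points)))


def time_weighted_average(samples: List[Tuple[datetime, float]]) -> float:
    """Integral of a linearly interpolated signal divided by its duration."""
    duration = (samples[-1][0] - samples[0][0]).total_seconds()
    if duration == 0:
        return samples[0][1]  # a single instant: its value is the average
    area = 0.0
    for i in range(1, len(samples)):
        (t0, v0), (t1, v1) = samples[i - 1], samples[i]
        area += (v0 + v1) / 2 * (t1 - t0).total_seconds()  # trapezoid per segment
    return area / duration


t0 = datetime(2023, 8, 1)
sog = [(t0, 0.0), (t0 + timedelta(seconds=60), 10.0), (t0 + timedelta(seconds=180), 10.0)]
print(round(time_weighted_average(sog), 3))                        # 8.333
print(round(trajectory_length_m([(10.0, 56.0), (10.0, 57.0)])))    # 111195
```

In the SOG example, the ramp from 0 to 10 over 60 s contributes 300 units of area. The constant 10 over the following 120 s contributes 1200. The total of 1500 divided by 180 s gives about 8.333.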


We use pandas to display the results.

```python
df = pd.DataFrame(ship_sequences, columns=['MMSI', '#Records', '#TripInstants', '#SOGInstants', 'Trajectory', 'SOG', 'Distance', 'Average Speed'])
df[['MMSI', '#Records', '#TripInstants', '#SOGInstants', 'Distance', 'Average Speed']].head()
```

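|   | MMSI | #Records | #TripInstants | #SOGInstants | Distance | Average Speed |
|---|------|----------|---------------|--------------|----------|---------------|
| 0 | 219020187 | 8170 | 6527 | 402 | 2563.841481 | 0.0018 |
| 1 | 219000873 | 7210 | 6533 | 2406 | 67924.08826 | 1.4373 |
| 2 | 219008746 | 8714 | 7566 | 1799 | 32187.308659 | 0.632541 |
| 3 | 219005496 | 14103 | 12850 | 4580 | 60577.325209 | 1.290721 |
| 4 | 219014072 | 754 | 566 | 8 | 469.887918 | 0.000399 |

Distance is the length of the geographic trajectory in metres. Average Speed is the time-weighted average of SOG, in knots.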
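In the table, #TripInstants and #SOGInstants are lower than #Records, which counts the distinct timestamps per ship. A plausible explanation is normalization. The PyMEOS sequence constructors accept a `normalize` argument that, to our understanding, defaults to `True`. Normalization removes instants that linear interpolation between their neighbours already reproduces. A ship at anchor or cruising at constant speed yields long runs of redundant SOG values, which would explain why #SOGInstants is so much smaller. Below is a simplified pure-Python sketch of that rule for a float sequence. It uses exact comparisons, whereas MEOS's internal check may differ, for instance by applying a tolerance. `normalize_linear` is a hypothetical name:

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def normalize_linear(samples: List[Tuple[datetime, float]]) -> List[Tuple[datetime, float]]:
    """Drop instants that lie on the straight line between their neighbours."""
    if len(samples) <= 2:
        return list(samples)
    kept = [samples[0]]
    for i in range(1, len(samples) - 1):
        (t0, v0), (t1, v1), (t2, v2) = kept[-1], samples[i], samples[i + 1]
        # Compare slopes by cross-multiplication to avoid dividing by durations
        d01 = (t1 - t0).total_seconds()
        d12 = (t2 - t1).total_seconds()
        if (v1 - v0) * d12 != (v2 - v1) * d01:
            kept.append(samples[i])  # slope changes here: the instant is needed
    kept.append(samples[-1])
    return kept


t0 = datetime(2023, 8, 1)
values = [5.0, 5.0, 5.0, 5.0, 7.0, 9.0]
sog = [(t0 + timedelta(seconds=10 * i), v) for i, v in enumerate(values)]
result = normalize_linear(sog)
print([((t - t0).total_seconds(), v) for t, v in result])  # [(0.0, 5.0), (30.0, 5.0), (50.0, 9.0)]
```

Six SOG samples collapse to three, because the constant run and the steady ramp each need only their endpoints. Trajectories shrink less in the table, since a position is redundant only when it is collinear in both space and time with its neighbours.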
