# Introduction

We will analyze a sample of AIS data from the Danish Maritime Authority.

The data has been preprocessed using PostgreSQL, PostGIS, and TimescaleDB. We performed the following steps:

- Remove positions with incorrect coordinates
- Keep one position every thirty minutes using TimescaleDB
- Calculate a fishing score based on the [Global Fishing Watch heuristic model](https://github.com/GlobalFishingWatch/vessel-scoring/blob/master/notebooks/Model-Descriptions.ipynb)
- Calculate the distance from land using land polygons from [pgosmdata](https://github.com/gma2th/pgosmdata) and the PostGIS nearest-neighbor search
- Create fishing zones with the DBSCAN algorithm
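
The first two steps were done in SQL, which is not reproduced in this notebook. As a rough illustration only, the sketch below shows the same idea in plain Python: drop positions outside the valid latitude/longitude ranges, then keep one position per vessel and thirty-minute bucket. The tuple layout, the function names, and the choice to keep the *first* position of each bucket are assumptions made for this sketch, not a description of the actual preprocessing.

```python
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=30)  # same interval as the thirty-minute sampling above


def has_valid_coordinates(lat, lon):
    """Return True when the position lies inside the valid WGS84 ranges.

    AIS encodes "position not available" as lat=91, lon=181, so this
    check also rejects those placeholder values.
    """
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def bucket_start(timestamp, width=BUCKET):
    """Floor a naive timestamp to the start of its fixed-width bucket."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((timestamp - epoch) // width) * width


def downsample(positions):
    """Keep the first valid position per vessel and bucket.

    `positions` is an iterable of (mmsi, timestamp, lat, lon) tuples,
    a hypothetical layout chosen for this sketch.
    """
    kept = {}
    for mmsi, timestamp, lat, lon in sorted(positions, key=lambda p: (p[0], p[1])):
        if not has_valid_coordinates(lat, lon):
            continue
        kept.setdefault((mmsi, bucket_start(timestamp)), (mmsi, timestamp, lat, lon))
    return list(kept.values())


sample = [
    (1, datetime(2020, 1, 1, 0, 5), 55.0, 10.0),
    (1, datetime(2020, 1, 1, 0, 20), 55.1, 10.1),   # same bucket as the row above
    (1, datetime(2020, 1, 1, 0, 35), 91.0, 181.0),  # "not available" placeholder
    (1, datetime(2020, 1, 1, 0, 50), 55.2, 10.2),
    (2, datetime(2020, 1, 1, 0, 10), 56.0, 11.0),
]
downsampled = downsample(sample)
```

Here `downsampled` keeps three of the five sample rows: one per vessel and bucket, with the invalid position discarded.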


In this notebook we will:

- Load and explore the data
- Find ships with the longest self-reported fishing time
- Find ships with the longest fishing time that do not report fishing in their navigational status
- Find the longest trip of the day

In [None]:
%matplotlib inline

In [None]:
import datetime as dt
import geopandas as gpd
import hvplot.pandas  # registers the .hvplot accessor used for the plots below
import numpy as np
import movingpandas as mpd
import pandas as pd
from shapely.geometry import Polygon

from fiona.crs import from_epsg

import warnings
warnings.simplefilter("ignore")

# Loading sample AIS data

In [None]:
%%time

SAMPLING_DELTA = dt.timedelta(minutes=30)

_df = gpd.read_file('data/aisdk_30min.gpkg')
df = _df.copy(deep=True)
print("Finished reading {} rows".format(len(df)))

Let's have a first look at the data:

In [None]:
df.head()

In [None]:
df.describe()

In [None]:
df.describe(include=['O'])

In [None]:
df.columns

# Preprocessing

What types of ships are in our dataset?

In [None]:
df['ship_type'].value_counts().plot(kind='bar', figsize=(15,3))

A vessel might be spoofing its reported type, but we will only work with vessels whose type is "Fishing":

In [None]:
df = df[df.ship_type == 'Fishing']

Most of the navigational statuses are "Engaged in fishing", but there are also a lot of unknown values:

In [None]:
df.navigational_status.value_counts().plot(kind="bar")

There are a lot of records with speed over ground (SOG) values of zero in this dataframe:

In [None]:
df['sog'].hist(bins=100, figsize=(15,3))

Let's get rid of the rows with a SOG of zero:

In [None]:
print("Original size: {} rows".format(len(df)))
df = df[df.sog>0.0]
print("Reduced to {} rows after removing 0 speed records".format(len(df)))
df['sog'].hist(bins=100, figsize=(15,3))

Let's plot the positions:

In [None]:
df.hvplot(geo=True, tiles="OSM", color='red', alpha=0.2)

# Analysis

We will use movingpandas to build and plot trajectories.
We first need to create a temporal index:

In [None]:
df['t'] = pd.to_datetime(df['bucket'])
df = df.set_index('t')

In [None]:
%%time
# MIN_LENGTH = 100 # meters
traj_collection = mpd.TrajectoryCollection(df, 'mmsi')
print("Finished creating {} trajectories".format(len(traj_collection)))

## Find ships with the longest self-reported fishing time

In [None]:
df[df["navigational_status"] == "Engaged in fishing"].groupby("mmsi").size().nlargest(10) * SAMPLING_DELTA
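
The expression above estimates fishing time as the number of retained positions multiplied by the sampling interval. A minimal sketch of that arithmetic with hypothetical records (the MMSI values and statuses below are made up for illustration):

```python
import datetime as dt
from collections import Counter

SAMPLING_DELTA = dt.timedelta(minutes=30)  # one position kept per 30-minute bucket

# Hypothetical (mmsi, navigational_status) records, one per retained position.
records = [
    (111, "Engaged in fishing"),
    (111, "Engaged in fishing"),
    (111, "Under way using engine"),
    (222, "Engaged in fishing"),
]

# Count the positions reported as fishing for each vessel ...
fishing_counts = Counter(
    mmsi for mmsi, status in records if status == "Engaged in fishing"
)
# ... and convert each count into an approximate duration.
fishing_time = {mmsi: n * SAMPLING_DELTA for mmsi, n in fishing_counts.items()}
```

This is an approximation: each retained position stands in for a full thirty-minute bucket, and because rows with a SOG of zero were removed earlier, time spent stationary is not counted.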

In [None]:
traj_collection.get_trajectory(211519000).hvplot(cmap='Dark2', height=300, line_width=5.0)

## Find ships with the longest fishing time that do not report fishing in their navigational status

In [None]:
not_reported = df["navigational_status"] != "Engaged in fishing"
likely_fishing = df["fishing_score"] > 0.5
offshore = df["distance_from_land"] > 1000
df[not_reported & likely_fishing & offshore].groupby("mmsi").size().nlargest(10) * SAMPLING_DELTA

In [None]:
traj_collection.get_trajectory(235007860).hvplot(cmap='Dark2', height=300, line_width=5.0)

## Find the longest trip of the day

In [None]:
traj_collection.df = pd.DataFrame([(traj.id, traj) for traj in traj_collection.trajectories], columns=["id", "trajectory"])
traj_collection.df["length"] = traj_collection.df.trajectory.apply(lambda traj: traj.get_length())
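
To make the notion of trajectory length concrete, the sketch below sums great-circle (haversine) distances between consecutive positions. It is not the movingpandas implementation of `get_length()`, whose result and unit depend on the trajectory's CRS and the library version. The function names and the spherical Earth radius are assumptions of this sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def track_length_m(points):
    """Sum the distances between consecutive (lat, lon) points of a track."""
    return sum(haversine_m(*a, *b) for a, b in zip(points, points[1:]))


# Hypothetical three-point track along the equator, one degree of longitude apart.
track = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
length_m = track_length_m(track)
```

With positions sampled every thirty minutes, any such length is a lower bound on the distance actually sailed, since detours between two retained positions are not captured.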

In [None]:
traj_collection.df.sort_values("length", ascending=False).head()

In [None]:
traj_collection.get_trajectory(220141000).hvplot(cmap='Dark2', height=300, line_width=5.0)

# Next steps

- We used the Global Fishing Watch heuristic model to predict whether vessels are fishing. We could go further by using their logistic model.
- Perform track analysis to find gaps or AIS spoofing in the messages received.
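
As a possible starting point for the gap analysis, the sketch below flags intervals between consecutive messages that are much longer than the expected sampling interval. The function name `find_gaps` and the `tolerance` factor are hypothetical choices for illustration, not part of the pipeline described above.

```python
import datetime as dt


def find_gaps(timestamps, expected=dt.timedelta(minutes=30), tolerance=2):
    """Return (start, end) pairs where consecutive messages are further
    apart than `tolerance` times the expected interval."""
    ordered = sorted(timestamps)
    threshold = expected * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]


# Hypothetical message times for one vessel, with a three-hour silence.
day = dt.datetime(2020, 1, 1)
message_times = [
    day,
    day + dt.timedelta(minutes=30),
    day + dt.timedelta(hours=1),
    day + dt.timedelta(hours=4),
    day + dt.timedelta(hours=4, minutes=30),
]
gaps = find_gaps(message_times)
```

Such a check would need to run on positions from before the zero-SOG filter applied in this notebook, otherwise every period at rest would show up as a gap.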


