<a href="https://colab.research.google.com/github/fiorellaguillen/CASA0025/blob/main/notebooks/W04_postgis2_try2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ship-to-Ship Transfer Detection

Now for a less structured exercise: detecting ship-to-ship transfers. In a ship-to-ship transfer, two vessels meet at sea and one transfers cargo to the other. The practice is commonly used to evade sanctions, for example to move oil from a sanctioned country on to other countries. We're going to look at a few different ways to detect these transfers using AIS (Automatic Identification System) data.

In [2]:
%pip install duckdb duckdb-engine jupysql



In [3]:
import duckdb
import pandas as pd

# Import jupysql Jupyter extension to create SQL cells
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False
%sql duckdb:///:memory:

In [4]:
%%sql
INSTALL httpfs;
LOAD httpfs;
INSTALL spatial;
LOAD spatial;

Unnamed: 0,Success


## Step 1

Create a spatial database using the following AIS data:

https://storage.googleapis.com/qm2/casa0025_ships.csv

Each row in this dataset is an AIS 'ping' indicating the position of a ship at a particular date/time, alongside vessel-level characteristics.

It contains the following columns:
* `vesselid`: A unique numerical identifier for each ship, like a license plate
* `vessel_name`: The ship's name
* `vsl_descr`: The ship's type
* `dwt`: The ship's Deadweight Tonnage (how many tons it can carry)
* `v_length`: The ship's length in meters
* `draught`: The ship's draught in meters (how low it sits in the water). Effectively indicates how much cargo the ship is carrying
* `sog`: Speed over Ground (in knots)
* `date`: A timestamp for the AIS signal
* `lat`: The latitude of the AIS signal (EPSG:4326)
* `lon`: The longitude of the AIS signal (EPSG:4326)

Create a table called 'ais' where each row is a different AIS ping, with no superfluous information. Construct a geometry column.

Create a second table called 'vinfo' which contains vessel-level information with no superfluous information.
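
The split between the two tables can be pictured outside SQL as well. The following is a minimal pure-Python sketch, not the intended solution: the three rows are invented for illustration (they are not taken from the dataset), and `split_rows`, `VESSEL_COLS` and `PING_COLS` are hypothetical names. Per-ping fields go into one list, while vessel-level fields are de-duplicated so that each ship appears once, which is roughly what `SELECT DISTINCT` does.

```python
# Illustrative rows only; the values are invented for this sketch.
rows = [
    {"vesselid": 1, "vessel_name": "Alpha", "dwt": 5000.0, "sog": 0.2,
     "date": "2022-07-25 02:00:00", "lat": 45.10, "lon": 36.50},
    {"vesselid": 1, "vessel_name": "Alpha", "dwt": 5000.0, "sog": 0.1,
     "date": "2022-07-25 02:30:00", "lat": 45.10, "lon": 36.51},
    {"vesselid": 2, "vessel_name": "Bravo", "dwt": 3200.0, "sog": 4.8,
     "date": "2022-07-25 02:10:00", "lat": 45.20, "lon": 36.40},
]

VESSEL_COLS = ("vesselid", "vessel_name", "dwt")       # constant per ship
PING_COLS = ("vesselid", "date", "sog", "lat", "lon")  # change with every ping


def split_rows(rows):
    """Return (pings, vessels): one record per ping, one record per ship."""
    pings = [{c: r[c] for c in PING_COLS} for r in rows]
    # A set of tuples drops repeated vessel records, like SELECT DISTINCT.
    vessels = sorted({tuple(r[c] for c in VESSEL_COLS) for r in rows})
    return pings, vessels


pings, vessels = split_rows(rows)
print(len(pings), "pings;", len(vessels), "vessels")
```

Keeping `vesselid` in both structures is what allows the two tables to be joined back together later.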

You can set a spatial index on each of these tables as follows:

`CREATE INDEX index_name ON table_name USING RTREE(geom);`
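
One thing to keep in mind when building the geometry column: `lat` and `lon` are in degrees, so planar distances computed directly on them come out in degrees too, not meters. As a rough cross-check, the great-circle distance between two pings can be computed with the haversine formula. The sketch below uses the first two positions of vessel 350053 in the CSV; the function name `haversine_m` and the mean Earth radius of 6,371,000 m are choices made for this illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# First two pings of vessel 350053 in the CSV, as (lat, lon).
p1 = (45.151777, 36.513327)
p2 = (45.146487, 36.520780)

d_m = haversine_m(*p1, *p2)
# Naive planar distance on the raw coordinates: the result is in degrees.
d_deg = math.hypot(p2[0] - p1[0], p2[1] - p1[1])

print(f"{d_m:.0f} m apart, but only {d_deg:.4f} degrees apart")
```

A threshold of 500 applied to degree coordinates would therefore match practically every pair of pings. Before comparing against 500 meters, either project the points to a metric coordinate system or use a spherical distance.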

In [5]:
%%sql

SELECT * FROM "https://storage.googleapis.com/qm2/casa0025_ships.csv" ;

Unnamed: 0,vesselid,vessel_name,vsl_descr,dwt,v_length,draught,sog,date,lat,lon,geom
0,350053,30 Let Pobedy,general cargo,5150.0,,3.5,5.2,2022-07-25 02:53:29,45.151777,36.513327,POINT (36.5133266666667 45.1517766666667)
1,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.7,2022-07-25 03:09:37,45.146487,36.520780,POINT (36.52078 45.1464866666667)
2,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.7,2022-07-25 03:13:58,45.146218,36.521965,POINT (36.521965 45.1462183333333)
3,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.1,2022-07-25 04:16:06,45.145058,36.522020,POINT (36.52202 45.1450583333333)
4,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.0,2022-07-25 05:20:17,45.144933,36.521848,POINT (36.5218483333333 45.1449333333333)
...,...,...,...,...,...,...,...,...,...,...,...
101323,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,0.1,2022-08-10 14:16:47,45.091987,36.522157,POINT (36.5221566666667 45.0919866666667)
101324,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,0.1,2022-08-10 14:43:48,45.091643,36.522213,POINT (36.5222133333333 45.0916433333333)
101325,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,5.8,2022-08-10 15:04:28,45.100457,36.519397,POINT (36.5193966666667 45.1004566666667)
101326,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,8.3,2022-08-23 06:06:51,45.087527,36.506987,POINT (36.5069866666667 45.0875266666667)


In [6]:
%%sql

DROP TABLE IF EXISTS vinfo;

CREATE TABLE vinfo AS
SELECT DISTINCT vesselid, vessel_name, dwt, v_length
FROM "https://storage.googleapis.com/qm2/casa0025_ships.csv";

SELECT * FROM vinfo LIMIT 5;

Unnamed: 0,vesselid,vessel_name,dwt,v_length
0,301537,Omskiy 86,3201.0,108.0
1,296750,Omskiy 99,3108.0,54.0
2,286084,Omskiy- 119,3157.0,108.0
3,278778,Omskiy-103,3283.0,108.0
4,246401,Omskiy-106,3191.0,108.0


In [7]:
%%sql

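DROP TABLE IF EXISTS ais;

-- Store positions in a projected CRS so that distances come out in meters.
-- ST_DWithin measures in the units of the geometry's CRS; on raw lon/lat
-- a threshold of 500 would mean 500 degrees.
-- Assumption: the pings lie around 36.5°E, 45°N (as in the rows shown above),
-- so UTM zone 37N (EPSG:32637) is used; pick the zone that matches your data.
CREATE TABLE ais AS
SELECT vesselid, date, sog,
       ST_Transform(ST_GeomFromText(geom), 'EPSG:4326', 'EPSG:32637', always_xy := true) AS geom
FROM "https://storage.googleapis.com/qm2/casa0025_ships.csv";

CREATE INDEX ais_index ON ais USING RTREE(geom);

SELECT * FROM ais LIMIT 5;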

Unnamed: 0,vesselid,date,sog,geom
0,350053,2022-07-25 02:53:29,5.2,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
1,350053,2022-07-25 03:09:37,0.7,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
2,350053,2022-07-25 03:13:58,0.7,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
3,350053,2022-07-25 04:16:06,0.1,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
4,350053,2022-07-25 05:20:17,0.0,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."


In [8]:
%%sql
SELECT COUNT(*) FROM ais


Unnamed: 0,count_star()
0,101328


In [9]:
%%sql
SELECT COUNT(*) FROM vinfo

Unnamed: 0,count_star()
0,835


## Step 2

Use a spatial join to identify ship-to-ship transfers in this dataset.
Two ships are considered to be conducting a ship-to-ship transfer if:

* They are within 500 meters of each other
* They stay that close for more than two hours
* Both are moving slower than 1 knot

Some things to consider: make sure you're not joining ships with themselves. Try working with subsets of the data first while you try different things out.
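
Before writing the SQL, it can help to spell the three criteria out on a toy example. The following is a minimal pure-Python sketch under stated assumptions, not a reference solution: the pings are synthetic, coordinates are assumed to be already projected to meters, and `MAX_TIME_GAP_S` (how close in time two pings must be to count as observed together) is a value invented for this illustration. `find_transfers` is likewise a hypothetical helper name.

```python
import math
from collections import defaultdict
from itertools import combinations

MAX_DIST_M = 500            # from the exercise
MIN_DURATION_S = 2 * 3600   # from the exercise
MAX_SOG_KN = 1.0            # from the exercise
MAX_TIME_GAP_S = 600        # assumption: pings this close in time count as simultaneous

# Synthetic pings: (vesselid, time in seconds, x in meters, y in meters, sog in knots)
pings = [
    (1, 0, 0, 0, 0.2),
    (2, 120, 100, 50, 0.1),
    (1, 3600, 10, 5, 0.3),
    (2, 3700, 110, 40, 0.0),
    (1, 9000, 20, 0, 0.1),
    (2, 9100, 90, 60, 0.4),
    (3, 100, 50, 50, 8.5),      # close by, but passing at speed
    (4, 200, 5000, 5000, 0.0),  # slow, but far away
]


def find_transfers(pings):
    """Return {(ship1, ship2): duration_s} for pairs that meet all three criteria."""
    slow = [p for p in pings if p[4] < MAX_SOG_KN]
    contact_times = defaultdict(list)
    for a, b in combinations(slow, 2):
        if a[0] == b[0]:
            continue  # never pair a ship with itself
        if abs(a[1] - b[1]) > MAX_TIME_GAP_S:
            continue  # not observed at roughly the same time
        if math.hypot(a[2] - b[2], a[3] - b[3]) > MAX_DIST_M:
            continue  # too far apart
        pair = tuple(sorted((a[0], b[0])))  # (1, 2) and (2, 1) are the same pair
        contact_times[pair] += [a[1], b[1]]
    # Duration is the span between the first and the last ping in contact.
    return {
        pair: max(ts) - min(ts)
        for pair, ts in contact_times.items()
        if max(ts) - min(ts) > MIN_DURATION_S
    }


print(find_transfers(pings))
```

Two simplifications are worth noting. The duration is measured from the first to the last contact, without checking that the ships stayed together in between, and the pairwise loop is quadratic, so it is only suitable for small subsets. In SQL, the same three filters become join conditions and the duration check becomes an aggregate over each pair of ships.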

In [None]:
%%sql

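SELECT
a1.vesselid AS ship1,
a2.vesselid AS ship2,
a1.date AS start,
a2.date AS end

FROM ais AS a1
JOIN ais AS a2
ON ST_DWITHIN(a1.geom, a2.geom, 500)  -- distance is in the units of geom's CRS
AND ABS(EXTRACT(EPOCH FROM (a2.date - a1.date))) > 7200  -- pings recorded more than 2 hours (7200 s) apart
AND a1.vesselid <> a2.vesselid  -- exclude a ship joined with itself
AND a1.sog < 1  -- both ships slower than 1 knot
AND a2.sog < 1;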



In [None]:
%%sql

SELECT
a1.vesselid AS ship1,
a2.vesselid AS ship2,
MIN(a1.date) AS start,
MAX(a2.date) AS end

FROM ais AS a1
JOIN ais AS a2
ON ST_DWITHIN(a1.geom, a2.geom, 500)
AND ABS(EXTRACT(EPOCH FROM (a2.date - a1.date))) > 7200
AND a1.vesselid <> a2.vesselid
AND a1.sog < 1
AND a2.sog < 1
-- GROUP BY must come after the FROM/JOIN clauses: one row per ordered pair of ships
GROUP BY a1.vesselid, a2.vesselid
LIMIT 5;

In [None]:
%%sql

WITH interactions AS (
    SELECT
        a1.vesselid AS ship1,
        a2.vesselid AS ship2,
        a1.date AS start_time,
        a2.date AS end_time
    FROM ais AS a1
    JOIN ais AS a2
    ON ST_DWITHIN(a1.geom, a2.geom, 500)
    AND a1.vesselid <> a2.vesselid
    AND a1.sog < 1
    AND a2.sog < 1
)
-- One row per pair of ships per day; durations are in seconds
SELECT
    ship1,
    ship2,
    CAST(start_time AS DATE) AS interaction_day,
    MIN(start_time) AS start_time,
    MAX(end_time) AS end_time,
    EXTRACT(EPOCH FROM (MAX(end_time) - MIN(start_time))) AS duration
FROM interactions
GROUP BY ship1, ship2, interaction_day
-- Keep spans between 2 hours (7200 s) and 24 hours (86400 s)
HAVING EXTRACT(EPOCH FROM (MAX(end_time) - MIN(start_time))) BETWEEN 7200 AND 86400
LIMIT 5;

In [None]:
%%sql

WITH interactions AS (
    SELECT
        a1.vesselid AS ship1,
        a2.vesselid AS ship2,
        a1.date AS start_time,
        a2.date AS end_time
    FROM ais AS a1
    JOIN ais AS a2
    ON ST_DWITHIN(a1.geom, a2.geom, 500)
    AND a1.vesselid <> a2.vesselid
    AND a1.sog < 1
    AND a2.sog < 1
)
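SELECT DISTINCT ON (ship1, ship2, CAST(start_time AS DATE))
    ship1,
    ship2,
    CAST(start_time AS DATE) AS date,
    start_time,
    end_time,
    EXTRACT(EPOCH FROM (end_time - start_time)) AS duration  -- seconds

FROM interactions
WHERE duration BETWEEN 7200 AND 86400  -- 2 to 24 hours; DuckDB lets WHERE reuse the SELECT alias
-- DISTINCT ON keeps the first row of each group, so order the rows to make that
-- choice deterministic: here, the longest span per pair per day
ORDER BY ship1, ship2, CAST(start_time AS DATE), duration DESC
LIMIT 5;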
