<a href="https://colab.research.google.com/github/fiorellaguillen/CASA0025/blob/main/notebooks/W04_postgis2_try3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ship-to-Ship Transfer Detection

Now for a less structured exercise. We're going to look at ship-to-ship transfers. The idea is that two ships meet up in the middle of the ocean, and one ship transfers cargo to the other. This is a common way to avoid sanctions, and is often used to transfer oil from sanctioned countries to other countries. We're going to look at a few different ways to detect these transfers using AIS data.

In [1]:
%pip install duckdb duckdb-engine jupysql



In [2]:
import duckdb
import pandas as pd

# Import jupysql Jupyter extension to create SQL cells
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False
%sql duckdb:///:memory:

In [3]:
%%sql
INSTALL httpfs;
LOAD httpfs;
INSTALL spatial;
LOAD spatial;

Unnamed: 0,Success


## Step 1

Create a spatial database using the following AIS data:

https://storage.googleapis.com/qm2/casa0025_ships.csv

Each row in this dataset is an AIS 'ping' indicating the position of a ship at a particular date/time, alongside vessel-level characteristics.

It contains the following columns:
* `vesselid`: A unique numerical identifier for each ship, like a license plate
* `vessel_name`: The ship's name
* `vsl_descr`: The ship's type
* `dwt`: The ship's Deadweight Tonnage (how many tons it can carry)
* `v_length`: The ship's length in meters
* `draught`: How many meters deep the ship is draughting (how low it sits in the water). Effectively indicates how much cargo the ship is carrying
* `sog`: Speed over Ground (in knots)
* `date`: A timestamp for the AIS signal
* `lat`: The latitude of the AIS signal (EPSG:4326)
* `lon`: The longitude of the AIS signal (EPSG:4326)

Create a table called 'ais' where each row is a different AIS ping, with no superfluous information. Construct a geometry column.

Create a second table called 'vinfo' which contains vessel-level information with no superfluous information.

You can set a spatial index on each of these tables as follows:

`CREATE INDEX index_name ON table_name USING RTREE(geom);`

In [4]:
#visualize all data and its structure

%%sql

SELECT * FROM "https://storage.googleapis.com/qm2/casa0025_ships.csv" ;

Unnamed: 0,vesselid,vessel_name,vsl_descr,dwt,v_length,draught,sog,date,lat,lon,geom
0,350053,30 Let Pobedy,general cargo,5150.0,,3.5,5.2,2022-07-25 02:53:29,45.151777,36.513327,POINT (36.5133266666667 45.1517766666667)
1,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.7,2022-07-25 03:09:37,45.146487,36.520780,POINT (36.52078 45.1464866666667)
2,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.7,2022-07-25 03:13:58,45.146218,36.521965,POINT (36.521965 45.1462183333333)
3,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.1,2022-07-25 04:16:06,45.145058,36.522020,POINT (36.52202 45.1450583333333)
4,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.0,2022-07-25 05:20:17,45.144933,36.521848,POINT (36.5218483333333 45.1449333333333)
...,...,...,...,...,...,...,...,...,...,...,...
101323,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,0.1,2022-08-10 14:16:47,45.091987,36.522157,POINT (36.5221566666667 45.0919866666667)
101324,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,0.1,2022-08-10 14:43:48,45.091643,36.522213,POINT (36.5222133333333 45.0916433333333)
101325,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,5.8,2022-08-10 15:04:28,45.100457,36.519397,POINT (36.5193966666667 45.1004566666667)
101326,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,8.3,2022-08-23 06:06:51,45.087527,36.506987,POINT (36.5069866666667 45.0875266666667)


In [5]:
#create table with information of vessels that won't change

%%sql

DROP TABLE IF EXISTS vinfo;

CREATE TABLE vinfo AS
SELECT DISTINCT vesselid, vessel_name, dwt, v_length
FROM "https://storage.googleapis.com/qm2/casa0025_ships.csv";

SELECT * FROM vinfo LIMIT 5;

Unnamed: 0,vesselid,vessel_name,dwt,v_length
0,256543,Omskiy 143,3104.0,108.0
1,276630,Omskiy- 127,3177.0,108.0
2,256167,Omskiy- 128,3174.0,108.0
3,265523,Omskiy- 206,2835.0,114.0
4,274272,Omskiy-108,3284.0,103.0


In [112]:
#Create table filtering only the first and last ping of the day for each vessel
#in order to avoid repetitive pings for the same vessel.
#Also, already filtered only vessels where sog is less than 1

%%sql

DROP TABLE IF EXISTS boats;

CREATE TABLE boats AS (
SELECT
vesselid,
CAST(date AS DATE) AS interaction_day,
MIN(date) AS min_date,
MAX(date) AS max_date,

FROM "https://storage.googleapis.com/qm2/casa0025_ships.csv"
WHERE sog <1
GROUP BY vesselid, interaction_day
);

SELECT COUNT(*) FROM boats

Unnamed: 0,count_star()
0,6366


In [102]:
#Create table with only geometry, converted to meters, and minimal info

%%sql

DROP TABLE IF EXISTS geometry;

CREATE TABLE geometry AS
SELECT vesselid, date, ST_TRANSFORM(ST_GEOMFROMTEXT(geom), 'EPSG:4326', --transform to meters instead of degrees
        'EPSG:3857') AS geom
FROM "https://storage.googleapis.com/qm2/casa0025_ships.csv"
WHERE sog <1;

CREATE INDEX geometry_index ON geometry USING RTREE(geom);

SELECT * FROM geometry LIMIT 5;

Unnamed: 0,vesselid,date,geom
0,350053,2022-07-25 03:09:37,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
1,350053,2022-07-25 03:13:58,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
2,350053,2022-07-25 04:16:06,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
3,350053,2022-07-25 05:20:17,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
4,350053,2022-07-25 06:23:57,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."


In [75]:
%%sql
SELECT COUNT(*) FROM boats


Unnamed: 0,count_star()
0,86140


In [76]:
%%sql
SELECT COUNT(*) FROM geometry

Unnamed: 0,count_star()
0,86140


In [103]:
#Merge geometry and information before running the dwithin analysis.
#I will use only geometry of vessel at the first location in the morning.

%%sql

DROP TABLE IF EXISTS boats_final;

CREATE TABLE boats_final AS (
SELECT boats.vesselid, boats.interaction_day, boats.min_date, boats.max_date, geometry.geom
FROM boats
JOIN geometry
ON boats.min_date = geometry.date
AND boats.vesselid = geometry.vesselid
);

SELECT * FROM boats_final LIMIT 5;

CREATE INDEX boats_final_index ON boats_final USING RTREE(geom);

Unnamed: 0,vesselid,interaction_day,min_date,max_date,geom
0,350053,2022-07-26,2022-07-26 00:36:36,2022-07-26 22:19:15,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
1,350053,2022-07-27,2022-07-27 00:37:18,2022-07-27 23:35:22,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
2,350053,2022-07-29,2022-07-29 02:40:22,2022-07-29 23:01:13,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
3,350053,2022-07-30,2022-07-30 00:05:20,2022-07-30 08:52:21,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."
4,350053,2022-08-05,2022-08-05 05:39:07,2022-08-05 14:20:00,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ..."


## Step 2

Use a spatial join to identify ship-to-ship transfers in this dataset.
Two ships are considered to be conducting a ship to ship transfer IF:

* They are within 500 meters of each other
* For more than two hours
* And their speed is lower than 1 knot

Some things to consider: make sure you're not joining ships with themselves. Try working with subsets of the data first while you try different things out.

In [104]:
%%sql

SELECT
a1.vesselid AS ship1,
a2.vesselid AS ship2,
a1.min_date AS start,
a2.max_date AS end

FROM boats_final AS a1
JOIN boats_final AS a2
ON ST_DWITHIN(a1.geom, a2.geom, 500)
AND ABS(EXTRACT(EPOCH FROM (a2.max_date - a1.min_date))) > 7200
AND a1.vesselid < a2.vesselid;



FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))

Unnamed: 0,ship1,ship2,start,end
0,64736,110326,2022-06-03 00:13:57,2022-06-18 14:28:53
1,110326,113330,2022-08-21 06:35:34,2022-08-08 23:18:19
2,110326,113330,2022-08-21 06:35:34,2022-08-09 09:29:31
3,110326,113330,2022-08-21 06:35:34,2022-08-07 22:53:20
4,113330,113737,2022-08-24 06:25:31,2022-07-05 23:34:05
...,...,...,...,...
181584,291496,13854153,2022-06-26 00:28:08,2022-07-24 23:12:48
181585,10972075,13854153,2022-08-17 11:09:13,2022-07-24 23:12:48
181586,272505,13854153,2022-08-04 10:41:05,2022-07-25 20:33:14
181587,291496,13854153,2022-06-26 00:28:08,2022-07-25 20:33:14


In [105]:
%%sql

SELECT
a1.vesselid AS ship1,
a2.vesselid AS ship2,
MIN(a1.min_date) AS start,
MAX(a2.max_date) AS end,
a1.interaction_day,
ABS(EXTRACT(EPOCH FROM (MAX(a2.max_date) - MIN(a1.min_date)))/3600) AS 'duration in hours'


FROM boats_final AS a1
JOIN boats_final AS a2
ON ST_DWITHIN(a1.geom, a2.geom, 500)
AND ABS(EXTRACT(EPOCH FROM (a2.max_date - a1.min_date))) > 7200
AND a1.vesselid < a2.vesselid
AND a1.interaction_day = a2.interaction_day
GROUP BY a1.vesselid, a2.vesselid, a1.interaction_day

ORDER BY a1.interaction_day;


Unnamed: 0,ship1,ship2,start,end,interaction_day,duration in hours
0,231063,307524,2022-06-01 01:04:23,2022-06-01 09:15:45,2022-06-01,8.189444
1,174064,276869,2022-06-01 06:20:52,2022-06-01 23:53:19,2022-06-01,17.540833
2,263073,336720,2022-06-01 00:47:01,2022-06-01 22:56:25,2022-06-01,22.156667
3,157634,162620,2022-06-01 01:02:02,2022-06-01 23:47:59,2022-06-01,22.765833
4,230124,279773,2022-06-01 00:23:52,2022-06-01 23:12:54,2022-06-01,22.817222
...,...,...,...,...,...,...
2060,263346,396946,2022-08-31 06:54:57,2022-08-31 19:16:59,2022-08-31,12.367222
2061,235461,396946,2022-08-31 00:52:20,2022-08-31 19:16:59,2022-08-31,18.410833
2062,255336,10984250,2022-08-31 18:35:13,2022-08-31 23:32:32,2022-08-31,4.955278
2063,176504,12874115,2022-08-31 01:28:43,2022-08-31 23:38:19,2022-08-31,22.160000
