
# Ship-to-Ship Transfer Detection

Now for a less structured exercise: detecting ship-to-ship transfers. In a ship-to-ship transfer, two ships meet at sea and one transfers cargo to the other. This is a common way to evade sanctions, and is often used to move oil from sanctioned countries to other countries. We're going to look at a few different ways to detect these transfers using AIS (Automatic Identification System) data.

In [121]:
%pip install duckdb duckdb-engine jupysql



In [122]:
import duckdb
import pandas as pd

# Import jupysql Jupyter extension to create SQL cells
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False
%sql duckdb:///:memory:

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [123]:
%%sql
INSTALL httpfs;
LOAD httpfs;
INSTALL spatial;
LOAD spatial;

Unnamed: 0,Success


## Step 1

Create a spatial database using the following AIS data:

https://storage.googleapis.com/qm2/casa0025_ships.csv

Each row in this dataset is an AIS 'ping' indicating the position of a ship at a particular date/time, alongside vessel-level characteristics.

It contains the following columns:
* `vesselid`: A unique numerical identifier for each ship, like a license plate
* `vessel_name`: The ship's name
* `vsl_descr`: The ship's type
* `dwt`: The ship's Deadweight Tonnage (how many tons it can carry)
* `v_length`: The ship's length in meters
* `draught`: The ship's draught in meters (how low it sits in the water), which effectively indicates how much cargo the ship is carrying
* `sog`: Speed over Ground (in knots)
* `date`: A timestamp for the AIS signal
* `lat`: The latitude of the AIS signal (EPSG:4326)
* `lon`: The longitude of the AIS signal (EPSG:4326)

Create a table called 'ais' where each row is a different AIS ping, with no superfluous information. Construct a geometry column.

Create a second table called 'vinfo' which contains vessel-level information with no superfluous information.

You can set a spatial index on each of these tables as follows:

`CREATE INDEX index_name ON table_name USING RTREE(geom);`
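
As a rough illustration of the split described above, the sketch below separates a few made-up rows into ping-level and vessel-level records in plain Python. It is not the DuckDB solution: the sample rows, the helper name `split_rows`, and the choice of which columns count as vessel-level are all assumptions made for the example.

```python
# Hypothetical sample rows; the values are illustrative, not taken from the dataset.
rows = [
    {"vesselid": 1, "vessel_name": "Alpha", "vsl_descr": "tanker", "dwt": 5000.0,
     "v_length": 110.0, "draught": 4.5, "sog": 0.2,
     "date": "2022-07-25 02:53:29", "lat": 45.15, "lon": 36.51},
    {"vesselid": 1, "vessel_name": "Alpha", "vsl_descr": "tanker", "dwt": 5000.0,
     "v_length": 110.0, "draught": 4.5, "sog": 0.1,
     "date": "2022-07-25 03:09:37", "lat": 45.14, "lon": 36.52},
    {"vesselid": 2, "vessel_name": "Bravo", "vsl_descr": "general cargo", "dwt": 3000.0,
     "v_length": 90.0, "draught": 3.5, "sog": 5.8,
     "date": "2022-07-25 03:10:00", "lat": 45.10, "lon": 36.50},
]

# Assumed grouping: attributes that describe the ship vs. attributes of one ping.
VESSEL_COLS = ("vesselid", "vessel_name", "vsl_descr", "dwt", "v_length")
PING_COLS = ("vesselid", "date", "sog", "draught", "lat", "lon")


def split_rows(rows):
    """Return (pings, vessels): one record per ping and one record per vessel."""
    pings = [{col: row[col] for col in PING_COLS} for row in rows]
    vessels = {}
    for row in rows:
        # Keep the first record seen for each vesselid.
        vessels.setdefault(row["vesselid"], {col: row[col] for col in VESSEL_COLS})
    return pings, list(vessels.values())


ais, vinfo = split_rows(rows)
```

Each ping keeps `vesselid` so that it can be joined back to its vessel record, while the name, type, and size are stored only once per ship.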

In [124]:
%%sql
-- Preview the full dataset and its structure

SELECT * FROM "https://storage.googleapis.com/qm2/casa0025_ships.csv" ;

Unnamed: 0,vesselid,vessel_name,vsl_descr,dwt,v_length,draught,sog,date,lat,lon,geom
0,350053,30 Let Pobedy,general cargo,5150.0,,3.5,5.2,2022-07-25 02:53:29,45.151777,36.513327,POINT (36.5133266666667 45.1517766666667)
1,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.7,2022-07-25 03:09:37,45.146487,36.520780,POINT (36.52078 45.1464866666667)
2,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.7,2022-07-25 03:13:58,45.146218,36.521965,POINT (36.521965 45.1462183333333)
3,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.1,2022-07-25 04:16:06,45.145058,36.522020,POINT (36.52202 45.1450583333333)
4,350053,30 Let Pobedy,general cargo,5150.0,,3.5,0.0,2022-07-25 05:20:17,45.144933,36.521848,POINT (36.5218483333333 45.1449333333333)
...,...,...,...,...,...,...,...,...,...,...,...
101323,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,0.1,2022-08-10 14:16:47,45.091987,36.522157,POINT (36.5221566666667 45.0919866666667)
101324,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,0.1,2022-08-10 14:43:48,45.091643,36.522213,POINT (36.5222133333333 45.0916433333333)
101325,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,5.8,2022-08-10 15:04:28,45.100457,36.519397,POINT (36.5193966666667 45.1004566666667)
101326,217531,Zubeyde,roll on roll off with container capacity,5000.0,113.0,4.5,8.3,2022-08-23 06:06:51,45.087527,36.506987,POINT (36.5069866666667 45.0875266666667)


In [125]:
%%sql
-- Vessel-level attributes that do not change between pings

DROP TABLE IF EXISTS vinfo;

CREATE TABLE vinfo AS
SELECT DISTINCT vesselid, vessel_name, dwt, v_length
FROM "https://storage.googleapis.com/qm2/casa0025_ships.csv";

SELECT * FROM vinfo LIMIT 5;

Unnamed: 0,vesselid,vessel_name,dwt,v_length
0,350053,30 Let Pobedy,5150.0,
1,323648,A Line,12259.0,109.0
2,213151,Absheron,3344.0,116.0
3,330665,Adafera,105215.0,226.0
4,1925911,Aeolian Victory,82152.0,223.0


In [126]:
%%sql

DROP TABLE IF EXISTS boats_full;

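CREATE TABLE boats_full AS (
    SELECT
        vesselid,
        CAST(date AS DATE) AS interaction_day,  -- calendar day, used to pair pings
        date AS timestamp,
        -- Reproject to EPSG:3857 so ST_DWITHIN distances are in projected units (nominally meters).
        -- Caveat: ST_TRANSFORM follows each CRS's own axis order unless its optional
        -- always_xy argument is set, and EPSG:4326 is defined as lat/lon.
        ST_TRANSFORM(ST_GEOMFROMTEXT(geom), 'EPSG:4326', 'EPSG:3857') AS geom
    FROM "https://storage.googleapis.com/qm2/casa0025_ships.csv"
    WHERE sog < 1  -- keep only near-stationary pings (under 1 knot)
);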

CREATE INDEX boats_full_index ON boats_full USING RTREE(geom);

Unnamed: 0,Success
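
One thing to keep in mind when measuring distances in EPSG:3857: Web Mercator stretches distances by roughly 1 / cos(latitude), so projected units are only approximately meters away from the equator. The sketch below is a back-of-the-envelope check under a spherical approximation, assuming the points sit near 45.15°N as in the sample output; the function names are hypothetical.

```python
import math


def mercator_units_to_ground_m(units, lat_deg):
    """Approximate ground distance (m) for a short distance measured in EPSG:3857 units."""
    return units * math.cos(math.radians(lat_deg))


def ground_m_to_mercator_units(metres, lat_deg):
    """Approximate EPSG:3857 distance that corresponds to a ground distance in meters."""
    return metres / math.cos(math.radians(lat_deg))


# What a 500-unit threshold means on the ground near 45.15 degrees north.
ground = mercator_units_to_ground_m(500, 45.15)

# The projected threshold that would correspond to 500 m on the ground.
threshold = ground_m_to_mercator_units(500, 45.15)
```

Under these assumptions, a 500-unit threshold covers roughly 350 m on the ground, and a threshold of about 710 units would be needed for a true 500 m.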


In [127]:
%%sql

SELECT
    a1.vesselid AS ship1,
    a2.vesselid AS ship2,
    MIN(a1.timestamp) AS start_time,
    MAX(a2.timestamp) AS end_time,
    a1.interaction_day,
    ROUND(EXTRACT(EPOCH FROM (MAX(a2.timestamp) - MIN(a1.timestamp))) / 3600, 2) AS "duration in hours"
FROM boats_full AS a1
JOIN boats_full AS a2
ON ST_DWITHIN(a1.geom, a2.geom, 500)  -- Pings within 500 projected units of each other
AND a1.vesselid < a2.vesselid  -- Avoid comparing a ship with itself (and mirrored pairs)
AND a1.interaction_day = a2.interaction_day  -- Only compare ships on the same day
GROUP BY a1.vesselid, a2.vesselid, a1.interaction_day
HAVING ROUND(EXTRACT(EPOCH FROM (MAX(a2.timestamp) - MIN(a1.timestamp))) / 3600, 2) > 2  -- Keep only interactions longer than 2 hours
ORDER BY a1.interaction_day;


Unnamed: 0,ship1,ship2,start_time,end_time,interaction_day,duration in hours
0,174064,276869,2022-06-01 06:20:52,2022-06-01 23:53:19,2022-06-01,17.54
1,263073,336720,2022-06-01 00:47:01,2022-06-01 22:56:25,2022-06-01,22.16
2,263073,352210,2022-06-01 12:41:04,2022-06-01 23:02:43,2022-06-01,10.36
3,231502,268188,2022-06-01 21:23:16,2022-06-01 23:35:51,2022-06-01,2.21
4,269876,272439,2022-06-01 00:33:32,2022-06-01 23:52:51,2022-06-01,23.32
...,...,...,...,...,...,...
3563,235461,396946,2022-08-31 00:52:20,2022-08-31 19:16:59,2022-08-31,18.41
3564,144270,307524,2022-08-31 00:37:00,2022-08-31 23:48:49,2022-08-31,23.20
3565,255336,10984250,2022-08-31 18:35:13,2022-08-31 23:32:32,2022-08-31,4.96
3566,144270,352014,2022-08-31 00:37:00,2022-08-31 23:26:26,2022-08-31,22.82
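
The `duration in hours` column is the gap between the latest and earliest matching timestamps, converted from seconds to hours. A minimal Python sketch of that arithmetic, using the timestamps from the first row of the output above (the helper name `duration_hours` is hypothetical):

```python
from datetime import datetime


def duration_hours(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Elapsed time between two timestamp strings, in hours rounded to 2 decimals."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return round(delta.total_seconds() / 3600, 2)


# Row 0 of the output: ships 174064 and 276869 on 2022-06-01.
hours = duration_hours("2022-06-01 06:20:52", "2022-06-01 23:53:19")
print(hours)  # 17.54
```

Because the query groups by calendar day, a duration computed this way cannot exceed 24 hours, and an encounter that spans midnight is reported as two separate rows.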


In [128]:
%%sql
SELECT
    a1.vesselid AS ship1,
    a2.vesselid AS ship2,
    COUNT(DISTINCT a1.interaction_day) AS interaction_count  -- Number of distinct days on which the pair met
FROM boats_full AS a1
JOIN boats_full AS a2
ON ST_DWITHIN(a1.geom, a2.geom, 500)
AND a1.vesselid < a2.vesselid
AND a1.interaction_day = a2.interaction_day
GROUP BY a1.vesselid, a2.vesselid
ORDER BY interaction_count DESC;

Unnamed: 0,ship1,ship2,interaction_count
0,162620,336720,25
1,151024,336720,22
2,157634,336720,15
3,151024,162620,13
4,276883,13354187,13
...,...,...,...
3045,312965,327047,1
3046,272505,296740,1
3047,272505,277800,1
3048,327047,10954288,1


## Step 2

Use a spatial join to identify ship-to-ship transfers in this dataset.
Two ships are considered to be conducting a ship-to-ship transfer if:

* They are within 500 meters of each other
* For more than two hours
* And their speed is lower than 1 knot
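
The three criteria above can be sketched in plain Python as follows. This is a rough illustration under stated assumptions, not the DuckDB approach: it uses a haversine great-circle distance on a spherical Earth (mean radius 6,371 km) instead of projected coordinates, and the function names and sample pings are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_close_and_slow(ping_a, ping_b, max_dist_m=500, max_sog=1.0):
    """True if both pings are slower than max_sog knots and within max_dist_m meters."""
    if ping_a["sog"] >= max_sog or ping_b["sog"] >= max_sog:
        return False
    dist = haversine_m(ping_a["lat"], ping_a["lon"], ping_b["lat"], ping_b["lon"])
    return dist <= max_dist_m


def lasted_long_enough(first_seen, last_seen, min_hours=2):
    """True if the encounter lasted more than min_hours."""
    return last_seen - first_seen > timedelta(hours=min_hours)


# Hypothetical pings: two slow ships a few hundred meters apart, and one far away.
a = {"lat": 45.1450, "lon": 36.5220, "sog": 0.1}
b = {"lat": 45.1470, "lon": 36.5230, "sog": 0.7}
far = {"lat": 45.2000, "lon": 36.5220, "sog": 0.0}

close_pair = is_close_and_slow(a, b)
far_pair = is_close_and_slow(a, far)
long_enough = lasted_long_enough(datetime(2022, 6, 1, 0, 0), datetime(2022, 6, 1, 2, 30))
```

A pair would only be flagged when the proximity and speed checks hold and the encounter duration exceeds the two-hour threshold.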

Some things to consider: make sure you're not joining ships with themselves. Try working with subsets of the data first while you try different things out.
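
To illustrate the point about not joining ships with themselves, the sketch below enumerates candidate pairs for a small subset of vessel ids in plain Python. It is only an illustration: the subset is three ids picked from the sample outputs above, and `itertools.combinations` stands in for the join condition.

```python
from itertools import combinations

# A small subset of vessel ids, useful while experimenting.
vessel_ids = [350053, 217531, 323648]

# combinations() yields each unordered pair once, so a ship is never paired
# with itself and (a, b) / (b, a) are not both produced.
pairs = list(combinations(sorted(vessel_ids), 2))

n = len(vessel_ids)
all_ordered_pairs = n * n  # what an unrestricted self-join would compare
```

In SQL, a join condition such as `a1.vesselid < a2.vesselid` has the same effect: it drops self-matches and keeps only one ordering of each pair.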