# dark-vessel-hunter
DTU Deep Learning project 29, group 80


### Run this in your terminal before executing the notebook

```bash
pip install -r requirements.txt
```

Note: if you run the install from inside the notebook instead, you may need to restart the kernel to use the updated packages.


## Import the project modules

```python
import ais_downloader
import ais_filtering
# import ais_to_parquet
```



## Data setup
### Set data preferences

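```python
# Date range to process (YYYY-MM-DD); both START_DATE and END_DATE are included
START_DATE = "2025-11-01"
END_DATE   = "2025-11-03"

# Folder where the downloaded AIS CSV files are stored
FOLDER_NAME = "ais-data"

# If True, each CSV file is deleted right after it has been loaded into memory
DELETE_DOWNLOADED_CSV = False
```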
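With `START_DATE = "2025-11-01"` and `END_DATE = "2025-11-03"`, the run further down processes three files, so both ends of the range are included. Below is a minimal sketch of how such an inclusive schedule can be built. `build_date_range` is a hypothetical helper and is not part of the project. The project's `ais_downloader.get_work_dates` also takes the folder path and a `filter` flag, and their behaviour is not shown here.

```python
from datetime import date, timedelta


def build_date_range(start: str, end: str) -> list[date]:
    """Hypothetical helper: every calendar day from start to end, both included."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if last < first:
        raise ValueError("END_DATE must not be before START_DATE")
    # +1 so that END_DATE itself is part of the schedule
    n_days = (last - first).days + 1
    return [first + timedelta(days=i) for i in range(n_days)]


days = build_date_range("2025-11-01", "2025-11-03")
print([d.isoformat() for d in days])  # ['2025-11-01', '2025-11-02', '2025-11-03']
```

Working on `date` objects rather than strings means month ends and leap days are handled by `timedelta`, not by string arithmetic.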

### Imports for the script

```python
from tqdm import tqdm
from pathlib import Path
import pandas as pd
from datetime import date, timedelta
```

### Download, filter and split each day

```python
# --- Create the data folder if it does not exist yet ---
folder_path = Path(FOLDER_NAME)
folder_path.mkdir(parents=True, exist_ok=True)

# --- To download all CSV files up front instead, uncomment the line below ---
# ais_downloader.download_multiple_ais_data(START_DATE, END_DATE, folder_path)

# --- Build the schedule of dates to process ---
dates = ais_downloader.get_work_dates(START_DATE, END_DATE, folder_path, filter=False)

# --- Separator used to join conflicting static values of the same vessel ---
separator = " | "

# --- Iterate with tqdm: download, load, optionally delete, then filter and split ---
for day in tqdm(dates, desc="Processing data", unit="file"):
    # Dates before 2024-03-01 are tagged by month, later dates by day
    tag = f"{day:%Y-%m}" if day < date.fromisoformat("2024-03-01") else f"{day:%Y-%m-%d}"
    print(f"\nProcessing date: {tag}")

    # --- Download one day (skipped if the file is already in the folder) ---
    csv_path = ais_downloader.download_one_ais_data(day, folder_path)

    # --- Load the CSV into a DataFrame ---
    df_day = pd.read_csv(csv_path)

    # --- Optionally delete the downloaded CSV file (Path() also accepts a str) ---
    if DELETE_DOWNLOADED_CSV:
        Path(csv_path).unlink(missing_ok=True)

    # --- Filter and split ---
    df_filtered = ais_filtering.df_filter(df_day, verbose_mode=True, polygon_filter=True)
    # print(df_filtered.head())  # For debugging: inspect the filtered data
    df_static, df_dynamic = ais_filtering.split_static_dynamic(df_filtered, join_conflicts=True, sep=separator)

    # --- Save to parquet ---
    # ais_to_parquet.save_by_mmsi(df_static, df_dynamic, folder_path, tag)
```

```text
Processing data:   0%|          | 0/3 [00:00<?, ?file/s]

Processing date: 2025-11-01
Skipping 2025-11-01 download: already present in ais-data folder
Before filtering: 16,522,105 rows, 3,462 unique vessels
 Initial filtering complete: 9,341,096 rows, 3,205 unique vessels
 Bounding box filtering complete: 535,908 rows, 238 unique vessels

Processing data:  33%|███▎      | 1/3 [00:34<01:09, 34.68s/file]

 Polygon filtering complete: 276,111 rows, 176 unique vessels
Split complete:
   Static:  176 unique vessels with 10 columns
   Dynamic: 276,111 AIS messages with 13 columns
  Static conflicts: Width (1), Type of position fixing device (1)

Processing date: 2025-11-02
Skipping 2025-11-02 download: already present in ais-data folder
Before filtering: 15,826,904 rows, 3,259 unique vessels
 Initial filtering complete: 8,943,205 rows, 3,042 unique vessels
 Bounding box filtering complete: 500,728 rows, 225 unique vessels
 Polygon filtering complete: 259,909 rows, 152 unique vessels

Processing data:  67%|██████▋   | 2/3 [01:09<00:34, 34.97s/file]

Split complete:
   Static:  152 unique vessels with 10 columns
   Dynamic: 259,909 AIS messages with 13 columns
  Static conflicts: Type of position fixing device (1)

Processing date: 2025-11-03
Starting download and extraction for 2025-11-03

Downloading 2025-11-03 zip file: 100%|██████████| 540M/540M [01:19<00:00, 7.14MB/s]
Unzipping into ais-data folder : 100%|██████████| 1/1 [00:05<00:00,  5.39s/it]

Completed download and extraction for 2025-11-03
Before filtering: 16,050,529 rows, 3,151 unique vessels
 Initial filtering complete: 8,996,859 rows, 2,933 unique vessels
 Bounding box filtering complete: 526,788 rows, 227 unique vessels
 Polygon filtering complete: 278,910 rows, 153 unique vessels

Processing data: 100%|██████████| 3/3 [03:10<00:00, 63.49s/file]

Split complete:
   Static:  153 unique vessels with 10 columns
   Dynamic: 278,910 AIS messages with 13 columns
  Static conflicts: Type of position fixing device (4)
```
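The log shows `df_filter` narrowing each day in stages: an initial filter, then a bounding box, then a polygon (enabled by `polygon_filter=True`). The polygon used by `ais_filtering` is not shown in this notebook. Below is a minimal sketch of the two spatial stages. It assumes the CSV has `Latitude` and `Longitude` columns, and it uses a made-up triangle (`TRIANGLE`) plus the hypothetical helpers `points_in_polygon` and `spatial_filter`. The point-in-polygon test is the standard even-odd ray-casting rule, which may differ from what the project actually uses.

```python
import numpy as np
import pandas as pd

# Hypothetical area of interest as (longitude, latitude) vertices; NOT the project's polygon
TRIANGLE = [(10.0, 55.0), (12.0, 55.0), (11.0, 57.0)]


def points_in_polygon(lon, lat, polygon):
    """Vectorised even-odd ray casting: True where (lon, lat) lies inside polygon."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    inside = np.zeros(lon.shape, dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through each point?
        crosses = (y1 > lat) != (y2 > lat)
        # Longitude where the edge meets that line (horizontal edges never count)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
        # An eastward ray from the point crosses this edge: flip inside/outside
        inside ^= crosses & (lon < x_cross)
    return inside


def spatial_filter(df, polygon, lat_col="Latitude", lon_col="Longitude"):
    """Bounding-box prefilter followed by an exact point-in-polygon test."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    # Stage 1: cheap rectangular test on the polygon's bounding box
    in_box = df[lon_col].between(min(xs), max(xs)) & df[lat_col].between(min(ys), max(ys))
    boxed = df[in_box]
    # Stage 2: ray casting only on the rows that survived the box test
    mask = points_in_polygon(boxed[lon_col].to_numpy(), boxed[lat_col].to_numpy(), polygon)
    return boxed[mask]


# Toy messages: inside the triangle, inside the box but outside the triangle, outside the box
df = pd.DataFrame({
    "MMSI": [111, 222, 333],
    "Latitude": [55.5, 56.8, 56.0],
    "Longitude": [11.0, 10.1, 13.0],
})
print(spatial_filter(df, TRIANGLE)["MMSI"].tolist())  # [111]
```

In this sketch, the cheap box test runs first, so the per-edge polygon test only touches the rows that survive it. In the log above, the box stage alone cuts each day from about 9 million rows to about half a million.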




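The `Static conflicts` lines appear because `split_static_dynamic` is called with `join_conflicts=True` and `sep=" | "`. In other words, a vessel that reports several different values for a static field (for example `Width`) keeps all of them in one joined string. The project's implementation is not shown here. Below is a minimal sketch of one way such a split could work, with these assumptions:

- The column lists are hypothetical; the real split yields 10 static and 13 dynamic columns.
- The functions `join_unique` and `split_static_dynamic_sketch` are hypothetical names.
- A "conflict" is counted as a vessel with more than one distinct non-null value in a column.

```python
import pandas as pd

# Hypothetical column subsets; the real split yields 10 static and 13 dynamic columns
STATIC_COLS = ["Name", "Width", "Type of position fixing device"]
DYNAMIC_COLS = ["Timestamp", "Latitude", "Longitude", "SOG"]


def join_unique(values, sep):
    """Join the distinct non-null values of one vessel's column, e.g. '10.0 | 12.0'."""
    uniq = sorted(set(values.dropna().astype(str)))
    return sep.join(uniq) if uniq else None


def split_static_dynamic_sketch(df, sep=" | "):
    grouped = df.groupby("MMSI")[STATIC_COLS]
    # Static table: one row per vessel, conflicting values joined with sep
    df_static = grouped.agg(lambda s: join_unique(s, sep)).reset_index()
    # Dynamic table: every AIS message keeps its own row
    df_dynamic = df[["MMSI"] + DYNAMIC_COLS].reset_index(drop=True)
    # Conflicts: number of vessels with more than one distinct value per column
    n_unique = grouped.nunique()
    conflicts = {col: int((n_unique[col] > 1).sum()) for col in STATIC_COLS}
    return df_static, df_dynamic, conflicts


# Toy messages: vessel 111 reports two different widths, vessel 222 has no device type
messages = pd.DataFrame({
    "MMSI": [111, 111, 222],
    "Name": ["ALPHA", "ALPHA", "BRAVO"],
    "Width": [10.0, 12.0, 8.0],
    "Type of position fixing device": ["GPS", "GPS", None],
    "Timestamp": ["2025-11-01 00:00:00", "2025-11-01 00:01:00", "2025-11-01 00:00:30"],
    "Latitude": [55.1, 55.2, 56.0],
    "Longitude": [10.1, 10.2, 11.0],
    "SOG": [5.0, 5.5, 0.0],
})
df_static, df_dynamic, conflicts = split_static_dynamic_sketch(messages)
print(conflicts)  # {'Name': 0, 'Width': 1, 'Type of position fixing device': 0}
```

Joining conflicts rather than keeping only the first or most frequent value means no reported static value is lost. The cost is that downstream code must split on the separator when it needs a single value.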