# AIS Data Ingestion 

Vessel location data is ingested from the Automatic Identification System (AIS) data published on the federal [Marine Cadastre website](https://hub.marinecadastre.gov/pages/vesseltraffic) and is processed in the following steps:
- read daily CSV data from URLs
- drop unnecessary columns
- filter to cargo and tanker vessels (vessel type codes 70-89)
- cast datatypes appropriately
- append to a monthly file for storage
- save monthly files to parquet
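
The first step reads one file per day, so it helps to know which daily URLs are worth requesting. The sketch below builds the URL for every real calendar day in a month, using the same URL pattern as the download loop in this notebook. The helper name `daily_urls` is hypothetical, and the sketch only avoids impossible dates such as February 31; it does not check that a file actually exists on the server, so error handling around the download is still needed.

```python
from datetime import date, timedelta

# URL pattern copied from the download loop in this notebook.
URL_TEMPLATE = (
    "https://coast.noaa.gov/htdata/CMSP/AISDataHandler/"
    "{year}/AIS_{year}_{month:02d}_{day:02d}.zip"
)

def daily_urls(year, month):
    """Yield (date, url) pairs for every real calendar day in a month.

    Hypothetical helper, not part of the original notebook.
    """
    current = date(year, month, 1)
    # Step one day at a time until the month rolls over.
    while current.month == month:
        url = URL_TEMPLATE.format(
            year=current.year, month=current.month, day=current.day
        )
        yield current, url
        current += timedelta(days=1)

# February 2016 is a leap-year month, so it has 29 entries.
feb_2016 = list(daily_urls(2016, 2))
print(len(feb_2016))
print(feb_2016[-1][1])
```

Iterating over real dates skips requests that can never succeed, at the cost of one extra helper.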

Descriptions of each column of the raw data are available at the [AIS Data Dictionary](https://coast.noaa.gov/data/marinecadastre/ais/data-dictionary.pdf).
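
The processing code in this notebook treats a course over ground (`COG`) of 360.0 and a `Heading` of 511 as missing, and strips the `IMO` prefix from the IMO number. The sketch below shows those same rules on a single row in plain Python, as an illustration only. The function name `clean_record` and the sample row are made up, and returning `None` for a non-numeric IMO value is an assumption of this sketch rather than the notebook's behavior.

```python
def clean_record(record):
    """Return a copy of one raw AIS row with sentinel values set to None.

    Hypothetical helper that mirrors the column cleaning done in this notebook.
    """
    cleaned = dict(record)
    # A course over ground of 360.0 is treated as missing.
    if cleaned.get("COG") == 360.0:
        cleaned["COG"] = None
    # A heading of 511 is treated as missing.
    if cleaned.get("Heading") == 511.0:
        cleaned["Heading"] = None
    # Strip the "IMO" prefix and convert the remainder to an integer.
    imo = cleaned.get("IMO")
    if isinstance(imo, str) and imo.startswith("IMO"):
        digits = imo[len("IMO"):]
        # Assumption of this sketch: non-numeric remainders become None.
        cleaned["IMO"] = int(digits) if digits.isdigit() else None
    return cleaned

# Made-up example row, not real AIS data.
row = {
    "MMSI": 123456789,
    "COG": 360.0,
    "Heading": 511.0,
    "IMO": "IMO9876543",
    "VesselType": 70,
}
cleaned_row = clean_record(row)
print(cleaned_row)
```

The input row is copied rather than modified, so the raw values stay available for comparison.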

In [1]:
#preliminaries
import numpy as np
import pandas as pd
import polars as pl

#enable string cache for polars categoricals
pl.enable_string_cache()
#display settings
pd.set_option('display.max_columns', None)

In [2]:
#init globals

#dates
years = pl.arange(2015,2025,eager=True)
months = pl.arange(1,13,eager=True)
days = pl.arange(1,32,eager=True)

#vessel types - includes cargo and tanker types
cargo_types = pl.arange(70,90,eager=True)

#monthly df
month_df = pl.DataFrame()

In [3]:
#loop through years
for year in years:
    #loop through months
    for month in months:
        #reset the monthly df so each saved file holds only one month of data
        month_df = pl.DataFrame()
        #loop through days
        for day in days:
            #load from url to pandas df
            try:
                day_df = (
                    pd.read_csv(f'https://coast.noaa.gov/htdata/CMSP/AISDataHandler/{year}/AIS_{year}_{month:02d}_{day:02d}.zip')
                )
                print(f'Download complete for {year}_{month}_{day}.')
            except Exception: #not a bare except, so a keyboard interrupt still stops the loop
                print(f'Invalid URL for {year}_{month}_{day} - Date may be invalid or file may not exist.')
                continue
            #convert to polars ;)
            day_df = pl.DataFrame(day_df)
            #process data
            day_df = (
                day_df
                #keep only cargo and tanker vessels (types 70-89)
                .filter(pl.col('VesselType').is_in(cargo_types))
                #keep cols of interest
                .select('MMSI', 'BaseDateTime','LAT', 'LON', 'SOG', 'COG', 
                        'Heading', 'Status', 'VesselName', 'VesselType', 'IMO',
                        'Length', 'Width', 'Draft','Cargo')
                #give pythonic names
                .rename({
                    'MMSI':'mmsi',
                    'BaseDateTime':'time',
                    'LAT':'lat',
                    'LON':'lon',
                    'SOG':'speed',
                    'COG':'course',
                    'Heading':'heading',
                    'Status':'status',
                    'VesselName':'vessel_name',
                    'VesselType':'vessel_type',
                    'IMO':'imo',
                    'Length':'length',
                    'Width':'width',
                    'Draft':'draft',
                    'Cargo':'cargo'
                })
                #clean cols
                .with_columns(
                    #strip IMO prefix and cast to int
                    imo = pl.col('imo').str.strip_prefix('IMO').cast(pl.Int64),
                    #clean course and heading 
                    course = pl.col('course').replace(360.0,None),
                    heading = pl.col('heading').replace(511.0,None)
                )
                #cast
                .cast({
                    'time':pl.Datetime,
                    'vessel_name':pl.Categorical
                })
            )
            #concat and deduplicate
            month_df = pl.concat([month_df,day_df], how='diagonal').unique()
        #save monthly data
        month_df.write_parquet(f'data/ais_clean/{year}_{month}.parquet')
        print(f'{year}_{month} file saved to parquet.')

Download complete for 2015_1_1.
Download complete for 2015_1_2.
Download complete for 2015_1_3.
Download complete for 2015_1_4.
Download complete for 2015_1_5.
Download complete for 2015_1_6.
Download complete for 2015_1_7.
Download complete for 2015_1_8.
Download complete for 2015_1_9.
Download complete for 2015_1_10.
Download complete for 2015_1_11.
Download complete for 2015_1_12.
Download complete for 2015_1_13.
Download complete for 2015_1_14.
Download complete for 2015_1_15.
Download complete for 2015_1_16.
Download complete for 2015_1_17.
Download complete for 2015_1_18.
Download complete for 2015_1_19.
Download complete for 2015_1_20.
Download complete for 2015_1_21.
Download complete for 2015_1_22.
Download complete for 2015_1_23.
Download complete for 2015_1_24.
Download complete for 2015_1_25.
Download complete for 2015_1_26.
Download complete for 2015_1_27.
Download complete for 2015_1_28.
Download complete for 2015_1_29.
Download complete for 2015_1_30.
Download complete f