### 0. Thoughts and ideas 

#### 0.1 General

One possible strategy:
- Treat the prediction of future AIS data as a prediction task in its own right (X: historic AIS data and positions, Y: AIS data at the next timestep) and create a model for it.
- Use the predicted AIS data, together with the historic AIS data and positions, to predict the new position.


Another:
- Let a model use the previous timesteps to predict all information about the next timestep.
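
The second strategy amounts to an autoregressive rollout: a one-step model is applied repeatedly, and each prediction is fed back in as the newest input. Below is a minimal sketch of that loop. `predict_next` is a hypothetical stand-in (a naive linear extrapolation, not a trained model), and the window size of 4 is only an assumption for illustration.

```python
import numpy as np


def predict_next(window: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained one-step model.

    Extrapolates linearly from the last two rows of the window.
    """
    return window[-1] + (window[-1] - window[-2])


def rollout(history: np.ndarray, steps: int, window_size: int = 4) -> np.ndarray:
    """Predict `steps` future rows by feeding each prediction back as input."""
    window = history[-window_size:].copy()
    predictions = []
    for _ in range(steps):
        next_row = predict_next(window)
        predictions.append(next_row)
        # Drop the oldest row and append the newest prediction.
        window = np.vstack([window[1:], next_row])
    return np.array(predictions)


# Toy history with two features per timestep (e.g. latitude, longitude).
history = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
future = rollout(history, steps=3)
print(future)
```

Errors compound in a rollout like this, since every step builds on earlier predictions, which is worth keeping in mind when comparing the two strategies.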

#### 0.2 About the datasets

#### 0.3 Research


##### Article evaluating several models to predict ship trajectories

Definitions:
- A ship trajectory is a sequence of timestamped points Pi = {Ti, LATi, LONi, SOGi, COGi} (time, latitude, longitude, speed over ground, course over ground)


Methodology:
- Information from the first four timestamps is used to predict the next.
- Implemented using PyTorch
- Uses Adam as the optimizer
- Uses the following hyperparameters: learning rate: 0.0001, epochs: 100, dropout: 0.5, hidden size: 128 (15), input/output dimensions: 2, hidden layers: 1
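
To make the "four timestamps in, one out" setup concrete, here is a small sketch of how one trajectory could be cut into training samples. It assumes the two input/output dimensions are latitude and longitude, which the notes above do not state explicitly, and `make_windows` is a name invented for this example.

```python
import numpy as np

INPUT_STEPS = 4  # timesteps fed to the model, as in the article
N_FEATURES = 2   # assumed here to be (latitude, longitude)


def make_windows(track: np.ndarray, input_steps: int = INPUT_STEPS):
    """Split one trajectory of shape (T, n_features) into (inputs, targets)."""
    n_samples = len(track) - input_steps
    n_features = track.shape[1]
    if n_samples <= 0:
        # Trajectory too short to yield a single sample.
        return np.empty((0, input_steps, n_features)), np.empty((0, n_features))
    inputs = np.stack([track[i:i + input_steps] for i in range(n_samples)])
    targets = track[input_steps:]
    return inputs, targets


# Toy trajectory with six timesteps.
track = np.arange(12, dtype=float).reshape(6, N_FEATURES)
X, y = make_windows(track)
print(X.shape, y.shape)
```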


Interesting points
- "Deep learning exhibits remarkable performance in AIS data-driven ship trajectory prediction"
- "Deep learning are in general better than machine learning for this application"
- Transformer, Bi-GRU and GRU perform best; the Transformer only outperforms the others on medium-sized datasets
- SVR is the best machine learning algorithm

##### Brainstorming - 18.09.2024

AIS - data:
- Parameters that intuitively give us the next position: the motion parameters (COG, SOG and ROT) and the context (current position, ETARAW and PortID)
- Should try merging navstat codes used to describe the same activity
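
The intuition that COG and SOG largely determine the next position can be written down as dead reckoning. This sketch is a baseline idea, not something taken from the article. It assumes SOG is in knots and COG is in degrees clockwise from true north, and it ignores ROT, so the course is held constant over the interval.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius
KNOTS_TO_MPS = 0.514444       # one knot in metres per second


def dead_reckon(lat_deg, lon_deg, sog_knots, cog_deg, dt_seconds):
    """Project a position forward along a constant course and speed."""
    distance_m = sog_knots * KNOTS_TO_MPS * dt_seconds
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    theta = math.radians(cog_deg)
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    # Wrap longitude into [-180, 180).
    lon2_deg = (math.degrees(lon2) + 180.0) % 360.0 - 180.0
    return math.degrees(lat2), lon2_deg


# One hour due north at 60 knots is roughly one degree of latitude.
lat, lon = dead_reckon(0.0, 0.0, sog_knots=60.0, cog_deg=0.0, dt_seconds=3600)
print(lat, lon)
```

A baseline like this could also serve as an input feature, letting a learned model predict only the correction to the dead-reckoned position.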


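Merging navstat codes could be as simple as a lookup table. The code meanings in the comments follow the standard AIS navigational status list, but the grouping itself (and the names `NAVSTAT_GROUPS` and `merge_navstat`) is only an assumption to be checked against the data.

```python
# Assumed grouping of AIS navigational status codes by activity.
NAVSTAT_GROUPS = {
    "underway": {0, 8},         # under way using engine, under way sailing
    "stationary": {1, 5},       # at anchor, moored
    "restricted": {2, 3, 4},    # not under command, restricted manoeuvrability, constrained by draught
}

# Invert the grouping into a code -> group lookup.
NAVSTAT_TO_GROUP = {
    code: group for group, codes in NAVSTAT_GROUPS.items() for code in codes
}


def merge_navstat(code: int) -> str:
    """Map a raw navstat code to its merged group, or 'other' if ungrouped."""
    return NAVSTAT_TO_GROUP.get(code, "other")


merged = [merge_navstat(code) for code in [0, 1, 5, 8, 15]]
print(merged)
```

On the training frame this could be applied with `ais_data_train['navstat'].map(merge_navstat)`.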

General:
- Should somehow allow the algorithm to keep the last values - research different strategies (CNN or LSTM?)
- Might want to use a classifier to predict features
- Ship-ID is probably a pointless input for the classifier
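
One non-recurrent way to let a model "keep the last values" is to add lagged copies of each feature as extra columns, which also works for tree-based models. A minimal sketch, assuming the `vesselId` and `time` column names from the dataset; `add_lag_features` is a hypothetical helper.

```python
import pandas as pd


def add_lag_features(df, columns, n_lags, group_col="vesselId", time_col="time"):
    """Add `<col>_lag<k>` columns holding each ship's k-th previous value."""
    df = df.sort_values([group_col, time_col]).copy()
    for col in columns:
        for lag in range(1, n_lags + 1):
            # Shift within each ship so values never leak between vessels.
            df[f"{col}_lag{lag}"] = df.groupby(group_col)[col].shift(lag)
    return df


toy = pd.DataFrame({
    "vesselId": ["a", "a", "a", "b", "b"],
    "time": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:20", "2024-01-01 00:40",
        "2024-01-01 00:00", "2024-01-01 00:20",
    ]),
    "sog": [1.0, 2.0, 3.0, 10.0, 11.0],
})
lagged = add_lag_features(toy, ["sog"], n_lags=2)
print(lagged)
```

The first rows of each ship have no history and therefore contain NaN in the lag columns; they need to be dropped or imputed before training.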


##### Other


- Gustav & co used AutoGluon: https://auto.gluon.ai/stable/index.html


In [18]:
# IMPORTS

import numpy as np
import pandas as pd
import xgboost as xgb

### 1. Data 

#### 1.1 Load data into dataframes

In [19]:



ais_test_data_path = '../../Project materials/ais_test.csv'
ais_train_data_path = '../../Project materials/ais_train.csv'

ais_data_test = pd.read_csv(ais_test_data_path)
ais_data_train = pd.read_csv(ais_train_data_path, sep='|')

ais_data_test.head()
ais_data_train.head()

Unnamed: 0,time,cog,sog,rot,heading,navstat,etaRaw,latitude,longitude,vesselId,portId
0,2024-01-01 00:00:25,284.0,0.7,0,88,0,01-09 23:00,-34.7437,-57.8513,61e9f3a8b937134a3c4bfdf7,61d371c43aeaecc07011a37f
1,2024-01-01 00:00:36,109.6,0.0,-6,347,1,12-29 20:00,8.8944,-79.47939,61e9f3d4b937134a3c4bff1f,634c4de270937fc01c3a7689
2,2024-01-01 00:01:45,111.0,11.0,0,112,0,01-02 09:00,39.19065,-76.47567,61e9f436b937134a3c4c0131,61d3847bb7b7526e1adf3d19
3,2024-01-01 00:03:11,96.4,0.0,0,142,1,12-31 20:00,-34.41189,151.02067,61e9f3b4b937134a3c4bfe77,61d36f770a1807568ff9a126
4,2024-01-01 00:03:51,214.0,19.7,0,215,0,01-25 12:00,35.88379,-5.91636,61e9f41bb937134a3c4c0087,634c4de270937fc01c3a74f3


In [20]:
#Create dataframes for each ship-id:

ship_train_groups = ais_data_train.groupby('vesselId')
ship_train_dataframes = {ship_id: group for ship_id, group in ship_train_groups}

#Split data into input and output (currently disabled). When enabled, input can be accessed as ship_train_dataframes[shipID][0] and output as ship_train_dataframes[shipID][1]

# for key in ship_train_dataframes:
#     ship_train_dataframes[key] = [ship_train_dataframes[key].drop(columns=['latitude', 'longitude']), ship_train_dataframes[key][['latitude', 'longitude']]]


# print(ship_train_dataframes)

#### 1.2 Split data into X and Y
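
One possible way to fill in this step is sketched below: for a single ship, the inputs are every row except the last, and the targets are the position in the following row. This is an assumption about how the split could look rather than a finished design, and `split_next_step` is a hypothetical helper.

```python
import pandas as pd

TARGET_COLUMNS = ["latitude", "longitude"]


def split_next_step(ship_df, target_columns=TARGET_COLUMNS, time_col="time"):
    """Pair each row of one ship's data with the position in the next row."""
    ship_df = ship_df.sort_values(time_col)
    # Shift targets up by one row; the last row has no successor and is dropped.
    Y = ship_df[target_columns].shift(-1).iloc[:-1]
    X = ship_df.iloc[:-1]
    return X, Y


toy_ship = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:20", "2024-01-01 00:40"]),
    "sog": [5.0, 6.0, 7.0],
    "latitude": [1.0, 2.0, 3.0],
    "longitude": [10.0, 20.0, 30.0],
})
X, Y = split_next_step(toy_ship)
print(X)
print(Y)
```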

### 2. Try to create predictions using simple models:

#### 2.1 XG-boost

In [21]:
# Latitude and longitude are continuous targets, so a regressor fits better than a classifier.
# xgb_simple = xgb.XGBRegressor()


# for key in ship_train_dataframes:
#     xgb_simple.fit(ship_train_dataframes[key][0], ship_train_dataframes[key][1])

### 3. Attempting to implement an approach similar to the one in the article:

In [22]:
#Imports

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

#### 3.1 Preprocess data into timeseries

In [27]:
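all_timeseries = []
scaler = MinMaxScaler()

sequence_length = 5  # input timesteps per window (note: the article uses four)
feature_columns = ['hour', 'minute', 'second', 'longitude', 'latitude', 'sog', 'cog']

# First pass: build the raw feature matrix for each ship.
ship_features = {}
for ship_id, df in ship_train_dataframes.items():
    # Work on a copy to avoid pandas' SettingWithCopyWarning on the grouped frames.
    df = df.copy()
    df['time'] = pd.to_datetime(df['time'])
    # Make sure each ship's rows are in chronological order before windowing.
    df = df.sort_values('time')

    df['hour'] = df['time'].dt.hour
    df['minute'] = df['time'].dt.minute
    df['second'] = df['time'].dt.second

    # Store the enriched frame back so later cells still see the new columns.
    ship_train_dataframes[ship_id] = df
    ship_features[ship_id] = df[feature_columns].values

# Fit the scaler once on all ships; refitting per ship would give every vessel
# its own min/max and leave `scaler` describing only the last ship.
scaler.fit(np.concatenate(list(ship_features.values())))

# Second pass: slide a window of sequence_length inputs plus one target row over each ship.
for features in ship_features.values():
    features_normalized = scaler.transform(features)

    for i in range(len(features_normalized) - sequence_length):
        timeseries = features_normalized[i:i+sequence_length+1]
        all_timeseries.append(timeseries)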


all_timeseries = np.array(all_timeseries)

X_data = all_timeseries[:, :-1, :]
Y_data = all_timeseries[:, -1, :]




print(all_timeseries.shape)
print(X_data.shape)
print(Y_data.shape)

(1518629, 6, 7)
(1518629, 5, 7)
(1518629, 7)
1518630


#### 3.2 GRU - model