In [1]:
import glob
import os
import torch
import numpy as np
import xarray as xr
import netCDF4 as nc
import pandas as pd
import simpy as sp
import logging
import multiprocessing as mp
from multiprocessing import Pool

In [2]:
import cdsapi

In [3]:
from matplotlib import pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

In [4]:
os.chdir('..')

In [5]:
%load_ext autoreload
%autoreload 2

In [None]:
from datasets.retrieval import grab_df_from_era5_csvs
from datasets.km_dataset import DeepKoopmanDataset
from algos.km_classical import ClassicalKoopmanAnalysis
from algos.km_havoc import HAVOCAnalysis
from algos.km_net import DeepKoopmanWrapper

In [7]:
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# Koopman Forecast of Maritime Weather for Cargo Ship Route Optimization

## ERA5 Around Hamburg Harbor

ERA5 is a dataset that provides hourly maritime weather data.<br>
Its API only provides single-point measurements, but we can work around this by grabbing the data for many points and combining them.

![Map with coordinates of Hamburg Harbor and the midpoint of the grid.](../img/map_grid_hhh.png)<br>
*Coastal map. The blue point is at coordinates lon 10 and lat 53.5, around Hamburg Harbor.*<br>
*The red dot represents the initial radius used when mining data for the region of interest.*<br>
*One can see that the midpoint needs to lie much closer to the coastline to get sufficient forecasting!*

In [8]:
# params
data_path = './data'
center_lon, center_lat = 5.5, 56.5
radius = 4.5
step = 0.5
start_date = '2025-02-01'
end_date = '2025-09-30'

output_dir = os.path.join(data_path, f"{start_date}--{end_date}")
lon_vals = np.arange(center_lon - radius, center_lon + radius + step, step)
lat_vals = np.arange(center_lat - radius, center_lat + radius + step, step)

In [9]:
df = grab_df_from_era5_csvs(output_dir, lat_vals, lon_vals)
df

Unnamed: 0,time,u100,v100,u10,v10,d2m,t2m,msl,sst,skt,sp,ssrd,strd,tp,lat,lon,mwd,mwp,swh
0,2025-02-01 00:00:00,-1.227249,4.551605,-0.971558,2.206314,276.52350,277.55730,103012.190,,275.89688,102521.016,0.0,987721.60,0.000000e+00,52.0,1.0,,,
1,2025-02-01 01:00:00,-1.344467,4.589905,-1.027405,2.290192,276.49360,277.50160,103025.500,,275.48804,102534.330,0.0,1026047.44,0.000000e+00,52.0,1.0,,,
2,2025-02-01 02:00:00,-1.250153,4.995331,-0.835266,2.517288,276.37183,277.37183,103046.810,,275.55533,102554.920,0.0,1021265.70,0.000000e+00,52.0,1.0,,,
3,2025-02-01 03:00:00,-1.073959,5.168274,-0.689453,2.575562,275.93164,276.87340,103048.560,,275.18945,102556.310,0.0,958170.00,4.768372e-07,52.0,1.0,,,
4,2025-02-01 04:00:00,-0.700256,5.065414,-0.453903,2.541901,275.88983,276.80780,103031.750,,275.37950,102539.140,0.0,979850.20,0.000000e+00,52.0,1.0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2096683,2025-09-30 19:00:00,-1.078003,3.887268,-0.837814,1.905701,277.56770,278.32910,103420.125,,278.16960,95322.080,0.0,1168652.20,3.099441e-05,61.0,10.0,,,
2096684,2025-09-30 20:00:00,-0.953369,3.793167,-0.762024,1.887436,277.37200,277.93220,103431.625,,278.13623,95328.805,0.0,1155937.60,2.861023e-05,61.0,10.0,,,
2096685,2025-09-30 21:00:00,-0.952301,3.694656,-0.779129,1.820908,277.21252,277.76376,103446.375,,277.84314,95336.560,0.0,1173570.80,3.004074e-05,61.0,10.0,,,
2096686,2025-09-30 22:00:00,-0.584839,2.949158,-0.589203,1.509872,277.02023,277.70374,103458.060,,277.23773,95322.200,0.0,1146051.50,3.337860e-05,61.0,10.0,,,
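
As a quick sanity check, the index range of the frame above can be compared with the grid and date range from the parameter cell. The sketch below assumes one row per grid point per hour (an assumption about how `grab_df_from_era5_csvs` stacks the CSVs, not something the function is confirmed to guarantee):

```python
import numpy as np
import pandas as pd

# Same grid parameters as in the parameter cell above
center_lon, center_lat = 5.5, 56.5
radius, step = 4.5, 0.5
lon_vals = np.arange(center_lon - radius, center_lon + radius + step, step)
lat_vals = np.arange(center_lat - radius, center_lat + radius + step, step)

# Hourly timestamps from the first to the last hour of the date range
hours = pd.date_range("2025-02-01 00:00", "2025-09-30 23:00", freq="h")

n_points = len(lon_vals) * len(lat_vals)   # grid points in the square region
expected_rows = n_points * len(hours)      # assumes one row per point per hour
print(n_points, len(hours), expected_rows)
```

This gives 361 grid points and 5808 hourly steps, so 2,096,688 rows, which matches an index running up to 2096687.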


## Classical Koopman Analysis

In standard Fourier analysis, we describe a linear system $K$ between the oscillators and the desired output space by minimizing the squared error:

$\begin{equation}E\left(K, \omega\right) = \sum_{t=0}^{T-1} \left(x - K \Omega(\omega t) \right)^2\end{equation}$

Here $E$ is the error function between our ground truth $x$ and the linear readout $K \Omega(\omega t)$ of the oscillator.


The oscillator $\Omega(\omega t)$ is defined in the same manner as the Fourier basis, stacking sines and cosines of $N$ frequencies:

$
\begin{equation}
\Omega(\omega t) = \begin{pmatrix} 
    \sin(\omega_1 t) \\
    \vdots \\
    \sin(\omega_N t) \\
    \cos(\omega_1 t) \\
    \vdots \\
    \cos(\omega_N t)
\end{pmatrix}
\end{equation}
$
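
For fixed frequencies, minimizing $E(K, \omega)$ over $K$ is an ordinary least-squares problem. The following is a minimal NumPy sketch of that step; the helper names `oscillator` and `fit_linear_readout` and the toy signal are illustrative assumptions, not part of the project code:

```python
import numpy as np

def oscillator(omega, t):
    """Stack sin and cos features; result has shape (2N, T)."""
    phase = np.outer(omega, t)
    return np.vstack([np.sin(phase), np.cos(phase)])

def fit_linear_readout(x, omega, t):
    """Least-squares K minimizing sum_t (x_t - K Omega(omega t))^2."""
    features = oscillator(omega, t)                     # (2N, T)
    # lstsq solves features.T @ K.T ~= x.T in the least-squares sense
    k_t, *_ = np.linalg.lstsq(features.T, x.T, rcond=None)
    return k_t.T                                        # (D, 2N)

# Toy signal: a 24-hour and a 12-hour cycle on an hourly grid (hypothetical data)
t = np.arange(200)
omega = np.array([2 * np.pi / 24, 2 * np.pi / 12])
x = (3.0 * np.sin(omega[0] * t) - 1.5 * np.cos(omega[1] * t))[None, :]

K = fit_linear_readout(x, omega, t)
error = np.sum((x - K @ oscillator(omega, t)) ** 2)
print(np.round(K, 6), error)
```

Because the toy signal lies exactly in the span of the oscillator, the fitted `K` recovers the amplitudes 3.0 (on $\sin(\omega_1 t)$) and -1.5 (on $\cos(\omega_2 t)$) and the error is numerically zero.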

In Koopman analysis, however, we replace the linear map with any type of function $f_{\Theta}$, mostly non-linear or at least quasi-linear:

$\begin{equation}E\left(f_{\Theta'}(\Omega(\omega' t)) | x, \Theta', \omega'\right) = \sum_{t=0}^{T-1} \left(x - f_{\Theta'}(\Omega(\omega' t)) \right)^2\end{equation}$

Furthermore, from this error function we can derive a pseudo log-likelihood, which works with any kind of error function for oscillators and periodic frequency elements:

$\begin{equation}\log L\left(\Theta', \omega'\right) = - E\left(f_{\Theta'}(\Omega(\omega' t)) | x, \Theta', \omega'\right) \end{equation}$

In this case we optimize the parameters of our non-linearity, $\Theta' \rightarrow \Theta$, and our frequencies, $\omega' \rightarrow \omega$.

For such a pseudo-likelihood, we use a softmax function over all samples $n$ and all target dimensions $d$ to guarantee a valid distribution:

$\begin{equation}L_{n, d}\left(\Theta', \omega'\right) = \frac{\exp\left(\log L_{n, d}\left(\Theta', \omega'\right)\right)}
{\sum_{n'=0}^{N-1} \sum_{\forall d'} \exp\left(\log L_{n', d'}\left(\Theta', \omega'\right)\right)} \end{equation}$
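
To make the softmax concrete, here is a minimal NumPy sketch. It assumes that the samples $n$ are the time steps, that the per-entry log-likelihood is the negative squared error, and it uses a hypothetical one-hidden-layer `f_theta` with random weights as a stand-in for the non-linearity; none of this is taken from `ClassicalKoopmanAnalysis`.

```python
import numpy as np

def oscillator(omega, t):
    """Stack sin and cos features; result has shape (2N, T)."""
    phase = np.outer(omega, t)
    return np.vstack([np.sin(phase), np.cos(phase)])

def f_theta(features, w1, w2):
    """Hypothetical non-linearity: one tanh hidden layer, output shape (D, T)."""
    return w2 @ np.tanh(w1 @ features)

def pseudo_likelihood(x, x_hat):
    """Softmax of the negative squared error over all samples and dimensions."""
    log_l = -(x - x_hat) ** 2            # log L_{n,d}, one entry per sample and dim
    shifted = log_l - log_l.max()        # subtract the max for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()       # normalize over every n and d

rng = np.random.default_rng(0)
t = np.arange(48)
omega = np.array([2 * np.pi / 24, 2 * np.pi / 12])
w1 = rng.normal(size=(8, 4))             # Theta': hidden-layer weights (random here)
w2 = rng.normal(size=(3, 8))             # Theta': output weights (random here)
x = rng.normal(size=(3, 48))             # stand-in ground truth with D = 3

x_hat = f_theta(oscillator(omega, t), w1, w2)
L = pseudo_likelihood(x, x_hat)
print(L.shape, L.sum())
```

The entries of `L` are non-negative and sum to one, and the entry with the smallest squared error receives the largest weight.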




## HAVOK

The Hankel Alternative View of Koopman (HAVOK) tries to enforce a quasi-linear simulation of quasi-chaotic real-world systems using the Hankel method and Koopman operators.

Having seen Koopman analysis in theory and in action in the section above, we can jump right ahead to what the Hankel view does to such a system and how it helps stabilize and optimize our Koopman maritime weather forecasting.
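
The starting point of the Hankel view is a matrix of time-delayed copies of a measured series, followed by a singular value decomposition. The sketch below illustrates only this first step on a toy signal; the helper `hankel_matrix`, the number of delays, and the rank threshold are illustrative assumptions and are not taken from `HAVOCAnalysis`.

```python
import numpy as np

def hankel_matrix(series, n_delays):
    """Stack delayed copies: row i holds series[i : i + n_cols]."""
    series = np.asarray(series, dtype=float)
    n_cols = series.size - n_delays + 1
    return np.stack([series[i:i + n_cols] for i in range(n_delays)])

# Toy signal with two frequencies (hypothetical data, not ERA5)
t = np.arange(500) * 0.1
series = np.sin(t) + 0.5 * np.sin(2.3 * t)

H = hankel_matrix(series, n_delays=50)               # shape (50, 451)
U, S, Vt = np.linalg.svd(H, full_matrices=False)     # delay-coordinate modes

# Count singular values above a relative threshold (threshold is a free choice)
rank = int(np.sum(S > 1e-8 * S[0]))
print(H.shape, rank)
```

For this noise-free sum of two sinusoids the Hankel matrix has numerical rank 4, one sine/cosine pair per frequency, so a few delay coordinates already capture the dynamics. Real weather series are noisy, so the rank cutoff there becomes a modelling choice.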