# Loading the output from a Lagrangian simulation

In this notebook, we use a simple output file from a Lagrangian simulation to highlight the steps required to convert a dataset into the ragged array format used by the CloudDrift library. The example dataset comes in a format similar to that of [Ocean Parcels](https://oceanparcels.org/) and [OpenDrift](https://opendrift.github.io/).

In [1]:
import numpy as np
import xarray as xr
from clouddrift import RaggedArray

## Download

In [2]:
import os
from os.path import isfile, join
import urllib.request

folder = "../data/raw/numerical/"
file = "example.nc"
os.makedirs(folder, exist_ok=True)  # create the raw data folder if it does not exist yet

if not isfile(join(folder, file)):
    url = "https://zenodo.org/record/6310460/files/global-marine-litter-2021.nc"
    print(f"Downloading ~1.1GB from {url}.")
    req = urllib.request.urlretrieve(url, join(folder, file))
    print(f"Dataset saved at {join(folder, file)}")
else:
    print(f"Dataset already at {join(folder, file)}.")

Dataset already at ../data/raw/numerical/example.nc.


## Data

Numerical outputs from Lagrangian simulations are usually stored as two-dimensional arrays, with one row per trajectory and one column per output time. This particular example contains 387,600 trajectories saved at daily intervals during the year 2021.

In [3]:
ds = xr.open_dataset(join(folder, file), decode_times=False)

In [4]:
ds

At the beginning of each month, 32,300 particles are released, and each trajectory is padded with `NaN` before its release date.
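Twelve monthly releases of 32,300 particles account for the 387,600 trajectories (12 × 32,300). The padded layout can be illustrated with a small sketch. The array below is hypothetical toy data, not taken from the dataset: four trajectories, five daily outputs, and two release dates.

```python
import numpy as np

# Hypothetical toy data: 4 trajectories (rows) x 5 daily outputs (columns).
# Two particles are released at step 0 and two at step 2; every entry
# before a particle's release is NaN.
lon = np.array([
    [10.0, 10.1, 10.2, 10.3, 10.4],
    [20.0, 20.1, 20.2, 20.3, 20.4],
    [np.nan, np.nan, 30.0, 30.1, 30.2],
    [np.nan, np.nan, 40.0, 40.1, 40.2],
])

valid = np.isfinite(lon)
release_step = valid.argmax(axis=1)  # column index of the first finite value in each row
n_obs = valid.sum(axis=1)            # number of valid observations per trajectory

print(release_step.tolist())  # [0, 0, 2, 2]
print(n_obs.tolist())         # [5, 5, 3, 3]
```

Trajectories released later have fewer valid observations, which is why a padded matrix wastes storage on `NaN` entries.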

In [5]:
ds.close()

## Preprocessing

To pack the data into a ragged array, one option is to write a preprocessing function and use the `RaggedArray.from_files()` class method, as in the `gdp.ipynb` notebook.
A faster alternative for numerical simulations is to build the required dicts holding the dataset manually and create the ragged array instance directly.
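The idea behind the manual approach can be sketched on toy data with NumPy alone. The arrays and variable names below are hypothetical and only illustrate the principle: valid entries of the padded 2D array are concatenated row by row into a 1D array, and the number of entries kept per row is stored separately.

```python
import numpy as np

# Hypothetical toy data: a time axis shared by all trajectories and a
# NaN-padded 2D longitude array (3 trajectories x 4 output times).
time = np.array([0.0, 1.0, 2.0, 3.0])
lon = np.array([
    [1.0, 1.1, 1.2, 1.3],
    [np.nan, np.nan, 2.0, 2.1],
    [np.nan, 3.0, 3.1, 3.2],
])

mask = np.isfinite(lon)

# Number of valid observations in each trajectory.
rowsize = mask.sum(axis=1)

# Boolean indexing walks the array in row-major order, so the valid values
# of trajectory 0 come first, then those of trajectory 1, and so on.
lon_ragged = lon[mask]

# Expand the shared time axis to 2D (as a view, without copying) and apply
# the same mask to obtain one time value per retained observation.
time_ragged = np.broadcast_to(time, lon.shape)[mask]

print(rowsize.tolist())      # [4, 2, 3]
print(lon_ragged.tolist())   # [1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 3.0, 3.1, 3.2]
print(time_ragged.tolist())  # [0.0, 1.0, 2.0, 3.0, 2.0, 3.0, 1.0, 2.0, 3.0]
```

The sum of `rowsize` equals the length of the ragged arrays, which is a convenient consistency check after any such conversion.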

Note that this dataset does not contain variables other than `time`, `lon`, `lat`, and `ids`.
We initialize an empty `data` dict nevertheless.

In [6]:
coords = {}
metadata = {}
data = {}
attrs_global = {}
attrs_variables = {}

In [None]:
# decode_times=False keeps time as raw numeric values instead of converting it to datetime64
ds = xr.open_dataset(join(folder, file), decode_times=False)

finite_values = np.isfinite(ds['lon'])
idx_finite = np.where(finite_values)

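# indices of trajectories with at least one valid observation, and their observation
# counts; unlike np.bincount, this omits empty trajectories so both arrays stay aligned
unique_id, rowsize = np.unique(idx_finite[0], return_counts=True)
unique_id = unique_id.astype('int32')
rowsize = rowsize.astype('int32')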

# coordinates
coords["time"] = np.tile(ds.time.data, (ds.sizes['traj'], 1))[idx_finite]  # tile time to 2D, then keep only valid entries
coords["lon"] = ds.lon.data[idx_finite].astype('float32')
coords["lat"] = ds.lat.data[idx_finite].astype('float32')
coords["ids"] = np.repeat(unique_id, rowsize)

# metadata variables
metadata["rowsize"] = rowsize
metadata["ID"] = unique_id

# attributes for each variable
attrs_variables = {
    "ID": {'long_name': 'Trajectory id', 'units':'-'},
    "time": {'long_name': 'Time in days', 'units': 'days since 2021-01-01'}, 
    "lon": {'long_name': 'longitude', 'units': 'degrees_east'}, 
    "lat": {'long_name': 'latitude', 'units': 'degrees_north'}, 
    "ids": {'long_name': 'Trajectory identification number repeated along observations', 'units': '-'},
    "rowsize": {'long_name': 'Number of observations per trajectory', 'sample_dimension': 'obs', 'units':'-'},
}

# global attributes
attrs_global = {
    'title': 'Marine Litter 2021',
    'institution': 'Florida State University Center for Ocean-Atmospheric Prediction Studies (COAPS)'
}

ds.close()

In [None]:
ra = RaggedArray(coords, metadata, data, attrs_global, attrs_variables)
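Once the data is stored as a ragged array, `rowsize` is what allows individual trajectories to be recovered from the concatenated 1D variables. The sketch below shows the principle with plain NumPy on hypothetical toy values; the `trajectory` helper is illustrative only and is not part of the CloudDrift API.

```python
import numpy as np

# Hypothetical toy ragged data: three trajectories with 4, 2, and 3 observations.
rowsize = np.array([4, 2, 3])
lon = np.array([1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 3.0, 3.1, 3.2])

# Start offset of each trajectory in the 1D arrays, plus the final end offset.
traj_idx = np.insert(np.cumsum(rowsize), 0, 0)


def trajectory(values, j):
    """Return the observations of trajectory j (illustrative helper)."""
    return values[traj_idx[j]:traj_idx[j + 1]]


print(traj_idx.tolist())            # [0, 4, 6, 9]
print(trajectory(lon, 1).tolist())  # [2.0, 2.1]
```

Slicing with precomputed offsets returns a view of the underlying array, so extracting a trajectory does not copy the data.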