# Loading the output from a Lagrangian simulation

In this notebook, we use a simple output file from a Lagrangian simulation to highlight the steps required to convert a dataset into the ragged array format used by the CloudDrift library. The example dataset is generated using this [tutorial](https://nbviewer.org/github/OceanParcels/parcels/blob/master/parcels/examples/tutorial_output.ipynb) from the [Ocean Parcels](https://oceanparcels.org/) documentation. Although the [OpenDrift](https://opendrift.github.io/) output format differs, a very similar approach could be used to create a ragged array from the output of any type of Lagrangian simulation.

In [1]:
import numpy as np
import xarray as xr
from clouddrift import RaggedArray
from clouddrift.ragged import regular_to_ragged
from os.path import join

## Data

Numerical outputs from Lagrangian simulations are usually stored as two-dimensional matrices, with one row per trajectory and one column per observation. This particular example contains 10 trajectories released individually 2 hours apart.

In [2]:
folder = "../data/original/numerical/"
file = "Output.zarr"
ds = xr.open_zarr(join(folder, file))

In [3]:
ds

[Output: HTML representation of the `xarray.Dataset`. It lists four Dask-backed variables, each of shape (10, 13) stored in (1, 1) chunks (130 chunks in 2 graph layers): three of type float32 (520 B each) and one of type timedelta64[ns] (1.02 kiB).]


The output dataset used here contains 10 particles and up to 13 observations per particle. Not every particle has 13 observations, however: because particles are released at different times, some trajectories are shorter than others and are padded with `nan` values.

We can observe this by looking at the time matrix.

In [4]:
np.set_printoptions(linewidth=160)
ns_per_hour = np.timedelta64(1, "h")  # one hour as a timedelta, used as the unit

# .values loads the lazy Dask array so the numbers (and the nan padding) are printed
print(ds["time"].values / ns_per_hour)


By creating a ragged array, the resulting file is smaller, since we no longer have to store those `nan` values to keep the same number of observations for every trajectory.
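To make the saving concrete, here is a small sketch with synthetic data (not the Parcels output). It assumes ten particles released one output step apart, so that each later particle records one fewer observation, and compares the number of values stored in the padded matrix with the number stored in a ragged layout. The names `n_traj`, `n_obs`, and `padded` are made up for this illustration.

```python
import numpy as np

n_traj, n_obs = 10, 13  # same shape as the dataset above

# Hypothetical padded matrix: trajectory i has n_obs - i valid observations,
# and the remainder of each row is filled with NaN.
padded = np.full((n_traj, n_obs), np.nan)
for i in range(n_traj):
    padded[i, : n_obs - i] = np.arange(n_obs - i)

n_padded = padded.size  # values stored in the 2D matrix
n_ragged = int(np.isfinite(padded).sum())  # values stored in a ragged array

print(n_padded, n_ragged)
```

Under this assumption the ragged layout stores 85 values instead of 130. The gap widens as release times become more spread out relative to the length of the simulation.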

In [5]:
ds.close()

## Preprocessing

To pack the data into a ragged array, it is possible to write a preprocessing function and use the `RaggedArray.from_files()` class method, as in the example in the `gdp.ipynb` notebook.
A faster alternative for numerical simulations is to manually create the dictionaries that hold the dataset and to create the ragged array instance directly.

In [6]:
help(RaggedArray.__init__)

Help on function __init__ in module clouddrift.raggedarray:

__init__(self, coords: 'dict', metadata: 'dict', data: 'dict', attrs_global: 'dict | None' = {}, attrs_variables: 'dict | None' = {}, name_dims: 'dict[str, DimNames]' = {}, coord_dims: 'dict[str, str]' = {})
    Initialize self.  See help(type(self)) for accurate signature.



In [7]:
coords = {}
metadata = {}
data = {}
attrs_global = {}
attrs_variables = {}

In [8]:
ds.lon.values

array([[ 3000.   ,  9250.728, 15178.53 , 20637.297, 25446.793, 29530.29 , 33482.03 , 39987.598, 52016.75 , 62540.59 , 67807.93 , 71646.73 , 75938.414],
       [ 3000.   ,  9323.72 , 15390.491, 21139.074, 26617.256, 32358.955, 40099.16 , 51234.85 , 61792.926, 68916.88 , 74510.26 , 80026.21 ,       nan],
       [ 3000.   ,  9444.7  , 15730.222, 21897.203, 28193.936, 35385.695, 44646.52 , 55353.74 , 64614.508, 71806.24 , 78103.   ,       nan,       nan],
       [ 3000.   ,  9591.134, 16122.643, 22705.436, 29655.143, 37594.62 , 46994.074, 56932.133, 65754.09 , 73208.3  ,       nan,       nan,       nan],
       [ 3000.   ,  9741.477, 16503.77 , 23423.857, 30785.238, 38984.81 , 48125.297, 57481.027, 66081.984,       nan,       nan,       nan,       nan],
       [ 3000.   ,  9880.081, 16834.746, 23994.904, 31575.1  , 39796.71 , 48621.96 , 57549.23 ,       nan,       nan,       nan,       nan,       nan],
       [ 3000.   ,  9998.321, 17100.443, 24415.76 , 32091.506, 40239.49 , 48782.582,    

In [9]:
# times are decoded by default; pass decode_times=False to keep the raw numeric values
ds = xr.open_dataset(join(folder, file), engine="zarr")

# dimension name
name_dims = {"rows": "rows", "obs": "obs"}

# identify indices of finite values
finite_values = np.isfinite(ds["lon"])
idx_finite = np.where(finite_values)

# number of observations per trajectory
rowsize = np.bincount(idx_finite[0]).astype("int32")

# unique trajectory identification
unique_id = np.unique(ds.trajectory.values[idx_finite[0]]).astype("int32")

# coordinates
coords["time"] = np.tile(ds.time.data, (ds.sizes["trajectory"], 1))[
    idx_finite
]  # keep only the finite positions to get ragged time
coords["ids"] = np.repeat(unique_id, rowsize)

# mapping from coordinates to dimension
coord_dims = {"time": "obs", "ids": "obs"}

# metadata variables
metadata["rowsize"] = rowsize
metadata["id"] = unique_id

# data variable
# transform to ragged array using helper function
# (the returned row sizes are discarded; rowsize was already computed above)
data["lon"], _ = regular_to_ragged(ds.lon.values)
data["lat"], _ = regular_to_ragged(ds.lat.values)
data["z"], _ = regular_to_ragged(ds.z.values)

# attributes for each variable
attrs_variables = {
    "id": {"long_name": "Trajectory id", "units": "-"},
    "time": {"axis": "T", "long_name": "time", "standard_name": "time"},
    "lon": {"axis": "X", "long_name": "longitude", "units": "degrees_east"},
    "lat": {"axis": "Y", "long_name": "latitude", "units": "degrees_north"},
    "ids": {
        "long_name": "Trajectory identification number repeated along observations",
        "units": "-",
    },
    "rowsize": {
        "long_name": "Number of observations per trajectory",
        "sample_dimension": "obs",
        "units": "-",
    },
}

# keep original global attributes
attrs_global = {
    "Conventions": "CF-1.6/CF-1.7",
    "feature_type": "trajectory",
    "ncei_template_version": "NCEI_NetCDF_Trajectory_Template_v2.0",
    "parcels_mesh": "flat",
    "parcels_version": "2.4.0",
}

ds.close()
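
The indexing steps in the cell above can be tried on a small synthetic matrix. This is a minimal sketch using NumPy only. The array `lon` and the ids in `traj_id` are invented for the example, and the boolean-mask flattening is a stand-in for what `regular_to_ragged` is used for here, not a copy of the library's implementation.

```python
import numpy as np

# Hypothetical 3 x 4 matrix of longitudes, NaN-padded at the end of each row
lon = np.array(
    [
        [0.0, 1.0, 2.0, 3.0],
        [10.0, 11.0, 12.0, np.nan],
        [20.0, 21.0, np.nan, np.nan],
    ]
)
traj_id = np.array([100, 101, 102])  # made-up trajectory ids

finite = np.isfinite(lon)
row_idx, _ = np.where(finite)

# number of finite observations in each row
rowsize = np.bincount(row_idx, minlength=lon.shape[0]).astype("int32")

# boolean indexing flattens row by row (C order), dropping the NaN values
lon_ragged = lon[finite]

# trajectory id repeated once per observation
ids = np.repeat(traj_id, rowsize)

print(rowsize)
print(lon_ragged)
print(ids)
```

Passing `minlength` to `np.bincount` guards against trailing rows that contain no finite values, which would otherwise be missing from `rowsize`.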



In [10]:
ra = RaggedArray(
    coords, metadata, data, attrs_global, attrs_variables, name_dims, coord_dims
)

In [11]:
ds = ra.to_xarray()
ds
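
Once the data is in ragged form, individual trajectories are recovered from the flat arrays with `rowsize`. The sketch below uses plain NumPy and invented values (`lon_ragged`, `rowsize`) rather than the dataset above: the cumulative sum of `rowsize` gives the start index of each trajectory.

```python
import numpy as np

# Hypothetical ragged data: three trajectories with 4, 3 and 2 observations
lon_ragged = np.array([0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0])
rowsize = np.array([4, 3, 2])

# start index of each trajectory, followed by the end of the last one
traj_idx = np.insert(np.cumsum(rowsize), 0, 0)

# slice out the second trajectory (index 1)
j = 1
second = lon_ragged[traj_idx[j] : traj_idx[j + 1]]

# or split the flat array into one array per trajectory
trajectories = np.split(lon_ragged, traj_idx[1:-1])

print(traj_idx)
print(second)
print([len(t) for t in trajectories])
```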

As an example, we can also write the ragged array to a NetCDF file:

In [12]:
ra.to_netcdf("../data/process/Output.nc")