# Loading the output from a Lagrangian simulation

In this notebook, we use a simple output file from a Lagrangian simulation to highlight the steps required to convert a dataset into the ragged array format used by the CloudDrift library. The example dataset is generated using this [tutorial](https://nbviewer.org/github/OceanParcels/parcels/blob/master/parcels/examples/tutorial_output.ipynb)
from the [Ocean Parcels](https://oceanparcels.org/) documentation. Although the [OpenDrift](https://opendrift.github.io/) output format differs, a very similar approach could be used to create a ragged array from the output of any type of Lagrangian simulation.

In [1]:
import numpy as np
import xarray as xr
from clouddrift import RaggedArray
from os.path import join

## Data

Numerical outputs from Lagrangian simulations are usually stored as two-dimensional matrices. This particular example contains 10 trajectories released individually 2 hours apart.

In [2]:
folder = "../data/original/numerical/"
file = "Output.zarr"
ds = xr.open_dataset(join(folder, file), engine="zarr")

In [3]:
ds

The output dataset used here contains 10 particles and up to 13 observations per particle. Not every particle has 13 observations, however: since particles are released at different times, some trajectories are shorter than others.

We can observe this by looking at the time matrix.

In [4]:
np.set_printoptions(linewidth=160)
ns_per_hour = np.timedelta64(1, 'h') # one hour as a timedelta; dividing by it expresses times in hours

print(ds['time'].data/ns_per_hour)

[[ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18. 20. 22. 24.]
 [ 2.  4.  6.  8. 10. 12. 14. 16. 18. 20. 22. 24. nan]
 [ 4.  6.  8. 10. 12. 14. 16. 18. 20. 22. 24. nan nan]
 [ 6.  8. 10. 12. 14. 16. 18. 20. 22. 24. nan nan nan]
 [ 8. 10. 12. 14. 16. 18. 20. 22. 24. nan nan nan nan]
 [10. 12. 14. 16. 18. 20. 22. 24. nan nan nan nan nan]
 [12. 14. 16. 18. 20. 22. 24. nan nan nan nan nan nan]
 [14. 16. 18. 20. 22. 24. nan nan nan nan nan nan nan]
 [16. 18. 20. 22. 24. nan nan nan nan nan nan nan nan]
 [18. 20. 22. 24. nan nan nan nan nan nan nan nan nan]]
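The conversion to hours used above relies on NumPy's timedelta arithmetic. As a minimal sketch on a hypothetical three-element array (not the dataset itself), dividing `timedelta64` values by a one-hour timedelta returns floating-point hours, and missing times (`NaT`) come out as `nan`:

```python
import numpy as np

# Hypothetical times for one trajectory; the last slot stands in for padding.
times = np.array(
    [np.timedelta64(0, "h"), np.timedelta64(2, "h"), np.timedelta64("NaT")],
    dtype="timedelta64[ns]",
)

# Dividing by a one-hour timedelta yields float64 hours; NaT becomes nan.
hours = times / np.timedelta64(1, "h")
print(hours)
```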




By creating a ragged array, the resulting file is smaller: we no longer have to store those `nan` values, because each trajectory keeps only its own observations instead of being padded to a common length.
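To get a rough sense of the saving, the sketch below rebuilds a padded matrix with the same shape and release pattern as the time matrix above (placeholder values, not the real data) and counts how many entries a ragged layout would actually need to store:

```python
import numpy as np

n_traj, n_obs = 10, 13  # sizes taken from the dataset shown above

# NaN-padded matrix in which trajectory i has n_obs - i valid observations.
padded = np.full((n_traj, n_obs), np.nan)
for i in range(n_traj):
    padded[i, : n_obs - i] = 1.0  # placeholder values

n_padded = padded.size                      # entries in the 2D layout
n_ragged = int(np.isfinite(padded).sum())   # entries in the ragged layout
print(n_padded, n_ragged)
```

For this small example the ragged layout holds 85 values per variable instead of 130. The gap grows when trajectory lengths differ more widely.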

In [5]:
ds.close()

## Preprocessing

To pack the data into a ragged array, it's possible to create a preprocessing function and use the `RaggedArray.from_files()` class method, similar to the example in the `gdp.ipynb` notebook.
A faster alternative for numerical simulations is to manually create the required dictionaries to hold the dataset and then create the ragged array instance directly.

In [6]:
help(RaggedArray.__init__)

Help on function __init__ in module clouddrift.dataformat:

__init__(self, coords: dict, metadata: dict, data: dict, attrs_global: Optional[dict] = {}, attrs_variables: Optional[dict] = {})
    Initialize self.  See help(type(self)) for accurate signature.



In [7]:
coords = {}
metadata = {}
data = {}
attrs_global = {}
attrs_variables = {}

In [8]:
# reopen the dataset; times are decoded by xarray here (pass decode_times=False to keep the raw numeric values instead)
ds = xr.open_dataset(join(folder, file), engine="zarr")

# identify indices of finite values
finite_values = np.isfinite(ds['lon'])
idx_finite = np.where(finite_values)

# number of observations per trajectory
rowsize = np.bincount(idx_finite[0]).astype('int32')

# unique trajectory identification
unique_id = np.unique(ds.trajectory.values[idx_finite[0]]).astype('int32')

# coordinates
coords["time"] = ds.time.data[idx_finite]  # time is already 2D (trajectory, obs); index it directly to get the ragged time
coords["ids"] = np.repeat(unique_id, rowsize)

# metadata variables
metadata["rowsize"] = rowsize
metadata["ID"] = unique_id

# data variable
data["lon"] = ds.lon.data[idx_finite].astype('float32')
data["lat"] = ds.lat.data[idx_finite].astype('float32')
data["z"] = ds.z.data[idx_finite].astype('float32')

# attributes for each variable
attrs_variables = {
    "ID": {'long_name': 'Trajectory id', 'units':'-'},
    "time": {'axis': 'T', 'long_name': 'time', 'standard_name': 'time'}, 
    "lon": {'axis': 'X', 'long_name': 'longitude', 'units': 'degrees_east'}, 
    "lat": {'axis': 'Y', 'long_name': 'latitude', 'units': 'degrees_north'}, 
    "ids": {'long_name': 'Trajectory identification number repeated along observations', 'units': '-'},
    "rowsize": {'long_name': 'Number of observations per trajectory', 'sample_dimension': 'obs', 'units':'-'},
}

# keep original global attributes
attrs_global={
    'Conventions': 'CF-1.6/CF-1.7',
    'feature_type': 'trajectory',
    'ncei_template_version': 'NCEI_NetCDF_Trajectory_Template_v2.0',
    'parcels_mesh': 'flat',
    'parcels_version': '2.4.0'
}

ds.close()
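The indexing steps in the cell above can be easier to follow on a toy example. The sketch below applies the same `np.isfinite` / `np.where` / `np.bincount` / `np.repeat` sequence to a small hand-made matrix; the values and the trajectory identifiers are hypothetical.

```python
import numpy as np

# Toy 2D output: 3 trajectories, up to 4 observations, NaN-padded.
lon = np.array([
    [0.0, 0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2, np.nan],
    [2.0, 2.1, np.nan, np.nan],
])
traj_id = np.array([10, 11, 12])  # hypothetical trajectory identifiers

idx_finite = np.where(np.isfinite(lon))  # (row indices, column indices)
rowsize = np.bincount(idx_finite[0])     # valid observations per trajectory
lon_ragged = lon[idx_finite]             # 1D, trajectories stored end to end
ids = np.repeat(traj_id, rowsize)        # trajectory id for every observation

print(rowsize)
print(lon_ragged)
print(ids)
```

One caveat of this approach: if the last trajectories contained no valid observation at all, `np.bincount` would return a shorter array. Passing `minlength=lon.shape[0]` guards against that case.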

In [9]:
ra = RaggedArray(coords, metadata, data, attrs_global, attrs_variables)

Finally, as an example, we can write the ragged array to a NetCDF file:

In [10]:
ra.to_netcdf('../data/process/Output.nc')
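Once the data are stored in ragged form, each variable is a single 1D array, and individual trajectories are recovered from `rowsize`. The sketch below illustrates the idea on small synthetic arrays (it does not read the file written above); the names `traj_idx` and `trajectories` are chosen here for illustration.

```python
import numpy as np

# Synthetic ragged data: 3 trajectories with 4, 3 and 2 observations.
rowsize = np.array([4, 3, 2])
lon = np.arange(9, dtype="float32")

# Start index of every trajectory, plus the end of the last one.
traj_idx = np.insert(np.cumsum(rowsize), 0, 0)

# Slice out the second trajectory (index 1).
j = 1
lon_j = lon[traj_idx[j]:traj_idx[j + 1]]

# Or split the whole array into a list with one array per trajectory.
trajectories = np.split(lon, traj_idx[1:-1])

print(traj_idx)
print(lon_j)
```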