# Data Access for the Oceanbench v2 datachallenge demo


## Reusing processing steps and reproducing data preparation

### Use the configured `ocb-dc_ose_2021-input_data` pipeline
![](imgs/data_doc.png)

#### Reproduce the processing of a single satellite

In [None]:
!ocb-dc_ose_2021-input_data params.sat=j2g

In [None]:
import xarray as xr
ds = xr.open_mfdataset('data/prepared/input/*.nc', combine='nested', concat_dim='time')
ds

In [None]:
# 2D map: snap lat/lon to a 1/20 grid and average ssh per bin for one day
bin_size = 1/20
(
    ds.sel(time='2017-08-01').assign(
        lat=ds.lat / bin_size // 1 * bin_size,
        lon=ds.lon / bin_size // 1 * bin_size
    )[['ssh', 'lat', 'lon']].load()
    .drop_vars('time')
    .to_dataframe()
    .groupby(['lat', 'lon']).mean()
    .to_xarray()
).ssh.plot()
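
The expression `x / bin_size // 1 * bin_size` in the cell above snaps each coordinate to the lower edge of its bin, and the `groupby(...).mean()` then averages all samples that share a bin. Here is a minimal plain-Python sketch of the same idea. The helper names `bin_lower_edge` and `bin_mean` and the sample values are hypothetical, chosen only for illustration, and the bin size is 0.25 rather than 1/20 so the numbers are easy to follow.

```python
import math
from collections import defaultdict


def bin_lower_edge(value, bin_size):
    """Snap a coordinate to the lower edge of its bin.

    Equivalent to `value / bin_size // 1 * bin_size` for floats.
    """
    return math.floor(value / bin_size) * bin_size


def bin_mean(points, bin_size):
    """Average the values of all points that fall in the same (lat, lon) bin."""
    sums = defaultdict(lambda: [0.0, 0])  # bin -> [running total, count]
    for lat, lon, value in points:
        key = (bin_lower_edge(lat, bin_size), bin_lower_edge(lon, bin_size))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical along-track samples: (lat, lon, ssh)
samples = [(0.1, 0.3, 1.0), (0.2, 0.4, 3.0), (0.6, -0.1, 5.0)]
gridded = bin_mean(samples, bin_size=0.25)
print(gridded)
```

The first two samples land in the same bin and are averaged, while the third gets a bin of its own. Note that flooring sends negative coordinates to the bin edge below them, so `-0.1` maps to `-0.25`, not `0.0`.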

#### Dry run (without actual execution) for all satellites

In [None]:
!ocb-dc_ose_2021-input_data --multirun dry=True

## Downloading versioned and preprocessed data

### Listing datachallenge content

In [None]:
# Storing the repo url for convenience
%env DC_REPO=https://github.com/quentinf00/ocb-dc-ose-2021.git

In [None]:
# Listing and pretty printing all files of the datachallenge
!dvc ls -R $DC_REPO datachallenge/data \
| tree --fromfile

### Downloading prepared input data

In [None]:
!dvc get -q $DC_REPO datachallenge/data/prepared/input

In [None]:
!tree input

### Visualize input data

In [None]:
ds = xr.open_mfdataset('input/*.nc', combine='nested', concat_dim='time')
ds

In [None]:
# 2D map: snap lat/lon to a 1/20 grid and average ssh per bin for one day
bin_size = 1/20
(
    ds.sel(time='2017-08-01').assign(
        lat=ds.lat / bin_size // 1 * bin_size,
        lon=ds.lon / bin_size // 1 * bin_size
    )[['ssh', 'lat', 'lon']].load()
    .drop_vars('time')
    .to_dataframe()
    .groupby(['lat', 'lon']).mean()
    .to_xarray()
).ssh.plot()

### Checking generated data against downloaded data

In [None]:
xr.testing.assert_allclose(
    xr.open_dataset('data/prepared/input/j2g.nc'),
    xr.open_dataset('input/j2g.nc'),
)
print("Successful reproduction")
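
The check above covers only `j2g.nc`. If several satellites have been processed locally, the same comparison could be repeated for every file name present in both directories. The following is a sketch under that assumption. The helpers `paired_files` and `check_all` are hypothetical names and not part of the pipeline, and the default directories are simply the ones used in this notebook.

```python
from pathlib import Path


def paired_files(generated_dir, downloaded_dir, pattern="*.nc"):
    """Yield (generated, downloaded) paths for file names found in both directories."""
    generated = {p.name: p for p in Path(generated_dir).glob(pattern)}
    downloaded = {p.name: p for p in Path(downloaded_dir).glob(pattern)}
    for name in sorted(generated.keys() & downloaded.keys()):
        yield generated[name], downloaded[name]


def check_all(generated_dir="data/prepared/input", downloaded_dir="input"):
    """Compare every locally generated file with its downloaded counterpart."""
    # Imported here so that the pairing helper can be used without xarray.
    import xarray as xr

    for gen, dl in paired_files(generated_dir, downloaded_dir):
        xr.testing.assert_allclose(xr.open_dataset(gen), xr.open_dataset(dl))
        print(f"{gen.name}: successful reproduction")
```

Calling `check_all()` from a cell raises an `AssertionError` at the first file that differs. Files that exist in only one of the two directories are skipped silently, which may or may not be the desired behavior.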

### More on pipeline usage (help, doc, ...)

In [None]:
!ocb-dc_ose_2021-input_data --help

In [None]:
!ocb-dc_ose_2021-input_data params.sat=alg dry=True 'hydra.verbose=[aprl.appareil]'