# Temperature differences

In this notebook, we compute the difference between the temperature measured by the tag and the temperature from the reference model.

**Summary:**

1. Set up the dask cluster
2. Open the data: reference model (copernicus) and tag log
3. Align the data
4. Compute the differences
5. Save to disk

In [None]:
import cf_xarray
import dask
import fsspec
import intake
import numba
import numpy as np
import pandas as pd
import xarray as xr

from pangeo_fish.cf import bounds_to_bins
from pangeo_fish.diff import marc_diff_z
from pangeo_fish.model import marc_sigma_to_depth
from pangeo_fish.tags import adapt_model_time, reshape_by_bins, to_time_slice

Parametrize with [papermill](https://papermill.readthedocs.io/en/latest/)

In [None]:
tag_url: str

catalog: str
catalog_parameters: dict = {}

scheduler_address: str | None = None

relative_depth_threshold: float = 0.8

diff_path: str

# NOTE: the blocks below are executed in order, so the last assignment of each name wins

# local PC
tag_url = "/Users/todaka/python/git/pangeo-fish/data_local/fish-intel/tag/nc/A18832.nc"
catalog = "https://data-taos.ifremer.fr/kerchunk/ref-copernicus.yaml"
scheduler_address: str | None = None
catalog = "https://data-taos.ifremer.fr/kerchunk/ref-marc.yaml"
diff_path = "/Users/todaka/python/git/pangeo-fish/data_local/fish-intel/A18832-f1_e2500/diff.zarr"


# Datarmor
tag_url = "/home/datawork-lops-iaocea/data/fish-intel/tag/nc/A18832.nc"

# mars
catalog_mars = "/home/datawork-taos-s/intranet/kerchunk/ref-marc.yaml"
catalog_parameters: dict = {"region": "f1_e2500", "year": "2022"}
diff_path = "/home/datawork-taos-s/public/fish/A18832-f1_e2500/diff.zarr"


# copernicus
catalog = "/home/datawork-taos-s/intranet/kerchunk/ref-copernicus.yaml"
# catalog_parameters: dict = {"type": ["2022_3D", "2022_2D", "mdt"]}
diff_path = "/home/datawork-taos-s/public/fish/A18832-copernicus/diff.zarr"
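
When the notebook is executed through papermill, the values above act as defaults: papermill injects a new cell with the overriding values after the cell tagged `parameters`. A minimal sketch of such a run follows. The directory names, the catalog path and the helpers `build_parameters` and `run_notebook` are hypothetical; only `papermill.execute_notebook` is an actual papermill function.

```python
from pathlib import PurePosixPath


def build_parameters(tag_id, tag_root, out_root, catalog):
    """Assemble the values expected by the parameters cell (hypothetical helper)."""
    return {
        "tag_url": str(PurePosixPath(tag_root) / f"{tag_id}.nc"),
        "catalog": catalog,
        "catalog_parameters": {},
        "scheduler_address": None,
        "relative_depth_threshold": 0.8,
        "diff_path": str(PurePosixPath(out_root) / tag_id / "diff.zarr"),
    }


def run_notebook(input_path, output_path, parameters):
    """Execute the notebook with papermill (imported lazily, only when called)."""
    import papermill as pm

    return pm.execute_notebook(input_path, output_path, parameters=parameters)


# All paths below are placeholders, not the locations used in this notebook.
params = build_parameters(
    "A18832",
    tag_root="/hypothetical/tags",
    out_root="/hypothetical/output",
    catalog="/hypothetical/catalog.yaml",
)
```

Calling `run_notebook` with the notebook file, an output file name and `params` would then run every cell with these values.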



## Set up the dask cluster

In [None]:
import dask_hpcconfig
from distributed import Client

In [None]:
overrides = {}
# overrides = { "cluster.cores": 28 , "cluster.processes": 6 }

cluster = dask_hpcconfig.cluster("datarmor", **overrides)
client = Client(cluster)
cluster.scale(50)
client

## Open the data: reference model (copernicus) and tag log

open the tag log

In [None]:
tag = xr.open_dataset(fsspec.open(tag_url).open(), engine="h5netcdf").load()
tag

open the reference model

TODO: for now, we will directly read the data, but in the future we might want to use [xpublish](https://github.com/xpublish-community/xpublish) to hide the reading / preprocessing of the reference model (especially computing the depth / pressure and stitching together different models)

In [None]:
cat = intake.open_catalog(catalog)
# 3D temperature, renamed to `TEMP`
ds = cat.data_tmp(type="2022_3D").to_dask().rename({"thetao": "TEMP"})[["TEMP"]]
# sea surface height
ds["XE"] = cat.data_tmp(type="2022_2D").to_dask().zos
# sea floor depth and mask, with the coordinates renamed to match `ds`
ds["H0"] = cat.data_tmp(type="mdt").to_dask().deptho.rename({"latitude": "lat", "longitude": "lon"})
ds["mask"] = cat.data_tmp(type="mdt").to_dask().mask.rename({"latitude": "lat", "longitude": "lon"})

ds = ds.assign_coords(time=lambda ds: ds.time.astype("datetime64[ns]"))
ds

## Data alignment

In order to compare the measured temperature with the model, we need to:
1. align time ranges
2. calculate the modelled depth
3. group the measured data into bins

### Align time ranges

In [None]:
slice_ = to_time_slice(tag.times)

In [None]:
tag_log = tag.sel(time=slice_)
tag_log
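
The time alignment can be pictured without xarray. The sketch below assumes that the slice spans the interval between two tag events (called release and recapture here; the events actually used by `to_time_slice` are not shown in this notebook) and keeps only the log entries inside that interval. The second part shows one plausible way of widening the window to whole hours for an hourly model; whether `adapt_model_time` does exactly this is an assumption. All timestamps are made up.

```python
from datetime import datetime, timedelta

# Made-up tag log: one measurement every 30 minutes, starting at midnight.
first = datetime(2022, 6, 1, 0, 0)
log_times = [first + timedelta(minutes=30 * i) for i in range(12)]

# Assumed events delimiting the useful part of the log.
release = datetime(2022, 6, 1, 1, 10)
recapture = datetime(2022, 6, 1, 4, 0)

# Label-based slices in xarray include both end points, hence <= on both sides.
selected = [t for t in log_times if release <= t <= recapture]


def floor_hour(t):
    """Round a timestamp down to the full hour."""
    return t.replace(minute=0, second=0, microsecond=0)


def ceil_hour(t):
    """Round a timestamp up to the full hour (unchanged if already on the hour)."""
    floored = floor_hour(t)
    return floored if floored == t else floored + timedelta(hours=1)


# Hypothetical widening of the window so that hourly model steps cover the whole log.
model_window = (floor_hour(release), ceil_hour(recapture))
```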

In [None]:
# subset in time
model_subset = ds.sel(time=adapt_model_time(slice_))
# drop the depth levels the fish never reaches
depth_max = (tag_log.pressure.max() - model_subset.XE.min()).compute()
model_subset = model_subset.sel(depth=slice(0, depth_max))

# subset in space (Iroise Sea) for quick computational tests
model_subset = model_subset.sel(lat=slice(45, 51), lon=slice(-8, 0))
model_subset
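
The depth cut in the cell above can be checked by hand on a few numbers. In this sketch, the tag pressure is used directly as a depth, as in the cell above, and all values are made up.

```python
import numpy as np

# Made-up tag pressures and model sea surface heights.
tag_pressure = np.array([2.0, 15.5, 48.0, 31.2])
sea_surface_height = np.array([[-0.4, 0.1], [0.3, -0.2]])

# Deepest value the tag can reach on the model's depth axis.
depth_max = tag_pressure.max() - sea_surface_height.min()

# Model depth levels; keep those between 0 and depth_max
# (both ends included, like a label-based slice).
levels = np.array([0.5, 5.0, 10.0, 25.0, 50.0, 75.0, 100.0])
kept = levels[(levels >= 0) & (levels <= depth_max)]
```

In this example the first level below `depth_max` (50 m) is dropped as well, so the deepest observations lie below the last level that is kept.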

### Convert sigma level to depth

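The formula for computing the depth is model-specific. For the model read above, the levels are already depths, so the cell below only shifts them by the sea surface height `XE`; the position of the sea floor is obtained in the same way from the sea floor depth `H0`.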

*TODO*:
- calculate the modelled pressure and use that to compute the diff – essentially, that's what the tag measured
- have the hosted model (using `xpublish`) calculate the depth – that way, we don't need to worry about the model-specific formula

In [None]:
reference_model_ = model_subset.rename({"depth": "level"}).chunk({"level": -1})
reference_model_["depth"] = reference_model_.XE + reference_model_.level
reference_model_["bottom"] = reference_model_.XE + reference_model_.H0
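
xarray broadcasts `XE` and `level` by dimension name. The same computation, written with explicit axes in NumPy, might look as follows; the values are made up and the axis order is chosen for readability only.

```python
import numpy as np

level = np.array([0.5, 5.0, 10.0])           # (level,)
xe = np.array([[[0.2, -0.1], [0.0, 0.3]]])   # (time, lat, lon) = (1, 2, 2)
h0 = np.array([[40.0, 60.0], [80.0, 20.0]])  # (lat, lon)

# Depth of every level at every time step and pixel: (time, level, lat, lon).
depth = xe[:, np.newaxis, :, :] + level[np.newaxis, :, np.newaxis, np.newaxis]

# Position of the sea floor at every time step and pixel: (time, lat, lon).
bottom = xe + h0
```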



In [None]:
# add 2D latitude / longitude coordinates broadcast from the 1D lat / lon

(broadcasted,) = xr.broadcast(
    reference_model_[["lat", "lon"]]
    .reset_index(["lat", "lon"])
    .rename_vars({"lat": "latitude", "lon": "longitude"})
    .reset_coords()
)
reference_model_ = reference_model_.merge(broadcasted).set_coords(["latitude", "longitude"])
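
The cell above turns the 1D `lat` / `lon` coordinates into 2D `latitude` / `longitude` coordinates that are defined on every pixel. In plain NumPy, the same broadcasting could be sketched with `np.meshgrid` (the coordinate values are made up):

```python
import numpy as np

lat = np.array([45.0, 46.0, 47.0])
lon = np.array([-8.0, -7.0])

# indexing="ij" keeps the (lat, lon) axis order of the inputs.
latitude, longitude = np.meshgrid(lat, lon, indexing="ij")
```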

In [None]:
reference_model = reference_model_.cf.add_bounds(["time"], output_dim="bounds").pipe(
    bounds_to_bins, bounds_dim="bounds"
)
reference_model
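
`cf.add_bounds` attaches a lower and an upper bound to every model time step, and `bounds_to_bins` turns these pairs into bins. A minimal NumPy sketch of the idea, assuming that the bounds lie halfway between neighbouring time steps and that the two outer bounds are extrapolated by half a step (time is expressed in made-up hours):

```python
import numpy as np

# Model time steps as hours since an arbitrary origin.
centers = np.array([0.0, 1.0, 2.0, 3.0])

# Interior edges: midpoints between neighbouring time steps.
mid = (centers[:-1] + centers[1:]) / 2

# Outer edges: extrapolate by half of the first / last step.
half_first = (centers[1] - centers[0]) / 2
half_last = (centers[-1] - centers[-2]) / 2
edges = np.concatenate(([centers[0] - half_first], mid, [centers[-1] + half_last]))

# One (lower, upper) pair per time step: the "bounds" layout.
bounds = np.stack([edges[:-1], edges[1:]], axis=-1)
```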

### Reshape the tag data

To further align both datasets, we need to reshape the tag data into the time bins of the model, such that the variables `temperature(measured_time)` and `depth(measured_time)` become `temperature(model_time, obs)` and `depth(model_time, obs)`.

The bins are the `time_bins` of the reference model computed above, so we only need to reshape:

In [None]:
%%time
reshaped_tag = (
    tag_log[["water_temperature", "pressure"]]
    .pipe(
        reshape_by_bins,
        dim="time",
        bins=reference_model.time_bins,
        bin_dim="bincount",
        other_dim="obs",
    )
    .assign_coords(time=lambda ds: reference_model.time.isel(time=ds.bincount))
    .swap_dims({"bincount": "time"})
    .drop_vars(["bincount", "time_bins"])
    .chunk({"time": 1})
)
reshaped_tag
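
The reshaping into `(time, obs)` can be illustrated with plain NumPy. This sketch assigns every measurement to a bin with `np.digitize` and pads the shorter rows with NaN; the padding is an assumption about what `reshape_by_bins` does, and all values are made up.

```python
import numpy as np

# Edges of three model time bins and the tag measurements (time in hours).
edges = np.array([0.0, 1.0, 2.0, 3.0])
tag_time = np.array([0.1, 0.4, 0.9, 1.2, 1.7, 2.5])
tag_temp = np.array([14.0, 14.1, 13.9, 13.5, 13.4, 12.8])

# Bin number of every measurement (0-based).
bin_index = np.digitize(tag_time, edges) - 1
n_bins = len(edges) - 1

# The length of the "obs" dimension is the size of the fullest bin.
counts = np.bincount(bin_index, minlength=n_bins)
n_obs = counts.max()

# Fill one row per bin, leaving NaN where a bin has fewer observations.
reshaped = np.full((n_bins, n_obs), np.nan)
for b in range(n_bins):
    values = tag_temp[bin_index == b]
    reshaped[b, : len(values)] = values
```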

## Compute the differences

Now that both datasets are aligned, we can compute the actual difference. However, since the model's depth is a function of time and position, we can't just subtract the tag log from the model. Instead, we have to find the matching depth for each pixel separately, take the temperature at that depth and calculate the difference. Finally, we can compute the mean along the observation dimension to get a single value per pixel and timestep.

In [None]:
%%time
diff = (
    marc_diff_z(reference_model, reshaped_tag, depth_threshold=relative_depth_threshold)
    .to_dataset()
    .assign_attrs({"tag_id": tag_log.attrs["tag_id"]})
    .assign({"H0": reference_model.H0})
    .chunk({"time": 1, "lat": -1, "lon": -1})
)
diff
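
The matching described above can be sketched for a single time step with plain NumPy. This is an illustration, not the implementation of `marc_diff_z`: it assumes a nearest-level lookup, takes the difference as model minus tag, and leaves out the `depth_threshold` handling. All values are made up.

```python
import numpy as np

# Model for one time step: depth and temperature on (level, pixel).
model_depth = np.array([[0.5, 0.7], [10.5, 10.7], [50.5, 50.7]])
model_temp = np.array([[15.0, 16.0], [13.0, 14.0], [10.0, 11.0]])

# Tag observations for the same time step; NaN marks unused "obs" slots.
tag_depth = np.array([9.0, 45.0, np.nan])
tag_temp = np.array([13.4, 10.5, np.nan])


def diff_one_pixel(depth_col, temp_col, obs_depth, obs_temp):
    """Mean of (model - tag) temperature, using the nearest model level."""
    valid = ~np.isnan(obs_depth)
    # Distance between every level and every valid observation: (level, obs).
    distance = np.abs(depth_col[:, np.newaxis] - obs_depth[valid][np.newaxis, :])
    nearest = distance.argmin(axis=0)
    return np.mean(temp_col[nearest] - obs_temp[valid])


# One value per pixel, since the model depth differs from pixel to pixel.
diff = np.array(
    [
        diff_one_pixel(model_depth[:, p], model_temp[:, p], tag_depth, tag_temp)
        for p in range(model_depth.shape[1])
    ]
)
```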

## Save the differences to disk

In [None]:
%%time
# drop the bins since zarr cannot represent them
diff.drop_vars(["time_bins"]).to_zarr(diff_path, mode="w", consolidated=True)

In [None]:
import xarray as xr
diff_ = xr.open_zarr(diff_path)


In [None]:
import hvplot.xarray
diff_["diff"].isel(time=0).plot(x="longitude", y="latitude")