# Toy QDM
This notebook is for debugging the QDM (quantile delta mapping) implementation. Use cmethods as-is, or modify its source code directly.

16 Jan 2024 | EHU

In [None]:
import os
import sys
import copy
import csv
import time
import datetime
import math
import pandas as pd
import numpy as np
import xarray as xr
import netCDF4 as nc
import matplotlib.pyplot as plt

# sys.path.append('/home/theghub/ehultee/ISMIP7-utils/python-cmethods')
from cmethods import adjust

# from verjansFunctions import qmProjCannon2015

Initial run settings from Vincent. Eventually, replace most of this with our own file selection; for now, just check that it works.

In [None]:
DepthRange         = [0,500]
ShallowThreshold   = 100
PeriodObs0         = [1950,2015]
SigmaExclusion     = 4 #number of sdevs beyond which we constrain values in QDM
yrWindowProj       = 30 #length (years) of the running window used for the CDF in the projection period

# DirEN4         = f'{cwd}Verjans_InputOutput/'
# EN4file        = f'dpavg_tf_EN4anl_Dp{DepthRange[0]}to{DepthRange[1]}_bathymin{ShallowThreshold}.nc'
DirEN4         = '/Users/eultee/Downloads/'
EN4file        = f'dpavg_tf_EN4anl_Dp{DepthRange[0]}to{DepthRange[1]}_bathymin{ShallowThreshold}.nc'
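`SigmaExclusion` is described only as the number of standard deviations beyond which values are constrained in QDM. A minimal sketch of one possible reading, clamping values to mean ± `SigmaExclusion`·sd, is below. The helper name `clip_beyond_sigma` is hypothetical, and this is not taken from Vincent's code.

```python
import numpy as np

def clip_beyond_sigma(values, n_sigma=4):
    """Clamp values lying more than n_sigma standard deviations from the mean.

    Hypothetical helper illustrating one reading of SigmaExclusion;
    NaNs are ignored when computing the mean and standard deviation.
    """
    values = np.asarray(values, dtype=float)
    mu = np.nanmean(values)
    sd = np.nanstd(values)
    # Anything outside [mu - n_sigma*sd, mu + n_sigma*sd] is set to the bound
    return np.clip(values, mu - n_sigma * sd, mu + n_sigma * sd)

# Usage: out = clip_beyond_sigma(series_values, n_sigma=SigmaExclusion)
```

Note that the mean and sd are computed from the same sample that contains the outliers, so a single extreme value widens its own bounds.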

In [None]:
## Load EN4 using xarray
ds = xr.open_dataset(os.path.join(DirEN4, EN4file), decode_times=True)  # decode_times expects a bool; the string 'timeDim' was simply truthy
ds2 = ds.assign_coords({'timeDim': ds.time, 
                  'latDim': ds.lat, 
                  'lonDim': ds.lon,
                  'depthDim': ds.depth})
ds2

tfEN4 = ds2.tfdpavg0to500_bathymin100.rename({'timeDim': 'time',
                                              'latDim': 'lat',
                                              'lonDim': 'lon'})
tfEN4

In [None]:
## Bryan Riel, please save me. Decimal year to datetime is the bane of this notebook.
## The helper below is pasted from iceutils (timeutils.py).
## calendar.isleap applies the full Gregorian rule (e.g. 1900 is not a leap year).
import calendar

def tdec2datestr(tdec_in, returndate=False):
    """
    Convert a decimal year to an ISO date string (or a datetime if returndate=True).
    """
    if isinstance(tdec_in, (list, np.ndarray)):
        tdec_list = copy.deepcopy(tdec_in)
    else:
        tdec_list = [tdec_in]
    current_list = []
    for tdec in tdec_list:
        year = int(tdec)
        yearStart = datetime.datetime(year, 1, 1)
        ndays_in_year = 366.0 if calendar.isleap(year) else 365.0
        days = (tdec - year) * ndays_in_year
        seconds = (days - int(days)) * 86400
        tdelta = datetime.timedelta(days=int(days), seconds=int(seconds))
        current = yearStart + tdelta
        if not returndate:
            current = current.isoformat(' ').split()[0]
        current_list.append(current)

    if len(current_list) == 1:
        return current_list[0]
    else:
        return np.array(current_list)

time_arr = tdec2datestr(tfEN4.time.values)
time_arr

Okay, the conversion finally works. The obs dataset needs the same time type as the modeled one in order to use QDM `adjust`.
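A quick round-trip check of the conversion in both directions can catch calendar slips. The sketch below is a compact stand-in (the function names are hypothetical, not from iceutils). It uses `calendar.isleap`, which follows the full Gregorian rule, whereas a bare `year % 4 == 0` test treats 1900 as a leap year. That does not matter for 1950 to 2015 (2000 is a leap year under both rules) but would for longer records.

```python
import calendar
import datetime

def decimal_year_to_datetime(tdec):
    """Convert a decimal year (e.g. 2000.5) to a datetime (hypothetical helper)."""
    year = int(tdec)
    ndays = 366 if calendar.isleap(year) else 365
    # Fraction of the year, expressed as days since 1 January
    return datetime.datetime(year, 1, 1) + datetime.timedelta(days=(tdec - year) * ndays)

def datetime_to_decimal_year(dt):
    """Inverse of decimal_year_to_datetime (hypothetical helper)."""
    start = datetime.datetime(dt.year, 1, 1)
    ndays = 366 if calendar.isleap(dt.year) else 365
    return dt.year + (dt - start).total_seconds() / (ndays * 86400)
```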

---

Now we try Quantile Delta Mapping (QDM) from cmethods. See the [example notebook](https://github.com/ehultee/gris-iceocean-process/blob/main/python-cmethods_examples.ipynb) added by DF.

The cmethods `adjust` function for QDM takes three datasets: the simulated historical data (`simh`), the simulated projection (`simp`), and the observed historical data against which to bias-correct (`obs`).
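For orientation while debugging, here is a minimal NumPy sketch of additive and multiplicative QDM for 1D arrays, following the idea in Cannon et al. (2015): find each projected value's quantile τ within `simp`, then apply the model's change at τ (relative to the `simh` quantile) to the `obs` quantile at τ. It is not the cmethods implementation (which, for instance, takes an `n_quantiles` argument); the name `qdm_sketch` and the plotting position `(rank + 0.5) / n` are assumptions.

```python
import numpy as np

def qdm_sketch(obs, simh, simp, kind="+"):
    """Quantile delta mapping for 1D arrays; illustrative sketch only."""
    obs, simh, simp = (np.asarray(a, dtype=float) for a in (obs, simh, simp))
    # Non-exceedance probability of each projected value within simp itself
    ranks = np.argsort(np.argsort(simp))
    tau = (ranks + 0.5) / simp.size
    # Historical-model and observed quantiles at the same probabilities
    q_simh = np.quantile(simh, tau)
    q_obs = np.quantile(obs, tau)
    if kind == "+":
        # Additive: keep the model's absolute change at each quantile
        return q_obs + (simp - q_simh)
    if kind == "*":
        # Multiplicative: keep the model's relative change at each quantile
        return q_obs * (simp / q_simh)
    raise ValueError("kind must be '+' or '*'")
```

Two easy sanity checks: if `obs` equals `simh`, the output should reproduce `simp`; if `obs` is `simh` shifted by a constant, the additive output should be `simp` shifted by the same constant.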

Slice the EN4 dataset for the obs period defined by Vincent's `PeriodObs0`. Then load the example CESM2 thermal forcing (TF) dataset for the same depth range and bathymetric threshold.

In [None]:
ds3 = xr.open_dataset(os.path.join(DirEN4, 'tfdpavg-CESM2-2024-11-14.nc'))
ds3

This is a short example dataset. For the sake of argument, let's take a very short correction period over the first half and use the second half as the projection. Let's start with a single grid cell to get our bearings.
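One detail of the split: label-based slicing with `.sel(time=slice(...))` in xarray (as in pandas) includes both endpoints, so `slice('2000', '2007')` and `slice('2007', '2014')`, as used in this notebook's `adjust` calls, share all of 2007. The pandas sketch below, on a hypothetical monthly index for 2000 to 2014, makes the overlap visible; whether a one-year overlap matters here is a judgment call.

```python
import pandas as pd

# Hypothetical monthly index spanning the example period
times = pd.date_range("2000-01-01", "2014-12-01", freq="MS")
series = pd.Series(range(len(times)), index=times)

# Label-based slices include both endpoints, so 2007 lands in both periods
hist = series.loc["2000":"2007"]
proj = series.loc["2007":"2014"]
overlap = hist.index.intersection(proj.index)

# Non-overlapping alternative: end the historical period a year earlier
hist_strict = series.loc["2000":"2006"]
```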

---
### Try QDM on a 1D series

In [None]:
## select a single site
lat_sel = 65.5 ## deg N
lon_sel = 0.5 ## deg E

test_series = ds3.TF.sel(lon=lon_sel, lat=lat_sel, method='nearest')

test_series

In [None]:
test_obs = tfEN4.sel(lat=lat_sel, lon=lon_sel, method='nearest')
test_obs_trimmed = test_obs.sel(time=slice('2000', '2014'))
test_obs_trimmed

To plot these series together, both need a date type that matplotlib recognizes. You would think converting one of them (EN4) would be enough, but now the other (CESM2, whose time index is a cftime-based `CFTimeIndex`) is not behaving, so here we are. Below is the inverse function, also from [Bryan Riel](https://github.com/bryanvriel/iceutils/blob/master/iceutils/timeutils.py).

In [None]:
import calendar

def datestr2tdec(yy=0, mm=0, dd=0, hour=0, minute=0, sec=0, microsec=0, dateobj=None):
    """
    Convert year, month, day, hours, minutes, seconds to decimal year.
    dateobj may instead be a 'YYYY-MM-DD' string, a datetime, or an np.datetime64.
    """
    if dateobj is not None:
        if isinstance(dateobj, str):
            yy, mm, dd = [int(val) for val in dateobj.split('-')]
            hour, minute, sec = [0, 0, 0]
        elif isinstance(dateobj, datetime.datetime):
            attrs = ['year', 'month', 'day', 'hour', 'minute', 'second']
            yy, mm, dd, hour, minute, sec = [getattr(dateobj, attr) for attr in attrs]
        elif isinstance(dateobj, np.datetime64):
            yy = int(dateobj.astype('datetime64[Y]').astype(int) + 1970)
            mm = int(dateobj.astype('datetime64[M]').astype(int) % 12 + 1)
            days = (dateobj - dateobj.astype('datetime64[M]')) / np.timedelta64(1, 'D')
            dd = int(days) + 1
            hour, minute, sec = [0, 0, 0]
        else:
            raise NotImplementedError('dateobj must be str, datetime, or np.datetime64.')

    # Make datetime object for start of year
    yearStart = datetime.datetime(yy, 1, 1, 0, 0, 0)
    # Make datetime object for input time
    current = datetime.datetime(yy, mm, dd, hour, minute, sec, microsec)
    # Compute time elapsed since start of year
    tdelta = current - yearStart
    # Convert to decimal year, accounting for leap years (full Gregorian rule)
    ndays_in_year = 366.0 if calendar.isleap(yy) else 365.0
    return float(yy) + tdelta.total_seconds() / (ndays_in_year * 86400)

In [None]:
## should we try expressing both with to_datetimeindex...?
# test_obs_trimmed.indexes['time'].to_datetimeindex()
tobs_times = pd.to_datetime(tdec2datestr(test_obs_trimmed.time.values))

In [None]:
fig, ax = plt.subplots()
# ax.plot(test_series.time.values, test_series)
ax.plot(test_series.indexes['time'].to_datetimeindex().values, test_series, label='CESM2')
ax.plot(tobs_times,
    test_obs_trimmed, label='EN4')
ax.legend(loc='best')
ax.set(xlabel='Year', ylabel='Thermal forcing', title='Series extracted for example cell ({} E, {} N)'.format(lon_sel, lat_sel))
plt.show()

In [None]:
obs_series = test_obs_trimmed.assign_coords(new_time = ('time', tobs_times))
obs_series = obs_series.drop_indexes('time')
obs_series_1 = obs_series.set_xindex('new_time').drop_vars('time')
obs_series_1 = obs_series_1.rename({'new_time': 'time'})
obs_series_1
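The `drop_indexes` / `set_xindex` / `rename` sequence above works, but assigning directly to an existing dimension coordinate should rebuild its index in one step. A minimal sketch with a hypothetical helper name, assuming the new times have the same length as the old ones:

```python
import numpy as np
import pandas as pd
import xarray as xr

def replace_time_index(da, new_times):
    """Swap the 'time' coordinate of `da` for `new_times` (same length).

    Hypothetical helper: assign_coords on an existing dimension coordinate
    replaces both the coordinate values and the index built on them.
    """
    return da.assign_coords(time=pd.DatetimeIndex(new_times))
```

In this notebook that would read `obs_series_1 = replace_time_index(test_obs_trimmed, tobs_times)`.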

In [None]:
sim_series = test_series.assign_coords(new_time = ('time', test_series.indexes['time'].to_datetimeindex().values))
sim_series = sim_series.drop_indexes('time')
sim_series_1 = sim_series.set_xindex('new_time').drop_vars('time')
sim_series_1 = sim_series_1.rename({'new_time': 'time'})
sim_series_1 
# sim_series_1 = sim_series_1.sel(time=slice('2000','2013'))  ## maybe the underlying data has to be exactly the same length? Alignment otherwise seems good...

In [None]:
obs_ds = obs_series_1.to_dataset()
sim_ds = sim_series_1.to_dataset()

In [None]:
sim_ds = sim_ds.rename({'TF': 'tfdpavg0to500_bathymin100'})

In [None]:
## let's try QDM adjustment here, on the single-cell (1D) series
qdm_result = adjust(
    method = "quantile_delta_mapping",
    obs = obs_ds.sel(time=slice('2000','2007')),
    simh = sim_ds.sel(time=slice('2000', '2007')).rename({'time':'t_simh'}),
    simp = sim_ds.sel(time=slice('2007', '2014')),
    n_quantiles = 100,
    input_core_dims={"obs": "time", "simh": "t_simh", "simp": "time"},
    # group={"obs": "time.month", "simh": "t_simh.month", "simp": "time"},
    kind = "+", # to calculate the relative rather than the absolute change, "*" can be used instead of "+" (this is preferred when adjusting precipitation)
)

---
## Test QDM on a 3D dataset instead of a series
We are getting 
```
AttributeError: 'Dataset' object has no attribute 'to_dataset'
```

Perhaps this is because we fed the function a single series converted to a Dataset, rather than a 3D dataset? Take a small slice to find out.
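The error message says `.to_dataset()` was called on a `Dataset`. Only `DataArray` defines that method, so one plausible (untested) explanation is that `adjust` expects `DataArray` inputs and calls `.to_dataset()` on its result; the cmethods README examples appear to pass single variables such as `obsh["tas"]` rather than whole Datasets. If so, passing e.g. `obs_ds['tfdpavg0to500_bathymin100']` might avoid the error. The toy check below only illustrates the method difference, using a made-up variable name `tf`:

```python
import numpy as np
import xarray as xr

# A toy stand-in for obs_ds: one variable on a time axis
ds = xr.Dataset({"tf": ("time", np.arange(4.0))})

# Indexing a Dataset by variable name returns a DataArray
da = ds["tf"]

# DataArray has .to_dataset(); Dataset does not, which matches
# the AttributeError reported above
has_da = hasattr(da, "to_dataset")
has_ds = hasattr(ds, "to_dataset")
```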

### Process a small subset of simulated data

In [None]:
## try to do all the steps in pre-processing at once...
test_ds = ds3.TF.sel(lon=slice(lon_sel+0.1, lon_sel+1), lat=slice(lat_sel-1, lat_sel+2))
test_ds

In [None]:
## aligning the time indices
test_ds = test_ds.assign_coords(new_time = ('time', test_ds.indexes['time'].to_datetimeindex().values))
test_ds = test_ds.drop_indexes('time')
test_ds = test_ds.set_xindex('new_time').drop_vars('time')

## aligning the names of the variables between obs and sim
test_ds = test_ds.to_dataset()
test_ds = test_ds.rename({'new_time': 'time', 'TF': 'tfdpavg0to500_bathymin100'})
test_ds

### Process small subset of reanalysis data

In [None]:
tobs_ds = tfEN4.sel(lon=slice(lon_sel-1, lon_sel+1), 
                    lat=slice(lat_sel-2, lat_sel+2), 
                    time=slice('2000', '2014'))
tobs_ds

We are back to the problem of offset grids: the EN4 and CESM2 cell centers do not coincide. Ideally we would use a point-by-point, nearest-neighbor implementation, which xarray handles well; realigning the whole grid is less efficient. For now, let's see whether simply forcing the same grid works. That tells us whether the problem is even worth solving for the cmethods implementation.
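A sketch of the nearest-neighbor approach, assuming both datasets use monotonic `lat` and `lon` coordinates named as in this notebook; `nearest_onto` is a hypothetical helper. It reindexes only the spatial axes, so the differing time types do not get in the way.

```python
import numpy as np
import xarray as xr

def nearest_onto(obs, target):
    """For each target lat/lon, pick the nearest obs grid point.

    Hypothetical helper: reindex(method='nearest') on lat and lon only;
    the result carries the target's lat/lon values.
    """
    return obs.reindex(lat=target["lat"].values, lon=target["lon"].values, method="nearest")

# Usage (notebook names): tobs_on_sim = nearest_onto(tobs_ds, test_ds)
```

One caution: if the offset is exactly half a degree, every target point sits midway between two source points, so "nearest" is decided by a tie-breaking rule rather than by distance. Linear interpolation with `.interp()` (which requires SciPy) may be the more defensible choice in that case.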

In [None]:
scam_lat = [v+0.5 for v in tobs_ds.lat.values]
scam_lon = [v+0.5 for v in tobs_ds.lon.values]

In [None]:
## overwrite the EN4 lat/lon coordinates with the shifted values
tobs_ds = tobs_ds.assign_coords(new_lat = ('lat', scam_lat))
tobs_ds = tobs_ds.drop_indexes('lat')
tobs_ds = tobs_ds.set_xindex('new_lat').drop_vars('lat')

tobs_ds = tobs_ds.assign_coords(new_lon = ('lon', scam_lon))
tobs_ds = tobs_ds.drop_indexes('lon')
tobs_ds = tobs_ds.set_xindex('new_lon').drop_vars('lon')

tobs_ds = tobs_ds.rename({'new_lat': 'lat', 'new_lon': 'lon'})

In [None]:
tobs_ds

Reset the time index to a datetime type. Note that these data are also float32 rather than float64. Could that cause problems?
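On the float32 question: NumPy (and therefore xarray) promotes mixed float32/float64 arithmetic to float64, so the dtype difference alone is unlikely to break anything, although the values are not bit-identical. A small check with made-up values:

```python
import numpy as np

tf32 = np.array([1.5, 2.25, 3.1], dtype=np.float32)
tf64 = np.array([1.5, 2.25, 3.1], dtype=np.float64)

# Mixed float32/float64 arithmetic is promoted to float64
mixed = tf32 + tf64

# float32 keeps roughly 7 significant digits, so 3.1 is stored slightly differently
max_diff = np.max(np.abs(tf32.astype(np.float64) - tf64))
```

If in doubt, `tobs_ds.astype('float64')` makes the types consistent before calling `adjust`.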

In [None]:
tobs_ds = tobs_ds.assign_coords(new_time = ('time', pd.to_datetime(tdec2datestr(tobs_ds.time.values))))
tobs_ds = tobs_ds.drop_indexes('time')
tobs_ds = tobs_ds.set_xindex('new_time').drop_vars('time')
tobs_ds = tobs_ds.rename({'new_time': 'time'})
tobs_ds

### Attempt QDM on these 3D sets

In [None]:
qdm_result = adjust(
    method = "quantile_delta_mapping",
    obs = tobs_ds.sel(time=slice('2000','2007')),
    simh = test_ds.sel(time=slice('2000', '2007')).rename({'time':'t_simh'}),
    simp = test_ds.sel(time=slice('2007', '2014')),
    n_quantiles = 100,
    input_core_dims={"obs": "time", "simh": "t_simh", "simp": "time"},
    # group={"obs": "time.month", "simh": "t_simh.month", "simp": "time"},
    kind = "+", # to calculate the relative rather than the absolute change, "*" can be used instead of "+" (this is preferred when adjusting precipitation)
)

The cmethods source code warns that this is disabled (??), and that line is only reached if the `group` argument is not set. So, fun-lovers that we are, we try a grouping suggested in the cmethods docs:

In [None]:
qdm_result = adjust(
    method = "quantile_delta_mapping",
    obs = tobs_ds.sel(time=slice('2000','2007')),
    simh = test_ds.sel(time=slice('2000', '2007')).rename({'time':'t_simh'}),
    simp = test_ds.sel(time=slice('2007', '2014')),
    n_quantiles = 100,
    input_core_dims={"obs": "time", "simh": "t_simh", "simp": "time"},
    group={"obs": "time.month", "simh": "t_simh.month", "simp": "time"},
    kind = "+", # to calculate the relative rather than the absolute change, "*" can be used instead of "+" (this is preferred when adjusting precipitation)
)

...but `group` can't be used with distribution-based methods. Alas.
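Since `group` is not available for distribution-based methods, one workaround is to loop over calendar months and correct each month separately. A minimal sketch with a hypothetical helper `apply_by_month`; `correct` can be any 1D correction function, such as a 1D QDM. Month labels could come from `ds.time.dt.month.values`.

```python
import numpy as np

def apply_by_month(correct, obs, obs_months, simh, simh_months, simp, simp_months):
    """Apply a 1D bias-correction function separately to each calendar month.

    Hypothetical workaround for the `group` restriction: `correct` takes
    (obs, simh, simp) arrays and returns the corrected simp values.
    """
    obs, simh, simp = (np.asarray(a, dtype=float) for a in (obs, simh, simp))
    obs_months, simh_months, simp_months = (
        np.asarray(m) for m in (obs_months, simh_months, simp_months)
    )
    out = np.full(simp.shape, np.nan)
    for month in np.unique(simp_months):
        # Only values from this calendar month enter the correction
        p = simp_months == month
        out[p] = correct(obs[obs_months == month], simh[simh_months == month], simp[p])
    return out
```

With only about 15 years in the example dataset, each month contributes just a handful of values per period, so per-month quantiles will be coarse.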