## Calculate the probabilities for the tercile, decile and percentile (0.02, then 0.05 to 0.95) categories for a 'realtime' forecast, with respect to the lead-time dependent monthly and seasonal hindcast climatologies, for each GCM from the 8 C3S providers (ECMWF, UKMO, METEO-FRANCE, DWD, CMCC, NCEP, JMA and ECCC, which contributes two models)

This notebook:   

1) reads the latest forecasts from the C3S MME [ECMWF, UKMO, METEO-FRANCE, DWD, CMCC, NCEP, JMA and ECCC]  
2) preprocesses them and converts them to monthly / seasonal rainfall accumulations   
3) reads the lead-time dependent tercile, decile and percentile (0.02, then 0.05 to 0.95) climatologies corresponding to the initial month of the forecast  
4) calculates the probability of each quantile category as the proportion of the GCM's ensemble members falling in that category  
5) saves these probabilities to disk for later use and mapping   
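Steps 3 and 4 above can be illustrated with a minimal NumPy sketch for a single grid point. The bounds and member values below are made up for illustration, and the actual `C3S.get_GCM_category_digitize` and `C3S.calculate_quantiles_probabilities` functions may be implemented differently. The percentile levels assume a regular 0.05 spacing between 0.05 and 0.95, which is consistent with the 21 categories used later in this notebook.

```python
import numpy as np

# Hypothetical climatological tercile bounds (mm/month) at one grid point
tercile_bounds = np.array([80.0, 120.0])

# Hypothetical ensemble of 10 forecast members (mm/month) at the same grid point
members = np.array([60.0, 75.0, 90.0, 95.0, 110.0, 125.0, 130.0, 140.0, 150.0, 70.0])

# np.digitize returns 0 below the first bound, 1 between the bounds and 2 above
# the last bound; adding 1 makes the categories run from 1 to ncategories
categories = np.digitize(members, tercile_bounds) + 1

# n quantile bounds delimit n + 1 categories
ncategories = len(tercile_bounds) + 1

# percentage of ensemble members falling in each category
counts = np.bincount(categories, minlength=ncategories + 1)[1:]
probabilities = 100 * counts / members.size

print(probabilities)

# Assumed percentile levels: 0.02, then 0.05 to 0.95 in steps of 0.05
percentile_levels = np.concatenate(([0.02], np.linspace(0.05, 0.95, 19)))
print(len(percentile_levels) + 1)  # number of percentile categories
```

The same logic applies to deciles (9 bounds, 10 categories) and to the percentile set (20 bounds, 21 categories). In the notebook it is applied at every grid point and lead time rather than to a single vector.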

In [24]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [25]:
### os 
import os 
import sys

### datetimes 
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
from calendar import month_name

### scipy 
import numpy as np 
import pandas as pd
import xarray as xr

In [26]:
import pathlib
HOME = pathlib.Path.home()
CWD = pathlib.Path.cwd() 

### import local functions for the processing of the C3S forecasts 

In [27]:
sys.path.append('../..')

In [28]:
from ICU_Water_Watch import C3S, domains

In [29]:
domain = domains.domains['C3S_download']

In [30]:
domain

[100, 240, -50, 30]

### provider (always CDS for now)

In [31]:
provider = 'CDS'

### variable name

In [32]:
varname = 'tprate'
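The log output of the processing loop reports that `tprate` is read in m s**-1, converted to mm/day and then to mm/month. The arithmetic behind such a conversion can be sketched as follows. The helper name is hypothetical, and the use of the calendar length of the target month is an assumption: `C3S.convert_rainfall` may count the days differently.

```python
import calendar

SECONDS_PER_DAY = 86400
MM_PER_M = 1000


def rate_to_monthly_total(rate_m_per_s, year, month):
    """Convert a mean precipitation rate (m/s) to a monthly accumulation (mm).

    Hypothetical helper for illustration, not part of the C3S module.
    """
    # m/s -> mm/day
    mm_per_day = rate_m_per_s * SECONDS_PER_DAY * MM_PER_M
    # mm/day -> mm/month, using the number of days in the calendar month
    ndays = calendar.monthrange(year, month)[1]
    return mm_per_day * ndays


# a rate of 5e-8 m/s is 4.32 mm/day, i.e. about 129.6 mm over the 30 days of November 2021
print(rate_to_monthly_total(5e-8, 2021, 11))
```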

### period (`monthly` or `seasonal`)

In [33]:
period = 'monthly'
# period = 'seasonal'
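When `period` is set to `seasonal`, the processing loop further down sums the monthly accumulations over a trailing 3-month window and drops the first two lead times, which have no complete window. A minimal NumPy sketch of that operation on made-up values, assuming the lead times (`step`) are labelled 1, 2, 3, ... in months:

```python
import numpy as np

# Hypothetical monthly accumulations (mm) for forecast steps 1 to 6
steps = np.arange(1, 7)
monthly = np.array([100.0, 120.0, 80.0, 60.0, 90.0, 110.0])

# trailing 3-month sums: each value covers the labelled step and the two before it
window = 3
seasonal = np.convolve(monthly, np.ones(window), mode="valid")

# the first complete window ends at step 3, so steps 1 and 2 are dropped
seasonal_steps = steps[window - 1:]

for step, total in zip(seasonal_steps, seasonal):
    print(step, total)
```

With this labelling, the seasonal value at step 3 is the accumulation over steps 1 to 3, the value at step 4 covers steps 2 to 4, and so on.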

### list of GCMs 

In [35]:
list_GCMs = ['ECMWF','UKMO','METEO_FRANCE','CMCC','DWD', 'NCEP', 'JMA', 'ECCC_CanCM4i', 'ECCC_GEM_NEMO']

### lag in months (set to a value > 0 to process older forecasts)

In [36]:
lag = 0

### get today's date 

In [37]:
date = datetime.utcnow()

### apply lag 

In [38]:
date = date - relativedelta(months=lag)

In [39]:
print(f"will process forecasts issued in {date:%B %Y}")

will process forecasts issued in November 2021
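`relativedelta(months=lag)` takes care of the year rollover when the lag crosses a year boundary. The underlying month arithmetic can be sketched with the standard library only. The helper below is hypothetical and only returns the year and month, which is all the rest of this notebook uses from `date`.

```python
from datetime import datetime


def shift_back_months(date, lag):
    """Return the (year, month) of `date` shifted back by `lag` months.

    Hypothetical helper for illustration, not part of the notebook.
    """
    # count the months elapsed since year 0, subtract the lag, then split again
    total = date.year * 12 + (date.month - 1) - lag
    return total // 12, total % 12 + 1


print(shift_back_months(datetime(2022, 1, 15), 0))  # (2022, 1)
print(shift_back_months(datetime(2022, 1, 15), 2))  # (2021, 11)
```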


### path to the GCMs **hindcast datasets** and **climatologies** 

In [40]:
gcm_path = pathlib.Path(f'/media/nicolasf/END19101/ICU/data/{provider}/operational/hindcasts')

### path to where the **realtime forecasts** have been downloaded 

In [41]:
forecasts_path = pathlib.Path(f'/media/nicolasf/END19101/ICU/data/{provider}/operational/forecasts')

In [42]:
forecasts_path

PosixPath('/media/nicolasf/END19101/ICU/data/CDS/operational/forecasts')

### output path for the probabilistic forecast files 

In [43]:
opath = CWD.parents[1].joinpath("outputs/C3S")

In [44]:
if not opath.exists(): 
    opath.mkdir(parents=True)

### get year and month 

In [45]:
year, month =  date.year, date.month

### loop over the GCMs here

In [46]:
for GCM in list_GCMs: 

    ### path to the CLIMATOLOGICAL terciles, deciles and percentiles, 
    
    ### calculated over all the ensembles and month (for each initial month, i.e. leadtime dependent climatology) 
    
    # clim_path = gcm_path.joinpath(f'operational/hindcasts/CLIMATOLOGY/{GCM}')
   
    clim_path = gcm_path.joinpath(f'CLIMATOLOGY/{GCM}')
    
    tercile_climatology = xr.open_dataset(clim_path.joinpath(f"{GCM}_{period}_tercile_climatology_{str(month).zfill(2)}.netcdf"), engine='netcdf4')
    
    decile_climatology = xr.open_dataset(clim_path.joinpath(f"{GCM}_{period}_decile_climatology_{str(month).zfill(2)}.netcdf"), engine='netcdf4')
    
    percentile_climatology = xr.open_dataset(clim_path.joinpath(f"{GCM}_{period}_percentile_climatology_{str(month).zfill(2)}.netcdf"), engine='netcdf4')
    
    # make sure we have the same domains
    
    tercile_climatology = domains.extract_domain(tercile_climatology, domain)
    
    decile_climatology = domains.extract_domain(decile_climatology, domain)
    
    percentile_climatology = domains.extract_domain(percentile_climatology, domain)
    
    print(f"{50*'-'}\nReading forecasts issued {year}-{str(month).zfill(2)} for GCM {GCM}")

    if 'ECCC' in GCM: 
    
        x = xr.open_dataset(forecasts_path.joinpath(f"{GCM}/{varname.upper()}/ensemble_seas_forecasts_{varname}_from_{year}_{str(month).zfill(2)}_{GCM.split('_')[0]}.netcdf"), engine='netcdf4')

    else: 
        
        x = xr.open_dataset(forecasts_path.joinpath(f"{GCM}/{varname.upper()}/ensemble_seas_forecasts_{varname}_from_{year}_{str(month).zfill(2)}_{GCM}.netcdf"), engine='netcdf4')

    ### preprocess (harmonize the variable names, sort the latitudes, etc )

    x = C3S.preprocess_GCM(x)

    ### convert the precipitation rate to mm/day, then to monthly accumulations (mm/month) 

    x = C3S.convert_rainfall(x, varin='tprate', varout='precip', leadvar='step', timevar='time', dropvar=True)

    ### just in case, remove potential missing fields (members)

    x = x.dropna(dim='member')
    
    ### make sure we have the same domain for the climatologies and the latest forecasts
    
    x = domains.extract_domain(x, domain)
    
    # calculates the seasonal values if period == seasonal

    if period == 'seasonal': 

        print("Calculating the seasonal (3 months) accumulations")

        x = x.rolling({'step':3}, min_periods=3, center=False).sum('step') 

        # get rid of the 2 first steps, which by definition contain missing values 

        x = x.sel(step=slice(3, None))

    ### checks that the initial month corresponds indeed to what we defined earlier 

    if (x.time.dt.year != year) or (x.time.dt.month != month): 

        print(f"issue with the initial date in the latest forecast, expected {year}-{month}, got {x.time.dt.year}-{x.time.dt.month}")

    ### Now calculates the tercile category for each member

    terciles_category = C3S.get_GCM_category_digitize(x, tercile_climatology.squeeze(), varname='precip', dim='quantile')

    ### and calculates the proportion of members in each category 
    
    terciles_category_percent = C3S.calculate_quantiles_probabilities(terciles_category, ncategories=3)
    
    ### same thing as above, for decile categories 

    deciles_category = C3S.get_GCM_category_digitize(x, decile_climatology.squeeze(), varname='precip', dim='quantile')

    deciles_category_percent = C3S.calculate_quantiles_probabilities(deciles_category, ncategories=10)

    ### same thing as above for percentile categories 
    
    percentiles_category = C3S.get_GCM_category_digitize(x, percentile_climatology.squeeze(), varname='precip', dim='quantile')

    percentiles_category_percent = C3S.calculate_quantiles_probabilities(percentiles_category, ncategories=21)
    
    ### creates a dummy 'GCM' dimension for the tercile, decile and percentile probabilities 

    terciles_category_percent = terciles_category_percent.expand_dims(dim={'GCM':[GCM]}, axis=0) 

    deciles_category_percent = deciles_category_percent.expand_dims(dim={'GCM':[GCM]}, axis=0) 

    percentiles_category_percent = percentiles_category_percent.expand_dims(dim={'GCM':[GCM]}, axis=0)

    # includes the quantile values (i.e. the 'bounds' for the quantile categories) in the dataset 

    terciles_category_percent.attrs['pct_values'] = tercile_climatology['quantile'].data
    deciles_category_percent.attrs['pct_values'] = decile_climatology['quantile'].data
    percentiles_category_percent.attrs['pct_values'] = percentile_climatology['quantile'].data

    ### saves to disk 
    
    print(f"saving the quantile probabilities in the folder {str(opath)}")

    terciles_category_percent.to_netcdf(opath.joinpath(f"{period}_terciles_probabilities_from_{date:%Y-%m}_{GCM}.netcdf")) 

    deciles_category_percent.to_netcdf(opath.joinpath(f"{period}_deciles_probabilities_from_{date:%Y-%m}_{GCM}.netcdf")) 

    percentiles_category_percent.to_netcdf(opath.joinpath(f"{period}_percentiles_probabilities_from_{date:%Y-%m}_{GCM}.netcdf")) 

    print(f"\n{GCM} {period} forecasts from {year}-{str(month).zfill(2)} processed and saved in {str(opath)}...\n")


--------------------------------------------------
Reading forecasts issued 2021-11 for GCM ECMWF

unit is m s**-1, converting to mm/day

now converting to mm/month, converted precipitation will be held in var = precip
saving the quantile probabilities in the folder /home/nicolasf/operational/ICU/development/hotspots/code/ICU_Water_Watch/outputs/C3S

ECMWF monthly forecasts from 2021-11 processed and saved in /home/nicolasf/operational/ICU/development/hotspots/code/ICU_Water_Watch/outputs/C3S...

--------------------------------------------------
Reading forecasts issued 2021-11 for GCM UKMO

unit is m s**-1, converting to mm/day

now converting to mm/month, converted precipitation will be held in var = precip
saving the quantile probabilities in the folder /home/nicolasf/operational/ICU/development/hotspots/code/ICU_Water_Watch/outputs/C3S

UKMO monthly forecasts from 2021-11 processed and saved in /home/nicolasf/operational/ICU/development/hotspots/code/ICU_Water_Watch/outputs/C3S...