### <center> This notebook contains the postprocessing of the raw downloaded S2S forecast members and the subsequent construction of the coastnorth/coastal Hovmöller dataset

The S2S data were downloaded for the Pacific basin (110°E to 295°E, 45°S to 45°N), including variables from the "surface" category and from the "ocean" category. The main difference is that the surface variables (winds and SST) are on a global 1.5°x1.5° grid, while the ocean variables (sea level, heat content, etc.) are on a 1°x1° grid. Since the ocean grid is the finer one, the postprocessing regrids the surface variables onto it, and bilinear interpolation will be the main interpolation method for all variables. In addition, 11 members are used (1 control and 10 perturbed forecasts), so the new dataset will have the following dimensions: distance along the Hovmöller (space), initialization time (2000 to 2022), lead time (46 steps, indexed 0 to 45) and ensemble member (0 to 10, where 0 is the control forecast). Let's get started!
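xESMF builds its bilinear weights with ESMF on the actual grids; on regular lat/lon grids the idea reduces to a weighted average of the four surrounding coarse cells. A minimal numpy sketch of that idea, assuming regular, ascending grids (`bilinear_regrid` is a hypothetical helper, not part of xESMF, and the field is synthetic):

```python
import numpy as np

def bilinear_regrid(src_lat, src_lon, field, dst_lat, dst_lon):
    """Bilinearly interpolate `field` (shape: len(src_lat) x len(src_lon)) from a
    regular, ascending lat/lon grid onto target points; outside points become NaN."""
    # Fractional position of each target coordinate along the source axes
    fi = np.interp(dst_lat, src_lat, np.arange(src_lat.size), left=np.nan, right=np.nan)
    fj = np.interp(dst_lon, src_lon, np.arange(src_lon.size), left=np.nan, right=np.nan)
    FI, FJ = np.meshgrid(fi, fj, indexing="ij")
    out = np.full(FI.shape, np.nan)
    inside = ~(np.isnan(FI) | np.isnan(FJ))
    # Lower-left source cell of each target point and its fractional offsets
    i0 = np.clip(np.floor(FI[inside]).astype(int), 0, src_lat.size - 2)
    j0 = np.clip(np.floor(FJ[inside]).astype(int), 0, src_lon.size - 2)
    wi, wj = FI[inside] - i0, FJ[inside] - j0
    # Weighted average of the four surrounding source values
    out[inside] = ((1 - wi) * (1 - wj) * field[i0, j0]
                   + wi * (1 - wj) * field[i0 + 1, j0]
                   + (1 - wi) * wj * field[i0, j0 + 1]
                   + wi * wj * field[i0 + 1, j0 + 1])
    return out

# Synthetic example: 1.5° source grid over the download region, 1° target grid
src_lat = np.arange(-45, 45.1, 1.5)
src_lon = np.arange(110, 295.1, 1.5)
dst_lat = np.arange(-44, 45, 1.0)
dst_lon = np.arange(111, 294, 1.0)
LAT, LON = np.meshgrid(src_lat, src_lon, indexing="ij")
field = 2 * LAT + 0.5 * LON  # a linear field, which bilinear interpolation reproduces exactly
regridded = bilinear_regrid(src_lat, src_lon, field, dst_lat, dst_lon)
print(regridded.shape)  # (89, 183)
```

In the notebook itself the regridding is done by `xe.Regridder(surf, o2d, 'bilinear')`, which also takes care of the grid metadata; the sketch only illustrates how the weights combine neighbouring values.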


In [1]:
# Imports
import xarray as xr
import xesmf as xe
import numpy as np
import pandas as pd
from glob import glob
import matplotlib.pyplot as plt

# Suppress dask's "large chunk" warning when slicing arrays
import dask
dask.config.set({"array.slicing.split_large_chunks": False})

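# Start a local dask cluster to use more cores (20 single-threaded workers)
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(n_workers=20, threads_per_worker=1)
# Synchronous client, since the notebook does not await dask results
client = Client(cluster)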



In [2]:
# Load masks
mask = xr.open_dataset('data/S2S_masks.nc')

In [117]:
!ls data/S2S/REFORECASTS/
print('\n')
!ls data/S2S/REFORECASTS/ensamble5 | head -n 5
print('\n')
!ls data/S2S/REFORECASTS/ensamble5/2020-01-06 | head -n 5

backup	   ensamble1   ensamble2  ensamble4  ensamble6	ensamble8
ensamble0  ensamble10  ensamble3  ensamble5  ensamble7	ensamble9


2020-01-06
2020-01-09
2020-01-13
2020-01-16
2020-01-20


2000-01-06_O2D.nc
2000-01-06_SURF.nc
2000-01-06_TSM.nc
2001-01-06_O2D.nc
2001-01-06_SURF.nc


As can be seen from the previous shell commands, the reforecast dataset is stored in nested folders. The first level is the ensemble member. The second level is the initialization date of each near-real-time (NRT) S2S forecast. The third and final level holds the netCDF files of the reforecasts associated with that NRT forecast. The reforecasts are forecasts initialized on the same calendar day as an NRT S2S forecast, but for each of the previous 20 years. It is a bit strange, but that is how it is. The dataset is saved on disk in this layout because, given how S2S is run (every Monday and Thursday), the same reforecast may appear more than once.
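The folder layout can be indexed to spot repeated reforecasts before loading anything. A minimal sketch in plain Python, assuming that a repeat means the same (member, reforecast date, file type) stored under more than one NRT date folder; the regular expression, the helpers `index_reforecasts` and `find_repeats`, and the example paths are all hypothetical:

```python
import re
from collections import defaultdict

# Hypothetical pattern: .../ensamble<member>/<NRT date>/<reforecast date>_<type>.nc
PATTERN = re.compile(
    r"ensamble(\d+)/(\d{4}-\d{2}-\d{2})/(\d{4}-\d{2}-\d{2})_(SURF|TSM|O2D)\.nc$"
)

def index_reforecasts(paths):
    """Map (member, reforecast date, file type) -> NRT date folders that contain it."""
    index = defaultdict(list)
    for path in paths:
        match = PATTERN.search(path)
        if match is None:
            continue  # ignore anything that does not follow the layout
        member, nrt_date, refc_date, kind = match.groups()
        index[(int(member), refc_date, kind)].append(nrt_date)
    return index

def find_repeats(index):
    """Keep only the reforecasts stored under more than one NRT date folder."""
    return {key: sorted(nrts) for key, nrts in index.items() if len(nrts) > 1}

# Example with hypothetical paths
paths = [
    "data/S2S/REFORECASTS/ensamble5/2020-01-06/2000-01-06_O2D.nc",
    "data/S2S/REFORECASTS/ensamble5/2020-01-06/2002-01-06_O2D.nc",
    "data/S2S/REFORECASTS/ensamble5/2022-01-06/2002-01-06_O2D.nc",
]
repeats = find_repeats(index_reforecasts(paths))
print(repeats)  # {(5, '2002-01-06', 'O2D'): ['2020-01-06', '2022-01-06']}
```

Here the NRT dates 2020-01-06 (a Monday) and 2022-01-06 (a Thursday) would both include a reforecast initialized on 2002-01-06. On the real data, `paths` could come from `glob('data/S2S/REFORECASTS/ensamble*/*/*.nc')`.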

In [113]:
def first_preprocess(ds):
    """
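    Preprocess a single S2S netCDF file: keep its last 46 time steps
    and define the leadtime and inittime coordinates
    """
    # Drop singleton dimensions and keep the last 46 time steps
    ds = ds.squeeze().isel(time=slice(-46, None))
    # Tag the file with a scalar inittime (the first retained time step)
    ds = ds.assign_coords({'inittime': ds.time[0].values})
    # Replace the absolute time axis with an integer lead time (0 to 45)
    ds = ds.rename({'time': 'leadtime'})
    ds.coords['leadtime'] = ('leadtime', np.arange(len(ds.leadtime)))
    return ds.compute()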

def preprocess_O2D(ds):
    """
    Clean up the downloaded ocean variables: drop the depth
    coordinates and rename dslm -> zos and param18.4.10 -> T300
    """
    ds = first_preprocess(ds)
    ds = ds.drop_vars(['depth', 'depth_2', 'depth_2_bnds'])
    ds = ds.rename({'dslm':'zos','param18.4.10':'T300'})
    return ds

def preprocess_SURF(ds):
    """
    Clean up the downloaded winds: drop the height coordinate
    and keep only the 10 m wind components (10u, 10v)
    """
    ds = first_preprocess(ds)
    ds = ds.drop_vars(['height'])
    ds = ds[['10u','10v']]
    return ds

def load_s2s(member, date):
    """
    Given the ensemble member and the NRT S2S date, load all the
    reforecast data (ocean and surface) as a single xarray Dataset
    """
    # Sort the file lists (glob order is arbitrary) so that files are
    # concatenated along inittime in chronological order
    path_surf = sorted(glob(f'data/S2S/REFORECASTS/ensamble{member}/{date}/*_SURF.nc'))
    path_sst  = sorted(glob(f'data/S2S/REFORECASTS/ensamble{member}/{date}/*_TSM.nc'))
    path_O2D  = sorted(glob(f'data/S2S/REFORECASTS/ensamble{member}/{date}/*_O2D.nc'))
    # Load surface and ocean datasets
    surf = xr.merge([xr.open_mfdataset(path_surf, preprocess=preprocess_SURF,parallel=True, concat_dim='inittime', combine='nested'),
                    xr.open_mfdataset(path_sst, preprocess=first_preprocess,parallel=True, concat_dim='inittime', combine='nested')])
    o2d  = xr.open_mfdataset(path_O2D, preprocess=preprocess_O2D,parallel=True, concat_dim='inittime', combine='nested')
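    # Regrid the 1.5°x1.5° surface fields onto the 1°x1° ocean grid,
    # forward-filling missing values along longitude first
    regridder = xe.Regridder(surf, o2d, 'bilinear')
    surf      = regridder(surf.ffill('lon'))
    # Merge everything and mask SST wherever the ocean model has no sea level
    s2s = xr.merge([surf, o2d])
    s2s['sst'] = s2s['sst'].where(~np.isnan(s2s.zos))
    # Restrict the domain to 111°E to 293°E
    s2s = s2s.sel(lon=slice(111, 293))
    del surf, o2d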
    return s2s
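`load_s2s` regrids `surf.ffill('lon')` and afterwards masks `sst` with the NaNs of `zos`. A plausible reading (an assumption, not stated in the notebook) is that the forward fill stops land NaNs on the 1.5° grid from spreading into coastal 1° ocean cells during interpolation, and the `zos` mask then restores the land–sea mask of the ocean grid. A 1-D numpy sketch with made-up numbers, where linear interpolation stands in for xESMF's bilinear regridding and `ffill_last_axis` and `linear_regrid_1d` are hypothetical helpers:

```python
import numpy as np

def ffill_last_axis(a):
    """Forward-fill NaNs along the last axis, similar to xarray's ffill('lon').
    NaNs before the first valid value are left as NaN."""
    idx = np.where(np.isnan(a), 0, np.arange(a.shape[-1]))
    idx = np.maximum.accumulate(idx, axis=-1)  # index of the last valid value so far
    return np.take_along_axis(a, idx, axis=-1)

def linear_regrid_1d(src_x, values, dst_x):
    """Plain linear interpolation along one axis (a 1-D stand-in for bilinear)."""
    j = np.clip(np.searchsorted(src_x, dst_x, side="right") - 1, 0, src_x.size - 2)
    w = (dst_x - src_x[j]) / (src_x[j + 1] - src_x[j])
    return (1 - w) * values[j] + w * values[j + 1]

# Synthetic row of a 1.5° SST field with two land (NaN) cells
coarse_lon = np.array([110.0, 111.5, 113.0, 114.5, 116.0])
coarse_sst = np.array([28.0, 28.3, np.nan, np.nan, 27.5])
# Synthetic 1° ocean grid: NaN in zos marks land
fine_lon = np.arange(111.0, 116.0)  # 111, 112, 113, 114, 115
fine_zos = np.array([0.1, 0.1, np.nan, np.nan, 0.2])

# Without filling, coastal ocean points next to land come out as NaN
naive = linear_regrid_1d(coarse_lon, coarse_sst, fine_lon)
# Fill first, interpolate, then mask with the ocean grid (like .where(~np.isnan(zos)))
filled = linear_regrid_1d(coarse_lon, ffill_last_axis(coarse_sst), fine_lon)
masked = np.where(np.isnan(fine_zos), np.nan, filled)
print(naive)
print(masked)
```

In this toy row the 1° ocean points at 112 and 115 are lost by the naive interpolation but recovered by the fill-then-mask approach, at the cost of using the nearest western value over land cells.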