# Access from `iridl.ldeo.columbia.edu`

The IRI Data Library (IRIDL) hosts initialized subseasonal forecast and hindcast simulations from two projects:
- `S2S project`:
    - http://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/
        - hindcast/reforecast: one variable, one model:
        - login required
- `SubX project`:
    - http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/
        - hindcast/reforecast: one variable, one model:
        - login not required
- Notes:
    - Output on IRIDL is not always on the 1.5 degree grid requested for the competition. Dimension names and coordinates also differ.
    - Beware that most models are not initialized only on Thursdays. Simulations started on other weekdays are allowed, but you may only use information available at `forecast_time`. For example, if the model is initialized on a Monday, you have to use the day 14+3=17 to day 27+3=30 forecast for week 3-4.
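
The lead-day shift in the last note can be sketched as a small calculation. This is only an illustration: `week34_lead_days` is a hypothetical helper, the dates are arbitrary examples (a Monday and the following Thursday), and it assumes that lead days count from the initialization date and that week 3-4 spans day 14 to day 27 after `forecast_time`.

```python
from datetime import date


def week34_lead_days(init_date, forecast_time):
    """Return the (first, last) lead day covering week 3-4 after forecast_time.

    Hypothetical helper: lead days count from init_date, and week 3-4 is
    assumed to span day 14 to day 27 after forecast_time.
    """
    offset = (forecast_time - init_date).days
    if offset < 0:
        raise ValueError("init_date must not be later than forecast_time")
    return 14 + offset, 27 + offset


# Example dates only: 2020-01-06 is a Monday, 2020-01-09 the following Thursday
monday_init = date(2020, 1, 6)
thursday_forecast_time = date(2020, 1, 9)
print(week34_lead_days(monday_init, thursday_forecast_time))  # (17, 30)
```

A model initialized on the Thursday itself has an offset of zero and keeps the unshifted day 14 to day 27 window.
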
---
This notebook also demonstrates OPeNDAP magic, i.e. commands appended to the OPeNDAP URL that preprocess data server-side (not implemented).

---

Here are instructions for configuring xarray to open protected Data Library datasets, after you have created a Data Library account and accepted the terms and conditions for the dataset.
1. Visit https://iridl.ldeo.columbia.edu/auth/genkey . Log in to the Data Library. Copy the key from the response.

2. Create a file with the following content, substituting the key from step 1 for `xxxx`:
`Set-Cookie: __dlauth_id=xxxx; domain=.iridl.ldeo.columbia.edu`

3. Put the following in `~/.daprc`, which is `/home/jovyan/.daprc` on renku, substituting the path to the above file for `/path/to/cookie/file`:
`HTTP.COOKIEJAR=/path/to/cookie/file`. You may need to copy `.daprc` to `/home/jovyan` on renku, because `/home/jovyan` is not tracked by `git`.
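
Steps 2 and 3 can also be done from plain Python instead of cell magics. The following is a minimal sketch: `write_iridl_auth` is a hypothetical helper, the file names `.cookie_iridl` and `.daprc` follow the cells in this notebook, and the demo writes into a temporary directory with the placeholder key `xxxx` so that nothing in your home directory is touched.

```python
import tempfile
from pathlib import Path


def write_iridl_auth(key, directory):
    """Write the cookie file and a .daprc pointing to it (hypothetical helper)."""
    directory = Path(directory).expanduser().resolve()
    cookie_file = directory / ".cookie_iridl"
    daprc_file = directory / ".daprc"
    # Step 2: cookie file holding the key copied from the genkey page
    cookie_file.write_text(
        f"Set-Cookie: __dlauth_id={key}; domain=.iridl.ldeo.columbia.edu\n"
    )
    # Step 3: .daprc with the absolute path to the cookie file
    daprc_file.write_text(f"HTTP.COOKIEJAR={cookie_file}\n")
    return cookie_file, daprc_file


# Demo in a throwaway directory with the placeholder key "xxxx"
with tempfile.TemporaryDirectory() as tmp:
    cookie_file, daprc_file = write_iridl_auth("xxxx", tmp)
    print(daprc_file.read_text())
```

For real use, pass your own key and the directory where `.daprc` must live, for example your home directory.
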

In [1]:
%%writefile /work/s2s-ai-challenge-template/.daprc
HTTP.COOKIEJAR=/work/s2s-ai-challenge-template/.cookie_iridl

Writing /work/s2s-ai-challenge-template/.daprc


In [2]:
!cp /work/s2s-ai-challenge-template/.daprc /home/jovyan

In [3]:
#%writefile /work/s2s-ai-challenge-template/.cookie_iridl
#Set-Cookie: __dlauth_id=xxxx; domain=.iridl.ldeo.columbia.edu

In [4]:
%%writefile /work/s2s-ai-challenge-template/.cookie_iridl
Set-Cookie: __dlauth_id=6d3f0d342e1bdd448b287481f6d7989673305eeba2fa65fabb2709e2d76101b21ae816ffe0560b1a25ed3c8d0bf8884eab7d4bc2; domain=.iridl.ldeo.columbia.edu

Writing /work/s2s-ai-challenge-template/.cookie_iridl


In [1]:
import xarray as xr
xr.set_options(display_style='text')



<xarray.core.options.set_options at 0x7efe3cb51fd0>

Please be aware that most models are not initialized only on Thursdays.
Simulations started on other weekdays are allowed,
but you may only use information available at `forecast_time`,
i.e. if the model is initialized on a Monday, you have to use the day 14+3=17 to day 27+3=30 forecast for week 3-4.

In [2]:
ds = xr.open_dataset('https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.ECMF/.reforecast/.control/.2m_above_ground/.2t/dods',
                     chunks='auto', decode_times=False)

/opt/conda/lib/python3.8/site-packages/gribapi/_bindings.cpython-38-x86_64-linux-gnu.so: undefined symbol: codes_bufr_key_is_header


In [3]:
# the calendar name '360' is not recognized when decoding times; use '360_day' instead
if ds.hdate.attrs['calendar'] == '360':
    ds.hdate.attrs['calendar'] = '360_day'

In [4]:
ds = xr.decode_cf(ds).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'LA': 'lead_time', '2t':'t2m'})
ds['t2m']

In [5]:
ds.nbytes/1e9,'GB'

(88.496735436, 'GB')

In [6]:
# hdate gets the previous years' reforecasts for that day of year

## Hindcast Availability

- BOM:	BoM POAMA Ensemble.
- CMA:	Beijing Climate Center (BCC) Climate Prediction System for S2S.
- CNRM:	CNRM Ensemble Prediction System.
- ECCC:	ECCC Ensemble Prediction System.
- ECMF:	ECMWF Ensemble.
- HMCR:	HMCR Ensemble.
- ISAC:	ISAC-CNR Ensemble.
- JMA:	JMA Ensemble System.
- KMA:	KMA Seasonal Prediction System.
- NCEP:	NCEP CFSv2 Ensemble.
- UKMO:	UKMO Ensemble Prediction System.

In [7]:
models = ['BOM','CNRM','ECCC','ECMF','HMCR','ISAC','JMA','KMA','NCEP','UKMO']
for model in models:
    try:
        ds = xr.open_dataset(f'https://iridl.ldeo.columbia.edu/SOURCES/.ECMWF/.S2S/.{model}/.reforecast/.perturbed/.2m_above_ground/.2t/dods',
                             chunks='auto', decode_times=False).rename({'S':'forecast_time', 'LA':'lead_time','M':'realization', 'X':'longitude', 'Y':'latitude'})
        # the calendar name '360' is not recognized when decoding times; use '360_day' instead
        for c in ['hdate','forecast_time']:
            if c in ds.coords:
                if ds[c].attrs['calendar'] == '360':
                    ds[c].attrs['calendar'] = '360_day'
        ds = xr.decode_cf(ds)
        onthefly = 'hdate' in ds.coords
        forecast_time_freq = xr.infer_freq(ds.forecast_time)
        print(model, 'on-the-fly' if onthefly else 'not on-the-fly',
              'forecast_time freq:'+forecast_time_freq if forecast_time_freq else 'forecast_time freq not found',
              '\n',ds.coords,'\n',ds.sizes,ds.nbytes/1e9,'GB','\n')
    except Exception as e:
        print(f'model={model} failed due to {type(e).__name__}: {e} \n')

BOM not on-the-fly forecast_time freq not found 
 Coordinates:
  * latitude       (latitude) float32 88.1 85.64 83.16 ... -83.16 -85.64 -88.1
  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 61 days 12...
  * forecast_time  (forecast_time) datetime64[ns] 1981-01-01 ... 2013-12-26
  * realization    (realization) float32 1.0 2.0 3.0 4.0 ... 29.0 30.0 31.0 32.0
  * longitude      (longitude) float32 0.0 2.507 5.014 ... 353.5 356.0 358.5 
 Frozen(SortedKeysDict({'latitude': 72, 'lead_time': 62, 'forecast_time': 2376, 'realization': 32, 'longitude': 144})) 195.498364944 GB 

CNRM not on-the-fly forecast_time freq not found 
 Coordinates:
  * latitude       (latitude) float32 90.0 88.5 87.0 85.5 ... -87.0 -88.5 -90.0
  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 60 days 12...
  * forecast_time  (forecast_time) datetime64[ns] 1993-01-01 ... 2014-12-15
  * realization    (realization) float32 1.0 2.0 3.0 4.0 ... 11.0 12.0 13.0 14.0
  * longitude      (lo

# SubX

Access to output from the SubX project does not require a login cookie.

In [8]:
ds = xr.open_dataset('http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.CESM/.30LCESM1/.hindcast/.tas/dods',
                     chunks='auto', decode_times=False)

In [9]:
# the calendar name '360' is not recognized when decoding times; use '360_day' instead
if ds.S.attrs['calendar'] == '360':
    ds.S.attrs['calendar'] = '360_day'

In [10]:
ds = xr.decode_cf(ds).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'L': 'lead_time', 'M':'realization', 'tas':'t2m'})
ds['t2m']

In [11]:
ds.nbytes/1e9,'GB'

(104.03446566, 'GB')
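
The reported size can be cross-checked by hand from the dimension sizes. The sketch below uses the sizes printed for `30LCESM1` in the availability listing of this notebook and assumes 4 bytes per `float32` value and 8 bytes per `timedelta64[ns]` or `datetime64[ns]` value, as suggested by the dtypes shown there.

```python
from math import prod

# Dimension sizes as printed for 30LCESM1 in the availability listing
sizes = {
    "lead_time": 45,
    "latitude": 181,
    "forecast_time": 887,
    "realization": 10,
    "longitude": 360,
}
# Assumed bytes per coordinate value: 8 for time-like, 4 for float32
coord_itemsize = {
    "lead_time": 8,
    "latitude": 4,
    "forecast_time": 8,
    "realization": 4,
    "longitude": 4,
}

data_bytes = prod(sizes.values()) * 4  # one float32 variable over all dimensions
coord_bytes = sum(sizes[name] * coord_itemsize[name] for name in sizes)
total_gb = (data_bytes + coord_bytes) / 1e9
print(total_gb, "GB")
```

Under these assumptions the total is 104,034,465,660 bytes, which agrees with the `ds.nbytes` value above. Almost all of it comes from the single data variable, so subsetting before loading is essential.
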

## Hindcast Availability

- center: model
- CESM:	30LCESM1 46LCESM1
- ECCC:	GEM GEPS6 GEPS5
- EMC:	GEFS GEFSv12
- ESRL:	FIMr1p1
- GMAO:	GEOS_V2p1
- NCEP:	CFSv2
- NRL:	NESM
- RSMAS:	CCSM4

In [12]:
centers = ['CESM',   'CESM',  'ECCC', 'ECCC', 'ECCC', 'EMC', 'EMC',   'ESRL',    'GMAO'    , 'NCEP', 'NRL','RSMAS']
models = ['30LCESM1','46LCESM1','GEM','GEPS6','GEPS5','GEFS','GEFSv12','FIMr1p1','GEOS_V2p1','CFSv2','NESM','CCSM4']
for center,model in zip(centers,models):
    try:
        ds = xr.open_dataset(f'https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{center}/.{model}/.hindcast/.tas/dods',
                             chunks='auto', decode_times=False).rename({'S':'forecast_time', 'L':'lead_time','M':'realization', 'X':'longitude', 'Y':'latitude'})
        # the calendar name '360' is not recognized when decoding times; use '360_day' instead
        for c in ['hdate','forecast_time']:
            if c in ds.coords:
                if ds[c].attrs['calendar'] == '360':
                    ds[c].attrs['calendar'] = '360_day'
        ds = xr.decode_cf(ds)
        onthefly = 'hdate' in ds.coords
        forecast_time_freq = xr.infer_freq(ds.forecast_time)
        print(model, 'on-the-fly' if onthefly else 'not on-the-fly',
              'forecast_time freq:'+forecast_time_freq if forecast_time_freq else 'forecast_time freq not found',
              '\n',ds.coords,'\n',ds.sizes,ds.nbytes/1e9,'GB','\n')
    except Exception as e:
        print(f'center={center} model={model} failed due to {type(e).__name__}: {e} \n')

30LCESM1 not on-the-fly forecast_time freq:W-WED 
 Coordinates:
  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...
  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
  * forecast_time  (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30
  * realization    (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
  * longitude      (longitude) float32 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0 
 Frozen(SortedKeysDict({'lead_time': 45, 'latitude': 181, 'forecast_time': 887, 'realization': 10, 'longitude': 360})) 104.03446566 GB 

46LCESM1 not on-the-fly forecast_time freq:W-WED 
 Coordinates:
  * lead_time      (lead_time) timedelta64[ns] 0 days 12:00:00 ... 44 days 12...
  * latitude       (latitude) float32 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
  * forecast_time  (forecast_time) datetime64[ns] 1999-01-06 ... 2015-12-30
  * realization    (realization) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
  * longitude      

# Opendap magic

OPeNDAP URLs can be extended with commands for server-side preprocessing.

- https://www.opendap.org/support
- http://iridl.ldeo.columbia.edu/dochelp/topics/DODS/fnlist.html
- https://iridl.ldeo.columbia.edu/dochelp/Documentation/funcindex.html?Set-Language=en
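
Long URLs of this kind are easier to read and modify when assembled from their individual commands. The following is a minimal sketch: `iridl_url` is a hypothetical helper, not part of any IRIDL client library, and it only percent-encodes each command and joins the pieces with `/`. The example reproduces the week 3-4 precipitation URL used at the end of this notebook.

```python
from urllib.parse import quote

BASE = "http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.pr"


def iridl_url(base, *commands, suffix="dods"):
    """Join server-side commands onto a dataset URL (hypothetical helper).

    Spaces are percent-encoded; parentheses and brackets are kept literal.
    """
    encoded = [quote(str(command), safe="()[]") for command in commands]
    return "/".join([base, *encoded, suffix])


url = iridl_url(
    BASE,
    "S", "(0000 6 Jan 2000)", "(0000 31 Dec 2019)", "RANGEEDGES",  # select start dates
    "L", "(14)", "(28)", "RANGEEDGES",  # select lead days 14 to 28
    "[L]sum",  # sum over lead time
)
print(url)
```

Changing `suffix` to `data.nc` gives the download form of the URL used with `curl` below.
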

In [13]:
from subprocess import call
fname = 'GEFS_pra_hc.nc'
# endless magic commands selecting week 3-4 and aggregating pr to tp with unit conversion
dset_url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.pr/S/(0000%206%20Jan%201999)/(0000%2028%20Dec%202015)/RANGEEDGES/S/(days%20since%201999-01-01)/streamgridunitconvert/Y/1/20/RANGE/X/-20/10/RANGE/L/(14)/(28)/RANGEEDGES/%5BL%5Daverage/S/(Jun-Aug)/VALUES/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.dc9915/.pr/Y/1/20/RANGE/X/-20/10/RANGE/L/(14)/(28)/RANGEEDGES/%5BL%5Daverage/S/to366daysample/%5BYR%5Daverage/S/sampleDOY/sub/c%3A/0.001/(m3%20kg-1)/%3Ac/mul/c%3A/1000/(mm%20m-1)/%3Ac/mul/c%3A/86400/(s%20day-1)/%3Ac/mul/c%3A/7.0//units//days/def/%3Ac/mul/data.nc'
# download data with curl
call(['curl','-k',dset_url, '-o',fname])

0

In [14]:
import pandas as pd
ds = xr.open_dataset(fname).rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'M':'realization', 'aprod':'tp'}).assign_coords(lead_time=pd.Timedelta('14 d'))
ds

In [15]:
# aggregate week 3-4 precipitation (pr) server-side and rename it to tp
ds = xr.open_dataset('http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.EMC/.GEFS/.hindcast/.pr/S/(0000%206%20Jan%202000)/(0000%2031%20Dec%202019)/RANGEEDGES/L/(14)/(28)/RANGEEDGES/[L]sum/dods',
                     chunks='auto').rename({'X':'longitude', 'Y':'latitude', 'S':'forecast_time', 'M':'realization', 'pr':'tp'}).assign_coords(lead_time=pd.Timedelta('14 d'))
print(ds.nbytes/1e9,'GB')
ds.tp

5.080404184 GB
