# Preprocessing of REMO output for double nesting

Preprocessing of REMO output data is required if you want a parent REMO run to drive a higher-resolution double-nesting run.

## Accessing REMO output

Preparing REMO output for preprocessing is quite easy since we already have one timestep per file (the `tfile`). We can speed things up by preprocessing the files in parallel.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import dask
import xarray as xr
from dask.distributed import Client
import tarfile
import glob
import os
import pyremo as pr
from pyremo.preproc import remap_remo, write_forcing_file

In [3]:
client = Client(dashboard_address="localhost:8787", threads_per_worker=1, n_workers=32)



First, we prepare the input data by extracting a tfile archive from a parent REMO run.

In [4]:
tar_file_path = (
    "/work/bg1439/data/remo-results/remo_results_067000/1979/e067000t197901.tar"
)
destination_path = "/scratch/g/g300046/067000/1979/"

In [5]:
with tarfile.open(tar_file_path, "r") as tar:
    tar.extractall(path=destination_path)
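As a small illustration of the extraction step, the following sketch builds a throwaway archive in a temporary directory, lists its members, and extracts them. The file names `a.nc` and `b.nc` are placeholders, not real tfile names.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small stand-in archive (placeholder file names).
    src = os.path.join(tmp, "src")
    os.makedirs(src)
    for name in ("a.nc", "b.nc"):
        with open(os.path.join(src, name), "w") as f:
            f.write("dummy")
    archive = os.path.join(tmp, "example.tar")
    with tarfile.open(archive, "w") as tar:
        for name in ("a.nc", "b.nc"):
            tar.add(os.path.join(src, name), arcname=name)

    # List the members first, then extract into the destination directory.
    dest = os.path.join(tmp, "out")
    os.makedirs(dest)
    with tarfile.open(archive, "r") as tar:
        members = tar.getnames()
        tar.extractall(path=dest)
    extracted = sorted(os.listdir(dest))

print(members)
print(extracted)
```

Listing the members before extracting is a cheap way to check that the archive contains what you expect.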

Now, we collect the extracted files into a list for preprocessing:

In [8]:
tfiles = sorted(glob.glob(os.path.join(destination_path, "*.nc")))
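The call to `sorted` matters because `glob` does not guarantee any particular order. The sketch below assumes, purely for illustration, that file names carry a zero-padded `YYYYMMDDHH` timestamp, in which case alphabetical order is also chronological order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical tfile names with a zero-padded timestamp, created out of order.
    for stamp in ("1979010112", "1979010100", "1979010106"):
        open(os.path.join(tmp, f"e067000t{stamp}.nc"), "w").close()

    # glob returns matches in arbitrary order; sorting restores time order.
    found = sorted(glob.glob(os.path.join(tmp, "*.nc")))
    names = [os.path.basename(f) for f in found]

print(names)
```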

Now we define some functions to handle the preprocessing. Note that we *delay* the `remap` function so that the individual calls can be computed later in parallel.

In [9]:
def open_remo_dataset(filename):
    return pr.parse_dates(xr.open_dataset(filename), use_cftime=True)


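@dask.delayed
def remap(filename, em, hm, vc, surflib, path, expid):
    # Open one tfile and remap it from the parent domain (em) to the target domain (hm).
    ds = open_remo_dataset(filename)
    ads = remap_remo(ds, em, hm, vc, surflib, initial=True, lice=True)
    # Fill the year/month placeholders in the output path from the timestep's date.
    path = path.format(date=ds.time.item())
    os.makedirs(path, exist_ok=True)
    return write_forcing_file(ads, path=path, expid=expid)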
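To make the idea of delaying a function concrete, here is a minimal sketch with a toy function standing in for the per-file remapping. The name `square` is hypothetical, and the threaded scheduler is used only so the example runs without a cluster.

```python
import dask


@dask.delayed
def square(x):
    # Stand-in for an expensive per-file step such as remapping one tfile.
    return x * x


# Calling a delayed function only records a task; nothing is executed yet.
tasks = [square(i) for i in range(4)]

# dask.compute evaluates all tasks together and returns a tuple of results.
results = dask.compute(*tasks, scheduler="threads")
print(results)  # (0, 1, 4, 9)
```

Because each task is independent, the scheduler is free to run them concurrently, which is what makes the one-file-per-timestep layout convenient.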

Now, let's define some details about the input and output grids and file paths.

In [10]:
expid = "000000"
path = "/scratch/g/g300046/000000/{date:%Y}/{date:%m}"

vc = pr.vc.tables["vc_49lev_nh_pt2000"]
surflib = pr.update_meta_info(
    xr.open_dataset("/scratch/g/g300046/lib_WRC-0275_frac.nc").squeeze(drop=True).load()
)
em = pr.domain_info("AFR-22")
hm = pr.domain_info("WRC-3")

afiles = [
    remap(
        tfile,
        em,
        hm,
        vc,
        surflib,
        path,
        expid,
    )
    for tfile in tfiles
]
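The `path` template relies on Python's format-spec mechanism, which forwards `%Y` and `%m` to the date object. The sketch below uses a standard `datetime` and a made-up directory. The notebook passes a cftime date instead, which is assumed to honour the same format codes.

```python
from datetime import datetime

# Hypothetical output template with year and month placeholders.
template = "/scratch/example/{date:%Y}/{date:%m}"

# The format spec after the colon is handed to the date's own formatting.
out = template.format(date=datetime(1979, 1, 1, 6))
print(out)  # /scratch/example/1979/01
```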

In [11]:
%time afiles_ = dask.compute(*afiles)

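Running this cell produced a Dask warning about the size of the task graph (the first line of the warning is missing from this output):

This may cause some slowdown.
Consider loading the data with Dask directly
 or using futures or delayed objects to embed the data into the graph without repetition.
See also https://docs.dask.org/en/stable/best-practices.html#load-data-with-dask for more information.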


The workers also print grid diagnostics. Because they run in parallel, the lines are interleaved and repeated, and two values appear without a label. The distinct values are:

 dxemhm  0.81249999999998768
 dyemhm  0.31250000000000644
 dlamem, dphiem  0.22000000000000000       0.22000000000000000
 (unlabelled)  0.17874999999999730
 (unlabelled)  6.8750000000001421E-002
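
The Dask warning above suggests embedding large inputs into the graph as delayed objects so they are not repeated for every task. A minimal sketch of that pattern follows. The dictionary `lookup` is a hypothetical stand-in for a large shared input, and the synchronous scheduler is used only to keep the example self-contained.

```python
import dask

# Stand-in for a large shared input such as a surface library dataset.
lookup = {i: i * 10 for i in range(1000)}

# Wrap the shared object once; every task then depends on this single node.
shared = dask.delayed(lookup)


@dask.delayed
def process(key, table):
    # The task receives the plain dictionary, not the Delayed wrapper.
    return table[key]


tasks = [process(k, shared) for k in (1, 2, 3)]
results = dask.compute(*tasks, scheduler="synchronous")
print(results)  # (10, 20, 30)
```

In this notebook the same idea would presumably mean wrapping `surflib` with `dask.delayed` once before building `afiles`. That variant has not been tested here.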