This example demonstrates how to open NEMO files and make them compliant with xgcm.
NEMO produces two types of files: 1) the domain files, which contain the grid information (domain_cfg and mesh_mask files),
and 2) the nemo files, which contain the output variables (the file names usually look like `XXX_01234_01234_grid_X.nc`).
Most of the information needed to create the `xgcm.Grid` is located in the domain files, so both the domain and the nemo files must be opened.

These files can either be opened in two different Datasets, or combined into a single one. Both options are demonstrated in this example.


Start by importing the functions

In [None]:
from pathlib import Path
from os import listdir

from xnemogcm import open_domain_cfg, open_nemo, process_nemo, open_namelist, open_nemo_and_domain_cfg
from xnemogcm import __version__ as xnemogcm_version

In [None]:
xnemogcm_version

## First open the domain and nemo files into 2 datasets
### domain

In [None]:
help(open_domain_cfg)

---

You can provide the file names / folder using 3 similar methods:
1. Give the path to the folder containing the files and xnemogcm opens the domain_cfg_out and/or mesh_mask files
2. Give the path to the data folder + the name of the files
3. Give the names of the files including their full path (e.g. ['/path/to/file1', '/path/to/file2'])

These 3 methods are equivalent. However, if your domain files don't have the standard names, you need to provide them by hand.
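
The three methods listed above are different ways of naming the same set of files. The sketch below illustrates this with plain `pathlib`. The helper `resolve_files`, the file name patterns, and the placeholder file are assumptions made for this example. They are not part of xnemogcm, whose actual lookup logic may differ.

```python
from pathlib import Path
import tempfile


def resolve_files(datadir=None, files=None):
    """Hypothetical helper (not xnemogcm's code): return a sorted list of paths."""
    # Standard domain file name patterns, assumed here for illustration
    patterns = ("*domain_cfg_out*.nc", "*mesh_mask*.nc")
    if files is None:
        # Method 1: only a folder is given, search it for the standard names
        found = []
        for pattern in patterns:
            found.extend(Path(datadir).glob(pattern))
        return sorted(found)
    if datadir is not None:
        # Method 2: folder plus bare file names
        return sorted(Path(datadir) / name for name in files)
    # Method 3: the names already contain the full path
    return sorted(Path(name) for name in files)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "mesh_mask.nc").touch()  # empty placeholder, not a real NetCDF file
    method_1 = resolve_files(datadir=tmp)
    method_2 = resolve_files(datadir=tmp, files=["mesh_mask.nc"])
    method_3 = resolve_files(files=tmp.glob("*mesh_mask*.nc"))
    all_equal = method_1 == method_2 == method_3
    print(all_equal)
```

In this toy setting all three calls resolve to the same single path, which is the sense in which the methods are equivalent.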

We use one of the test folders:

In [None]:
datadir = Path('../../xnemogcm/test/data/4.2.0/open_and_merge/')

In [None]:
print(listdir(datadir))

In [None]:
domcfg = open_domain_cfg(datadir=datadir)
# or
domcfg = open_domain_cfg(datadir=datadir, files=['mesh_mask.nc'])
# or
domcfg = open_domain_cfg(files=datadir.glob('*mesh_mask*.nc'))
domcfg

### Nemo

There are 2 options here: 1) open the netcdf files and preprocess them automatically with `open_nemo`, or 2) open the files by hand (or retrieve them from anywhere, e.g. a remote zarr store) and process them using `process_nemo`.

Note: `open_nemo` internally uses `process_nemo`.

#### open_nemo

In [None]:
help(open_nemo)

---
We can provide the folder / file names following the same convention as for the `open_domain_cfg` function. We also **need** to provide the `domcfg` dataset so xnemogcm knows how to set the variables on the proper grid position. Extra kwargs can also be passed to the underlying `xarray.open_mfdataset` call.

In [None]:
nemo = open_nemo(domcfg=domcfg, datadir=datadir)
# or
nemo = open_nemo(domcfg=domcfg, files=datadir.glob('*grid*.nc'))
# or, using the attributes of the datasets rather than the file names
datadir2 = Path('../../xnemogcm/test/data/4.2.0/nemo_no_grid_in_filename/')
nemo = open_nemo(
    domcfg=domcfg, files=[
        datadir2 / 'T.nc',
        datadir2 / 'U.nc',
        datadir2 / 'V.nc',
        datadir2 / 'W.nc'
    ]
)
nemo
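
When the file names follow the usual `..._grid_X.nc` pattern, the grid point can be read from the name itself, whereas names such as `T.nc` carry no such hint. A minimal sketch of that idea follows. `point_from_filename` is a hypothetical helper written for this example, not the function xnemogcm uses internally.

```python
import re
from pathlib import Path


def point_from_filename(path):
    """Hypothetical helper: return the letter after 'grid_' in a file name, or None."""
    match = re.search(r"grid_([A-Z])", Path(path).name)
    return match.group(1) if match else None


# File name modelled on the XXX_01234_01234_grid_X.nc pattern
with_point = point_from_filename("XXX_01234_01234_grid_T.nc")
# A name such as T.nc carries no 'grid_X' marker, so nothing can be inferred
without_point = point_from_filename("T.nc")
print(with_point, without_point)
```

When nothing can be inferred from the name, the grid point has to come from somewhere else, such as the dataset attributes or an explicit argument.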

#### process_nemo

In [None]:
help(process_nemo)

In [None]:
import xarray as xr
datadir2 = Path('../../xnemogcm/test/data/4.2.0/nemo_no_grid_in_filename/')
nemo = process_nemo(
    positions=[
        (xr.open_dataset(datadir2 / 'T.nc'), 'T'),
        (xr.open_dataset(datadir2 / 'U.nc'), 'U'),
        (xr.open_dataset(datadir2 / 'V.nc'), 'V'),
        (xr.open_dataset(datadir2 / 'W.nc'), 'W')
    ],
    domcfg=domcfg
)
# or, if the datasets contain the attribute 'description'
nemo = process_nemo(
    positions=[
        (xr.open_dataset(datadir2 / 'T.nc'), None),
        (xr.open_dataset(datadir2 / 'U.nc'), None),
        (xr.open_dataset(datadir2 / 'V.nc'), None),
        (xr.open_dataset(datadir2 / 'W.nc'), None)
    ],
    domcfg=domcfg
)

## Open both at once

It is possible to open the domain and nemo output at once into a single dataset. Internally, 2 datasets are created and then merged, so all the options available for the `open_nemo` and `open_domain_cfg` functions can still be used.

In [None]:
help(open_nemo_and_domain_cfg)

---
Again, several equivalent sets of arguments can be used to open the data:

In [None]:
# the simplest for simple cases, provide the path
ds = open_nemo_and_domain_cfg(nemo_files=datadir, domcfg_files=datadir)
# or provide the files
ds = open_nemo_and_domain_cfg(nemo_files=datadir.glob('*grid*.nc'), domcfg_files=datadir.glob('*mesh*.nc'))
# or use the nemo_kwargs and domcfg_kwargs dictionaries
ds = open_nemo_and_domain_cfg(nemo_kwargs=dict(datadir=datadir), domcfg_kwargs={'files':datadir.glob('*mesh*.nc')})
ds

### Remark

All files are opened lazily using dask, which makes them quick to open: the data are only read when you actually load them.

## Namelist

It can be convenient to open the namelist used for the run (e.g. to compare different runs with different parameters). This is possible using the `f90nml` package, an optional dependency that needs to be installed.

In [None]:
help(open_namelist)

---
Here you provide the path of the folder containing the reference and configuration namelists, or the filenames (as for nemo and domcfg). You can choose to load both, or only one of them. When both are loaded, values from the configuration namelist overwrite those from the reference (default) one.

For this we need to use another folder of the test data (with simplified namelists for the example):

In [None]:
datadir = Path('../../xnemogcm/test/data/namelist/')

In [None]:
print(listdir(datadir))

In [None]:
name = open_namelist(datadir)
name

In [None]:
name.nn_it000
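
The rule that the configuration namelist takes precedence over the reference one can be pictured with plain dictionaries. This is only a sketch of the override semantics, not how `f90nml` or xnemogcm implement it. `merge_namelists` is a hypothetical helper and the parameter values are made up.

```python
def merge_namelists(reference, configuration):
    """Hypothetical helper: overlay configuration groups on the reference ones."""
    # Copy each group so the reference namelist is left untouched
    merged = {group: dict(values) for group, values in reference.items()}
    for group, values in configuration.items():
        # Configuration values replace reference values with the same name
        merged.setdefault(group, {}).update(values)
    return merged


# Made-up values for illustration only
reference = {"namrun": {"nn_it000": 1, "nn_itend": 5840}}
configuration = {"namrun": {"nn_itend": 100}}

merged = merge_namelists(reference, configuration)
print(merged)
```

Parameters that appear only in the reference namelist keep their default value, while those redefined in the configuration namelist take the new one.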