## Cleaning up Atlas data - ETHZ ClimWIP
**Function**      : Preprocess netCDF files and restructure the dataset<br>
**Author**        : Team BETA<br>
**First Built**   : 2021.08.11<br>
**Last Update**   : 2021.08.13<br>
**Library**       : os, numpy, netCDF4, xarray, dask, hvplot<br>
**Description**   : This notebook cleans up the Atlas data, which is provided in netCDF format, and aggregates it into a single file per project.<br>
**Return Values** : .nc files<br>
**Note**          : All data are saved in netCDF4 format. Data from different models may differ in resolution and coordinates.<br>

In [1]:
import os
import xarray as xr

### Path
Specify the input data directory and the output directory. The output directory is created if it does not exist.<br>

In [2]:
# please specify data path
datapath = '/mnt/d/NLeSC/BETA/EUCP/Atlas'
# please specify output path
output_path = '/mnt/d/NLeSC/BETA/EUCP/Atlas/preprocess'
os.makedirs(output_path, exist_ok=True)

### Extract data
Inspect one of the ETHZ ClimWIP netCDF files. Then load the percentile files for each variable and season and combine them into one dataset.

In [3]:
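# ETHZ ClimWIP
# first look at a single file to inspect its variables and coordinates
dataset = xr.open_dataset(os.path.join(datapath, 'ETHZ_ClimWIP',
                                       'eur_pr_41-60_jja_cmip6_90perc.nc'))
# the corresponding '_rel' file can be inspected the same way:
# 'eur_pr_41-60_jja_cmip6_90perc_rel.nc'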
dataset
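The glob used later in `load_data` relies on a fixed file-naming convention. The sketch below makes that convention explicit, so you can check which files match before opening any data. It is a minimal illustration under stated assumptions:

- The regular expression is inferred only from the two file names shown above, so other Atlas files may not follow it.
- The helper names `parse_atlas_filename` and `matching_files` are hypothetical.

```python
import glob
import os
import re

# ASSUMPTION: naming convention inferred from the file names in this notebook:
# eur_{variable}_41-60_{season}_{project}_{NN}perc.nc, plus *_rel.nc variants
PATTERN = re.compile(
    r"^eur_(?P<variable>[a-z]+)_(?P<period>\d{2}-\d{2})_(?P<season>[a-z]{3})_"
    r"(?P<project>[A-Za-z0-9]+)_(?P<percentile>\d+)perc(?P<rel>_rel)?\.nc$"
)


def parse_atlas_filename(path):
    """Return the fields encoded in an Atlas file name (hypothetical helper)."""
    # only the base name is parsed, so underscores in directories are ignored
    match = PATTERN.match(os.path.basename(path))
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    fields = match.groupdict()
    fields["percentile"] = int(fields["percentile"])
    fields["relative"] = fields.pop("rel") is not None
    return fields


def matching_files(root, project, season, variable):
    """List the files the '*perc.nc' glob would pick up (hypothetical helper).

    Files ending in '_rel.nc' do not end in 'perc.nc', so they are excluded.
    """
    pattern = os.path.join(root, f"eur_{variable}_41-60_{season}_{project}_*perc.nc")
    return sorted(glob.glob(pattern))


print(parse_atlas_filename("eur_pr_41-60_jja_cmip6_90perc.nc"))
# {'variable': 'pr', 'period': '41-60', 'season': 'jja', 'project': 'cmip6', 'percentile': 90, 'relative': False}
```

Parsing the whole base name with one pattern fails loudly on unexpected files. Splitting the full path on `_` can silently pick up fragments of directory names instead.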

In [4]:
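# Preprocessor for open_mfdataset: parse the percentile from the file name
# (e.g. 'eur_pr_41-60_jja_cmip6_90perc.nc' -> 90) and add it as a new dimension
def add_percentile(ds):
    # use only the base name so underscores in the directory path do not matter
    filename = os.path.basename(ds.encoding["source"])
    stem, _ = os.path.splitext(filename)
    percentile = int(stem.split('_')[-1].replace('perc', ''))

    return (ds
            .assign_coords(percentile=percentile)
            .expand_dims('percentile')
           )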

# load all percentile files for one project, season and variable, and stack the
# weighted (constrained=1) and unweighted (constrained=0) means along 'constrained'
def load_data(project, season, variable):
    # open all files matched by the glob as one dataset
    ds = xr.open_mfdataset(os.path.join(datapath, 'ETHZ_ClimWIP',
                                        f'eur_{variable}_41-60_{season}_{project}_*perc.nc'),
                           preprocess=add_percentile)
    weighted = ds[f'{variable}_mean_weighted'].rename(variable).assign_coords(constrained=1).expand_dims('constrained')
    unweighted = ds[f'{variable}_mean'].rename(variable).assign_coords(constrained=0).expand_dims('constrained')
    return xr.concat([weighted, unweighted], dim='constrained')
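`load_data` returns one DataArray per variable with two extra dimensions, `percentile` and `constrained`. The sketch below reproduces that restructuring on small synthetic arrays, so no Atlas files are needed. It rests on a few assumptions:

- The `lat`/`lon` dimension names and the array contents are made up.
- An explicit `xr.concat` along `percentile` stands in for what `open_mfdataset` roughly does when it combines the preprocessed files by their coordinates.

```python
import numpy as np
import xarray as xr


# ASSUMPTION: synthetic stand-in for one Atlas file; the dimension names
# "lat"/"lon" and the values are illustrative only
def fake_file(percentile, nlat=3, nlon=4):
    data = np.full((nlat, nlon), float(percentile))
    coords = {"lat": np.linspace(35, 70, nlat), "lon": np.linspace(-10, 30, nlon)}
    return xr.Dataset(
        {
            "pr_mean": (("lat", "lon"), data),
            "pr_mean_weighted": (("lat", "lon"), data + 0.5),
        },
        coords=coords,
    )


# same idea as add_percentile: a scalar coordinate promoted to a length-1 dimension
def add_percentile_dim(ds, percentile):
    return ds.assign_coords(percentile=percentile).expand_dims("percentile")


# stand-in for open_mfdataset: combine the per-file datasets along "percentile"
ds = xr.concat([add_percentile_dim(fake_file(p), p) for p in (10, 50, 90)],
               dim="percentile")

# same stacking as load_data: weighted -> constrained=1, unweighted -> constrained=0
weighted = ds["pr_mean_weighted"].rename("pr").assign_coords(constrained=1).expand_dims("constrained")
unweighted = ds["pr_mean"].rename("pr").assign_coords(constrained=0).expand_dims("constrained")
pr = xr.concat([weighted, unweighted], dim="constrained")

print(pr.dims)  # ('constrained', 'percentile', 'lat', 'lon')
```

Putting the weighted and unweighted means on a shared `constrained` axis lets both variables use one name (`pr`, `tas`). They can then be merged later without name clashes.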

In [5]:
# call the function to preprocess the files and export one netCDF file per project
for project in ['cmip6']:  # the 'CORDEX' dataset is not ready yet
    seasons = []
    for season in ['djf', 'jja']:
        tas = load_data(project, season, 'tas')
        pr = load_data(project, season, 'pr')
        # combine both variables and label the result with its season (e.g. 'JJA')
        ds = xr.merge([tas, pr]).assign_coords(season=season.upper())
        seasons.append(ds)
    # stack the seasons along a new 'season' dimension and write to disk
    ds = xr.concat(seasons, dim='season')
    ds.to_netcdf(os.path.join(output_path, f'cleaned_ETHZ_ClimWIP_{project.upper()}.nc'))
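The export loop merges `tas` and `pr` into one Dataset per season and then stacks the seasons. The sketch below uses synthetic data to show the resulting layout and how to select a single field from it. The `fake_load` helper and its values are hypothetical stand-ins for `load_data`.

```python
import numpy as np
import xarray as xr


# HYPOTHETICAL stand-in for load_data(): a DataArray named `variable` with dims
# (constrained, percentile, lat, lon) filled with a constant value
def fake_load(variable, value):
    return xr.DataArray(
        np.full((2, 3, 2, 2), value),
        dims=("constrained", "percentile", "lat", "lon"),
        coords={"constrained": [1, 0], "percentile": [10, 50, 90]},
        name=variable,
    )


seasons = []
for i, season in enumerate(["djf", "jja"]):
    tas = fake_load("tas", 20.0 + i)
    pr = fake_load("pr", 1.0 + i)
    # merge the two named DataArrays into one Dataset, tagged with a scalar season
    seasons.append(xr.merge([tas, pr]).assign_coords(season=season.upper()))

# concatenating along the scalar "season" coordinate turns it into a new leading dimension
combined = xr.concat(seasons, dim="season")

print(combined["tas"].dims)  # ('season', 'constrained', 'percentile', 'lat', 'lon')
# one 2D field: JJA, weighted (constrained=1), 90th percentile
print(float(combined["pr"].sel(season="JJA", constrained=1, percentile=90).mean()))  # 2.0
```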

### Check output
Reopen the saved file and inspect its dimensions, coordinates and variables.<br>

In [5]:
ds = xr.open_dataset(os.path.join(output_path,'cleaned_ETHZ_ClimWIP_CMIP6.nc'))
ds
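A few programmatic checks can catch a broken export earlier than reading the printed summary by eye. The function below is a hypothetical sketch with two sources for its expectations:

- The expected names (`tas`, `pr`, `season`, `constrained`, `percentile`) and labels come from the export loop above.
- Flagging a variable that is entirely missing values is an assumption about what counts as a broken field.

Percentile values are not checked, because they depend on which `*perc.nc` files exist.

```python
import numpy as np
import xarray as xr


def check_cleaned(ds):
    """Return a list of problems found in a cleaned dataset (hypothetical helper)."""
    problems = []
    for var in ("tas", "pr"):
        if var not in ds.data_vars:
            problems.append(f"missing variable {var!r}")
    for dim in ("season", "constrained", "percentile"):
        if dim not in ds.dims:
            problems.append(f"missing dimension {dim!r}")
    # labels written by the export loop: seasons DJF/JJA, constrained 0/1
    if "season" in ds.coords and sorted(ds["season"].values.tolist()) != ["DJF", "JJA"]:
        problems.append("unexpected season labels")
    if "constrained" in ds.coords and sorted(ds["constrained"].values.tolist()) != [0, 1]:
        problems.append("unexpected constrained labels")
    # ASSUMPTION: a variable that is entirely NaN indicates a failed export
    for var in ("tas", "pr"):
        if var in ds.data_vars and bool(ds[var].isnull().all()):
            problems.append(f"{var!r} contains only missing values")
    return problems
```

In the notebook, an empty list from `check_cleaned(ds)` means all checks passed.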

### Preview with hvplot
Preview fields from saved files using hvplot.

In [6]:
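# interactive plot for preview; extra dimensions such as season, percentile
# and constrained appear as widgets
# coastline=True requires the geoviews package
import hvplot.xarray
app = ds.hvplot.quadmesh(cmap='coolwarm', coastline=True)
app
# optionally export the interactive plot to a standalone HTML file
#import panel as pn
#pn.Row(app).save('atlas.html', embed=True)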