## Cleaning up Atlas data - University of Reading CALL
**Function**      : Preprocess netCDF files and restructure the dataset<br>
**Author**        : Team BETA<br>
**First Built**   : 2021.10.25<br>
**Last Update**   : 2021.10.25<br>
**Library**       : os, pathlib, numpy, netCDF4, xarray<br>
**Description**   : This notebook cleans up the Atlas data, which is provided in netCDF format, and aggregates it into a single file.<br>
**Return Values** : .nc files<br>
**Note**          : All output is saved in netCDF4 format. Data from different models may differ in resolution and coordinates.<br>

In [1]:
import os
from pathlib import Path
import xarray as xr

### Path
Specify the path to the dataset and the place to save the outputs. <br>

In [2]:
# please specify data path
datapath = '/mnt/d/NLeSC/BETA/EUCP/Atlas'
# please specify output path
output_path = '/mnt/d/NLeSC/BETA/EUCP/Atlas/preprocess'
os.makedirs(output_path, exist_ok=True)

### Extract data
Extract weather/climate data from given netCDF files.

In [3]:
# UoR CALL
# first check of data
dataset = xr.open_dataset(Path(datapath,'UoR_CALL',
                          'pr_djf_10perc_UNCONST.nc'))
dataset
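The preprocessing relies on metadata that is encoded only in the file names: variable, season, percentile, and whether the projection is constrained (CONST) or unconstrained (UNCONST). As a minimal sketch, assuming the naming pattern `{variable}_{season}_{NN}perc_{CONST|UNCONST}.nc` inferred from the files used in this notebook, a hypothetical helper `parse_atlas_filename` could extract all four fields with one regular expression:

```python
import re
from pathlib import Path

# Assumed naming convention, inferred from the file names in this notebook:
#   {variable}_{season}_{NN}perc_{CONST|UNCONST}.nc
FILENAME_RE = re.compile(
    r"^(?P<variable>[a-z]+)_(?P<season>[a-z]{3})_(?P<percentile>\d+)perc_"
    r"(?P<flag>CONST|UNCONST)\.nc$"
)


def parse_atlas_filename(path):
    """Split an Atlas file name into its metadata fields (hypothetical helper)."""
    # Use only the file name; the directory part may itself contain '_'
    name = Path(path).name
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return {
        "variable": match["variable"],
        "season": match["season"],
        "percentile": int(match["percentile"]),
        # 1 = constrained (CONST), 0 = unconstrained (UNCONST)
        "constrained": int(match["flag"] == "CONST"),
    }


print(parse_atlas_filename("/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/pr_djf_10perc_UNCONST.nc"))
```

Matching the whole file name, rather than slicing a fixed number of characters, also accepts percentiles with one or three digits and rejects files that do not follow the convention.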

In [4]:
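# Combining multiple dimensions with a preprocessor:
# read the percentile from the file name (e.g. 'tas_djf_10perc_CONST.nc' -> 10)
# and attach it to the dataset as a new 'percentile' dimension
def add_percentile(ds):
    filename = ds.encoding["source"]
    print(filename)
    percentile = int(Path(filename).stem.split('_')[-2].replace('perc', ''))

    return ds.assign_coords(percentile=percentile).expand_dims('percentile')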

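# data loader and batch processing
def load_data(project, season, variable):
    """Load all percentile files of one variable and season.

    Returns the 'VARchange' field, renamed to `variable`, with the extra
    dimensions 'constrained' (1 = CONST, 0 = UNCONST) and 'percentile'.
    Note: `project` is not used yet; files are always read from 'UoR_CALL'.
    """
    # open multiple files with xarray; add_percentile stacks them along 'percentile'
    ds_cons = xr.open_mfdataset(str(Path(datapath, 'UoR_CALL', f'{variable}_{season}_*perc_CONST.nc')),
                                preprocess=add_percentile)
    ds_uncons = xr.open_mfdataset(str(Path(datapath, 'UoR_CALL', f'{variable}_{season}_*perc_UNCONST.nc')),
                                  preprocess=add_percentile)

    # tag constrained and unconstrained projections, then stack them along 'constrained'
    weighted = ds_cons['VARchange'].rename(variable).assign_coords(constrained=1).expand_dims('constrained')
    unweighted = ds_uncons['VARchange'].rename(variable).assign_coords(constrained=0).expand_dims('constrained')
    return xr.concat([weighted, unweighted], dim='constrained')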
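`xr.open_mfdataset` expands a string path as a glob pattern, so the `perc_` in front of `CONST`/`UNCONST` is what keeps the two groups apart. The sketch below checks this with the standard-library `fnmatch.fnmatchcase`, which implements the same shell-style wildcards, on a few file names taken from the printed output; `matching` is a hypothetical helper for illustration only.

```python
from fnmatch import fnmatchcase

# File names taken from the printed output of the batch loop
names = [
    "tas_djf_10perc_CONST.nc",
    "tas_djf_10perc_UNCONST.nc",
    "pr_djf_10perc_CONST.nc",
    "pr_djf_10perc_UNCONST.nc",
]


def matching(pattern, candidates):
    """Return the candidates selected by a shell-style glob pattern."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


# The patterns used in load_data for variable='tas', season='djf'
print(matching("tas_djf_*perc_CONST.nc", names))
print(matching("tas_djf_*perc_UNCONST.nc", names))
# A looser pattern without 'perc_' selects both groups
print(matching("tas_djf_*CONST.nc", names))
```

With the looser pattern `tas_djf_*CONST.nc`, the constrained loader would also pick up the UNCONST files and mix the two groups.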

In [5]:
# call the function to preprocess the files and export them as netcdf files
for project in ['cmip5']:
    seasons = []
    for season in ['djf', 'jja']:
        tas = load_data(project, season, 'tas')
        pr = load_data(project, season, 'pr')
        ds = xr.merge([tas, pr]).assign_coords(season=season.upper())
        seasons.append(ds)
    ds = xr.concat(seasons, dim='season')
    # re-arrange the dimensions from (lon, lat) to (lat, lon)
    ds = ds.transpose(..., 'lat', 'lon')
    ds.to_netcdf(Path(output_path, f'cleaned_UoR_CALL_{project.upper()}.nc'))

/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/tas_djf_10perc_CONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/tas_djf_25perc_CONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/tas_djf_50perc_CONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/tas_djf_75perc_CONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/tas_djf_90perc_CONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/tas_djf_10perc_UNCONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/tas_djf_25perc_UNCONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/tas_djf_50perc_UNCONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/tas_djf_75perc_UNCONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/tas_djf_90perc_UNCONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/pr_djf_10perc_CONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/pr_djf_25perc_CONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/pr_djf_50perc_CONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/pr_djf_75perc_CONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/pr_djf_90perc_CONST.nc
/mnt/d/NLeSC/BETA/EUCP/Atlas/UoR_CALL/pr_djf_10perc_UNCONST.nc
/m
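A missing input file does not raise an error at the glob step as long as at least one file matches; it only shortens the 'percentile' dimension of the result. A quick check before running the batch loop can catch this. This is a minimal sketch: the helpers `expected_files` and `missing_files` are hypothetical, and it assumes that every variable/season combination is available at the same five percentiles (10, 25, 50, 75, 90) that appear in the printed output.

```python
from itertools import product
from pathlib import Path

# Assumption: all combinations exist at the percentiles seen in the output above
VARIABLES = ['tas', 'pr']
SEASONS = ['djf', 'jja']
PERCENTILES = [10, 25, 50, 75, 90]
FLAGS = ['CONST', 'UNCONST']


def expected_files():
    """All input file names the batch loop is expected to read (hypothetical helper)."""
    return [f'{var}_{season}_{perc}perc_{flag}.nc'
            for var, season, perc, flag in product(VARIABLES, SEASONS, PERCENTILES, FLAGS)]


def missing_files(folder):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected_files() if not (folder / name).is_file()]
```

In the notebook this could be called as `missing_files(Path(datapath, 'UoR_CALL'))`, which should return an empty list when all 40 expected files are present.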

### Check output
Reopen the saved file to check its dimensions and variables. <br>

In [6]:
ds = xr.open_dataset(Path(output_path,'cleaned_UoR_CALL_CMIP5.nc'))
ds