## Cleaning up Atlas data - CNRM HistC
**Function**      : Preprocess netCDF files and restructure the dataset<br>
**Author**        : Team BETA<br>
**First Built**   : 2021.09.14<br>
**Last Update**   : 2021.10.01<br>
**Library**       : os, numpy, netcdf4, xarray<br>
**Description**   : This notebook cleans up Atlas data, which is provided in netCDF format, and aggregates it into a single file.<br>
**Return Values** : .nc files<br>
**Note**          : All data are saved in netCDF4 format. Data from different models may differ in resolution and coordinates.<br>

In [1]:
import os
import numpy as np
import xarray as xr

### Path
Specify the path to the dataset and the place to save the outputs. <br>

In [2]:
# please specify data path
datapath = '/mnt/d/NLeSC/BETA/EUCP/Atlas'
# please specify output path
output_path = '/mnt/d/NLeSC/BETA/EUCP/Atlas/preprocess'
os.makedirs(output_path, exist_ok = True)

### Extract data
Extract weather/climate data from given netCDF files.

In [3]:
# CNRM HistC
# first check of data
dataset_tas_djf = xr.open_dataset(os.path.join(datapath,'CNRM',
                          'CNRM_atlas_tas_CMIP6_histssp585_DJF_latlon.nc'))
dataset_tas_djf

In [4]:
# data loader
dataset_tas_djf = xr.open_dataset(os.path.join(datapath,'CNRM',
                                  'CNRM_atlas_tas_CMIP6_histssp585_DJF_latlon.nc'))
dataset_tas_jja = xr.open_dataset(os.path.join(datapath,'CNRM',
                                  'CNRM_atlas_tas_CMIP6_histssp585_JJA_latlon.nc'))

In [5]:
# check target lat and lon from data sets
print(dataset_tas_djf["lat"][48:67])
print(dataset_tas_djf["lon"][:16])
print(dataset_tas_djf["lon"][-4:]-360)

<xarray.DataArray 'lat' (lat: 19)>
array([31.25, 33.75, 36.25, 38.75, 41.25, 43.75, 46.25, 48.75, 51.25, 53.75,
       56.25, 58.75, 61.25, 63.75, 66.25, 68.75, 71.25, 73.75, 76.25])
Coordinates:
  * lat      (lat) float64 31.25 33.75 36.25 38.75 ... 68.75 71.25 73.75 76.25
Attributes:
    units:      degrees_north
    long_name:  lat
    axis:       Y
<xarray.DataArray 'lon' (lon: 16)>
array([ 1.25,  3.75,  6.25,  8.75, 11.25, 13.75, 16.25, 18.75, 21.25, 23.75,
       26.25, 28.75, 31.25, 33.75, 36.25, 38.75])
Coordinates:
  * lon      (lon) float64 1.25 3.75 6.25 8.75 11.25 ... 31.25 33.75 36.25 38.75
Attributes:
    units:      degrees_east
    long_name:  lon
    axis:       X
<xarray.DataArray 'lon' (lon: 4)>
array([-8.75, -6.25, -3.75, -1.25])
Coordinates:
  * lon      (lon) float64 351.2 353.8 356.2 358.8


In [6]:
# convert lon from 0-360 to -180-180 and sort, so that Europe is a contiguous range
dataset_tas_djf.coords['lon'] = (dataset_tas_djf.coords['lon'] + 180) % 360 - 180
dataset_tas_djf = dataset_tas_djf.sortby(dataset_tas_djf.lon)

dataset_tas_jja.coords['lon'] = (dataset_tas_jja.coords['lon'] + 180) % 360 - 180
dataset_tas_jja = dataset_tas_jja.sortby(dataset_tas_jja.lon)

dataset_tas_djf
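
The longitude wrap above can be tried on a small synthetic grid. This is only a sketch: the `demo` dataset and its `field` variable are made up for illustration, and the 2.5-degree spacing is assumed from the coordinates printed earlier.

```python
import numpy as np
import xarray as xr

# Synthetic global grid with longitudes in [0, 360) (assumed 2.5-degree spacing)
lon = np.arange(1.25, 360, 2.5)
demo = xr.Dataset(
    {"field": (("lon",), np.arange(lon.size, dtype=float))},
    coords={"lon": lon},
)

# Wrap longitudes to [-180, 180) and restore ascending order
demo.coords["lon"] = (demo.coords["lon"] + 180) % 360 - 180
demo = demo.sortby("lon")

# A label-based slice across the Greenwich meridian is now one contiguous range
europe = demo.sel(lon=slice(-9, 39))
print(europe["lon"].values[:4])
print(europe.sizes["lon"])
```

Without the `sortby` step, the western longitudes would stay at the end of the axis and a single `slice(-9, 39)` would not pick them up.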

In [7]:
# create a placeholder xarray Dataset to host the processed data
# (the random values are overwritten when the data is assembled)
ds = xr.Dataset(
                {"tas": (("season", "constrained", "percentile", "lat", "lon"),
                 np.random.rand(2, 2, 4, 19, 20))},
                 #"pr": (("season", "constrained", "percentile", "lat", "lon"),
                 #np.random.rand(2, 2, 4, 19, 20))},
                coords={
                         "season": ["DJF", "JJA"],
                         "constrained": [1, 0],
                         "percentile": [10, 25, 75, 90],
                         "lat": [31.25, 33.75, 36.25, 38.75, 41.25, 43.75, 46.25, 48.75, 51.25, 53.75,
                                 56.25, 58.75, 61.25, 63.75, 66.25, 68.75, 71.25, 73.75, 76.25],
                         "lon": [-8.75, -6.25, -3.75, -1.25, 1.25, 3.75, 6.25, 8.75, 11.25, 13.75,
                                 16.25, 18.75, 21.25, 23.75, 26.25, 28.75, 31.25, 33.75, 36.25, 38.75]
                 },
                 attrs={"description":"CNRM HistC data."}
)
ds
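
As a possible alternative to random placeholder values, the container could be filled with NaN, so that any cell left unfilled is easy to detect afterwards. The sketch below assumes the same dimensions as above; the name `placeholder` is hypothetical.

```python
import numpy as np
import xarray as xr

# Target grid: 19 latitudes and 20 longitudes at 2.5-degree spacing
lat = np.arange(31.25, 77, 2.5)
lon = np.arange(-8.75, 39, 2.5)
shape = (2, 2, 4, lat.size, lon.size)

placeholder = xr.Dataset(
    {"tas": (("season", "constrained", "percentile", "lat", "lon"),
             np.full(shape, np.nan))},
    coords={
        "season": ["DJF", "JJA"],
        "constrained": [1, 0],
        "percentile": [10, 25, 75, 90],
        "lat": lat,
        "lon": lon,
    },
)

# Every cell starts as NaN; after assembly this count should drop to zero
print(int(placeholder["tas"].isnull().sum()))
```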

In [8]:
# assemble data
def assembly(ds_original, ds_target, var, season, constrained, percentile):
    """
    Select the European domain from an original netCDF dataset and copy the
    requested percentile fields into the target dataset (modified in place).
    """
    cons = ["uncons", "cons"]  # 0: unconstrained, 1: constrained
    seasons = ["DJF", "JJA"]
    key_s = dict(zip(seasons, range(len(seasons))))
    for i, c in enumerate(constrained):
        for j, p in enumerate(percentile):
            # select Europe; the variable suffix is looked up by the flag value c,
            # not by the loop position i, so it matches the "constrained" coordinate
            ds_target[var].values[key_s[season], i, j, :, :] = (
                ds_original[f"q{p}_{cons[int(c)]}"]
                .sel(lat=slice(30, 77), lon=slice(-9, 39))
                .values
            )
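
The selection and write step can also be done by coordinate label rather than by position, which does not depend on the order of the `constrained` axis. The following is a minimal sketch on synthetic data, not the notebook's method: the `source` and `target` datasets are made up, and only the variable naming pattern `q{p}_{cons}` is taken from the function above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one source file on a global 2.5-degree grid
lat = np.arange(-88.75, 90, 2.5)
lon = np.arange(-178.75, 180, 2.5)
source = xr.Dataset(
    {"q10_cons": (("lat", "lon"), np.ones((lat.size, lon.size)))},
    coords={"lat": lat, "lon": lon},
)

# Label-based selection of the European domain
europe = source["q10_cons"].sel(lat=slice(30, 77), lon=slice(-9, 39))

# Small NaN-filled target with the same "constrained" ordering as the notebook
target = xr.Dataset(
    {"tas": (("constrained", "percentile", "lat", "lon"),
             np.full((2, 1, europe.sizes["lat"], europe.sizes["lon"]), np.nan))},
    coords={"constrained": [1, 0], "percentile": [10],
            "lat": europe["lat"].values, "lon": europe["lon"].values},
)

# Write by label: the field lands in the constrained=1 slot wherever it sits
target["tas"].loc[dict(constrained=1, percentile=10)] = europe.values

print(europe.shape)
print(float(target["tas"].sel(constrained=1, percentile=10).mean()))
```

Checking `europe.shape` against the target grid before writing also catches a source file whose resolution differs from the expected 19 x 20 domain.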

In [9]:
# call the function to fill the target dataset, then export it as a netCDF file
# DJF
assembly(dataset_tas_djf, ds, "tas", "DJF", ds.coords["constrained"].values[:],
         ds.coords["percentile"].values[:])
# JJA
assembly(dataset_tas_jja, ds, "tas", "JJA", ds.coords["constrained"].values[:],
         ds.coords["percentile"].values[:])
# save to netcdf
ds.to_netcdf(os.path.join(output_path, 'cleaned_CNRM_HistC_CMIP6.nc'))

### Check output
Reload the saved file and inspect its contents. <br>

In [10]:
ds = xr.open_dataset(os.path.join(output_path,'cleaned_CNRM_HistC_CMIP6.nc'))
ds
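
Beyond a visual inspection, a simple numerical sanity check is possible. The sketch below assumes that the percentile fields describe the same distribution, so values should never decrease from the 10th to the 90th percentile. The helper `percentiles_are_ordered` is hypothetical, and it is shown here on synthetic data rather than on the saved file.

```python
import numpy as np
import xarray as xr

def percentiles_are_ordered(da, dim="percentile"):
    """Return True if values never decrease along `dim` (NaN steps are ignored)."""
    steps = da.sortby(dim).diff(dim)
    return bool(((steps >= 0) | steps.isnull()).all())

# Synthetic field, sorted along the percentile axis so the check should pass
rng = np.random.default_rng(0)
values = np.sort(rng.random((4, 3, 3)), axis=0)
demo = xr.DataArray(
    values,
    dims=("percentile", "lat", "lon"),
    coords={"percentile": [10, 25, 75, 90]},
)

# Same coordinates, but data reversed along the percentile axis: should fail
broken = demo.copy(data=values[::-1])

print(percentiles_are_ordered(demo))
print(percentiles_are_ordered(broken))
```

Applied to the reloaded dataset, the same helper could be called as `percentiles_are_ordered(ds["tas"])`.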