# NetCDF precipitation file arrangement for reading with loadeR 

The file to be modified contains daily total precipitation data from 1998 to 2020 for a specific area of the South Pacific basin. The problem is the expver variable, which takes two values, 1 and 5, indicating that the data come from different product versions: expver 1 corresponds to ERA5 and expver 5 to ERA5T, the preliminary near-real-time release. This variable causes problems when reading the data both with NetCDF tools and with the loadeR library. It therefore has to be removed, and the precipitation data (tp) restructured around the three coordinates time, latitude and longitude.

A new .nc file is created with the xarray library. It contains the same data as the original .nc file, except that the expver variable is removed and the precipitation data (tp) become 3-dimensional instead of 4-dimensional.
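To make the 4-dimensional layout concrete, the following sketch builds a small toy array with the same dimension names as the file (time, expver, latitude, longitude). The values, dates and grid point are invented for illustration and are not taken from the real file. It assumes that each time step is filled by only one expver, with NaN in the other, and counts the valid values per time step to check that assumption.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy data (invented values): 4 days, 2 expver slots, a single grid point.
values = np.full((4, 2, 1, 1), np.nan, dtype="float32")
values[:3, 0, 0, 0] = [1.0, 2.0, 3.0]  # expver 1 filled on the first three days
values[3, 1, 0, 0] = 4.0               # expver 5 filled on the last day

toy = xr.DataArray(
    values,
    dims=["time", "expver", "latitude", "longitude"],
    coords={
        "time": pd.date_range("2020-01-01", periods=4),
        "expver": [1, 5],
        "latitude": [-20.0],
        "longitude": [-75.0],
    },
    name="tp",
)

# Number of expver slots holding a valid value at each time step and grid point.
valid_per_step = toy.notnull().sum(dim="expver")
print(int(valid_per_step.max()))
```

If the maximum count were greater than 1 on the real file, summing over expver would add two versions of the same day together, so this check is worth running before the reduction.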

In [1]:
import numpy as np
import xarray as xr
import time

The file to be modified is read.

In [2]:
data = xr.open_dataset("precip_reanalysis.nc")
data

In [5]:
data.tp[1:3,:,1:3,1:3]

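Taking the nansum over axis 1 removes the expver dimension and leaves the data on 3 dimensions (time, latitude, longitude). This relies on each time step holding valid values for only one expver and NaN for the other, so the sum that ignores NaN returns the single valid value.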

In [7]:
np.nansum(data.tp[1:3,:,1:3,1:3], axis = 1)

array([[[9.5354393e-05, 9.9470839e-05],
        [1.9207969e-04, 1.3376959e-04]],

       [[1.8626451e-09, 1.8626451e-09],
        [1.8626451e-09, 1.8626451e-09]]], dtype=float32)

In [8]:
data.tp[:,:,1:3,1:3].shape, np.nansum(data.tp[:,:,1:3,1:3], axis = 1).shape

((8401, 2, 2, 2), (8401, 2, 2))
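One side effect of np.nansum is worth keeping in mind: where every expver is NaN, it returns 0 rather than NaN, so a genuinely missing value would turn into zero precipitation. The sketch below uses invented toy values to show this and, as one possible alternative rather than the method used in this notebook, the xarray reduction sum with min_count=1, which keeps such cells as NaN.

```python
import numpy as np
import xarray as xr

# Toy data (invented): 3 time steps, 2 expver slots, a single grid point.
# The last time step is missing in both expver slots.
values = np.array(
    [[1.0, np.nan], [np.nan, 2.0], [np.nan, np.nan]], dtype="float32"
)
toy = xr.DataArray(
    values.reshape(3, 2, 1, 1),
    dims=["time", "expver", "latitude", "longitude"],
    name="tp",
)

# NumPy route used in this notebook: the all-NaN time step becomes 0.
summed_np = np.nansum(toy.values, axis=1)

# xarray route: require at least one valid value, otherwise keep NaN.
summed_xr = toy.sum(dim="expver", skipna=True, min_count=1)

print(summed_np[:, 0, 0])
print(summed_xr.values[:, 0, 0])
```

Whether this matters depends on whether the real file has time steps that are missing in both versions.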

The new tp data (tp_modified) is computed.

In [10]:
start_time = time.time()
tp_modified = np.nansum(data.tp, axis = 1)
print("--- %s seconds ---" % (time.time() - start_time))

--- 19.170727729797363 seconds ---


The new dataset is created. Its structure must match that of the original data to avoid errors when the file is read. As the cell below shows, the only change with respect to the original data is the variable tp.

In [15]:
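# Build the new dataset; coordinates are passed as plain arrays
# (wrapping a DataArray in a (dims, data) tuple is ambiguous for xarray).
ds = xr.Dataset(
    {"tp": (["time", "latitude", "longitude"], tp_modified)},
    coords={
        "longitude": (["longitude"], data.longitude.values),
        "latitude": (["latitude"], data.latitude.values),
        "time": (["time"], data.time.values),
    },
)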

In [18]:
ds
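The dimension names used when building the dataset are written by hand, which assumes that the original tp is ordered as (time, expver, latitude, longitude). A small sketch, using an invented toy dataset in place of the real file, derives both the axis to collapse and the remaining dimensions from the original variable, so the labels cannot drift from the data.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the original file (invented values and coordinates).
data = xr.Dataset(
    {
        "tp": (
            ["time", "expver", "latitude", "longitude"],
            np.ones((3, 2, 2, 2), dtype="float32"),
        )
    },
    coords={
        "time": [0, 1, 2],
        "expver": [1, 5],
        "latitude": [-20.0, -20.25],
        "longitude": [-75.0, -74.75],
    },
)

# Position of expver in tp, and the dimensions left after collapsing it.
axis = data.tp.dims.index("expver")
kept_dims = [d for d in data.tp.dims if d != "expver"]

tp_modified = np.nansum(data.tp.values, axis=axis)
ds = xr.Dataset(
    {"tp": (kept_dims, tp_modified)},
    coords={d: data[d].values for d in kept_dims},
)
print(ds.tp.dims)
```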

Conventions and history are added as global attributes.

In [19]:
ds.attrs["Conventions"] = data.Conventions
ds.attrs["history"] = data.history
ds

The metadata of each coordinate and of the tp variable must also be added to the new Dataset.

In [20]:
ds.longitude.attrs = data.longitude.attrs
ds.latitude.attrs = data.latitude.attrs
ds.time.attrs = data.time.attrs
ds.tp.attrs = data.tp.attrs

ds
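As a possible shortcut, and not the approach taken in this notebook, xarray can collapse expver directly on the Dataset while keeping the coordinates and attributes in one step. The sketch below uses an invented toy dataset, and the attribute values are placeholders rather than those of the real file.

```python
import numpy as np
import xarray as xr

# Toy stand-in for the original file (invented values and attributes).
values = np.full((2, 2, 1, 1), np.nan, dtype="float32")
values[0, 0] = 1.0  # first time step filled by expver 1
values[1, 1] = 2.0  # second time step filled by expver 5

data = xr.Dataset(
    {
        "tp": (
            ["time", "expver", "latitude", "longitude"],
            values,
            {"units": "m", "long_name": "Total precipitation"},
        )
    },
    coords={
        "time": [0, 1],
        "expver": [1, 5],
        "latitude": ("latitude", [-20.0], {"units": "degrees_north"}),
        "longitude": ("longitude", [-75.0], {"units": "degrees_east"}),
    },
    attrs={"Conventions": "CF-1.6", "history": "toy example"},
)

# Sum over expver, keeping NaN where no version has data and
# carrying over global and variable attributes.
ds = data.sum(dim="expver", min_count=1, keep_attrs=True)
print(ds)
```

The explicit construction used in this notebook makes every copied piece visible, while the reduction above is shorter and leaves less room for a mismatched coordinate or a forgotten attribute.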

Finally, the dataset created is saved as a .nc file.

In [21]:
ds.to_netcdf("precip_reanalysis_1998-2020_mod.nc")

In [2]:
pp = xr.open_dataset("C:/Users/usuario/Desktop/TRMM-Calibration/Data/precip_reanalysis_1998-2020_mod.nc")
pp
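After reopening the saved file, a quick structural check can confirm that expver is gone and that tp is 3-dimensional. The helper below, check_structure, is a hypothetical name introduced for illustration and is not part of xarray or loadeR. It is demonstrated on invented in-memory datasets and could be called on pp in the same way.

```python
import numpy as np
import xarray as xr


def check_structure(ds, var="tp"):
    """Return True if `var` lies on (time, latitude, longitude) and expver is absent."""
    expected = ("time", "latitude", "longitude")
    no_expver = "expver" not in ds.dims and "expver" not in ds.variables
    return no_expver and ds[var].dims == expected


# Invented 3-D dataset with the target layout.
good = xr.Dataset(
    {"tp": (["time", "latitude", "longitude"], np.zeros((2, 1, 1), dtype="float32"))},
    coords={"time": [0, 1], "latitude": [-20.0], "longitude": [-75.0]},
)

# Invented 4-D dataset that still carries expver.
bad = xr.Dataset(
    {
        "tp": (
            ["time", "expver", "latitude", "longitude"],
            np.zeros((2, 2, 1, 1), dtype="float32"),
        )
    },
    coords={"time": [0, 1], "expver": [1, 5], "latitude": [-20.0], "longitude": [-75.0]},
)

print(check_structure(good), check_structure(bad))
```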