# NetCDF file arrangement for reading with loadeR

The file to be modified contains six-hourly mean sea level pressure (msl) data from 1998 to 2020 for a specific area of the South Pacific basin. The problem lies in the expver variable, which takes two values, 1 and 5, indicating that the data come from different versions: expver 1 corresponds to ERA5 and expver 5 corresponds to ERA5T. This variable causes problems when reading the data both with NetCDF tools and with the loadeR library. For this reason it must be removed, and the msl data must be structured around the coordinates time, latitude and longitude only.

A new .nc file will be created with the help of the xarray library. It contains the same data as the original .nc file, except that the expver variable is removed and the msl data is three-dimensional instead of four-dimensional.

In [1]:
import xarray as xr
import numpy as np

The file to be modified is read.

In [2]:
data = xr.open_dataset("slp_1998-2020.nc")
data

Note that, along the expver dimension, each msl entry consists of one finite value and one NaN. Specifically, the finite values correspond to expver = 1, except from 01-12-2020 to 31-12-2020 (both inclusive), where they correspond to expver = 5.

In [3]:
data.msl[1,:,1,1]

In [6]:
np.nansum(data.msl[1,:,1,1], axis = 0)

100773.35
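The claim that exactly one of the two expver entries is finite at every point can be checked before summing. The sketch below is only an illustration: the helper name `finite_counts` and the toy values are assumptions, not part of the original notebook.

```python
import numpy as np

def finite_counts(values, axis=1):
    """Count finite entries along the given axis (hypothetical helper)."""
    return np.isfinite(values).sum(axis=axis)

# Toy array with dimensions (time, expver, latitude, longitude) = (2, 2, 1, 2).
# The values are made up for illustration.
toy = np.array([
    [[[101000.0, 101100.0]], [[np.nan, np.nan]]],    # only expver 1 is finite
    [[[np.nan, np.nan]], [[100900.0, 100950.0]]],    # only expver 5 is finite
])

counts = finite_counts(toy)
print(np.unique(counts))  # [1]
```

On the real data the same helper could be called as `finite_counts(data.msl.values)`. A count of 0 or 2 at any point would mean that summing along expver does not simply pick the single finite value there.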

In [18]:
data.msl[1:3,:,1:3,1:3]

Applying nansum over axis 1 removes the expver dimension, leaving the data on three dimensions (time, latitude, longitude).

In [15]:
np.nansum(data.msl[1:3,:,1:3,1:3], axis = 1).shape

(2, 2, 2)

In [19]:
data.msl[:,:,1:3,1:3].shape

(33604, 2, 2, 2)

In [20]:
np.nansum(data.msl[:,:,1:3,1:3], axis = 1).shape

(33604, 2, 2)
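The collapse along axis 1 can be illustrated on a small made-up array. This is a minimal sketch with assumed values, meant only to show that `np.nansum` treats NaN as zero and therefore returns the single finite value of each pair.

```python
import numpy as np

# Toy array with dimensions (time, expver, latitude, longitude) = (2, 2, 1, 1).
# The values are made up for illustration.
toy = np.array([
    [[[101000.0]], [[np.nan]]],    # finite value stored under expver 1
    [[[np.nan]], [[100900.0]]],    # finite value stored under expver 5
])

# Summing over axis 1 (expver) while ignoring NaN removes that axis.
collapsed = np.nansum(toy, axis=1)

print(toy.shape)        # (2, 2, 1, 1)
print(collapsed.shape)  # (2, 1, 1)
```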

The new msl data (mls_modified) is computed. This process may take 15-20 minutes because the array is quite large.

In [22]:
mls_modified = np.nansum(data.msl, axis = 1)

In [24]:
data.msl.shape, mls_modified.shape

((33604, 2, 121, 201), (33604, 121, 201))
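As a possible alternative to `np.nansum`, the same merge can be sketched with xarray's own `sel` and `combine_first`. This is not the method used in this notebook, and the toy values below are assumptions. One difference is that a point where neither version has data stays NaN, whereas `np.nansum` returns 0 for it.

```python
import numpy as np
import xarray as xr

# Toy DataArray standing in for data.msl (made-up values and sizes).
toy = xr.DataArray(
    np.array([
        [[[101000.0]], [[np.nan]]],    # only expver 1 is finite
        [[[np.nan]], [[100900.0]]],    # only expver 5 is finite
        [[[np.nan]], [[np.nan]]],      # hypothetical gap in both versions
    ]),
    dims=["time", "expver", "latitude", "longitude"],
    coords={"expver": [1, 5]},
    name="msl",
)

# Take the expver = 1 slice and fill its gaps from the expver = 5 slice.
# drop=True discards the scalar expver coordinate that sel() would leave behind.
merged = toy.sel(expver=1, drop=True).combine_first(toy.sel(expver=5, drop=True))

print(merged.dims)              # ('time', 'latitude', 'longitude')
print(merged.values[:, 0, 0])   # the last entry stays NaN
```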

The new dataset is created. Its structure must match that of the original data to avoid errors. As the cell below shows, the only change with respect to the original data is the msl variable.

In [38]:
ds = xr.Dataset({
    "msl": (["time", "latitude", "longitude"], mls_modified)
    },
    coords = {
        "longitude": (["longitude"], data.longitude),
        "latitude": (["latitude"], data.latitude),
        "time": (["time"], data.time)
    }
    )

In [39]:
ds

Conventions and history are added as attributes.

In [40]:
ds.attrs["Conventions"] = data.Conventions
ds.attrs["history"] = data.history
ds

The metadata of each coordinate and of the msl data must also be added to the new Dataset.

In [43]:
ds.longitude.attrs = data.longitude.attrs
ds.latitude.attrs = data.latitude.attrs
ds.time.attrs = data.time.attrs
ds.msl.attrs = data.msl.attrs

ds

Finally, the dataset created is saved as a .nc file.

In [44]:
ds.to_netcdf("slp_1998-2020_mod.nc")

In [2]:
slp = xr.open_dataset("C:/Users/usuario/Desktop/TRMM-Calibration/Data/slp_1998-2020_mod.nc")

In [3]:
slp