# 13 NetCDF and `xarray`

In this lesson, we will get acquainted with NetCDF, a popular format for working with multidimensional datasets, and with the Python package `xarray`, whose data model is based on NetCDF.


In [1]:
# Import packages
import os 
import numpy as np
import pandas as pd
import xarray as xr


### Variable values

The underlying data in the `xarray.DataArray` is a `numpy.ndarray` that holds the variable values.

In [2]:
# Values of a single variable at each point of the coords 
temp_data = np.array([np.zeros((5,5)),
                      np.ones((5,5)),
                      np.ones((5,5))*2]) # Twos

temp_data


array([[[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.]]])

### Dimensions and Coordinates

To specify the dimensions of our upcoming `xarray.DataArray`, we must examine how we've constructed the `numpy.array` holding the temperature data. 
The first dimension is time, the second is latitude, and the third is longitude.
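
The axis order can be checked from the array's shape. This is a minimal sketch that rebuilds the same array as above; reading axis 0 as time follows the lesson's description, since NumPy itself stores no dimension names.

```python
import numpy as np

# Rebuild the lesson's array: three 5x5 layers filled with 0, 1, and 2
temp_data = np.array([np.zeros((5, 5)),
                      np.ones((5, 5)),
                      np.ones((5, 5)) * 2])

# One length per axis, in axis order: (time, lat, lon)
shape = temp_data.shape
print(shape)  # (3, 5, 5)

# Indexing axis 0 picks one layer, i.e. one time step
first_layer = temp_data[0]
print(first_layer.shape)  # (5, 5)
```

Because latitude and longitude both have length 5 here, the shape alone cannot tell them apart; the order has to come from how the array was built.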

From our exercises, we can also see that the coordinates (values of each dimension) are:

- time coordinates are 2022-09-01, 2022-09-02, 2022-09-03
- latitude coordinates are 70, 60, 50, 40, 30 (notice decreasing order)
- longitude coordinates are 60, 70, 80, 90, 100 (notice increasing order)

We add the dimensions as a tuple of strings and coordinates as a dictionary:

In [3]:
# Names of the dimensions in the required order
dims = ('time', 'lat', 'lon')

# Create coordinates to use for indexing along each dimension 
coords = {'time':pd.date_range("2022-09-01", "2022-09-03"),
          'lat':np.arange(70, 20, -10),
          'lon':np.arange(60, 110, 10)}  
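
As a quick sanity check, the number of coordinate values along each dimension should match the corresponding axis length of the data, which is (3, 5, 5) here. The sketch below only inspects the `coords` dictionary defined above; the name `lengths` is introduced for illustration.

```python
import numpy as np
import pandas as pd

coords = {'time': pd.date_range("2022-09-01", "2022-09-03"),
          'lat': np.arange(70, 20, -10),
          'lon': np.arange(60, 110, 10)}

# Number of coordinate values per dimension
lengths = {name: len(values) for name, values in coords.items()}
print(lengths)

# np.arange excludes the stop value, so 20 and 110 are not included
print(coords['lat'])
print(coords['lon'])
```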

### Attributes

Next, we add the attributes (metadata) for our temperature data as a dictionary:

In [4]:
# Attributes (metadata) of the data array 
attrs = { 'title':'temperature across weather stations',
          'standard_name':'air_temperature',
          'units':'degree_c'}

### Putting it all together

Finally, we put all these pieces together (data, dimensions, coordinates, and attributes) to create an `xarray.DataArray`:

In [5]:
# Initialize xarray.DataArray
temp = xr.DataArray(data = temp_data, 
                    dims = dims,
                    coords = coords,
                    attrs = attrs)
temp
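
Each piece passed to the constructor can be read back from the resulting object. The following sketch uses a smaller, hypothetical stand-in array named `demo` (2 time steps on a 2x2 grid) so that it runs on its own.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Smaller stand-in for the lesson's array (hypothetical values)
demo = xr.DataArray(data=np.zeros((2, 2, 2)),
                    dims=('time', 'lat', 'lon'),
                    coords={'time': pd.date_range("2022-09-01", "2022-09-02"),
                            'lat': [70, 60],
                            'lon': [60, 70]},
                    attrs={'units': 'degree_c'})

print(demo.dims)          # dimension names, in axis order
print(demo.shape)         # axis lengths
print(demo.attrs)         # metadata dictionary
print(type(demo.values))  # the underlying NumPy array
```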

We can also update the variable’s attributes after creating the object. 
Notice that each of the coordinates is also an `xarray.DataArray`, so we can add attributes to them.

In [6]:
# Update attributes
temp.attrs['description'] = 'Simple example of an xarray.DataArray'

# Add attributes to coordinates 
temp.time.attrs = {'description':'date of measurement'}

temp.lat.attrs['standard_name'] = 'grid_latitude'
temp.lat.attrs['units'] = 'degree_N'

temp.lon.attrs['standard_name'] = 'grid_longitude'
temp.lon.attrs['units'] = 'degree_E'
temp
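
Attributes on the variable and on its coordinates are stored separately. A minimal sketch with a hypothetical 2x2 array named `demo` illustrates this:

```python
import numpy as np
import xarray as xr

# Hypothetical 2x2 array with latitude and longitude coordinates
demo = xr.DataArray(np.zeros((2, 2)),
                    dims=('lat', 'lon'),
                    coords={'lat': [70, 60], 'lon': [60, 70]})

# Attribute on the variable itself
demo.attrs['description'] = 'Simple example of an xarray.DataArray'

# Attribute on one coordinate only
demo.lat.attrs['units'] = 'degree_N'

print(demo.attrs)      # holds 'description' only
print(demo.lat.attrs)  # holds 'units' only
print(demo.lon.attrs)  # still empty
```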

## Subsetting

We can subset an `xarray.DataArray` by looking up values along each dimension by coordinate label, using the `sel()` method.

In [7]:
# Select a single value by coordinate labels and extract it as a Python number
temp.sel(time='2022-09-01', lat=40, lon=80).item()

0.0
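
For comparison, `isel()` selects by integer position instead of by label. The sketch below rebuilds the lesson's array and picks the same cell both ways; the names `by_label`, `by_position`, and `band` are introduced for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the lesson's array: layers of 0, 1, and 2 along the time dimension
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)), np.ones((5, 5)), np.ones((5, 5)) * 2]),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range("2022-09-01", "2022-09-03"),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)})

# Label-based: pass coordinate values
by_label = temp.sel(time='2022-09-02', lat=40, lon=80).item()

# Position-based: pass integer indices (lat=40 is index 3, lon=80 is index 2)
by_position = temp.isel(time=1, lat=3, lon=2).item()

print(by_label, by_position)  # both are 1.0

# Label slices include both endpoints; lat is stored in decreasing order,
# so the slice bounds are given in decreasing order as well
band = temp.sel(lat=slice(60, 40))
print(band.lat.values)
```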

## Reduction

`xarray` has several methods to reduce an `xarray.DataArray` along any number of dimensions.

Ex: calculate the average temperature at each location over time

In [8]:
avg_temp = temp.mean(dim = 'time')
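
Reducing along a dimension removes it from the result. As a rough illustration on the lesson's array, averaging over `time` leaves a (lat, lon) grid, while averaging over both spatial dimensions leaves one value per day; `avg_per_day` is a name introduced here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the lesson's array: layers of 0, 1, and 2 along the time dimension
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)), np.ones((5, 5)), np.ones((5, 5)) * 2]),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range("2022-09-01", "2022-09-03"),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)})

# Average over time: one value per (lat, lon) cell, each (0 + 1 + 2) / 3 = 1.0
avg_temp = temp.mean(dim='time')
print(avg_temp.dims)

# Average over both spatial dimensions: one value per time step
avg_per_day = temp.mean(dim=['lat', 'lon'])
print(avg_per_day.values)
```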

## `xarray.Dataset`

`xarray.DataArray` = just one variable

`xarray.Dataset` = multiple variables, an in-memory representation of a NetCDF file. Each variable is an `xarray.DataArray`. Variables in the `xarray.Dataset` can have the same dimensions, share some dimensions, or have no dimensions in common.

Ex: bundle avg temp and temp data together

In [12]:
# Make dictionaries with variables and attributes
data_vars = {'avg_temp': avg_temp,
             'temp': temp
             }

attrs = {'title':'Temperature data at weather stations: daily and average',
         'description':'Simple example of an xarray.Dataset'
         }

# Create xarray.Dataset
temp_dataset = xr.Dataset( data_vars = data_vars,
                           attrs = attrs
                           )
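
Once bundled, each variable can be pulled back out by name, and variables keep their own dimensions. The sketch below rebuilds a dataset like the one above to show this; `ds` is a name introduced for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the lesson's array: layers of 0, 1, and 2 along the time dimension
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)), np.ones((5, 5)), np.ones((5, 5)) * 2]),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range("2022-09-01", "2022-09-03"),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)})

ds = xr.Dataset(data_vars={'avg_temp': temp.mean(dim='time'), 'temp': temp},
                attrs={'description': 'Simple example of an xarray.Dataset'})

# Names of the variables held by the dataset
print(list(ds.data_vars))

# The two variables share lat and lon, but only temp has a time dimension
print(ds['avg_temp'].dims)
print(ds['temp'].dims)

# Dimension lengths for the dataset as a whole
print(dict(ds.sizes))
```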

In [13]:
# Save xarray.Dataset as a NetCDF file
temp_dataset.to_netcdf('temp_dataset.nc')

In [14]:
# Import NetCDF file
check = xr.open_dataset('temp_dataset.nc')
check

ValueError: found the following matches with the input file in xarray's IO backends: ['netcdf4', 'h5netcdf']. But their dependencies may not be installed, see:
https://docs.xarray.dev/en/stable/user-guide/io.html 
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html
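
The error above indicates that the matching backends may not be installed. One possible way to check is to see which backend packages can be imported and then pass `engine` explicitly. This is a sketch, not a definitive fix: it assumes the engine names `netcdf4`, `h5netcdf`, and `scipy` correspond to the packages `netCDF4`, `h5netcdf`, and `scipy`, and it uses a tiny hypothetical dataset `demo` written to a temporary file `demo.nc`.

```python
import importlib.util
import os
import tempfile

import numpy as np
import xarray as xr

# Map xarray engine names to the Python package each one needs
engine_packages = {'netcdf4': 'netCDF4', 'h5netcdf': 'h5netcdf', 'scipy': 'scipy'}
available = [engine for engine, package in engine_packages.items()
             if importlib.util.find_spec(package) is not None]
print(available)

# Round-trip a tiny hypothetical dataset, but only if some backend is installed
demo = xr.Dataset({'temp': (('lat', 'lon'), np.ones((2, 2)))})
restored = None
if available:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'demo.nc')
        demo.to_netcdf(path, engine=available[0])
        # Load into memory before the temporary file is removed
        with xr.open_dataset(path, engine=available[0]) as check:
            restored = check.load()

print(restored)
```

If `available` comes back empty, installing one of the packages named in the error message (for example `netCDF4` or `h5netcdf`) is the likely next step.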