<center>
<img src='./img/nsidc_logo.png'/>

# 2.0 Read and Plot SMAP data using `xarray.DataTree`

</center>
---

## 1. Overview  
In this tutorial, we will read the SMAP data downloaded in the 1.0 Download SMAP data notebook into an `xarray.DataTree`, create a map of soil moisture, and plot a soil moisture time series for a location on Earth.

`xarray.DataTree` was created to work with hierarchical datasets. Hierarchical datasets use a tree-like, nested, directory-style structure of groups to organize and store complex data. SMAP Level-3 files have a relatively simple hierarchical structure, with two data groups alongside a `Metadata` group: one group for data from AM satellite overpasses and one group for data from PM satellite overpasses. To make this data easy to work with, we also give the data dimensions meaningful names and add geospatial coordinates.

### **Credits**

This tutorial is based on notebooks originally provided to NSIDC by Adam Purdy. Jennifer Roebuck of NSIDC updated the tutorials to use the latest version of SMAP data and to use earthaccess for authentication, searching for, and downloading the data, so that they could be incorporated into the NSIDC-Data-Tutorials repo.

For questions regarding the notebook, or to report problems, please create a new issue in the [NSIDC-Data-Tutorials repo](https://github.com/nsidc/NSIDC-Data-Tutorials/issues).

### **Learning Goals**

1. Read in SMAP data and navigate the metadata
2. Create a map with SMAP data
3. Plot a time series at a location on Earth.

### **Prerequisites**

1. The nsidc-tutorials environment is set up and activated. This [README](https://github.com/nsidc/NSIDC-Data-Tutorials/blob/main/README.md) has setup instructions.
2. SMAP data downloaded in the previous tutorial notebook, 1.0 Download SMAP data.
3. The EASE-Grid 2.0 longitude and latitude data sets. Binary versions of these files are provided within this repo for use in this tutorial; they are also available in NetCDF format from the NSIDC website at this [page](https://nsidc.org/data/NSIDC-0772/versions/1).


### **Time requirement**

Allow 15 to 20 minutes to complete this tutorial.

## 2. Tutorial steps

### Import libraries

As with all Python code, we need to import some libraries to read the data, add coordinates and plot the data.

```python
# For opening the data
from pathlib import Path
import xarray as xr

# For adding coordinates
import numpy as np
from affine import Affine
from pyproj import CRS

# For plotting
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

DATAPATH = Path("data/L3_SM_P")
```

### Get a list of HDF5 files in the data directory

```python
# Sort the paths so the file order is reproducible (glob order is not guaranteed)
filelist = sorted(DATAPATH.glob("*.h5"))
```
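Because a time series needs the files in date order, it can help to sort them by the date in their names rather than rely on the order returned by `glob`. This is a sketch under an assumption: that the file names follow the pattern `SMAP_L3_SM_P_YYYYMMDD_R#####_###.h5` (for example `SMAP_L3_SM_P_20150401_R19240_001.h5`). The helper `file_date` and the example paths are hypothetical; check the names of your downloaded files before relying on the pattern.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed naming pattern: ..._YYYYMMDD_R<digits>_<digits>.h5
DATE_PATTERN = re.compile(r"_(\d{8})_R\d+_\d+\.h5$")


def file_date(path: Path) -> datetime:
    """Return the date encoded in a SMAP L3 file name (hypothetical helper)."""
    match = DATE_PATTERN.search(path.name)
    if match is None:
        raise ValueError(f"No date found in {path.name}")
    return datetime.strptime(match.group(1), "%Y%m%d")


# Made-up file names, deliberately out of order
names = [
    Path("data/L3_SM_P/SMAP_L3_SM_P_20150403_R19240_001.h5"),
    Path("data/L3_SM_P/SMAP_L3_SM_P_20150401_R19240_001.h5"),
    Path("data/L3_SM_P/SMAP_L3_SM_P_20150402_R19240_001.h5"),
]

# Sort chronologically using the date parsed from each name
filelist = sorted(names, key=file_date)
dates = [file_date(p) for p in filelist]
print([d.date().isoformat() for d in dates])
```

For real files that match the pattern, `sorted(DATAPATH.glob("*.h5"), key=file_date)` would give the same chronological ordering, and the parsed dates can later label the time axis of a time series.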

### Load a data file into an `xarray.DataTree` object

`xarray.DataTree` objects allow us to work with hierarchical data structures and file formats such as HDF5, Zarr and NetCDF4 files with groups. The SMAP Level-3 files are hierarchical data structures.

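We use `xr.open_datatree` to open a SMAP HDF5 file. We pass `phony_dims="sort"` because the data variables in the SMAP files do not have any assigned dimension scales, so the file stores no dimension names. `phony_dims` is an option of the `h5netcdf` backend that tells it how to name these missing ("phony") dimensions: `xarray` names them `phony_dim_0`, `phony_dim_1`, etc., and with `"sort"` the numbering runs across the whole file rather than restarting in each group.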

```python
# phony_dims is an h5netcdf backend option, so request that engine explicitly
dt = xr.open_datatree(filelist[0], engine="h5netcdf", phony_dims="sort")
dt
```

`open_datatree` returns an `xarray.DataTree` object that we assign to `dt`.  We can see from the representation of `dt` that there are three groups in the top (called `root`) level.  Clicking on **Groups** reveals that the three groups are `Metadata`, `Soil_Moisture_Retrieval_Data_AM`, and `Soil_Moisture_Retrieval_Data_PM`.  We can also see that there are no dimensions, coordinates, data variables or attributes in the `root` group. 

The `Metadata` group has 11 sub-groups that contain information about data quality and the SMAP instrument. `Soil_Moisture_Retrieval_Data_AM` and `Soil_Moisture_Retrieval_Data_PM` don't have any subgroups, but they each contain 53 variables. We can also see the names and sizes of the phony dimensions, and that the phony dimensions of the two groups have matching sizes: `phony_dim_0` is the same size as `phony_dim_3`, `phony_dim_1` is the same size as `phony_dim_4`, and `phony_dim_2` is the same size as `phony_dim_5`. In fact, `phony_dim_0` and `phony_dim_3` are the `y` dimension of the data grids; `phony_dim_1` and `phony_dim_4` are the `x` dimension; and `phony_dim_2` and `phony_dim_5` are the number of IGBP land cover classes.

### Add coordinates to the data

Adding coordinates to the data variables allows us to work with the data as a geospatial dataset: performing geospatial analyses, reprojecting the data and making maps.
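As an example of reprojection, here is a minimal sketch that converts between longitude/latitude and EASE-Grid 2.0 projected coordinates with `pyproj.Transformer`. It assumes the SMAP grid is in EPSG:6933 (WGS 84 / NSIDC EASE-Grid 2.0 Global, the code used for the map later in this notebook); the location is an arbitrary example. This is an alternative to reading the EASE-Grid 2.0 longitude and latitude files, not the method used in this tutorial.

```python
from pyproj import Transformer

# EPSG:6933 = WGS 84 / NSIDC EASE-Grid 2.0 Global (metres); EPSG:4326 = WGS 84 lon/lat.
# always_xy=True keeps the (longitude, latitude) and (x, y) axis order.
to_ease = Transformer.from_crs("EPSG:4326", "EPSG:6933", always_xy=True)
to_lonlat = Transformer.from_crs("EPSG:6933", "EPSG:4326", always_xy=True)

# An arbitrary example location (degrees)
lon, lat = -105.27, 40.02

# Forward: geographic coordinates -> projected EASE-Grid 2.0 metres
x, y = to_ease.transform(lon, lat)

# Inverse: projected -> geographic, which should recover the original point
lon_back, lat_back = to_lonlat.transform(x, y)
print(f"x = {x:.1f} m, y = {y:.1f} m")
```

Once the `x` and `y` coordinates have been added to the data, a projected point like this can be used to pick the nearest grid cell, for example with `soil_moisture.sel(x=x, y=y, method="nearest")`.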

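First, we will change the names of the _phony dims_ to more meaningful names: `x`, `y` and `igbp_class`. `rename` is a `Dataset` method, so we use each group's `ds` property to get its data as a dataset, rename the dimensions, and assign the result back to the same group, overwriting the original. After each renaming, we call `dt.update` with the renamed group. This copies that group's variables, and with them the `x`, `y` and `igbp_class` dimensions, into the root group, so that the coordinates we assign to the root group later line up with both the AM and PM groups.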

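```python
# .ds gives a read-only view of the group; rename returns a new Dataset
# that replaces the original group
dt["Soil_Moisture_Retrieval_Data_AM"] = dt["Soil_Moisture_Retrieval_Data_AM"].ds.rename(
    {
        'phony_dim_0': 'y',
        'phony_dim_1': 'x',
        'phony_dim_2': 'igbp_class'
    }
)
# Copy the renamed AM variables, and their dimensions, into the root group
dt.update(other=dt["Soil_Moisture_Retrieval_Data_AM"])

dt["Soil_Moisture_Retrieval_Data_PM"] = dt["Soil_Moisture_Retrieval_Data_PM"].ds.rename(
    {
        'phony_dim_3': 'y',
        'phony_dim_4': 'x',
        'phony_dim_5': 'igbp_class'
    }
)
# Copy the renamed PM variables into the root group
dt.update(other=dt["Soil_Moisture_Retrieval_Data_PM"])
```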

```python
dt
```

We can now see that `x`, `y` and `igbp_class` dimensions are in the root group.

Now we can add coordinate values. We will generate the coordinates using an affine transformation matrix. See [working_with_smap_in_xarray.ipynb](./working_with_smap_in_xarray.ipynb) for an explanation of this step.
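For a north-up grid with no rotation, the affine transform reduces to two linear equations, one for `x` and one for `y`. Written out with NumPy alone, a minimal sketch computing the cell-centre coordinates looks like this. The grid parameters are the ones used in the next cell; the 406 x 964 size is the standard EASE-Grid 2.0 Global 36 km grid, whereas in the notebook the sizes are read from the data.

```python
import numpy as np

# EASE-Grid 2.0 Global 36 km grid parameters (metres)
grid_cell_width = 36032.220840584
grid_cell_height = -36032.220840584   # negative: rows run from north to south
x_upper_left_corner = -17367530.4451615
y_upper_left_corner = 7314540.8306386
nrows, ncolumns = 406, 964            # size of the 36 km global grid

# An affine transform with no rotation maps (column, row) to (x, y):
#   x = grid_cell_width * column + x_upper_left_corner
#   y = grid_cell_height * row + y_upper_left_corner
# Adding 0.5 to each index gives the centre of the grid cell rather than its corner.
column = np.arange(ncolumns) + 0.5
row = np.arange(nrows) + 0.5
x = grid_cell_width * column + x_upper_left_corner
y = grid_cell_height * row + y_upper_left_corner

print(x[0], x[-1], y[0], y[-1])
```

Because this grid is centred on the origin, the last cell centre mirrors the first (`x[-1]` is approximately `-x[0]`, and likewise for `y`), which is a quick sanity check on the coordinates.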

```python
# EASE-Grid 2.0 Global 36 km grid parameters (metres)
grid_cell_width = 36032.220840584
grid_cell_height = -36032.220840584
x_upper_left_corner = -17367530.4451615
y_upper_left_corner = 7314540.8306386

# Affine transform mapping (column, row) indices to projected (x, y) coordinates
transform = Affine(grid_cell_width, 0.0, x_upper_left_corner,
                   0.0, grid_cell_height, y_upper_left_corner)

# Grid sizes, read from the root group after renaming
nrows, ncolumns, nclass = dt.sizes['y'], dt.sizes['x'], dt.sizes['igbp_class']

# Indices offset by 0.5 give the centre of each grid cell
row = np.arange(0.5, nrows)
column = np.arange(0.5, ncolumns)

x, _ = transform * (column, 0.5)
_, y = transform * (0.5, row)

# Land cover classes are numbered 0 .. nclass - 1
igbp_class = np.arange(nclass)
```

We then assign the coordinate variables to the root group of the DataTree as `xarray.DataArray` objects.

```python
# Coordinates assigned at the root are inherited by the AM and PM groups
dt = dt.assign(
    {
        'x': xr.DataArray(x, dims='x'),
        'y': xr.DataArray(y, dims='y'),
        'igbp_class': xr.DataArray(igbp_class, dims='igbp_class')
    }
)
dt
```

Now that the coordinates have been added to the DataTree, we can plot the data on a map with coastlines or any other features we want to add.

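```python
# EPSG:6933 is WGS 84 / NSIDC EASE-Grid 2.0 Global, the projection of the SMAP grid
EASEGrid2 = ccrs.epsg(CRS.from_epsg(6933).to_epsg())

fig = plt.figure(figsize=(12, 7))
ax = fig.add_subplot(projection=EASEGrid2)

# The x/y coordinates are EASE-Grid 2.0 metres, so the data transform is the same projection
dt["Soil_Moisture_Retrieval_Data_AM"].soil_moisture.plot(ax=ax, transform=EASEGrid2)
ax.coastlines()
ax.set_title('Soil Moisture');
```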