# 02 Working with EMIT L2A Reflectance and ECOSTRESS L2 LSTE ET Products

---

**Summary**  

In the previous notebook, we found and downloaded concurrent EMIT L2A Reflectance and ECOSTRESS L2 Land Surface Temperature and Emissivity scenes over our region of interest. In this notebook, we will open and explore both datasets to better understand the structure, then we will conduct some of the necessary preprocessing to use the data together, including: applying quality data, reprojecting, placing data on a common grid, and cropping. 

**Background**

#TODO - Add some text about ECOSTRESS and EMIT Synergies and general info about reflectance and lst

**Requirements** 
 - [NASA Earthdata Account](https://urs.earthdata.nasa.gov/home) 
 - *No Python setup requirements if connected to the workshop cloud instance!*
 - Set up Python Environment - See **setup_instructions.md** in the `/setup/` folder

**Learning Objectives**  
- How to open and work with EMIT L2A Reflectance and ECOSTRESS L2T LSTE data
- How to apply quality data to EMIT and ECOSTRESS scenes
- How to reproject and regrid ECOSTRESS data
- How to crop EMIT and ECOSTRESS data
- How to automate this workflow

**Tutorial Outline**  

1. Setup  
2. Opening and Exploring EMIT Data  
    2.1 Applying Quality Masks to EMIT Data  
    2.2 Cropping EMIT Data  
    2.3 Writing Outputs  
3. Opening and Exploring ECOSTRESS Data  
    3.1 Applying Quality Masks to ECOSTRESS Data  
    3.2 Reprojecting and Regridding ECOSTRESS Data  
    3.3 Cropping ECOSTRESS Data  
    3.4 Writing Outputs  
4. Automation


## 1 Setup 

---

### 1.1 Import Python Libraries



In [None]:
# Import Packages
import os
import glob
import earthaccess
import numpy as np
import xarray as xr
from osgeo import gdal
import rasterio as rio
import rioxarray as rxr
import hvplot.xarray
import hvplot.pandas
import holoviews as hv
import geoviews as gv
import geopandas as gp
import sys
from modules.emit_tools import emit_xarray, ortho_xr

### 1.2 Define File Paths for One File of Each Type

Define file paths for one EMIT L2A Reflectance file, its EMIT L2A Mask file, an ECOSTRESS L2T LSTE file, and the matching ECOSTRESS L2T cloud mask file.

In [None]:
emit_fp = "../data/EMIT_L2A_RFL_001_20230405T190311_2309513_002.nc"
emit_qa_fp = "../data/EMIT_L2A_MASK_001_20230405T190311_2309513_002.nc"
eco_fp = "../data/ECOv002_L2T_LSTE_26921_001_10SGD_20230405T190258_0710_01_LST.tif"
eco_qa_fp = "../data/ECOv002_L2T_LSTE_26921_001_10SGD_20230405T190258_0710_01_cloud.tif"

Define some standards for our holoviz plots as a dictionary.

#TODO - determine what size/figure properties we want to put in this dict (can be unpacked with `hvplot.image(**fig_opts)`)

In [None]:

fig_opts = {'frame_width':1080,'frame_height':720,'alpha':0.7,'tiles':'ESRI'}

## 2 Opening and Exploring EMIT Reflectance Data

EMIT L2A Reflectance Data are distributed in a non-orthocorrected spatially raw NetCDF4 (.nc) format consisting of the data and its associated metadata. Inside the L2A Reflectance `.nc` file there are 3 groups. Groups can be thought of as containers to organize the data. 

1. The root group, which can be considered the main dataset, contains the reflectance data described by the downtrack, crosstrack, and bands dimensions.  
2. The `sensor_band_parameters` group contains the wavelength center and the full-width half maximum (FWHM) of each band.  
3. The `location` group contains latitude and longitude values at the center of each pixel described by the crosstrack and downtrack dimensions, as well as a geometry lookup table (GLT) described by the ortho_x and ortho_y dimensions. The GLT is an orthorectified image (EPSG:4326) consisting of 2 layers containing downtrack and crosstrack indices. These index positions allow us to quickly project the raw data onto this geographic grid.
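
To make the role of the GLT concrete, here is a minimal NumPy sketch of a lookup-table projection on synthetic data. It is not the `emit_tools` implementation. The helper name `apply_glt` is hypothetical, and the sketch assumes the GLT layers hold 1-based crosstrack and downtrack indices with 0 marking ortho cells that have no source pixel.

```python
import numpy as np

def apply_glt(raw, glt_x, glt_y, fill_value=np.nan):
    """Place raw (downtrack, crosstrack, bands) pixels onto the ortho grid.

    Assumption: glt_x / glt_y hold 1-based crosstrack / downtrack indices
    and 0 marks ortho cells with no source pixel.
    """
    n_bands = raw.shape[-1]
    ortho = np.full(glt_x.shape + (n_bands,), fill_value, dtype=float)
    valid = (glt_x > 0) & (glt_y > 0)
    # Convert to 0-based indices and gather the matching raw pixels
    ortho[valid] = raw[glt_y[valid] - 1, glt_x[valid] - 1, :]
    return ortho

# Synthetic 2 x 2 raw scene with 2 bands, projected onto a 2 x 3 ortho grid
raw = np.arange(8, dtype=float).reshape(2, 2, 2)
glt_x = np.array([[1, 2, 0], [2, 1, 0]])
glt_y = np.array([[1, 1, 0], [2, 2, 0]])
ortho = apply_glt(raw, glt_x, glt_y)
print(ortho.shape)
```

Each ortho cell simply copies the spectrum of the raw pixel it points to, which is why the projection is fast: no interpolation is involved.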

To work with the EMIT data, we will use the `emit_tools` module. There are other ways to work with the data, and a more thorough explanation of `emit_tools` is available in the [EMIT-Data-Resources Repository](https://github.com/nasa/EMIT-Data-Resources).

Open the example EMIT scene using the `emit_xarray` function. In this step we will use the `ortho=True` argument to orthorectify the scene using the included GLT.

In [None]:
emit_ds = emit_xarray(emit_fp, ortho=True)
emit_ds

We can plot the spectrum of the pixel closest to a given latitude and longitude using the `sel` function from `xarray`. First, use the `good_wavelengths` flag from the `sensor_band_parameters` group to mask out the bands in water absorption regions, which were assigned a reflectance value of -0.01. Data around 1320-1440 nm and 1770-1970 nm are typically noisy because of moisture in the atmosphere; these spectral regions offer little information about targets and can be excluded from calculations.

In [None]:
emit_ds['reflectance'].data[:,:,emit_ds['good_wavelengths'].data==0] = np.nan
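
The indexing pattern in the cell above can be tried on a small synthetic cube. This is only an illustrative sketch: the array sizes and flag values are made up, and the names mirror the EMIT variables for readability.

```python
import numpy as np

# Synthetic cube: 2 x 2 pixels, 5 bands
reflectance = np.full((2, 2, 5), 0.3)
reflectance[:, :, 2] = -0.01                   # stand-in for a flagged water absorption band
good_wavelengths = np.array([1, 1, 0, 1, 1])   # 1 = usable band, 0 = flagged band

# A 1-D boolean index on the last axis selects whole bands for every pixel
reflectance[:, :, good_wavelengths == 0] = np.nan

print(np.isnan(reflectance).sum())  # 4 NaNs: one flagged band x four pixels
```

Setting the flagged bands to `np.nan` rather than dropping them keeps the wavelength axis intact, so plots show a gap instead of a misleading line through the absorption regions.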

Now select a point and plot its spectrum.

In [None]:
point = emit_ds.sel(latitude=34.717,longitude=-120.042, method='nearest')
point.hvplot.line(y='reflectance',x='wavelengths', color='black').opts(
    title=f'Latitude = {point.latitude.values.round(3)}, Longitude = {point.longitude.values.round(3)}')
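
For intuition about what a nearest-neighbour lookup does on each coordinate axis, here is a small NumPy sketch. The helper `nearest_index` and the coordinate values are hypothetical; `xarray` handles this internally when `method='nearest'` is passed to `sel`.

```python
import numpy as np

def nearest_index(coords, value):
    """Index of the coordinate closest to value (hypothetical helper)."""
    return int(np.abs(np.asarray(coords) - value).argmin())

# Made-up coordinate axes around the point used above
latitude = np.array([34.80, 34.75, 34.70, 34.65])
longitude = np.array([-120.10, -120.05, -120.00])

i = nearest_index(latitude, 34.717)
j = nearest_index(longitude, -120.042)
print(latitude[i], longitude[j])
```

The returned pixel is generally not exactly at the requested location, which is why the plot title above reports the coordinates of the selected pixel rather than the requested ones.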

We can also plot individual bands spatially by selecting a wavelength, then plotting.

In [None]:
emit_layer = emit_ds.sel(wavelengths=850,method='nearest')
emit_layer.hvplot.image(geo=True,cmap='viridis').opts(title=f"{emit_layer.wavelengths:.3f} {emit_layer.wavelengths.units}")

### 2.1 Applying Quality Masks to EMIT Data

The EMIT L2A Mask file contains some bands that are direct masks (Cloud, Dilated, Cirrus, Water, Spacecraft) and some (AOD550 and H2O (g cm-2)) that contain information calculated during the L2A reflectance retrieval. The latter may be used for additional screening, depending on the application. The Aggregate Flag is the mask used during EMIT L2B Mineralogy calculations. We will use it here as well, but it may not suit every application.

> Note: It is more memory efficient to apply the mask before orthorectifying, so during the automation section we will do that.

In [None]:
emit_mask = emit_xarray(emit_qa_fp, ortho=True)
emit_mask

List the quality flags contained in the `mask_bands` dimension.

In [None]:
emit_mask.mask_bands.data.tolist()

As mentioned, we will use the `Aggregate Flag`. Select that band, then plot it to visualize.

In [None]:
emit_aggregate_mask = emit_mask.sel(mask_bands='Aggregate Flag')

In [None]:
emit_aggregate_mask.hvplot.image(geo=True, cmap='viridis')

Apply the mask to our EMIT data by setting reflectance to `np.nan` wherever `mask.data == 1`.

In [None]:
emit_ds.reflectance.data[emit_aggregate_mask.mask.data == 1] = np.nan
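
The assignment above relies on the mask and the reflectance cube sharing the same two spatial dimensions, which holds here because both files were opened with `ortho=True`. A synthetic sketch of the same pattern, with made-up shapes and flag positions:

```python
import numpy as np

cube = np.ones((3, 4, 5))             # (latitude, longitude, wavelengths)
flag = np.zeros((3, 4), dtype=int)    # 1 = flagged pixel
flag[0, 1] = 1
flag[2, 3] = 1

# A 2-D boolean index on the first two axes blanks every band of the flagged pixels
cube[flag == 1] = np.nan

flagged_fraction = (flag == 1).mean()
print(f"{flagged_fraction:.1%} of pixels masked")
```

Checking the flagged fraction before masking is a quick way to judge whether a scene retains enough valid pixels to be useful.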

We can confirm our masking worked with a spatial plot.

In [None]:
emit_layer_filtered_plot = emit_ds.sel(wavelengths=850, method='nearest').hvplot.image(geo=True,cmap='viridis',**fig_opts)
emit_layer_filtered_plot

### 2.2 Cropping EMIT data to a Region of Interest

To crop our dataset to our region of interest (ROI), we first need to open a vector file of the region boundary. Open the included `geojson` for Sedgwick Reserve and plot it on top of our EMIT 850 nm reflectance spatial plot.

In [None]:
shape = gp.read_file("../data/sedgwick_boundary_epsg4326.geojson")
shape

In [None]:
emit_layer_filtered_plot*shape.hvplot(geo=True,color='#d95f02',alpha=0.5)

Now use the `clip` function from `rioxarray` (accessed through the `.rio` accessor) to crop the data to our ROI using our shape's `geometry` and `crs`. The `all_touched=True` argument ensures that every pixel touched by our polygon is included.

In [None]:
emit_sedgwick = emit_ds.rio.clip(shape.geometry.values,shape.crs, all_touched=True)
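
To illustrate why `all_touched=True` keeps more pixels, the sketch below compares a pixel-center rule with a "touched" rule on a tiny grid. It is a simplification under stated assumptions: the ROI is an axis-aligned box rather than a polygon, and `box_masks` is a hypothetical helper, not part of `rioxarray` or `rasterio`.

```python
import numpy as np

def box_masks(x_centers, y_centers, pixel_size, roi):
    """Return (center_inside, touched) boolean masks for an axis-aligned ROI box.

    roi = (xmin, ymin, xmax, ymax). Real clipping handles arbitrary polygons.
    """
    xmin, ymin, xmax, ymax = roi
    xx, yy = np.meshgrid(x_centers, y_centers)
    half = pixel_size / 2
    # Pixel kept only if its center falls inside the box
    center_inside = (xx >= xmin) & (xx <= xmax) & (yy >= ymin) & (yy <= ymax)
    # Pixel kept if its footprint overlaps the box at all
    touched = (xx + half > xmin) & (xx - half < xmax) & (yy + half > ymin) & (yy - half < ymax)
    return center_inside, touched

x = np.array([0.5, 1.5, 2.5, 3.5])
y = np.array([0.5, 1.5, 2.5])
center_inside, touched = box_masks(x, y, pixel_size=1.0, roi=(0.8, 0.8, 2.2, 2.2))
print(center_inside.sum(), touched.sum())
```

For a small ROI relative to the pixel size, the difference between the two rules can be a large share of the retained pixels, so the choice matters for sites like this one.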

Plot the cropped data.

In [None]:
emit_sedgwick.sel(wavelengths=850,method='nearest').hvplot.image(geo=True, cmap='viridis', **fig_opts)

### 2.3 Write an output

Lastly for our EMIT dataset, we can write a smaller output to use in later notebooks, for example to calculate canopy water content. We use the `granule_id` from the dataset to keep a similar naming convention.

In [None]:
# Write Clipped Output
emit_sedgwick.to_netcdf(f'../data/{emit_sedgwick.granule_id}_sedgwick.nc')

## 3 Working with ECOSTRESS L2T Land Surface Temperature and Emissivity

For this example we're only taking a look at the Land Surface Temperature. 

Open the LST file using `open_rasterio` from the `rioxarray` library. Since the file consists of only 1 layer, we can `squeeze` it, removing the `band` dimension.

In [None]:
eco_lst_ds = rxr.open_rasterio(eco_fp).squeeze('band', drop=True)
eco_lst_ds

#TODO - The ecostress plots need to be reprojected before plotting with `geoviews` - that fits better later in the workflow though.

In [None]:
eco_lst_ds.hvplot.image(aspect='equal',cmap='inferno')

In [None]:
eco_cloud_ds = rxr.open_rasterio(eco_qa_fp).squeeze('band', drop=True)
eco_cloud_ds


In [None]:
eco_cloud_ds.hvplot.image(aspect='equal',cmap='greys')

#TODO -  Not sure this cloud masking is necessary, it looks like the LST product has already been masked?

In [None]:
eco_lst_ds.data[eco_cloud_ds.data == 1] = np.nan

In [None]:
eco_lst_ds_regrid = eco_lst_ds.rio.reproject_match(emit_sedgwick)

Because `reproject_match` uses the cropped EMIT dataset as its target, the regridded ECOSTRESS data take on that dataset's CRS, resolution, and bounding box.
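
The idea of placing one raster on another raster's grid can be sketched with plain NumPy. This is a simplified illustration, not what `rioxarray` does internally: it assumes both grids already share a CRS and uses nearest-neighbour sampling, and `regrid_nearest` is a hypothetical helper.

```python
import numpy as np

def regrid_nearest(src, src_x, src_y, dst_x, dst_y):
    """Sample src (y, x) on the destination grid by nearest source cell center.

    Assumption: both grids share one CRS. A real reprojection also
    transforms coordinates between CRSs.
    """
    ix = np.abs(src_x[None, :] - dst_x[:, None]).argmin(axis=1)
    iy = np.abs(src_y[None, :] - dst_y[:, None]).argmin(axis=1)
    return src[np.ix_(iy, ix)]

src = np.arange(16, dtype=float).reshape(4, 4)
src_x = np.array([0.0, 1.0, 2.0, 3.0])
src_y = np.array([3.0, 2.0, 1.0, 0.0])   # north-up: y decreases with row
dst_x = np.array([0.9, 2.2])             # smaller target grid
dst_y = np.array([2.1, 0.8])

out = regrid_nearest(src, src_x, src_y, dst_x, dst_y)
print(out)
```

The output has the shape of the target grid, not the source, which is why regridding to the cropped EMIT scene also crops the ECOSTRESS data to the same extent.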

In [None]:
eco_lst_ds_regrid.hvplot.image(geo=True,cmap='inferno',**fig_opts)

In [None]:
eco_sedgwick = eco_lst_ds_regrid.rio.clip(shape.geometry.values,shape.crs, all_touched=True)

In [None]:
eco_sedgwick.hvplot.image(geo=True,cmap='inferno',**fig_opts)

Note: the interactive plot below doesn't work with geoviews. Not sure if it's something we are interested in including, but I thought it was a cool idea.

In [None]:
# Define Ecostress Map - ESRI tiles will not work
eco_map = eco_sedgwick.hvplot.image(aspect='equal',cmap='inferno')

# Stream of X and Y positional data
posxy = hv.streams.PointerXY(source=eco_map, x=-120.042, y=34.717) 
clickxy = hv.streams.Tap(source=eco_map, x=-120.042, y=34.717) 

# Function to build a new spectral plot based on mouse hover positional information retrieved from the ECOSTRESS map using our cropped EMIT reflectance dataset
def point_spectra(x,y):
    return emit_sedgwick.sel(longitude=x,latitude=y,method='nearest').hvplot.line(y='reflectance',x='wavelengths',
                                                                           color='#1b9e77', frame_width=400)
# Function to build spectral plot of clicked location to show on hover stream plot
def click_spectra(x,y):
    clicked = emit_sedgwick.sel(longitude=x,latitude=y,method='nearest')
    return clicked.hvplot.line(y='reflectance',x='wavelengths', color='black', frame_width=400).opts(
        title = f'Latitude = {clicked.latitude.values.round(3)}, Longitude = {clicked.longitude.values.round(3)}')
# Define the Dynamic Maps
point_dmap = hv.DynamicMap(point_spectra, streams=[posxy])
click_dmap = hv.DynamicMap(click_spectra, streams=[clickxy])

# Plot the Map and Dynamic Map side by side
(eco_map + click_dmap*point_dmap)

In [None]:
eco_outname = f"../data/{eco_fp.split('/')[-1].split('.')[0]}_sedgwick.tif"
eco_sedgwick.rio.to_raster(raster_path=eco_outname, driver='COG')

## 4 Automation

We can simplify and automate the above for all of the files we downloaded.

In [None]:
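# List files
eco_lst_files = glob.glob("../data/*LST.tif")
# Skip any cropped "_sedgwick" EMIT outputs written to the same folder in section 2.3
emit_rfl_files = [f for f in glob.glob("../data/EMIT_L2A_RFL_*.nc") if "_sedgwick" not in f]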

In [None]:
eco_lst_files

In [None]:
# Process EMIT Scenes
for rfl_fp in emit_rfl_files:
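    # Get Granule ID from the filename (os.path works on any operating system)
    granule_id = os.path.splitext(os.path.basename(rfl_fp))[0]
    # Set Output Filepath, creating the outputs folder if needed
    os.makedirs("../data/outputs", exist_ok=True)
    out_fp = f"../data/outputs/{granule_id}_sedgwick.nc"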

    # Check if desired output exists
    if not os.path.isfile(out_fp):
        # Define Path for the correct Mask File
        mask_fp = rfl_fp.replace("RFL","MASK")
        
        # Read in data
        emit_ds = emit_xarray(rfl_fp)
        # Select Aggregate Mask and Retrieve Array
        emit_mask = emit_xarray(mask_fp).sel(mask_bands="Aggregate Flag").mask.data
        # Apply Mask
        emit_ds.reflectance.data[emit_mask==1] = np.nan
        # Orthorectify Scene
        emit_ds = ortho_xr(emit_ds)
        
        # Read in Shapefile and Clip
        shape = gp.read_file("../data/sedgwick_boundary_epsg4326.geojson")
        emit_sedgwick = emit_ds.rio.clip(shape.geometry.values,shape.crs, all_touched=True)
        
        # Write Output
        emit_sedgwick.to_netcdf(out_fp)
    else: 
        print("Output File Already Exists")
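
Deriving an ID from a file path is platform-sensitive: splitting on a backslash only works for Windows-style paths. The sketch below shows a portable alternative with `os.path`. The helper name `granule_id_from_path` is hypothetical.

```python
import os

def granule_id_from_path(path):
    """File name without directory or extension (hypothetical helper)."""
    return os.path.splitext(os.path.basename(path))[0]

rfl_fp = "../data/EMIT_L2A_RFL_001_20230405T190311_2309513_002.nc"

# Splitting on a backslash finds nothing in this path, so the later split on "."
# returns the empty string that precedes the leading ".."
backslash_split = rfl_fp.split("\\")[-1].split(".")[0]

granule_id = granule_id_from_path(rfl_fp)
out_fp = f"../data/outputs/{granule_id}_sedgwick.nc"
print(repr(backslash_split))
print(granule_id)
```

With the backslash approach on a forward-slash path, every scene would map to the same output name, so only the first scene would be written and the rest reported as already existing.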

In [None]:
# # Check Files
# ds = xr.open_dataset("../data/outputs/EMIT_L2A_RFL_001_20230405T190311_2309513_002_sedgwick.nc")
# ds.reflectance.sel(wavelengths=850,method='nearest').hvplot.image(geo=True, cmap='viridis')

# TO DO - Check standard gridding for EMIT, way to open concurrent granule

In [None]:
# Process ECOSTRESS Scenes
# Read in the ROI once
shape = gp.read_file("../data/sedgwick_boundary_epsg4326.geojson")

for lst_fp in eco_lst_files:
    # Get Granule ID from the filename (os.path works on any operating system)
    granule_id = os.path.splitext(os.path.basename(lst_fp))[0]
    # Set Output Filepath, creating the outputs folder if needed
    os.makedirs("../data/outputs", exist_ok=True)
    out_fp = f"../data/outputs/{granule_id}_sedgwick.tif"

    # Check if desired output exists
    if not os.path.isfile(out_fp):
        # Define Path for the matching cloud mask; replace only the suffix so "LSTE" is untouched
        mask_fp = lst_fp.replace("_LST.tif", "_cloud.tif")

        # Read in data
        eco_lst = rxr.open_rasterio(lst_fp).squeeze('band', drop=True)
        # Read in the cloud mask and retrieve the array
        eco_mask = rxr.open_rasterio(mask_fp).squeeze('band', drop=True).data
        # Apply Mask
        eco_lst.data[eco_mask==1] = np.nan
        # Reproject and regrid to the cropped EMIT scene (`emit_sedgwick`) currently in memory
        eco_lst_regrid = eco_lst.rio.reproject_match(emit_sedgwick)
        # Clip to the ROI
        eco_sedgwick = eco_lst_regrid.rio.clip(shape.geometry.values, shape.crs, all_touched=True)

        # Write Output as a Cloud Optimized GeoTIFF, as in section 3
        eco_sedgwick.rio.to_raster(raster_path=out_fp, driver='COG')
    else:
        print("Output File Already Exists")
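
One detail worth checking when deriving the cloud mask path from the LST path: the ECOSTRESS file name contains "LST" twice, once inside "LSTE" and once as the layer suffix. A short sketch using the file name from section 1.2 shows the difference between replacing every occurrence and replacing only the suffix.

```python
lst_fp = "../data/ECOv002_L2T_LSTE_26921_001_10SGD_20230405T190258_0710_01_LST.tif"

# Replacing every "LST" also rewrites the product name "LSTE"
naive = lst_fp.replace("LST", "cloud")

# Replacing only the layer suffix leaves the product name intact
suffix_only = lst_fp.replace("_LST.tif", "_cloud.tif")

print(naive)
print(suffix_only)
```

Only the suffix-only version matches the cloud file path defined at the start of the notebook.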
