# How to: Extracting EMIT Spectra using a Shapefile/GeoJSON

**Summary**  

In this notebook we will open a netCDF4 file from the Earth Surface Mineral Dust Source Investigation (EMIT) as an `xarray.Dataset`. We will then extract or clip the data to an area of interest using a `.geojson` file (a shapefile will also work). The workflows outlined here work with L2A reflectance or L1B radiance data.

**Requirements:**
+ A NASA [Earthdata Login](https://urs.earthdata.nasa.gov/) account is required to download EMIT data   
+ Selected the `emit_tutorials` environment as the kernel for this notebook.
  + For instructions on setting up the environment, follow the `setup_instructions.md` included in the `/setup/` folder of the repository.  
+ Downloaded the necessary EMIT files to the `../data/` folder.
  + Instructions and a list of files can be found in the `setup_instructions.md` included in the `/setup/` folder of the repository.

**Learning Objectives**  
- How to open an EMIT dataset as an `xarray.Dataset`
- How to extract values or clip an EMIT dataset to a region of interest
- How to write a new netCDF4 output using the clipped data

---

Import the required Python libraries.

In [None]:
# Import Packages
import os
from osgeo import gdal
import xarray as xr
import rasterio as rio
import rioxarray as rxr
import hvplot.xarray
import holoviews as hv
import geopandas as gp
import sys
sys.path.append('../modules/')
from emit_tools import emit_xarray

Set the path to the downloaded EMIT data as an object. In this example we use an L2A Reflectance file, but this workflow will also work for an L1B Radiance file.

In [None]:
fp = '/home/jovyan/shared/2023-emit-tutorials/data/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc'
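
The path above is specific to the environment this notebook was written in. If your files live elsewhere (for example in the `../data/` folder mentioned in the requirements), checking the path before opening can give a clearer error than a failed read. The following is a minimal sketch; the helper name `resolve_granule` is hypothetical and not part of `emit_tools`.

```python
from pathlib import Path


def resolve_granule(data_dir, filename):
    """Return the path to a granule file, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of emit_tools.
    """
    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"EMIT file not found: {path}")
    return path
```

For example, `fp = str(resolve_granule('../data', 'EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc'))` would set `fp` only if the file is present. The `str()` call is there on the assumption that a plain string path is expected downstream.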

Open the downloaded file defined as `fp`. To do this, we will use the `emit_xarray` function from the `emit_tools` module. This module contains a few helpful functions that can be used with EMIT data.

In [None]:
ds = emit_xarray(fp)
ds

Using the `read_file()` function from `geopandas`, read in the `.geojson` file containing the polygon you wish to extract.

In [None]:
shape = gp.read_file('../data/isla_gaviota.geojson')
shape
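
If you do not have a `.geojson` file for your own area of interest, one can be written by hand. The sketch below builds a minimal GeoJSON `FeatureCollection` containing a single polygon using only the standard library. The coordinates, the `name` property, and the output filename are placeholders for illustration, not the contents of `isla_gaviota.geojson`. GeoJSON coordinates are ordered longitude, latitude (WGS 84), and a polygon ring must end on its first vertex.

```python
import json

# Placeholder ring as [longitude, latitude] pairs; replace with your own area.
ring = [
    [10.0, 20.0],
    [10.1, 20.0],
    [10.1, 20.1],
    [10.0, 20.1],
    [10.0, 20.0],  # first vertex repeated to close the ring
]

feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "example_roi"},  # hypothetical attribute
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        }
    ],
}

geojson_text = json.dumps(feature_collection, indent=2)

# Uncomment to write the file (hypothetical filename):
# with open('../data/example_roi.geojson', 'w') as f:
#     f.write(geojson_text)
```

A file written this way can be read with `gp.read_file()` in the same way as the example above.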

Plot the polygon object we read onto a spatial plot of the dataset to understand where it's located.

In [None]:
ds.sel(bands=63).hvplot.image(cmap='greys', rasterize=True) * shape.hvplot(color='yellow', alpha=0.4, aspect='equal')

Use the `rio.clip` method provided by `rioxarray` to clip the dataset to the polygons in the `geopandas.GeoDataFrame`. Setting `all_touched` to `True` includes every pixel that the polygon touches, not only pixels whose centers fall inside it.

In [None]:
clipped = ds.rio.clip(shape.geometry.values, shape.crs, all_touched=True)
clipped
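
To see what `all_touched` changes, the sketch below compares the two pixel-selection rules on a tiny 4 x 4 grid of unit pixels and a rectangular region. It is a pure-Python illustration of the idea, not the rasterization code that `rioxarray` or `rasterio` actually runs; the function name and the rectangle bounds are made up for the example.

```python
def select_pixels(n_rows, n_cols, xmin, ymin, xmax, ymax, all_touched):
    """Return the set of (row, col) pixels selected for an axis-aligned rectangle.

    Pixel (row, col) covers x in [col, col + 1] and y in [row, row + 1].
    """
    selected = set()
    for row in range(n_rows):
        for col in range(n_cols):
            if all_touched:
                # Keep the pixel if any part of it overlaps the rectangle.
                keep = col < xmax and col + 1 > xmin and row < ymax and row + 1 > ymin
            else:
                # Keep the pixel only if its center falls inside the rectangle.
                cx, cy = col + 0.5, row + 0.5
                keep = xmin <= cx <= xmax and ymin <= cy <= ymax
            if keep:
                selected.add((row, col))
    return selected


# Illustrative rectangle that cuts through the edge pixels of a 3 x 3 block.
bounds = dict(xmin=0.6, ymin=0.6, xmax=2.4, ymax=2.4)
center_only = select_pixels(4, 4, all_touched=False, **bounds)
touched = select_pixels(4, 4, all_touched=True, **bounds)
print(len(center_only), len(touched))  # 1 9
```

In this toy case the center rule keeps a single pixel while the touched rule keeps nine. For a small polygon relative to the pixel size, the center rule can leave very few pixels, which is one reason to set `all_touched=True` when clipping.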

To view the clipped image, select a band from the `clipped` dataset and plot it spatially.

In [None]:
clipped.isel(bands=40).hvplot.image(cmap='viridis', aspect = 'equal', frame_width=500, rasterize=True)

Now we can save the clipped `xarray.Dataset` as a netCDF4 output that can be reopened using the `xarray.open_dataset` function. 

In [None]:
clipped.to_netcdf('../data/clipped_data.nc')
# Example of reopening the saved file
# ds = xr.open_dataset('../data/clipped_data.nc')

---

## Contact Info:  

Email: LPDAAC@usgs.gov  
Voice: +1-866-573-3222  
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹  
Website: <https://lpdaac.usgs.gov/>  
Date last modified: 01-09-2023  

¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.  