# How to: Extracting EMIT Spectra using a Shapefile/GeoJSON

**Summary**  

In this notebook we will open a netCDF4 file from the Earth Surface Mineral Dust Source Investigation (EMIT) as an `xarray.Dataset`. We will then extract values from, or clip the data to, an area defined in a `.geojson` file (a shapefile will also work). The workflows outlined here work with either L2A reflectance or L1B radiance data.

**Requirements:**
+ A NASA [Earthdata Login](https://urs.earthdata.nasa.gov/) account is required to download EMIT data   
+ Select the `emit_tutorials` environment as the kernel for this notebook.
  + For instructions on setting up the environment, follow the `setup_instructions.md` included in the `/setup/` folder of the repository.  

**Learning Objectives**  
- How to open an EMIT dataset as an `xarray.Dataset`
- How to extract values or clip an EMIT dataset to a region of interest
- How to write a new netCDF4 output using the clipped data

---

Import the required Python libraries.

In [1]:
# Import Packages
import os
import earthaccess
import xarray as xr
from osgeo import gdal
import rasterio as rio
import rioxarray as rxr
import hvplot.xarray
import hvplot.pandas
import holoviews as hv
import geopandas as gp
import sys
sys.path.append('../modules/')
from emit_tools import emit_xarray

Log in to your NASA Earthdata account and create a `.netrc` file using the `login` function from the `earthaccess` library. If you do not have an Earthdata account, you can create one [here](https://urs.earthdata.nasa.gov/home). 

In [2]:
earthaccess.login(persist=True)

EARTHDATA_USERNAME and EARTHDATA_PASSWORD are not set in the current environment, try setting them or use a different strategy (netrc, interactive)
You're now authenticated with NASA Earthdata Login
Using token with expiration date: 07/08/2023
Using .netrc file for EDL


<earthaccess.auth.Auth at 0x172dcb9bf10>
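Because `earthaccess.login(persist=True)` stores your credentials in a `.netrc` file, it can be handy to check whether an Earthdata entry already exists. The sketch below uses Python's standard-library `netrc` module and assumes the usual `machine urs.earthdata.nasa.gov login ... password ...` entry layout in `~/.netrc`; the helper name `has_earthdata_entry` is hypothetical and not part of `earthaccess`.

```python
import netrc
import os
from typing import Optional

# Earthdata Login host (assumed to be the machine name used in the .netrc entry)
EDL_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_entry(path: Optional[str] = None) -> bool:
    """Return True if the netrc file at `path` (default: ~/.netrc) holds an
    Earthdata Login entry with both a login and a password."""
    path = path or os.path.join(os.path.expanduser("~"), ".netrc")
    if not os.path.isfile(path):
        return False
    try:
        # authenticators() returns (login, account, password) or None
        auth = netrc.netrc(path).authenticators(EDL_HOST)
    except netrc.NetrcParseError:
        return False
    return bool(auth and auth[0] and auth[2])


print(has_earthdata_entry())
```

If this prints `False`, running the login cell again (or using the interactive strategy) will create the entry.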

For this notebook we will download the necessary files using `earthaccess`. You can also access the data in place or stream it, but this can be slow because of the large file sizes. Provide a URL for an EMIT L2A Reflectance granule.

In [3]:
url = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220903T163129_2224611_012/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc'

Get an HTTPS session using your Earthdata Login credentials, set a local path to save the file, and download the granule asset. This may take a while; the reflectance file is approximately 1.8 GB. 

In [4]:
# Get Https Session using Earthdata Login Info
fs = earthaccess.get_fsspec_https_session()
# Retrieve granule asset ID from URL (to maintain existing naming convention)
granule_asset_id = url.split('/')[-1]
# Define Local Filepath
fp = f'../../data/{granule_asset_id}'
# Download the Granule Asset if it doesn't exist
if not os.path.isfile(fp):
    fs.download(url, fp)

Open the downloaded file, whose path is stored in `fp`, using the `emit_xarray` function from the `emit_tools` module. This module contains a few helpful functions for working with EMIT data.

In [5]:
ds = emit_xarray(fp, ortho=True)
ds

Using the `read_file()` function from `geopandas`, read in the `.geojson` file containing the polygon you wish to extract.

In [6]:
shape = gp.read_file('../../data/isla_gaviota.geojson')
shape

|   | geometry |
|---|---|
| 0 | POLYGON ((-62.14758 -39.88951, -62.16900 -39.8... |
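To show what `gp.read_file` is parsing, here is a minimal sketch that reads a hypothetical single-polygon GeoJSON FeatureCollection with the standard-library `json` module. The coordinates are illustrative only, not the contents of `isla_gaviota.geojson`. GeoJSON (RFC 7946) stores positions as longitude, latitude in WGS 84, which is why `shape.crs` is passed alongside the geometries when clipping.

```python
import json

# Hypothetical GeoJSON text; the coordinates are illustrative only.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-62.2, -39.9], [-62.1, -39.9],
                         [-62.1, -39.8], [-62.2, -39.8],
                         [-62.2, -39.9]]]
      }
    }
  ]
}
"""


def polygon_exterior_rings(geojson_text):
    """Return the exterior ring of every Polygon feature as (lon, lat) tuples."""
    data = json.loads(geojson_text)
    rings = []
    for feature in data["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            # The first ring of a Polygon is its exterior; any later rings are holes.
            rings.append([tuple(pt) for pt in geom["coordinates"][0]])
    return rings


rings = polygon_exterior_rings(GEOJSON_TEXT)
print(len(rings), rings[0][:2])
```

Note that a GeoJSON ring is closed: its first and last positions are identical.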


Now plot the polygon we've loaded, overlaid on a plot of the dataset.

In [7]:
ds.sel(wavelengths=800,method='nearest').hvplot.image(cmap='greys', frame_width=500, rasterize=True)*shape.hvplot(color='#d95f02',aspect='equal', alpha=0.5)
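The plotting cells pick a single band with `sel(wavelengths=800, method='nearest')`, which returns the band whose center wavelength is closest to 800 nm instead of requiring an exact match. A minimal NumPy sketch of that lookup, using a hypothetical, evenly spaced set of band centers (not EMIT's actual wavelengths):

```python
import numpy as np

# Hypothetical band centers in nanometers (illustrative, not EMIT's real values).
wavelengths = np.arange(380.0, 2500.0, 7.4)


def nearest_band_index(centers, target):
    """Index of the band center closest to `target`; ties go to the first match."""
    return int(np.argmin(np.abs(centers - target)))


idx = nearest_band_index(wavelengths, 800.0)
print(idx, wavelengths[idx])
```

Here no band sits exactly at 800 nm, so the closest center is returned, which is the same behavior that makes `method='nearest'` convenient for hyperspectral data.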

Use the `clip` method of the `rioxarray` `.rio` accessor to clip the dataset to the polygons in the `geopandas.GeoDataFrame`. By default, only pixels whose centers fall inside a polygon are kept; setting `all_touched` to `True` also includes pixels that intersect the edges of the polygon. 

In [8]:
clipped = ds.rio.clip(shape.geometry.values,shape.crs, all_touched=True)
clipped
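To make the effect of `all_touched` concrete, the pure-Python sketch below approximates both rules on a small hypothetical grid: an even-odd point-in-polygon test decides whether a pixel center is inside (the default rule), and a geometric pixel/polygon intersection test decides whether a pixel is "touched". This only illustrates the idea; `rasterio`'s rasterizer may treat pixels that merely share an edge or corner with the polygon differently.

```python
def point_in_polygon(x, y, poly):
    """Even-odd ray-casting test; `poly` is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, c):
    """For collinear a, b, c: does c lie within the segment a-b?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """Standard orientation test for whether segments p1-p2 and q1-q2 meet."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))


def pixel_touched(poly, x0, y0, x1, y1):
    """True if the polygon intersects the pixel box [x0, x1] x [y0, y1]."""
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    if any(x0 <= px <= x1 and y0 <= py <= y1 for px, py in poly):
        return True  # a polygon vertex lies inside the pixel
    if any(point_in_polygon(cx, cy, poly) for cx, cy in corners):
        return True  # a pixel corner lies inside the polygon
    poly_edges = list(zip(poly, poly[1:] + poly[:1]))
    box_edges = list(zip(corners, corners[1:] + corners[:1]))
    return any(segments_intersect(a, b, c, d)
               for a, b in poly_edges for c, d in box_edges)


def pixel_masks(poly, n_cols, n_rows, res=1.0):
    """Return (center_mask, touched_mask) as nested lists indexed [row][col]
    for a grid of square pixels whose lower-left corner is at (0, 0)."""
    center, touched = [], []
    for r in range(n_rows):
        y0, y1 = r * res, (r + 1) * res
        center.append([point_in_polygon((c + 0.5) * res, (r + 0.5) * res, poly)
                       for c in range(n_cols)])
        touched.append([pixel_touched(poly, c * res, y0, (c + 1) * res, y1)
                        for c in range(n_cols)])
    return center, touched


# Hypothetical square that contains one pixel center but overlaps nine pixels.
square = [(0.8, 0.8), (2.2, 0.8), (2.2, 2.2), (0.8, 2.2)]
center, touched = pixel_masks(square, n_cols=4, n_rows=4)
print(sum(map(sum, center)), sum(map(sum, touched)))
```

For a small polygon such as an island, the difference can be large relative to the polygon's size, which is why `all_touched=True` is a reasonable choice when you would rather keep edge pixels than lose them.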

To view the clipped image, select a band from the `clipped` dataset and plot it spatially.

In [9]:
clipped.sel(wavelengths=800,method='nearest').hvplot.image(cmap='viridis', aspect = 'equal', frame_width=500, rasterize=True)
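Once the data are clipped, a common next step is to summarize the spectra inside the polygon. Pixels inside the bounding box but outside the polygon are masked by the clip, so a summary should skip them. The NumPy sketch below assumes those pixels are NaN (or carry a numeric fill value that is converted to NaN first) and averages a hypothetical `(rows, cols, bands)` array over all valid pixels; the helper `mean_spectrum` and the sample values are illustrative only.

```python
import numpy as np


def mean_spectrum(cube, fill_value=None):
    """Mean spectrum over all valid pixels of a (rows, cols, bands) array.

    Pixels outside the clip polygon are assumed to be NaN; if `fill_value`
    is given, values equal to it are converted to NaN first.
    """
    data = np.asarray(cube, dtype=float)
    if fill_value is not None:
        data = np.where(data == fill_value, np.nan, data)
    pixels = data.reshape(-1, data.shape[-1])  # flatten to (n_pixels, bands)
    return np.nanmean(pixels, axis=0)          # NaN pixels are ignored per band


# Hypothetical 2 x 2 clip with 3 bands; one pixel fell outside the polygon.
cube = np.array([[[0.1, 0.2, 0.3], [np.nan, np.nan, np.nan]],
                 [[0.3, 0.4, 0.5], [0.2, 0.3, 0.4]]])
print(mean_spectrum(cube))
```

With the `xarray.Dataset` itself, the equivalent would be a `.mean()` over the two spatial dimensions with `skipna=True`; check `clipped.dims` for their names.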

Now we can save the clipped `xarray.Dataset` as a netCDF4 output that can be reopened using the `xarray.open_dataset` function. 

In [10]:
clipped.to_netcdf('../../data/clipped_data.nc')
# Example for Opening 
#ds = xr.open_dataset('../../data/clipped_data.nc')

---

## Contact Info:  

Email: LPDAAC@usgs.gov  
Voice: +1-866-573-3222  
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹  
Website: <https://lpdaac.usgs.gov/>  
Date last modified: 06-30-2023  

¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.  