# How to: Extracting EMIT Spectra at Specified Coordinates

**Summary**  

In this notebook we will open a netCDF4 file from the Earth Surface Mineral Dust Source Investigation (EMIT) as an `xarray.Dataset`. We will then read point coordinates from a `.csv` file, extract the spectra at those coordinates into a dataframe, and save and plot the results.

**Requirements:**
+ A NASA [Earthdata Login](https://urs.earthdata.nasa.gov/) account is required to download EMIT data   
+ Select the `emit_tutorials` environment as the kernel for this notebook.
  + For instructions on setting up the environment, follow the `setup_instructions.md` included in the `/setup/` folder of the repository.  

**Learning Objectives**
+ How to open an EMIT file as an `xarray.Dataset`
+ How to extract spectra at coordinates listed in a `.csv` file 

---

Import the required Python libraries.

In [1]:
# Import Packages
import sys
import os
import earthaccess
import numpy as np
import pandas as pd
import xarray as xr
import hvplot.pandas
import hvplot.xarray
import holoviews as hv
sys.path.append('../modules/')
from emit_tools import emit_xarray

Log in to your NASA Earthdata account and create a `.netrc` file using the `login` function from the `earthaccess` library. If you do not have an Earthdata account, you can create one [here](https://urs.earthdata.nasa.gov/home). 

In [2]:
earthaccess.login(persist=True)

EARTHDATA_USERNAME and EARTHDATA_PASSWORD are not set in the current environment, try setting them or use a different strategy (netrc, interactive)
You're now authenticated with NASA Earthdata Login
Using token with expiration date: 07/08/2023
Using .netrc file for EDL


<earthaccess.auth.Auth at 0x1e81614af50>

For this notebook we will download the necessary files using `earthaccess`. You can also access the data in place or stream it, but this can be slow due to the file sizes. Provide a URL for an EMIT L2A Reflectance granule.

In [3]:
url = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220903T163129_2224611_012/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc'

Get an HTTPS session using your Earthdata Login credentials, set a local path to save the file, and download the granule asset. This may take a while because the reflectance file is approximately 1.8 GB.

In [4]:
# Get Https Session using Earthdata Login Info
fs = earthaccess.get_fsspec_https_session()
# Retrieve granule asset ID from URL (to maintain existing naming convention)
granule_asset_id = url.split('/')[-1]
# Define Local Filepath
fp = f'../../data/{granule_asset_id}'
# Download the Granule Asset if it doesn't exist
if not os.path.isfile(fp):
    fs.download(url, fp)

Open the file downloaded and defined as `fp`. To do this, we will use the `emit_tools` module which contains a few helpful functions that can be used with EMIT data. Use the `ortho=True` option to orthorectify the dataset.

In [5]:
ds = emit_xarray(fp, ortho=True)
ds

Now open the `.csv` included in the `/data/` directory as a `pandas.DataFrame`.

> Note: The category values here are arbitrary and included as an example of an additional column users may want.

In [6]:
points = pd.read_csv('../../data/sample_coords.csv')
points

|   | ID | Category | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 0 | 1 | -39.94 | -62.36 |
| 1 | 1 | 1 | -39.75 | -61.74 |
| 2 | 2 | 3 | -40.0 | -62.1 |
| 3 | 3 | 2 | -39.89 | -61.85 |
| 4 | 4 | 3 | -39.38 | -62.03 |
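
To extract spectra at your own locations, a file with the same four columns can be built from a dataframe. The sketch below is an assumption about how such a file might be prepared, not how `sample_coords.csv` was actually created. The point values are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical points; replace with your own coordinates in decimal degrees
my_points = pd.DataFrame({
    'ID': [0, 1, 2],
    'Category': [1, 1, 3],
    'Latitude': [-39.94, -39.75, -40.00],
    'Longitude': [-62.36, -61.74, -62.10],
})

# Write without the pandas index so only the four named columns are stored
buffer = io.StringIO()
my_points.to_csv(buffer, index=False)

# Reading the text back yields the same column layout as `points`
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip)
```

Passing a file path to `to_csv` and `read_csv` instead of the buffer would write and read an actual file.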


Make a plot to visualize the points we're going to select on the dataset. Here we use the reflectance values at 850nm as our basemap.

In [7]:
ds.sel(wavelengths=850,method='nearest').hvplot.image(cmap='greys', frame_width=500, rasterize=True,aspect='equal')*\
points.hvplot.scatter(x='Longitude',y='Latitude', color='ID', cmap='Category10',aspect='equal')*\
points.hvplot.labels(x='Longitude',y='Latitude', text='ID', text_color='ID', cmap='Category10',aspect='equal').opts(xoffset=0.03)
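
The basemap relies on `sel(..., method='nearest')`, which picks the band whose center is closest to the requested wavelength rather than requiring an exact match. A minimal sketch on a made-up dataset (the band centers and dimension sizes below are assumptions for illustration, not real EMIT values) shows which band is returned:

```python
import numpy as np
import xarray as xr

# Made-up band centers in nm; real EMIT band centers differ
wavelengths = np.array([842.6, 850.1, 857.5])
toy = xr.Dataset(
    {'reflectance': (('latitude', 'longitude', 'wavelengths'), np.zeros((2, 2, 3)))},
    coords={'latitude': [-39.5, -40.0],
            'longitude': [-62.5, -62.0],
            'wavelengths': wavelengths},
)

# Requesting 850 returns the closest available band and drops the wavelengths dimension
band = toy.sel(wavelengths=850, method='nearest')
print(band['wavelengths'].item())
print(dict(band.sizes))
```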

Set the `points` dataframe index to `ID` so that our existing point IDs serve as the index.

In [8]:
points = points.set_index(['ID'])

Convert the dataframe to an `xarray.Dataset`. The `ID` index becomes a dimension shared by the `Latitude` and `Longitude` variables.

In [9]:
xp = points.to_xarray()
xp

Select the data from our EMIT dataset using the Latitude and Longitude coordinates from our point dataset, then convert the output to a `pandas.DataFrame`. With `method='nearest'`, each point is matched to the closest pixel center.

In [10]:
extracted = ds.sel(latitude=xp.Latitude,longitude=xp.Longitude, method='nearest').to_dataframe()
extracted

| ID | wavelengths | reflectance | latitude | longitude | good_wavelengths | fwhm | elev | spatial_ref |
|---|---|---|---|---|---|---|---|---|
| 0 | 381.005585 | 0.021768 | -39.939816 | -62.359998 | 1.0 | 8.415 | 12.814081 | 0 |
| 0 | 388.409210 | 0.022659 | -39.939816 | -62.359998 | 1.0 | 8.415 | 12.814081 | 0 |
| 0 | 395.815826 | 0.023554 | -39.939816 | -62.359998 | 1.0 | 8.415 | 12.814081 | 0 |
| 0 | 403.225403 | 0.024465 | -39.939816 | -62.359998 | 1.0 | 8.415 | 12.814081 | 0 |
| 0 | 410.638000 | 0.025633 | -39.939816 | -62.359998 | 1.0 | 8.417 | 12.814081 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4 | 2463.381592 | 0.084580 | -39.380232 | -62.029779 | 1.0 | 8.803 | 13.629602 | 0 |
| 4 | 2470.767822 | 0.081152 | -39.380232 | -62.029779 | 1.0 | 8.804 | 13.629602 | 0 |
| 4 | 2478.153076 | 0.072722 | -39.380232 | -62.029779 | 1.0 | 8.806 | 13.629602 | 0 |
| 4 | 2485.538574 | 0.061968 | -39.380232 | -62.029779 | 1.0 | 8.807 | 13.629602 | 0 |
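
The reason for converting the points to an `xarray.Dataset` first is that indexers sharing a dimension (here `ID`) are applied pointwise, one pixel per point. Plain lists are instead applied independently along each axis and return every latitude/longitude combination. The sketch below illustrates the difference on a small synthetic grid. The grid, the reflectance values, and the two points are made up and merely stand in for the EMIT dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic stand-in for the orthorectified dataset (values are made up)
toy = xr.Dataset(
    {'reflectance': (('latitude', 'longitude', 'wavelengths'),
                     np.arange(18, dtype=float).reshape(3, 3, 2))},
    coords={'latitude': [-39.0, -39.5, -40.0],
            'longitude': [-62.5, -62.0, -61.5],
            'wavelengths': [450.0, 850.0]},
)

# Two hypothetical points, indexed by ID like the `points` dataframe
pts = pd.DataFrame({'ID': [0, 1],
                    'Latitude': [-39.4, -40.1],
                    'Longitude': [-62.1, -61.6]}).set_index('ID').to_xarray()

# Indexers that share the ID dimension select one pixel per point
pointwise = toy.sel(latitude=pts.Latitude, longitude=pts.Longitude, method='nearest')
print(dict(pointwise.sizes))

# Plain lists are applied per axis, giving every latitude/longitude combination
outer = toy.sel(latitude=[-39.4, -40.1], longitude=[-62.1, -61.6], method='nearest')
print(dict(outer.sizes))
```

The pointwise result has an `ID` dimension in place of `latitude` and `longitude`, which is why the dataframe produced in this notebook is indexed by `ID` and `wavelengths`.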


The output is a long-form dataframe indexed by `'ID'` and `'wavelengths'`. It is missing the `'Category'` column from our original dataframe. Use the `DataFrame.join` method to add the `'Category'` column, matching rows on the `'ID'` index level.

In [11]:
df = extracted.join(points['Category'], on=['ID'])
df

| ID | wavelengths | reflectance | latitude | longitude | good_wavelengths | fwhm | elev | spatial_ref | Category |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 381.005585 | 0.021768 | -39.939816 | -62.359998 | 1.0 | 8.415 | 12.814081 | 0 | 1 |
| 0 | 388.409210 | 0.022659 | -39.939816 | -62.359998 | 1.0 | 8.415 | 12.814081 | 0 | 1 |
| 0 | 395.815826 | 0.023554 | -39.939816 | -62.359998 | 1.0 | 8.415 | 12.814081 | 0 | 1 |
| 0 | 403.225403 | 0.024465 | -39.939816 | -62.359998 | 1.0 | 8.415 | 12.814081 | 0 | 1 |
| 0 | 410.638000 | 0.025633 | -39.939816 | -62.359998 | 1.0 | 8.417 | 12.814081 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4 | 2463.381592 | 0.084580 | -39.380232 | -62.029779 | 1.0 | 8.803 | 13.629602 | 0 | 3 |
| 4 | 2470.767822 | 0.081152 | -39.380232 | -62.029779 | 1.0 | 8.804 | 13.629602 | 0 | 3 |
| 4 | 2478.153076 | 0.072722 | -39.380232 | -62.029779 | 1.0 | 8.806 | 13.629602 | 0 | 3 |
| 4 | 2485.538574 | 0.061968 | -39.380232 | -62.029779 | 1.0 | 8.807 | 13.629602 | 0 | 3 |
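
To see what the join does, it can help to try it on a tiny example. In the sketch below, `long_df` and `attrs` are hypothetical stand-ins for `extracted` and `points`, filled with made-up values. Passing `on='ID'` matches the `ID` index level of the long-form frame against the index of the attribute table.

```python
import pandas as pd

# Long-form frame with an (ID, wavelengths) MultiIndex, like `extracted`
idx = pd.MultiIndex.from_product([[0, 1], [450.0, 850.0]], names=['ID', 'wavelengths'])
long_df = pd.DataFrame({'reflectance': [0.02, 0.30, 0.05, 0.25]}, index=idx)

# Per-point attributes indexed by ID, like `points`
attrs = pd.DataFrame({'Category': [1, 3]}, index=pd.Index([0, 1], name='ID'))

# Each Category value is repeated for every wavelength of its point
joined = long_df.join(attrs['Category'], on='ID')
print(joined)
```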


Now we have a dataframe containing our initial data in addition to the extracted point data. This is a good place to save the output as a `.csv`. Go ahead and do that below.

In [12]:
df.to_csv('../../data/example_out.csv')
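
Because the dataframe has a two-level index (`ID`, `wavelengths`), `to_csv` writes those levels as the first two columns of the file. A minimal sketch of reading such a file back with the index restored follows. It uses made-up values and an in-memory buffer in place of `example_out.csv`, so the names `out` and `restored` are illustrative only.

```python
import io
import pandas as pd

# Made-up long-form frame with the same index structure as `df`
idx = pd.MultiIndex.from_product([[0, 1], [450.0, 850.0]], names=['ID', 'wavelengths'])
out = pd.DataFrame({'reflectance': [0.02, 0.30, 0.05, 0.25],
                    'Category': [1, 1, 3, 3]}, index=idx)

# The index levels are written as the first two columns
buffer = io.StringIO()
out.to_csv(buffer)

# Name the index columns when reading to rebuild the MultiIndex
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=['ID', 'wavelengths'])
print(restored)
```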

We can use our dataframe to plot the reflectance data we extracted, but first, mask the reflectance values of `-0.01`, which represent deep water vapor absorption regions. To do this, assign `np.nan` wherever the reflectance equals `-0.01`. 

In [13]:
# Use a single .loc call; chained indexing may assign to a temporary copy
df.loc[df['reflectance'] == -0.01, 'reflectance'] = np.nan
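
Since reflectance is stored as floating point, an exact equality test depends on the fill value being represented identically in the data and in the comparison. A more tolerant variant, sketched here on made-up values, uses `np.isclose` together with `Series.mask`. The `float32` dtype is an assumption for illustration, and the default tolerances of `np.isclose` are used.

```python
import numpy as np
import pandas as pd

# Made-up spectrum containing two fill values
spectra = pd.DataFrame(
    {'reflectance': np.array([0.02, -0.01, -0.01, 0.25], dtype='float32')}
)

# Flag values within floating-point tolerance of the fill value
fill = np.isclose(spectra['reflectance'].to_numpy(), -0.01)

# Series.mask replaces flagged entries with NaN and keeps the rest
spectra['reflectance'] = spectra['reflectance'].mask(fill)
print(spectra)
```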

Plot the data using `hvplot`. We can use `by=` to separate the reflectances by their `ID`.

In [14]:
df.hvplot(x='wavelengths',y='reflectance', by=['ID']).opts(xlabel='Wavelengths (nm)',ylabel='Reflectance')

---

## Contact Info:  

Email: LPDAAC@usgs.gov  
Voice: +1-866-573-3222  
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹  
Website: <https://lpdaac.usgs.gov/>  
Date last modified: 06-30-2023  

¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.  