> **Disclaimer:** The [**EMITL2BMIN**](https://doi.org/10.5067/EMIT/EMITL2BMIN.001) product is generated to support the EMIT mission objectives of constraining the sign of dust-related radiative forcing. Ten mineral types are the core focus of this work: Calcite, Chlorite, Dolomite, Goethite, Gypsum, Hematite, Illite+Muscovite, Kaolinite, Montmorillonite, and Vermiculite. The [**EMIT_L3_ASA**]() product contains the aggregate abundance of these minerals at a coarser resolution for use in Earth System Models. Additional minerals are included in the **EMITL2BMIN** product for transparency but were not the focus of this product. Further validation is required to use these additional mineral maps, particularly in the case of resource exploration. Similarly, the separation of minerals with similar spectral features, such as fine-grained goethite and hematite, is an area of active research. The results presented here are an initial offering, but the precise categorization is likely to evolve over time, and the limits of what can and cannot be separated on the global scale are still being explored. The user is encouraged to read the [Algorithm Theoretical Basis Document (ATBD)](https://lpdaac.usgs.gov/documents/1659/EMITL2B_ATBD_v1.pdf) for more details.

# Working with EMIT L2B Mineralogy Data

**Summary**  

In this notebook we will open the [EMIT L2B Estimated Mineral Identification and Band Depth and Uncertainty (EMITL2BMIN)](https://doi.org/10.5067/EMIT/EMITL2BMIN.001) products, find a mineral of interest from the ten mineral types focused on by the EMIT mission, evaluate uncertainty, orthorectify the data, then create an output mask or vector file for the granule.

**Background**

The EMIT instrument is an imaging spectrometer that measures light in visible and infrared wavelengths. These measurements display unique spectral signatures that correspond to the composition of the Earth's surface. The EMIT mission focuses specifically on mapping the composition of minerals to better understand the effects of mineral dust on the Earth system and on human populations, now and in the future. More details about EMIT and its associated products can be found in the **README.md** and on the [EMIT website](https://earth.jpl.nasa.gov/emit/).

The [EMITL2BMIN](https://doi.org/10.5067/EMIT/EMITL2BMIN.001) data product provides estimated mineral identification, band depths, and uncertainty in a spatially raw, non-orthocorrected format. Minerals are identified in two spectral groups, which correspond to different regions of the spectrum; the groups are identified independently and often co-occur. These estimates are generated using the [Tetracorder system](https://www.usgs.gov/publications/tetracorder-user-guide-version-44?_gl=1*1eoj33d*_ga*MTU3MTA3ODgxNS4xNjQ5MTg1MDgx*_ga_0YWDZEJ295*MTY4NjkyNTg0Mi40NC4xLjE2ODY5MjU4NzMuMC4wLjA.)([code](https://github.com/PSI-edu/spectroscopy-tetracorder)) and are based on [EMITL2ARFL](https://doi.org/10.5067/EMIT/EMITL2ARFL.001) reflectance values. The product also includes an EMIT_L2B_MINUNCERT file, which provides band depth uncertainty estimates calculated using surface reflectance uncertainty values from the [EMITL2ARFL](https://doi.org/10.5067/EMIT/EMITL2ARFL.001) data product. The band depth uncertainties are presented as standard deviations, and the fit score for each mineral identification is provided as the coefficient of determination (r<sup>2</sup>) of the match between the continuum-normalized library reference and the continuum-normalized observed spectrum. Associated metadata indicates the name and reference information for each identified mineral, along with additional information about aggregating minerals into different categories. The code used for product generation is available in the [emit-sds-l2b repository]().
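
To make band depth and the r<sup>2</sup> fit score more concrete, here is a minimal sketch on a synthetic spectrum. It assumes a straight-line continuum between two shoulder wavelengths, which is a simplification for illustration and not the Tetracorder implementation. The function names, wavelengths, and reflectance values are all hypothetical.

```python
import numpy as np

def continuum_removed(wl, refl, left, right):
    """Divide reflectance by a straight-line continuum drawn between two shoulders."""
    r_left = np.interp(left, wl, refl)
    r_right = np.interp(right, wl, refl)
    continuum = np.interp(wl, [left, right], [r_left, r_right])
    return refl / continuum

def band_depth(wl, refl, left, right):
    """Band depth = 1 - minimum continuum-removed reflectance inside the feature."""
    in_band = (wl >= left) & (wl <= right)
    return 1.0 - continuum_removed(wl, refl, left, right)[in_band].min()

# Synthetic absorption feature centered at 2200 nm (hypothetical values)
wl = np.linspace(2000.0, 2400.0, 81)
feature = np.exp(-0.5 * ((wl - 2200.0) / 20.0) ** 2)
observed = 0.5 - 0.1 * feature   # "observed" spectrum
reference = 0.6 - 0.2 * feature  # "library" spectrum with the same feature shape

left, right = 2100.0, 2300.0     # assumed continuum shoulders
depth = band_depth(wl, observed, left, right)

# Fit score: r^2 between the continuum-removed observed and reference spectra
in_band = (wl >= left) & (wl <= right)
cr_obs = continuum_removed(wl, observed, left, right)[in_band]
cr_ref = continuum_removed(wl, reference, left, right)[in_band]
r2 = np.corrcoef(cr_obs, cr_ref)[0, 1] ** 2

print(f'band depth: {depth:.3f}, r2: {r2:.3f}')
```

Because the two synthetic spectra share the same feature shape, the fit score is essentially 1 even though their band depths differ. This is why the product reports the identification and the band depth as separate quantities.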

**Disclaimer**

The [EMIT_L2B_MIN](https://doi.org/10.5067/EMIT/EMITL2BMIN.001) product is generated to support the EMIT mission objectives of constraining the sign of dust-related radiative forcing. Ten mineral types are the core focus of this work: Calcite, Chlorite, Dolomite, Goethite, Gypsum, Hematite, Illite+Muscovite, Kaolinite, Montmorillonite, and Vermiculite. A future product will aggregate these results for use in Earth System Models. Additional minerals are included in this product for transparency but were not the focus of this product. Further validation is required to use these additional mineral maps, particularly in the case of resource exploration. Similarly, the separation of minerals with similar spectral features, such as fine-grained goethite and hematite, is an area of active research. The results presented here are an initial offering, but the precise categorization is likely to evolve over time, and the limits of what can and cannot be separated on the global scale are still being explored. The user is encouraged to read the [Algorithm Theoretical Basis Document (ATBD)](https://lpdaac.usgs.gov/documents/1659/EMITL2B_ATBD_v1.pdf) for more details.

**Requirements** 
 - Set up Python Environment - See **setup_instructions.md** in the `/setup/` folder 

**Learning Objectives**  
- Open an EMIT L2B `.nc` file as an `xarray.Dataset`
- Apply the Geometry Lookup Table (GLT) to orthorectify the image
- Find minerals of interest within a granule
- Visualize mineral identification and band depth
- Evaluate mineral uncertainty
- Calculate and visualize mineral abundance

**Tutorial Outline**  

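1. Setup  
1.2 Downloaded Data  
2. Working with the L2B Mineral Identification and Band Depth  
2.1 Orthorectification  
2.2 Visualize Group 1 Minerals  
2.3 Visualize Group 2 Minerals  
3. Aggregating and Mineral Abundance  
3.1 Approximating Kaolinite Abundance  
4. Exporting To Cloud-Optimized Geotiffs  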

## 1. Setup

Import the required Python libraries.

In [None]:
import earthaccess
import geopandas as gp
import os
import sys
import numpy as np
import pandas as pd
import xarray as xr
import hvplot.xarray
import holoviews as hv
sys.path.append('../modules/')
import emit_tools as et

Login to your NASA Earthdata account and create a `.netrc` file using the `login` function from the `earthaccess` library. If you do not have an Earthdata Account, you can create one [here](https://urs.earthdata.nasa.gov/home). 

In [None]:
earthaccess.login(persist=True)

For this notebook we will download the necessary files using `earthaccess`. You can also access the data in place or stream it, but this can be slow due to the file sizes. Provide a URL for an EMIT L2B Mineral Identification and Band Depth granule.

In [None]:
# List the browse images from the text file output of the previous notebook.
rgb_list = '../../data/rgb_browse_urls.txt'
with open(rgb_list) as f:
    rgb_urls = [line.rstrip('\n') for line in f]
rgb_urls

In [None]:
# List the mineral granule URLs from the text file output of the previous notebook.
min_list = '../../data/results_urls.txt'
with open(min_list) as f:
    min_urls = [line.rstrip('\n') for line in f]
min_urls

Get an HTTPS session using your Earthdata login and open the granule asset.

In [None]:
fs = earthaccess.get_fsspec_https_session()
fp = fs.open(min_urls[1])

### 1.2 Downloaded Data

If you’ve already downloaded the data using the workflow shown in Section 6 of the [Finding EMIT L2B Mineralogy Data](Finding_EMIT_L2B_Mineralogy_Data.ipynb) notebook, you can just set filepaths using the cell below.

In [None]:
fp = '../../data/EMIT_L2B_MIN_001_20230427T173309_2311711_010.nc' # Mineral
#fp_rgb = '../../data/EMIT_L2A_RFL_001_20230427T173309_2311711_010.png' # RGB
fp_un = '../../data/EMIT_L2B_MINUNC_001_20230427T173309_2311711_010.nc' # Mineral Uncertainty

---
## 2. Working with the L2B Mineral Identification and Band Depth

EMITL2BMIN data are distributed in a non-orthocorrected spatially raw NetCDF4 (.nc) format consisting of the data and its associated metadata. Inside the `.nc` file there are 3 groups. Groups can be thought of as containers to organize the data. 

1. The root group, which can be considered the main dataset, contains 4 data variables described by the downtrack and crosstrack dimensions. These variables are `group_1_mineral_id`, `group_1_band_depth`, `group_2_mineral_id`, and `group_2_band_depth`. These contain the ID and a band depth for each mineral group. These groups do not correspond to the NetCDF file groups, but rather to the spectral library groups used to identify the minerals based on which region of the spectrum the mineral features correspond to.
2. The `mineral_metadata` group contains the spectral library entry name, index, record, group, and url for each entry.
3. The `location` group contains latitude and longitude values at the center of each pixel described by the crosstrack and downtrack dimensions, as well as a geometry lookup table (GLT) described by the ortho_x and ortho_y dimensions. The GLT is an orthorectified image (EPSG:4326) consisting of 2 layers containing downtrack and crosstrack indices. These index positions allow us to quickly project the raw data onto this geographic grid.
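
To illustrate how a GLT projects raw data onto the geographic grid, here is a minimal NumPy sketch on a tiny synthetic array. The function name `apply_glt_sketch` is hypothetical. The layout it assumes (two layers on the last axis ordered crosstrack then downtrack, 1-based indices, and 0 marking cells with no data) is an assumption made for illustration. In this notebook the actual projection is handled by the functions in `emit_tools.py`.

```python
import numpy as np

def apply_glt_sketch(raw, glt, fill_value=-9999.0):
    """Project a raw (downtrack, crosstrack) array onto the GLT grid.

    ASSUMPTION: glt has shape (ortho_y, ortho_x, 2) holding
    [crosstrack, downtrack] indices that are 1-based, with 0 = no data.
    """
    out = np.full(glt.shape[:2], fill_value, dtype=float)
    valid = np.all(glt > 0, axis=-1)   # grid cells that map to a raw pixel
    cross = glt[valid, 0] - 1          # convert to 0-based indices
    down = glt[valid, 1] - 1
    out[valid] = raw[down, cross]      # look up each raw pixel by index
    return out

# Synthetic raw scene: 2 downtrack rows x 2 crosstrack columns
raw = np.array([[1.0, 2.0],
                [3.0, 4.0]])

# Synthetic GLT on a 2 x 3 output grid; unset cells stay 0 (no data)
glt = np.zeros((2, 3, 2), dtype=int)
glt[0, 0] = [1, 1]   # crosstrack 1, downtrack 1
glt[0, 1] = [2, 1]   # crosstrack 2, downtrack 1
glt[1, 2] = [2, 2]   # crosstrack 2, downtrack 2

ortho = apply_glt_sketch(raw, glt)
print(ortho)
```

Grid cells with no matching raw pixel keep the fill value, which is why fill values appear in the data after orthorectification.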

To access the `.nc` file, you can use the `netCDF4` or `xarray` libraries, or functions from the `emit_tools.py` library. Here we will use the `emit_xarray` function from this library, which will open and organize the data into an easy-to-work-with `xarray.Dataset` object. We can also pass the `ortho=True` argument to orthorectify the data at this stage, but we will start by just examining the data to get a better understanding.

In [None]:
ds_min = et.emit_xarray(fp)
ds_min

If we look at the mineral `index` by printing the first 5 values, we can see that values start with 1. If we look at the minimum values of the mineral IDs we can see these have 0 as a possible value. 

In [None]:
print(ds_min.index.data[:5])

In [None]:
print(f'Group_1_minimum: {ds_min.group_1_mineral_id.data.min()} Group_2_minimum: {ds_min.group_2_mineral_id.data.min()}')

The 0 here represents no match.  For convenience, let's make a DataFrame that holds the mineral data, and add that 'No match' reference to it:

In [None]:
min_df = pd.DataFrame({x: ds_min[x].values for x in [var for var in ds_min.coords if 'mineral_name' in ds_min[var].dims]})
min_df.loc[-1] = {'index': 0, 'mineral_name': 'No_Match', 'record': -1.0, 'url': 'NA', 'group': 1.0, 'library': 'NA', 'spatial_ref': 0}
min_df = min_df.sort_index().reset_index(drop=True)
min_df

### 2.1 Orthorectification

The orthorectification process has already been done for EMIT data. Here we are just using the crosstrack and downtrack indices contained in the GLT to place our spatially raw mineralogy data into a geographic grid with the `ortho_x` and `ortho_y` dimensions.

In [None]:
ds_min = et.ortho_xr(ds_min)
ds_min

We can see from these outputs that the dimensions are now latitude and longitude.

In this example, we'll just work with the group_1 mineral data. We can find the minerals present in the scene by finding unique values in the `group_1_mineral_id`, but first we will replace fill-values introduced during orthorectification with `np.nan`, to omit them from our analysis and improve visualizations.

In [None]:
# Assign fill to np.nan
for var in ds_min.data_vars:
    ds_min[var].data[ds_min[var].data == -9999] = np.nan
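
Finding the minerals present in a scene comes down to counting the unique IDs and joining them to the mineral names. The sketch below shows one way to do this on a small synthetic array. The ID values and the mineral names in `lookup` are hypothetical stand-ins, not values from the granule.

```python
import numpy as np
import pandas as pd

# Synthetic mineral ID grid; NaN marks fill pixels outside the swath
ids = np.array([[0.0, 5.0, 5.0],
                [np.nan, 12.0, 5.0]])

# Hypothetical lookup table standing in for the mineral metadata
lookup = pd.DataFrame({'index': [0, 5, 12],
                       'mineral_name': ['No_Match', 'Mineral A', 'Mineral B']})

# Count each ID, ignoring NaN fill pixels
values, counts = np.unique(ids[~np.isnan(ids)], return_counts=True)

# Join the counts to the names using the shared index column
present = pd.DataFrame({'index': values.astype(int), 'pixels': counts})
present = present.merge(lookup, on='index', how='left')
print(present)
```

With the objects in this notebook, `ids` would be `ds_min['group_1_mineral_id'].values` and `lookup` would be `min_df`.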


### 2.2 Visualize Group 1 Minerals

To visualize the minerals present, plot the Group 1 minerals using a categorical color set. You can hover over a colored region to see its mineral ID. These IDs correspond to the 1-based `index` values from the spectral library metadata, and 0 indicates no match.

In [None]:
ds_min['group_1_mineral_id'].hvplot.image(cmap='glasbey', geo=True, tiles='ESRI', alpha=0.8,frame_width=750).opts(title='Group 1 Mineral ID')

This figure shows the minerals present in the scene, but doesn't really quantify how well they matched with the spectral library. For that we can look at the band depth for each mineral. We can build an interactive tool to do this using the `panel` and `hvplot` libraries. This will take a bit of time to load for each selection.

Because many minerals are scarce, we'll start by updating the names to include relative fractions:

In [None]:

g1_min_percent = [np.round(np.sum(ds_min.group_1_mineral_id.data.flatten() == g1min) / np.sum(ds_min.group_1_mineral_id.data.flatten() > 0),2) * 100 for g1min in range(len(min_df))]
g1_dropdown_names = [str(g1_min_percent[_x]) + ' %: ' + x for (_x, x) in enumerate(min_df.mineral_name.tolist()) if g1_min_percent[_x] > 0]
g1_dropdown_names = np.array(g1_dropdown_names)[np.argsort([float(x.split(' %:')[0]) for x in g1_dropdown_names])[::-1]].tolist()


In [None]:
import panel as pn
# Interactive Panel control for mineral band depth; defaults to the first (highest-percentage) dropdown entry

min_select = pn.widgets.Select(name='Mineral Name', options=g1_dropdown_names, value=g1_dropdown_names[0])
@pn.depends(min_select)
def min_browse(min_select):
    # Recover the mineral ID from the dropdown label
    mineral_id = min_df['mineral_name'].tolist().index(min_select.split('%: ')[-1])
    # Keep band depth only where that mineral was identified
    mask = ds_min['group_1_band_depth'].where(ds_min['group_1_mineral_id'] == mineral_id)
    band_depth_map = mask.hvplot.image(cmap='viridis', geo=True, tiles='ESRI', alpha=0.8, frame_width=750, clim=(0, np.nanpercentile(mask, 98))).opts(title=f'{min_select} Band Depth')
    return band_depth_map
pn.Row(pn.WidgetBox(min_select), min_browse)

### 2.3 Visualize Group 2 Minerals

We can do the same thing with Group 2 Minerals. Group 2 will show a more diverse set of minerals in this region, including clays and carbonates.


In [None]:
ds_min['group_2_mineral_id'].hvplot.image(cmap='glasbey', geo=True, tiles='ESRI', alpha=0.8,frame_width=750).opts(title='Group 2 Mineral ID')

In [None]:

g2_min_percent = [np.round(np.sum(ds_min.group_2_mineral_id.data.flatten() == g2min) / np.sum(ds_min.group_2_mineral_id.data.flatten() > 0),2) * 100 for g2min in range(len(min_df))]
g2_dropdown_names = [str(g2_min_percent[_x]) + ' %: ' + x for (_x, x) in enumerate(min_df.mineral_name.tolist()) if g2_min_percent[_x] > 0]
g2_dropdown_names = np.array(g2_dropdown_names)[np.argsort([float(x.split(' %:')[0]) for x in g2_dropdown_names])[::-1]].tolist()


In [None]:
import panel as pn
# Interactive Panel control for Group 2 mineral band depth; defaults to the first (highest-percentage) dropdown entry

min_select_g2 = pn.widgets.Select(name='Mineral Name', options=g2_dropdown_names, value=g2_dropdown_names[0])
@pn.depends(min_select_g2)
def min_browse_g2(min_select):
    # Recover the mineral ID from the dropdown label
    mineral_id = min_df['mineral_name'].tolist().index(min_select.split('%: ')[-1])
    # Keep band depth only where that mineral was identified
    mask = ds_min['group_2_band_depth'].where(ds_min['group_2_mineral_id'] == mineral_id)
    band_depth_map = mask.hvplot.image(cmap='viridis', geo=True, tiles='ESRI', alpha=0.8, frame_width=750, clim=(0, np.nanpercentile(mask, 98))).opts(title=f'{min_select} Band Depth')
    return band_depth_map
pn.Row(pn.WidgetBox(min_select_g2), min_browse_g2)

## 3. Aggregating and Mineral Abundance

The above visualizations walk through the identification of individual library constituents, and visualize band depths. However, the Tetracorder library used by EMIT contains many substrates that are spectrally distinct, but which may be useful to group together for some science applications. The library also contains many mixtures, both areal and intimate.

To start, the mineral_grouping_matrix from the emit-sds-l2b repository (copied locally) contains information aggregated from laboratory XRD analyses to attempt to quantify the abundance of different minerals within each constituent. A -1 in the spreadsheet indicates an unknown but non-zero quantity, which in the few cases in the EMIT-10 columns we assume to be 100%. Let's open that spreadsheet and take a look:

In [None]:
# Open Mineral Groupings .csv
mineral_groupings = pd.read_csv('../../data/mineral_grouping_matrix_20230503.csv')
# The EMIT 10 Minerals are in columns 7 - 16 (starting with 0). Columns from 17 onward are experimental, and we'll drop them for this tutorial:
mineral_groupings = mineral_groupings.drop([x for _x, x in enumerate(mineral_groupings) if _x >= 17], axis=1)

# Retrieve the EMIT 10 Mineral Names from Columns 7-16 (starting with 0) in .csv
mineral_names = [x for _x, x in enumerate(list(mineral_groupings)) if _x > 6 and _x < 17]
# Use EMIT 10 Mineral Names to Subset .csv to only columns with EMIT 10 mineral_names
mineral_abundance_ref = np.array(mineral_groupings[mineral_names])
# Replace Some values in the .csv
mineral_abundance_ref[np.isnan(mineral_abundance_ref)] = 0
mineral_abundance_ref[mineral_abundance_ref == -1] = 1

mineral_groupings

### 3.1 Approximating Kaolinite Abundance

If we make the assumption that the XRD analyses are accurate, and that band depth scales linearly with abundance, we can approximate the mineral abundance at the surface. It should be noted that both of these assumptions are fraught: XRD analyses break down for some particles of very small grain size, particularly the iron oxides (Goethite and Hematite), and band depth is heavily influenced by particle grain size, so the relationship between abundance and band depth is not linear. We will expand on these details a bit more later, but for now let's take a look at what happens if we hold both of these as true.

The first step is to run through each mineral that has a non-nan value in the mineral_groupings dataframe, and add up the sum of each of those within the scene. A little matrix multiplication is all we need to do that. Notably, in the emit-sds/emit-sds-l2b repository, this functionality is already built into the group_aggregator.py script in an efficient manner, but because the calculation is simple we'll reproduce it here for learning purposes.

In [None]:
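mineral_name = 'Kaolinite'
mineral_abundance = np.zeros(ds_min['group_1_band_depth'].shape)
# Loop over every library entry that has a value recorded for the selected mineral
for _c in range(mineral_groupings.shape[0]):
    if not np.isnan(mineral_groupings[mineral_name][_c]):
        # Spectral group that this library entry belongs to
        group = mineral_groupings["Group"][_c]
        # Add the band depth wherever this library entry was identified
        mineral_abundance += (ds_min[f'group_{group}_mineral_id'].values == mineral_groupings['Index'][_c]) * ds_min[f'group_{group}_band_depth'].values

# Set pixels with no contribution to NaN so they are left out of the plot
mineral_abundance[mineral_abundance == 0] = np.nan
# Cast as an xarray.DataArray, reusing the coordinates and attributes of the dataset
mineral_abundance = xr.DataArray(data=mineral_abundance, coords=ds_min['group_1_band_depth'].coords, attrs=ds_min.attrs)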

In [None]:
mineral_abundance.hvplot.image(geo=True, tiles='EsriImagery', cmap='viridis', alpha=0.8, frame_width=750, clim=(0, np.nanpercentile(mineral_abundance, 98))).opts(title=f'{mineral_name} Spectral Abundance')
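
The aggregation can also be written as a matrix product, which extends naturally to all target minerals at once. The sketch below is one possible formulation on synthetic data, not the code in `group_aggregator.py`. Unlike the cell above, it also weights each band depth by the reference fraction. The array names and all values are hypothetical.

```python
import numpy as np

# Hypothetical reference matrix: rows are library entries (IDs 1..3),
# columns are two target minerals, values are fractional abundances
abundance_ref = np.array([[1.0, 0.0],
                          [0.5, 0.5],
                          [0.0, 0.2]])

# Synthetic 2 x 2 scene: mineral ID per pixel (0 = no match) and band depth
mineral_id = np.array([[1, 2],
                       [3, 0]])
band_depth = np.array([[0.10, 0.20],
                       [0.30, 0.00]])

n_entries = abundance_ref.shape[0]

# One-hot encode the IDs: shape (rows, cols, n_entries); ID 0 becomes all zeros
one_hot = (mineral_id[..., None] == np.arange(1, n_entries + 1)).astype(float)

# Weight by band depth, then multiply by the reference matrix
# Result shape: (rows, cols, n_target_minerals)
abundance = (one_hot * band_depth[..., None]) @ abundance_ref
print(abundance)
```

Each pixel ends up with one value per target mineral, so a single product gives maps for every column of the reference matrix.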

The reference spectra in the library may have abundances related to multiple EMIT 10 minerals. Depending on our interests, we may want to use more than one, but for this example we focus only on kaolinite, the mineral selected with `mineral_name` above.

## 4. Exporting To Cloud-Optimized Geotiffs

If you are interested in producing figures with software like QGIS, you can export the abundance we calculated, or the mineral_id or band_depth for selected minerals, as cloud-optimized GeoTIFFs (COGs). This is probably the easiest format to work with at this stage.

In [None]:
# Set the output folder
out_folder = '../../data/output/' # may need to change based on your directory structure
# Create out_folder if it does not exist
os.makedirs(out_folder, exist_ok=True)

In [None]:
# Set output filename using the mineral selected above
abun_name = f'{mineral_abundance.granule_id}_{mineral_name.lower()}_spectral_abundance.tif'
# Write data to COG
mineral_abundance.rio.to_raster(raster_path=f'{out_folder}{abun_name}', driver='COG')

In [None]:
# Set output filename
out_name = f'{ds_min.granule_id}_group_1_mineral_id.tif'
# Select group to output; copy so the changes below do not modify ds_min
dat_out = ds_min['group_1_mineral_id'].copy()
# Replace NaN with the fill value, then cast to integer for a categorical raster
dat_out.data = np.nan_to_num(dat_out.data, nan=-9999)
dat_out.data = dat_out.data.astype(int)

In [None]:
dat_out

In [None]:
dat_out = dat_out.assign_coords(longitude=(dat_out.coords['longitude'].astype('float32')))
dat_out = dat_out.assign_coords(latitude=(dat_out.coords['latitude'].astype('float32')))


In [None]:
dat_out

In [None]:
# Write data to COG
dat_out.rio.to_raster(raster_path=f'{out_folder}{out_name}', driver='COG', nodata=-9999)

## Contact Info:  

Email: LPDAAC@usgs.gov  
Voice: +1-866-573-3222  
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹  
Website: <https://lpdaac.usgs.gov/>  
Date last modified: 06-21-2024  

¹Work performed under USGS contract 140G0121D0001 for NASA contract NNG14HH33I. 