# Streaming data from NASA's Earth Surface Mineral Dust Source Investigation (EMIT)

This proof-of-concept notebook demonstrates how [earthaccess](https://github.com/nsidc/earthaccess) can facilitate the use of cloud-hosted NASA data with xarray and holoviews. For a formal tutorial on EMIT, where things are explained in detail, please visit the official repository: [EMIT Science Tutorial](https://github.com/nasa/EMIT-Data-Resources/blob/main/python/tutorials/Exploring_EMIT_L2A_Reflectance.ipynb)


Prerequisites 

* NASA Earthdata Login (EDL) [credentials](https://urs.earthdata.nasa.gov/)
* Openscapes [Conda environment installed](https://raw.githubusercontent.com/NASA-Openscapes/corn/main/ci/environment.yml)
* For direct access this notebook should run in AWS


**IMPORTANT: This notebook will also run outside AWS, but doing so is not recommended because streaming HDF5 data is slow out of region.**


In [None]:
from pprint import pprint

import earthaccess
import xarray as xr

print(f"using earthaccess version {earthaccess.__version__}")

auth = earthaccess.login()

### Searching for the dataset with `.search_datasets()`

> Note: API docs can be found at [earthaccess](https://nsidc.github.io/earthaccess/user-reference/api/api/)

In [None]:
results = earthaccess.search_datasets(short_name="EMITL2ARFL", cloud_hosted=True)

# Let's print our datasets
for dataset in results:
    pprint(dataset.summary())

### Searching for the data with `.search_data()` over Ecuador

In [None]:
# ~Ecuador = -82.05,-3.17,-76.94,-0.52
granules = earthaccess.search_data(
    short_name="EMITL2ARFL",
    bounding_box=(-82.05, -3.17, -76.94, -0.52),
    count=10,
)
print(len(granules))
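The `bounding_box` tuple above is ordered west, south, east, north (lower-left longitude and latitude, then upper-right longitude and latitude), in decimal degrees. Swapping two values is an easy mistake, so a small sanity check before searching can help. The sketch below is only illustrative: `validate_bounding_box` is a hypothetical helper, not part of earthaccess, and it deliberately allows west to exceed east so that boxes crossing the antimeridian are not rejected.

```python
def validate_bounding_box(bbox):
    """Check a (west, south, east, north) tuple given in decimal degrees.

    Hypothetical helper for illustration; it is not an earthaccess function.
    """
    if len(bbox) != 4:
        raise ValueError("expected four values: west, south, east, north")
    west, south, east, north = (float(v) for v in bbox)
    if not (-180.0 <= west <= 180.0 and -180.0 <= east <= 180.0):
        raise ValueError("longitudes must lie within [-180, 180]")
    if not (-90.0 <= south <= 90.0 and -90.0 <= north <= 90.0):
        raise ValueError("latitudes must lie within [-90, 90]")
    if south > north:
        raise ValueError("south must not exceed north")
    return (west, south, east, north)


# The approximate Ecuador box used in the search above
ecuador = validate_bounding_box((-82.05, -3.17, -76.94, -0.52))
print(ecuador)
```

The returned tuple can be passed as `bounding_box=ecuador` to the search call.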

### `earthaccess` can print a preview of the data using the metadata from CMR

> Note: there is a bug in earthaccess where the reported size of the granules is always 0; a fix is coming next week.

In [None]:
granules[7]

## Streaming data from S3 with fsspec 

We open the data with `earthaccess.open()` and access the NetCDF file as if it were local.

If we run this code in AWS (us-west-2), earthaccess can use direct S3 links. If we run it outside AWS, earthaccess can only use HTTPS links. Direct S3 access to NASA data is only allowed in-region.
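To make the distinction between the two kinds of links concrete, here is a minimal sketch that classifies a link by its URL scheme. The function name `access_mode`, the labels it returns, and both example URLs are assumptions made for illustration; they are not taken from earthaccess or from real granule metadata.

```python
from urllib.parse import urlparse


def access_mode(url):
    """Label a data link by URL scheme (illustrative helper, not earthaccess API)."""
    scheme = urlparse(url).scheme.lower()
    if scheme == "s3":
        # Direct S3 links are only usable from inside the hosting AWS region
        return "direct"
    if scheme in ("http", "https"):
        # HTTPS links work from anywhere, at the cost of slower streaming
        return "external"
    raise ValueError(f"unsupported link scheme: {scheme!r}")


# Placeholder URLs, not real EMIT granule locations
for link in ("s3://example-bucket/granule.nc", "https://example.com/granule.nc"):
    print(link, "->", access_mode(link))
```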

In [None]:
# open() accepts a list of results or a list of links
file_handlers = earthaccess.open(granules)
file_handlers

In [None]:
%%time

# we can use any file from the list
file_p = file_handlers[4]

refl = xr.open_dataset(file_p)
wvl = xr.open_dataset(file_p, group="sensor_band_parameters")
loc = xr.open_dataset(file_p, group="location")
ds = xr.merge([refl, loc])
ds = ds.assign_coords(
    {
        "downtrack": (["downtrack"], refl.downtrack.data),
        "crosstrack": (["crosstrack"], refl.crosstrack.data),
        **wvl.variables,
    }
)

ds

### Plotting non-orthorectified data


Use the following code to plot the Panel widget when you run this notebook on AWS us-west-2:


```python

import holoviews as hv
import hvplot.xarray
import numpy as np
import panel as pn

pn.extension()

# Find band nearest to value of 850 nm (NIR)
b850 = np.nanargmin(abs(ds["wavelengths"].values - 850))
ref_unc = ds["reflectance_uncertainty"]
image = ref_unc.sel(bands=b850).hvplot("crosstrack", "downtrack", cmap="viridis")
stream = hv.streams.Tap(source=image, x=255, y=484)


def wavelengths_histogram(x, y):
    histo = ref_unc.sel(crosstrack=x, downtrack=y, method="nearest").hvplot(
        x="wavelengths", color="green"
    )
    return histo


tap_dmap = hv.DynamicMap(wavelengths_histogram, streams=[stream])
pn.Column(image, tap_dmap)

```
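The band lookup in the block above (`np.nanargmin` over the absolute difference from 850 nm) can be tried without any network access. The sketch below is a standalone illustration on synthetic data: the evenly spaced wavelength grid and the injected NaN are assumptions for the demo, and the real band centers come from the `sensor_band_parameters` group of the file. `nearest_band_index` is a hypothetical helper name.

```python
import numpy as np


def nearest_band_index(wavelengths, target_nm):
    """Return the index of the band center closest to target_nm, ignoring NaNs."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    return int(np.nanargmin(np.abs(wavelengths - target_nm)))


# Synthetic band centers in nm (assumed for illustration, not real EMIT values)
wavelengths = np.linspace(380.0, 2500.0, 285)
wavelengths[0] = np.nan  # nanargmin skips missing entries instead of failing

idx = nearest_band_index(wavelengths, 850.0)
print(idx, wavelengths[idx])
```

Using `nanargmin` rather than `argmin` matters when some band centers are missing, because a plain `argmin` would return the position of the first NaN.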

