# How to: Use Direct S3 Access to work with EMIT Data

**Summary**  

EMIT datasets are available through NASA's Earthdata Cloud. Data in NASA Earthdata Cloud are free to access via either HTTPS or direct S3 bucket access. With direct S3 access, you can bring your "code to the data", making your processing faster and more scalable. Direct S3 access to NASA Earthdata Cloud is only available from Amazon Web Services (AWS) instances running in the `us-west-2` region. This notebook explains how to use direct S3 access to work with EMIT data.

**Requirements**

+ A NASA [Earthdata Login](https://urs.earthdata.nasa.gov/) account, which is required to download EMIT data
+ A configured `.netrc` file to access NASA Earthdata
  + This can be configured by following the first section of the `setup_instructions.md` included in the `/setup/` folder of the repository.
+ The `emit_tutorials` environment selected as the kernel for this notebook
  + For instructions on setting up the environment, follow the `setup_instructions.md` included in the `/setup/` folder of the repository.
+ **xarray v2022.12.0** or newer, so that an S3 file opened with `s3fs` can be read more than once
  + The version can be checked in Python using `xarray.__version__`
  + To install this version, activate your environment and enter `pip install xarray==2022.12.0` in the command line.

**Learning Objectives**  
+ How to use Direct S3 Access to EMIT Data
+ How to add this functionality to any notebook from this repository

---

Import the required packages.

In [None]:
# Import Packages
import os
import sys
import requests
import s3fs
import netCDF4 as nc
from osgeo import gdal
import numpy as np
import xarray as xr
import hvplot.xarray
import holoviews as hv
sys.path.append('../modules/')
from emit_tools import emit_xarray

---

## Overview of S3 Access

NASA Earthdata Cloud data in S3 can be accessed directly using temporary credentials; this access is limited to requests made from within the US West (Oregon) AWS region (code: `us-west-2`). Direct S3 access works by passing NASA-supplied temporary credentials to AWS, which lets us interact with S3 objects in the applicable Earthdata Cloud buckets. For now, each NASA DAAC has its own AWS credentials endpoint. Below are the credentials endpoints for several DAACs:

In [None]:
s3_cred_endpoint = {
    'podaac':'https://archive.podaac.earthdata.nasa.gov/s3credentials',
    'gesdisc': 'https://data.gesdisc.earthdata.nasa.gov/s3credentials',
    'lpdaac':'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials',
    'ornldaac': 'https://data.ornldaac.earthdata.nasa.gov/s3credentials',
    'ghrcdaac': 'https://data.ghrc.earthdata.nasa.gov/s3credentials'
}
s3_cred_endpoint

Create a function that requests temporary credentials from an endpoint. Remember, each DAAC has its own endpoint, and credentials from one DAAC cannot be used for cloud data from another DAAC.

In [None]:
# Define Function 
def get_temp_creds(provider):
    return requests.get(s3_cred_endpoint[provider]).json()


In [None]:
# Get Credentials
temp_creds_req = get_temp_creds('lpdaac')
#temp_creds_req
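
Temporary credentials are short-lived, so a long-running session may need to request new ones. The sketch below shows one way to check for that before opening files. It assumes the credentials response also contains an `expiration` key holding an ISO-style timestamp with a UTC offset; that key and format are assumptions not shown in this notebook, and `creds_expired` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

def creds_expired(creds, now=None, margin_minutes=5):
    """Return True if the temporary credentials are expired or about to expire."""
    # Assumption: 'expiration' looks like '2023-01-09 18:30:00+00:00'
    expires = datetime.fromisoformat(creds['expiration'])
    # Use the current UTC time unless a specific time is supplied
    now = now or datetime.now(timezone.utc)
    # Treat credentials inside the safety margin as already expired
    return now + timedelta(minutes=margin_minutes) >= expires
```

If `creds_expired(temp_creds_req)` returns `True`, calling `get_temp_creds('lpdaac')` again and creating a new `s3fs` session should restore access.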

### Set up an s3fs Session for Direct Access

`s3fs` sessions provide authenticated access to S3 buckets and allow typical file-system style operations. Below we create a session by passing in the temporary credentials we received from the credentials endpoint.

In [None]:
# Pass Authentication to s3fs
fs_s3 = s3fs.S3FileSystem(anon=False, 
                          key=temp_creds_req['accessKeyId'], 
                          secret=temp_creds_req['secretAccessKey'], 
                          token=temp_creds_req['sessionToken'])

Below we specify the s3 URL to the data asset in Earthdata Cloud. This URL can be found via Earthdata Search or programmatically through the CMR and CMR-STAC APIs.

In [None]:
# Define S3 URL                          
s3_url = 's3://lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220903T163129_2224611_012/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc'
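
If you only have the HTTPS link to an asset, the S3 URL can often be derived from it. The sketch below assumes the HTTPS path has the form `/<bucket>/<key>`, as with links served from `data.lpdaac.earthdatacloud.nasa.gov`; this layout is an assumption and may not hold for other DAACs. `https_to_s3` is a hypothetical helper name.

```python
from urllib.parse import urlparse

def https_to_s3(https_url):
    """Convert an HTTPS asset URL to an s3:// URL (assumed '/<bucket>/<key>' path)."""
    # Keep only the path portion of the URL, dropping scheme and host
    path = urlparse(https_url).path
    # Assumption: the first path segment is the bucket name
    return 's3://' + path.lstrip('/')
```

When in doubt, prefer the S3 link listed in Earthdata Search or returned by CMR over a derived one.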

Open the netCDF file using the `s3fs` package, then load the cloud asset into an `xarray` dataset, or pass it directly to the `emit_xarray` function from `emit_tools`.

In [None]:
# Open s3 url
fp = fs_s3.open(s3_url, mode='rb')
# Open dataset with xarray
ds = xr.open_dataset(fp)
ds

In [None]:
# Open and Orthorectify
ds = emit_xarray(fp)
ds

In [None]:
#Plot Spatially
ds.isel(bands=40).hvplot.image(cmap='viridis', aspect = 'equal', frame_width=500, rasterize=True)

In [None]:
#Plot Spectra at a Location
ds.sel(longitude=-61.833,latitude=-39.710,method='nearest').hvplot.line(y='reflectance',x='wavelengths', color='black', frame_width=400)

--- 

## S3 Access for any Notebook in this Repository

To use direct S3 access in any how-to or tutorial, replace the block that sets the local filepath(s) as `fp` with the two code blocks below. The first block imports the additional packages required and retrieves temporary S3 credentials. The second uses `s3fs` to open the desired S3 URL and create an object readable by `xarray`.

**If you are on an Openscapes 2i2c JupyterLab instance, check that the xarray version is 2022.12.0 or newer.**

To check the version, type the following in the terminal:  

> `conda list`

To install xarray version 2022.12.0, type the following in the terminal:  

> `pip install xarray==2022.12.0`
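
The version check can also be done from Python. The sketch below compares the numeric parts of a version string against the 2022.12.0 minimum; `meets_minimum` is a hypothetical helper, not part of `xarray` or this repository. Comparing numbers rather than strings matters because `'2022.9.0'` sorts after `'2022.12.0'` as text.

```python
import re

def meets_minimum(version, minimum='2022.12.0'):
    """Return True if a dotted version string is at least `minimum`."""
    def parse(v):
        # Keep the three leading numeric components; suffixes such as '.dev0' are ignored
        match = re.match(r'(\d+)\.(\d+)\.(\d+)', v)
        if match is None:
            raise ValueError(f'Unrecognized version string: {v!r}')
        return tuple(int(part) for part in match.groups())
    # Tuples compare element by element, so 2022.9.0 < 2022.12.0
    return parse(version) >= parse(minimum)
```

With `xarray` imported as `xr`, `meets_minimum(xr.__version__)` should return `True` before continuing.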

In [None]:
import requests
import s3fs
temp_creds_req = requests.get('https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials').json() # use lpdaac credential endpoint for EMIT data
# Create s3fs session
fs_s3 = s3fs.S3FileSystem(anon=False, 
                          key=temp_creds_req['accessKeyId'], 
                          secret=temp_creds_req['secretAccessKey'], 
                          token=temp_creds_req['sessionToken'])

In [None]:
# Set s3 url and open
s3_url = 's3://lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220903T163129_2224611_012/EMIT_L2A_RFL_001_20220903T163129_2224611_012.nc' # S3 URL to L2A Reflectance File used throughout tutorial
#s3_url_mask = 's3://lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220903T163129_2224611_012/EMIT_L2A_MASK_001_20220903T163129_2224611_012.nc' # Only used for Quality How-to
fp = fs_s3.open(s3_url, mode='rb')
#fp_mask = fs_s3.open(s3_url_mask, mode='rb') # Only used for Quality How-to
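
The commented-out mask URL above differs from the reflectance URL only in the file name, where `MASK` replaces `RFL`. The sketch below derives one from the other so that only a single URL needs to be edited. `mask_url_from_rfl` is a hypothetical helper, and it assumes other granules follow the same naming pattern as the example in this notebook.

```python
def mask_url_from_rfl(rfl_url):
    """Derive the L2A mask S3 URL from a reflectance S3 URL (assumed naming)."""
    # Split off the file name so the granule directory is left unchanged
    directory, filename = rfl_url.rsplit('/', 1)
    # Assumption: the mask file sits beside the reflectance file and differs
    # only by 'MASK' replacing 'RFL' in the file name
    return directory + '/' + filename.replace('_RFL_', '_MASK_', 1)
```

For the Quality how-to, `fp_mask = fs_s3.open(mask_url_from_rfl(s3_url), mode='rb')` would then take the place of the commented lines.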

---

## Contact Info:  

Email: LPDAAC@usgs.gov  
Voice: +1-866-573-3222  
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹  
Website: <https://lpdaac.usgs.gov/>  
Date last modified: 01-09-2023  

¹Work performed under USGS contract G15PD00467 for NASA contract NNG14HH33I.  