# Direct S3 Access tutorial

This tutorial shows how to authenticate with the PO.DAAC data archive and gain access to its data in Amazon S3 buckets. This lets you natively list, copy, and get data from the PO.DAAC archive using your preferred AWS tools (e.g. Python boto3, the AWS SDKs, or the AWS CLI).

**Note:** Direct S3 access is only available to users **running in AWS, us-west-2 region.** All other access to PO.DAAC data must go through HTTP requests.

In [None]:
import boto3
import json
import requests
import xarray as xr
%matplotlib inline

## Get Temporary AWS Credentials for Access

S3 is an 'object store' hosted in AWS for cloud processing. Direct S3 access is achieved by passing NASA-supplied temporary credentials to AWS so we can interact with S3 objects from applicable Earthdata Cloud buckets. Note that these temporary credentials are valid for only 1 hour. A netrc file is required to acquire these credentials. Use the [NASA Earthdata Authentication](https://github.com/NASA-Openscapes/2021-Cloud-Hackathon/blob/main/tutorials/04_NASA_Earthdata_Authentication.ipynb) notebook to create a netrc file in your home directory. (Note: A NASA Earthdata Login is required to access data from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.)
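
Before requesting credentials, it can help to confirm that the netrc file actually contains an Earthdata Login entry. The following is a minimal sketch using Python's standard `netrc` module; the helper name `has_earthdata_login` and the default path `~/.netrc` are assumptions made for illustration, not part of the original tutorial.

```python
import netrc
import os

# Host used for Earthdata Login (from the registration URL above).
URS_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_login(netrc_path=None, host=URS_HOST):
    """Return True if the netrc file contains an entry for `host`.

    `netrc_path` defaults to ~/.netrc (an assumption; adjust it if your
    file lives elsewhere).
    """
    path = netrc_path or os.path.join(os.path.expanduser("~"), ".netrc")
    try:
        auth = netrc.netrc(path).authenticators(host)
    except (FileNotFoundError, netrc.NetrcParseError):
        # Missing or malformed file: treat as "no login configured".
        return False
    return auth is not None
```

Calling `has_earthdata_login()` with no arguments checks the default location and returns `False` when the file is missing or has no entry for the Earthdata Login host.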

The following credentials endpoint is for PO.DAAC; other endpoints are needed to access data from other NASA DAACs.

In [None]:
s3_cred_endpoint = 'https://archive.podaac.earthdata.nasa.gov/s3credentials'

Create a function to make a request to an endpoint for temporary credentials.

In [None]:
def get_temp_creds():
    temp_creds_url = s3_cred_endpoint
    return requests.get(temp_creds_url).json()
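
Because the temporary credentials are valid for only 1 hour, a long-running session may need to request them again. The sketch below is one possible approach, not part of the original notebook: a small cache that re-fetches once the stored credentials are older than a chosen age. The class name `TempCredsCache`, the 55-minute default, and the injectable `clock` are all assumptions made for illustration.

```python
import time


class TempCredsCache:
    """Re-fetch credentials once they are older than `max_age_s` seconds."""

    def __init__(self, fetch, max_age_s=55 * 60, clock=time.monotonic):
        self._fetch = fetch          # callable returning a credentials dict
        self._max_age_s = max_age_s  # refresh slightly before the 1-hour limit
        self._clock = clock          # injectable so the logic can be tested
        self._creds = None
        self._fetched_at = None

    def get(self):
        """Return cached credentials, fetching new ones if they are stale."""
        now = self._clock()
        if self._creds is None or now - self._fetched_at >= self._max_age_s:
            self._creds = self._fetch()
            self._fetched_at = now
        return self._creds
```

In this notebook it could be used as `cache = TempCredsCache(get_temp_creds)` followed by `cache.get()` wherever credentials are needed. Any client or file-system object built from the old credentials would still have to be recreated after a refresh.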

In [None]:
temp_creds_req = get_temp_creds()
#temp_creds_req                      # !!! BEWARE, removing the # on this line will print your temporary S3 credentials.
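
If you want to inspect the response without exposing the secrets, one option is to mask the sensitive fields first. This is a minimal sketch; the helper name `redact_creds` is hypothetical, and it assumes the three key names used elsewhere in this tutorial (`accessKeyId`, `secretAccessKey`, `sessionToken`).

```python
def redact_creds(creds, secret_keys=("accessKeyId", "secretAccessKey", "sessionToken")):
    """Return a copy of `creds` with the secret values masked.

    The original dictionary is left unchanged.
    """
    return {k: ("****" if k in secret_keys else v) for k, v in creds.items()}
```

`redact_creds(temp_creds_req)` can then be displayed safely, since only the non-secret fields keep their real values.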

### List the files available in a dataset using boto3

In [None]:
s3_client = boto3.client(
    's3',
    aws_access_key_id=temp_creds_req["accessKeyId"],
    aws_secret_access_key=temp_creds_req["secretAccessKey"],
    aws_session_token=temp_creds_req["sessionToken"]
)

In [None]:
s3_client.list_objects(Bucket="podaac-ops-cumulus-protected", Prefix="ECCO_L4_ATM_STATE_05DEG_DAILY_V4R4/")
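
A single `list_objects` call returns at most 1,000 keys, and a collection of daily files can easily exceed that. The following sketch uses a boto3 paginator to walk every page and yield only the object keys; the helper name `iter_keys` is an assumption made for illustration.

```python
def iter_keys(s3_client, bucket, prefix):
    """Yield every object key under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no objects match.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

For example, `list(iter_keys(s3_client, "podaac-ops-cumulus-protected", "ECCO_L4_ATM_STATE_05DEG_DAILY_V4R4/"))` would collect all keys under the prefix used above.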

### Download a specific file within the cloud, open and plot a variable from it

In [None]:
s3_client.download_file(
    "podaac-ops-cumulus-protected",
    "ECCO_L4_ATM_STATE_05DEG_DAILY_V4R4/ATM_SURFACE_TEMP_HUM_WIND_PRES_day_mean_1992-01-01_ECCO_V4r4_latlon_0p50deg.nc",
    "ATM_SURFACE_TEMP_HUM_WIND_PRES_day_mean_1992-01-01_ECCO_V4r4_latlon_0p50deg.nc",
)
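
The call above repeats the file name as both part of the S3 key and the local destination. As a sketch of one way to avoid typing it twice, the local name can be derived from the key; the helper name `download_to_cwd` is hypothetical.

```python
import posixpath


def download_to_cwd(s3_client, bucket, key):
    """Download `key` into the current directory under its base name."""
    # S3 keys use "/" as a separator regardless of the local OS.
    local_name = posixpath.basename(key)
    s3_client.download_file(bucket, key, local_name)
    return local_name
```

The return value is the local file name, which can be passed straight to `xr.open_dataset`.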

In [None]:
ds = xr.open_dataset("ATM_SURFACE_TEMP_HUM_WIND_PRES_day_mean_1992-01-01_ECCO_V4r4_latlon_0p50deg.nc")
ds

In [None]:
ds.EXFwspee.plot()

### Set up an `s3fs` session for Direct Access without downloading within the cloud
`s3fs` sessions are used for authenticated access to an S3 bucket and allow typical file-system style operations. Below we create a session using the temporary credentials we received from the credentials endpoint, and then find the S3 paths to the data we want.

In [None]:
import s3fs
import os

os.environ["AWS_ACCESS_KEY_ID"] = temp_creds_req["accessKeyId"]
os.environ["AWS_SECRET_ACCESS_KEY"] = temp_creds_req["secretAccessKey"]
os.environ["AWS_SESSION_TOKEN"] = temp_creds_req["sessionToken"]

s3 = s3fs.S3FileSystem(anon=False)

s3path = 's3://podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/202101*.nc'
#s3path = 's3://podaac-ops-cumulus-protected/ECCO_L4_ATM_STATE_05DEG_DAILY_V4R4/ATM_SURFACE_TEMP_HUM_WIND_PRES_day_mean_1992-01-*.nc'
remote_files = s3.glob(s3path)
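
Setting environment variables works, but it changes the process-wide AWS configuration. As an alternative sketch, the credentials can be passed to `s3fs.S3FileSystem` directly through its `key`, `secret`, and `token` arguments. The helper name `s3fs_kwargs` is an assumption made for illustration.

```python
def s3fs_kwargs(creds):
    """Map the temporary-credentials response onto S3FileSystem arguments."""
    return {
        "anon": False,
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


# Usage (requires s3fs and valid credentials):
# import s3fs
# s3 = s3fs.S3FileSystem(**s3fs_kwargs(temp_creds_req))
```

This keeps the credentials scoped to one file-system object rather than to the whole Python process.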

In [None]:
remote_files

In [None]:
fileset = [s3.open(file) for file in remote_files]

Open all files and combine them into a single `xarray` dataset.

In [None]:
data = xr.open_mfdataset(fileset, combine='by_coords', engine='h5netcdf')
data

In [None]:
data.analysed_sst.sel(lat=21.00, lon=-21.00).plot()