# Cloud Computing for Science with AWS Lambda 

This tutorial demonstrates how to plot a timeseries of global mean sea surface temperature (SST) values, using AWS Lambda to perform the global mean computation. We use the MUR 25 km dataset (`MUR25-JPL-L4-GLOB-v04.2`).
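To make "global mean" concrete, here is a minimal sketch of one common way to compute it on a regular latitude/longitude grid: weight each cell by the cosine of its latitude, so that the small cells near the poles do not dominate. The source does not say whether the Lambda function weights by area, so the weighting is an assumption, and `global_mean` is a hypothetical helper, not the code deployed in this tutorial.

```python
import math


def global_mean(sst, lats):
    """Cosine-latitude weighted mean of a 2-D grid (assumed weighting).

    sst  -- list of rows, one row per latitude; missing cells are None or NaN
    lats -- latitude in degrees for each row
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(sst, lats):
        weight = math.cos(math.radians(lat))
        for value in row:
            # Skip land or ice cells that carry no SST value.
            if value is None or math.isnan(value):
                continue
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0.0:
        return float("nan")
    return weighted_sum / weight_total
```

In practice the same calculation is usually done with array libraries such as xarray and numpy; the pure-Python version above only shows the arithmetic.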

This is one example of how to take advantage of AWS Cloud Computing capabilities. Note that using AWS Compute services will incur costs that will be charged to your AWS account. Where possible we indicate estimates of the compute cost associated with this tutorial, but if you apply it to a larger time period or different dataset, the compute costs may be larger than indicated here. 

## Step 0: Set up AWS infrastructure

This tutorial takes advantage of numerous AWS Services including Lambda, Parameter Store, Elastic Compute Cloud (EC2), Elastic Container Registry (ECR), and Simple Storage Service (S3). 

#### Deploy Lambda function using Terraform

AWS Lambda is a compute service that runs code in response to events. The Lambda code is packaged in a Docker image, and we use Terraform to set up the AWS Lambda function.
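As a rough illustration of what "code that runs in response to events" looks like, the sketch below shows the shape of a Python Lambda handler. The two event keys match the payload sent later in this notebook; the output naming scheme and the `output_key_for` helper are hypothetical, and the actual read, compute, and write steps are left as a comment because the deployed function's code is not shown here.

```python
import posixpath


def output_key_for(input_s3path):
    """Build a results file name from the input granule path (hypothetical scheme)."""
    filename = posixpath.basename(input_s3path)
    stem, extension = posixpath.splitext(filename)
    return f"{stem}_global_mean{extension}"


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls once per invocation."""
    granule = event["input_granule_s3path"]
    bucket = event["output_granule_s3bucket"]
    output_path = bucket.rstrip("/") + "/" + output_key_for(granule)

    # A real handler would open `granule`, compute the global mean,
    # and write the result to `output_path` at this point.

    return {"input": granule, "output": output_path}
```

Each invocation handles a single granule, which is what lets the notebook fan out one Lambda call per file.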

#### Set up Earthdata credentials in AWS Parameter Store

** Do we need this? Maybe only for testing?
To avoid hard-coding Earthdata credentials or a .netrc file in the Docker image, we store the Earthdata credentials in AWS Parameter Store. The same Lambda code can then run without modification in any user environment, because it reads the Earthdata Login (EDL) credentials that each user sets in their own AWS Parameter Store.
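A minimal sketch of how code might read those credentials at run time is shown below. It relies on the boto3 SSM client's `get_parameter` call; the parameter names (`/earthdata/username` and `/earthdata/password`) are assumptions made for illustration and may not match the names this tutorial's deployment uses.

```python
def get_earthdata_credentials(ssm_client, prefix="/earthdata"):
    """Read Earthdata Login credentials from AWS Parameter Store.

    ssm_client -- a boto3 SSM client, for example boto3.client("ssm")
    prefix     -- hypothetical parameter path; adjust to your own setup
    """
    def read(name):
        # WithDecryption=True lets SecureString parameters come back in plain text.
        response = ssm_client.get_parameter(
            Name=f"{prefix}/{name}",
            WithDecryption=True,
        )
        return response["Parameter"]["Value"]

    return read("username"), read("password")
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object before any AWS resources exist.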

#### Set up S3 bucket to hold granules and results



#### Test Lambda function
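One way to test the deployed function is to invoke it synchronously for a single granule and inspect the response. The sketch below assumes the function name `podaac-sst` and the payload keys used later in this notebook; `invoke_once` is a hypothetical helper, and the shape of the function's return value is not specified by the source.

```python
import json


def invoke_once(lambda_client, granule_s3path, results_bucket,
                function_name="podaac-sst"):
    """Invoke the Lambda function synchronously and return (status, body).

    lambda_client -- a boto3 Lambda client, for example boto3.client("lambda")
    """
    payload = {
        "input_granule_s3path": granule_s3path,
        "output_granule_s3bucket": results_bucket,
    }
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read().decode("utf-8")
    # Lambda sets FunctionError when the handler itself raised an exception.
    if response.get("FunctionError"):
        raise RuntimeError(f"Lambda reported an error: {body}")
    parsed = json.loads(body) if body else None
    return response["StatusCode"], parsed
```

A synchronous call is convenient for a one-off check because errors surface immediately, whereas the bulk run uses asynchronous `Event` invocations.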

#### Connect to EC2 instance to run this notebook

This notebook cannot be run on a local computer, as it heavily depends on direct in-cloud access. To run this notebook in AWS, connect to an EC2 instance running in the us-west-2 region, following the instructions < here >. Once you have connected to the EC2 instance you can clone this repository into that environment, install the required packages, and run this notebook. 



## Step 1: Log in to Earthdata

## Step 2: Query CMR for granules, and copy to local S3 bucket

In [3]:
from CMRQuery import CMRQuery
import matplotlib.pyplot as plt

### Initialize the query object, log in to Earthdata, and set the authentication token

In [4]:
cmr_query = CMRQuery()

cmr_query.login_and_set_token()

KeyError: 'token'

### Get the shortnames for all L4 collections containing 'sst' or 'sea surface temperature' keywords

In [3]:
shortnames = cmr_query.query_collections_by_keyword(
                            provider='POCLOUD',
                            keywords=['sst', 'sea surface temperature'],
                            processinglevel='4')

for s in shortnames:
    print(s)

CMC0.1deg-CMC-L4-GLOB-v3.0
MUR-JPL-L4-GLOB-v4.1
K10_SST-NAVO-L4-GLOB-v01
MUR25-JPL-L4-GLOB-v04.2
AVHRR_OI-NCEI-L4-GLOB-v2.0
AVHRR_OI-NCEI-L4-GLOB-v2.1
CMC0.2deg-CMC-L4-GLOB-v2.0
DMI_OI-DMI-L4-GLOB-v1.0
GAMSSA_28km-ABOM-L4-GLOB-v01
Geo_Polar_Blended-OSPO-L4-GLOB-v1.0
Geo_Polar_Blended_Night-OSPO-L4-GLOB-v1.0
MITgcm_LLC4320_Pre-SWOT_JPL_L4_BassStrait_v1.0
MW_IR_OI-REMSS-L4-GLOB-v5.0
MW_IR_OI-REMSS-L4-GLOB-v5.1
MW_OI-REMSS-L4-GLOB-v5.0
MW_OI-REMSS-L4-GLOB-v5.1
OISST_HR_NRT-GOS-L4-BLK-v2.0
OISST_HR_NRT-GOS-L4-MED-v2.0
OISST_UHR_NRT-GOS-L4-BLK-v2.0
OISST_UHR_NRT-GOS-L4-MED-v2.0
OSTIA-UKMO-L4-GLOB-v2.0
RAMSSA_09km-ABOM-L4-AUS-v01
REMO_OI_SST_5km-UFRJ-L4-SAMERICA-v1.0
REYNOLDS_NCDC_L4_MONTHLY_V5


### Get the granules belonging to the MUR25 collection found above, within a time range

In [6]:
granule_URL_lists = cmr_query.query_granules_by_shortname(
                            shortnames="MUR25-JPL-L4-GLOB-v04.2", 
                            provider="POCLOUD", 
                            temporal_range="2022-12-01T00:00:00Z,2022-12-31T23:59:59Z")

# print the S3 path of every granule returned by the query
for granule_list in granule_URL_lists:
    for granule in granule_list:
        print(granule)

s3://podaac-ops-cumulus-protected/MUR25-JPL-L4-GLOB-v04.2/20221201090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
s3://podaac-ops-cumulus-protected/MUR25-JPL-L4-GLOB-v04.2/20221202090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
s3://podaac-ops-cumulus-protected/MUR25-JPL-L4-GLOB-v04.2/20221203090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
s3://podaac-ops-cumulus-protected/MUR25-JPL-L4-GLOB-v04.2/20221204090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
s3://podaac-ops-cumulus-protected/MUR25-JPL-L4-GLOB-v04.2/20221205090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
s3://podaac-ops-cumulus-protected/MUR25-JPL-L4-GLOB-v04.2/20221206090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
s3://podaac-ops-cumulus-protected/MUR25-JPL-L4-GLOB-v04.2/20221207090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
s3://podaac-ops-cumulus-protected/MUR25-JPL-L4-GLOB-v04.2/20221208090000-JPL-L4_GHRSST-SSTfnd-MUR25-GLOB-v02.0-fv04.2.nc
s3://podaac-ops-cumulus-protecte

In [16]:
import requests
import boto3
import s3fs
import xarray as xr
import numpy as np

s3_cred_endpoint = {
    'podaac':'https://archive.podaac.earthdata.nasa.gov/s3credentials'
}

def get_temp_creds(provider):
    return requests.get(s3_cred_endpoint[provider]).json()

temp_creds_req = get_temp_creds('podaac')

s3_client = s3fs.S3FileSystem(
        anon=False, 
        key=temp_creds_req['accessKeyId'], 
        secret=temp_creds_req['secretAccessKey'], 
        token=temp_creds_req['sessionToken']
    )

lambda_client = boto3.client('lambda')

s3_results_bucket = "s3://podaac-sst/"

In [None]:
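import json

# invoke the Lambda function asynchronously, once per granule
for granule_list in granule_URL_lists:
    for granule in granule_list:

        lambda_payload = {"input_granule_s3path": granule, "output_granule_s3bucket": s3_results_bucket}

        # Payload must be a JSON string or bytes, not a Python dict
        lambda_client.invoke(
            FunctionName="podaac-sst",
            InvocationType="Event",
            Payload=json.dumps(lambda_payload)
        )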
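Because `InvocationType="Event"` queues each invocation and returns straight away, the result files may not exist yet when the next cell runs. The sketch below is one hedged way to wait for them: it polls a caller-supplied listing function until the expected number of files appears. `wait_for_results` and its parameters are hypothetical and not part of this tutorial's code.

```python
import time


def wait_for_results(list_results, expected_count,
                     timeout_seconds=300, poll_seconds=5):
    """Poll until `list_results()` returns at least `expected_count` items.

    list_results -- zero-argument function returning the current result files,
                    for example a wrapper around an S3 listing call
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        found = list_results()
        if len(found) >= expected_count:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(found)} of {expected_count} results after "
                f"{timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
```

When the listing function wraps an `s3fs` filesystem, it may need to clear the filesystem's listing cache (for example with `invalidate_cache()`) before each call so that newly written files are visible.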
        

### Clean up and delete the token

In [None]:
cmr_query.delete_token()

## Step 3: Plot results as timeseries

Open the resulting global mean files in xarray:

In [None]:
# set up the connection to the S3 bucket holding the results
s3_local = s3fs.S3FileSystem()

# list every object in the results bucket; a bare bucket path does not expand to its objects
s3_files = s3_local.glob(s3_results_bucket + "*")

# iterate through s3 files to create a fileset
fileset = [s3_local.open(file) for file in s3_files]

# open all files as an xarray dataset
data = xr.open_mfdataset(fileset, combine='by_coords')


Plot the data using matplotlib:

In [None]:
# set up the figure
fig = plt.figure()

# plot the data
plt.plot(data.time, data.analysed_sst)
