## Getting Started
In this notebook, we will access monthly sea surface height from ECCO V4r4 (10.5067/ECG5D-SSH44). The data are provided as a time series of monthly netCDFs on a 0.5-degree latitude/longitude grid.

We will access the data from inside the AWS cloud (us-west-2 region, specifically) and load a time series made of multiple netCDF datasets into a single xarray dataset. This approach leverages S3 native protocols for efficient access to the data.

### Requirements
#### AWS
This notebook should be running in an EC2 instance in AWS region us-west-2, as previously mentioned. We recommend using an EC2 instance with at least 8 GB of memory available.

The notebook was developed and tested using a t2.small instance (_ CPUs; 8GB memory).

#### Python 3

Only `json` and `os` in the cells below come from the Python standard library. The remaining imports are third-party packages that you will need to install into your Python 3 environment if you have not already done so:

```
boto3
s3fs
xarray
dask
distributed
h5netcdf
matplotlib
cartopy
hvplot
```

## Learning Objectives
* import needed libraries
* define dataset of interest
* authenticate for NASA Earthdata archive (Earthdata Login)
* obtain AWS credentials for Earthdata DAAC archive in AWS S3
* access DAAC data directly from the in-region S3 bucket without moving or downloading any files to your local (cloud) workspace
* plot the first time step in the data

In [None]:
import boto3
import json
import xarray as xr
%matplotlib inline

In [None]:
# After logging in with Earthdata Login, paste the JSON returned by the s3Credential endpoint into the 's3_credential' variable here:
s3_credential = ''
creds = json.loads(s3_credential)
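
If `s3_credential` is left empty or only partly pasted, `json.loads` fails with an error that does not point at the real problem. The following is a minimal sketch of a stricter parser. The helper name `parse_s3_credentials` is hypothetical, the required keys are simply the three that later cells of this notebook read, and the example values are placeholders rather than real credentials.

```python
import json

# Keys that later cells in this notebook read from the credentials dict.
REQUIRED_KEYS = ("accessKeyId", "secretAccessKey", "sessionToken")


def parse_s3_credentials(text):
    """Parse the pasted credentials JSON and check that the expected keys exist.

    Hypothetical helper for illustration; it is not part of any library.
    """
    if not text.strip():
        raise ValueError("s3_credential is empty; paste the JSON from the endpoint first")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"s3_credential is not valid JSON: {err}") from err
    if not isinstance(parsed, dict):
        raise ValueError("s3_credential must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in parsed]
    if missing:
        raise ValueError("s3_credential is missing keys: " + ", ".join(missing))
    return parsed


# Placeholder values for illustration only; these are not real credentials.
example = (
    '{"accessKeyId": "KEY_EXAMPLE", '
    '"secretAccessKey": "SECRET_EXAMPLE", '
    '"sessionToken": "TOKEN_EXAMPLE"}'
)
creds_example = parse_s3_credentials(example)
print(sorted(creds_example))
```

In the notebook, `creds = parse_s3_credentials(s3_credential)` could stand in for the bare `json.loads` call.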

In [None]:
# Let's start simple with a LocalCluster that makes use of all the cores and RAM we have on a single machine
from dask.distributed import Client, LocalCluster
cluster = LocalCluster()
# explicitly connect to the cluster we just created
client = Client(cluster)
client

In [None]:
import s3fs
import os

os.environ["AWS_ACCESS_KEY_ID"] = creds["accessKeyId"]
os.environ["AWS_SECRET_ACCESS_KEY"] = creds["secretAccessKey"]
os.environ["AWS_SESSION_TOKEN"] = creds["sessionToken"]

s3 = s3fs.S3FileSystem(anon=False) 

s3path = 's3://podaac-ops-cumulus-protected/SWOT_SIMULATED_L2_KARIN_SSH_GLORYS_SCIENCE_V1/SWOT_L2_LR_SSH_Expert_001_00*.nc'
#s3path = 's3://podaac-ops-cumulus-protected/ECCO_L4_ATM_STATE_05DEG_DAILY_V4R4/ATM_SURFACE_TEMP_HUM_WIND_PRES_day_mean_1992-01-*.nc'
remote_files = s3.glob(s3path)
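
As an alternative to environment variables, the same temporary credentials can be passed straight to `s3fs.S3FileSystem` through its `key`, `secret`, and `token` arguments. The sketch below only builds the keyword arguments, so it runs without network access. The helper name `s3fs_kwargs` is hypothetical and the values are placeholders, not real credentials.

```python
def s3fs_kwargs(creds):
    """Map the credential keys used in this notebook to s3fs.S3FileSystem arguments.

    Hypothetical helper for illustration; it is not part of s3fs.
    """
    return {
        "anon": False,
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


# Placeholder values for illustration only.
fake_creds = {
    "accessKeyId": "KEY_EXAMPLE",
    "secretAccessKey": "SECRET_EXAMPLE",
    "sessionToken": "TOKEN_EXAMPLE",
}
kwargs = s3fs_kwargs(fake_creds)
print(sorted(kwargs))

# With s3fs installed and real credentials loaded:
# s3 = s3fs.S3FileSystem(**s3fs_kwargs(creds))
```

One possible advantage of this form is that the credentials travel with the filesystem object instead of depending on the environment of whichever process uses it.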


In [None]:
remote_files
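
If the glob pattern matches nothing, `remote_files` is an empty list and the problem is only reported later, when the files are opened. A small guard makes the failure explicit and keeps the file order deterministic, which matters because `combine='nested'` concatenates datasets in the order they are passed. This is a minimal sketch: the helper name `checked_sorted` and the example object names are hypothetical.

```python
def checked_sorted(paths, pattern):
    """Return the paths sorted by name, or raise if the glob matched nothing.

    Hypothetical helper for illustration; it is not part of s3fs.
    """
    if not paths:
        raise FileNotFoundError(
            f"No objects matched {pattern!r}; check the path and the credentials"
        )
    return sorted(paths)


# Hypothetical object names standing in for the result of s3.glob(s3path).
example_paths = [
    "example-bucket/collection/file_002.nc",
    "example-bucket/collection/file_001.nc",
]
ordered = checked_sorted(example_paths, "s3://example-bucket/collection/file_00*.nc")
print(ordered)
```

In the notebook this would be called as `remote_files = checked_sorted(s3.glob(s3path), s3path)`.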

In [None]:
fileset = [s3.open(file) for file in remote_files]

In [None]:

%%time
# Open all files lazily and concatenate them along the "num_lines" dimension
data = xr.open_mfdataset(fileset, engine='h5netcdf', combine='nested', concat_dim="num_lines", decode_times=False)
# OR
# datasets = []
# for f in fileset:
#     print("Opening " + str(f))
#     ds = xr.open_dataset(f, engine='h5netcdf', decode_times=False)
#     datasets.append(ds)
# data = xr.concat(datasets, dim="num_lines")


In [None]:
data.ssha_karin

In [None]:
import cartopy.crs as ccrs
from matplotlib import pyplot as plt
import hvplot.xarray  # noqa

plt.figure(figsize=(21, 12))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()
data.ssha_karin.plot.pcolormesh(
    ax=ax, transform=ccrs.PlateCarree(), x="longitude", y="latitude", add_colorbar=False
)
ax.coastlines()
#ax.set_ylim([0, 180]);