# Access cloud data examples tutorial
 
Credits: Tutorial development
* [Dr. Chelle Gentemann](mailto:gentemann@faralloninstitute.org) - Farallon Institute, USA
* [Dr. Ryan Abernathey](mailto:rpa@ldeo.columbia.edu) - LDEO

Data on the cloud can be stored in many different formats. Here we demonstrate a few ways to access the MUR SST, CMIP6, and AVISO datasets.


First, import the libraries.


```python
import warnings
import xarray as xr
import fsspec
from matplotlib import pyplot as plt
import numpy as np
import cartopy
import intake

warnings.simplefilter('ignore')                 # filter some warning messages
xr.set_options(display_style="html")            # display datasets as HTML tables
%matplotlib inline
plt.rcParams['figure.figsize'] = 12, 6          # default figure size in inches
%config InlineBackend.figure_format = 'retina'  # sharper inline figures
```

## Read MUR SST from an S3 bucket in the AWS Public Dataset Program

The MUR SST data have been reformatted into the cloud-optimized Zarr format. Opening the entire global, 1 km, daily, 18-year record is as simple as pointing xarray at the directory where the store lives; the data themselves are only loaded when a computation or plot needs them.

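```python
file_location = 's3://mur-sst/zarr'
# anon=True: the bucket is public, so no AWS credentials are needed (s3:// access requires s3fs)
# consolidated=True: read all metadata from the store's single .zmetadata object
ds_sst = xr.open_zarr(fsspec.get_mapper(file_location, anon=True), consolidated=True)
ds_sst
```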
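Opening the store reads only metadata; data are fetched later, chunk by chunk. A Zarr store splits each variable into chunks kept as separate objects, so a read touches only the chunks that overlap the requested region. The sketch below, assuming the Zarr v2 key layout (`<variable>/<i>.<j>.<k>` with the default `.` separator), lists the chunk objects such a read would fetch. `chunk_keys` is a hypothetical helper, and the shape and chunk sizes are made up for illustration; the real chunking can be checked with `ds_sst.analysed_sst.encoding['chunks']`.

```python
import itertools

def chunk_keys(var, shape, chunks, start, stop):
    """Hypothetical helper: list the Zarr v2 chunk keys of `var` that overlap
    the index region start[d] <= i < stop[d] along each dimension d."""
    ranges = []
    for n, c, lo, hi in zip(shape, chunks, start, stop):
        lo, hi = max(lo, 0), min(hi, n)   # clip the region to the array bounds
        if hi <= lo:
            return []                     # empty region: nothing to fetch
        # chunk indices of the first and last element touched in this dimension
        ranges.append(range(lo // c, (hi - 1) // c + 1))
    # one key per combination of chunk indices, e.g. 'analysed_sst/2.0.1'
    return [var + "/" + ".".join(map(str, idx)) for idx in itertools.product(*ranges)]

# Illustrative (time, lat, lon) shape and chunking, NOT the real MUR values.
shape, chunks = (100, 18000, 36000), (5, 1800, 3600)
print(chunk_keys("analysed_sst", shape, chunks, (10, 0, 3500), (11, 100, 3700)))
# ['analysed_sst/2.0.0', 'analysed_sst/2.0.1']
```

A small box that straddles a chunk boundary costs two object requests here, while reading a single point would cost one, no matter how large the full record is.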

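```python
sst_day = ds_sst.sel(time='2015-10-01T09')   # select a single day of the record
# keep open-sea pixels (mask == 1) whose sea-ice fraction is below 15 % or missing
cond = (sst_day.mask == 1) & ((sst_day.sea_ice_fraction < .15) | np.isnan(sst_day.sea_ice_fraction))
sst_masked = sst_day['analysed_sst'].where(cond)   # all other pixels become NaN
# note: plotting the full global 1 km grid loads a lot of data; subset with .sel() for speed
sst_masked.plot()
```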
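The masking step above reduces to an element-wise boolean condition followed by "replace everything else with NaN", which is what xarray's `.where(cond)` does. The NumPy sketch below applies the same condition to four made-up pixels, assuming (as the cell implies) that mask value 1 marks open sea; `mask_open_ocean` is a hypothetical name. The last line converts kelvin, the units MUR uses for `analysed_sst` (worth confirming with `ds_sst.analysed_sst.attrs['units']`), to °C.

```python
import numpy as np

def mask_open_ocean(sst, mask, ice):
    """Hypothetical helper: keep SST where mask == 1 (assumed open sea) and the
    sea-ice fraction is below 0.15 or missing; set every other pixel to NaN."""
    keep = (mask == 1) & ((ice < 0.15) | np.isnan(ice))
    return np.where(keep, sst, np.nan)

# Four made-up pixels, not real MUR values.
sst = np.array([290.0, 271.0, 285.0, 300.0])   # kelvin
mask = np.array([1, 1, 2, 1])                  # the third pixel is not open sea
ice = np.array([np.nan, 0.8, np.nan, 0.05])    # sea-ice fraction (NaN = missing)

out = mask_open_ocean(sst, mask, ice)
celsius = out - 273.15                         # NaN stays NaN
print(celsius)
```

Treating a missing ice fraction as "no ice" is what keeps most open-ocean pixels: without the `np.isnan` term, every pixel with an unreported fraction would be dropped.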

## Read CMIP6 data from Google Cloud using intake

The CMIP6 archive is a huge collection of model output from many different experiments. It is accessed with the intake library (through its intake-esm plugin): open the catalog, then search it to select specific variables, experiments, or activities.

```python
# open_esm_datastore is provided by the intake-esm plugin
col = intake.open_esm_datastore("https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json")
col
```

```
pangeo-cmip6-ESM Collection with 229696 entries:
  > 15 activity_id(s)
  > 30 institution_id(s)
  > 66 source_id(s)
  > 100 experiment_id(s)
  > 135 member_id(s)
  > 29 table_id(s)
  > 307 variable_id(s)
  > 10 grid_label(s)
  > 229696 zstore(s)
  > 60 dcpp_init_year(s)
```
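Each line of the summary above counts the distinct values in one column of the catalog's underlying table, where each row describes one Zarr store (hence 229696 zstores for 229696 entries). The sketch below reproduces that tally on a hypothetical three-row table; the model names and bucket paths are invented. On the real catalog, `col.df.nunique()` (pandas) should give comparable numbers.

```python
# Hypothetical toy catalog: one row per Zarr store, described by facet columns.
entries = [
    {"source_id": "ModelA", "experiment_id": "historical", "variable_id": "tas",
     "zstore": "gs://example-bucket/a/historical/tas"},
    {"source_id": "ModelA", "experiment_id": "ssp585", "variable_id": "tas",
     "zstore": "gs://example-bucket/a/ssp585/tas"},
    {"source_id": "ModelB", "experiment_id": "historical", "variable_id": "pr",
     "zstore": "gs://example-bucket/b/historical/pr"},
]

def summarize(rows):
    """Count the distinct values in each column, like the collection summary."""
    return {col: len({row[col] for row in rows}) for col in rows[0]}

for col, n in summarize(entries).items():
    print(f"> {n} {col}(s)")
```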

```python
cat_cmip = col.search(experiment_id=['ssp585', 'historical'],  # the historical run and the SSP5-8.5 scenario
                      table_id='Amon',       # atmospheric variables (A) saved at monthly resolution (mon)
                      variable_id='tas',     # near-surface air temperature
                      member_id='r1i1p1f1')  # arbitrarily pick one realization per model (one set of initial conditions)
```
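Conceptually, `search` filters the rows of the catalog table: every keyword must match, and a list value (like the two experiment IDs above) matches any of its items. The sketch below mimics that behavior in plain Python on hypothetical rows; it is a simplified illustration, not intake-esm's implementation, which offers more (for example `require_all_on`, to keep only models that provide all of the requested experiments).

```python
# Hypothetical toy catalog rows (not real CMIP6 entries).
entries = [
    {"experiment_id": "historical", "table_id": "Amon", "variable_id": "tas", "member_id": "r1i1p1f1"},
    {"experiment_id": "ssp585",     "table_id": "Amon", "variable_id": "tas", "member_id": "r1i1p1f1"},
    {"experiment_id": "ssp245",     "table_id": "Amon", "variable_id": "tas", "member_id": "r1i1p1f1"},
    {"experiment_id": "historical", "table_id": "Omon", "variable_id": "tos", "member_id": "r1i1p1f1"},
    {"experiment_id": "historical", "table_id": "Amon", "variable_id": "tas", "member_id": "r2i1p1f1"},
]

def search(rows, **query):
    """Keep rows matching every keyword; a list value means 'any of these'."""
    def ok(row, column, wanted):
        allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        return row.get(column) in allowed
    return [row for row in rows if all(ok(row, c, v) for c, v in query.items())]

hits = search(entries, experiment_id=["ssp585", "historical"], table_id="Amon",
              variable_id="tas", member_id="r1i1p1f1")
print(len(hits))  # 2
```

The `ssp245` row fails the experiment filter, the `Omon` row fails the table filter, and the `r2i1p1f1` row fails the member filter, so two rows remain.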


## Read in AVISO sea surface height data using intake from the Pangeo datastore on Google Cloud

```python
# open_catalog is intake's standard entry point for YAML catalogs
cat_pangeo = intake.open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean.yaml")
ds_aviso = cat_pangeo["sea_surface_height"].to_dask()
ds_aviso
```

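Running this cell fails with `KeyError: 'pangeo-cmems-duacs/.zmetadata'`: the reader could not find the AVISO store's consolidated-metadata object (`.zmetadata`) at the location listed in the catalog, which suggests the data have moved or been removed since the catalog was written.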
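The missing `.zmetadata` key is a Zarr store's consolidated metadata: a single JSON object gathering every group and array metadata document, so a reader can learn the whole layout in one request instead of one per variable. The sketch below builds such an object for a hypothetical in-memory store (a plain dict mapping keys to bytes), following the Zarr v2 consolidated layout in simplified form; the variable name `sla` and the trimmed metadata are made up. For a real store, `zarr.consolidate_metadata` does this job.

```python
import json

METADATA_NAMES = (".zgroup", ".zarray", ".zattrs")

def consolidate(store):
    """Collect every metadata document in `store` (a dict of key -> JSON bytes)
    into one '.zmetadata' entry, in the Zarr v2 consolidated format."""
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.rsplit("/", 1)[-1] in METADATA_NAMES   # skip chunk keys such as 'sla/0'
    }
    store[".zmetadata"] = json.dumps(
        {"zarr_consolidated_format": 1, "metadata": metadata}
    ).encode()
    return store

# Hypothetical, heavily trimmed store: one group holding one 1-D array 'sla'.
store = {
    ".zgroup": b'{"zarr_format": 2}',
    "sla/.zarray": b'{"zarr_format": 2, "shape": [10], "chunks": [5], "dtype": "<f4"}',
    "sla/.zattrs": b'{"units": "m"}',
    "sla/0": b"\x00" * 20,   # raw chunk bytes, not metadata
    "sla/1": b"\x00" * 20,
}
consolidate(store)
print(sorted(json.loads(store[".zmetadata"])["metadata"]))
# ['.zgroup', 'sla/.zarray', 'sla/.zattrs']
```

Opening with `consolidated=True` asks for exactly this one object first, which is why its absence surfaces as a `KeyError` on `.zmetadata` rather than on a data variable.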