# Access cloud data examples tutorial
 
Credits: Tutorial development
* [Dr. Chelle Gentemann](mailto:gentemann@faralloninstitute.org)    - Farallon Institute, USA
* [Dr. Ryan Abernathey](mailto:rpa@ldeo.columbia.edu) - LDEO
* [Henri Drake](mailto:hdrake@mit.edu) - MIT 

Data in the cloud can be stored in many different formats. This tutorial demonstrates some ways to access the CMIP6, MUR SST, and AVISO data.


Start by importing the libraries.


In [None]:
import warnings
import xarray as xr
import fsspec
from matplotlib import pyplot as plt
import numpy as np
import cartopy
import cartopy.crs as ccrs
import intake

warnings.simplefilter('ignore') # filter some warning messages
xr.set_options(display_style="html")  #display dataset nicely 
%matplotlib inline
plt.rcParams['figure.figsize'] = 12, 6
%config InlineBackend.figure_format = 'retina' 

### Start a cluster, the key to reading data efficiently on the cloud

- This sets up a cluster for you and gives you a path that you can paste into the top of the Dask dashboard to visualize parts of your cluster.
- Pasting the link into the Dask dashboard is optional: the cluster works without it, and the dashboard only lets you watch the workers at work.

In [None]:
from dask_kubernetes import KubeCluster
from dask.distributed import Client

In [None]:
cluster = KubeCluster(n_workers=200)

client = Client(cluster)

cluster

**☝️ Don't forget to click the link above, or copy it into the Dask dashboard on the left, to view the scheduler dashboard!**

## Read in AVISO sea surface height data using intake from the Pangeo datastore on Google Cloud

In [None]:
%%time
cat_pangeo = intake.Catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean.yaml")

ds_aviso = cat_pangeo["sea_surface_height"].to_dask()

ds_aviso

### Plot the data

In [None]:
%%time
ds_aviso['sla'].sel(time='2015-01-01').plot(vmin=-2,vmax=2,cmap='seismic')

### Plot a timeseries

In [None]:
sla_monthly = ds_aviso['sla'].resample(time='1MS').mean()

sla_monthly_timeseries = sla_monthly.mean(['latitude', 'longitude'])  # average over both spatial dimensions

sla_monthly_timeseries.plot(label='Full data')

plt.ylabel('Sea Level Anomaly [m]')
plt.title('Global Mean Sea Level')
plt.legend()
plt.grid()
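
Note that a plain `mean` over `latitude` and `longitude` gives every grid cell the same weight, although cells on a regular latitude-longitude grid cover less area toward the poles. The following is a minimal sketch of an area-weighted alternative, assuming a regular grid and using cos(latitude) as an approximate cell-area weight. It runs on a tiny synthetic field; the names `demo_sla` and `weights` are illustrative and not part of the AVISO dataset.

```python
import numpy as np
import xarray as xr

# Synthetic 3x3 field (assumption): 1 along the equator, 0 at +/-60 degrees.
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 120.0, 240.0])
values = np.array([[0.0, 0.0, 0.0],
                   [1.0, 1.0, 1.0],
                   [0.0, 0.0, 0.0]])
demo_sla = xr.DataArray(
    values,
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
    name="sla",
)

# cos(latitude) approximates the relative area of each cell on a regular grid.
weights = np.cos(np.deg2rad(demo_sla["latitude"]))
weights.name = "weights"

# Compare the equal-weight mean with the area-weighted mean.
unweighted_mean = float(demo_sla.mean(["latitude", "longitude"]))
weighted_mean = float(demo_sla.weighted(weights).mean(["latitude", "longitude"]))

print(unweighted_mean, weighted_mean)
```

In this toy field the equatorial row counts for more under the weighted mean, so the two results differ. The same `.weighted(weights).mean(...)` pattern could be applied to `sla_monthly` if an area-weighted global mean is wanted.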

### Read MUR SST from the AWS public dataset program on an S3 bucket

This Pangeo binder runs on Google Cloud, so data access takes longer here because the data is stored on AWS.

In [None]:
%%time
file_location = 's3://mur-sst/zarr'

ds_sst = xr.open_zarr(fsspec.get_mapper(file_location, anon=True),consolidated=True)

ds_sst

### Read the entire 18 years of data at one point

In [None]:
%%time
sst_timeseries = ds_sst['analysed_sst'].sel(lat=47,lon=-145,method='nearest').load()

sst_timeseries.plot()
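
The `method='nearest'` argument lets `.sel` accept coordinates that do not fall exactly on the grid and returns the closest grid point instead. A small sketch on a synthetic array illustrates this; the grid values and the `demo_` names are made up for illustration and do not reflect the real MUR grid.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic coarse grid (assumption): none of the points is exactly (47, -145).
demo_lat = np.array([45.0, 46.6, 48.2])
demo_lon = np.array([-146.0, -144.7, -143.4])
demo_time = pd.date_range("2010-01-01", periods=4, freq="D")

demo_sst = xr.DataArray(
    np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3),
    coords={"time": demo_time, "lat": demo_lat, "lon": demo_lon},
    dims=("time", "lat", "lon"),
    name="analysed_sst",
)

# Select the grid point closest to the requested location; time is kept whole.
demo_point = demo_sst.sel(lat=47, lon=-145, method="nearest")

print(float(demo_point["lat"]), float(demo_point["lon"]))
print(demo_point.dims)
```

The result keeps only the `time` dimension, and its `lat` and `lon` coordinates record which grid point was actually chosen.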

### The anomaly is more interesting... 

#### Use ``.groupby`` to calculate the climatology, then ``.resample`` to average the anomaly into 1-month bins

In [None]:
sst_climatology = sst_timeseries.groupby('time.dayofyear').mean()

sst_anomaly = sst_timeseries.groupby('time.dayofyear') - sst_climatology

sst_anomaly_monthly = sst_anomaly.resample(time='1MS').mean()
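
To see what these three steps do without waiting on the cloud read, here is a sketch of the same pattern on a synthetic daily series. The seasonal cycle, the trend, and the `demo_` names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series (assumption): a seasonal cycle plus a slow warming trend.
demo_time = pd.date_range("2003-01-01", "2006-12-31", freq="D")
day_of_year = demo_time.dayofyear.values
seasonal = 285.0 + 5.0 * np.sin(2 * np.pi * day_of_year / 365.25)
trend = np.linspace(0.0, 1.0, demo_time.size)
demo_series = xr.DataArray(
    seasonal + trend,
    coords={"time": demo_time},
    dims="time",
    name="analysed_sst",
)

# Climatology: the mean over all years for each day of the year.
demo_climatology = demo_series.groupby("time.dayofyear").mean()

# Anomaly: each day minus the climatological value for its day of year.
demo_anomaly = demo_series.groupby("time.dayofyear") - demo_climatology

# Monthly anomaly: average the daily anomalies into bins that start each month.
demo_anomaly_monthly = demo_anomaly.resample(time="1MS").mean()

print(demo_climatology.sizes, demo_anomaly_monthly.sizes)
```

Subtracting the climatology removes the seasonal cycle, so what remains in the synthetic anomaly is the trend relative to the multi-year mean.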

### Plot the data

In [None]:
sst_anomaly.plot()

sst_anomaly_monthly.plot()

plt.axhline(linewidth=2,color='k')

## Read CMIP6 data from Google Cloud using intake

The CMIP6 data is a huge collection of different experiments. These data are accessed through the intake library, whose catalog you can search to select specific variables, experiments, or activities. There are some great tutorials [here](https://github.com/hdrake/cmip6-temperature-demo/) and [here](https://github.com/pangeo-data/pangeo-cmip6-examples/).

In [None]:
%%time
col = intake.open_esm_datastore("https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json")

col

## Search the collection for monthly near-surface air temperature from the `historical` and `ssp585` experiments, for one realization

In [None]:
cat_cmip = col.search(experiment_id=['ssp585','historical'],  # pick the `ssp585` and `historical` forcing experiments
                 table_id='Amon',             # choose to look at atmospheric variables (A) saved at monthly resolution (mon)
                 variable_id='tas',           # choose to look at near-surface air temperature (tas) as our variable
                 member_id = 'r1i1p1f1')      # arbitrarily pick one realization for each model (i.e. just one set of initial conditions)
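
Conceptually, the catalog is a table with one row per data asset, and a search keeps the rows whose columns match the requested values. The sketch below mimics that filtering with plain pandas on a made-up four-row table. The rows, the model names, and the `isin`-based matching are assumptions for illustration; this is not how intake-esm is implemented.

```python
import pandas as pd

# Hypothetical miniature catalog; the real one has many more rows and columns.
demo_catalog = pd.DataFrame({
    "source_id": ["MODEL-A", "MODEL-A", "MODEL-B", "MODEL-B"],
    "experiment_id": ["historical", "ssp585", "historical", "piControl"],
    "table_id": ["Amon", "Amon", "Omon", "Amon"],
    "variable_id": ["tas", "tas", "tos", "tas"],
    "member_id": ["r1i1p1f1", "r1i1p1f1", "r1i1p1f1", "r2i1p1f1"],
})

# Same criteria as the search above, written as column -> allowed values.
query = {
    "experiment_id": ["ssp585", "historical"],
    "table_id": ["Amon"],
    "variable_id": ["tas"],
    "member_id": ["r1i1p1f1"],
}

# A row matches only if every queried column holds one of the allowed values.
mask = pd.Series(True, index=demo_catalog.index)
for column, allowed in query.items():
    mask &= demo_catalog[column].isin(allowed)

demo_matches = demo_catalog[mask]
print(demo_matches)
```

Passing a list for a column, as with `experiment_id` here, widens the match to any of the listed values, while each additional column narrows it.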


## Convert data catalog into a dictionary of xarray datasets

In [None]:
dset_dict = cat_cmip.to_dataset_dict(zarr_kwargs={'consolidated': True, 'decode_times': False})

time_slice = slice('1850','2015') # specific years that bracket our period of interest

ds_dict = {}

for name, ds in dset_dict.items():
    # rename spatial dimensions if necessary
    if ('longitude' in ds.dims) and ('latitude' in ds.dims):
        ds = ds.rename({'longitude':'lon', 'latitude': 'lat'}) 
        
    ds = xr.decode_cf(ds) # decode the calendar-aware time variable here, since the datasets were opened with decode_times=False
    ds = ds.sel(time=time_slice) # subset the data for the time period of interest
    
    # drop redundant coordinates 
    for coord in ds.coords:
        if coord not in ['lat','lon','time']:
            ds = ds.drop_vars(coord)  # drop_vars replaces the deprecated Dataset.drop
    
    # Add the cleaned near-surface air temperature dataset to the dictionary
    ds_dict[name] = ds
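
The renaming and coordinate cleanup in the loop can be tried in isolation on a small synthetic dataset. In this sketch, `standardize` is a hypothetical helper and `demo_raw` is invented data; only the `rename` and `drop_vars` calls mirror the loop above.

```python
import numpy as np
import xarray as xr

# Synthetic dataset (assumption) with long dimension names and an extra coordinate.
demo_raw = xr.Dataset(
    {"tas": (("time", "latitude", "longitude"), np.zeros((2, 2, 3)))},
    coords={
        "time": [0, 1],
        "latitude": [-45.0, 45.0],
        "longitude": [0.0, 120.0, 240.0],
        "height": 2.0,  # scalar coordinate that is not needed afterwards
    },
)


def standardize(ds):
    """Hypothetical helper: use lat/lon names and keep only lat, lon, time."""
    if ("longitude" in ds.dims) and ("latitude" in ds.dims):
        ds = ds.rename({"longitude": "lon", "latitude": "lat"})
    extra = [name for name in ds.coords if name not in ("lat", "lon", "time")]
    return ds.drop_vars(extra)


demo_clean = standardize(demo_raw)
print(list(demo_clean.coords))
```

Giving every model's dataset the same dimension and coordinate names makes it easier to compare or combine them later.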

## Look at how air temperatures have changed 

In [None]:
ds_cmip = ds_dict[list(ds_dict.keys())[0]]

now = ds_cmip.sel(time=slice('2000-01-01','2015-12-31')).mean('time')

then = ds_cmip.sel(time=slice('1850-01-01','1950-12-31')).mean('time')

temperature_change=(now-then)['tas']

temperature_change.attrs['long_name']='Change in air temperature (K)'
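
The difference of two period means can be checked on synthetic data where the answer is known in advance. This sketch assumes a spatially uniform field that warms by 0.01 K per year; the grid, the trend, and the `demo_` names are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly time axis covering the same years as the CMIP6 subset.
demo_time = pd.date_range("1850-01-01", "2015-12-01", freq="MS")
# Fractional year at the middle of each month.
years = demo_time.year.values + (demo_time.month.values - 0.5) / 12.0

# Assumed warming of 0.01 K per year, identical at every grid point.
trend = 287.0 + 0.01 * (years - 1850.0)
data = trend[:, None, None] * np.ones((1, 2, 2))

demo_ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), data)},
    coords={"time": demo_time, "lat": [-45.0, 45.0], "lon": [0.0, 180.0]},
)

# Mean over the recent period minus mean over the early period.
demo_now = demo_ds.sel(time=slice("2000-01-01", "2015-12-31")).mean("time")
demo_then = demo_ds.sel(time=slice("1850-01-01", "1950-12-31")).mean("time")
demo_change = (demo_now - demo_then)["tas"]

print(demo_change.values)
```

Because the two periods have different lengths (16 and 101 years), the difference reflects the distance between their midpoints rather than the change between their end points.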


## Plot the change in temperature

In [None]:
ortho = ccrs.Orthographic(-90, 20)           # define target coordinate frame
geo = ccrs.PlateCarree()                     # define origin coordinate frame

plt.figure(figsize=(9,7))                    #set the figure size
ax = plt.subplot(1, 1, 1, projection=ortho)  #create the axis for plotting

q = temperature_change.plot(ax=ax, transform = geo, cmap='OrRd', vmin=0, vmax=2) # plot a colormap in transformed coordinates

ax.add_feature(cartopy.feature.COASTLINE)

ax.add_feature(cartopy.feature.BORDERS, linestyle='-')

plt.title('Global Warming Air Temp',fontsize=16, ha='center');

#plt.savefig('./../../airtemp_warming_patterns.png',dpi=100,bbox_inches='tight')