# W6: Open Earth and Environmental Data
- Contributors: Dr. Zhonghua Zheng, Yuan Sun
- Course Unit: Earth and Environmental Data Science (EART60702)
- Last modified date: 1 March, 2024

## Intended Learning Outcomes (ILOs)
- CMIP6 Data: Apply the steps to access and retrieve CMIP6 data, facilitating the integration of Earth system model simulations into environmental analysis (~20 mins).
- ERA5 Data: Demonstrate the ability to access ERA5 data, incorporating these datasets into climate studies to enhance the understanding of atmospheric dynamics (~20 mins).
- GEE and geemap: Execute basic operations on the Google Earth Engine (GEE) platform and the geemap Python package to analyze and visualize environmental data.

## 0. Setup

### 0.1 Please use bash commands to launch JupyterLab
```bash
# check that conda works on your local PC
conda --version
# load the environment that you created last week
conda activate myenv
# launch JupyterLab
jupyter lab
```

### 0.2 Please load the necessary Python packages

In [None]:
!conda --version

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import zarr 
import gcsfs
import cdsapi
import climetlab as cml

xr.set_options(display_style='html')
%matplotlib inline
%config InlineBackend.figure_format = 'retina' 

### **NOTE: If your local environment doesn't work, please run the code below to install necessary packages in Google Colab: https://colab.research.google.com/**
```python
# https://saturncloud.io/blog/how-to-install-conda-package-to-google-colab/
!pip install -q condacolab
import condacolab
condacolab.install()

# check if condacolab works
!conda --version

# please install packages below in your condacolab
!pip install --upgrade xarray zarr gcsfs cftime nc-time-axis climetlab
```

## 1. CMIP6 Data (20 mins)
- World Climate Research Programme (WCRP) Coupled Model Intercomparison Project (CMIP): https://www.wcrp-climate.org/wgcm-cmip
  > The objective of the Coupled Model Intercomparison Project (CMIP) is to better understand past, present and future climate changes arising from natural, unforced variability or in response to changes in radiative forcing in a multi-model context.
- The Google Cloud CMIP6 data are derived from the original CMIP6 data files, as distributed via the Earth System Grid Federation (ESGF): https://console.cloud.google.com/marketplace/product/noaa-public/cmip6
- A beginners' guide to these terms is available in this document: https://docs.google.com/document/d/1yUx6jr9EdedCOLd--CPdTfGDwEwzPpCF6p1jRmqx-0Q
- Consistent with the CMIP6 terms of use, some modifications have been made to render the data more analysis-ready, including concatenation of time slices and conversion from netCDF to Zarr: https://zarr.readthedocs.io/en/stable/
  > Zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.

### 1.1 Query the data catalogue

In [None]:
# The data catalogue is stored as a CSV file. Here we read it with pandas.
df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df.head()

In [None]:
df_tas = df.query("activity_id=='CMIP' & institution_id=='NCAR' & experiment_id=='historical' & table_id=='Amon' & variable_id=='tas'")
df_tas

### Question 1.1: 
- **Based on the current `df_tas`, please create a `df_tas_CESM` with `CESM2` as the `source_id` and `r1i1p1f1` as the `member_id`**
- 'tas' refers to Near-Surface Air Temperature [K]. 
- More information about CMIP variables: https://docs.google.com/spreadsheets/d/1UUtoz6Ofyjlpx5LdqhKcwHFz2SGoTQV2_yekHyMfL9Y/edit#gid=1221485271
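The filtering pattern used for `df_tas` extends to further columns. The sketch below illustrates it on a toy table rather than the real catalogue: the rows and the `gs://toy/...` paths are made up for illustration, and only the column names `source_id`, `member_id` and `zstore` are taken from the catalogue above.

```python
import pandas as pd

# Toy stand-in for the CMIP6 catalogue; rows and zstore paths are hypothetical.
toy = pd.DataFrame(
    {
        "source_id": ["CESM2", "CESM2", "CESM2-WACCM", "CESM2-FV2"],
        "member_id": ["r1i1p1f1", "r2i1p1f1", "r1i1p1f1", "r1i1p1f1"],
        "zstore": ["gs://toy/a", "gs://toy/b", "gs://toy/c", "gs://toy/d"],
    }
)

# Combine conditions on two columns in a single query string.
subset = toy.query("source_id == 'CESM2' & member_id == 'r1i1p1f1'")

# The same filter written with boolean masks.
mask = (toy["source_id"] == "CESM2") & (toy["member_id"] == "r1i1p1f1")
subset_mask = toy[mask]

print(subset)
```

Both forms should select the same rows; `query` tends to be easier to read when several conditions are combined.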

### 1.2 Load the data from the Cloud

In [None]:
# load data using gcsfs, zarr, and xarray
# Google Cloud Storage (GCS) file system; this only needs to be created once
gcs = gcsfs.GCSFileSystem(token='anon') # use an anonymous token

# get the path to a specific zarr store (the last one in the dataframe above)
zstore = df_tas_CESM.zstore.values[-1] # retrieves the path from the last entry of the 'zstore' column 

# create a mutable-mapping-style interface to the store
mapper = gcs.get_mapper(zstore)

# open it using xarray and zarr
ds = xr.open_zarr(mapper, consolidated=True)
ds

In [None]:
# Plot a map from a specific date.
ds.tas.sel(time='2012-12-15').squeeze().plot()

### Question 1.2: 
- **What were the mean temperatures in Manchester (53.48° N, 2.24° W) and London (51.51° N, 0.13° W)?** 
- Please use `time = slice('2012-01', '2012-12')` for calculating the mean value
- Hints: https://docs.xarray.dev/en/latest/user-guide/indexing.html
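One detail that is easy to miss when selecting a city: many CMIP6 models store longitude on a 0 to 360 grid, so a western longitude such as 2.24° W may need converting before a nearest-neighbour lookup. The sketch below shows the idea on a synthetic field, not on the real `ds`; the grid spacing, the constant value of 280 K and the helper `to_0_360` are assumptions made for illustration, so check `ds.lon` on the real dataset first.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly field with longitudes on a 0..360 grid (assumed layout).
time = pd.date_range("2012-01-01", periods=12, freq="MS")
lat = np.linspace(-90.0, 90.0, 193)
lon = np.arange(0.0, 360.0, 1.25)
values = np.full((time.size, lat.size, lon.size), 280.0)  # made-up temperatures [K]
toy = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
)


def to_0_360(lon_deg_east):
    """Convert a longitude in degrees east (-180..180) to the 0..360 convention."""
    return lon_deg_east % 360


# Manchester: 53.48 N, 2.24 W, i.e. -2.24 degrees east.
point = toy.sel(lat=53.48, lon=to_0_360(-2.24), method="nearest")

# Mean over the twelve months of 2012 at the selected grid cell.
annual_mean = point.sel(time=slice("2012-01", "2012-12")).mean("time")
print(float(point.lon), float(annual_mean))
```

`method="nearest"` picks the closest grid cell centre, so the value describes a grid cell rather than the exact city location.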

### Question 1.3: 
- **What were the mean temperatures in Manchester (53.48° N, 2.24° W) and London (51.51° N, 0.13° W) for `CESM2-WACCM` and `r1i1p1f1`?**
- Please use `time = slice('2012-01', '2012-12')` for calculating the mean value
- Hints: https://docs.xarray.dev/en/latest/user-guide/indexing.html

## 2. ERA5 Data (20 mins)

- Reanalysis: https://www.ecmwf.int/en/about/media-centre/focus/2023/fact-sheet-reanalysis
- ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global climate covering the period from January 1940 to present. ERA5 is produced by the Copernicus Climate Change Service (C3S) at ECMWF: https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5
- C3S Climate Data Store: https://cds.climate.copernicus.eu/#!/home
- ERA5 data documentation: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation
- ERA5 data is also available at Google Earth Engine: https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_DAILY#colab-python

**Task: Please get the "2 metre temperature" for Manchester (53.48° N, 2.24° W) on 2012-12-12 at 12:00**

### Recommended
- via the official API
>**Note**: Set up your credentials first. You can find the `url` and `key` (API token) at https://cds.climate.copernicus.eu/ (click `Your profile` at the top right corner).  
Alternatively, follow the step-by-step CDSAPI setup instructions: https://cds.climate.copernicus.eu/how-to-api.

### Option 1
- via ECMWF's CliMetLab: https://climetlab.readthedocs.io/en/latest/examples/03-source-cds.html

### Option 2
- via Google Cloud: https://cloud.google.com/storage/docs/public-datasets/era5

### Option 3
- via GEE: https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_DAILY#colab-python

### Option 4
- via AWS: https://registry.opendata.aws/ecmwf-era5/

### Option 5
- via NCAR Research Data Archive: https://rda.ucar.edu/dssearch/?words=era5

### Option X
- ...

In [None]:
c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'variable': '2t',
        'product_type': 'reanalysis',
        'area': [53.5, -2.3, 53.4, -2.2],  # bounding box: North, West, South, East
        'date': '2012-12-12',
        'time': '12:00',
        'format': 'netcdf',
    },
    'download.nc'
)
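Once the request has finished, the file can be opened with xarray and the temperature read off for Manchester. The sketch below uses a synthetic dataset in place of `download.nc` so that it runs without CDS credentials. The variable name `t2m`, the coordinate names `latitude` and `longitude`, and the value 278.15 K are assumptions for illustration; print the real dataset to confirm its names before relying on them.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the downloaded file; the temperature value is made up.
# With the real file one would start from: era5 = xr.open_dataset("download.nc")
era5 = xr.Dataset(
    {"t2m": (("time", "latitude", "longitude"), np.array([[[278.15]]]))},
    coords={
        "time": pd.to_datetime(["2012-12-12 12:00"]),
        "latitude": [53.5],
        "longitude": [-2.25],
    },
)

# Pick the grid point closest to Manchester (53.48 N, 2.24 W).
t2m_kelvin = era5["t2m"].sel(latitude=53.48, longitude=-2.24, method="nearest")

# 2 metre temperature is given in kelvin; convert to degrees Celsius.
t2m_celsius = t2m_kelvin - 273.15
print(float(t2m_celsius.squeeze()))
```

Note that this example uses longitudes in degrees east with negative values for the western hemisphere, matching the `area` box in the request above.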

## 3. Google Earth Engine and geemap
- Please complete the tutorial: Introduction to Google Earth Engine: https://www.google.com/earth/outreach/learn/introduction-to-google-earth-engine/
- Please select and complete the tutorial(s) that you are interested in: https://geemap.org/tutorials/#geemap-tutorials

## 4. Data for Project 2 (reproducible research)
Please download the data here: https://www.dropbox.com/scl/fo/dmabz9pf3167l62612h5b/h?rlkey=ge8u486w7w7vq8vnpr2f1fvag&dl=0

In [None]:
ds = xr.open_dataset("~/Downloads/008_2006_2080_352_360.nc")
ds["TREFMXAV_U"].mean(dim=["time"]).plot()