# Google Cloud CMIP6 Public Data: Basic Python Example

This notebook shows how to query the catalog and load the data using Python.

In [31]:
import subprocess
import sys
# pip.main() is unsupported; run pip as a module of the current interpreter instead
subprocess.check_call([sys.executable, "-m", "pip", "install", "matplotlib", "pandas", "xarray", "zarr", "gcsfs", "cftime", "dask[array]", "toolz", "nc-time-axis", "openpyxl"])


Collecting openpyxl
  Downloading openpyxl-3.0.4-py2.py3-none-any.whl (241 kB)
Collecting et-xmlfile
  Downloading et_xmlfile-1.0.1.tar.gz (8.4 kB)
Collecting jdcal
  Downloading jdcal-1.4.1-py2.py3-none-any.whl (9.5 kB)
Building wheels for collected packages: et-xmlfile
  Building wheel for et-xmlfile (setup.py): started
  Building wheel for et-xmlfile (setup.py): finished with status 'done'
  Created wheel for et-xmlfile: filename=et_xmlfile-1.0.1-py3-none-any.whl size=8915 sha256=5ce1d615eb55c9be962489267560f191b26cb3f941e04133c46668a8dc778f82
  Stored in directory: /home/jovyan/.cache/pip/wheels/e2/bd/55/048b4fd505716c4c298f42ee02dffd9496bb6d212b266c7f31
Successfully built et-xmlfile
Installing collected packages: et-xmlfile, jdcal, openpyxl
Successfully installed et-xmlfile-1.0.1 jdcal-1.4.1 openpyxl-3.0.4


0

In [2]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import zarr
import gcsfs
import cftime
import dask
import toolz
import os


xr.set_options(display_style='html')
%matplotlib inline
%config InlineBackend.figure_format = 'retina' 

In [3]:
plt.rcParams['figure.figsize'] = 12, 6

## Browse Catalog

The data catalog is stored as a CSV file. Here we read it with pandas.

In [4]:
df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df.head()

Unnamed: 0,activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,zstore,dcpp_init_year
0,AerChemMIP,AS-RCEC,TaiESM1,histSST,r1i1p1f1,AERmon,od550aer,gn,gs://cmip6/AerChemMIP/AS-RCEC/TaiESM1/histSST/...,
1,AerChemMIP,BCC,BCC-ESM1,histSST,r1i1p1f1,AERmon,mmrbc,gn,gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i...,
2,AerChemMIP,BCC,BCC-ESM1,histSST,r1i1p1f1,AERmon,mmrdust,gn,gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i...,
3,AerChemMIP,BCC,BCC-ESM1,histSST,r1i1p1f1,AERmon,mmroa,gn,gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i...,
4,AerChemMIP,BCC,BCC-ESM1,histSST,r1i1p1f1,AERmon,mmrso4,gn,gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i...,


The columns of the dataframe correspond to the CMIP6 controlled vocabulary. A beginners' guide to these terms is available in [this document](https://docs.google.com/document/d/1yUx6jr9EdedCOLd--CPdTfGDwEwzPpCF6p1jRmqx-0Q).
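
To get a feel for the vocabulary, it can help to list the distinct values in a few columns. The sketch below uses a small hand-built stand-in for the catalog (the five rows printed above, with the `zstore` and `dcpp_init_year` columns left out; the name `sample` is only for this illustration) so that it runs offline. With the real catalog, the same calls should apply to `df`.

```python
import pandas as pd

# Hypothetical stand-in for the catalog: the five rows shown by df.head(),
# without the zstore and dcpp_init_year columns.
sample = pd.DataFrame(
    {
        "activity_id": ["AerChemMIP"] * 5,
        "institution_id": ["AS-RCEC", "BCC", "BCC", "BCC", "BCC"],
        "source_id": ["TaiESM1", "BCC-ESM1", "BCC-ESM1", "BCC-ESM1", "BCC-ESM1"],
        "experiment_id": ["histSST"] * 5,
        "member_id": ["r1i1p1f1"] * 5,
        "table_id": ["AERmon"] * 5,
        "variable_id": ["od550aer", "mmrbc", "mmrdust", "mmroa", "mmrso4"],
        "grid_label": ["gn"] * 5,
    }
)

# Number of distinct values in each column
n_unique = sample.nunique()

# How many rows each model (source_id) contributes
rows_per_model = sample["source_id"].value_counts()

print(n_unique)
print(rows_per_model)
```

Columns with few distinct values (here `activity_id` or `table_id`) are usually good candidates for a first filter.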

Here we filter the catalog to find 3-hourly near-surface air temperature (variable tas, table 3hr) from the BCC-CSM2-MR model for the historical experiment.

In [5]:
df_ta = df.query("activity_id=='CMIP' & source_id == 'BCC-CSM2-MR' & table_id == '3hr' & variable_id == 'tas' & experiment_id == 'historical'")
df_ta

Unnamed: 0,activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label,zstore,dcpp_init_year
6071,CMIP,BCC,BCC-CSM2-MR,historical,r1i1p1f1,3hr,tas,gn,gs://cmip6/CMIP/BCC/BCC-CSM2-MR/historical/r1i...,
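
The `query` string is shorthand for a boolean mask built from column comparisons. As a rough illustration, the sketch below applies both forms to a small made-up catalog and checks that they select the same row. The frame `mini` and its second and third rows are assumptions for the example, not real catalog entries.

```python
import pandas as pd

# Made-up miniature catalog; only the first row mirrors the result shown above.
mini = pd.DataFrame(
    {
        "activity_id": ["CMIP", "CMIP", "AerChemMIP"],
        "source_id": ["BCC-CSM2-MR", "BCC-CSM2-MR", "BCC-ESM1"],
        "table_id": ["3hr", "Amon", "AERmon"],
        "variable_id": ["tas", "tas", "mmrbc"],
        "experiment_id": ["historical", "historical", "histSST"],
    }
)

# Form 1: a query string, as in the cell above
by_query = mini.query(
    "activity_id == 'CMIP' and source_id == 'BCC-CSM2-MR' and table_id == '3hr'"
    " and variable_id == 'tas' and experiment_id == 'historical'"
)

# Form 2: the equivalent boolean mask
mask = (
    (mini["activity_id"] == "CMIP")
    & (mini["source_id"] == "BCC-CSM2-MR")
    & (mini["table_id"] == "3hr")
    & (mini["variable_id"] == "tas")
    & (mini["experiment_id"] == "historical")
)
by_mask = mini[mask]

print(by_query.equals(by_mask))
```

The mask form is more verbose but easier to build up programmatically, for example when the filter values come from variables.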


The query returns a single store, from the BCC-CSM2-MR model, so no further filtering is needed.

## Load Data

Now we will load a single store using gcsfs, zarr, and xarray.

In [6]:
# this only needs to be created once
gcs = gcsfs.GCSFileSystem(token='anon')

# get the path to a specific zarr store (the last row of the dataframe above; here there is only one)
zstore = df_ta.zstore.values[-1]

# create a mutable-mapping-style interface to the store
mapper = gcs.get_mapper(zstore)

# open it using xarray and zarr
ds = xr.open_zarr(mapper, consolidated=True)
ds

Array shape,Chunk shape,Array bytes,Chunk bytes,Tasks,Chunks,Type
"(160, 2)","(160, 2)",2.56 kB,2.56 kB,2,1,float64
"(320, 2)","(320, 2)",5.12 kB,5.12 kB,2,1,float64
"(189800, 2)","(47450, 1)",3.04 MB,379.60 kB,9,8,object
"(189800, 160, 320)","(600, 160, 320)",38.87 GB,122.88 MB,318,317,float32
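
The sizes in the output above follow directly from the array shapes. As a quick check, assuming decimal units (1 MB = 10^6 bytes) and 4 bytes per float32 value, the numbers for the largest array can be reproduced with the standard library:

```python
import math

shape = (189800, 160, 320)   # full array shape from the output above
chunks = (600, 160, 320)     # shape of one chunk
itemsize = 4                 # bytes per float32 value

total_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunks) * itemsize

# Chunks per axis (rounded up), multiplied together
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

print(f"{total_bytes / 1e9:.2f} GB")  # 38.87 GB
print(f"{chunk_bytes / 1e6:.2f} MB")  # 122.88 MB
print(n_chunks)                       # 317
```

Each chunk spans the full 160 x 320 grid and 600 time steps, so even a selection of a few grid points has to read whole chunks along the time axis.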


In [19]:
ds_sample = ds.sel(time=slice('1950-01-01', '1950-03-01'), lon=slice(120, 121), lat=slice(0, 1))

In [20]:
len(ds_sample)  # number of data variables in the Dataset

1

In [18]:
ds_sample

Array shape,Chunk shape,Array bytes,Chunk bytes,Tasks,Chunks,Type
"(1, 2)","(1, 2)",16 B,16 B,3,1,float64
"(2, 2)","(2, 2)",32 B,32 B,3,1,float64
"(480, 2)","(480, 1)",7.68 kB,3.84 kB,11,2,object
"(480, 1, 2)","(480, 1, 2)",3.84 kB,3.84 kB,319,1,float32

In [29]:
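# flatten the selection into a table; a new name keeps the catalog dataframe `df` intact
df_sample = ds_sample.to_dataframe().reset_index()

In [32]:
# writing .xlsx files requires openpyxl (installed in the first cell)
df_sample.to_excel("test_df.xlsx")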