# ERA5 850 hPa Temperature (Zarr v3 on GCS) — Xarray Intro

This notebook introduces **xarray** concepts using **ERA5 hourly temperature on pressure levels** from Google Cloud (ARCO ERA5, Zarr v3). We will:
- Open a **Zarr v3** store with anonymous GCS access
- Select **850 hPa** and convert Kelvin → °C
- Pick a month to keep things lightweight
- Subset a geographic region
- Compute area-weighted time series and simple variability
- Make maps of month-mean, std, and range

**Deliverables:** Answer the **🧩 Questions** inline and complete the TODOs.


## 0) Setup

In [1]:
import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import dask

plt.rcParams['figure.figsize'] = (10,5)
plt.rcParams['figure.dpi'] = 120

print('xarray', xr.__version__)


xarray 2025.9.0


In [2]:
from dask.distributed import Client
client = Client()  # set up local cluster on your machine
client

Client connected to distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Workers: 4, Total threads: 4, Total memory: 16.00 GiB
Status: running, Using processes: True
Each worker: 1 thread, 4.00 GiB memory


## 1) Open ARCO ERA5 (Zarr v3) on Google Cloud

We will open the **ARCO ERA5** store and then constrain to a single month (hourly data) to make the exercise fast.

**🧩 Questions**
1. Why do we pass `storage_options={'token': 'anon'}`?
2. What is Zarr **v3** and how is it different from v2 for cloud data?
3. After opening, what are the coordinates and units for temperature?


In [3]:
import gcsfs
fs = gcsfs.GCSFileSystem(token="anon")
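# Should list arrays/metadata keys under the store.
# Note: `.zgroup`, `.zattrs`, and `.zmetadata` are Zarr v2-style metadata keys (Zarr v3 uses `zarr.json`).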
print(fs.ls("gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3")[:10])

['gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/.zattrs', 'gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/.zgroup', 'gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/.zmetadata', 'gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/100m_u_component_of_wind', 'gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/100m_v_component_of_wind', 'gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/10m_u_component_of_neutral_wind', 'gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/10m_u_component_of_wind', 'gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/10m_v_component_of_neutral_wind', 'gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/10m_v_component_of_wind', 'gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/10m_wind_gust_since_previous_post_processing']


In [8]:
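ds = xr.open_zarr(
    'gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3',
    chunks=None,  # no dask chunking; values are read lazily when indexed
    storage_options=dict(token='anon'),
)
# Keep only the time range that the store's attributes mark as valid
ds = ds.sel(time=slice(ds.attrs['valid_time_start'], ds.attrs['valid_time_stop']))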

ds

## 2) Select variable/level and a single month

We'll select **temperature (`temperature`) at 850 hPa**, convert to **°C**, and then restrict to one calendar month to keep computations light.

**🧩 Questions**
1. Which dimension name does this store use for pressure levels (`isobaricInhPa`, `level`, or something else)?
2. What is the native unit of `temperature`?


In [None]:
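# Pick a month (YYYY-MM) for the exercise
month = "2020-01"   # TODO: change to another month if desired
t0 = pd.to_datetime(month + "-01")
# A date string as the upper bound includes every hour of the last day
t1 = (t0 + pd.offsets.MonthEnd(1)).strftime("%Y-%m-%d")

# Define the names used in later cells; the dimension name `level` is an assumption, verify it in `ds`
var = "temperature"
dss = ds[[var]].sel(level=850, time=slice(t0, t1))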
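
The selection and unit conversion described in this section can be sketched on a small synthetic dataset, so it runs without network access. The dimension name `level`, the variable name `temperature`, the grid values, and the names `demo` and `t850_demo` are assumptions made for illustration; check the real names against the `ds` repr before reusing the pattern.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the ERA5 store: 3 days of hourly data across a month boundary.
time = pd.date_range("2020-01-30", "2020-02-01 23:00", freq="h")
level = [500, 850, 1000]   # hPa; the dimension name `level` is an assumption
lat = [45.0, 44.75]
lon = [250.0, 250.25]
data = np.full((time.size, len(level), len(lat), len(lon)), 280.0)  # Kelvin
demo = xr.Dataset(
    {"temperature": (("time", "level", "latitude", "longitude"), data, {"units": "K"})},
    coords={"time": time, "level": level, "latitude": lat, "longitude": lon},
)

# Label-based selection of the 850 hPa level, then a one-month time selection.
# A partial date string such as "2020-01" selects every timestamp in that month.
t850_k = demo["temperature"].sel(level=850).sel(time="2020-01")

# Kelvin -> Celsius. Set the units attribute explicitly so it always matches
# the converted values, however attributes propagate through arithmetic.
t850_demo = t850_k - 273.15
t850_demo.attrs["units"] = "degC"

print(t850_demo.dims, t850_demo.sizes["time"])
```

Selecting the level first drops the `level` dimension, so the result has only time, latitude, and longitude.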



## 3) Explore coordinates & variables

**🧩 Questions**
1. What are the dimensions of `t850` (order and sizes)?
2. What is the longitude convention (0–360 or −180–180)?
3. Are there any missing values in your month? How are they represented?


In [None]:
list(dss.data_vars), list(dss.coords), dss[var]
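
The three questions above map onto a few short checks. The sketch below runs them on a synthetic array; the name `field`, the coordinate values, and the single injected NaN are assumptions for illustration, not properties of the ERA5 store.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic field with four longitudes spanning 0 to 270 and one missing value (NaN).
values = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
values[0, 1, 2] = np.nan
field = xr.DataArray(
    values,
    dims=("time", "latitude", "longitude"),
    coords={
        "time": pd.date_range("2020-01-01", periods=2, freq="h"),
        "latitude": [45.0, 44.75, 44.5],
        "longitude": [0.0, 90.0, 180.0, 270.0],
    },
)

# Dimension order and sizes.
dim_sizes = dict(field.sizes)

# Longitude convention: a maximum above 180 implies a 0..360 grid.
lon_max = float(field.longitude.max())
convention = "0..360" if lon_max > 180 else "-180..180"

# Missing values appear as NaN in floating-point arrays; count them.
n_missing = int(field.isnull().sum())

print(dim_sizes, convention, n_missing)
```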

## 4) Region subsetting

Choose a geographic box (edit as desired):
- **Central U.S.:** lon −110 to −80, lat 25 to 45
- **Europe:** lon −10 to 30, lat 35 to 60
- **Argentina:** lon −75 to −55, lat −40 to −20

**🧩 Questions**
1. Why might we need to convert longitudes from [-180,180] to [0,360]?
2. How many grid points (lat×lon) and time steps are in your subset?
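
A minimal sketch of the box selection, assuming a grid whose longitudes run 0 to 360 and whose latitudes are stored north to south (verify both against your dataset). The 1-degree synthetic grid and the names `grid` and `box` are illustrative only.

```python
import numpy as np
import xarray as xr

# Synthetic global 1-degree grid: longitude 0..359, latitude descending 90..-90.
lat = np.arange(90.0, -91.0, -1.0)
lon = np.arange(0.0, 360.0, 1.0)
grid = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    dims=("latitude", "longitude"),
    coords={"latitude": lat, "longitude": lon},
)

# Central U.S. box given in the -180..180 convention.
lon_w, lon_e = -110.0, -80.0
lat_s, lat_n = 25.0, 45.0

# Map the box edges onto 0..360 so they match the grid's coordinate values.
lon_w360, lon_e360 = lon_w % 360, lon_e % 360

# slice() follows coordinate order, so a descending latitude needs north first.
box = grid.sel(latitude=slice(lat_n, lat_s), longitude=slice(lon_w360, lon_e360))

print(dict(box.sizes))
```

A box that crosses the prime meridian (the Europe example, −10 to 30) maps to 350 and 30, so a single slice comes back empty. One option in that case is to relabel the grid instead of the box: `grid.assign_coords(longitude=((grid.longitude + 180) % 360) - 180).sortby("longitude")`.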


## 5) Area-weighted region-mean (hourly) and daily mean

Compute an **area-weighted** (cos(lat)) region-mean time series. Then, compute a **daily mean** series from the hourly data.
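
One way to compute both series is xarray's `weighted` and `resample` methods. The sketch below uses a synthetic region (the names `region`, `hourly`, `daily` and all values are assumptions for illustration) in which temperature depends only on latitude, so the weighted mean is easy to verify by hand.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic regional subset: 2 days of hourly data on a small lat/lon box.
time = pd.date_range("2020-01-01", periods=48, freq="h")
lat = np.array([60.0, 30.0, 0.0])
lon = np.array([250.0, 251.0])
per_lat = np.array([-10.0, 10.0, 20.0])  # one temperature per latitude row
data = np.broadcast_to(per_lat[None, :, None], (time.size, lat.size, lon.size)).copy()
region = xr.DataArray(
    data,
    dims=("time", "latitude", "longitude"),
    coords={"time": time, "latitude": lat, "longitude": lon},
    name="t850",
)

# cos(latitude) is proportional to grid-cell area on a regular lat/lon grid.
weights = np.cos(np.deg2rad(region.latitude))
weights.name = "weights"

# The area-weighted mean over both spatial dimensions leaves an hourly series.
hourly = region.weighted(weights).mean(dim=("latitude", "longitude"))

# Daily means computed from the hourly series.
daily = hourly.resample(time="1D").mean()

print(float(hourly[0]), daily.sizes["time"])
```

Because the weights shrink toward the poles, the cold 60° row counts half as much as the equatorial row, which pulls the weighted mean above the plain arithmetic mean of the three rows.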


## 8) Export a small deliverable (optional)