### Overview

We have many high-quality global climate datasets with historical data on a range of climate variables. Using cloud-hosted datasets and XArray, we can compute pixel-wise long-term trends. This analysis helps us identify hotspots experiencing extreme climate change.

We will be using the [TerraClimate](https://www.climatologylab.org/terraclimate.html) gridded dataset of monthly climate and climatic water balance, which has a high spatial resolution (~4 km grid size) and a long time series (1958 to present). This is a large dataset hosted on a [THREDDS Data Server (TDS)](https://www.unidata.ucar.edu/software/tds/) and served via [OPeNDAP](https://www.opendap.org/) (Open-source Project for a Network Data Access Protocol). XArray has built-in support for reading OPeNDAP data efficiently: we can stream and process only the required pixels without downloading the entire dataset.
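OPeNDAP lets a client request a subset (a "hyperslab") of a variable by index ranges, which is what makes streaming only the required pixels possible. The sketch below builds a DAP2-style constraint expression by hand to show the idea. The helper `dap2_hyperslab_url`, the example URL and the index ranges are hypothetical; in this notebook the netCDF4 library constructs the real requests (which also carry a response-type suffix such as `.dods`), so the URLs it actually sends may differ.

```python
def dap2_hyperslab_url(base_url, variable, index_ranges, stride=1):
    """Hypothetical helper: build a DAP2-style hyperslab request for one variable.

    index_ranges holds (start, stop) pairs with an exclusive stop, Python-style.
    """
    parts = []
    for start, stop in index_ranges:
        if not 0 <= start < stop:
            raise ValueError(f'invalid range: {(start, stop)}')
        # DAP2 hyperslab syntax is [start:stride:stop] with an inclusive stop
        parts.append(f'[{start}:{stride}:{stop - 1}]')
    return f'{base_url}?{variable}{"".join(parts)}'


# Example: 12 time steps and a 100 x 200 pixel window (made-up indices)
url = dap2_hyperslab_url(
    'http://example.org/thredds/dodsC/some_dataset.nc',
    'tmax',
    [(0, 12), (1000, 1100), (4800, 5000)],
)
print(url)
# http://example.org/thredds/dodsC/some_dataset.nc?tmax[0:1:11][1000:1:1099][4800:1:4999]
```

Only the values inside the requested index window travel over the network, which is why a small regional subset of a 200+ GiB dataset is practical to process.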

This notebook shows how to extract a time series of monthly maximum temperatures and compute a per-pixel linear trend. The processing is distributed using Dask, and the result is saved as a GeoTIFF file.


### Setup and Data Download



In [None]:
%%capture
if 'google.colab' in str(get_ipython()):
    !pip install geopandas rioxarray cartopy dask[distributed] netCDF4

In [None]:
import cartopy
import cartopy.crs as ccrs
import os
import geopandas as gpd
import xarray as xr
import rioxarray as rxr
import matplotlib.pyplot as plt
import dask

In [None]:
data_folder = 'data'
output_folder = 'output'

if not os.path.exists(data_folder):
    os.mkdir(data_folder)
if not os.path.exists(output_folder):
    os.mkdir(output_folder)

### Selecting a Region of Interest

[geoBoundaries](https://www.geoboundaries.org/) is an open database of political administrative boundaries. We can download it and filter for the chosen admin region(s).

In [None]:
def download(url):
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print(f'Downloaded {filename}')
    else:
        print(f'File {filename} exists.')

admin_boundaries = 'geoBoundariesCGAZ_ADM0.gpkg'

geoboundaries_url = 'https://github.com/wmgeolab/geoBoundaries/raw/main/releaseData/CGAZ/'
download(geoboundaries_url + admin_boundaries)


File data/geoBoundariesCGAZ_ADM0.gpkg exists.


In [None]:
admin_boundaries_filepath = os.path.join(data_folder, admin_boundaries)
admin_gdf = gpd.read_file(admin_boundaries_filepath, encoding='utf-8')

In [None]:
# shapeGroup holds ISO 3166-1 alpha-3 codes; select the three Baltic states
countries = ['EST', 'LVA', 'LTU']
baltics_gdf = admin_gdf[admin_gdf.shapeGroup.isin(countries)]
baltics_gdf

In [None]:
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(5,5)
baltics_gdf.plot(
    ax=ax,
    edgecolor='#969696',
    facecolor='none',
    alpha=0.5)
ax.set_axis_off()
plt.show()

Get the bounding box to filter the XArray dataset.

In [None]:
bounds = baltics_gdf.total_bounds
lon_min, lat_min, lon_max, lat_max = bounds
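To get a feel for how many pixels the bounding box will select, we can convert degrees into grid indices. This sketch assumes a cell-centered global grid with 24 cells per degree (the 4320 x 8640 shape reported for the dataset implies this spacing), with the first longitude center at -180 + 1/48. The helper `center_index_range` and the example bounds are hypothetical and only approximate what `sel(..., slice(...))` does on the real coordinates.

```python
import math

# Assumption: 1/24 degree cells (~4 km) with coordinates at the cell centers
CELLS_PER_DEGREE = 24


def center_index_range(coord_min, coord_max, origin=-180.0,
                       cells_per_degree=CELLS_PER_DEGREE):
    """Return (first, last) indices of the cell centers inside [coord_min, coord_max].

    The center of cell i is origin + (i + 0.5) / cells_per_degree.
    """
    first = math.ceil((coord_min - origin) * cells_per_degree - 0.5)
    last = math.floor((coord_max - origin) * cells_per_degree - 0.5)
    return first, last


# Hypothetical longitude bounds, widened by a 1 degree buffer on each side
buffer_deg = 1.0
first, last = center_index_range(22.0 - buffer_deg, 27.0 + buffer_deg)
print(first, last, last - first + 1)  # 4824 4991 168
```

The same function works for latitude with `origin=-90.0` once the latitude coordinate is sorted in ascending order.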

### Local Compute Cluster

Set up a local Dask cluster.

In [None]:
from dask.distributed import Client, progress
client = Client()  # set up local cluster on the machine
client

Client connected to a `distributed.LocalCluster`
Dashboard: http://127.0.0.1:8787/status
Workers: 4 (2 threads and 2.00 GiB memory each); total threads: 8; total memory: 8.00 GiB
Status: running (using processes)




## Analysis

In [None]:
# Choose a location within the region for plots, as (latitude, longitude)
location = (59.4362, 24.7234)  # Tallinn

### Maximum Temperature

Analyse the trend of monthly maximum temperature using the TerraClimate dataset.

In [None]:
terraclimate_url = 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/'
variable = 'tmax'
filename = f'agg_terraclimate_{variable}_1958_CurrentYear_GLOBE.nc'

# Build the URL by concatenation; os.path.join would insert backslashes on Windows
remote_file_path = terraclimate_url + filename
ds = xr.open_dataset(
    remote_file_path,
    chunks='auto',
    engine='netcdf4',
)
ds

| | Array | Chunk |
|---|---|---|
| Bytes | 220.25 GiB | 127.47 MiB |
| Shape | (792, 4320, 8640) | (65, 358, 718) |
| Dask graph | 2197 chunks in 2 graph layers | |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


In [None]:
# Select the variable
da = ds.tmax

# Select data within the bounding box

# Make sure the data is sorted for slice() to work correctly
da = da.sortby([da.lon, da.lat])

# Add 1 degree buffer
da_subset = da.sel(
    lon=slice(lon_min - 1, lon_max + 1),
    lat=slice(lat_min - 1, lat_max + 1)
)
da_subset

| | Array | Chunk |
|---|---|---|
| Bytes | 250.63 MiB | 20.57 MiB |
| Shape | (792, 186, 223) | (65, 186, 223) |
| Dask graph | 13 chunks in 5 graph layers | |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |
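The subset is split into 13 chunks because `chunks='auto'` cut the time axis into blocks of 65 months: all 792 months span 13 time blocks, and here the spatial window happens to fit inside one chunk in each direction. The sketch below counts overlapped chunks per axis; the helper `chunks_touched` and the spatial index windows are hypothetical, chosen only to illustrate the arithmetic.

```python
def chunks_touched(start, stop, chunk):
    """Number of regular chunks of size `chunk` overlapped by indices [start, stop)."""
    # Chunk index of the last selected element minus that of the first, plus one
    return (stop - 1) // chunk - start // chunk + 1


# Time axis: all 792 months with a chunk size of 65
n_time = chunks_touched(0, 792, 65)
# Hypothetical 186-row and 223-column windows that each fit inside one chunk
n_lat = chunks_touched(3600, 3786, 358)
n_lon = chunks_touched(5100, 5323, 718)
print(n_time, n_lat, n_lon, n_time * n_lat * n_lon)  # 13 1 1 13

# The same 186-row window shifted across a chunk boundary touches two chunks
print(chunks_touched(3500, 3686, 358))  # 2
```

A window that straddles chunk boundaries doubles (or quadruples) the number of chunks read, even though the number of useful pixels stays the same.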


Aggregate the data to yearly means.

In [None]:
da_yearly = da_subset.groupby('time.year').mean('time')
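When the series starts in January and contains only complete years (792 months is exactly 66 years), grouping by `time.year` and averaging gives the same result as reshaping the time axis to (years, 12) and averaging over the months. This NumPy sketch on synthetic data illustrates that equivalence; it is not how XArray implements `groupby`. Two caveats: a partial current year would be averaged over fewer months by `groupby` but would make the reshape fail, and XArray's `mean` skips NaNs by default for floating-point data while `np.mean` does not.

```python
import numpy as np

# Synthetic monthly series for a 2 x 3 pixel grid: 3 complete years (36 months)
rng = np.random.default_rng(0)
monthly = rng.normal(20.0, 5.0, size=(36, 2, 3))

# Split the time axis into (years, months) and average over the months
yearly = monthly.reshape(-1, 12, *monthly.shape[1:]).mean(axis=1)
print(yearly.shape)  # (3, 2, 3)

# Cross-check the second year against an explicit slice of months 12..23
assert np.allclose(yearly[1], monthly[12:24].mean(axis=0))
```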

In [None]:
# Process and load the data into memory
# This may take a few minutes
# Check the Dask dashboard to see the progress
# The dashboard is not available on Colab
da_yearly = da_yearly.compute()

Plot a time series at a single location to see the trend.

In [None]:
time_series = da_yearly.interp(lat=location[0], lon=location[1])

fig, ax = plt.subplots(1, 1)

fig.set_size_inches(15, 7)
time_series.plot.line(ax=ax, x='year', marker='o', linestyle='-', linewidth=1)
ax.set_title('Annual Mean Monthly Maximum Temperature')
plt.show()
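`interp` uses `method='linear'` by default, which on a 2-D latitude/longitude grid means bilinear interpolation between the four pixels surrounding the location. The plain-Python sketch below shows the weighting; the function `bilinear` is a hypothetical illustration, not XArray's implementation. If you would rather take the value of the nearest pixel, `da_yearly.sel(lat=..., lon=..., method='nearest')` is an alternative.

```python
def bilinear(lats, lons, values, lat, lon):
    """Bilinear interpolation on a regular grid with ascending coordinates.

    values[i][j] is the value at (lats[i], lons[j]).
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError('point lies outside the grid')
    # Lower-left corner of the grid cell containing the point
    i = max(k for k in range(len(lats) - 1) if lats[k] <= lat)
    j = max(k for k in range(len(lons) - 1) if lons[k] <= lon)
    # Fractional position of the point inside the cell, in [0, 1]
    t = (lat - lats[i]) / (lats[i + 1] - lats[i])
    u = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Weighted average of the four surrounding grid values
    return ((1 - t) * (1 - u) * values[i][j]
            + (1 - t) * u * values[i][j + 1]
            + t * (1 - u) * values[i + 1][j]
            + t * u * values[i + 1][j + 1])


lats = [0.0, 1.0]
lons = [0.0, 1.0]
values = [[0.0, 10.0], [20.0, 30.0]]
print(bilinear(lats, lons, values, 0.5, 0.5))  # 15.0
```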

Fit a linear trendline.

In [None]:
trend = da_yearly.polyfit('year', 1) # fit polynomial of degree 1
slope = trend.polyfit_coefficients[0,...] * 100 # per year -> per century
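For a degree-1 fit, the slope that `polyfit` returns is the ordinary least-squares slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², computed independently for every pixel. The coefficients are ordered from the highest degree down, so index 0 is the slope; `trend.polyfit_coefficients.sel(degree=1)` selects it without relying on that order. The NumPy sketch below computes the same slope in vectorized form on synthetic data with known warming rates; the function `ols_slope` and the rates are hypothetical.

```python
import numpy as np


def ols_slope(years, data):
    """Per-pixel least-squares slope of data with shape (time, lat, lon) against years."""
    x = np.asarray(years, dtype=float)
    x_anom = x - x.mean()
    y_anom = data - data.mean(axis=0)
    # Broadcast the centered time axis against every pixel and sum over time
    return (x_anom[:, None, None] * y_anom).sum(axis=0) / (x_anom ** 2).sum()


# Synthetic 2 x 2 grid warming at known rates (degrees per year)
years = np.arange(1958, 2024)
rates = np.array([[0.01, 0.02], [0.03, -0.01]])
data = 25.0 + rates[None, :, :] * (years - years[0])[:, None, None]

slope_per_century = ols_slope(years, data) * 100  # per year -> per century
print(slope_per_century)  # approximately [[1, 2], [3, -1]]

# Cross-check one pixel against numpy.polyfit (highest degree first)
assert np.isclose(np.polyfit(years, data[:, 0, 1], 1)[0], rates[0, 1])
```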

Visualize the slope of the trendline.

In [None]:
projection = ccrs.epsg(25884) # ETRS89 / TM Baltic93

cbar_kwargs = {
    'orientation':'horizontal',
    'fraction': 0.025,
    'pad': 0.05,
    'extend':'neither',
    'label': '°C per century'
}

fig, ax = plt.subplots(1, 1, subplot_kw={'projection': projection})
fig.set_size_inches(8, 8)
slope.plot.imshow(
    ax=ax,
    cmap='YlOrBr',
    transform=ccrs.PlateCarree(),
    add_labels=False,
    cbar_kwargs=cbar_kwargs)

ax.coastlines()
ax.set_extent((lon_min,lon_max,lat_min,lat_max), crs = ccrs.PlateCarree())

plt.title('Slope of Monthly Maximum Temperature Trend', fontsize=12)
plt.show()

Save the resulting slope raster as a GeoTIFF.

In [None]:
# Tell rioxarray which dimensions are spatial and assign a CRS
slope.rio.set_spatial_dims(x_dim='lon', y_dim='lat', inplace=True)
slope.rio.write_crs('EPSG:4326', inplace=True)
# Clip to the GeoDataFrame boundary
clipped = slope.rio.clip(baltics_gdf.geometry.values, crs=baltics_gdf.crs)
# Write the clipped raster to file
output_slope_file = f'{variable}_slope.tif'
output_slope_path = os.path.join(output_folder, output_slope_file)
if not os.path.exists(output_slope_path):
    clipped.rio.to_raster(output_slope_path)
    print('Saved the file at', output_slope_path)
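A GeoTIFF is georeferenced by an affine transform that maps pixel (column, row) positions to coordinates. Because the `lat` and `lon` values are pixel centers, the transform's origin sits half a pixel outside the first center. The sketch below derives a GDAL-style geotransform from 1-D center coordinates; the function `geotransform_from_centers` is hypothetical and only approximates what rioxarray computes. With latitude sorted in ascending order, as after the `sortby` call earlier, the y resolution comes out positive (a "south-up" raster); if a tool expects north-up rasters, `slope.sortby('lat', ascending=False)` before writing is one option.

```python
def geotransform_from_centers(xs, ys):
    """Return a GDAL-style geotransform (x0, dx, 0, y0, 0, dy) for a regular grid.

    xs and ys are 1-D pixel-center coordinates; (x0, y0) is the outer corner of
    the first pixel, hence the half-pixel shift.
    """
    dx = (xs[-1] - xs[0]) / (len(xs) - 1)
    dy = (ys[-1] - ys[0]) / (len(ys) - 1)
    return (xs[0] - dx / 2, dx, 0.0, ys[0] - dy / 2, 0.0, dy)


# Descending latitudes give the usual north-up transform with a negative dy
gt = geotransform_from_centers([20.5, 21.5, 22.5], [59.5, 58.5])
print(gt)  # (20.0, 1.0, 0.0, 60.0, 0.0, -1.0)

# Ascending latitudes anchor the origin at the southern edge with a positive dy
print(geotransform_from_centers([20.5, 21.5], [58.5, 59.5]))  # (20.0, 1.0, 0.0, 58.0, 0.0, 1.0)
```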

Save the yearly aggregated subset as a NetCDF file.

In [None]:
local_subset_file = f'{variable}_yearly_subset.nc'
local_subset_filepath = os.path.join(output_folder, local_subset_file)
if not os.path.exists(local_subset_filepath):
    da_yearly.to_netcdf(path=local_subset_filepath)
    print('Saved the file at ', local_subset_filepath)