![](https://www.sarao.ac.za/wp-content/uploads/2024/07/geo-ai-africa-logo.jpg)

A project proposed in the context of the [5th Big Data Africa School](https://www.sarao.ac.za/students/5th-big-data-africa-school/) by Sabrina Amrouche on behalf of the GeoAI-Africa community.


# Benchmarking Reanalysis data with observation data from [TAHMO](https://tahmo.org/)

The Trans-African Hydro-Meteorological Observatory (TAHMO) aims to develop a vast network of weather stations across Africa. Current and historical weather data are important for agriculture, climate monitoring, and many hydro-meteorological applications.

<div>
<img src="https://tahmo.org/wp-content/uploads/2015/03/weatherstation_kenya.jpg" width="500"/>
</div>

TAHMO’s goal is to install 20,000 on-the-ground sensing stations, one every 30 km across the African continent, each specifically designed to provide rainfall, temperature, and other critical data from robust, redundant sensors.

Each station records data every five minutes and sends it over a cellular network, in near real time, to a server where it undergoes quality control.

**In this tutorial we will be looking at data samples collected over Kenya.**

## Loading Observation data

In [None]:
# Run the code below to access cloud data files from the public AfriClimate AI bucket
from google.colab import auth
import pandas as pd
auth.authenticate_user()

In [None]:
# List of all available stations in Kenya; switch stations and see the impact
import pandas as pd

list_kenya_stations = ['TA00018', 'TA00025', 'TA00029', 'TA00030', 'TA00057', 'TA00077', 'TA00130', 'TA00141', 'TA00156', 'TA00354']

def get_tahmo_station_data(station_id='TA00025', return_all=False):
  """Read the CSV file of a single TAHMO station from the AfriClimate AI bucket."""
  # Note: return_all is not used yet
  # Directly read the specific station data (pandas needs gcsfs to open gs:// paths)
  obs_path = f'gs://africlimateai/tahmo/kenya/{station_id}.csv'
  df_data = pd.read_csv(obs_path)

  return df_data

# We can see two years of hourly data with several variables.
selected_station = list_kenya_stations[1]
df_data = get_tahmo_station_data(selected_station)
df_data.head()

### Aggregate to daily observations - Try it

---


How would you convert these hourly measurements to daily values? Try it by completing the following code.

In [None]:
import pandas as pd
df_data['timestamp'] = pd.to_...(df_data['timestamp'])
df_data_daily=df_data.set_index(..)
df_data_daily.resample(..)

#### Solution

In [None]:
# Solution?
import pandas as pd
# Parse the timestamps so that pandas can resample on them
df_data['timestamp'] = pd.to_datetime(df_data['timestamp'])
df_data_ts = df_data.set_index('timestamp')
# Average every variable over each calendar day
df_tahmo_daily = df_data_ts.resample('D').mean()
df_tahmo_daily.head()
# Is this result correct? One column should not be averaged. Which one is it?


In [None]:
# Solution
df_tahmo_daily['precipitation (mm)'] = df_data_ts['precipitation (mm)'].resample('D').sum()
df_tahmo_daily.head()
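
The difference between averaging and summing can be checked on a small synthetic table. The sketch below is only illustrative: the values are made up, and the column name `temperature (degrees Celsius)` is an assumption (only `precipitation (mm)` appears in the station file above). It uses `agg` with one rule per column, which is an alternative to the two-step solution rather than the method used in this tutorial.

```python
import numpy as np
import pandas as pd

# Two days of synthetic hourly data (hypothetical values, not TAHMO measurements)
timestamps = pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1))
hour_of_day = np.arange(48) % 24

hourly = pd.DataFrame(
    {
        # Assumed column name, used here for illustration only
        "temperature (degrees Celsius)": 20.0 + 0.5 * hour_of_day,
        # 1.5 mm of rain during the first two hours of each day, dry otherwise
        "precipitation (mm)": np.where(hour_of_day < 2, 1.5, 0.0),
    },
    index=timestamps,
)

# State variables such as temperature are averaged,
# accumulated variables such as rainfall are summed.
daily = hourly.resample("D").agg(
    {
        "temperature (degrees Celsius)": "mean",
        "precipitation (mm)": "sum",
    }
)

# For comparison: the (misleading) daily mean of hourly rainfall
daily_mean_precip = hourly["precipitation (mm)"].resample("D").mean()

print(daily)
print(daily_mean_precip)
```

With these numbers the daily rainfall total is 3.0 mm, whereas the mean of the hourly values is only 0.125 mm, which is why the precipitation column needs `sum` rather than `mean`.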

# ERA5

ERA5 is the fifth-generation and latest climate reanalysis produced by ECMWF, providing hourly data on many atmospheric, land-surface and sea-state parameters together with estimates of uncertainty.

Dataset overview: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview

In [None]:
#!pip install -qq gcsfs zarr dask cartopy
!pip install -qq zarr cartopy

In [None]:
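# Load ERA5 from the WeatherBench 2 (WB2) public bucket.
# As the file name indicates, this copy is 6-hourly on a 1440x721 grid.
# open_zarr is lazy: only metadata is read until values are actually requested.
import xarray as xr
era5 = xr.open_zarr(
    'gs://weatherbench2/datasets/era5/1959-2023_01_10-wb13-6h-1440x721_with_derived_variables.zarr')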

In [None]:
era5
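
To compare a gridded reanalysis with a station, one common first step is to pick the grid cell nearest to the station and aggregate it to the same daily resolution. The sketch below shows this on a small synthetic grid so that it runs without cloud access. The variable name `2m_temperature`, the coordinate names `latitude`, `longitude` and `time`, and the station coordinates are all assumptions made for illustration; check the printed `era5` dataset and the station metadata for the real names and values.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a 6-hourly reanalysis grid (hypothetical values, in kelvin)
times = pd.date_range("2023-01-01", periods=8, freq=pd.Timedelta(hours=6))
lats = np.arange(-5.0, 5.25, 0.25)
lons = np.arange(33.0, 42.25, 0.25)
rng = np.random.default_rng(0)

grid = xr.Dataset(
    {
        # Assumed variable name, used here for illustration only
        "2m_temperature": (
            ("time", "latitude", "longitude"),
            290.0 + rng.normal(size=(times.size, lats.size, lons.size)),
        )
    },
    coords={"time": times, "latitude": lats, "longitude": lons},
)

# Hypothetical station location (not the coordinates of a real TAHMO station)
station_lat, station_lon = -1.30, 36.76

# Select the grid cell closest to the station
point = grid["2m_temperature"].sel(
    latitude=station_lat, longitude=station_lon, method="nearest"
)

# Convert kelvin to degrees Celsius, then average the 6-hourly steps per day
daily_celsius = (point - 273.15).resample(time="1D").mean()

# A pandas Series indexed by day can be aligned with the daily station table
series = daily_celsius.to_series()
print(float(point.latitude), float(point.longitude))
print(series)
```

Nearest-cell selection is the simplest choice; interpolating between neighbouring cells is an alternative when the station lies far from a cell centre. Also note that some gridded datasets store longitude on a 0 to 360 scale, which matters for stations west of the Greenwich meridian.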