# NCI WeatherBench-1b: Create climatology and persistence forecasts

In this notebook we create the most basic baselines: persistence and climatology forecasts. The code below computes them for 500 hPa geopotential (Z500) and 850 hPa temperature (T850); precipitation and 2 m temperature are not loaded in this version.
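
As a minimal sketch of what the two baselines mean, assume a 1-D hourly series in place of the gridded fields. The array `obs` and the helpers `persistence_pairs` and `rmse` are hypothetical names for illustration, not part of `score.py`. A persistence forecast assumes the value at initialisation time still holds `lead` hours later; a climatology forecast uses the training-period mean at every validation time.

```python
import numpy as np

def persistence_pairs(obs, lead):
    """Return (forecast, truth) for a persistence forecast.

    The forecast issued at index i is obs[i]; it is verified against obs[i + lead].
    """
    assert lead > 0, "Lead time must be greater than 0"
    return obs[:-lead], obs[lead:]

def rmse(a, b):
    """Root-mean-square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic hourly series: a 24-hour sine cycle over 10 days (illustrative only).
hours = np.arange(240)
obs = np.sin(2 * np.pi * hours / 24)
train, valid = obs[:120], obs[120:]

# Climatology: the training mean, repeated for every validation time.
clim_forecast = np.full_like(valid, train.mean())
clim_rmse = rmse(clim_forecast, valid)

# Persistence at 6 h and 24 h lead times.
fc6, truth6 = persistence_pairs(valid, 6)
fc24, truth24 = persistence_pairs(valid, 24)
pers_rmse_6 = rmse(fc6, truth6)
pers_rmse_24 = rmse(fc24, truth24)

print(f"climatology RMSE: {clim_rmse:.3f}")
print(f"persistence RMSE at 6 h: {pers_rmse_6:.3f}, at 24 h: {pers_rmse_24:.3f}")
```

On this toy series persistence is nearly perfect at a 24 h lead only because the signal repeats exactly every 24 hours; real atmospheric fields do not behave this way, which is why both baselines are scored over a range of lead times.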

## Note: 
**Requires a 382GB ARE instance to load the entire dataset**

# For higher resolutions

This section is not up to date. Previous tests for Z500 and T850 showed only a tiny difference in scores between resolutions.

In [1]:
from datetime import datetime
print( f'[{datetime.now().replace(microsecond=0)}]' )
import os
# A `! export ...` line runs in a throwaway subshell, so set the variable in this process instead
os.environ['DASK_LOGGING__DISTRIBUTED'] = 'error'
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import seaborn as sns
from score import *  # load_test_data is expected to come from here
import glob  # used below to collect the yearly input files
import logging
import dask
dask.config.set({'logging.distributed': 'error'})
from dask.distributed import Client, progress
import gc
from dask.diagnostics import ProgressBar
import IPython

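def create_persistence_forecast(ds, lead_time_h):
    """Persistence forecast: the state at initialisation time is the forecast.

    The time axis holds initialisation times. The last `lead_time_h` steps are
    dropped because no truth exists to verify them; this assumes hourly data.
    """
    assert lead_time_h > 0, 'Lead time must be greater than 0'
    ds_fc = ds.isel(time=slice(0, -lead_time_h))
    return ds_fc

def create_climatology_forecast(ds_train):
    """Climatology forecast: the mean over the whole training period."""
    return ds_train.mean('time')

def create_weekly_climatology_forecast(ds_train, valid_time):
    """Weekly climatology: the training mean for each week of the year,
    looked up for every validation time."""
    # Note: both inputs gain a 'week' coordinate as a side effect
    ds_train['week'] = ds_train['time.week']
    weekly_averages = ds_train.groupby('week').mean('time')
    valid_time['week'] = valid_time['time.week']
    fc_list = []
    for t in valid_time:
        fc_list.append(weekly_averages.sel(week=t.week))
    return xr.concat(fc_list, dim=valid_time)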

def baseline_forecasts(res): 
    print (80*'-')
    print (f"Res: {res}")
    DATADIR = f'/g/data/wb00/NCI-Weatherbench/{res}deg/' 
    print("DATADIR:", DATADIR)
    PREDDIR = f"/scratch/vp91/{os.environ['USER']}/NCI-Weatherbench/pred_dir" # Location to store baseline forecasts
    print("PREDDIR:", PREDDIR)

    # Set the years data to load
    years       = list(range(1999, 2022+1))
    valid_years = list(range(2021, 2022+1))
    save_prefix = 'NCI_tutorial' 
    print ('load years :', years)
    print ('valid_years:',  valid_years)    
    print ('save_prefix :', save_prefix)
    
    z500_files = [ file for year in years for file in glob.glob (fr'{DATADIR}/geopotential/*{year}*')  ] 
    t850_files = [ file for year in years for file in glob.glob (fr'{DATADIR}/temperature/*{year}*')    ] 
     
    z500_valid_files = [ file for year in valid_years for file in glob.glob (fr'{DATADIR}/geopotential/*{year}*') ] 
    t850_valid_files = [ file for year in valid_years for file in glob.glob (fr'{DATADIR}/temperature/*{year}*')  ] 
           
    print (f'\nLoading data, Res: {res} ...')
    z500 = xr.open_mfdataset(z500_files, combine='by_coords', parallel=True, chunks={'time': 10}).z.sel(level=500).load()  
    t850 = xr.open_mfdataset(t850_files, combine='by_coords', parallel=True, chunks={'time': 10}).t.sel(level=850).load()  

    # drop_vars replaces the deprecated drop() for removing the scalar 'level' coordinate
    data = xr.merge([z500.drop_vars('level'), t850.drop_vars('level')])
    print (f'Loading validation data, Res: {res} ...')
    z500_valid = load_test_data(z500_valid_files, 'z', slice('2021', '2022'))
    t850_valid = load_test_data(t850_valid_files,  't', slice('2021', '2022'))   
 
    valid_data = xr.merge([z500_valid, t850_valid])
    
    print("\nPersistence forecast ...")      
    lead_times = xr.DataArray(
        np.arange(6, 126, 6), dims=['lead_time'], coords={'lead_time': np.arange(6, 126, 6)}, name='lead_time')

    persistence = []
    for l in lead_times:
        persistence.append(create_persistence_forecast(valid_data, int(l)))
    persistence = xr.concat(persistence, dim=lead_times)
    
    print ('Saving persistence forecast result:', f'{PREDDIR}/{save_prefix}_persistence_{res}.nc')
    persistence.to_netcdf(   f'{PREDDIR}/{save_prefix}_persistence_{res}.nc')
    print ( (os.path.getsize(f'{PREDDIR}/{save_prefix}_persistence_{res}.nc')/1024**3) )    
    
    print('\nClimatology forecast ...')
    train_data = data.sel(time=slice('1999', '2020'))
    climatology = create_climatology_forecast(train_data)
    print ('Saving climatology forecast result:', f'{PREDDIR}/{save_prefix}_climatology_{res}.nc')
    climatology.to_netcdf(   f'{PREDDIR}/{save_prefix}_climatology_{res}.nc')
    print ( (os.path.getsize(f'{PREDDIR}/{save_prefix}_climatology_{res}.nc')/1024**3) )
 
    print('\nWeekly climatology ...')
    weekly_climatology = create_weekly_climatology_forecast(train_data, valid_data.time)
    print ('Saving weekly climatology result:', f'{PREDDIR}/{save_prefix}_weekly_climatology_{res}.nc')
    weekly_climatology.to_netcdf(f'{PREDDIR}/{save_prefix}_weekly_climatology_{res}.nc')
    print ( (os.path.getsize    (f'{PREDDIR}/{save_prefix}_weekly_climatology_{res}.nc')/1024**3) )
    
    print ("Done")


[2024-03-23 17:43:32]
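
To illustrate the lookup performed by `create_weekly_climatology_forecast`, here is a minimal pure-Python sketch on a synthetic daily series. It assumes ISO week numbering, which is what `isocalendar()` returns; the notebook's `time.week` accessor is assumed, not confirmed, to follow the same convention. The helper names and the synthetic data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
import math

def iso_week(t):
    """ISO week of the year (1 to 53) for a datetime."""
    return t.isocalendar()[1]

def weekly_means(times, values):
    """Average the training values by week of year -> {week: mean}."""
    buckets = defaultdict(list)
    for t, v in zip(times, values):
        buckets[iso_week(t)].append(v)
    return {week: sum(vs) / len(vs) for week, vs in buckets.items()}

def weekly_climatology_forecast(train_times, train_values, valid_times):
    """For every validation time, look up the training mean of its week."""
    means = weekly_means(train_times, train_values)
    return [means[iso_week(t)] for t in valid_times]

# Synthetic daily training data for 2019-2020 (731 days, 2020 is a leap year):
# an annual cosine cycle that peaks on 1 January.
train_times = [datetime(2019, 1, 1) + timedelta(days=i) for i in range(731)]
train_values = [
    math.cos(2 * math.pi * (t.timetuple().tm_yday - 1) / 365.25)
    for t in train_times
]

# Two validation times: one in early January, one in early July.
valid_times = [datetime(2021, 1, 4), datetime(2021, 7, 5)]
forecast = weekly_climatology_forecast(train_times, train_values, valid_times)
print([round(v, 3) for v in forecast])
```

One consequence of this design is that a validation week missing from the training period (for example week 53) has no mean to look up, so the training span needs to cover every week number that appears in the validation times.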


# '2.8125'

In [2]:
print( f'[{datetime.now().replace(microsecond=0)}]' )
client = Client(n_workers=12, threads_per_worker=1, silence_logs=logging.ERROR)
client

[2024-03-22 19:42:14]




Client: connected through a Cluster object (distributed.LocalCluster)
Dashboard: /proxy/8787/status
Status: running, using processes: True
Workers: 12, total threads: 12, total memory: 95.00 GiB
Each worker: 1 thread, 7.92 GiB memory, GPU Tesla V100-SXM2-32GB (32.00 GiB), local directory under /jobfs/111500515.gadi-pbs/dask-scratch-space/

In [3]:
%%time
print( f'[{datetime.now().replace(microsecond=0)}]' )
baseline_forecasts('2.8125')  

[2024-03-22 19:42:15]
--------------------------------------------------------------------------------
Res: 2.8125
DATADIR: /g/data/wb00/NCI-Weatherbench/2.8125deg/
PREDDIR: /scratch/vp91/mah900/NCI-Weatherbench/pred_dir
load years : [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
valid_years: [2021, 2022]
save_prefix : NCI_tutorial

Loading data, Res: 2.8125 ...
Loading validation data, Res: 2.8125 ...
load_test_data, var: z
load_test_data, var: t

Persistence forecast ...
Saving persistence forecast result: /scratch/vp91/mah900/NCI-Weatherbench/pred_dir/NCI_tutorial_persistence_2.8125.nc
21.3794065695256

Climatology forecast ...
Saving climatology forecast result: /scratch/vp91/mah900/NCI-Weatherbench/pred_dir/NCI_tutorial_climatology_2.8125.nc
6.98138028383255e-05

Weekly climatology ...
Saving weekly climatology result: /scratch/vp91/mah900/NCI-Weatherbench/pred_dir/NCI_tutorial_weekly
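
The reported persistence file size can be sanity-checked with simple arithmetic. The sketch below rests on several assumptions that the notebook does not state: a 64 x 128 grid at 2.8125 degrees, hourly time steps, uncompressed float32 values, and `xr.concat` padding all lead times to the longest time axis (its default outer join). The function name is hypothetical.

```python
def persistence_size_gib(n_lat, n_lon, n_hours, lead_times, n_vars=2, bytes_per_value=4):
    """Estimate the uncompressed size of the concatenated persistence forecast in GiB."""
    # After concatenation every lead time shares the longest time axis,
    # which belongs to the shortest lead time (assumed outer join).
    n_time = n_hours - min(lead_times)
    n_values = len(lead_times) * n_time * n_lat * n_lon * n_vars
    return n_values * bytes_per_value / 1024**3

lead_times = list(range(6, 126, 6))  # 6, 12, ..., 120 h, i.e. 20 lead times
n_hours = (365 + 365) * 24           # hourly steps in 2021-2022 (neither is a leap year)

size = persistence_size_gib(n_lat=64, n_lon=128, n_hours=n_hours, lead_times=lead_times)
print(f"{size:.4f} GiB")  # prints "21.3794 GiB"
```

Under these assumptions the estimate agrees with the 21.379 GiB printed above to four decimal places; the small remainder would be file metadata and coordinates. The same function suggests that doubling the resolution in both directions roughly quadruples the file size.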

# '1.40625'
This run uses the Dask cluster started above.

In [None]:
%%time
print( f'[{datetime.now().replace(microsecond=0)}]' )
baseline_forecasts('1.40625')

[2024-03-22 20:17:32]
--------------------------------------------------------------------------------
Res: 1.40625
DATADIR: /g/data/wb00/NCI-Weatherbench/1.40625deg/
PREDDIR: /scratch/vp91/mah900/NCI-Weatherbench/pred_dir
load years : [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
valid_years: [2021, 2022]
save_prefix : NCI_tutorial

Loading data, Res: 1.40625 ...
Loading validation data, Res: 1.40625 ...
load_test_data, var: z
load_test_data, var: t

Persistence forecast ...
Saving persistence forecast result: /scratch/vp91/mah900/NCI-Weatherbench/pred_dir/NCI_tutorial_persistence_1.40625.nc


# The End