# Create noleap calendar version of COSMO-REA6 forcing data

created 2023-04-14

by Eva Lieungh, Lasse T. Keetz, Hui Tang, Yeliz Yilmaz. 

COSMO reanalysis data use a Gregorian calendar with leap years. Running long simulations (necessary for spin-up) with COSMO reanalysis data proved problematic, because the leap years don't align and the model terminates prematurely after ~100 years.

This notebook will...
1. read in the COSMO reanalysis atmospheric forcing files
2. remove leap days (Feb 29) from all files
3. change the calendar attribute for all files to noleap
4. save the modified data

In [1]:
# import libraries
import os
import netCDF4 as nc
import datetime as dt
import numpy as np
import xarray as xr  # NetCDF data handling
import zipfile # for unzipping
import shutil # easiest whole-directory zipping
from pathlib import Path  # For easy path handling

Download the COSMO-REA6 data from the evalieungh/FATES_INCLINE repo if necessary:

In [None]:
%%bash
pwd
cd ../data
pwd
wget https://raw.githubusercontent.com/evalieungh/FATES_INCLINE/main/data/ALP4_cosmorea.zip

In [2]:
# set path to data, where we have the original (gregorian) data and will save the modified version
cosmo_path = str(Path(f"C:/Users/evaler/OneDrive - Universitetet i Oslo/Eva/PHD/FATES_INCLINE/data"))

Unzip folder:

In [None]:
print("extracting ", cosmo_path + "/ALP4_cosmorea.zip")
with zipfile.ZipFile(cosmo_path + "/ALP4_cosmorea.zip", 'r') as zip_ref:
    zip_ref.extractall(cosmo_path + "/ALP4_cosmorea")

In a file explorer, copy the ALP4_cosmorea folder and rename the new copy ALP4_cosmorea_noleap. Keep the original files in ALP4_cosmorea untouched.
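
The manual copy can also be scripted. A minimal sketch, assuming the folder names used in this notebook; `copy_for_noleap` is a hypothetical helper, and it refuses to overwrite an existing working copy:

```python
import shutil
from pathlib import Path


def copy_for_noleap(data_dir, src_name="ALP4_cosmorea", dst_name="ALP4_cosmorea_noleap"):
    """Copy the extracted forcing folder so the originals stay untouched.

    Hypothetical helper: the default folder names are the ones used in this notebook.
    """
    src = Path(data_dir) / src_name
    dst = Path(data_dir) / dst_name
    if dst.exists():
        # Do not silently overwrite an earlier working copy
        raise FileExistsError(f"{dst} already exists")
    shutil.copytree(src, dst)
    return dst
```

It could be called as `copy_for_noleap(cosmo_path)` after the unzip step.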
 
Set the path to the copied files. The cells below modify the files in ALP4_cosmorea_noleap in place, so the copies are overwritten while the originals in ALP4_cosmorea stay unchanged:

In [3]:
# Set file path to data files
noleap_dir = str(Path(cosmo_path + f"/ALP4_cosmorea_noleap/datmdata/"))
print("Files to be modified stored here:", noleap_dir)

Files to be modified stored here: C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\ALP4_cosmorea_noleap\datmdata


List the atmospheric forcing files and check how they are structured.

In [4]:
# Print all NetCDF files in the input directory
files = [f for f in os.listdir(noleap_dir) if f.endswith('.nc')]
print(files)

['clm1pt_ALP4_1995-01.nc', 'clm1pt_ALP4_1995-02.nc', 'clm1pt_ALP4_1995-03.nc', 'clm1pt_ALP4_1995-04.nc', 'clm1pt_ALP4_1995-05.nc', 'clm1pt_ALP4_1995-06.nc', 'clm1pt_ALP4_1995-07.nc', 'clm1pt_ALP4_1995-08.nc', 'clm1pt_ALP4_1995-09.nc', 'clm1pt_ALP4_1995-10.nc', 'clm1pt_ALP4_1995-11.nc', 'clm1pt_ALP4_1995-12.nc', 'clm1pt_ALP4_1996-01.nc', 'clm1pt_ALP4_1996-02.nc', 'clm1pt_ALP4_1996-03.nc', 'clm1pt_ALP4_1996-04.nc', 'clm1pt_ALP4_1996-05.nc', 'clm1pt_ALP4_1996-06.nc', 'clm1pt_ALP4_1996-07.nc', 'clm1pt_ALP4_1996-08.nc', 'clm1pt_ALP4_1996-09.nc', 'clm1pt_ALP4_1996-10.nc', 'clm1pt_ALP4_1996-11.nc', 'clm1pt_ALP4_1996-12.nc', 'clm1pt_ALP4_1997-01.nc', 'clm1pt_ALP4_1997-02.nc', 'clm1pt_ALP4_1997-03.nc', 'clm1pt_ALP4_1997-04.nc', 'clm1pt_ALP4_1997-05.nc', 'clm1pt_ALP4_1997-06.nc', 'clm1pt_ALP4_1997-07.nc', 'clm1pt_ALP4_1997-08.nc', 'clm1pt_ALP4_1997-09.nc', 'clm1pt_ALP4_1997-10.nc', 'clm1pt_ALP4_1997-11.nc', 'clm1pt_ALP4_1997-12.nc', 'clm1pt_ALP4_1998-01.nc', 'clm1pt_ALP4_1998-02.nc', 'clm1pt_ALP

In [5]:
# print variables in the first nc file
example_file = str(Path(noleap_dir + "/" + f"clm1pt_ALP4_1995-01.nc"))
with nc.Dataset(example_file, 'r') as ds:
    # List all variables in the file
    print("Variables in the file:")
    print(ds.variables.keys())

Variables in the file:
dict_keys(['EDGEW', 'EDGEE', 'EDGES', 'EDGEN', 'LONGXY', 'LATIXY', 'SWDIFDS_RAD', 'SWDIRS_RAD', 'RAIN_CON', 'RAIN_GSP', 'SNOW_GSP', 'SNOW_CON', 'PRECTmms', 'TBOT', 'WIND', 'PSRF', 'SHUM', 'FLDS', 'time'])


In [6]:
# get more info on time variable
example_file = str(Path(noleap_dir + "/" + f"clm1pt_ALP4_1996-02.nc")) # 1996 was leap year
with nc.Dataset(example_file, 'r') as ds:
    # Access the "time" variable
    time_var = ds.variables['time']

    # Print some info
    print("Variable dimensions:", time_var.dimensions)
    print("Variable shape:", time_var.shape)
    print("Variable attributes:", time_var.ncattrs())
    print("Variable units:", time_var.units)
    print("Calendar:", time_var.calendar)

    # Print the variable values
    print("Variable values:", time_var[:])

Variable dimensions: ('time',)
Variable shape: (232,)
Variable attributes: ['standard_name', 'units', 'calendar', 'axis']
Variable units: hours since 1996-2-1 01:00:00
Calendar: standard
Variable values: [  0.   3.   6.   9.  12.  15.  18.  21.  24.  27.  30.  33.  36.  39.
  42.  45.  48.  51.  54.  57.  60.  63.  66.  69.  72.  75.  78.  81.
  84.  87.  90.  93.  96.  99. 102. 105. 108. 111. 114. 117. 120. 123.
 126. 129. 132. 135. 138. 141. 144. 147. 150. 153. 156. 159. 162. 165.
 168. 171. 174. 177. 180. 183. 186. 189. 192. 195. 198. 201. 204. 207.
 210. 213. 216. 219. 222. 225. 228. 231. 234. 237. 240. 243. 246. 249.
 252. 255. 258. 261. 264. 267. 270. 273. 276. 279. 282. 285. 288. 291.
 294. 297. 300. 303. 306. 309. 312. 315. 318. 321. 324. 327. 330. 333.
 336. 339. 342. 345. 348. 351. 354. 357. 360. 363. 366. 369. 372. 375.
 378. 381. 384. 387. 390. 393. 396. 399. 402. 405. 408. 411. 414. 417.
 420. 423. 426. 429. 432. 435. 438. 441. 444. 447. 450. 453. 456. 459.
 462. 465. 468.

******************************

## Remove leap days

I.e., remove all variable values for Feb 29 in every leap year. The loop below goes through all files in the datm folder and, for each file whose name ends with -02 (February), removes all data values with time > 669 (669 is the last time value for Feb 28). NB! This overwrites existing files, so make sure the originals are stored safely somewhere else.
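
The cutoff can be checked with a little date arithmetic. This is a minimal sketch, assuming the 3-hourly time step and the units string printed for the 1996-02 file above; `decode_hours` is a hypothetical helper:

```python
import datetime as dt

# Reference time taken from the units string printed above:
# "hours since 1996-2-1 01:00:00"
reference = dt.datetime(1996, 2, 1, 1, 0, 0)


def decode_hours(offset_hours):
    """Convert an 'hours since reference' offset to a datetime."""
    return reference + dt.timedelta(hours=offset_hours)


# 3-hourly offsets for a 29-day February (232 values, as in the file)
offsets = list(range(0, 29 * 24, 3))
kept = [h for h in offsets if decode_hours(h).day < 29]
feb29 = [h for h in offsets if decode_hours(h).day == 29]

print("last step on Feb 28:", kept[-1], decode_hours(kept[-1]))
print("first step on Feb 29:", feb29[0], decode_hours(feb29[0]))
print("steps kept / removed:", len(kept), len(feb29))
```

The last offset that still falls on Feb 28 is 669 (22:00), and the eight offsets from 672 onwards belong to Feb 29.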

The code below (thanks ChatGPT for the draft and Lasse for finding a solution that actually worked!) uses xarray to open the netCDF file and decode the time values, then converts them to Python datetime objects for further processing. It then creates a mask based on the day of the month, where days less than 29 are marked as 'True' and days greater than or equal to 29 are marked as 'False'. The xarray function where() is then used to apply the mask to the original dataset, effectively removing the leap days. Finally, the resulting dataset is saved to a netCDF file with the same name, replacing the copy in the noleap folder.

In [8]:
# Directory containing netCDF files
print("Working in folder:", noleap_dir)
datm_folder = Path(noleap_dir)
output_dir = Path(noleap_dir)

# Loop through all files in the directory
for file in os.listdir(datm_folder):
    # Only February files ("-02.nc") can contain a leap day
    if not file.endswith("-02.nc"):
        continue

    file_path = datm_folder / file
    print("Processing file:", file_path)
    # Open input file (xarray reads lazily; nothing is modified yet)
    with xr.open_dataset(file_path) as ds:
        # Get 'time' variable (decoded to datetime64 by xarray)
        time_var = ds.variables['time']

        # Date format of the decoded time values
        date_format = '%Y-%m-%dT%H:%M:%S.%f'

        # Convert to Python datetime objects (drop the last 3 nanosecond digits)
        time_dt_list = [dt.datetime.strptime(str(cur_time)[:-3], date_format)
                        for cur_time in time_var.values]
        # True for time steps to keep (day 1-28), False for Feb 29
        mask = [cur_dt.day < 29 for cur_dt in time_dt_list]

        # No Feb 29 in this file: nothing to do
        if all(mask):
            continue

        print(f"Leap year in {file_path}! Creating new file...")

        mask_da = xr.DataArray(mask, dims=('time',))

        # Load into memory so the input file is closed before it is overwritten
        ds_no_leap = ds.where(mask_da, drop=True).load()

    # Overwrite the file with the version without Feb 29
    ds_no_leap.to_netcdf(output_dir / file)

Working in folder: C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\ALP4_cosmorea_noleap\datmdata
Processing file: C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\ALP4_cosmorea_noleap\datmdata\clm1pt_ALP4_1995-02.nc
Processing file: C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\ALP4_cosmorea_noleap\datmdata\clm1pt_ALP4_1996-02.nc
Leap year in C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\ALP4_cosmorea_noleap\datmdata\clm1pt_ALP4_1996-02.nc! Creating new file...
Processing file: C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\ALP4_cosmorea_noleap\datmdata\clm1pt_ALP4_1997-02.nc
Processing file: C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\ALP4_cosmorea_noleap\datmdata\clm1pt_ALP4_1998-02.nc
Processing file: C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\ALP4_cosmorea_noleap\datmdata\clm1pt_ALP4
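
An alternative sketch, not the code that was run for these files: the time steps to keep can be selected with `sel` and a boolean mask built from xarray's `.dt` accessor, which avoids the string parsing. To my understanding, `Dataset.where` broadcasts the mask against every data variable, so a variable without a time dimension may gain one, whereas `sel` only subsets along time. The dataset below is synthetic; the variable shapes and values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for one leap-year February file (3-hourly, 232 steps).
# Names mimic the forcing files, but shapes and values are made up.
time = pd.date_range("1996-02-01 01:00", periods=29 * 8, freq=pd.Timedelta(hours=3))
ds = xr.Dataset(
    {
        "TBOT": (("time", "lat", "lon"), np.zeros((time.size, 1, 1))),
        "LONGXY": (("lat", "lon"), np.array([[7.0]])),  # no time dimension
    },
    coords={"time": time},
)

# Boolean mask along time: True for every step that falls on Feb 29
is_leap_day = (ds["time"].dt.month == 2) & (ds["time"].dt.day == 29)

# Keep only the steps that are not on Feb 29
ds_no_leap = ds.sel(time=~is_leap_day)

print(ds_no_leap.sizes["time"])   # number of remaining time steps
print(ds_no_leap["LONGXY"].dims)  # dimensions of the time-independent variable
```

Because the mask tests both month and day, the same two lines could be applied to any monthly file without first checking the file name.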

Check that it worked by printing time values for a leap year February file, which should stop at 669 if the change was successful:

In [11]:
# get more info on time variable
example_file = str(Path(noleap_dir + "/" + f"clm1pt_ALP4_1996-02.nc"))
with nc.Dataset(example_file, 'r') as ds:
    # Access the "time" variable
    time_var = ds.variables['time']

    # Print the variable values
    print("Variable values:", time_var[:])

Variable values: [  0.   3.   6.   9.  12.  15.  18.  21.  24.  27.  30.  33.  36.  39.
  42.  45.  48.  51.  54.  57.  60.  63.  66.  69.  72.  75.  78.  81.
  84.  87.  90.  93.  96.  99. 102. 105. 108. 111. 114. 117. 120. 123.
 126. 129. 132. 135. 138. 141. 144. 147. 150. 153. 156. 159. 162. 165.
 168. 171. 174. 177. 180. 183. 186. 189. 192. 195. 198. 201. 204. 207.
 210. 213. 216. 219. 222. 225. 228. 231. 234. 237. 240. 243. 246. 249.
 252. 255. 258. 261. 264. 267. 270. 273. 276. 279. 282. 285. 288. 291.
 294. 297. 300. 303. 306. 309. 312. 315. 318. 321. 324. 327. 330. 333.
 336. 339. 342. 345. 348. 351. 354. 357. 360. 363. 366. 369. 372. 375.
 378. 381. 384. 387. 390. 393. 396. 399. 402. 405. 408. 411. 414. 417.
 420. 423. 426. 429. 432. 435. 438. 441. 444. 447. 450. 453. 456. 459.
 462. 465. 468. 471. 474. 477. 480. 483. 486. 489. 492. 495. 498. 501.
 504. 507. 510. 513. 516. 519. 522. 525. 528. 531. 534. 537. 540. 543.
 546. 549. 552. 555. 558. 561. 564. 567. 570. 573. 576. 579.

***********************

## Change the value of the calendar attribute from 'standard' to 'noleap'

The calendar is stored as an attribute of the time variable. The most reliable way to change it is with `ncatted` from the NCO toolkit.

In [10]:
%%bash
pwd
cd ../data/ALP4_cosmorea_noleap/datmdata
for f in clm1pt_ALP4*; do ncatted -O -a calendar,time,o,c,noleap "$f"; done

/mnt/c/Users/evaler/OneDrive - Universitetet i Oslo/Eva/PHD/FATES_INCLINE/src


Open a terminal in the datmdata folder and print the header of a file to check that it worked, using
`ncdump -h clm1pt_ALP4_2001-02.nc`

The end of the dump should show `time:calendar = "noleap"`:

```
        float SHUM(time, lat, lon) ;
                SHUM:long_name = "specific humidity at the lowest atm level" ;
                SHUM:units = "kg/kg" ;
                SHUM:mode = "time-dependent" ;
                SHUM:_FillValue = 1.e+36f ;
                SHUM:missing_value = 1.e+36f ;
        float FLDS(time, lat, lon) ;
                FLDS:long_name = "incident longwave radiation" ;
                FLDS:units = "W/m**2" ;
                FLDS:mode = "time-dependent" ;
                FLDS:_FillValue = 1.e+36f ;
                FLDS:missing_value = 1.e+36f ;
        double time(time) ;
                time:standard_name = "time" ;
                time:units = "hours since 2001-2-1 01:00:00" ;
                time:calendar = "noleap" ;
                time:axis = "T" ;

// global attributes:
                :creation_date = "ti 21.2.2023 13.57.09 +0100" ;
                :history = "Mon Apr 17 14:52:36 2023: ncatted -O -a calendar,time,o,c,noleap clm1pt_ALP4_2001-02.nc\n",
                        "Original data from COSMOREA6 data" ;
                :title = "CLM single point datm input data" ;
                :conventions = "CF-1.0" ;
                :case_title = "COSMOREA6: SEEDCLIM" ;
                :NCO = "netCDF Operators version 5.0.6 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
}

```

Re-zip folder:

In [12]:
shutil.make_archive(cosmo_path + "/ALP4_cosmorea_noleap",
                    'zip', cosmo_path + "/ALP4_cosmorea_noleap")
print("zipping complete")

zipping complete
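
Before pushing, the archive contents can be inspected without unpacking. A small sketch using only the standard library; `list_archive` is a hypothetical helper:

```python
import zipfile


def list_archive(zip_path, suffix=".nc"):
    """Return the sorted names of archive members ending in `suffix`."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        return sorted(name for name in zf.namelist() if name.endswith(suffix))
```

For example, `len(list_archive(cosmo_path + "/ALP4_cosmorea_noleap.zip"))` could be compared with `len(files)` from the listing step, assuming the datmdata folder holds the only NetCDF files in the archive.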


Finally, commit and push the changes back to the GitHub repository so the data can be downloaded from there.