# RRTMGP-NN training

*Last edited: 2024-10-30*

The goal of this notebook is to train the neural network (NN) model that is later used in RRTMGP-NN to emulate the gas optics scheme, as described by Ukkonen & Hogan (2023). Training is implemented in TensorFlow/Python. Because the NN is meant to emulate RRTMGP, the Fortran RRTMGP code is itself used to generate the training dataset. The trained model is saved to disk in NetCDF format for later use. RRTMGP computes gas optics and is part of ecRad, which in turn is a modular radiation scheme within the IFS. It uses a k-distribution method to represent the absorption and scattering properties of gases, aerosols, and clouds.

Inputs of RRTMGP:

- Atmospheric Profiles: Temperature, pressure, and gas concentrations (e.g., CO2, O3, CH4) at different atmospheric levels.

- Surface Properties: Albedo (reflectivity), surface temperature, and surface type (land, sea, ice, etc.).

- Cloud Properties: Cloud cover, cloud type, and cloud optical properties.

- Aerosol Properties: Presence and characteristics of aerosols in the atmosphere.

- Radiation Data: Solar and thermal infrared radiation data.

Outputs of RRTMGP:

- Radiative Fluxes: Calculated radiative fluxes (shortwave and longwave) at different atmospheric levels.

- Heating Rates: Radiative heating and cooling rates, which are crucial for understanding atmospheric temperature changes.

- Optical Properties: Information on the absorption and scattering properties of gases, aerosols, and clouds.

The ecRad module in the Integrated Forecasting System (IFS) of the ECMWF (European Centre for Medium-Range Weather Forecasts) is designed to calculate atmospheric radiative transfer.

Outputs of ecRad:

- Irradiance Profiles: Computes solar (shortwave) and thermal infrared (longwave) radiation profiles through different atmospheric layers.

- Radiative Heating and Cooling Rates: These rates are crucial for understanding how radiation influences atmospheric temperature and dynamics.

- Optical Properties: Information about the optical characteristics of aerosols, clouds, and gases.

Inputs of ecRad:

- Atmospheric Profiles: Data on temperature, pressure, humidity, and wind at different atmospheric levels

- Surface Properties: Information about the Earth's surface, such as albedo (reflectivity), surface temperature, and type (land, sea, ice, etc.)

- Cloud Properties: Data on cloud cover, cloud type, and cloud optical properties

- Aerosol Properties: Information about the presence and characteristics of aerosols in the atmosphere

- Radiation Data: Solar and thermal infrared radiation data

- Gas Concentrations: Concentrations of gases like CO2, O3, CH4, and others that affect radiative transfer

## Based on

- [UKK23] Ukkonen, P., & Hogan, R. J. (2023). Implementation of a machine-learned gas optics parameterization in the ECMWF Integrated Forecasting System: RRTMGP-NN 2.0. Geoscientific Model Development, 16(11), 3241–3261. <https://doi.org/10.5194/gmd-16-3241-2023>. Sources: <https://github.com/peterukk/rte-rrtmgp-nn>
- Most of the code and data in this repository are forked from the originals and include some of my own experiments.

## Documentation

- ECMWF Radiation Scheme Home. <https://confluence.ecmwf.int/display/ECRAD>
- [EUG22] Hogan, R. J. ecRad Radiation Scheme: User Guide. Version 1.5 (June 2022) applicable to ecRad version 1.5.x . [[Unofficial Markdown/HTML version](ecrad-radiation-user-guide-2022.md)]. [[Original PDF version](https://confluence.ecmwf.int/download/attachments/70945505/ecrad_documentation.pdf?version=5&modificationDate=1655480733414&api=v2)].
- Documentation described in the work of Ukkonen & Hogan (2023).
- The README.md at the root of the repo contains additional information.

## Data and code sources

- Ukkonen, P. (2024). "peterukk/rte-rrtmgp-nn". <https://github.com/peterukk/rte-rrtmgp-nn>
- Ukkonen, P., Pincus, R., Hillman, B. R., Norman, M., fomics, & Heerwaarden, C. van. (2022). "peterukk/rte-rrtmgp-nn: 2.0" (Version 2.0). <https://doi.org/10.5281/zenodo.7413935>
- Ukkonen, P. (2022). Code and extensive data for training neural networks for radiation, used in “Implementation of a machine-learned gas optics parameterization in the ECMWF Integrated Forecasting System: RRTMGP-NN 2.0”". Dataset. <https://doi.org/10.5281/zenodo.7413952>
- Ukkonen, P. (2022). Optimized version of the ecRad radiation scheme with new RRTMGP-NN gas optics. <https://doi.org/10.5281/zenodo.7852526>

## Dataset

Due to space constraints on GitHub, the datasets are not hosted in this repository and must be downloaded from the links given in the README.md file. In addition, some data files, such as the NN training dataset, are created when the routines are run. The data directories used in this notebook include:

- neural/data
- rrtmgp/data
- examples/rfmip-clear-sky/data
- examples/rfmip-clear-sky/output_fluxes
- examples/rrtmgp-nn-training/data
- examples/rrtmgp-nn-training/inputs_to_RRTMGP

## Fortran and dependencies

```bash
# apt install gfortran libopenblas-dev libnetcdf-dev libnetcdff-dev
```

## Notes

- The training, which uses TensorFlow, is located in the `examples/rrtmgp-nn-training/` directory.
- The NNs predict "g-point" vectors containing:
    - LW: Planck fraction, absorption cross-section, or both
    - SW: Absorption cross-section, or Rayleigh cross-section
- Models are saved to `data/` directory with a file name containing the custom radiation scores.

## Organization of this Notebook

This notebook is divided into two parts:

- Part 1 - Generation of the training dataset
    - Uses Fortran
- Part 2 - Training the neural network
    - Uses TensorFlow/Python

## Input files

The input files are described in [UKK23]. They are different subsets of the gas optics training data. The original data contained only profiles (columns) and were extended to sample different greenhouse gas scenarios and/or temperature ranges in different experiments. For further information, see Sect. 3.2 of Ukkonen et al. (2020).

    RFMIP INPUT FILE
        - inputs_Garand_BIG.nc  (Garand et al., 2001)
        - inputs_AMON_ssp245_ssp585_2054_2100.nc  (GCM data. Pincus et al., 2019)
        - inputs_CAMS_new_CKDMIPstyle.nc  (Inness et al., 2019)
        - inputs_CKDMIP-MM-Big.nc  (Hogan and Matricardi, 2020)
    
    K-DISTRIBUTION FILE
        - rrtmgp-data-lw-g128-210809.nc

- CAMS = Copernicus Atmosphere Monitoring Service
- AMON = CMIP `Amon` table (monthly-mean atmospheric fields from climate model output)
- CKDMIP = Correlated K-Distribution Model Intercomparison Project
- GCM = Global Climate Models

## Output File Format

Please see [ecRad Radiation Scheme: User Guide](ecrad-radiation-user-guide-2022.md).


---

## Part 1 - Generation of the training dataset

Based on: <https://github.com/peterukk/rte-rrtmgp-nn/blob/main/examples/rrtmgp-nn-training/Readme.md>

The `rrtmgp_lw_gendata_rfmipstyle.F90` program is used to generate training data to be used in the RRTMGP-LW emulator neural network.

- The RRTMGP inputs, used as NN inputs, are layer-wise RRTMGP input variables (T, p, gas concentrations).

- The RRTMGP outputs are
    - optical depth *tau*
    - Single-Scattering Albedo (SSA).

These may be used directly as NN outputs. However, to follow the methodology of Ukkonen et al. (2020), which uses two different NNs to

- predict the absorption cross-section, and
- predict the Planck fraction,

the number of dry-air molecules in each layer (N) is also saved to disk, since the cross-section is obtained by dividing the optical depth by N.

## rrtmgp-nn-training

From the base directory (RRTMGP/), change to the `rrtmgp-nn-training` directory, which contains the Fortran sources for generating the training datasets.

In [1]:
import os
import netCDF4 as nc

In [3]:
DIRBASE="/home/x/git/radnn/"
DIRDATA="/home/x/data/"

In [4]:
os.chdir(DIRBASE+"ukk23test01/examples/rrtmgp-nn-training/")

## Build the Fortran code

Environment variables configuration:

In [69]:
%env FC=gfortran
%env FCFLAGS=-ffree-line-length-none -m64 -march=native -O3 -lcurl
%env NCHOME=/usr
%env NFHOME=/usr
%env BLASLIB=openblas

env: FC=gfortran
env: FCFLAGS=-ffree-line-length-none -m64 -march=native -O3 -lcurl
env: NCHOME=/usr
env: NFHOME=/usr
env: BLASLIB=openblas


In [70]:
! make clean

VAR="../../"
rm rrtmgp_lw_gendata_rfmipstyle rrtmgp_sw_gendata_rfmipstyle *.o *.mod *.optrpt
rm: cannot remove '*.optrpt': No such file or directory
make: [Makefile:151: clean] Error 1 (ignored)


In [71]:
! make

VAR="../../"
gfortran -ffree-line-length-none -m64 -march=native -O3 -lcurl -DNC_NETCDF4 -I../..//build -I/usr/include -c ../mo_simple_netcdf.F90 -fopenmp
gfortran -ffree-line-length-none -m64 -march=native -O3 -lcurl -DNC_NETCDF4 -I../..//build -I/usr/include -c easy_netcdf.F90 -fopenmp
gfortran -ffree-line-length-none -m64 -march=native -O3 -lcurl -DNC_NETCDF4 -I../..//build -I/usr/include -c mo_io_rfmipstyle_generic.F90 -fopenmp
gfortran -ffree-line-length-none -m64 -march=native -O3 -lcurl -DNC_NETCDF4 -I../..//build -I/usr/include -c ../mo_load_coefficients.F90 -fopenmp
gfortran -ffree-line-length-none -m64 -march=native -O3 -lcurl -DNC_NETCDF4 -I../..//build -I/usr/include -c rrtmgp_sw_gendata_rfmipstyle.F90 -fopenmp
gfortran -ffree-line-length-none -m64 -march=native -O3 -lcurl -DNC_NETCDF4 -o rrtmgp_sw_gendata_rfmipstyle rrtmgp_sw_gendata_rfmipstyle.o mo_simple_netcdf.o easy_netcdf.o mo_io_rfmipstyle_generic.o mo_load_coefficients.o ../..//build/librte.a ../..//build/librrtmgp.

Usage:

```bash
$ ./rrtmgp_lw_gendata_rfmipstyle \
        [block_size] \
        [rfmip input file] \
        [k-distribution file] \
        [file to save NN inputs/output]
```

- Once built, the next step is to generate the training data.
- The block size is the number of columns to be computed at a time, and must be an integer such that the remainder of dividing ncol*nexp by block_size is zero (block_size = 3 worked in all cases).
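
The divisibility requirement can be checked before launching the Fortran program. The following is a minimal sketch; `count_blocks` is a hypothetical helper that is not part of the repository, and the dimensions are those reported by the program for `inputs_Garand_BIG.nc`.

```python
def count_blocks(ncol: int, nexp: int, block_size: int) -> int:
    """Return the number of blocks, or raise if block_size does not divide ncol*nexp."""
    total = ncol * nexp
    if block_size <= 0 or total % block_size != 0:
        raise ValueError(
            f"block_size={block_size} does not evenly divide ncol*nexp={total}")
    return total // block_size


# Hypothetical usage with ncol=42 and nexp=322 (Garand dataset)
print(count_blocks(ncol=42, nexp=322, block_size=3))  # 4508
```

The result matches the "Doing 4508 blocks of size 3" message printed by the executable for this file.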

In [72]:
%env BLOCK_SIZE = 3

env: BLOCK_SIZE=3


## Training data generation

Using Fortran executables

In [77]:
%%bash
# Arguments: block size, RFMIP input file, k-distribution file, output file
# (comments cannot follow a line-continuation backslash, so they are listed here)
./rrtmgp_lw_gendata_rfmipstyle \
    $BLOCK_SIZE \
    inputs_to_RRTMGP/inputs_Garand_BIG.nc \
    ../../rrtmgp/data/rrtmgp-data-lw-g128-210809.nc \
    data/ml_training_lw_g128_Garand_BIG.nc

 Usage: rrtmgp_rfmip_lw [block_size] [rfmip_file] [k-distribution_file] input_output file]
 input fileinputs_to_RRTMGP/inputs_Garand_BIG.nc                                                                                               
 ncol:          42 nexp:         322 nlay:          42
 Doing         4508 blocks of size            3
 Calculation uses gases: water_vapor ozone carbon_dioxide methane nitrous_oxide oxygen nitrogen cfc11 cfc12 carbon_monoxide carbon_tetrachloride hcfc22 hfc143a hfc125 hfc23 hfc32 hfc134a cf4 
 min of play   19.2751713     k_dist%get_press_min()   1.00518358    
 -------------------------------------------------------------------------
 starting clear-sky longwave computations
 Finished with computations!
 mean of flux_down is:   84.6862488    
 mean of flux_up is:   277.106110    
 -------------------------------------------------------------------------
 Attempting to save RRTMGP input/output to data/ml_training_lw_g128_Garand_BIG.nc                    

In [78]:
%%bash
./rrtmgp_lw_gendata_rfmipstyle \
    $BLOCK_SIZE \
    inputs_to_RRTMGP/inputs_AMON_ssp245_ssp585_2054_2100.nc \
    ../../rrtmgp/data/rrtmgp-data-lw-g128-210809.nc \
    data/ml_training_lw_g128_AMON_ssp245_ssp585_2054_2100.nc

 Usage: rrtmgp_rfmip_lw [block_size] [rfmip_file] [k-distribution_file] input_output file]
 input fileinputs_to_RRTMGP/inputs_AMON_ssp245_ssp585_2054_2100.nc                                                                             
 ncol:         420 nexp:         200 nlay:          19
 Doing        28000 blocks of size            3
 Calculation uses gases: water_vapor ozone carbon_dioxide methane nitrous_oxide oxygen nitrogen cfc11 cfc12 carbon_monoxide carbon_tetrachloride hcfc22 hfc143a hfc125 hfc23 hfc32 hfc134a cf4 
 min of play   100.000000     k_dist%get_press_min()   1.00518358    
 -------------------------------------------------------------------------
 starting clear-sky longwave computations
 Finished with computations!
 mean of flux_down is:   83.7984009    
 mean of flux_up is:   279.662506    
 -------------------------------------------------------------------------
 Attempting to save RRTMGP input/output to data/ml_training_lw_g128_AMON_ssp245_ssp585_2054_2100.nc  

In [79]:
%%bash
./rrtmgp_lw_gendata_rfmipstyle \
    $BLOCK_SIZE \
    inputs_to_RRTMGP/inputs_CAMS_new_CKDMIPstyle.nc \
    ../../rrtmgp/data/rrtmgp-data-lw-g128-210809.nc \
    data/ml_training_lw_g128_CAMS_new_CKDMIPstyle.nc

 Usage: rrtmgp_rfmip_lw [block_size] [rfmip_file] [k-distribution_file] input_output file]
 input fileinputs_to_RRTMGP/inputs_CAMS_new_CKDMIPstyle.nc                                                                                     
 ncol:        1000 nexp:          42 nlay:          60
 Doing        14000 blocks of size            3
 Calculation uses gases: water_vapor ozone carbon_dioxide methane nitrous_oxide oxygen nitrogen cfc11 cfc12 carbon_monoxide carbon_tetrachloride hcfc22 hfc143a hfc125 hfc23 hfc32 hfc134a cf4 
 min of play   10.0000000     k_dist%get_press_min()   1.00518358    
 -------------------------------------------------------------------------
 starting clear-sky longwave computations
 Finished with computations!
 mean of flux_down is:   94.9205475    
 mean of flux_up is:   257.382263    
 -------------------------------------------------------------------------
 Attempting to save RRTMGP input/output to data/ml_training_lw_g128_CAMS_new_CKDMIPstyle.nc          

In [80]:
%%bash
./rrtmgp_lw_gendata_rfmipstyle \
    $BLOCK_SIZE \
    inputs_to_RRTMGP/inputs_CKDMIP-MM-Big.nc \
    ../../rrtmgp/data/rrtmgp-data-lw-g128-210809.nc \
    data/ml_training_lw_g128_CKDMIP-MMM-Big.nc

 Usage: rrtmgp_rfmip_lw [block_size] [rfmip_file] [k-distribution_file] input_output file]
 input fileinputs_to_RRTMGP/inputs_CKDMIP-MM-Big.nc                                                                                            
 ncol:         243 nexp:          58 nlay:          52
 Doing         4698 blocks of size            3
 Calculation uses gases: water_vapor ozone carbon_dioxide methane nitrous_oxide oxygen nitrogen cfc11 cfc12 carbon_monoxide carbon_tetrachloride hcfc22 hfc143a hfc125 hfc23 hfc32 hfc134a cf4 
 min of play  0.504999995     k_dist%get_press_min()   1.00518358    
 -------------------------------------------------------------------------
 starting clear-sky longwave computations
 Finished with computations!
 mean of flux_down is:   39.1128159    
 mean of flux_up is:   281.537994    
 -------------------------------------------------------------------------
 Attempting to save RRTMGP input/output to data/ml_training_lw_g128_CKDMIP-MMM-Big.nc                

These generated files will be used for training the NN:

- ml_training_lw_g128_Garand_BIG.nc
- ml_training_lw_g128_AMON_ssp245_ssp585_2054_2100.nc
- ml_training_lw_g128_CAMS_new_CKDMIPstyle.nc
- ml_training_lw_g128_CKDMIP-MMM-Big.nc

---

## Part 2 - Training the neural network

Based on: <https://github.com/peterukk/rte-rrtmgp-nn/blob/main/examples/rrtmgp-nn-training/ml_train.py> (main branch)

- See `ml_train.py` for more info.
- There is python code for training in both the `main` and `nn_dev` branches of peterukk's git repository.
- In the [UKK22b] paper, the `nn_dev` branch is used.

This program takes existing input-output data generated with RRTMGP and user-specified hyperparameters such as the number of neurons, scales the data, and trains a neural network.

Currently supported predictands are g-point vectors containing: 
- LW
    - Planck fraction           (predictand = 'lw_planck_frac')
    - Absorption cross-section  (predictand = 'lw_absorption')
    - both                      (predictand = 'lw_both') (old)
- SW
    - Absorption cross-section  (predictand = 'sw_absorption')
    - Rayleigh cross-section    (predictand = 'sw_rayleigh')

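**g-point**: In a k-distribution model such as RRTMGP, the absorption coefficients within each spectral band are reordered by strength, and the resulting smooth function of the cumulative probability *g* is integrated using a small number of quadrature points. Each of these quadrature points is a g-point. A "g-point vector" therefore holds one value (e.g. optical depth, cross-section, or Planck fraction) per g-point for a given atmospheric layer. The `g128` longwave k-distribution used here has 128 g-points.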
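
A toy illustration of g-points in the k-distribution sense may help. The sketch below is not RRTMGP's algorithm: it assumes a synthetic absorption spectrum `k`, equal-weight g-points, and a single homogeneous layer with absorber amount `u` (all illustrative choices). It sorts the absorption coefficients by strength, represents each group by its mean value, and compares the resulting band-mean transmission with the value computed from every spectral sample.

```python
import numpy as np


def transmission_exact(k, u):
    """Band-mean transmission computed from every spectral sample of k."""
    return float(np.mean(np.exp(-k * u)))


def transmission_gpoints(k, u, n_gpt):
    """Band-mean transmission computed from n_gpt equal-weight g-points.

    The samples are sorted by absorption strength, split into n_gpt groups of
    (nearly) equal size, and each group is represented by its mean k.
    """
    groups = np.array_split(np.sort(k), n_gpt)
    weights = np.array([g.size for g in groups]) / k.size
    k_gpt = np.array([g.mean() for g in groups])
    return float(np.sum(weights * np.exp(-k_gpt * u)))


rng = np.random.default_rng(0)
k = rng.lognormal(mean=0.0, sigma=2.0, size=2000)  # synthetic absorption spectrum
u = 0.5                                            # absorber amount (arbitrary units)

print(transmission_exact(k, u), transmission_gpoints(k, u, n_gpt=16))
```

With as many g-points as spectral samples the two values coincide; with fewer g-points the approximation trades accuracy for a much cheaper calculation, which is the motivation for k-distribution schemes.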

In [81]:
import os, sys
from sys import getsizeof as sizeof
import numpy as np
import tensorflow as tf
from tensorflow.keras import losses, optimizers
from tensorflow.keras.utils import Sequence

Check GPU availability:

In [82]:
print(tf.config.list_physical_devices("GPU"))

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


Load routines contained in code files in the working directory:

In [83]:
from ml_load_save_preproc import (save_model_netcdf, load_rrtmgp,
                                  scale_outputs_wrapper,
                                  preproc_pow_standardization_reverse,
                                  preproc_tau_to_crossection,
                                  preproc_minmax_inputs_rrtmgp)

In [84]:
from ml_scaling_coefficients import xcoeffs_all, input_names_all

In [85]:
from ml_trainfuncs_keras import (create_model_mlp, expdiff, hybrid_loss_wrapper)

## Configure predictand, NN complexity, etc

Choose one of the following predictands (target outputs): 'lw_absorption', 'lw_planck_frac', 'lw_both' (old), 'sw_absorption', 'sw_rayleigh'

In [86]:
# predictand = 'sw_absorption'
# predictand = 'sw_rayleigh'
# predictand = 'lw_absorption'
# predictand = 'lw_planck_frac'
predictand = 'lw_both' # old

In [87]:
scaling_method = "Ukkonen2020"  # only option currently

For `use_existing_input_scaling_coefficients`, `True` is generally a safe choice: the min-max coefficients have been computed using a large dataset spanning both LGM (Last Glacial Maximum) and high future emissions scenarios. However, check that your scaled inputs fall roughly within the 0-1 range. Negative values in particular might cause problems:

In [88]:
use_existing_input_scaling_coefficients = True
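
One way to perform the range check mentioned above is sketched here. `report_out_of_range` is a hypothetical helper, and the input names in the toy example are placeholders rather than the names stored in the training files.

```python
import numpy as np


def report_out_of_range(x_scaled, names, lo=0.0, hi=1.0):
    """Return {input name: (min, max)} for inputs whose scaled values leave [lo, hi]."""
    x_scaled = np.asarray(x_scaled)
    col_min = x_scaled.min(axis=0)
    col_max = x_scaled.max(axis=0)
    return {
        name: (float(mn), float(mx))
        for name, mn, mx in zip(names, col_min, col_max)
        if mn < lo or mx > hi
    }


# Toy example with three inputs; the second one has a negative scaled value
x_demo = np.array([[0.1, -0.2, 0.9],
                   [0.5,  0.3, 1.0]])
print(report_out_of_range(x_demo, ["tlay", "play", "h2o"]))
```

In this notebook the same function could be called as `report_out_of_range(x_tr, input_names)` once the scaled inputs `x_tr` have been computed; an empty dictionary would mean that all inputs stay within the expected range.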

## Loss function, metrics

In [89]:
patience = 70
epochs = 200    # without early_stop_on_rfmip_fluxes

In [90]:
lossfunc = losses.MeanSquaredError()  # instantiate the loss object
mymetrics = ["mean_absolute_error"]
expfirst = False

In [91]:
lr = 0.001    # learning rate
batch_size = 2048

## NN hyperparameters

Number of neurons in each hidden layer:

In [92]:
if predictand == 'lw_absorption':
    # neurons     = [80,80]
    # neurons     = [72,72]
    neurons     = [64,64]
    # neurons     = [58,58]
elif predictand == 'lw_planck_frac':
    neurons     = [24,24]
elif predictand == 'lw_both':
    # neurons     = [80,80]
    neurons     = [72,72]
    # neurons     = [64,64]
    # neurons     = [56,56]
else:
    # neurons     = [16,16] 
    # neurons     = [24,24] 
    neurons     = [32,32] 
    # 16 in two hidden layers seems enough for all but the LW absorption model

Activation functions, one per layer: first the hidden layers, then the output layer:

In [93]:
activ = ["softsign", "softsign", "linear"]
# activ = ['relu', 'relu','linear']

In [94]:
if np.size(activ) != np.size(neurons) + 1:
    print("Number of activations must be number of hidden layers + 1!")
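
To make the relationship between `neurons` and `activ` concrete, here is a NumPy-only sketch of a forward pass through a multilayer perceptron. It is not the `create_model_mlp` implementation from the repository: the random weights, the helper names, and the sizes `nx=18` and `ny=256` are illustrative assumptions.

```python
import numpy as np


def softsign(z):
    return z / (1.0 + np.abs(z))


ACTIVATIONS = {"softsign": softsign, "linear": lambda z: z}


def init_mlp(nx, ny, neurons, rng):
    """Random (weights, bias) pairs for layer sizes nx -> neurons... -> ny."""
    sizes = [nx, *neurons, ny]
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(x, params, activ):
    """Apply each dense layer followed by its activation function."""
    if len(activ) != len(params):
        raise ValueError("need one activation per hidden layer plus one for the output")
    for (w, b), name in zip(params, activ):
        x = ACTIVATIONS[name](x @ w + b)
    return x


rng = np.random.default_rng(0)
params = init_mlp(nx=18, ny=256, neurons=[72, 72], rng=rng)
y = forward(rng.random((4, 18)), params, ["softsign", "softsign", "linear"])
print(y.shape)  # (4, 256)
```

Two hidden layers give three weight matrices, which is why three activation names are required: two for the hidden layers and one (linear) for the output layer.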

Weight initializer: the default (Glorot uniform) is probably an OK choice:

In [95]:
initializer = "glorot_uniform"

## Routine for concatenating existing datasets containing raw inputs and outputs

In [96]:
def add_dataset(fpath, predictand, expfirst, x, y, col_dry, input_names, kdist,
                data_str):
    x_new, y_new, col_dry_new, input_names_new, kdist_new = load_rrtmgp(
        fpath, predictand, expfirst=expfirst)
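    # Raise instead of returning None: the caller unpacks four return values,
    # so a None result would only fail later with a less helpful error.
    if not (kdist == kdist_new):
        raise ValueError("Kdist does not match previous dataset!")
    if not (input_names == input_names_new):
        raise ValueError("Input_names does not match previous dataset!")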
    ns = x.shape[0]
    x = np.concatenate((x, x_new), axis=0)
    y = np.concatenate((y, y_new), axis=0)
    col_dry = np.concatenate((col_dry, col_dry_new), axis=0)
    print("{:.2e} samples previously, {:.2e} after adding data from: {}".format(
        ns, x.shape[0],
        fpath.split("/")[-1]))
    data_str = data_str + " , " + fpath.split("/")[-1]
    return x, y, col_dry, data_str

## Provide data containing inputs and outputs

- Profiles used:
    - Expanded Garand
    - GCM data (AMON_...)
    - CAMS data
    - Extended CKDMIP-Average-Maximum-Minimum profiles
- RFMIP is used for validation.

The full dataset consumes a lot of RAM/VRAM.

In [97]:
DIRDATANN = DIRDATA + "7413952-ukk23-code-data/examples/rrtmgp-nn-training/data/"
fpath  = DIRDATANN + "ml_training_lw_g128_Garand_BIG.nc"  # 0.6 GB
fpath2 = DIRDATANN + "ml_training_lw_g128_AMON_ssp245_ssp585_2054_2100.nc"  # 1.7 GB
fpath3 = DIRDATANN + "ml_training_lw_g128_CAMS_new_CKDMIPstyle.nc"  # 2.6 GB
fpath4 = DIRDATANN + "ml_training_lw_g128_CKDMIP-MMM-Big.nc"  # 0.8 GB

Let's use the (expanded!) Garand profiles, GCM data (AMON_...), CAMS data, and extended CKDMIP-Mean-Maximum-Minimum profiles

In [98]:
fpaths = [fpath, fpath2, fpath3, fpath4]

In [None]:
if (predictand=='sw_absorption' or predictand=='sw_rayleigh'):
    fpaths = [sub.replace('lw_g128', 'sw_g112') for sub in fpaths]

## Load training data 

In [99]:
x_tr_raw, y_tr_raw, col_dry_tr, input_names, kdist = load_rrtmgp(
    fpaths[0], predictand, expfirst=expfirst)

input_names found in file
there are 13524 profiles in this dataset (322 experiments, 42 columns)


In [100]:
data_str = fpath.split("/")[-1]

The full training dataset is split into multiple files:

In [101]:
%%time
# We can have different datasets that we merge
for fpath in fpaths[1:]:
    x_tr_raw, y_tr_raw, col_dry_tr, data_str = add_dataset(
        fpath,
        predictand,
        expfirst,
        x_tr_raw,
        y_tr_raw,
        col_dry_tr,
        input_names,
        kdist,
        data_str,
    )

input_names found in file
there are 84000 profiles in this dataset (200 experiments, 420 columns)
5.68e+05 samples previously, 2.16e+06 after adding data from: ml_training_lw_g128_AMON_ssp245_ssp585_2054_2100.nc
input_names found in file
there are 42000 profiles in this dataset (42 experiments, 1000 columns)
2.16e+06 samples previously, 4.68e+06 after adding data from: ml_training_lw_g128_CAMS_new_CKDMIPstyle.nc
input_names found in file
there are 14094 profiles in this dataset (58 experiments, 243 columns)
4.68e+06 samples previously, 5.42e+06 after adding data from: ml_training_lw_g128_CKDMIP-MMM-Big.nc
CPU times: user 3.43 s, sys: 4.62 s, total: 8.06 s
Wall time: 8.04 s


In [102]:
nx = x_tr_raw.shape[1]  # temperature + pressure + gases
ny = y_tr_raw.shape[1]  # number of g-points

## Input and output scaling

Depending on the dataset size, this cell can consume a lot of time and memory:

In [104]:
%%time
if scaling_method != "Ukkonen2020":
    print("Only one type of pre-processing currently supported!")
else:
    # Input scaling - min-max
    if use_existing_input_scaling_coefficients:
        if xcoeffs_all is None:
            sys.exit("Input scaling coefficients (xcoeffs) missing!")
        (xmin_all, xmax_all) = xcoeffs_all
        # input_names loaded from file, describes inputs in order of x_tr_raw
        # input_names_all corresponds to xmin_all and xmax_all
        # Order of inputs may be different than in the existing coefficients,
        # account for that by indexing
        a = np.array(input_names_all)
        b = np.array(input_names)
        indices = np.where(b[:, None] == a[None, :])[1]
        xmin = xmin_all[indices]
        xmax = xmax_all[indices]
        x_tr = preproc_minmax_inputs_rrtmgp(x_tr_raw, (xmin, xmax))
    else:
        x_tr, xmin, xmax = preproc_minmax_inputs_rrtmgp(x_tr_raw)

    # Output scaling
    # first, do y = y / N if y is optical depth, to get cross-sections
    # then, square root scaling y: y=y**(1/nfac); cheaper and weaker version of
    # log scaling. nfac = 8 for cross-sections, 2 for Planck fraction
    # After this, use standard-scaling (not for Planck fraction)

    y_tr, ymean, ystd = scale_outputs_wrapper(y_tr_raw, col_dry_tr, predictand)

CPU times: user 2min 2s, sys: 14.3 s, total: 2min 17s
Wall time: 1min 9s
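
The output scaling described in the comments of the cell above can be sketched in NumPy as follows. This is not the repository's `scale_outputs_wrapper`: the function names are hypothetical, the mean and standard deviation are assumed to be computed per g-point, and the data are random numbers used only to show that the transformation is invertible.

```python
import numpy as np


def scale_tau(tau, col_dry, nfac=8):
    """Optical depth (nsamples, ngpt) -> standardized nfac-th root of the cross-section."""
    y = (tau / col_dry[:, None]) ** (1.0 / nfac)  # cross-section, then power scaling
    ymean = y.mean(axis=0)                        # assumed: statistics per g-point
    ystd = y.std(axis=0)
    return (y - ymean) / ystd, ymean, ystd


def unscale_tau(y_scaled, col_dry, ymean, ystd, nfac=8):
    """Invert scale_tau to recover the optical depth."""
    y = (y_scaled * ystd + ymean) ** nfac
    return y * col_dry[:, None]


rng = np.random.default_rng(0)
tau = rng.lognormal(size=(1000, 4))            # synthetic optical depths
col_dry = rng.uniform(1e22, 1e24, size=1000)   # synthetic dry-air molecule counts
y, ymean, ystd = scale_tau(tau, col_dry)
tau_back = unscale_tau(y, col_dry, ymean, ystd)
print(np.allclose(tau, tau_back))
```

The inverse function applies the same steps in reverse order, which is what a model user has to do with the raw NN output to recover optical depth.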


## I/O

RRTMGP-NN models are saved as NetCDF files, which contain metadata describing how to obtain the physical outputs as well as which training data were used.

In [105]:
x_scaling_str = (
    "To get the required NN inputs, do the following: "
    "x(i) = log(x(i)) for i=pressure; "
    "x(i) = x(i)**(1/4) for i=H2O and O3; "
    "x(i) = (x(i) - xmin(i)) / (xmax(i) - xmin(i)) for all inputs"
)
if predictand == 'lw_planck_frac':
    y_scaling_str = "Model predicts the square root of Planck fraction."        
else:
    y_scaling_str = "Model predicts scaled cross-sections. Given the raw NN output y,"\
            " do the following to obtain optical depth: "\
            "y(igpt,j) = ystd(igpt)*y(igpt,j) + ymean(igpt); y(igpt,j) "\
            "= y(igpt,j)**8; y(igpt,j) = y(igpt,j) * layer_dry_air_molecules(j)"
        
# data_str = "Extensive training data set comprising of reanalysis, climate model,"\
#     " and idealized profiles, which has then been augmented using statistical"\
#     " methods (Hypercube sampling). See https://doi.org/10.1029/2020MS002226"

if (predictand == 'sw_absorption'):
    model_str = "Shortwave model predicting ABSORPTION CROSS-SECTION"
elif (predictand == 'sw_rayleigh'):
    model_str = "Shortwave model predicting RAYLEIGH CROSS-SECTION"
elif (predictand == 'lw_absorption'):
    model_str = "Longwave model predicting ABSORPTION CROSS-SECTION"
elif (predictand == 'lw_planck_frac'):
    model_str = "Longwave model predicting PLANCK FRACTION"
else: 
    model_str = ""
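
The input recipe stored in `x_scaling_str` can be illustrated with a small NumPy sketch. The function names and the input names (`tlay`, `play`, `h2o`, `o3`) are assumptions made for this example, and the min-max coefficients are derived from the toy data rather than taken from `ml_scaling_coefficients`.

```python
import numpy as np


def transform_inputs(x, names, log_names=("play",), root4_names=("h2o", "o3")):
    """Apply log to pressure and a fourth root to H2O and O3; leave the rest unchanged."""
    x = np.array(x, dtype=float)  # work on a copy
    for j, name in enumerate(names):
        if name in log_names:
            x[:, j] = np.log(x[:, j])
        elif name in root4_names:
            x[:, j] = x[:, j] ** 0.25
    return x


def minmax_scale(x, xmin, xmax):
    """Scale each input column to the 0-1 range defined by xmin and xmax."""
    return (x - xmin) / (xmax - xmin)


names = ["tlay", "play", "h2o"]
x_raw = np.array([[200.0, 1.0e3, 1.0e-6],
                  [300.0, 1.0e5, 1.0e-2]])
x_t = transform_inputs(x_raw, names)
xmin, xmax = x_t.min(axis=0), x_t.max(axis=0)  # toy coefficients from the data itself
x_scaled = minmax_scale(x_t, xmin, xmax)
print(x_scaled)
```

Because the coefficients here come from the two toy rows, the scaled values are exactly 0 and 1; with precomputed coefficients the values only fall inside that range if the data lie within the span of the dataset used to derive them.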

If needed, try to reduce memory consumption:

In [None]:
# import gc
# gc.collect()

## TensorFlow Training

Create and compile model

In [108]:
devstr = "/gpu:0"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

optim = optimizers.Adam(learning_rate=lr)

model = create_model_mlp(nx=nx,
                         ny=ny,
                         neurons=neurons,
                         activ=activ,
                         kernel_init=initializer)

model.compile(loss=lossfunc, optimizer=optim, metrics=mymetrics)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


In [109]:
print(f"{sizeof(x_tr)/1024**3:.1f} GB")
print(f"{sizeof(y_tr)/1024**3:.1f} GB")

0.4 GB
5.2 GB


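Without a `DataGenerator`, training cannot run on this GPU, because the full arrays do not fit when copied from CPU to GPU memory at once (see the reference below). Feeding the data batch by batch avoids this: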

Ref.: https://stackoverflow.com/questions/62916904/failed-copying-input-tensor-from-cpu-to-gpu-in-order-to-run-gatherve-dst-tensor

In [110]:
class DataGenerator(Sequence):

    def __init__(self, x_set, y_set, batch_size):
        super().__init__()  # expected by Keras 3; avoids the "super not called" warning
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        return batch_x, batch_y


train_gen = DataGenerator(x_tr, y_tr, batch_size)
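
The number of batches returned by `__len__` can be cross-checked by hand. The sketch below assumes that each atmospheric layer of each profile is one training sample, which is consistent with the sample counts printed while loading the data; the profile and layer counts are copied from the program outputs earlier in this notebook.

```python
import math

# Samples per dataset = number of profiles x number of layers
samples = {
    "Garand_BIG": 13524 * 42,
    "AMON": 84000 * 19,
    "CAMS": 42000 * 60,
    "CKDMIP-MMM-Big": 14094 * 52,
}
n_samples = sum(samples.values())
batch_size = 2048

steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch = n_samples - (steps_per_epoch - 1) * batch_size
print(n_samples, steps_per_epoch, last_batch)  # 5416896 2645 1984
```

The 2645 steps agree with the step count shown in the Keras progress bar during training; the last batch of each epoch is smaller than `batch_size`.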

## NN training

Time-consuming part:

In [111]:
%%time
with tf.device(devstr):
    history = model.fit(train_gen, epochs=epochs, shuffle=True)  # see DataGenerator above
    history = history.history

Epoch 1/200


  self._warn_if_super_not_called()
I0000 00:00:1728494299.212181   26712 service.cc:146] XLA service 0x729768005160 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1728494299.212217   26712 service.cc:154]   StreamExecutor device (0): NVIDIA GeForce RTX 3050, Compute Capability 8.6


  40/2645 ━━━━━━━━━━━━━━━━━━━━ 10s 4ms/step - loss: 0.5129 - mean_absolute_error: 0.5263

I0000 00:00:1728494302.204501   26712 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


2645/2645 ━━━━━━━━━━━━━━━━━━━━ 15s 4ms/step - loss: 0.1044 - mean_absolute_error: 0.1613
Epoch 2/200
2645/2645 ━━━━━━━━━━━━━━━━━━━━ 9s 3ms/step - loss: 0.0126 - mean_absolute_error: 0.0531
Epoch 3/200
2645/2645 ━━━━━━━━━━━━━━━━━━━━ 9s 3ms/step - loss: 0.0082 - mean_absolute_error: 0.0411
Epoch 4/200
2645/2645 ━━━━━━━━━━━━━━━━━━━━ 8s 3ms/step - loss: 0.0062 - mean_absolute_error: 0.0349
Epoch 5/200
2645/2645 ━━━━━━━━━━━━━━━━━━━━ 8s 3ms/step - loss: 0.0048 - mean_absolute_error: 0.0301
Epoch 6/200
2645/2645 ━━━━━━━━━━━━━━━━━━━━ 9s 3ms/step - loss: 0.0040 - mean_absolute_error: 0.0264
Epoch 7/200
2645/2645 ━━━━━━━━━━━━━━━━━━━━ 9s 3ms/step - loss: 0.0038 - mean_absolute_error: 0.0266
Epoch 8/200
2645/2645 ━━━━━━━━━━━━━━━━━━━━

## Parameters

In [112]:
model.summary()

## Save model

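- Save the model to the NN model directory `../../neural/data` after training.
- The file name combines the k-distribution name, the predictand, the layer sizes, and a free-form tag (`comm`). Use a new tag for each experiment so that existing models are not overwritten.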

### Get a descriptive filename for the model

In [119]:
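comm = "test01"
neurons_str = (np.array2string(np.array(neurons)).strip("[]").replace(" ", "_"))
# Drop the "rrtmgp-data-" prefix (12 characters) and the ".nc" suffix.
# str.strip(".nc") would remove any of the characters '.', 'n', 'c' from both
# ends, so removesuffix (Python 3.9+) is the safer choice.
source = kdist[12:].removesuffix(".nc")
fpath_keras = ("../../neural/data/" + source + "_" + predictand[3:] + "_" +
               neurons_str + "_" + comm + ".keras")
fpath_netcdf = os.path.splitext(fpath_keras)[0] + ".nc"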

In [121]:
fpath_netcdf

'../../neural/data/lw-g128-210809_both_72_72_test01.nc'

Full path:

/home/x/git/radnn/ukk23test01/neural/data/lw-g128-210809_both_72_72_test01.nc

### Saving model in netCDF format

> The `save_format` argument is deprecated in Keras 3.

In [2]:
# model.save(fpath_keras, save_format="h5")    # deprecated

In [124]:
save_model_netcdf(fpath_netcdf,
                  model,
                  activ,
                  input_names,
                  kdist,
                  xmin,
                  xmax,
                  ymean,
                  ystd,
                  y_scaling_comment=y_scaling_str,
                  x_scaling_comment=x_scaling_str,
                  data_comment=data_str,
                  model_comment=model_str)

## Show dataset variables

In [5]:
PATH = "/home/x/git/radnn/ukk23test01/neural/data/"
FILE = "lw-g128-210809_both_72_72_test01.nc"
print(list(nc.Dataset(PATH+FILE).variables))

['nn_dimsize', 'nn_activation', 'nn_inputs', 'nn_input_coeffs_max', 'nn_input_coeffs_min', 'nn_activation_char', 'nn_inputs_char', 'nn_weights_1', 'nn_bias_1', 'nn_weights_2', 'nn_bias_2', 'nn_weights_3', 'nn_bias_3', 'nn_output_coeffs_mean', 'nn_output_coeffs_std']


In [8]:
FILE = "lw-g128-210809_both_56_56_HR_1.11e+00_FRC_7.57e-01.nc"
print(list(nc.Dataset(PATH+FILE).variables))

['nn_dimsize', 'nn_activation', 'nn_inputs', 'nn_input_coeffs_max', 'nn_input_coeffs_min', 'nn_activation_char', 'nn_inputs_char', 'nn_weights_1', 'nn_bias_1', 'nn_weights_2', 'nn_bias_2', 'nn_weights_3', 'nn_bias_3', 'nn_output_coeffs_mean', 'nn_output_coeffs_std']


In [6]:
FILE = "lw-g256-2018-12-04_absorption_58_58.nc"
print(list(nc.Dataset(PATH+FILE).variables))

['nn_dimsize', 'nn_activation', 'nn_inputs', 'nn_input_coeffs_max', 'nn_input_coeffs_min', 'nn_activation_char', 'nn_inputs_char', 'nn_weights_1', 'nn_bias_1', 'nn_weights_2', 'nn_bias_2', 'nn_weights_3', 'nn_bias_3', 'nn_output_coeffs_mean', 'nn_output_coeffs_std']


In [7]:
FILE = "lw-g256-2018-12-04_planck_frac_16_16.nc"
print(list(nc.Dataset(PATH+FILE).variables))

['nn_dimsize', 'nn_activation', 'nn_inputs', 'nn_input_coeffs_max', 'nn_input_coeffs_min', 'nn_activation_char', 'nn_inputs_char', 'nn_weights_1', 'nn_bias_1', 'nn_weights_2', 'nn_bias_2', 'nn_weights_3', 'nn_bias_3']


In [9]:
FILE = "lw-g128-210809_planck_frac_24_24_HR_1.15e+00_FRC_7.06e-01.nc"
print(list(nc.Dataset(PATH+FILE).variables))

['nn_dimsize', 'nn_activation', 'nn_inputs', 'nn_input_coeffs_max', 'nn_input_coeffs_min', 'nn_activation_char', 'nn_inputs_char', 'nn_weights_1', 'nn_bias_1', 'nn_weights_2', 'nn_bias_2', 'nn_weights_3', 'nn_bias_3']


## References

- [UKK24] Ukkonen, P., & Hogan, R. J. (2024). Twelve Times Faster yet Accurate: A New State-Of-The-Art in Radiation Schemes via Performance and Spectral Optimization. Journal of Advances in Modeling Earth Systems, 16(1), e2023MS003932. https://doi.org/10.1029/2023MS003932
- [UKK23b] Ukkonen, P., & Hogan, R. J. (2023). Implementation of a machine-learned gas optics parameterization in the ECMWF Integrated Forecasting System: RRTMGP-NN 2.0. Geoscientific Model Development, 16(11), 3241–3261. https://doi.org/10.5194/gmd-16-3241-2023
- [UKK23a] Ukkonen, P., & Hogan, R. J. (2023). Fast computation of cloud 3D radiative effects in dynamical models by optimizing the ecRad scheme [Preprint]. Preprints. https://doi.org/10.22541/essoar.168298700.07329865/v1
- [UKK22a] Ukkonen, P. (2022). Improving the trade-off between accuracy and efficiency of atmospheric radiative transfer computations by using machine learning and code optimization. http://dx.doi.org/10.13140/RG.2.2.27880.03846
- [UKK22b] Ukkonen, P. (2022). Exploring Pathways to More Accurate Machine Learning Emulation of Atmospheric Radiative Transfer. Journal of Advances in Modeling Earth Systems, 14(4), e2021MS002875. https://doi.org/10.1029/2021MS002875
- [UKK20] Ukkonen, P., Pincus, R., Hogan, R. J., Pagh Nielsen, K., & Kaas, E. (2020). Accelerating Radiation Computations for Dynamical Models With Targeted Machine Learning and Code Optimization. Journal of Advances in Modeling Earth Systems, 12(12), e2020MS002226. https://doi.org/10.1029/2020MS002226
- [YAO23] Yao, Y., Zhong, X., Zheng, Y., & Wang, Z. (2023). A Physics-Incorporated Deep Learning Framework for Parameterization of Atmospheric Radiative Transfer. Journal of Advances in Modeling Earth Systems, 15(5), e2022MS003445. https://doi.org/10.1029/2022MS003445

## Save environment

Save the environment for documentation purposes.

In [4]:
os.chdir(DIRBASE)

In [5]:
%%bash
source ${HOME}/conda/bin/activate tf2
conda export --file aux/ukk23test01-train-v2.yml

Data

In [6]:
%%bash
ls -1 ukk23test01/neural/data/ > aux/neur_data.txt
ls -1 ukk23test01/rrtmgp/data/ > aux/rrtm_data.txt
ls -1 ukk23test01/examples/rfmip-clear-sky/data/ > aux/exam_rfmi_data.txt
ls -1 ukk23test01/examples/rfmip-clear-sky/output_fluxes/ > aux/exam_rfmi_flux.txt
ls -1 ukk23test01/examples/rrtmgp-nn-training/data/ > aux/exam_rrtm_data.txt
ls -1 ukk23test01/examples/rrtmgp-nn-training/inputs_to_RRTMGP/ > aux/exam_rrtm_rrtm.txt