### Introduction

The goal of this notebook is to give you a quick sense of how graph neural networks (GNNs) can be used for reactivity prediction. A GNN operates directly on reaction SMILES strings and thus requires no (accurate) geometries as input. This has both upsides and downsides. On one hand, representing interacting molecules as graphs introduces an inherent loss of accuracy, as all stereochemical subtleties are lost. On the other hand, skipping conformer generation and similar preprocessing steps makes it possible to screen millions of candidate reactions at marginal computational cost.
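
To make the input format concrete, here is a minimal sketch of how a reaction SMILES string breaks down into its molecules, using only the standard SMILES conventions (`>` separates reactants, agents and products; `.` separates molecules). The helper name `split_reaction_smiles` and the example reaction (ethylene plus a simple nitrone) are hypothetical illustrations, not taken from the dataset or from the model's own preprocessing code, which may differ (for instance by using atom-mapped SMILES).

```python
def split_reaction_smiles(rxn_smiles):
    """Split 'reactants>agents>products' into lists of molecule SMILES."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators in a reaction SMILES")
    # Molecules within each part are separated by '.'; empty parts give empty lists.
    reactants, agents, products = (
        [mol for mol in part.split(".") if mol] for part in parts
    )
    return {"reactants": reactants, "agents": agents, "products": products}


# Hypothetical example: ethylene + N-methyl nitrone -> 2-methylisoxazolidine
example = "C=C.C=[N+](C)[O-]>>CN1CCCO1"
parsed = split_reaction_smiles(example)
print(parsed)
```

Each molecule string would then be converted into a graph (atoms as nodes, bonds as edges) by a cheminformatics toolkit before being passed to the network; that step is omitted here.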

We will focus here on a model trained to predict activation and reaction energies of dipolar cycloaddition reactions. An overview of the model architecture is shown below.

Note that a different environment is needed to run this notebook; the corresponding environment.yml file is included in the **multitask_QM_GNN** folder.

<div align="center">
    <img src="data/chem202300387-fig-0003-m.jpg" width="500">
</div>
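
The `reactivity.py` call used in the cells below passes many flags on a single line. As a readability aid, the following sketch assembles the same argument list from a dictionary. The flags and values are copied from that command; the helper name `build_command` is hypothetical, the meaning of each flag is not assumed here, and nothing is executed.

```python
import shlex

# Flags and values copied verbatim from the notebook's reactivity.py call.
ARGS = {
    "--data_path": "datasets/iteration0_data.csv",
    "--atom_desc_path": "descriptors/atom_desc_iteration0_wln.pkl",
    "--reaction_desc_path": "descriptors/reaction_desc_iteration0_wln.pkl",
    "--depth": 2,
    "--ini_lr": 0.00165,
    "--lr_ratio": 0.93,
    "--w_atom": 0.5,
    "--w_reaction": 0.3,
    "--hidden_size_multiplier": 0,
    "--depth_mol_ffn": 1,
    "--random_state": 0,
    "--ensemble_size": 10,
    "--splits": [0, 5, 95],   # flag followed by several values
    "--model_dir": "model_iteration0",
    "-p": True,               # bare switch without a value
}


def build_command(args, script="reactivity.py"):
    """Turn a {flag: value} mapping into an argument list (hypothetical helper)."""
    cmd = ["python", script]
    for flag, value in args.items():
        if isinstance(value, bool):
            # Check bool before anything else: True adds the switch, False skips it.
            if value:
                cmd.append(flag)
        elif isinstance(value, (list, tuple)):
            cmd.append(flag)
            cmd.extend(str(v) for v in value)
        else:
            cmd.extend([flag, str(value)])
    return cmd


command = build_command(ARGS)
print(shlex.join(command))
```

The resulting list could be handed to `subprocess.run(command, check=True)` from inside the multitask_QM_GNN folder as an alternative to the `!python` shell escape.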

In [1]:
import os

os.chdir('multitask_QM_GNN')


In [2]:
!python reactivity.py --data_path datasets/iteration0_data.csv --atom_desc_path descriptors/atom_desc_iteration0_wln.pkl --reaction_desc_path descriptors/reaction_desc_iteration0_wln.pkl --depth 2 --ini_lr 0.00165 --lr_ratio 0.93 --w_atom 0.5 --w_reaction 0.3 --hidden_size_multiplier 0 --depth_mol_ffn 1 --random_state 0 --ensemble_size 10 --splits 0 5 95 --model_dir model_iteration0 -p

2025-06-08 14:39:49,566 - model_iteration0 - INFO - The considered atom-level descriptors are: ['partial_charge', 'fukui_elec', 'fukui_neu', 'nmr']
postprocessing atom-wise scaling
100%|█████████████████████████████████████| 3293/3293 [00:01<00:00, 2423.52it/s]
2025-06-08 14:39:53,722 - model_iteration0 - INFO - The considered reaction descriptors are: ['G', 'G_alt1', 'G_alt2']
2025-06-08 14:39:54.606705: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2025-06-08 14:39:54.608086: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Model: "wln_regressor"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
wln__layer (WLN_Layer)       multiple                  15000     
_________________________________________________________________
dense_1 (Dense)              multiple                  