<a href="https://colab.research.google.com/github/schwallergroup/ai4chem_course/blob/main/notebooks/09%20-%20Reaction%20properties/01_atom_mapping.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 9: Reaction properties

Reaction property prediction is a crucial task, as it enables us to better understand chemical reactions and their outcomes. This not only contributes to the development of new chemical compounds and materials but also helps in streamlining the reaction process and reducing the time and resources required for experimentation.

Two significant aspects of reaction property prediction are `yield prediction` and `atom mapping`. Yield prediction refers to forecasting the amount of product generated by a particular chemical reaction. **Accurate yield prediction can help in optimizing reaction conditions, minimizing waste, and identifying the most efficient synthetic routes for a target molecule.** 

Atom mapping, on the other hand, is the process of determining the correspondence between atoms in the reactants and products of a chemical reaction. This information is essential for understanding the mechanism of the reaction and tracking the transformation of individual atoms during the reaction process. **Atom mapping plays a vital role in various applications, including reaction database management, reaction classification, and the development of reaction templates for computer-aided synthesis planning.**

# Atom mapping

In this notebook, we will explore some applications of atom mapping, as well as one of the tools available for computing it.

In [None]:
! pip install rdkit rdchiral
! mkdir -p data/
! curl -L https://www.dropbox.com/sh/6ideflxcakrak10/AADN-TNZnuGjvwZYiLk7zvwra/schneider50k -o data/uspto50k.zip
! unzip data/uspto50k.zip -d data/

# 0. Relevant packages 

## RXNMapper

RXNMapper is a deep learning tool for computing the atom mapping of a reaction. This open-source tool uses the attention weights of a pretrained transformer model and performs remarkably well compared to other available atom-mapping tools. It is an excellent example of the possible uses of unsupervised learning in chemistry. See more details [here](https://www.science.org/doi/10.1126/sciadv.abe4166).

## RDChiral

RDChiral is a wrapper around RDKit's reaction-handling functionality that improves the treatment of stereochemistry. This package lets us extract `reaction templates` from a reaction dataset, which are a standard way of encoding **transformation rules**.

RDChiral then also lets us apply the `reaction template` to a target molecule, to discover the reactants that will afford the target molecule under the given transformation.

Learn more from [the code](https://github.com/connorcoley/rdchiral) and [the paper](https://pubs.acs.org/doi/10.1021/acs.jcim.9b00286).

# 1. Obtaining the atom mapping

To obtain the atom mapping of a reaction, you can go to [this site](http://rxnmapper.ai/demo.html) and paste your reaction SMILES. The application will then show you the mapped reaction SMILES, as well as some visualization options, including:

- The atom mapping of the reaction: which atoms in the reactants correspond to each atom in the products.

- The attention maps: what the underlying model computes, that is, the connection between each pair of tokens.


![image.png](rxnmapper.png)
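
To make the idea of a mapped reaction SMILES concrete, here is a minimal sketch that reads the map numbers (the `:n` labels inside square brackets) on each side of a reaction, using only the standard library. The esterification example is hand-written for illustration and is not RXNMapper output, the helper name `map_numbers` is hypothetical, and the regular expression is a simplification that only looks for `:n]` patterns.

```python
import re

# Hand-written, illustrative mapped reaction (assumption, not RXNMapper output):
# acetic acid + methanol >> methyl acetate
mapped_rxn = (
    "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]"
    ">>[CH3:1][C:2](=[O:3])[O:6][CH3:5]"
)


def map_numbers(side):
    """Return the set of atom-map numbers found in one side of a reaction SMILES."""
    return {int(n) for n in re.findall(r":(\d+)\]", side)}


reactants, products = mapped_rxn.split(">>")
reactant_maps = map_numbers(reactants)
product_maps = map_numbers(products)

# Map numbers present on both sides identify atoms tracked into the product;
# numbers found only in the reactants belong to atoms that leave in a by-product.
shared = sorted(reactant_maps & product_maps)
lost = sorted(reactant_maps - product_maps)

print("tracked atoms:", shared)
print("atoms not in the product:", lost)
```

In this toy example the oxygen labelled `4` does not appear in the product, which is the kind of information an atom mapping makes explicit.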


## NOTE: This model is also accessible through a programming interface. For this, follow the instructions [here](https://github.com/rxn4chemistry/rxnmapper).
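
As a rough sketch of what the programmatic route can look like, the function below follows the usage shown in the RXNMapper README. It assumes the `rxnmapper` package is installed (for example with `pip install rxnmapper`, which the install cell above does not do). The wrapper name `map_reactions` is hypothetical, and the import is done inside the function so the cell can be defined even when the package is missing.

```python
def map_reactions(rxn_smiles_list):
    """Atom-map a list of unmapped reaction SMILES with RXNMapper.

    Assumption: the `rxnmapper` package is installed. It is imported
    lazily here so that defining this function never fails on import.
    """
    from rxnmapper import RXNMapper

    mapper = RXNMapper()  # loads the pretrained model
    # Each result is a dict containing the mapped reaction and a confidence score.
    results = mapper.get_attention_guided_atom_maps(rxn_smiles_list)
    return [(r["mapped_rxn"], r["confidence"]) for r in results]
```

Calling `map_reactions(["CC(=O)O.CO>>CC(=O)OC"])` should then return one `(mapped SMILES, confidence)` pair per input reaction.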

# TODO:

- [ ] Get a reaction and mapped rxn
- [ ] Get dataset of molecules
- [ ] Obtain possible reactant sets
- [ ] Get all reactant sets that can react like this, visualize

In [20]:
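# Let's take the reactant molecules from the test set of USPTO-50k

import re
from itertools import chain

import pandas as pd
from utils import canonicalize_smiles

# Column 2 of the raw file holds the reaction SMILES
df = pd.read_csv('data/raw_test.csv').iloc[:,2].rename('reactants')


def remove_atom_mapping(smiles):
    # Strip the ":n" map label from bracket atoms, e.g. "[CH3:1]" -> "[CH3]"
    # (wildcard atoms such as "[*:1]" are left untouched)
    unmapped = re.sub(r"(?<=[^\*])(:\d+)]", "]", smiles)
    return canonicalize_smiles(unmapped)

molecs = (
    df
    .apply(lambda x: x.split('>>')[0])   # keep the reactant side only
    .apply(remove_atom_mapping)
    .str.split('.')                      # one entry per molecule
    .values
)
# Flatten the lists and drop duplicates
molecs = set(chain(*molecs))

len(molecs)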

6907

In [22]:
from itertools import product

prod = product(molecs, molecs)

In [24]:
# Let's start iterating
for i,rxn in enumerate(prod):
    print(rxn)
    if i==10:
        break

('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'Nc1c(C(O)(C(F)(F)F)C(F)(F)F)ccc2ccccc12')
('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'CCCN(CCC)CCN')
('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'CCCCCC(CC(=O)Nc1cc(C(N)=O)ccc1C(C)(C)C)c1ccc(OC)cc1OC')
('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'COC1=C(OC)C(=O)C(Cc2ccc(C(=O)O)c(-c3cccnc3)c2)=C(C)C1=O')
('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'N#CC1(N(Cc2ccccc2)Cc2ccccc2)CCOC1')
('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'COC(=O)C/C=C/c1ccc(Nc2ncccn2)cc1')
('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'NCc1cccc(Cl)c1')
('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'COc1ccc(CCl)cc1')
('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'NC(=S)c1cc(O)c2sccc2c1')
('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'COc1ccc(C(=O)O)cc1[N+](=O)[O-]')
('C[Si](C)(C)C#Cc1cc2ncnc(Cl)c2s1', 'Cc1csc(N)n1')
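
A short aside on the size of this search space: `itertools.product` yields ordered pairs lazily, so `(A, B)` and `(B, A)` both appear, as do self-pairs `(A, A)`. The sketch below uses a small stand-in list (`toy_molecs`, a hypothetical name with hand-picked SMILES) to compare it with `itertools.combinations`, and then computes how many pairs each option gives for the 6907 molecules found above. Whether ordered pairs or self-pairs are actually needed depends on how the pairs are used later, so treat this as an illustration rather than a recommendation.

```python
from itertools import combinations, product

# Small stand-in for the `molecs` set (the real one has 6907 molecules).
toy_molecs = ["CCO", "CC(=O)O", "NCc1ccccc1"]

# Ordered pairs: includes (A, B), (B, A) and self-pairs (A, A).
ordered_pairs = list(product(toy_molecs, toy_molecs))
# Unordered pairs: each pair of distinct molecules appears once.
unordered_pairs = list(combinations(toy_molecs, 2))

print(len(ordered_pairs), len(unordered_pairs))

# Number of pairs for the full set, computed without materialising them.
n = 6907
n_ordered = n * n
n_unordered = n * (n - 1) // 2
print(n_ordered, n_unordered)
```

With tens of millions of candidate pairs, it helps to keep the iterator lazy, as the loop above does, instead of converting it to a list.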


In [None]:
from utils import load_data, visualize_chemical_reaction

train_df, val_df, test_df = load_data()

# 2. Reaction templates

Let's take as an example the following coupling reaction.


In [None]:
rxn_example = train_df.iloc[5,0]

visualize_chemical_reaction(rxn_example)

### To extract the reaction template, use the `extract_template` function from utils.py 

A reaction template encodes a general transformation: it specifies which bonds form and break, as well as the chemical environment around those bonds.

In [None]:
from utils import extract_template

tplt_example = extract_template(rxn_example)

# A reaction template looks like this
print(tplt_example)

### Now we can use this reaction template. Use the `apply_template` function from utils.py

If we apply it to the product of the same reaction, we should recover the reactants shown above.

In [None]:
# Apply the extracted template to the product above.
from utils import apply_template, visualize_mols

prod_1 = rxn_example.split('>>')[1]
pred_reactants = apply_template(tplt_example, prod_1)

# This is the result of applying the template.
visualize_mols(pred_reactants[0])
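
To check the claim that the template recovers the original reactants without relying on the picture alone, one option is an order-independent comparison of the two reactant sets. The helper below is a minimal sketch with a hypothetical name (`same_reactant_set`). It assumes both inputs are dot-separated SMILES strings that are already unmapped and canonicalized in the same way, since it compares plain strings and does no chemistry of its own.

```python
def same_reactant_set(smiles_a, smiles_b):
    """True if two dot-separated SMILES strings list the same molecules, in any order.

    Assumption: both strings are unmapped and canonicalized consistently,
    because the comparison is purely textual.
    """
    return sorted(smiles_a.split(".")) == sorted(smiles_b.split("."))


# Toy check with hand-written strings.
same_order_swapped = same_reactant_set("CC(=O)O.CO", "CO.CC(=O)O")
different_molecule = same_reactant_set("CC(=O)O.CO", "CC(=O)O.CCO")
print(same_order_swapped, different_molecule)
```

In this notebook, a call such as `same_reactant_set(remove_atom_mapping(rxn_example.split('>>')[0]), pred_reactants[0])` could serve as the check, assuming `apply_template` returns unmapped canonical SMILES strings, which depends on the implementation in `utils.py`.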