# Reaction SMILES Augmentation Using Permutations Example

#### This example shows how to augment a reaction SMILES dataset with the permutations enumerator from SMILESAugmentation.

In [3]:
import pandas as pd

from smiles_augmentation.reaction_smiles_enumerator import PermutationsReactionSmilesEnumerator

**Load the reaction SMILES data:**

In [2]:
reaction_smiles = pd.read_csv('data/reactions.csv').reaction_SMILES.values
reaction_smiles

array(['CC(C)C[Mg+].CON(C)C(=O)c1ccc(O)nc1>>CC(C)CC(=O)c1ccc(O)nc1',
       'CN.O=C(O)c1ccc(Cl)c([N+](=O)[O-])c1>>CNc1ccc(C(=O)O)cc1[N+](=O)[O-]',
       'CCn1cc(C(=O)O)c(=O)c2cc(F)c(-c3ccc(N)cc3)cc21.O=CO>>CCn1cc(C(=O)O)c(=O)c2cc(F)c(-c3ccc(NC=O)cc3)cc21',
       'COCC(C)Oc1cc(Oc2cnc(C(=O)N3CCC3)cn2)cc(C(=O)O)c1.Cc1cnc(N)cn1>>COCC(C)Oc1cc(Oc2cnc(C(=O)N3CCC3)cn2)cc(C(=O)Nc2cnc(C)cn2)c1',
       'Clc1cc2c(Cl)nc(-c3ccncc3)nc2s1.NCc1ccc(Cl)c(Cl)c1>>Clc1cc2c(NCc3ccc(Cl)c(Cl)c3)nc(-c3ccncc3)nc2s1',
       'Cc1c(Cl)nnc(C(C#N)c2ccc(F)c(C#N)c2)c1C>>Cc1c(Cl)nnc(Cc2ccc(F)c(C#N)c2)c1C',
       'CC(N)c1ccc(F)c(Cl)c1.O=C(N1CCc2ccc(Cl)c(OS(=O)(=O)C(F)(F)F)c2CC1)C(F)(F)F>>CC(Nc1c(Cl)ccc2c1CCN(C(=O)C(F)(F)F)CC2)c1ccc(F)c(Cl)c1',
       'CC(C)N1CCN(C(=O)c2ccc3[nH]c(C(=O)N4CCN(S(C)(=O)=O)CC4)cc3c2)CC1.CCOC(=O)N1CCNCC1>>CCOC(=O)N1CCN(C(=O)c2cc3cc(C(=O)N4CCN(C(C)C)CC4)ccc3[nH]2)CC1',
       'CC(C(=O)O)C(=O)NCc1ccc(F)cc1.CN1C(=O)C(N)c2ccccc2-c2ccccc21>>CC(C(=O)NCc1ccc(F)cc1)C(=O)NC1C(=O)N(C)c2ccccc2-c2cccc

**Create a PermutationsReactionSmilesEnumerator object and enumerate the reaction SMILES by calling its enumerate method:**

You can choose whether to remove duplicates (`remove_duplicates`), set a seed for reproducibility (`seed`), set the number of jobs to run in parallel (`n_jobs`), set the verbosity level (`verbose`), and cap the number of SMILES to enumerate (`n_max`).

The enumerator permutes the order of the molecules on both the reactant side and the product side of each reaction.
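
As an illustration of what permuting a reaction SMILES means, the following minimal sketch reorders the dot-separated molecules on each side of `>>` using only the Python standard library. It is not the library's implementation: the function name `permute_reaction_smiles` is hypothetical, it assumes reactions of the form `reactants>>products` with no agents block, and it enumerates deterministically instead of sampling with a seed, so the order of its results may differ from the enumerator's output.

```python
from itertools import permutations, product as cartesian_product


def permute_reaction_smiles(reaction, n_max=10):
    """Return up to n_max reorderings of a 'reactants>>products' string.

    Hypothetical helper for illustration only; assumes no agents block.
    """
    reactants, products = reaction.split(">>")
    # Molecules on each side of the arrow are separated by '.'
    reactant_orders = permutations(reactants.split("."))
    product_orders = permutations(products.split("."))

    variants = []
    for r_order, p_order in cartesian_product(reactant_orders, product_orders):
        if len(variants) >= n_max:
            break
        candidate = ".".join(r_order) + ">>" + ".".join(p_order)
        # Skip duplicates, which arise when a molecule appears more than once
        if candidate not in variants:
            variants.append(candidate)
    return variants


example = "CC(C)C[Mg+].CON(C)C(=O)c1ccc(O)nc1>>CC(C)CC(=O)c1ccc(O)nc1"
for variant in permute_reaction_smiles(example):
    print(variant)
```

For a reaction with two reactants and one product, this yields two variants: the original ordering and the one with the reactants swapped.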

In [4]:
enumerator = PermutationsReactionSmilesEnumerator(reaction_smiles=reaction_smiles, remove_duplicates=True, seed=123, n_jobs=1, verbose=0)
enumerated_reaction_smiles = enumerator.enumerate(n_max=10)
enumerated_reaction_smiles

[['CON(C)C(=O)c1ccc(O)nc1.CC(C)C[Mg+]>>CC(C)CC(=O)c1ccc(O)nc1',
  'CC(C)C[Mg+].CON(C)C(=O)c1ccc(O)nc1>>CC(C)CC(=O)c1ccc(O)nc1'],
 ['CN.O=C(O)c1ccc(Cl)c([N+](=O)[O-])c1>>CNc1ccc(C(=O)O)cc1[N+](=O)[O-]',
  'O=C(O)c1ccc(Cl)c([N+](=O)[O-])c1.CN>>CNc1ccc(C(=O)O)cc1[N+](=O)[O-]'],
 ['O=CO.CCn1cc(C(=O)O)c(=O)c2cc(F)c(-c3ccc(N)cc3)cc21>>CCn1cc(C(=O)O)c(=O)c2cc(F)c(-c3ccc(NC=O)cc3)cc21',
  'CCn1cc(C(=O)O)c(=O)c2cc(F)c(-c3ccc(N)cc3)cc21.O=CO>>CCn1cc(C(=O)O)c(=O)c2cc(F)c(-c3ccc(NC=O)cc3)cc21'],
 ['COCC(C)Oc1cc(Oc2cnc(C(=O)N3CCC3)cn2)cc(C(=O)O)c1.Cc1cnc(N)cn1>>COCC(C)Oc1cc(Oc2cnc(C(=O)N3CCC3)cn2)cc(C(=O)Nc2cnc(C)cn2)c1',
  'Cc1cnc(N)cn1.COCC(C)Oc1cc(Oc2cnc(C(=O)N3CCC3)cn2)cc(C(=O)O)c1>>COCC(C)Oc1cc(Oc2cnc(C(=O)N3CCC3)cn2)cc(C(=O)Nc2cnc(C)cn2)c1'],
 ['NCc1ccc(Cl)c(Cl)c1.Clc1cc2c(Cl)nc(-c3ccncc3)nc2s1>>Clc1cc2c(NCc3ccc(Cl)c(Cl)c3)nc(-c3ccncc3)nc2s1',
  'Clc1cc2c(Cl)nc(-c3ccncc3)nc2s1.NCc1ccc(Cl)c(Cl)c1>>Clc1cc2c(NCc3ccc(Cl)c(Cl)c3)nc(-c3ccncc3)nc2s1'],
 ['Cc1c(Cl)nnc(C(C#N)c2ccc(F)c(C#N)c2)c1C>>Cc1c

**Let’s see the enumerated reaction SMILES for the first reaction:**

In [6]:
original_reaction_smiles = reaction_smiles[0]
print(f"Original reaction SMILES: {original_reaction_smiles}")

new_enumerated_reaction_smiles = enumerated_reaction_smiles[0]
print(f"New enumerated SMILES: {new_enumerated_reaction_smiles}")

Original reaction SMILES: CC(C)C[Mg+].CON(C)C(=O)c1ccc(O)nc1>>CC(C)CC(=O)c1ccc(O)nc1
New enumerated SMILES: ['CON(C)C(=O)c1ccc(O)nc1.CC(C)C[Mg+]>>CC(C)CC(=O)c1ccc(O)nc1', 'CC(C)C[Mg+].CON(C)C(=O)c1ccc(O)nc1>>CC(C)CC(=O)c1ccc(O)nc1']
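
Because the result is a list of lists aligned with the input reactions, a common next step is to flatten it into one record per augmented reaction. The sketch below does this with plain Python; the helper name `flatten_augmented` and the record keys are hypothetical, and it assumes, as in the output above, that the list at position `i` holds the variants of the reaction at position `i`.

```python
def flatten_augmented(originals, enumerated):
    """Pair every enumerated reaction SMILES with its source reaction.

    Hypothetical helper; assumes enumerated[i] holds the variants of originals[i].
    """
    rows = []
    for index, (original, variants) in enumerate(zip(originals, enumerated)):
        for variant in variants:
            rows.append(
                {"source_index": index, "original": original, "augmented": variant}
            )
    return rows


# Small stand-in data with the same nested shape as the enumerator output
originals = ["CN.O=CO>>CNC=O"]
enumerated = [["O=CO.CN>>CNC=O", "CN.O=CO>>CNC=O"]]
for row in flatten_augmented(originals, enumerated):
    print(row["source_index"], row["augmented"])
```

Keeping `source_index` alongside each variant makes it possible to keep all variants of one reaction in the same train or test split.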
