# Bulk Reaction Product SMILES Generator
This notebook generates standard SMILES strings for the products of a reaction between two reactants, where the reaction is specified as a reaction SMARTS string. Every combination of a reactant from the first column with a reactant from the second column is considered when outputting the products (that is, the Cartesian product of the two columns is taken). The input is a two-column CSV file of exact reactant names in the format detailed in the `Data import` section. Contact Prajit Rajkumar (prajkumar@ucsd.edu) with any questions or if any issues occur.

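To make the pairing concrete, here is a minimal sketch of the Cartesian product using `itertools.product`, the same function the notebook imports. The single-letter names are hypothetical placeholders, not real reactants. Note that `product` varies the second list fastest.

```python
from itertools import product

# Hypothetical placeholder names standing in for real reactant names.
reactant_1 = ["X", "Y"]
reactant_2 = ["A", "B"]

# product() pairs each item of the first list with every item of the second,
# advancing the second list fastest.
pairs = list(product(reactant_1, reactant_2))
labels = [first + second for first, second in pairs]

print(labels)  # ['XA', 'XB', 'YA', 'YB']
print(len(pairs) == len(reactant_1) * len(reactant_2))  # True
```

The number of output rows is therefore the product of the two column lengths.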
In [None]:
%%capture
# @title Package installation & imports
# @markdown Install and import RDKit and the SMILES retriever module.
# @markdown This may take a few seconds.
!pip install rdkit
!curl -o structureretriever.py https://raw.githubusercontent.com/prajitrr/acyl-amides-project7/main/modules/structureretriever.py

from rdkit import Chem
from rdkit.Chem import rdChemReactions
from rdkit.Chem import Draw
from rdkit.Chem import AllChem

from structureretriever import retriever

import numpy as np
import pandas as pd

import ast
from IPython.display import display
from itertools import product

In [None]:
%%capture
# @title Data import
# @markdown Upload a CSV file with the exact reactant names, in the format shown [here](https://docs.google.com/spreadsheets/d/1AmyMTBuwYbVkzffIpXCZKIWxRVZG4h6jf0L03YnF238/edit?usp=sharing).
# @markdown Within a column, list the reactants in the order in which they should appear in the output.
# @markdown Specify non-abbreviated reactant names that preferably conform to IUPAC standards. Use this [link](https://cactus.nci.nih.gov/chemical/structure) to verify any reactant name whose validity is uncertain.
# @markdown The results are output in the order given by `itertools.product`: each
# @markdown reactant in the first column is paired with every reactant in the
# @markdown second column before moving on to the next reactant in the first column.
# @markdown > For example, an input
# @markdown with REACTANT_1: {X, Y} and REACTANT_2: {A, B} will produce
# @markdown PRODUCT: {XA, XB, YA, YB}
# @markdown
# @markdown For the reaction between acid chlorides and amines, the acid chloride may be given either by its full name or by the name without the word "chloride".
# @markdown To upload, download the sheet with the reactants from Google Sheets or Excel as a CSV, open the Files tab of Google Colab, then right-click and upload the CSV.
# @markdown Make sure that the CSV is NOT uploaded into a folder in Google Colab. Then enter the exact full name of the file, including the .csv extension, in the field below.

Filename = "" # @param {type:"string"}
reactant_input = pd.read_csv(Filename)


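As a rough illustration of the input layout, the sketch below reads a small two-column CSV and collects the non-empty cells of each column. The column headers `REACTANT_1` and `REACTANT_2` are the ones the notebook reads later. The file contents are hypothetical, and the standard-library `csv` module is used here only as a stand-in for `pd.read_csv` followed by `dropna()`.

```python
import csv
import io

# Hypothetical file contents; the reactant names are examples only.
csv_text = """REACTANT_1,REACTANT_2
acetyl chloride,methylamine
benzoyl chloride,ethylamine
,aniline
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Keep only the non-empty cells of each column, so the two columns
# may have different lengths.
reactant_set_1 = [row["REACTANT_1"] for row in rows if row["REACTANT_1"]]
reactant_set_2 = [row["REACTANT_2"] for row in rows if row["REACTANT_2"]]

print(reactant_set_1)  # ['acetyl chloride', 'benzoyl chloride']
print(reactant_set_2)  # ['methylamine', 'ethylamine', 'aniline']
```

Because empty cells are dropped per column, a shorter column does not need padding values.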
In [None]:
# @title Reaction SMARTS
# @markdown The general reaction between the two reactants can be specified as a SMARTS string here.
# @markdown Several named reactions are provided in the dropdown field, with the reactants listed in the order expected.
# @markdown Custom reactions must be entered as exact reaction SMARTS following RDKit's rules, which can be found [here](https://www.rdkit.org/docs/source/rdkit.Chem.rdChemReactions.html).
# @markdown Note that the reactants in the reaction SMARTS must match the order of the reactant columns in the input CSV.
# @markdown If no reaction is specified, the default is used: amide bond formation between an acid chloride and an amine, with the reactants in that order. An image of the reaction is displayed so that it can be verified visually.
# @markdown
# @markdown **Note**: If a reaction from the dropdown menu appears to be invalid, clear the output field and select the reaction again.
Reaction = "" # @param ["Amide formation between ACID CHLORIDE and AMINE", "Amide (peptide bond) formation between CARBOXYLIC ACID and AMINE"] {allow-input: true}

Reaction = Reaction.replace('\n','')

if Reaction == "":
    rxn_smarts = "[O:1]=[C:2][Cl:3].[N:4]>>[O:1]=[C:2][N:4].[Cl:3]"
elif Reaction == "Amide (peptide bond) formation between CARBOXYLIC ACID and AMINE":
    rxn_smarts = "[O:1]=[C:2][O:3].[N:4]>>[O:1]=[C:2][N:4].[O:3]"
elif Reaction == "Amide formation between ACID CHLORIDE and AMINE":
    rxn_smarts = "[O:1]=[C:2][Cl:3].[N:4]>>[O:1]=[C:2][N:4].[Cl:3]"
else:
    rxn_smarts = Reaction

try:
    reaction_eqn = rdChemReactions.ReactionFromSmarts(rxn_smarts)
    reaction_img = Draw.ReactionToImage(reaction_eqn)
    display(reaction_img)
except ValueError as error:
    print("Invalid reaction. See next line for RDKit's reason for error.")
    print(error)


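Before entering a custom reaction, it can help to check by eye how its reaction SMARTS splits into templates and whether the atom-map numbers balance. The helper below, `summarize_reaction_smarts`, is a hypothetical string-level sketch written for this illustration; it is not part of RDKit and does not replace RDKit's own parser. It assumes a simple `reactants>agents>products` layout and splits templates naively on `.`.

```python
import re


def summarize_reaction_smarts(rxn_smarts):
    """Split a reaction SMARTS into templates and compare atom-map numbers."""
    # Reaction SMARTS has the form reactants>agents>products; ">>" means no agents.
    reactant_part, _agents, product_part = rxn_smarts.split(">")
    map_pattern = re.compile(r":(\d+)\]")
    reactant_maps = set(map_pattern.findall(reactant_part))
    product_maps = set(map_pattern.findall(product_part))
    return {
        # Naive split on "."; parenthesized component grouping is not handled.
        "reactant_templates": reactant_part.split("."),
        "product_templates": product_part.split("."),
        "maps_missing_from_products": reactant_maps - product_maps,
        "maps_missing_from_reactants": product_maps - reactant_maps,
    }


summary = summarize_reaction_smarts(
    "[O:1]=[C:2][Cl:3].[N:4]>>[O:1]=[C:2][N:4].[Cl:3]"
)
print(summary["reactant_templates"])  # ['[O:1]=[C:2][Cl:3]', '[N:4]']
print(summary["product_templates"])  # ['[O:1]=[C:2][N:4]', '[Cl:3]']
print(summary["maps_missing_from_products"])  # set()
```

The first reactant template corresponds to the `REACTANT_1` column and the second to `REACTANT_2`, so two reactant templates are expected.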
In [None]:
# @title Process Reactants and Run Reaction
# @markdown This step runs all the code necessary to process the inputted
# @markdown reactants and run the reaction to produce the product SMILES.
# @markdown If an error occurs while running the reaction for a given pair, the
# @markdown `PRODUCT_SMILES` column will contain the string `ERROR` for that
# @markdown entry. If a SMILES parse error is printed
# @markdown below the cell, check the names in the input and make sure to expand
# @markdown any abbreviations (e.g. GABA -> gamma-aminobutyric acid).
# @markdown Use the [link](https://cactus.nci.nih.gov/chemical/structure) from
# @markdown the data import step to make sure the compound names are valid.
# @markdown If other errors occur during the
# @markdown process, they will be printed below the cell and the inputs should
# @markdown be verified before rerunning.

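def apply_reaction(reactant_1, reactant_2):
    """Return the SMILES of the first product, or "ERROR" if the reaction fails."""
    try:
        mols = (Chem.MolFromSmiles(reactant_1), Chem.MolFromSmiles(reactant_2))
        # RunReactants returns a tuple of product sets; keep the first product
        # of the first set (for the built-in reactions, this is the amide).
        return Chem.MolToSmiles(reaction_eqn.RunReactants(mols)[0][0])
    except Exception:
        return "ERROR"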


reactant_set_1 = list(reactant_input["REACTANT_1"].dropna())
reactant_set_2 = list(reactant_input["REACTANT_2"].dropna())

if rxn_smarts == "[O:1]=[C:2][Cl:3].[N:4]>>[O:1]=[C:2][N:4].[Cl:3]":
    for i in range(len(reactant_set_1)):
        if "chloride" not in reactant_set_1[i].lower():
            reactant_set_1[i] = reactant_set_1[i] + " chloride"

reactant_smiles_1 = list(retriever(reactant_set_1)["SMILES"])
reactant_smiles_2 = list(retriever(reactant_set_2)["SMILES"])

df = pd.DataFrame(list(product(reactant_set_1, reactant_set_2)),
                  columns=['REACTANT_1', 'REACTANT_2']).dropna()
df2 = pd.DataFrame(list(product(reactant_smiles_1, reactant_smiles_2)),
                   columns=['SMILES_1', 'SMILES_2']).dropna()
# Both frames already carry their final column names, so a plain column-wise
# concat (aligned on the row index) is sufficient.
output_df = pd.concat([df, df2], axis=1)

output_df["PRODUCT_SMILES"] = output_df.apply(lambda x: apply_reaction(x['SMILES_1'], x['SMILES_2']), axis=1)

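The acid chloride name handling described in the data import step can be sketched as a small standalone function. `ensure_chloride_suffix` is a hypothetical helper written for this illustration, and comparing names without regard to letter case is a choice made for the sketch.

```python
def ensure_chloride_suffix(names):
    """Append " chloride" to every name that does not already mention it."""
    return [
        name if "chloride" in name.lower() else name + " chloride"
        for name in names
    ]


normalized = ensure_chloride_suffix(["acetyl", "benzoyl chloride"])
print(normalized)  # ['acetyl chloride', 'benzoyl chloride']
```

Names that already include the word are left unchanged, so either input style resolves to the same lookup string.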
In [None]:
# @title View and Download Results
# @markdown If the previous step was successful, this cell can be run to view
# @markdown the results and download them as a CSV file, which can be uploaded
# @markdown as a sheet to a spreadsheet application. The filename of the output
# @markdown can be specified below. Once the cell runs, the output file can be
# @markdown downloaded from the files tab of Google Colab.
# @markdown Right click the files tab and click `Refresh` in order for the
# @markdown output file to appear.

Output_Filename = "SMILES_product_output.csv" # @param {type:"string"}

output_df.to_csv(Output_Filename, index=False)

output_df
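After downloading the output, one quick check is to list the reactant pairs whose `PRODUCT_SMILES` entry is `ERROR`. The sketch below does this with the standard-library `csv` module. The rows are made-up stand-ins that follow the notebook's column names, not actual notebook output; the second row pairs an acid chloride with a compound that has no nitrogen, which is the kind of input that cannot match the amine template.

```python
import csv
import io

# Hypothetical output contents using the notebook's column names.
output_text = """REACTANT_1,REACTANT_2,SMILES_1,SMILES_2,PRODUCT_SMILES
acetyl chloride,methylamine,CC(=O)Cl,CN,CNC(C)=O
acetyl chloride,methanol,CC(=O)Cl,CO,ERROR
"""

rows = list(csv.DictReader(io.StringIO(output_text)))

# Collect the reactant pairs whose reaction step failed.
failed = [
    (row["REACTANT_1"], row["REACTANT_2"])
    for row in rows
    if row["PRODUCT_SMILES"] == "ERROR"
]
print(failed)  # [('acetyl chloride', 'methanol')]
```

To run this against a real file, replace `io.StringIO(output_text)` with an opened file handle for the downloaded CSV.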