# Example 1 - MOF Organic Linker Design for CO2 Adsorption

In [1]:
import warnings
warnings.filterwarnings('ignore')
import sys
import dziner
import os
from langchain._api import LangChainDeprecationWarning
warnings.simplefilter("ignore", category=LangChainDeprecationWarning)

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
os.environ["ANTHROPIC_API_KEY"] = 'YOUR_ANTHROPIC_API_KEY'
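The placeholder strings above must be replaced with real keys before any tool is called. As a minimal sketch, a small guard can fail early when a key is missing or still a placeholder. The helper name `require_api_key` and the `demo_env` mapping are hypothetical and are not part of dZiner or LangChain.

```python
import os

def require_api_key(name, environ=os.environ):
    """Return the value of `name`, or raise if it is missing or still a placeholder."""
    value = environ.get(name, "").strip()
    if not value or value.startswith("YOUR_"):
        raise RuntimeError(f"{name} is not set; export it before running the notebook.")
    return value

# Demonstration with a stand-in mapping instead of the real environment (hypothetical values).
demo_env = {"OPENAI_API_KEY": "sk-demo", "ANTHROPIC_API_KEY": "YOUR_ANTHROPIC_API_KEY"}
openai_key = require_api_key("OPENAI_API_KEY", demo_env)
try:
    require_api_key("ANTHROPIC_API_KEY", demo_env)
    placeholder_rejected = False
except RuntimeError:
    placeholder_rejected = True
print(openai_key, placeholder_rejected)
```

In the notebook itself, calling `require_api_key("OPENAI_API_KEY")` without the second argument would check the real environment.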

## Defining Customized Agent Tools

### CO2 adsorption inference and chemical feasibility assessment

We first define two tools. The first predicts CO2 adsorption from the SMILES of the MOF's organic linker; it is our surrogate model for the adsorption capacity (mol/kg). The second assesses the chemical feasibility of the molecule: it checks the validity of the SMILES via RDKit sanitization and estimates the synthetic accessibility (SA) score.

In [2]:
from dziner.tools import mol_chemical_feasibility

from langchain.agents import tool
from dziner.surrogates.MOFormer.CO2_adsorption import predict_CO2_adsorption_with_uncertainty
import torch
import numpy as np

@tool
def predict_CO2_adsoprtion(smiles):
    '''
    This tool predicts the mean CO2 Adsorption and the epistemic uncertainty in predictions for an input SMILES.
    This model is based on this paper: https://pubs.acs.org/doi/10.1021/jacs.2c11420 fine-tuned on the hmof dataset.
    '''
    smiles = smiles.replace("\n", "")
    mean_prediction, std_prediction = predict_CO2_adsorption_with_uncertainty(smiles)    
    return np.round(mean_prediction, 3), np.round(std_prediction, 3)

print(f'Check chemical feasibility tool:\n{mol_chemical_feasibility.description}\n')
print(f'Predict CO2 adsoprtion tool:\n{predict_CO2_adsoprtion.description}')

Check chemical feasibility tool:
This tool inputs a SMILES of a molecule and outputs chemical feasibility, Synthetic Accessibility (SA) scores,
  molecular weight, and the SMILES. SA the ease of synthesis of compounds according to their synthetic complexity
    which combines starting materials information and structural complexity. Lower SA, means better synthesiability.

Predict CO2 adsoprtion tool:
This tool predicts the mean CO2 Adsorption and the epistemic uncertainty in predictions for an input SMILES.
This model is based on this paper: https://pubs.acs.org/doi/10.1021/jacs.2c11420 fine-tuned on the hmof dataset.
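The prediction tool returns a `(mean, std)` pair. How the dZiner surrogate computes its epistemic uncertainty is not shown here; one common approach, assumed in this sketch, is to take the spread of predictions across an ensemble of models (or repeated stochastic forward passes). The function `summarize_ensemble` and the numbers below are hypothetical illustrations, not outputs of the actual model.

```python
import statistics

def summarize_ensemble(predictions, ndigits=3):
    """Return (mean, std) over per-model predictions, rounded like the tool's output."""
    mean = statistics.fmean(predictions)
    std = statistics.pstdev(predictions)  # population standard deviation across members
    return round(mean, ndigits), round(std, ndigits)

# Hypothetical adsorption predictions (mol/kg) from five ensemble members for one linker.
ensemble_outputs = [1.0, 1.2, 1.1, 0.9, 1.3]
mean, std = summarize_ensemble(ensemble_outputs)
print(mean, std)  # 1.1 0.141
```

A large standard deviation relative to the mean suggests the members disagree, so an apparent improvement in the mean should be read with more caution.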


### Retrieving domain-knowledge from scientific literature

We can either supply our own design constraints and guidelines or leverage existing knowledge from the scientific literature. Guidelines describe possible modifications to the core structure of the molecule, or the addition of elements and functional groups, that can help optimize the property of interest.

In [3]:
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders import PyPDFLoader
from langchain_openai import ChatOpenAI
Embedding_model = 'text-embedding-3-large' 

from langchain.text_splitter import CharacterTextSplitter

RetrievalQA_prompt = """What are the design guidelines for making the organic linker adsorb CO2 better?
    This can be based on making changes to the elements or the functional groups or other changes to the molecule.
    Summarize your answer and cite the paper by its title, DOI, journal, and year of publication."""

@tool
def domain_knowledge(prompt):
    '''Useful for getting chemical intuition for the problem.
    This tool looks up design guidelines for organic linkers with higher CO2 adsorption by looking through research papers.
    It also includes information on the paper citation or DOI.
    '''
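    # Note: the `prompt` argument is not used; the fixed RetrievalQA_prompt is asked of every paper.
    # The splitter and the LLM do not depend on the paper, so they are created once.
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
    llm = ChatOpenAI(model_name='gpt-4o', temperature=0.3)
    guide_lines = []
    for m in range(4):
        # Load one paper (0.pdf to 3.pdf), split it into chunks, and index the chunks with FAISS.
        paper_file = f'../data/papers/CO2_adsorption/{m}.pdf'
        pages = PyPDFLoader(paper_file).load_and_split()
        sliced_pages = text_splitter.split_documents(pages)
        faiss_vectorstore = FAISS.from_documents(sliced_pages, OpenAIEmbeddings(model=Embedding_model))
        # Ask RetrievalQA_prompt against this paper's index and collect the answer.
        g = dziner.RetrievalQABypassTokenLimit(faiss_vectorstore, RetrievalQA_prompt, llm)
        guide_lines.append(g)
    return " ".join(guide_lines)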
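To make the `chunk_size=1000, chunk_overlap=50` settings concrete, here is a simplified sketch of fixed-width chunking with overlap. It is an illustration only: LangChain's `CharacterTextSplitter` first splits on a separator and then merges pieces up to the chunk size, so its real chunk boundaries will differ. The function `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=50):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Hypothetical 2500-character document.
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [1000, 1000, 600]
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, which reduces the chance that a sentence relevant to the query is cut in half at a boundary.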

## Defining dZiner agent

We use Claude 3.5 Sonnet to drive our agent.

In [4]:
from dziner.agents import dZiner
from langchain_anthropic import ChatAnthropic

agent_model = ChatAnthropic(model="claude-3-5-sonnet-20240620", api_key=os.environ["ANTHROPIC_API_KEY"],
                      temperature=0.3, max_tokens=8192)

tools = [predict_CO2_adsoprtion, domain_knowledge, mol_chemical_feasibility]
tool_names = [tool.name for tool in tools]  
tool_desc = [tool.description for tool in tools]
agent = dZiner(tools, property="CO2 Adsorption",
               model=agent_model, verbose=True, n_design_iterations=5, temp=0.1,get_cost=True).agent
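The agent configured above alternates between proposing a modified linker and evaluating it with the tools. The sketch below illustrates that kind of propose, validate, predict, keep-the-best loop with stub functions. It is not dZiner's implementation: `run_design_loop`, the stubs, and the toy strings and scores are all hypothetical, and the real agent lets the LLM decide which tool to call at each step.

```python
def run_design_loop(initial, propose, is_valid, predict, n_iterations=5):
    """Greedy loop: propose a candidate from the current best, skip invalid ones, keep improvements."""
    best, best_score = initial, predict(initial)
    history = [(initial, best_score)]
    for step in range(n_iterations):
        candidate = propose(best, step)
        if not is_valid(candidate):
            continue  # stay on the previous best and try a different change next
        score = predict(candidate)
        history.append((candidate, score))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score, history

# Toy stand-ins (not real SMILES or real predictions).
toy_scores = {"A": 0.5, "AB": 0.8, "ABD": 0.7}
suffixes = ["B", "X", "D"]

def propose(current, step):
    return current + suffixes[step]

def is_valid(candidate):
    return "X" not in candidate  # pretend any string containing "X" fails sanitization

def predict(candidate):
    return toy_scores[candidate]

best, best_score, history = run_design_loop("A", propose, is_valid, predict, n_iterations=3)
print(best, best_score)  # AB 0.8
```

The invalid candidate is never scored, and a valid candidate with a lower score is recorded in the history but does not replace the current best.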

In [5]:
initial_organic_linker =  "Nc1cc(C(=O)[O-])c(cc1c1ccc(cc1)c1c(N)cc(c(c1N)N)C(=O)[O-])N(C)C#N" 
Human_prompt = f"Make changes to the organic linker {initial_organic_linker} to increase its CO2 adsorption. Suggest 20 new candidates."

input_data = {
            "input": Human_prompt,
            "prompt": RetrievalQA_prompt,
            "tools": tools,
            "tool_names": tool_names,
            "tool_desc": tool_desc
        }

In [6]:
response = agent.invoke(input_data)

> Entering new AgentExecutor chain...
Thought: To begin, I need to look up design guidelines for organic linkers with higher CO2 adsorption. Then, I'll evaluate the initial molecule and start making changes based on the guidelines.

Action: domain_knowledge
Action Input: design guidelines for organic linkers with higher CO2 adsorption

Observation: To design organic linkers that adsorb CO2 more effectively, several guidelines can be followed:

1. **Functional Group Modification**: Introducing different functional groups such as -OSO3H, -NO2, -OH, -NH2, and other polar groups like -N=C(R)-, -N=N- can significantly enhance CO2 adsorption. These groups improve the interaction between CO2 and the organic linkers, thereby increasing the selectivity and stability of the frameworks.

2. **Pore Size and Shape Optimization**: Adjusting the pore size and shape of the MOFs can enhance CO2 capture by providing more adsorption sites and promoting effective sc

[22:20:26] Can't kekulize mol.  Unkekulized atoms: 30 31 32 33 34

Thought: The triazole modification resulted in an invalid SMILES. I'll revert to the previous valid structure with the best CO2 adsorption and try a different modification.

3. Let's try adding a nitro group (-NO2) to enhance CO2 interactions, as suggested in the design guidelines.

Action: predict_CO2_adsoprtion
Action Input: Nc1cc(C(=O)[O-])c(cc1c1ccc(cc1)c1c(N)cc(c(c1N)N)C(=O)[O-])N(C)C(=O)[N+](=O)[O-]

Observation: (1.136, 0.142)
Thought: The addition of a nitro group has significantly increased the predicted CO2 adsorption to 1.136, which is a substantial improvement from both the initial molecule and our previous modification. Let's check the chemical feasibility of this new molecule.

Action: mol_chemical_feasibility
Action Input: Nc1cc(C(=O)[O-])c(cc1c1ccc(cc1)c1c(N)cc(c(c1N)N)C(=O)[O-])N(C)C(=O)[N+](=O)[O-]

Observation: ('Valid SMILES', 3.758868901499005, 478.4210000000003, 'Nc1cc(C(=O)[O-])c(cc1c1ccc