# Chaining LLMs

<center><img src="tweet_langchain.jpg" width=400 display="block"></center>


# So what is LangChain?

#### LangChain is a framework for developing applications powered by language models.


It aims to:

- Be data-aware: connect a language model to other sources of data

- Be agentic: allow a language model to interact with its environment

In [1]:
!pip install langchain openai wikipedia rdkit

OPENAI_API_KEY = ""

Collecting langchain
  Downloading langchain-0.0.139-py3-none-any.whl (530 kB)
Collecting openai
  Using cached openai-0.27.4-py3-none-any.whl (70 kB)
Collecting wikipedia
  Using cached wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Collecting rdkit
  Downloading rdkit-2022.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.4 MB)
Collecting tenacity<9.0.0,>=8.1.0
  Using cached tenacity-8.2.2-py3-none-any.whl (24 kB)
Collecting dataclasses-json<0.6.0,>=0.5.7
  Using cached dataclasses_json-0.5.7-py3-none-any.whl (25 kB)
Collecting gptcache>=0.1.7
  Downloading gptcache-0.1.10-py3-none-any.whl (44 kB)

In [2]:
import os
from utils import *
from langchain import (
    PromptTemplate,
    OpenAI,
    chat_models,
    chains
)

# Prompt Templates

In [3]:
# First, define a prompt template
prompt_template = PromptTemplate(
    input_variables = ["smiles"],
    template = (
        "What functional groups are present in this molecule? {smiles}. "
        "This molecule contains the following functional groups:"
    )
)

In [4]:
smiles = "c1(Br)ccc(CC(=O)Cl)cc1"

cdk(smiles)
prompt_template.format(smiles=smiles)

'What functional groups are present in this molecule? c1(Br)ccc(CC(=O)Cl)cc1. This molecule contains the following functional groups:'
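Formatting a template like this amounts to named placeholder substitution. The following is a minimal sketch in plain Python, not LangChain's actual implementation; `render_prompt` is a hypothetical helper that also checks the supplied values against the declared input variables.

```python
TEMPLATE = (
    "What functional groups are present in this molecule? {smiles}. "
    "This molecule contains the following functional groups:"
)


def render_prompt(template: str, input_variables: list, **values: str) -> str:
    """Fill a template, insisting that exactly the declared variables are given."""
    missing = set(input_variables) - set(values)
    extra = set(values) - set(input_variables)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    # str.format replaces each {name} with the matching keyword argument
    return template.format(**values)


rendered = render_prompt(TEMPLATE, ["smiles"], smiles="c1(Br)ccc(CC(=O)Cl)cc1")
print(rendered)
```

Declaring the variables up front means a misspelled or forgotten variable fails loudly before any model call is made.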

# Language Models

In [5]:
# Let's now define a language model.

llm = OpenAI(
    model_name="text-davinci-003",
    temperature=0.03,
    openai_api_key = OPENAI_API_KEY,
)

print(llm.generate(prompts=[
    "LIAC is a research group and we're doing a tutorial. Your task is to say hi to LIAC!"
]).generations[0][0].text)



Hi LIAC! It's nice to meet you!


# Putting it together: Chains

In [6]:
func_groups = chains.LLMChain(
    prompt = prompt_template,
    llm = llm
)

cdk(smiles)
print(func_groups.run(smiles))



- Bromine (Br)
- Carbonyl (C=O)
- Chlorine (Cl)
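Conceptually, a chain of this kind composes two steps: fill the prompt template, then send the result to the model. The sketch below shows that composition in plain Python. `make_chain` and `fake_llm` are hypothetical names, and `fake_llm` merely echoes its prompt instead of calling a real model, so this illustrates the idea rather than how `LLMChain` is implemented.

```python
from typing import Callable


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function that formats the prompt and passes it to the model."""
    def run(smiles: str) -> str:
        prompt = template.format(smiles=smiles)
        return llm(prompt)
    return run


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call: echo what the model would see
    return f"[model saw] {prompt}"


chain = make_chain("Functional groups of {smiles}:", fake_llm)
result = chain("CCO")
print(result)
```

Swapping `fake_llm` for a function that calls a real model would leave the rest of the sketch unchanged, which is the appeal of keeping the prompt and the model as separate pieces.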


In [7]:
smiles_2 = "c1ccc(-c2cncc(C3CCCCC3)c2)cc1"
cdk(smiles_2)
print(func_groups.run(smiles_2))



-Aromatic ring (c1ccc)
-Alkyne (-c2cncc)
-Alcohol (C3CCCCC3)


# Improving prompting: Formatting, in-context learning, etc.

In [8]:
prompt_template_2 = PromptTemplate(
    input_variables = ["smiles"],
    template = (
        # Adjacent string literals are joined as-is, so each piece carries its own separator
        "You are an expert chemist and your task is to identify the functional groups of the given molecules. "
        "You should give the name and the SMILES of each functional group. Begin!\n"
        "Input: Brc1cncc(C2CCCCC2)c1\n"
        "Output: 1. Halogen (Br)\n 2. Pyridine (c1cnccc1)\n 3. Cyclohexane (C2CCCCC2)\n"
        "Input: c1ccccc1CC(=O)Cl\n"
        "Output: 1. Phenyl (c1ccccc1)\n 2. Carbonyl (C=O)\n 3. Acyl halide (C(=O)Cl)\n 4. Halogen (Cl)\n"
        "Input: {smiles}\n"
        "Output:"
    )
)

func_groups = chains.LLMChain(
    prompt = prompt_template_2,
    llm = llm
)

def fcs(smiles):
    cdk(smiles)
    print(func_groups.run(smiles))

In [9]:
smiles_list = [
    smiles,
    smiles_2,
    "C/C=C/C(=O)I",
    "C/C=C/C(=O)S"
]

for s in smiles_list:
    fcs(s)

 1. Halogen (Br)
 2. Phenyl (c1ccccc1)
 3. Carbonyl (C=O)
 4. Acyl halide (C(=O)Cl)
 5. Halogen (Cl)


 1. Phenyl (c1ccccc1)
 2. Pyridine (c2cnccc2)
 3. Cyclohexane (C3CCCCC3)
 4. Alkene (-c2cnccc2)


 1. Alkene (C/C=C/C)
 2. Ester (C(=O)I)
 3. Halogen (I)


 1. Alkene (C/C=C/C)
 2. Thioester (C(=O)S)


# Agents:

We would like to integrate other components:
- User input
- External knowledge sources
- External tools
- Memory storage
- ...

In [10]:
# Let's define a tool!
from langchain.utilities import WikipediaAPIWrapper

wikipedia = WikipediaAPIWrapper()
wikipedia.run("Ozzy Osbourne")

'Page: Ozzy Osbourne\nSummary: John Michael "Ozzy" Osbourne (born 3 December 1948) is an English singer, songwriter, and television personality. He rose to prominence during the 1970s as the lead vocalist of the heavy metal band Black Sabbath, during which period he adopted the nickname "Prince of Darkness".Born and raised in Birmingham, Osbourne became a founding member of Black Sabbath in 1968, and provided lead vocals from their self-titled debut album in 1970 to Never Say Die! in 1978. The band was highly influential on the development of heavy metal music, in particular their critically acclaimed releases Paranoid, Master of Reality and Sabbath Bloody Sabbath. Osbourne was fired from Black Sabbath in 1979 due to alcohol and drug problems, but went on to have a successful solo career, releasing 13 studio albums, the first seven of which received multi-platinum certifications in the US. Osbourne has since reunited with Black Sabbath on several occasions. He rejoined in 1997 and help

In [11]:
from langchain.agents import initialize_agent, Tool

# Define a toolset 
toolset = [
    Tool(
        name="wikipedia search",
        func=wikipedia.run,
        description="Useful to get accurate information from wikipedia."
    )
]

# Define an agent
jamesbond = initialize_agent(
    toolset,
    llm,
    agent="zero-shot-react-description",
    verbose=True
)

In [12]:
jamesbond.run(f"What are the functional groups of {smiles}")



> Entering new AgentExecutor chain...
 I need to research the functional groups of this molecule
Action: wikipedia search
Action Input: functional groups of c1(Br)ccc(CC(=O)Cl)cc1
Observation: Page: Clofazimine
Summary: Clofazimine, sold under the brand name Lamprene, is a medication used together with rifampicin and dapsone to treat leprosy. It is specifically used for multibacillary (MB) leprosy and erythema nodosum leprosum. Evidence is insufficient to support its use in other conditions though a retrospective study found it 95% effective in the treatment of Mycobacterium avium complex (MAC) when administered with a macrolide and ethambutol, as well as the drugs amikacin and clarithromycin. However, in the United States, clofazimine is considered an orphan drug, is unavailable in pharmacies, and its use in the treatment of MAC is overseen by the Food and Drug Administration. It is taken orally.Common side effects include abdominal pain, diarrh

'The functional groups of c1(Br)ccc(CC(=O)Cl)cc1 are a bromine, a chlorine, a carbonyl, and a cyclopentadienyliron dicarbonyl dimer.'

# Probably not the most useful tool for this task...

In [13]:
from linuslingo.mol_utils.generals import list_functional_groups

toolset += [
    Tool(
        name="Get functional groups",
        func=list_functional_groups,
        description="Use this tool to find the functional groups of some molecule. Input the smiles string of a single molecule."
    )
]


# Define an agent
jamesbond = initialize_agent(
    toolset,
    llm,
    agent="zero-shot-react-description",
    verbose=True
)

In [14]:
jamesbond.run(f"What are the functional groups of {smiles}")



> Entering new AgentExecutor chain...
 I need to find out what the functional groups are
Action: Get functional groups
Action Input: c1(Br)ccc(CC(=O)Cl)cc1
Observation: This molecule contains ketones, and halogens.
Thought: I now know the final answer
Final Answer: The functional groups of c1(Br)ccc(CC(=O)Cl)cc1 are ketones and halogens.

> Finished chain.


'The functional groups of c1(Br)ccc(CC(=O)Cl)cc1 are ketones and halogens.'
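The trace above suggests what each agent step involves: read the model's reply, pick out the `Action` and `Action Input` lines, call the matching tool, and hand the result back as an observation. The sketch below illustrates one such step in plain Python. The regular expression, the `step` function, and `fake_functional_groups` are all assumptions made for illustration, not LangChain's actual parser or the real tool.

```python
import re


def fake_functional_groups(smiles: str) -> str:
    # Hypothetical stand-in for the real tool; returns the observation seen in the trace
    return "This molecule contains ketones, and halogens."


TOOLS = {"Get functional groups": fake_functional_groups}

# Capture the tool name and its input from a reply in the Action / Action Input format
ACTION_PATTERN = re.compile(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", re.DOTALL)


def step(llm_output: str) -> str:
    """Run one agent step: return the final answer or a tool observation."""
    if "Final Answer:" in llm_output:
        return llm_output.split("Final Answer:", 1)[1].strip()
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        raise ValueError(f"Could not parse model output: {llm_output!r}")
    tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
    return TOOLS[tool_name](tool_input)


reply = (
    " I need to find out what the functional groups are\n"
    "Action: Get functional groups\n"
    "Action Input: c1(Br)ccc(CC(=O)Cl)cc1"
)
observation = step(reply)
print(observation)
```

A full agent would repeat this step, appending each observation to the text sent back to the model, until the reply contains a final answer.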

# What's happening behind the curtain?

In [15]:
prompt = jamesbond.agent.create_prompt(toolset)
james_prompt = jamesbond.agent.llm_chain.prompt

print(james_prompt.input_variables)

['input', 'agent_scratchpad']


In [16]:
print(james_prompt.template)

Answer the following questions as best you can. You have access to the following tools:

wikipedia search: Useful to get accurate information from wikipedia.
Get functional groups: Use this tool to find the functional groups of some molecule. Input the smiles string of a single molecule.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [wikipedia search, Get functional groups]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
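The printed template shows that the tool names and descriptions are written directly into the prompt, which is why a clear `description` matters when defining a `Tool`. As a rough illustration, a prompt of this shape could be assembled as follows. `build_agent_prompt` is a hypothetical helper written for this sketch; it reproduces the text printed above and is not LangChain's own `create_prompt`.

```python
TOOL_DESCRIPTIONS = {
    "wikipedia search": "Useful to get accurate information from wikipedia.",
    "Get functional groups": (
        "Use this tool to find the functional groups of some molecule. "
        "Input the smiles string of a single molecule."
    ),
}

FORMAT_TEMPLATE = (
    "Answer the following questions as best you can. "
    "You have access to the following tools:\n\n"
    "{tool_lines}\n\n"
    "Use the following format:\n\n"
    "Question: the input question you must answer\n"
    "Thought: you should always think about what to do\n"
    "Action: the action to take, should be one of [{tool_names}]\n"
    "Action Input: the input to the action\n"
    "Observation: the result of the action\n"
    "... (this Thought/Action/Action Input/Observation can repeat N times)\n"
    "Thought: I now know the final answer\n"
    "Final Answer: the final answer to the original input question\n\n"
    "Begin!\n\n"
    "Question: {input}\n"
    "Thought:{agent_scratchpad}"
)


def build_agent_prompt(tools: dict, question: str, scratchpad: str = "") -> str:
    """Render the tool list, the question, and the running scratchpad into one prompt."""
    tool_lines = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    tool_names = ", ".join(tools)
    return FORMAT_TEMPLATE.format(
        tool_lines=tool_lines,
        tool_names=tool_names,
        input=question,
        agent_scratchpad=scratchpad,
    )


question = "What are the functional groups of c1(Br)ccc(CC(=O)Cl)cc1"
agent_prompt = build_agent_prompt(TOOL_DESCRIPTIONS, question)
print(agent_prompt)
```

The `agent_scratchpad` slot starts empty and grows with each Thought, Action, and Observation, so the model sees its own earlier steps on every call.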


# A more useful agent

In [20]:
from linuslingo.mol_utils.generals import (
    query2smiles,
)

toolset += [
    Tool(
        name="Get smiles of molecule",
        func=query2smiles,
        description="Useful when you need the smiles of a molecule. Input the name of the molecule"
    )
]


llm = chat_models.ChatOpenAI(
    model_name="gpt-4",
    temperature=0.03,
    openai_api_key = OPENAI_API_KEY,
)

# Define an agent
jamesbond = initialize_agent(
    toolset,
    llm,
    agent="zero-shot-react-description",
    verbose=True
)

final_answer = jamesbond.run(
    "Find the functional groups of caffeine, p-bromobenzaldehyde, and cyclosarin, "
    "then find what they have in common, and find some information about it on Wikipedia."
)



> Entering new AgentExecutor chain...
I need to find the smiles of each molecule first. I will start with caffeine.
Action: Get smiles of molecule
Action Input: caffeine
Observation: Cn1c(=O)c2c(ncn2C)n(C)c1=O
Thought: Now I have the smiles of caffeine, I will find the functional groups of caffeine.
Action: Get functional groups
Action Input: Cn1c(=O)c2c(ncn2C)n(C)c1=O
Observation: This molecule contains ketones.
Thought: I have found the functional groups of caffeine. Now I will find the smiles of p-bromobenzaldehyde.
Action: Get smiles of molecule
Action Input: p-bromobenzaldehyde
Observation: O=Cc1ccc(Br)cc1
Thought: Now I have the smiles of p-bromobenzaldehyde, I will find the functional groups of p-bromobenzaldehyde.
Action: Get functional groups
Action Input: O=Cc1ccc(Br)cc1
Observation: This molecule contains aldehydes, ketones, and halogens.

In [21]:
print(final_answer)

The common functional group among caffeine, p-bromobenzaldehyde, and cyclosarin is ketones. Ketones are a functional group with the structure R−C(=O)−R', where R and R' can be a variety of carbon-containing substituents. Ketones contain a carbonyl group −C(=O)− (which contains a carbon-oxygen double bond C=O). Many ketones are of great importance in biology and in industry, such as sugars (ketoses), steroids (e.g., testosterone), and the solvent acetone.
