# Create a Large Language Model (LLM) object

You can use any LLM supported by LangChain or a third-party package, or create your own class.

If you use OpenAI, then you need to set OPENAI_API_KEY as an environment variable.
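
It can help to check for the variable before constructing the model, so that a missing key produces a clear message. The sketch below uses only the standard library; `require_env` is a hypothetical helper, not part of LangChain or the OpenAI client, and the key shown is a placeholder.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the LLM object.")
    return value


# Demonstration with a stand-in mapping so the example runs without a real key.
fake_env = {"OPENAI_API_KEY": "sk-example"}
print(require_env("OPENAI_API_KEY", fake_env))
```

In the notebook itself, calling `require_env("OPENAI_API_KEY")` with no second argument would read the real environment.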

In [1]:
from langchain_openai import ChatOpenAI
from rich.console import Console

console = Console()
llm = ChatOpenAI(model="gpt-4o-mini")

## Basic LLM prompts

The LLM object can respond to a plain-text prompt.
While we are trying it out, let's also probe the model's knowledge cutoff.

In [2]:
question = "Who will run in the 2024 US Presidential election?"
console.print(llm.invoke(question))

# Creating a chain

You might notice that the LLM returns a large `AIMessage` object that carries metadata alongside the text. If we only want the text as a string, we can use the `StrOutputParser` class.

You string components together into a chain with the pipe symbol `|`; each component's output becomes the next component's input.

In [3]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()
chain = llm | parser
console.print(chain.invoke(question))
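
To see why `|` can build a chain, here is a minimal sketch of the idea in plain Python. It is not LangChain's implementation: `Step`, `fake_llm` and `fake_parser` are hypothetical stand-ins, and the "LLM" simply echoes its prompt.

```python
from typing import Any, Callable


class Step:
    """A hypothetical stand-in for a chain component; not LangChain's Runnable."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs `a`, then feeds its output to `b`.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for `llm` and `parser`: one returns a message-like dict,
# the other pulls the text out of it.
fake_llm = Step(lambda prompt: {"content": f"Echo: {prompt}", "metadata": {}})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_llm | fake_parser
print(toy_chain.invoke("hello"))  # prints "Echo: hello"
```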

## Prompt templates

To create a prompt template, use the `PromptTemplate` class. A template contains named placeholders, such as `{topic}`, that are filled in when the chain is invoked.

This is useful when you want to reuse the same prompt with different values.

In [4]:
from langchain_core.prompts.prompt import PromptTemplate

template = PromptTemplate.from_template("Write a couplet about {topic}.")
chain = template | llm | parser
console.print(chain.invoke("love"))
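
The placeholder syntax follows Python's `str.format` conventions, which matches the default `f-string` template format of `from_template`. As a rough standard-library analogy (not LangChain code), the following sketch lists the placeholders in a template and fills them in:

```python
from string import Formatter

template_text = "Write a couplet about {topic}."

# Names of the placeholders found in the template.
placeholders = [name for _, name, _, _ in Formatter().parse(template_text) if name]
print(placeholders)  # ['topic']

# Filling the placeholder produces the final prompt text.
filled = template_text.format(topic="love")
print(filled)  # Write a couplet about love.
```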

# Batch processing

You can also process several inputs at once with `batch`, which returns the results in the same order as the inputs.

Behind the scenes, LangChain runs the calls in parallel to speed up processing.

In [5]:
topics = ["cats", "dogs", "birds"]
couplets = chain.batch(topics)
for topic, couplet in zip(topics, couplets):
    console.print(f"[bold]Couplet about {topic}:[/bold]\n{couplet}")
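
The parallelism can be pictured with a thread pool from the standard library. This is only a sketch of the general technique, assuming the calls are I/O-bound; `fake_couplet` is a hypothetical stand-in for one LLM call and no request is sent.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_couplet(topic: str) -> str:
    """Stand-in for one LLM call; the sleep mimics network latency."""
    time.sleep(0.1)
    return f"A couplet about {topic}"


demo_topics = ["cats", "dogs", "birds"]

# `map` runs the calls concurrently but yields results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    demo_couplets = list(pool.map(fake_couplet, demo_topics))

for topic, couplet in zip(demo_topics, demo_couplets):
    print(f"{topic}: {couplet}")
```

If rate limits are a concern, passing `config={"max_concurrency": 2}` to `chain.batch` should cap the number of parallel calls.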

# Prompt structure
Chat and instruct LLMs are often fine-tuned on conversations that use a `System` role, a `Human` role and an `AI` role.
The `System` role provides the context.
The `Human` role provides the input.
The `AI` role provides the output from the LLM.

In [6]:
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate([
    ("system", "You are a 5 year old kid."),
    ("human", "Write a couplet about {topic}."),
    ("ai", "Sure, here it is:"),
])

chain = template | llm | parser
console.print(chain.invoke("love"))
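
Roughly, the template above becomes a list of role-tagged messages before it reaches the model. The sketch below mirrors the `role`/`content` shape used by the OpenAI chat API, where the human role is called `user` and the AI role `assistant`. `render_messages` and `ROLE_NAMES` are hypothetical names, and LangChain's own conversion is more involved.

```python
from typing import Dict, List, Tuple

# Role names used in the notebook mapped to the names the OpenAI chat API expects.
ROLE_NAMES = {"system": "system", "human": "user", "ai": "assistant"}


def render_messages(pairs: List[Tuple[str, str]], **values: str) -> List[Dict[str, str]]:
    """Fill placeholders and convert (role, text) pairs to role/content dicts."""
    return [
        {"role": ROLE_NAMES[role], "content": text.format(**values)}
        for role, text in pairs
    ]


messages = render_messages(
    [
        ("system", "You are a 5 year old kid."),
        ("human", "Write a couplet about {topic}."),
        ("ai", "Sure, here it is:"),
    ],
    topic="love",
)
for message in messages:
    print(message)
```

Ending the list with an `ai` message nudges the model to continue from that opening line.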

# Reading from an XML file
Let's build a chain that reads an XML file, extracts the abstract and summarises the content.

First, let's write a function that extracts the abstract from an XML file. It expects one file per article, named `<pmid>.xml`, in a `pubmed` directory.

In [9]:
from pathlib import Path
from bs4 import BeautifulSoup

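pubmed_dir = Path("pubmed")  # directory holding one <pmid>.xml file per article


def read_abstract(pmid: str) -> str:
    """Return the abstract text of the PubMed article with the given ID."""
    path = pubmed_dir / f"{pmid}.xml"
    xml_data = path.read_text()
    # The "xml" parser requires the lxml package to be installed.
    soup = BeautifulSoup(xml_data, "xml")
    # Structured abstracts have several <AbstractText> sections; join them.
    abstract_text = " ".join(tag.get_text() for tag in soup.find_all("AbstractText"))
    return abstract_text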

console.print(read_abstract("22883424"))
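
To show what the extraction does without needing the `pubmed` directory, here is a small standard-library sketch. It swaps BeautifulSoup for `xml.etree.ElementTree`, and the XML is an invented miniature record that only imitates the layout of a PubMed article.

```python
import xml.etree.ElementTree as ET

# Invented miniature record that imitates the layout of a PubMed article;
# real files contain many more elements.
sample_xml = """
<PubmedArticle>
  <Article>
    <ArticleTitle>Example title</ArticleTitle>
    <Abstract>
      <AbstractText Label="BACKGROUND">First section.</AbstractText>
      <AbstractText Label="METHODS">Second section with <i>markup</i>.</AbstractText>
    </Abstract>
  </Article>
</PubmedArticle>
"""

root = ET.fromstring(sample_xml)
# Collect the text of every <AbstractText> element, including nested tags.
sections = ["".join(element.itertext()) for element in root.iter("AbstractText")]
abstract = " ".join(sections)
print(abstract)  # First section. Second section with markup.
```

The title is skipped because only `AbstractText` elements are collected, which is the same selection `read_abstract` makes.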

# Use a function with a chain

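You can use a plain Python function as a step in a chain. When the function is piped together with LangChain components, LangChain wraps it so that its return value becomes the input of the next step.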

In [11]:
template = ChatPromptTemplate([
    ("system", "You are a bioinformatician with experience in statistics and data science."),
    ("human", "Read the abstract and summarize the methodology used in around 50 words:\n{abstract}"),
    ("ai", "Certainly, here is a summary of the methodology:"),
])
chain = read_abstract | template | llm | parser
console.print(chain.invoke("22883424"))
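
A plain function has no `|` operator of its own, so `read_abstract | template` works only because the object on the right handles the operation through Python's reflected operator `__ror__`. The sketch below illustrates that mechanism with hypothetical names (`Step`, `load_text`, `shout`); it is not LangChain's code.

```python
from typing import Any, Callable


class Step:
    """Hypothetical chain component used only for illustration."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __ror__(self, other: Callable[[Any], Any]) -> "Step":
        # Called for `other | self` when `other` (a plain function) has no `|` support.
        return Step(lambda value: self.invoke(other(value)))


def load_text(identifier: str) -> str:
    """Stand-in for `read_abstract`: returns text for an identifier."""
    return f"abstract for {identifier}"


shout = Step(str.upper)
function_first_chain = load_text | shout
print(function_first_chain.invoke("123"))  # prints "ABSTRACT FOR 123"
```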

In [17]:
pmids = [path.stem for path in pubmed_dir.glob("*.xml")]
summaries = chain.batch(pmids)
for pmid, summary in zip(pmids, summaries):
    console.print(f"[bold]Summary of {pmid}:[/bold]\n{summary}")

# Chain of Thought (CoT)

Let's create a chain-of-thought prompt: the chain reads an XML file, extracts the abstract and asks the model to reason step by step before answering a question.

In [19]:
template = ChatPromptTemplate([
    ("system", "You are a knowledgeable bioinformatician specializing in statistics and data science."),
    ("human", 
        "Analyze the abstract and provide a step-by-step explanation of whether the article explicitly or implicitly uses Bayesian inference in its methodology. "
        "Focus on identifying key statistical methods and reasoning patterns:\n{abstract}"
    ),
    ("ai", "Certainly, here is my step-by-step analysis:"),
])
chain = read_abstract | template | llm | parser
console.print(chain.invoke("22883424"))

# Multiple LLM calls
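You can combine multiple LLM calls in one chain. Here, the step-by-step analysis from the previous chain is passed to a second prompt that reduces it to a yes/no answer, which `yes_no_parser` converts to a boolean.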

In [22]:
yes_no_template = ChatPromptTemplate([
    ("system", "You are a critical evaluator of text."),
    ("human", "Based on the following step-by-step analysis, did it affirmatively conclude that Bayesian inference was used? Answer with 'Y' or 'N' only:\n{analysis}"),
    ("ai", "Certainly, here is my response:"),
])

def yes_no_parser(text: str) -> bool:
    """Return True only if the reply is exactly 'Y' or 'y' after stripping whitespace."""
    return text.strip().lower() == "y"

complex_chain = chain | yes_no_template | llm | StrOutputParser() | yes_no_parser
complex_chain.invoke("22883424")

True
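
The strict comparison in `yes_no_parser` returns `False` for any reply that is not exactly `y` after stripping, including `Y.` or `Yes`. A more forgiving variant is sketched below, under the assumption that unexpected replies should raise an error instead of silently counting as "no". `tolerant_yes_no` is a hypothetical name.

```python
import re


def tolerant_yes_no(text: str) -> bool:
    """Return True for answers such as 'Y', 'y.', "'Y'" or 'Yes'."""
    # Keep only letters so quotes, periods and whitespace are ignored.
    cleaned = re.sub(r"[^a-z]", "", text.lower())
    if cleaned in {"y", "yes"}:
        return True
    if cleaned in {"n", "no"}:
        return False
    raise ValueError(f"Unexpected answer: {text!r}")


for reply in ["Y", " y.\n", "'N'", "Yes"]:
    print(repr(reply), tolerant_yes_no(reply))
```

It could replace `yes_no_parser` as the last step of `complex_chain` without other changes.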

# Forking chains
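Now, let's fork the chain so that a single call returns both the model's step-by-step justification and a boolean that says whether the article used Bayesian inference. Piping a chain into a dictionary runs each entry on the same input and returns a dictionary of results.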

In [25]:
from langchain_core.runnables import RunnablePassthrough

forked_chain = chain | dict(
    is_bayesian=RunnablePassthrough() | yes_no_template | llm | StrOutputParser() | yes_no_parser,
    justification=RunnablePassthrough(),
)

console.print(forked_chain.invoke("22883424"))
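
The shape of the forked result can be illustrated without any LLM calls. In this sketch, `fork` and `toy_fork` are hypothetical names, the branches are simple stand-in functions, and they run one after another, whereas LangChain may run them concurrently.

```python
from typing import Any, Callable, Dict


def fork(branches: Dict[str, Callable[[Any], Any]]) -> Callable[[Any], Dict[str, Any]]:
    """Build a function that sends one input to every branch and collects the results."""

    def run(value: Any) -> Dict[str, Any]:
        return {name: branch(value) for name, branch in branches.items()}

    return run


# Stand-ins for the two branches: a keyword check and a passthrough.
toy_fork = fork({
    "is_bayesian": lambda analysis: "bayesian" in analysis.lower(),
    "justification": lambda analysis: analysis,
})

result = toy_fork("The authors fit a Bayesian hierarchical model.")
print(result)
```

The real `forked_chain` returns a dictionary with the same two keys: `is_bayesian` holds the boolean and `justification` holds the unmodified analysis text.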