<img src="https://www.rp.edu.sg/images/default-source/default-album/rp-logo.png" width="200" alt="Republic Polytechnic"/>

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/koayst-rplesson/SST_DP2025/blob/main/Day_02/L09/L09.ipynb)

# Setup and Installation

You can run this Jupyter notebook either on your local machine or on Google Colab.

* For a local machine, it is recommended to install Anaconda and create a new development environment called `SST_DP2025`.
* Install the libraries listed below with pip or conda when necessary.
---

# <font color='red'>ATTENTION</font>

## Google Colab
- If you are running this code in Google Colab, **DO NOT** store the API Key in a text file and load the key later from Google Drive. This is insecure and will expose the key.
- **DO NOT** hard code the API Key directly in the Python code, even though it might seem convenient for quick development.
- You need to enter the API key when prompted by the Python call `getpass.getpass()`.

## Local Environment/Laptop
- If you are running this code locally on your laptop, you can create a file named env.txt and store the API key there.
- Make sure env.txt is in the same directory as this Jupyter notebook.
- You need to install `python-dotenv` and run the Python code below to load the API key.

---
```
%pip install python-dotenv

import os
from dotenv import load_dotenv

# load the variables defined in env.txt into the environment
load_dotenv('env.txt')
openai_api_key = os.getenv('OPENAI_API_KEY')
```
---

## GitHub/GitLab
- **DO NOT** `commit` or `push` API Key to services like GitHub or GitLab.



# Lesson 09

- LangChain is a framework built around LLMs.
- The framework is offered as a Python or JavaScript (TypeScript) package.
- Use it to build chatbots, Generative Question-Answer (GQA), summarization and much more.
- Core idea is to “chain” together different components to create more advanced use cases around LLMs.
- Provides developers with a set of tools for combining multiple prompts and LLM calls.

In [None]:
%%capture --no-stderr
%pip install --quiet -U langchain
%pip install --quiet -U langgraph
%pip install --quiet -U langchain-openai

In [None]:
# langchain        0.3.11
# langgraph        0.2.59
# langchain-core   0.3.24
# langchain-openai 0.2.12
# openai           1.57.2
# pydantic         2.10.3

In [None]:
import getpass
import os

# setup the OpenAI API Key

# get the OpenAI API key ready and enter it when asked
os.environ["OPENAI_API_KEY"] = getpass.getpass()
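The key-handling advice above can be sketched as a small helper that reads the key from the environment first and only prompts when it is missing. This is a minimal sketch: the function name `get_api_key` and its `env` and `ask` parameters are hypothetical and exist only so that the example can run without typing or exposing a real key.

```python
import getpass
import os


def get_api_key(name="OPENAI_API_KEY", env=os.environ, ask=getpass.getpass):
    """Return the key from `env` if present; otherwise prompt for it and cache it."""
    key = env.get(name)
    if not key:
        # Fall back to an interactive, non-echoing prompt (the Colab path).
        key = ask(f"Enter {name}: ")
        env[name] = key
    return key


# Demonstration with a plain dict and a stand-in prompt, so no real key is involved.
demo_env = {}
first = get_api_key(env=demo_env, ask=lambda prompt: "sk-demo-not-a-real-key")
second = get_api_key(env=demo_env, ask=lambda prompt: "never-called")
print(first == second)  # True: the second call reads the cached value
```

In the notebook itself you would call `get_api_key()` with no arguments, so that it uses `os.environ` and `getpass.getpass`.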

## A Simple LLM Application
We will be using [Chat Models](https://python.langchain.com/v0.2/docs/concepts/#chat-models). A chat model takes a sequence of messages as input and returns chat messages as output.

LangChain does not host any of the chat models. It depends on [third party](https://python.langchain.com/docs/integrations/chat/) LLM model providers. We will be using ChatOpenAI due to its popularity and performance. There are a few standard parameters that we can set with the chat models. The two most common are:
- `model`: the name of the model
- `temperature`: the sampling temperature

`temperature` controls the randomness or creativity of the model's output. A low temperature (close to 0) gives more deterministic and focused outputs. A high temperature (close to 1) is good for creative tasks or generating varied responses.
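To make the effect of `temperature` concrete, here is a minimal sketch of temperature-scaled softmax, the general idea behind temperature sampling. It illustrates the concept only and is not the provider's actual implementation. The scores in `logits` are made up, and the sketch assumes a strictly positive temperature.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability goes to the top-scoring token. At a higher temperature the distribution flattens, so other tokens are sampled more often.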

Chat models in LangChain have a number of [default methods](https://python.langchain.com/v0.2/docs/concepts/#runnable-interface). Most of the time, we will be using:
- `stream`: stream back chunks of response
- `invoke`: call the chain on the input
  
Chat models take messages as input. Messages have a role that describes who is saying the message and a content property. We get an `AIMessage` response upon invoking the model with messages.

We can also invoke a chat model with a string. The string is converted to `HumanMessage` and then passed to the model for processing. This interface is consistent across all chat models and models are typically initialised once at the start of each notebook.

In [None]:
# load langchain libraries
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

In [None]:
# create the chat LLM model
chat_model = ChatOpenAI(
    # don't need this if the OpenAI API Key is stored in the environment variable
    #openai_api_key="sk-proj-xxxxxxxxx",

    # TODO: we will be using gpt-4o-mini
    model = '_____'
)

In [None]:
# setup message prompt
text = "What date is Singapore National Day?"
messages = [HumanMessage(content=text)]

In [None]:
# note that a chat model takes message objects as input and generates a message object as output

# TODO: invoke the LLM model
response = chat_model._____(messages)
print(response.content)

## A Simple LLM Application With Prompt Template

**Purpose of Prompt Template**
- Parameterized templates that dynamically generate specific prompts based on user input or other variables.
- By using variables in the template, you can adapt the prompt to specific scenarios without rewriting the entire prompt.
- Keeps your application logic (like API calls or response handling) separate from the prompt text, improving code readability and maintainability.
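The points above can be sketched with plain Python string formatting. This is a sketch of the idea only, not of LangChain's `ChatPromptTemplate`. The helper `build_messages` is hypothetical and simply shows one template being reused for different inputs while the calling code stays unchanged.

```python
SYSTEM_TEMPLATE = "You are a helpful assistant that translates {input_language} to {output_language}."


def build_messages(input_language, output_language, text):
    """Fill the template and return (role, content) pairs ready to send."""
    system = SYSTEM_TEMPLATE.format(
        input_language=input_language, output_language=output_language
    )
    return [("system", system), ("human", text)]


# The same template serves both requests; only the variables change.
for target in ("French", "Chinese"):
    print(build_messages("English", target, "I love programming."))
```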

In [None]:
# load langchain libraries
from langchain_openai import ChatOpenAI
from langchain_core.prompts.chat import ChatPromptTemplate

In [None]:
# initialise ChatModel with API key
chat_model = ChatOpenAI(
    model = 'gpt-4o-mini', 

    # TODO: set the temperature to zero point three
    temperature = _____
)

In [None]:
# Prompt template takes in raw user input ({input_language} and {output_language}) and 
# return a prompt that is ready to pass into a language model

system_template = "You are a helpful assistant that translates {input_language} to {output_language}."

# TODO: human_template defines the structure for the human message
# "text" is the placeholder for user-provided input
human_template = "{_____}"
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", system_template),
    ("human", human_template),
])

In [None]:
# translate English to French

# TODO: fill in the appropriate language strings for the input and output languages
messages = chat_prompt.format_messages(
    input_language = "_____", 
    output_language = "_____", 
    text = "I love programming."
)

response = chat_model.invoke(messages).content
print(response)

In [None]:
# translate English to Chinese

# TODO: fill in the appropriate language strings for the input and output languages
messages = chat_prompt.format_messages(
    input_language = "_____", 
    output_language = "_____", 
    text = "I love programming."
)

# TODO: invoke the model
response = chat_model._____(messages).content
print(response)

## Streaming

The `Runnable` interface is the foundation for working with LangChain components. A `Runnable` is a unit of work that can be invoked, batched, streamed, transformed and composed.

**Key Methods - (Synchronous/Asynchronous):**
- `invoke`/`ainvoke`: Transforms a single input into an output.
- `batch`/`abatch`: Transforms multiple inputs into outputs.
- `stream`/`astream`: Outputs are streamed as they are produced.

Streaming is typically used when interfacing with language models to enable real-time or chunked responses. The `stream` method lets the model return its output piece by piece (e.g., word by word or token by token) instead of waiting for the entire response to be generated.
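The difference between streaming and a single blocking call can be sketched with a Python generator. The functions `fake_stream` and `fake_invoke` are hypothetical stand-ins for a model. No API call is made, and the reply text is invented for the demonstration.

```python
import time


def fake_stream(text, delay=0.0):
    """Yield the reply one word at a time, the way a streaming call yields chunks."""
    for word in text.split(" "):
        time.sleep(delay)  # stands in for the model's generation latency
        yield word + " "


def fake_invoke(text):
    """Return the whole reply at once, only after every chunk is ready."""
    return "".join(fake_stream(text)).rstrip()


reply = "Streaming lets the reader see text as soon as it is produced."

# Each chunk is printed as soon as it arrives, without a newline in between.
for chunk in fake_stream(reply):
    print(chunk, end="", flush=True)
print()
```

Setting `delay` to a small positive value (for example `0.2`) makes the piece-by-piece behaviour visible when you run the cell.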

In [None]:
messages = chat_prompt.format_messages(
    input_language = "English", 
    output_language = "English", 
    text = "What is teh-c siew dai?"
)

In [None]:
# observation:
# the output is streamed/printed piece by piece (eg. word by word or token by token)

# TODO: we would like to stream the output
for token in chat_model._____(messages):
    print(token.content, end="")

## Observation

- From the sample code you have just run, you have learned how to create your first simple LLM application.
- You learned how to work with language model(s) and how to create a prompt template.

## PromptTemplate And LangChain Expression Language (LCEL)

In [None]:
# load langchain libraries
from langchain_openai import ChatOpenAI
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.output_parsers.string import StrOutputParser

In [None]:
# gpt-4 does not have an 'instruct' model
# Chat models like gpt-4 and gpt-4-turbo are already instruction-following models.
# They are designed to interpret and respond effectively to instructions provided in the conversation.

llm = ChatOpenAI(
    model = 'gpt-4o-mini',
    temperature = 0.7,
)

# TODO: parse the output of a language model and convert it to a string
output_parser = _____()

In [None]:
# setup template 

human_template = "Write {lines} sentences about {topic}."
prompt = ChatPromptTemplate.from_template(human_template)

lines_topic_dict = {
    "lines" : "3", 
    "topic": "Sir Stamford Raffles"
}

In [None]:
# Without piping to StrOutputParser
# StrOutputParser is an output parser that turns the model result into its most likely string

# TODO: prompt -> llm
lcel_chain_01 = _____ | _____

lcel_chain_01.invoke(lines_topic_dict)

In [None]:
# Pipe to StrOutputParser

# TODO: prompt -> llm -> string output parser
lcel_chain_02 = _____ | _____ | _____

lcel_chain_02.invoke(lines_topic_dict)
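The `|` operator used in the cells above can be pictured as ordinary function composition. The sketch below shows how such a pipe could be built with Python's `__or__` method. The `Step` class and the three stand-in steps are hypothetical and are not LangChain's own classes.

```python
class Step:
    """A tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for prompt -> model -> parser.
fill_prompt = Step(lambda d: f"Write {d['lines']} sentences about {d['topic']}.")
fake_model = Step(lambda prompt: {"content": prompt.upper()})
to_text = Step(lambda message: message["content"])

chain = fill_prompt | fake_model | to_text
result = chain.invoke({"lines": "3", "topic": "Sir Stamford Raffles"})
print(result)
```

Without the last step, the chain would return the message-like dictionary. With it, the chain returns a plain string, which mirrors the difference between the two chains above.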

## Template

### Prompt Template
Prompt templates take as input a dictionary where each key represents a variable for the prompt template to fill in.

In [None]:
from langchain_core.prompts.prompt import PromptTemplate

# TODO: create a prompt template with "sport" as the placeholder
prompt = _____(
    input_variables = ["sport"],
    template = "I love playing {_____}"
)

prompt.invoke({"sport":"table-tennis"})

Usually you will initialise prompts using `from_template`.

In [None]:
# TODO: create a prompt template from template
# the two placeholders are "country" and "adjective"
prompt = PromptTemplate._____(
    "I love visiting {_____} because of its {_____}"
)

prompt.invoke({
    "country" : "Japan",
    "adjective" : "scenery"
})

### MessagesPlaceholder
A `MessagesPlaceholder` is responsible for adding a list of messages at a particular place in a prompt template. If the user wants to pass in a list of messages to be slotted into a particular spot, we can use `MessagesPlaceholder`.

In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage

prompt = ChatPromptTemplate([
    ("system", "You are a helpful assistant"),

    # place Human Message after System Message
    MessagesPlaceholder("messages")
])

prompt.invoke({"messages" : [HumanMessage(content="Hi! My name is John")]})

## LangChain Parsers / Structured Output

LangChain API Reference: [LangChain output_parser](https://python.langchain.com/api_reference/langchain/output_parsers.html)

### CommaSeparatedListOutputParser

In [None]:
from langchain_core.output_parsers.list import CommaSeparatedListOutputParser
from langchain_core.prompts.prompt import PromptTemplate
from langchain_openai import ChatOpenAI

In [None]:
# TODO: initialise a comma separated list output parser
output_parser = _____()
format_instructions = output_parser.get_format_instructions()

In [None]:
prompt_template = PromptTemplate(
    template="List down 5 countries that start with the letter '{alphabet}'\n{format_instructions}",
    input_variables=["alphabet"],
    partial_variables={"format_instructions": format_instructions}
)

In [None]:
llm = ChatOpenAI(
    model = 'gpt-4o-mini',
    temperature = 0.2,
)

In [None]:
prompt = prompt_template.format(alphabet="S")

response = llm.invoke(prompt)
print(response.content)
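The cell above prints the raw reply as one string. The parser's job is to turn that string into a Python list. A minimal sketch of the idea in plain Python follows. The function `parse_comma_separated` and the sample reply are hypothetical, and this is not LangChain's implementation. In the notebook, calling `output_parser.parse(response.content)` would be the corresponding step.

```python
def parse_comma_separated(text):
    """Split a comma-separated reply into a clean list of items."""
    return [item.strip() for item in text.split(",") if item.strip()]


sample_reply = "Singapore, Spain, Sweden, Switzerland, Senegal"  # hypothetical model output
countries = parse_comma_separated(sample_reply)
print(countries)
```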

## JSON
JSON is a lightweight, human-readable format for representing structured data. LLMs often use JSON to exchange information in a structured and consistent way. JSON is commonly used to represent and transmit structured data, making it easier to process and use in programming tasks.

JSON can represent key-value pairs, lists, and nested objects, which are useful for structured outputs. Many APIs, including those of LLMs, send and receive data in JSON format. LLMs might generate JSON outputs to integrate with other systems, such as databases, web applications, or scripts.
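As a short illustration of those structures, the sketch below parses a JSON string with Python's standard `json` module and reads a key-value pair, a list item and a nested object. The reply string is made up for the example.

```python
import json

# A hypothetical reply string of the kind a model might return.
raw_reply = (
    '{"name": "Joe Doe", "age": 30, '
    '"skills": ["python", "sql"], "address": {"city": "Singapore"}}'
)

data = json.loads(raw_reply)       # JSON text -> Python dict
print(data["name"])                # key-value pair
print(data["skills"][0])           # list
print(data["address"]["city"])     # nested object

round_trip = json.dumps(data, indent=2)  # Python dict -> JSON text
print(round_trip)
```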

### Method 1:

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts.chat import ChatPromptTemplate

llm = ChatOpenAI(
    model = 'gpt-4o-mini',
    model_kwargs = {"response_format" : { "type": "json_object" } }
)

In [None]:
response = llm.invoke("""
   Return a JSON object with two variables. The "name" is "Joe Doe" and his "age" is "30".
""")

In [None]:
print(response.content)

### Method 2:

- JSON requires `{` and `}` for its syntax. Because the prompt template treats single braces as placeholders, escape literal braces by doubling them: `{{` and `}}`.
- `{name}` and `{age}` placeholders for variables remain single braces.
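The brace rule can be tried out with plain `str.format`, which follows the same convention of doubled braces for literals and single braces for placeholders. The template string below is a hypothetical, shortened version used only for the demonstration.

```python
import json

# Doubled braces are literal; single braces are placeholders.
template = 'Respond as JSON: {{"name": "{name}", "age": {age}}}'

filled = template.format(name="John Doe", age=30)
print(filled)

# The part after the instruction is now valid JSON.
parsed = json.loads(filled.split(": ", 1)[1])
print(parsed["age"])
```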

In [None]:
# TODO: enter appropriate curly braces below
prompt_template = PromptTemplate(
    input_variables=["name", "age"],
    template="""You are an assistant that formats responses in JSON.
Given the following inputs:
- Name: {name}
- Age: {age}

Respond with a JSON object in the following format:
_____
    "name": "value",
    "age": value
_____
"""
)

In [None]:
prompt = prompt_template.format(name="John Doe", age=30)

In [None]:
response = llm.invoke(prompt)
print(response.content)

## Schema Definition

- The output structure of a model response needs to be represented in some way.
- The simplest and most common format is a JSON-like structure, which you have just seen.
- The other method is to use `Pydantic`, as it allows you to define schemas using Python's type annotations.
- It validates that the LLM's output conforms to the expected structure, catching errors early.
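To show what "catching errors early" means, here is a minimal sketch that checks a JSON reply against an expected structure using only the standard library. It is a simplified stand-in for what a schema library such as Pydantic does. The `Reply` class and `parse_reply` function are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class Reply:
    answer: str
    followup_question: str


def parse_reply(raw):
    """Parse JSON text and check that the required fields are present and are strings."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field_name in ("answer", "followup_question"):
        if not isinstance(data.get(field_name), str):
            raise ValueError(f"missing or non-string field: {field_name}")
    return Reply(answer=data["answer"], followup_question=data["followup_question"])


good = parse_reply('{"answer": "42", "followup_question": "Why?"}')
print(good.answer)

try:
    parse_reply('{"answer": "42"}')  # followup_question is missing
except ValueError as err:
    print("rejected:", err)
```

A malformed reply is rejected at the parsing step, before the rest of the application tries to use it.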

### Method 1: (Tool)
Tool calling refers to the mechanism where the language model interacts with external tools or functions to enhance its capabilities. This approach allows the model to perform actions, retrieve specific information, or execute tasks that go beyond its built-in knowledge and reasoning.

For a model to be able to call tools, we need to pass in tool schemas that describe what the tool does and what its arguments are.

In [None]:
from pydantic import BaseModel, Field

# TODO: ResponseFormatter inherits from BaseModel 
# BaseModel is the foundational class in the pydantic library
class ResponseFormatter(BaseModel):
    answer: str = Field(description = "The answer to the user's question")
    followup_question: str = Field(description = "A followup question the user could ask")

In [None]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model = "gpt-4o-mini", 
    temperature = 0
)

# TODO: bind response formatter schema as a tool to the model
model_with_tools = model.bind_tools([_____])

# Invoke the model
response = model_with_tools.invoke("What was the original colour of the Hulk in his first comic appearance?")

In [None]:
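# parse the JSON-encoded tool-call arguments to extract followup_question
# (json.loads is safer than eval, which would execute text returned by the model)
import json

followup = json.loads(response.additional_kwargs['tool_calls'][0]['function']['arguments'])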

In [None]:
followup['followup_question']

### Method 2: (Pydantic)

Specify a JSON schema and query the LLM for JSON outputs that conform to that schema.

Declare the data model with Pydantic. Extra validation can optionally be added with a decorator such as `@field_validator` (`@validator` in Pydantic v1).

In [None]:
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.output_parsers.pydantic import PydanticOutputParser
from langchain_core.prompts.chat import SystemMessagePromptTemplate

# TODO: ResponseFormatter inherits from BaseModel. 
# BaseModel is the foundational class in the pydantic library
class ResponseFormatter(BaseModel):
    answer: str = Field(description = "The answer to the user's question")
    followup_question: str = Field(description = "A followup question the user could ask")

parser = PydanticOutputParser(pydantic_object=ResponseFormatter)

In [None]:
model = ChatOpenAI(
    model = "gpt-4o-mini", 
    temperature=0
)

In [None]:
template = "Answer the user query.\n{format_instructions}\n{query}\n"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
prompt = ChatPromptTemplate.from_messages([system_message_prompt])

messages = prompt.format_prompt(
    format_instructions=parser.get_format_instructions(),
    query = "What was the original colour of the Hulk in his first comic appearance?"
).to_messages()

In [None]:
# Invoke the model
response = model.invoke(messages)

In [None]:
# use the parser to extract the answer and followup question

output = parser.parse(response.content)

In [None]:
output.answer

In [None]:
output.followup_question

### Method 3: (ResponseSchema)

Schema for a response from a structured output parser.

In [None]:
from langchain.output_parsers.structured import ResponseSchema
from langchain.output_parsers.structured import StructuredOutputParser

In [None]:
# setup the schemas

# TODO: create the response schemas
answer_schema = _____(
    name = "answer",
    description = "The answer to the user's question"
)

followup_question_schema = _____(
    name = "followup_question",
    description = "A follow up question the user could ask"
)

# TODO: set up a list of schema just created
response_schema = [
    _____,
    _____
]

In [None]:
# setup the output parser
output_parser = StructuredOutputParser.from_response_schemas(response_schema)

format_instructions = output_parser.get_format_instructions()

# print the formatting instruction to understand the output format
print(format_instructions)

In [None]:
template = "Answer the user query.\n{format_instructions}\n{query}\n"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
prompt = ChatPromptTemplate.from_messages([system_message_prompt])

messages = prompt.format_prompt(
    format_instructions=format_instructions,
    query = "What was the original colour of the Hulk in his first comic appearance?"
).to_messages()

In [None]:
# invoke the model
response = model.invoke(messages)

In [None]:
print(response.content)
print('-'*10)

# the content is of type 'str'
print(type(response.content))

In [None]:
# parse the output into a dict

dict_response = output_parser.parse(response.content)

In [None]:
print(dict_response)
print('-'*10)

# the parsed result is of type 'dict'
print(type(dict_response))

## Memory Persistence
- LLMs are stateless.
- Each incoming query is processed independently.
- Memory allows an LLM to remember previous interactions.

According to LangChain documentation, it is recommended to take advantage of LangGraph persistence to incorporate `memory` into new LangChain applications.

- [Build a Chatbot](https://python.langchain.com/docs/tutorials/chatbot/#installation)
- [How to add thread-level persistence to your graph](https://langchain-ai.github.io/langgraph/how-tos/persistence/)

The `ConversationBufferMemory`, `ConversationBufferWindowMemory`, `ConversationTokenBufferMemory` and `ConversationSummaryBufferMemory` classes have been deprecated since version 0.3.1.

In [None]:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

In [None]:
model = ChatOpenAI(model="gpt-4o-mini")

In [None]:
# define the function that calls the model
def call_model(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": response}

In [None]:
from IPython.display import Image, display

# visualize the graph using the get_graph method and one of the "draw" methods, like draw_ascii or draw_png

def drawGraph(graph):
    try:
        display(Image(graph.get_graph().draw_mermaid_png()))
    except Exception:
        # This requires some extra dependencies and is optional
        pass

### Without Persistence

In [None]:
# define a new graph
workflow = StateGraph(state_schema=MessagesState)

# define the (single) node in the graph
# TODO: fill in the blanks
workflow.add_node("model", call_model)
workflow.add_edge(START, "_____")
app = workflow.compile()

In [None]:
# draw the graph
drawGraph(app)

In [None]:
query = "Hi! I'm John."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages})

In [None]:
output['messages'][-1].pretty_print()

In [None]:
output['messages'][-1].content

In [None]:
query = "What is my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages})

In [None]:
output['messages'][-1].pretty_print()

### With Persistence

- To add in persistence, we need to pass in a `Checkpointer` when compiling the graph.
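The role of the checkpointer and the thread ID can be sketched in plain Python. The example stores one message history per thread and replays it on every call. `ThreadMemory`, `fake_model` and `chat` are hypothetical names, and the "model" is a hard-coded stand-in rather than an LLM. This is not how LangGraph is implemented, only a sketch of the idea.

```python
from collections import defaultdict


class ThreadMemory:
    """Keeps one message history per thread ID, like a very small checkpointer."""

    def __init__(self):
        self._histories = defaultdict(list)

    def load(self, thread_id):
        return list(self._histories[thread_id])

    def save(self, thread_id, messages):
        self._histories[thread_id] = list(messages)


def fake_model(messages):
    """Stand-in for an LLM: it can only 'know' what is in `messages`."""
    last = messages[-1]
    if "my name" in last.lower():
        for text in messages:
            if text.startswith("Hi! I'm "):
                return "Your name is " + text[len("Hi! I'm "):].rstrip(".") + "."
        return "I don't know your name."
    return "Hello!"


def chat(memory, thread_id, user_text):
    """Replay the saved history, add the new message, then save the updated history."""
    history = memory.load(thread_id) + [user_text]
    reply = fake_model(history)
    memory.save(thread_id, history + [reply])
    return reply


memory = ThreadMemory()
chat(memory, "1", "Hi! I'm John.")
print(chat(memory, "1", "What's my name?"))  # same thread: earlier messages are replayed
print(chat(memory, "2", "What's my name?"))  # new thread: no earlier messages
```

The second question is only answerable in thread `"1"`, because that is the only thread whose history contains the introduction.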

In [None]:
# Add memory
# TODO: initialise memory saver
memory = _____()
# TODO: pass in the memory checkpointer
app = workflow.compile(checkpointer=_____)

In [None]:
# draw the graph
drawGraph(app)

In [None]:
# TODO: initialise an arbitrary thread ID of 1
config = {"configurable": {"thread_id": "_____"}}

In [None]:
query = "Hi! I'm John."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)

In [None]:
output['messages'][-1].pretty_print()

In [None]:
output['messages'][-1].content

In [None]:
query = "What's my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)

In [None]:
output['messages'][-1].pretty_print()