<img src="https://www.rp.edu.sg/images/default-source/default-album/rp-logo.png" width="200" alt="Republic Polytechnic"/>

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/koayst-rplesson/SDGAI_LLMforGenAIApp_Labs/blob/main/L09/L09.ipynb)

# Setup and Installation

You can run this Jupyter notebook either on your local machine or on Google Colab.

* For a local machine, it is recommended to install Anaconda and create a new development environment called `c3669c`.
* Use pip or conda to install the libraries listed below when necessary.
---

# <font color='red'>ATTENTION</font>

## Google Colab
- If you are running this code in Google Colab, **DO NOT** store the API Key in a text file and load the key later from Google Drive. This is insecure and can expose the key.
- **DO NOT** hard code the API Key directly in the Python code, even though it might seem convenient for quick development.
- Enter the API Key when the Python call `getpass.getpass()` asks for it.

## Local Environment/Laptop
- If you are running this code locally on your laptop, you can create an `env.txt` file and store the API Key there.
- Make sure `env.txt` is in the same directory as this Jupyter notebook.
- You need to install `python-dotenv` and run the Python code below to load the API Key.

---
```
%pip install python-dotenv

import os

from dotenv import load_dotenv

# load the variables defined in env.txt into the process environment
load_dotenv('env.txt')
openai_api_key = os.getenv('OPENAI_API_KEY')
```
---
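
If `env.txt` is missing or does not define the key, `os.getenv` returns `None`, and the problem tends to surface only later when the model is set up. A minimal sketch of one way to guard against that by falling back to an interactive prompt. `resolve_api_key` and its parameters are hypothetical names introduced here for illustration; they are not part of `python-dotenv`.

```python
import getpass
import os


def resolve_api_key(name="OPENAI_API_KEY", env=os.environ, ask=getpass.getpass):
    """Return the key from the environment, or prompt for it if it is missing."""
    key = env.get(name)
    if not key:
        # Nothing was loaded (for example env.txt was not found), so ask for it.
        key = ask(f"Enter {name}: ")
        env[name] = key
    return key


# Demonstration with a fake environment and a fake prompt, so nothing real is read.
demo_env = {}
key = resolve_api_key(env=demo_env, ask=lambda prompt: "sk-demo-not-a-real-key")
print(key == demo_env["OPENAI_API_KEY"])
```

In the notebook this could be called as `openai_api_key = resolve_api_key()` after `load_dotenv`, which keeps the key out of the source code in both the Colab and the local case.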

## GitHub/GitLab
- **DO NOT** `commit` or `push` API Key to services like GitHub or GitLab.



# Lesson 09

- LangChain is a framework for building applications around LLMs.
- It is offered as a Python or JavaScript (TypeScript) package.
- Use it to build chatbots, Generative Question-Answering (GQA), summarization and much more.
- The core idea is to “chain” together different components to create more advanced use cases around LLMs.
- It provides developers with a set of tools for combining multiple prompts and LLM calls.
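
The chaining idea can be illustrated without LangChain at all. The sketch below is a toy, not LangChain's implementation: `Step`, `fake_llm` and the lambdas are hypothetical stand-ins that only show how the `|` operator can join components so that each one's output feeds the next.

```python
class Step:
    """Wraps a function so that steps can be joined with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Build a new step that runs self first, then feeds the result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for a prompt template, a model, and an output parser.
prompt = Step(lambda variables: "Write {lines} sentences about {topic}.".format(**variables))
fake_llm = Step(lambda text: {"content": f"(model reply to: {text})"})
parser = Step(lambda message: message["content"])

chain = prompt | fake_llm | parser
result = chain.invoke({"lines": "3", "topic": "Sir Stamford Raffles"})
print(result)
```

No model is called here; the point is only the shape of the pipeline: a dictionary goes in, a formatted prompt is produced, the "model" replies, and the last step extracts the text.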

In [None]:
%%capture --no-stderr
%pip install --quiet -U langchain
%pip install --quiet -U langchain-openai

In [None]:
# langchain        0.3.11
# langchain-core   0.3.24
# langchain-openai 0.2.12
# openai           1.57.2
# pydantic         2.10.3

In [1]:
import getpass
import os

# set up the OpenAI API Key

# have your OpenAI API key ready and enter it when asked
os.environ["OPENAI_API_KEY"] = getpass.getpass()

 ········


### A Simple LLM Application 

In [2]:
# load langchain libraries
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

In [3]:
# why not a gpt-3.5-turbo-instruct ?
# https://github.com/BerriAI/litellm/issues/749
# gpt-3.5-turbo-instruct is an openai text completions model and so gets routed to completions not chat completions.

# in summary, notice the pattern:
# model name = gpt-3.5-turbo-instruct --> text completion --> OpenAI
# model name = gpt-3.5-turbo --> chat completion --> ChatOpenAI

chat_model = ChatOpenAI(
    # don't need this if the OpenAI API Key is stored in the environment variable
    #openai_api_key="sk-proj-xxxxxxxxx",

    #model = 'gpt-3.5-turbo'
    model = 'gpt-4o-mini'
)

In [4]:
# setup message prompt
text = "What date is Singapore National Day?"
messages = [HumanMessage(content=text)]

In [5]:
# note that a chat model takes message objects as input and generates a message object as output

response = chat_model.invoke(messages)
print(response.content)

Singapore National Day is celebrated on August 9th each year. It commemorates Singapore's independence from Malaysia in 1965.


### A Simple LLM Application With Prompt Template

In [6]:
# load langchain libraries
from langchain_openai import ChatOpenAI
from langchain_core.prompts.chat import ChatPromptTemplate

In [7]:
# initialise the chat model (the API key is read from the OPENAI_API_KEY environment variable)
chat_model = ChatOpenAI(
    model = 'gpt-4o-mini', 
    temperature = 0.3
)

In [8]:
# The prompt template takes raw user input ({input_language}, {output_language} and {text}) and
# returns a prompt that is ready to pass to a language model

system_template = "You are a helpful assistant that translates {input_language} to {output_language}."

human_template = "{text}"
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", system_template),
    ("human", human_template),
])

In [9]:
# translate English to French

messages = chat_prompt.format_messages(
    input_language = "English", 
    output_language = "French", 
    text = "I love programming."
)

response = chat_model.invoke(messages).content
print(response)

J'aime la programmation.


In [10]:
# translate English to Chinese

messages = chat_prompt.format_messages(
    input_language = "English", 
    output_language = "Chinese", 
    text = "I love programming."
)

response = chat_model.invoke(messages).content
print(response)

我爱编程。


# Observation

- From the two code samples you just ran, you have learned how to create your first simple LLM application.
- You learned how to work with a language model and how to create a prompt template.

## PromptTemplate And LangChain Expression Language (LCEL)

In [11]:
# load langchain libraries
from langchain_openai import ChatOpenAI
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.output_parsers.string import StrOutputParser

In [12]:
# gpt-4 does not have an 'instruct' model
# Chat models like gpt-4 and gpt-4-turbo are already instruction-following models.
# They are designed to interpret and respond effectively to instructions provided in the conversation.

llm = ChatOpenAI(
    model = 'gpt-4o-mini',
    temperature = 0.7,
)

output_parser = StrOutputParser()

In [13]:
# setup template 

human_template = "Write {lines} sentences about {topic}."
prompt = ChatPromptTemplate.from_template(human_template)

lines_topic_dict = {
    "lines" : "3", 
    "topic": "Sir Stamford Raffles"
}

In [14]:
# Without piping to StrOutputParser, the chain returns the full AIMessage object
# StrOutputParser extracts the text content of the model's reply as a plain string

lcel_chain_01 = prompt | llm

lcel_chain_01.invoke(lines_topic_dict)

AIMessage(content="Sir Stamford Raffles was a British statesman and the founder of modern Singapore, establishing the strategic trading post in 1819. He played a crucial role in the expansion of British influence in Southeast Asia and was a proponent of various social and educational reforms. Raffles is also known for his interest in natural history, contributing to the study and preservation of the region's biodiversity.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 76, 'prompt_tokens': 17, 'total_tokens': 93, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_6fc10e10eb', 'finish_reason': 'stop', 'logprobs': None}, id='run-5702950a-c7c7-4143-b215-2ab0ee15e8c4-0', usage_metadata={'input_tokens': 17, 'output_tokens': 76,

In [15]:
# Pipe to StrOutputParser

lcel_chain_02 = prompt | llm | output_parser

lcel_chain_02.invoke(lines_topic_dict)

"Sir Stamford Raffles was a British statesman and the founder of modern Singapore, establishing it as a strategic trading post for the British East India Company in 1819. He was also a prominent naturalist and played a significant role in the promotion of scientific exploration and conservation in the region. Raffles' legacy is celebrated for his contributions to Singapore's development and his efforts in fostering cultural and economic exchanges in Southeast Asia."

# Template

## Prompt Template
Prompt templates take as input a dictionary where each key represents a variable for the prompt template to fill in.

In [16]:
from langchain_core.prompts.prompt import PromptTemplate

prompt=PromptTemplate(
    input_variables = ["sport"],
    template = "I love playing {sport}"
)

prompt.invoke({"sport":"table-tennis"})

StringPromptValue(text='I love playing table-tennis')

Usually you will initialise prompts using `from_template`:

In [17]:
prompt = PromptTemplate.from_template(
    "I love visiting {country} because of its {adjective}"
)

prompt.invoke({
    "country" : "Japan",
    "adjective" : "scenery"
})

StringPromptValue(text='I love visiting Japan because of its scenery')
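
With `from_template` there is no `input_variables` argument, because the variable names are inferred from the `{...}` placeholders in the template text. A rough plain-Python sketch of that inference, using only the standard library's `string.Formatter`; it illustrates the idea and is not LangChain's own code.

```python
from string import Formatter

template = "I love visiting {country} because of its {adjective}"

# Formatter().parse yields (literal_text, field_name, format_spec, conversion) tuples;
# field_name is None for trailing literal text, so those entries are filtered out.
variables = [field for _, field, _, _ in Formatter().parse(template) if field]
print(variables)

# Filling the template from a dictionary, one key per placeholder.
filled = template.format(**{"country": "Japan", "adjective": "scenery"})
print(filled)
```

A dictionary that lacks one of the inferred keys cannot fill the template, which is why each key passed to `invoke` must match a placeholder name.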

## MessagesPlaceholder
`MessagesPlaceholder` is responsible for adding a list of messages at a particular place in the prompt. If the user wants to pass in a list of messages to be slotted into a particular spot, we can use `MessagesPlaceholder`.

In [18]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage

prompt = ChatPromptTemplate([
    ("system", "You are a helpful assistant"),

    # place Human Message after System Message
    MessagesPlaceholder("messages")
])

prompt.invoke({"messages" : [HumanMessage(content="Hi! My name is John")]})

ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant', additional_kwargs={}, response_metadata={}), HumanMessage(content='Hi! My name is John', additional_kwargs={}, response_metadata={})])

# Structured Output
## CommaSeparatedListOutputParser

In [19]:
from langchain_core.output_parsers.list import CommaSeparatedListOutputParser
from langchain_core.prompts.prompt import PromptTemplate
from langchain_openai import ChatOpenAI

In [20]:
output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()

In [21]:
prompt_template = PromptTemplate(
    template="List 5 countries that start with the letter '{alphabet}'\n{format_instructions}",
    input_variables=["alphabet"],
    partial_variables={"format_instructions": format_instructions}
)

In [22]:
llm = ChatOpenAI(
    model = 'gpt-4o-mini',
    temperature = 0.2,
)

In [23]:
prompt = prompt_template.format(alphabet="S")

response = llm.invoke(prompt)
print(response.content)

Spain, Sweden, Switzerland, Singapore, South Africa
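
The cell above prints the raw comma-separated string; the remaining step is to turn it into a Python list. In the notebook, `output_parser.parse(response.content)` should do this. The plain-Python sketch below only approximates that step on the sample output shown above, so it can run without a model call.

```python
# Sample reply copied from the output above.
raw = "Spain, Sweden, Switzerland, Singapore, South Africa"

# Split on commas and trim the surrounding whitespace from each item.
countries = [item.strip() for item in raw.split(",")]
print(countries)
print(len(countries))
```

Once the reply is a list, ordinary list operations such as indexing, counting and looping apply to it directly.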


## JSON

### Method 1:

In [25]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts.chat import ChatPromptTemplate

llm = ChatOpenAI(
    model = 'gpt-4o-mini',
    model_kwargs = {"response_format" : { "type": "json_object" } }
)

In [26]:
response = llm.invoke("""
   Return a JSON object with two variables. The "name" is "Joe Doe" and his "age" is "30".
""")

In [27]:
print(response.content)

{
  "name": "Joe Doe",
  "age": "30"
}


### Method 2:

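- JSON requires `{` and `}` for its syntax, but a prompt template treats single braces as variable placeholders. To include literal braces in the template, escape them as double braces `{{` and `}}`.
- The `{name}` and `{age}` placeholders for variables keep their single braces.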

In [28]:
prompt_template = PromptTemplate(
    input_variables=["name", "age"],
    template="""You are an assistant that formats responses in JSON.
Given the following inputs:
- Name: {name}
- Age: {age}

Respond with a JSON object in the following format:
{{
    "name": "value",
    "age": value
}}
"""
)

In [29]:
prompt = prompt_template.format(name="John Doe", age=30)

In [30]:
response = llm.invoke(prompt)
print(response.content)

{
    "name": "John Doe",
    "age": 30
}
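
The brace escaping can be checked with plain Python, since the default template syntax should follow the same `{}` rules as `str.format`. The sketch below uses an assumed one-line template rather than the notebook's prompt, and no model is involved.

```python
import json

# Double braces are literal JSON braces; single braces are placeholders.
template = '{{"name": "{name}", "age": {age}}}'
rendered = template.format(name="John Doe", age=30)
print(rendered)

# The rendered text is valid JSON and can be parsed back into a dictionary.
data = json.loads(rendered)
print(data["age"] + 1)

# Without the escaping, the JSON braces are mistaken for a placeholder.
unescaped = '{"name": "{name}"}'
try:
    unescaped.format(name="John Doe")
    unescaped_failed = False
except (KeyError, ValueError, IndexError):
    unescaped_failed = True
print(unescaped_failed)
```

Note that `age` comes back as a number because the template leaves it unquoted, matching the `"age": value` line in the prompt above.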


## Schema Definition

- The output structure of a model response needs to be represented in some way.
- The simplest and most common format is a JSON-like structure, which you have just seen.
- The other method is to use `Pydantic`, as it allows you to define schemas using Python's type annotations.
- It validates that the LLM's output conforms to the expected structure, catching errors early.
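
To make "conforms to the expected structure" concrete, here is a hand-written check in plain Python. It is only a sketch of the kind of validation that Pydantic performs declaratively from type annotations; `EXPECTED_FIELDS` and `validate_response` are hypothetical names, and the sample strings stand in for model output.

```python
import json

# Expected field names and types, written out by hand for this illustration.
EXPECTED_FIELDS = {"answer": str, "followup_question": str}


def validate_response(raw_json):
    """Parse a JSON string and check that every expected field has the right type."""
    data = json.loads(raw_json)
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"{field} should be {expected_type.__name__}")
    return data


good = validate_response('{"answer": "Gray", "followup_question": "Why did it change?"}')
print(good["answer"])

try:
    validate_response('{"answer": "Gray"}')
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

A schema library removes the need to write such checks by hand and reports all problems in a structured way, which is the motivation for the two methods that follow.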

### Method 1: (Tool)

In [31]:
from pydantic import BaseModel, Field

class ResponseFormatter(BaseModel):
    answer: str = Field(description = "The answer to the user's question")
    followup_question: str = Field(description = "A followup question the user could ask")

In [32]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model = "gpt-4o", 
    temperature = 0
)

# Bind the ResponseFormatter schema as a tool to the model
model_with_tools = model.bind_tools([ResponseFormatter])

# Invoke the model
response = model_with_tools.invoke("What was the original colour of the Hulk in his first comic appearance?")

In [33]:
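# parse the tool call's JSON-encoded arguments to extract followup_question
# (json.loads is used instead of eval, which would execute model output as Python code)
import json

followup = json.loads(response.additional_kwargs['tool_calls'][0]['function']['arguments'])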

In [34]:
followup['followup_question']

"Why was the Hulk's color changed from gray to green?"

### Method 2: (Pydantic)

In [35]:
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.output_parsers.pydantic import PydanticOutputParser
from langchain_core.prompts.chat import SystemMessagePromptTemplate

class ResponseFormatter(BaseModel):
    answer: str = Field(description = "The answer to the user's question")
    followup_question: str = Field(description = "A followup question the user could ask")

parser = PydanticOutputParser(pydantic_object=ResponseFormatter)

In [36]:
model = ChatOpenAI(
    model = "gpt-4o", 
    temperature=0
)

In [37]:
template = "Answer the user query.\n{format_instructions}\n{query}\n"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
prompt = ChatPromptTemplate.from_messages([system_message_prompt])

messages = prompt.format_prompt(
    format_instructions=parser.get_format_instructions(),
    query = "What was the original colour of the Hulk in his first comic appearance?"
).to_messages()

In [38]:
# Invoke the model
response = model.invoke(messages)

In [39]:
# use the parser to extract the answer and followup question

output = parser.parse(response.content)

In [40]:
output.answer

'The original color of the Hulk in his first comic appearance was gray.'

In [41]:
output.followup_question

"Why did the Hulk's color change from gray to green?"