<a href="https://colab.research.google.com/github/nemanovich/LLM-essentials/blob/main/week2_pratice_session.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this week's practice session we'll learn:

- How to use LangChain, one of the most popular library to simplify LLM interaction;
- How to add plugins to an LLM with LangChain;
- How to interact with a database using an LLM.

# LangChain

LangChain is a handy library which supplies a whole infrastucture around LLMs (both open source and available by API) allowing to quickly establish LLM-powered services. It can help you with many LLM related tasks, from prompt optimisation to creating multi-call LLM agents.

Let's see how to use LangChain. First of all, download the library:

In [2]:
import os
from google.colab import userdata

# os.environ['OPENAI_API_KEY'] = open(".open-ai-api-key")
os.environ['OPENAI_API_KEY'] = userdata.get("OPENAI_API_KEY")
os.environ['KAGGLE_USERNAME'] = userdata.get("KAGGLE_USERNAME")
os.environ['KAGGLE_KEY'] = userdata.get("KAGGLE_KEY")

In [18]:
!pip install openai langchain langchain_openai -q

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/74.4 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m74.4/74.4 kB[0m [31m2.0 MB/s[0m eta [36m0:00:00[0m
[?25h

The easiest thing you can do with LangChain is just calling an LLM.  We'll do it for OpenAI API:

Note: The base model for OpenAI class is `text-davinci-003`, the significance of that will become apparent later

In [21]:
from langchain_openai import OpenAI

llm = OpenAI()

In [4]:
print(llm.invoke(
    "What is the difference between cats and dogs? In two words:"
))

 behavior and anatomy.

Behavior: Cats are generally considered to be more independent and aloof than dogs. They are known for their graceful and solitary nature, often spending hours grooming themselves and napping. Cats also have a tendency to be more territorial and may not get along well with other cats. On the other hand, dogs are social animals and thrive on companionship. They are known for their loyalty and love to be around their owners. Dogs are also more easily trainable and can perform a variety of tasks, while cats are typically less easily trained and may have a harder time learning new behaviors.

Anatomy: Cats and dogs have different physical characteristics that set them apart. Cats have retractable claws, while dogs' claws are always exposed. This allows cats to climb and jump with precision and agility, while dogs are better adapted for running and digging. Cats also have a more flexible spine, allowing them to squeeze into tight spaces and land on their feet when fa

As you can see, the interface is already much simpler, compared to writing it on your own.

LangChain also distinguishes between LLM's and Chat models.

A difference is very subtle and mostly affect the format in which you pass data. LLM's are a pure text completion models, which means they input text and output text. Where is ChatModels work on a list of ChatMessages, which can be AIMessage, HumanMessage or SystemMessage (this difference we covered in week 1) and return an AIMessage.

Newer OpenAI only implement chat interface, for example gpt-3.5-turbo, gpt-4, etc. This means, that you cannot use them as an LLM.

In [25]:
from langchain_openai import ChatOpenAI
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

chat = ChatOpenAI(name='gpt-4o-mini')
chat.invoke([
    HumanMessage(content="In two words what's the difference "\
        "between Cats and Dogs?")
])

AIMessage(content='Behavior, loyalty', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 3, 'prompt_tokens': 19, 'total_tokens': 22, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-C5chKsqzLSagXSnjogD7CBXvaZeol', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--45d0d24c-3820-4983-9fc4-7de79fc9e33c-0', usage_metadata={'input_tokens': 19, 'output_tokens': 3, 'total_tokens': 22, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

Note: Draw your attention to the fact that we received a `AIMessage` instead of a string

### Basics

#### Prompt templates

A useful feature of LangChain is Prompt templates.

If you need to use the same prompt structure with different parameters, prompt templates can save you from the text duplication. See, for example:

In [26]:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "What is the national cousine of {country}?"
)
prompt.format(country="Australia")

'What is the national cousine of Australia?'

Now our imaginary user needs only to select a country instead of creating a whole prompt.

#### Chaining

One of the main pillars of LangChain is the concept of chaining, that is of combining several LLM calls, external function calls, etc.

Much like you combine layers in neural networks, but here we have a much more diverse set of tools.

A very basic chain consists of prompt template and an LLM call. It's almost like a "function" for an LLM:

In [27]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

chain = prompt | llm | output_parser
chain.invoke("Australia")

'\n\nThe national cuisine of Australia is often described as "modern Australian" or "Aussie cuisine," which is a fusion of different cultural influences, including British, Indigenous Australian, European, and Asian. Some popular dishes in Australia include meat pies, fish and chips, BBQ meats, lamingtons, and pavlova. However, due to its diverse population, there is no specific national dish or cuisine that represents all of Australia.'

`StrOutputParser` here transforms output of our LLM, which in this case is in `messages`, in the format of a string. In case you'd ask for multiple output options, this parser give you the most likely one.

Note by the way that, although we had a typo in the prompt template ("cousine" instead of "cuisine"), LLM managed to mitigate with it. You probably shouldn't rely on this too much, but generally LLMs, that are trained on data containing typos as well, can be able to cope with some amount of mistakes in the prompts.

In [28]:
text_model_response = llm.invoke("Hello, do you like cats?")
print(text_model_response)
print(f"Type: {type(text_model_response)}")

chat_response = chat.invoke("Hello, do you like cats?")
print(chat_response)
print(f"Type: {type(chat_response)}")


parsed_text_model_output = output_parser.invoke(text_model_response)
print(parsed_text_model_output)
print(type(parsed_text_model_output))

parsed_chat_output = output_parser.invoke(chat_response)
print(parsed_chat_output)
print(type(parsed_chat_output))



As an AI, I do not have personal preferences or emotions. I am programmed to assist and communicate with humans.
Type: <class 'str'>
content="Hello! As an AI, I don't have personal preferences, but I can provide information or answer any questions you may have about cats." additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 14, 'total_tokens': 42, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-C5chRcMMNJKjcgtxfighF1JbDb5MQ', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None} id='run--95cc0c1b-e845-4ff0-84f8-033287cb471f-0' usage_metadata={'input_tokens': 14, 'output_tokens': 28, 'total_tokens': 42, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_detai

#### Sequential chain

We can combine multiple calls in a simple sequential chain, where the output of one call become the input of another call.

In [29]:
first_prompt = PromptTemplate.from_template(
    "What is the capital of {country}?"
)
first_chain = first_prompt | llm | output_parser

second_prompt = PromptTemplate.from_template(
    "{city} is the capital of which country?"
)
second_chain = second_prompt | llm | output_parser

simple_sequential_chain = first_chain | second_chain

Intuitively now we should receive the same thing we inputted, let's try.

In [30]:
simple_sequential_chain.invoke("United Kingdom")

'\n\nUnited Kingdom'

If you want to make a more complicated chain, where outputs fill in specific variables, we'll have to use an `itemgetter`.

In [31]:
from operator import itemgetter


first_prompt = PromptTemplate.from_template(
    "Name a city of {country} starting with {letter}",
)
first_chain = first_prompt | llm | output_parser

second_prompt = PromptTemplate.from_template(
    "What is the main attraction in {city}?"
)
second_chain = second_prompt | llm | output_parser

sequential_chain = {
    "country": itemgetter("country"),
    "letter": itemgetter("letter"),
    "city": first_chain
} | second_chain | output_parser


In that case you'll have to pass input arguments as a dict.

In [32]:
sequential_chain.invoke({"country": "France", "letter": "P"})

'\n\nThe main attraction in Paris is the Eiffel Tower.'

#### Debugging

As you can see we only get the output of the last chain. But what if we want to see what happened in the first one?

In [33]:
from langchain.callbacks.tracers import ConsoleCallbackHandler


In [34]:
sequential_chain.invoke(
    {"country": "France", "letter": "P"},
    config={'callbacks': [ConsoleCallbackHandler()]}
)

[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence] Entering Chain run with input:
[0m{
  "country": "France",
  "letter": "P"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<country,letter,city>] Entering Chain run with input:
[0m{
  "country": "France",
  "letter": "P"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<country,letter,city> > chain:RunnableLambda] Entering Chain run with input:
[0m{
  "country": "France",
  "letter": "P"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:RunnableParallel<country,letter,city> > chain:RunnableLambda] [2ms] Exiting Chain run with output:
[0m{
  "output": "France"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RunnableSequence > chain:RunnableParallel<country,letter,city> > chain:RunnableLambda] Entering Chain run with input:
[0m{
  "country": "France",
  "letter": "P"
}
[36;1m[1;3m[chain/end][0m [1m[chain:RunnableSequence > chain:Runnab

'\n\nThe main attraction in Paris is the Eiffel Tower.'

### Task 1

In this task we'll learn how to rewrite ChatGPT interaction code to LangChain.

In the previous week we inplemented translate and summarise function. Rewrite it using `SequentialChain`

In [16]:
from langchain_openai import OpenAI
from langchain.chains import SequentialChain
from langchain_core.output_parsers import StrOutputParser

from operator import itemgetter

llm = OpenAI()
output_parser = StrOutputParser()

summarise_prompt = PromptTemplate(
    input_variables=['text'],
    template="Write a short summary of the following text.\n{text}"
)
summarise_chain = summarise_prompt | llm | output_parser

translate_prompt = PromptTemplate(
    input_variables=['summary', 'target_language'],
    template="Translate the following text to {target_language}:\n{summary}"
)
translate_chain = translate_prompt | llm | output_parser

summarise_and_translate_chain = {
    "text": itemgetter("text"),
    "summary": summarise_chain,
    "target_language": itemgetter("target_language")
} | translate_chain | output_parser

In [None]:
article = open("wikipedia_article_japanese.txt").read()

summarise_and_translate_chain.invoke(
    {'text': article, "target_language": "English"}
)

'によっても異なる。\n\n\nIn addition, products featuring paw pads are also commonly seen in adult goods.\n\nPaw pads are also used as trademarks for hanko (name stamps) and stamps (refer to Nekkiu).\n\nPaw pads are the raised and hairless part of the bottom of the feet of animals in the order Carnivora, and are officially called metatarsal pads. The paw pads have sections such as the palmar pad, digital pads, carpal pads, plantar pads, and toe pads, and they mainly serve to cushion the impact during walking. They can be found in animals such as cats, dogs, bears, weasels, rodents, and marsupials. The shape and softness of paw pads vary among individuals and can also differ depending on the environment they inhabit. '

## LangChain Agents and Memory

In this part we'll explore two cool features of LangChain: **Agents** and **Memory**. You will learn how to:

- access internet inside a chain;
- remember the conversation history and adjust to it.

**Agents** allow you to use tooling like web search, calling apis, math, python code etc. (they are known as "Plugins" in ChatBPT Web UI) to achive the goal of the given task.

**Memory** allows you to keep a state of the conversation, just like what you see in the WebUI of ChatGPT.

If you combine the two you can essentially get the same interface as ChatGPT WebUI has with plugins.

### Web search

The are plenty of search engines available. We'll try DuckDuckGo, but feel free to use any other for your projects.

Let's install the library.

In [35]:
!pip install duckduckgo_search langchain_community -q

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m26.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.3/3.3 MB[0m [31m75.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.2/45.2 kB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m50.9/50.9 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[?25h

A search engine is a **tool**. Which is essentially a function with specific signature, that our LLM can use.

In [19]:
from IPython.display import display
from langchain_community.tools import DuckDuckGoSearchRun, DuckDuckGoSearchResults

results_tool = DuckDuckGoSearchResults()
display(results_tool("What is the name of the cat from Shrek"))

search_tool = DuckDuckGoSearchRun()
display(search_tool("What is the name of the cat from Shrek"))

  with DDGS() as ddgs:


"snippet: Oct 13, 2009 · I'm looking for a command line tool which gets an IP address and returns the host name, for Windows., title: windows - Resolve host name from IP address - Server Fault, link: https://serverfault.com/questions/74042/resolve-host-name-from-ip-address, snippet: This is a Canonical Question about Active Directory domain naming. After experimenting with Windows domains and domain controllers in a virtual environment, I've realized that having an …, title: Windows Active Directory naming best practices? - Server Fault, link: https://serverfault.com/questions/76715/windows-active-directory-naming-best-practices, snippet: Mar 26, 2023 · What could be the possible problems with accessing a Windows file server shares using a DNS CNAME instead of the actual computer name? The file server is joined to an Active …, title: Accessing Windows file server by alias name, link: https://serverfault.com/questions/1127178/accessing-windows-file-server-by-alias-name, snippet: I occas

  with DDGS() as ddgs:


"Oct 13, 2009 · I'm looking for a command line tool which gets an IP address and returns the host name, for Windows. This is a Canonical Question about Active Directory domain naming. After experimenting with Windows domains and domain controllers in a virtual environment, I've realized that having an … Mar 26, 2023 · What could be the possible problems with accessing a Windows file server shares using a DNS CNAME instead of the actual computer name? The file server is joined to an Active … I occasionally get the following 421 error: Misdirected Request The client needs a new connection for this request as the requested host name does not match the Server Name Indication (SNI... Oct 25, 2023 · This is a new installation of Server 2022 Standard 21H2. I'm trying to configure the SMTP Server so that a client application can send emails internally. When I open IIS 6.0 …"

Creating an agent, which uses this tool is pretty simple

In [20]:
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(name='gpt-4o-mini')

agent = initialize_agent(
    tools=[search_tool], llm=llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

  agent = initialize_agent(


Let's see it in action

In [21]:
agent.invoke("What is the name of the cat from Shrek")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mI should search for the name of the cat from Shrek
Action: duckduckgo_search
Action Input: cat from Shrek name[0m

  with DDGS() as ddgs:



Observation: [36;1m[1;3mThe cat <<EOF syntax is very useful when working with multi-line text in Bash, eg. when assigning multi-line string to a shell variable, file or a pipe. Examples of cat <<EOF syntax usage in Bash: cat "Some text here." > myfile.txt Possible? Such that the contents of myfile.txt would now be overwritten to: Some text here. This doesn't work for me, but also doesn't throw any errors. … The original order is in fact backwards. Certs should be followed by the issuing cert until the last cert is issued by a known root per IETF's RFC 5246 Section 7.4.2 This is a sequence (chain) of … May 14, 2009 · 46 There are a few ways to pass the list of files returned by the find command to the cat command, though technically not all use piping, and none actually pipe directly to cat. The … Oct 23, 2018 · The problem is that echo removes the newlines from the string. How do you append to a file a string which contains newlines?[0m
Thought:[32;1m[1;3mThis search did not prov

  with DDGS() as ddgs:



Observation: [36;1m[1;3mPus is an exudate, typically white-yellow, yellow, or yellow-brown, formed at the site of inflammation during infections, regardless of … The meaning of PUSS is cat. Jun 14, 2023 · Pus is a thick fluid containing dead tissue, cells, and bacteria. Your body often produces it when it’s fighting off an infection, … Nov 16, 2023 · Pus is a whitish-yellow, yellow, green, or brown-yellow protein-rich fluid called liquor puris that accumulates at the site of an … Jan 26, 2023 · Pus is a fluid that contains a mixture of dead skin cells, white blood cells, and infectious material. The body produces pus as …[0m
Thought:[32;1m[1;3mSearching for "Puss in Boots Shrek name" gave me more relevant information
Final Answer: The name of the cat from Shrek is Puss in Boots.[0m

[1m> Finished chain.[0m


{'input': 'What is the name of the cat from Shrek',
 'output': 'The name of the cat from Shrek is Puss in Boots.'}

As you can see, agent not only chose to perform a web search, but also read the results and gave you the final answer.

You can read more about how ReAct agents work [here](https://react-lm.github.io/)

### Memory

Memory allows an agent to memorize the previous interaction with the user and act according to it. Let's try to add memory and make a small conversation.

We'll use the simplest construct called `ConversationBufferMemory` but you can actually use more complicated ones, which save conversation history to a database for example.

In [23]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
memory.chat_memory.add_user_message("Hello, ChatGPT! How's your day?")
memory.chat_memory.add_ai_message("I'm doing well, thanks for asking!")

memory.load_memory_variables({})

  memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)


{'chat_history': [HumanMessage(content="Hello, ChatGPT! How's your day?", additional_kwargs={}, response_metadata={}),
  AIMessage(content="I'm doing well, thanks for asking!", additional_kwargs={}, response_metadata={})]}

Note:
- We used `memory_key` = 'chat_history', which is why memory returns us messages under that key
- We used `return_messages` = True, which is why memory returns messages to us instead of concatenated strings.

Chat history is explicitly present in the prompt as the `history` variable.

Now, let's define the chain:


With LangChain you can initialise an agent with memory still in just a couple lines.
You need to make sure to use an appropriate agent type (in this case the "CHAT_CONVERSATION" ReAct agent.

Note: Admittedly the documentation for this is a bit chaotic, so you'll have to play a bit before you get a good result.

In [24]:
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_openai import ChatOpenAI

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

llm = ChatOpenAI(name='gpt-4o-mini')

agent = initialize_agent(
    tools=[search_tool],
    memory=memory,
    llm=llm,
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION ,
    verbose=True,
)

Let's to observe some memorization happening!

In [25]:
agent.invoke("What is the name of the cat from Shrek?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m```json
{
    "action": "Final Answer",
    "action_input": "The name of the cat from Shrek is Puss in Boots."
}
```[0m

[1m> Finished chain.[0m


{'input': 'What is the name of the cat from Shrek?',
 'chat_history': [HumanMessage(content='What is the name of the cat from Shrek?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The name of the cat from Shrek is Puss in Boots.', additional_kwargs={}, response_metadata={})],
 'output': 'The name of the cat from Shrek is Puss in Boots.'}

In [27]:
agent.invoke("How many sequels were there in this film?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m```json
{
    "action": "duckduckgo_search",
    "action_input": "Shrek film sequels"
}
```[0m

  with DDGS() as ddgs:



Observation: [36;1m[1;3mShrek is an anti-social ogre who loves the solitude of his swamp and enjoys fending off mobs and intruders. One day, his life is interrupted after he inadvertently saves a talkative Donkey from … May 18, 2001 · Shrek: Directed by Andrew Adamson, Vicky Jenson. With Mike Myers, Eddie Murphy, Cameron Diaz, John Lithgow. A mean lord exiles fairytale creatures to the swamp of … On a mission to retrieve a princess from a fire-breathing dragon, gruff ogre Shrek teams up with an unlikely compatriot — a wisecracking donkey. You may be looking for Shrek (character) or Shrek (franchise). Shrek is a 2001 American computer-animated fantasy comedy film produced and distributed by DreamWorks Pictures. It … Shrek (Mike Myers) goes on a quest to rescue the feisty Princess Fiona (Cameron Diaz) with the help of his loveable Donkey (Eddie Murphy) and win back the deed to his swamp from …[0m
Thought:[32;1m[1;3m```json
{
    "action": "Final Answer",
    "action_input": "There 

{'input': 'How many sequels were there in this film?',
 'chat_history': [HumanMessage(content='What is the name of the cat from Shrek?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='The name of the cat from Shrek is Puss in Boots.', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='How many sequels were there in this film?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='There are four sequels in the Shrek film series.', additional_kwargs={}, response_metadata={})],
 'output': 'There are four sequels in the Shrek film series.'}

# Vector stores

In [3]:
from IPython.display import Image
Image("/content/langchain_vectorstore.png", width=600)

FileNotFoundError: No such file or directory: '/content/langchain_vectorstore.png'

FileNotFoundError: No such file or directory: '/content/langchain_vectorstore.png'

<IPython.core.display.Image object>

One of the goals of this week is to create your own RAG-based app. **RAG** (**R**etrieval **A**ugmented **G**eneration) is a concept of supporting a generative model with some kind of a retrieval tool which allows to get more faithful results and less hallucinations. This is crucial when we need to supply our users with facts, for example, if we're creating a navigation tool for a company's internal wiki.

Actually, we already touched upon RAG when we used DuckDuckGo. This time we'll retrieve data from a specific type of database - **vector store**.

The idea behind vector storages is to represent data items as **embeddings** (real vectors). When we receive a search query, we also somehow make it into an embedding and look for its nearest neighbors in the vector space which can be done rather quickly if somewhat approximately. If your embedding model produces vectors with strong semantic information embedded into it, you can have very high quality retrieval.

Vector storages emerged long before transformers, but, but because transformer models offer exceptional text understanding capabilities, using them to construct embeddings for vector storage systems is very popular. A typical AI-powered vector database query tool works like that:

- An LLM reformulates user's prompt into a vector store query;
- An embedding model is used to map the query into the database vector space;
- Vector store returns several items whose embeddings are nearest neighbors of the query's embedding;
- An LLM is used to process search results into a nice human readable output.

In this practice session you'll getting acquainted with vector databases, and in the homework you'll assemble all the pipeline using LangChain.


There are quite a few vector stores available. We will employ the system called [Faiss](https://github.com/facebookresearch/faiss). It is a state-of-the-art library made by Meta for creating vector databases, which is used by a lot of production solutions.

We will use an IELTS essay dataset as a source of long texts, we want to search through.

Please make sure to put your credentials in an appropriate location following the instruction here https://github.com/Kaggle/kaggle-api#api-credentials

In [4]:
!pip install kaggle faiss-cpu tiktoken -q

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m31.4/31.4 MB[0m [31m54.0 MB/s[0m eta [36m0:00:00[0m
[?25h

In [5]:
!export KAGGLE_CONFIG_DIR="/content/" && kaggle datasets download mazlumi/ielts-writing-scored-essays-dataset

Dataset URL: https://www.kaggle.com/datasets/mazlumi/ielts-writing-scored-essays-dataset
License(s): other
Downloading ielts-writing-scored-essays-dataset.zip to /content
  0% 0.00/674k [00:00<?, ?B/s]
100% 674k/674k [00:00<00:00, 595MB/s]


In [6]:
!unzip ielts-writing-scored-essays-dataset.zip

Archive:  ielts-writing-scored-essays-dataset.zip
  inflating: ielts_writing_dataset.csv  


Let's look at the data:

In [7]:
import pandas

In [8]:
pandas.options.display.max_colwidth = 100
reviews = pandas.read_csv("ielts_writing_dataset.csv")
reviews.head(2).dropna(axis=1)

Unnamed: 0,Task_Type,Question,Essay,Overall
0,1,The bar chart below describes some changes about the percentage of people were born in Australia...,"Between 1995 and 2010, a study was conducted representing the percentages of people born in Aust...",5.5
1,2,"Rich countries often give money to poorer countries, but it does not solve poverty. Therefore, d...","Poverty represents a worldwide crisis. It is the ugliest epidemic in a region, which could infec...",6.5


## Text splitters

The length of the documents that we could store in a vector storage is limited by the context length of your models. The texts we work with are often longer, so we need **Text Splitters** to cut the texts into pieces.

First of all, let's check out how big our documents are:

In [9]:
# no truncation of text
pandas.options.display.max_colwidth = 100_000_000

In [10]:
import tiktoken
import re
enc = tiktoken.get_encoding("cl100k_base")

In [11]:
rows_as_single_string = reviews.apply(
    lambda row: (re.sub(' +', ' ', row.to_string().replace("\n", " "))),
    axis=1
)
max(map(lambda text: len(enc.encode(text)), rows_as_single_string))

772

Even though this is less then 4096 max ChatGPT tokens, models typically don't undrestand long texts well enough, so it's better to split this item.

Let's create a document list for our database

In [12]:
documents = rows_as_single_string.tolist()

Let's look at a simple splitter called `CharacterTextSplitter`. It splits text on `separator` then gathers chunks based on `chunk size` as measured by a `length_function`. `chunk_overlap` controlls how much of the previous chunk we want to include in the next one for continuity.

Let's see an example.

In [13]:
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator=" ",
    chunk_size=32,
    chunk_overlap=4,
    length_function=lambda text: len(enc.encode(text)),
)

In [14]:
from IPython.display import display

texts = text_splitter.create_documents(documents)
display(texts[0])
display(texts[1])
display(texts[2])



Document(metadata={}, page_content='Task_Type 1 Question The bar chart below describes some changes about the percentage of people')

Document(metadata={}, page_content='of people were born in Australia and who were born outside Australia living in urban,')

Document(metadata={}, page_content='in urban, rural and town between 1995 and 2010.Summarise the information by')

`RecursiveCharacterTextSplitter` is very similar to `CharacterTextSplitter`, except for the splitting and gathering logic. It inputs a list of `separators` (the default is ["\n\n", "\n", " ", ""]), which it then used in the same order as in the list. That means that first we split paragraphs, then if they are bigger than `chunk_size` we split on sentences, and so on. This helps the chunks to be a bit more cohesive.

In [15]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=32,
    chunk_overlap=4,
    length_function=lambda text: len(enc.encode(text)),
    add_start_index=True,
)

In [16]:
texts = text_splitter.create_documents(documents)
display(texts[0])
display(texts[1])
display(texts[2])

Document(metadata={'start_index': 0}, page_content='Task_Type 1 Question The bar chart below describes some changes about the percentage of people were born in Australia and who were born outside Australia living in urban, rural')

Document(metadata={'start_index': -1}, page_content='in urban, rural and town between 1995 and 2010.Summarise the information by selecting and reporting the main features and make comparisons where relevant.')

Document(metadata={'start_index': 288}, page_content='comparisons where relevant. Essay Between 1995 and 2010, a study was conducted representing the percentages of people born in Australia, versus people born outside')

Probably the most reasonable way to split is not by characters but by tokens using the model's tokenizer. LangChain supports creating a text splitter directly from tiktoken.

In [None]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=32,
    chunk_overlap=4,
    add_start_index=True
)

In [None]:
texts = text_splitter.create_documents(documents)
display(texts[0])
display(texts[1])
display(texts[2])

Document(metadata={'start_index': 0}, page_content='Task_Type 1 Question The bar chart below describes some changes about the percentage of people were born in Australia and who were born outside Australia living in urban, rural')

Document(metadata={'start_index': -1}, page_content='in urban, rural and town between 1995 and 2010.Summarise the information by selecting and reporting the main features and make comparisons where relevant. Essay Between')

Document(metadata={'start_index': 316}, page_content='Essay Between 1995 and 2010, a study was conducted representing the percentages of people born in Australia, versus people born outside Australia, living in urban, rural,')

## Vector database creation

Let's create a database of segments of IELTS essays and examinator comments.

In [36]:
from langchain.docstore.document import Document
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=256,
    chunk_overlap=16,
    add_start_index=True
)
splitted_documents = text_splitter.create_documents(documents)
db = FAISS.from_documents(splitted_documents, OpenAIEmbeddings())

Now we can perform similarity search using our embeddings

In [None]:
query = "An awesome essay about bar charts"
docs = db.similarity_search(query)
docs[0].page_content

"Task_Type 1 Question The bar charts below shows the number of visits to a community website in the first and second year of use.Summarize the information by selecting and reporting the main features and mae comparisons with relevant. Essay The bar chart illustrates the quantity of visits by the thousands paid to a community website within the first two years of use.\\nOverall, there is a greater upward trend from the second year of use compared to the first year of use. In addition to that, in both years the website undergoes a drastic fluctuation in numbers. It can be observed that initially in the month of September, number of visits in the first year of use are lower than second year of use, but numbers of the former subsequently surpasses the latter in the final month of August.\\nIn regards to the first year of use, quantity of visits increases from about 2000 visits in September to 10000 visits within 2 months and remains constant for another month. Following that, numbers plumm

In [None]:
query = "A poorly written essay"
docs = db.similarity_search(query)
docs[0].page_content

'but you must offer more arguments regarding why you agree or disagree. There are many spelling, punctuation and article errors. The essay is easy to follow but has the appearance of the writer running short of time. Task_Response NaN Coherence_Cohesion NaN Lexical_Resource NaN Range_Accuracy NaN Overall 5.0'

# Specific OpenAI api capabilities

Since the creation of LangChain, OpenAI's api actually added a lot of creature comforts on it's own, so some of the funcitonality is being duplicated a bit now.


## Structured outputs

Modern LLMs support outputing in a specific format, for example we can use "JSON mode" to force outputs to be in JSON fromat.

In [37]:
import os
from google.colab import userdata

# os.environ['OPENAI_API_KEY'] = open(".open-ai-api-key")
os.environ['OPENAI_API_KEY'] = userdata.get("OPENAI_API_KEY")

from openai import OpenAI

client = OpenAI()

non_json_output = client.chat.completions.create(
    messages=[{'role': 'user', 'content': 'Design a role play character\'s name, class and a short description'}],
    model="gpt-4o-mini",
).choices[0].message.content
print(non_json_output)

json_output = client.chat.completions.create(
    messages=[{'role': 'user', 'content': 'Design a role play character\'s name, class and a short description in json format'}],
    model="gpt-4o-mini",
    response_format={"type": "json_object"}
).choices[0].message.content
print(json_output)

**Character Name:** Lirael Thorne

**Class:** Shadow Mage

**Description:** Lirael Thorne is a mysterious figure shrouded in the whispers of the night. With long, flowing silver hair that seems to absorb the surrounding light and piercing violet eyes, she exudes an aura of both elegance and danger. As a Shadow Mage, Lirael has mastered the art of manipulating darkness to conceal her presence and create illusions that can confuse and terrify her foes. Her attire consists of a fitted, midnight-blue cloak adorned with glowing runes, offering her stealth and protection.

Lirael hails from a hidden enclave, where she was trained in the ancient customs of shadow magic. Torn between her desire for power and a profound sense of responsibility, she often grapples with the ethical implications of her abilities. Despite her enigmatic demeanor, she is fiercely loyal to those she trusts, using her skills to help her allies and protect the innocent from the dangers lurking in the shadows. With a pen

This is useful, because that'll make it much easier for you later to parse the outputs:

In [38]:
import json
json.loads(json_output)

{'character': {'name': 'Elysia Darkweaver',
  'class': 'Shadow Sorceress',
  'description': 'Elysia is a master of dark magic, drawing power from the shadows to manipulate the minds of her enemies and bend them to her will. With her flowing black robes that seem to absorb light, and her piercing violet eyes, she excels in deception and intrigue. Elysia carries an ancient grimoire that holds secrets of forgotten spells and curses, making her both a formidable foe and a valuable ally in the darkest of times.'}}

We can go another step further and actually define a `pydantic` model for our outputs:

In [None]:
from typing import List
from pydantic import BaseModel

class CharacterProfile(BaseModel):
    name: str
    age: int
    special_skills: List[str]
    traits: List[str]
    character_class: str
    origin: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Design a role play character"}
    ],
    response_format=CharacterProfile,
)

completion.choices[0].message.parsed

CharacterProfile(name='Elara Nightshade', age=27, special_skills=['Archery', 'Potion Brewing', 'Stealth Navigation'], traits=['Loyal', 'Adaptable', 'Cunning'], character_class='Ranger', origin='Elderwood Forest')

So no we have predefined format of outputs, which is easy to work with.

## OpenAPI Tool Usage

We can use tools in OpenAI api as well. Let's see how we can use web search with just the api:

In [39]:
!pip install duckduckgo_search -q

In [40]:
from duckduckgo_search import DDGS

search = DDGS()
search.text(keywords="What is the capital of France", max_results=3)

  search = DDGS()


[{'title': 'Capital of France Crossword Clue - NYT Crossword Answers',
  'href': 'https://nytcrosswordanswers.org/capital-of-france-crossword-clue/',
  'body': 'May 6, 2020 answer of Capital Of France clue in NYT Crossword Puzzle. There is One Answer total, Euros is the most recent and it has 5 letters.'},
 {'title': "Capital of France's Côte d'Or Crossword Clue",
  'href': 'https://nytcrosswordanswers.org/capital-of-frances-cote-dor-crossword-clue/',
  'body': 'March 18, 2019 answer of Capital Of Frances Cote Dor clue in NYT Crossword Puzzle. There is One Answer total, Dijon is the most recent and it has 5 letters.'},
 {'title': 'Tour de France stage Crossword Clue - NYT Crossword Answers',
  'href': 'https://nytcrosswordanswers.org/tour-de-france-stage-crossword-clue/',
  'body': 'March 28, 2025 answer of Tour De France Stage clue in NYT Crossword Puzzle. There are Two Answers total, Etape is the most recent and it has 5 letters.'}]

Now we can define a `tool` description for OpenAI's client, so that the model knows how to use it.

We will only expose `keywords` parameter.

We also need to write short descriptions to explain what the tool and the parameter are for.

Tool usage is sort of an extension of "JSON mode" because in the end we get a dict of parameters, parsed from the JSON.

In [44]:
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "search-text",
            "description": "Retrieves results from DuckDuckGo web search",
            "parameters": {
                "type": "object",
                "properties": {
                    "keywords": {
                        "type": "string",
                        "description": "What you search for",
                    },
                },
                "required": ["keywords"],
            },
        }
    },
]


messages = []
messages.append({"role": "system", "content": "If you are asked about the factual information, create a function call instead. If you already searched, use the results to give an answer."})
messages.append({"role": "user", "content": "What is the name of the cat from Shrek?"})
chat_response = client.chat.completions.create(
    messages=messages, tools=tools, model="gpt-4o-mini"
)
chat_response

ChatCompletion(id='chatcmpl-C5cnDgvRKZHXtoHzvT1Ep3XE41uY4', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='call_ga4iMurTLWx3HwPCrKDTwqDi', function=Function(arguments='{"keywords":"cat from Shrek name"}', name='search-text'), type='function')]))], created=1755456795, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier='default', system_fingerprint='fp_560af6e559', usage=CompletionUsage(completion_tokens=18, prompt_tokens=91, total_tokens=109, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

Now we can extract the function usage output from the result

In [45]:
chat_response.choices[0].message.tool_calls[0]

ChatCompletionMessageFunctionToolCall(id='call_ga4iMurTLWx3HwPCrKDTwqDi', function=Function(arguments='{"keywords":"cat from Shrek name"}', name='search-text'), type='function')

Now based on this functionality, we can create a function to answer using web search.

In [49]:
import json

def chat_completion_with_web_search(query):
    ready_to_answer = False
    messages = []
    messages.append({
        "role": "system",
        "content": "If you are asked about the factual information, "\
        "create a search function call instead of answering directly."\
        "If you already searched, use the results to give an answer."})
    messages.append({"role": "user", "content": query})
    while not ready_to_answer:
        chat_response = client.chat.completions.create(
            messages=messages, tools=tools, model="gpt-4o-mini"
        ).choices[0].message
        messages.append(chat_response.to_dict())
        if chat_response.tool_calls:
            if chat_response.tool_calls[0].function.name == "search-text":
                print("Searching the web")
                call_arguments = json.loads(
                    chat_response.tool_calls[0].function.arguments
                )
                print(f"Call arguments: {call_arguments}")
                web_results = str(search.text(**call_arguments))
                print(f"Results: {web_results}")
                messages.append({
                    "role": "tool",
                    "content": web_results,
                    "tool_call_id": chat_response.tool_calls[0].id
                })
            else:
                raise ValueError(f"Unsupported tool {chat_response.tool_calls[0].function.name}")
        else:
            print("Answering the question")
            messages.append({"role": "assistant", "content": chat_response.content})
            ready_to_answer = True
    return messages[-1]['content']

In [50]:
chat_completion_with_web_search("How many episodes in Star Wars?")

Searching the web
Call arguments: {'keywords': 'how many episodes in Star Wars series'}
Results: [{'title': 'How to Watch Every Star Wars Movie and Series in Order - IGN', 'href': 'https://www.ign.com/articles/star-wars-movies-tv-shows-chronological-order', 'body': ''}, {'title': 'Star Wars Movies and Series Viewing Guide | StarWars.com', 'href': 'https://www.starwars.com/news/star-wars-movies-and-series-guide', 'body': 'May 4, 2025 · Check out the two lists below — release order and chronological order — of every Star Wars movie and series, including live-action and animation, to help you on your Star …'}, {'title': 'Every Star Wars Movie and TV Show, in Release Order', 'href': 'https://www.hollywoodreporter.com/lists/star-wars-movies-tv-shows-release-order/', 'body': ''}, {'title': 'Star Wars - Movies & TV Series Chronological Order', 'href': 'https://www.imdb.com/list/ls072034866/', 'body': 'As the Clone Wars sweep the galaxy, Anakin Skywalker and his new Padawan, Ahsoka Tano, embar

"The exact total number of episodes and movies in the Star Wars franchise can vary. Currently, there are 12 theatrically released Star Wars movies in the main canon. However, if you're also interested in the TV series and animated components, that number will increase substantially.\n\nFor a more detailed breakdown, you can check the following resources:\n- [Star Wars Movies and Series Viewing Guide](https://www.starwars.com/news/star-wars-movies-and-series-guide)\n- [Complete List of STAR WARS Movies - IMDb](https://www.imdb.com/list/ls029559286/) \n\nIf you have specific series or films in mind, please let me know!"

# Latency

Depending on the model (size of the model), provider and some specific parameters, the latency of completion calls can vary a lot.

Let's write a small function to measure latency and test it on OpenAI's and Anthropic's models.

In [51]:
!pip install openai anthropic -q

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/297.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m297.0/297.2 kB[0m [31m8.8 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m297.2/297.2 kB[0m [31m6.0 MB/s[0m eta [36m0:00:00[0m
[?25h

In [52]:
import time
import numpy as np

def measure_execution_time(func, n, *args, **kwargs):
    latencies = []

    for _ in range(n):
        start_time = time.time()
        func(*args, **kwargs)
        end_time = time.time()

        latency = end_time - start_time
        latencies.append(latency)

    latencies = np.array(latencies)

    stats = {
        'average_latency': np.mean(latencies),
        'max_latency': np.max(latencies),
        'min_latency': np.min(latencies),
        'std_latency': np.std(latencies)
    }

    return stats

In [55]:
import openai
from google.colab import userdata
openai.api_key = userdata.get('OPENAI_API_KEY')

def get_chatgpt_answer(message: str, model, params={}) -> str:
    chat_completion = openai.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}],
        **params
    )
    return chat_completion.choices[0].message.content


from anthropic import Anthropic

client = Anthropic(
    api_key=userdata.get("nebius_api_key")
)

def get_anthropic_answer(message: str, model, params={'max_tokens': 1024}) -> str:
    answer = client.messages.create(
        messages=[
            {
                "role": "user",
                "content": message,
            }
        ],
        model=model,
        **params
    )
    return answer.content[0].text

In [None]:
models_to_test = [
   "gpt-3.5-turbo",
   "gpt-4",
   "gpt-4o",
   'gpt-4o-mini',
   "claude-3-opus-20240229",
   "claude-3-sonnet-20240229",
   "claude-3-haiku-20240307",
   "claude-3-5-sonnet-20240620"
]

for model in models_to_test:
    print("-"*100)
    print(f"Model name {model}")
    if "gpt" in model:
        completion_function = get_chatgpt_answer
    else:
        completion_function = get_anthropic_answer

    print(measure_execution_time(
        completion_function,
        5,
        "What is the name of the cat from Shrek?",
        model,
    ))

    print("-"*100)

----------------------------------------------------------------------------------------------------
Model name gpt-3.5-turbo
{'average_latency': 0.8572634220123291, 'max_latency': 1.2018272876739502, 'min_latency': 0.5498929023742676, 'std_latency': 0.21132660559943792}
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
Model name gpt-4
{'average_latency': 1.3319780826568604, 'max_latency': 1.8591728210449219, 'min_latency': 1.0715179443359375, 'std_latency': 0.2895837058446255}
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
Model name gpt-4o
{'average_latency': 1.4340587615966798, 'max_latency': 1.799485445022583, 'min_latency': 1.1727039813995361, 'std_latency': 0.20461863474660597}

There are also parameters you can change to make latency a bit better. For example, if you want only a short sentence to be generated, you can set max_tokens. This speeds up response time a lot.

In [None]:
for model in models_to_test:
    print("-"*100)
    print(f"Model name {model}")
    if "gpt" in model:
        completion_function = get_chatgpt_answer
    else:
        completion_function = get_anthropic_answer

    print(measure_execution_time(
        completion_function,
        5,
        "What is the name of the cat from Shrek?",
        model,
        params={
            "max_tokens": 10
        }
    ))

    print("-"*100)

----------------------------------------------------------------------------------------------------
Model name gpt-3.5-turbo
{'average_latency': 0.5668250560760498, 'max_latency': 0.8016211986541748, 'min_latency': 0.4170095920562744, 'std_latency': 0.13576578001880066}
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
Model name gpt-4
{'average_latency': 1.2162207126617433, 'max_latency': 1.757826328277588, 'min_latency': 0.9130477905273438, 'std_latency': 0.3195366842789354}
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
Model name gpt-4o
{'average_latency': 0.735747241973877, 'max_latency': 0.9551527500152588, 'min_latency': 0.4709610939025879, 'std_latency': 0.1668785130025617}
-

There are many other factors, which contribute to latency changes. For example lot of other people may be using the same model as you.

In that case `gpt-4o-mini` can be slower than `gpt-4` just because of popularity at a certain time.  



# Summary

This week we've learned:
- How to use LangChain library.
- How to add plugins to an help an LLM excel in more complex tasks.
- How to create a vector database and how to interact with it.
- About LLM latency and what affects it


In this week's homework you'll:
- Learn how to make ChatGPT nail high-school tests.
- Learn to route between different LLMs depending on the task.
- Create you own Gradio app to demo your LLM functionality.