## Why LangChain?

There are times when we blindly accept things presented to us. We have all accepted that `LangChain` is the thing we NEED to use for anything related to Large Language Models (LLMs). But why `LangChain`? That is the first question we want to answer in this notebook.

### Working with OpenAI's APIs without LangChain

Let's pick the most popular LLM provider in the market, OpenAI. The good folks at OpenAI provide a nice Python wrapper (`pip install openai`) around their REST endpoints ([link here](https://platform.openai.com/docs/api-reference)). Even without `LangChain`, we can work with the provided models easily. Let's see some examples below:

In [28]:
from getpass import getpass
import openai
openai.api_key = getpass(prompt="Add your openai key:")

Add your openai key: ········


In [30]:
open_ai_key = openai.api_key

In [15]:
models = openai.Model.list()
[model["id"] for model in models["data"][:5]]

['whisper-1',
 'babbage',
 'davinci',
 'text-davinci-edit-001',
 'babbage-code-search-code']

Once we have set the keys, let's do a basic completion task

In [21]:
prompt = "Can you tell me who's the president of the United States of America?"
completion = openai.Completion.create(model="text-davinci-003", prompt=prompt)

In [22]:
completion

<OpenAIObject text_completion id=cmpl-7OXVg6lSFnH9Y2oxmKntB543rX5M1 at 0x117e568e0> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\n\nThe current president of the United States is Joe Biden."
    }
  ],
  "created": 1686083040,
  "id": "cmpl-7OXVg6lSFnH9Y2oxmKntB543rX5M1",
  "model": "text-davinci-003",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 13,
    "prompt_tokens": 15,
    "total_tokens": 28
  }
}

Extracting just the text from the response, we get:

In [19]:
completion["choices"][0]["text"]

'\n\nThe current President of the United States of America is Joe Biden.'

If we want to work with the newer `gpt-3.5-turbo` or GPT-4 models, we need a different prompt format that is compatible with the chat interface:

In [23]:
messages = [
    {
        "role": "system",
        "content": "You are a Dutch language teacher who helps newbies learn Dutch faster. Please converse with the user as a new learner"
    },
    {
        "role": "user",
        "content": "What would be our first learning? Week of the days?"
    }
]
completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)

<OpenAIObject chat.completion id=chatcmpl-7OXVm8IM6fEl5loIl1O2ctwu7HDCq at 0x117cf8900> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "That's a great idea! Let's start with the days of the week. In Dutch, the days of the week are:\n\n- maandag (Monday)\n- dinsdag (Tuesday)\n- woensdag (Wednesday)\n- donderdag (Thursday)\n- vrijdag (Friday)\n- zaterdag (Saturday)\n- zondag (Sunday)\n\nCan you try to pronounce them after me?",
        "role": "assistant"
      }
    }
  ],
  "created": 1686083046,
  "id": "chatcmpl-7OXVm8IM6fEl5loIl1O2ctwu7HDCq",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 83,
    "prompt_tokens": 48,
    "total_tokens": 131
  }
}

In [24]:
completion["choices"][0]["message"]["content"]

"That's a great idea! Let's start with the days of the week. In Dutch, the days of the week are:\n\n- maandag (Monday)\n- dinsdag (Tuesday)\n- woensdag (Wednesday)\n- donderdag (Thursday)\n- vrijdag (Friday)\n- zaterdag (Saturday)\n- zondag (Sunday)\n\nCan you try to pronounce them after me?"

Now if I have to continue the conversation, I'd have to do a few things:
1. Save the latest response and append it to `messages`
```python
messages = messages + [{"role": "assistant", "content": completion["choices"][0]["message"]["content"]}]
```
2. Call the same `openai.ChatCompletion.create` function and send the full `messages` list back
3. Rinse and repeat until I exhaust the context window (roughly `4k` tokens for `gpt-3.5-turbo` and `8k` tokens for the base `gpt-4` model at the time of writing)
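
The steps above amount to a small loop. Here is a minimal sketch of it. `chat_turn` is a helper name made up for illustration, and `echo_model` is a hypothetical fake model standing in for the `openai.ChatCompletion.create` call, so the example runs without an API key.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    messages: List[Message],
    user_text: str,
    send: Callable[[List[Message]], str],
) -> List[Message]:
    """Run one turn: add the user message, call the model, store its reply."""
    history = messages + [{"role": "user", "content": user_text}]
    reply = send(history)  # the real notebook would call the OpenAI API here
    return history + [{"role": "assistant", "content": reply}]


def echo_model(history: List[Message]) -> str:
    """Hypothetical fake model: reports how many messages it was sent."""
    return f"I received {len(history)} messages"


history = [{"role": "system", "content": "You are a Dutch language teacher."}]
history = chat_turn(history, "What are the days of the week?", echo_model)
history = chat_turn(history, "How do I pronounce them?", echo_model)
print(len(history))            # 5: system + two user/assistant pairs
print(history[-1]["content"])  # I received 4 messages
```

Note how every call resends the whole history, which is exactly why the context window fills up.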

Context windows this size are large, but resending the whole history with every query also costs a lot. How do I track how much of the context window I'm using every time I call OpenAI? Use `tiktoken`, which tells you how many tokens you are sending to OpenAI for a specific model.

In [26]:
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")

# To get the tokeniser corresponding to a specific model in the OpenAI API:
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [28]:
encoded_message = enc.encode("This is a good place")
encoded_message

[2028, 374, 264, 1695, 2035]

In [29]:
enc.decode(encoded_message)

'This is a good place'

In [34]:
f"Total tokens for gpt3.5-turbo --> {len([enc.decode_single_token_bytes(token) for token in encoded_message])}"

'Total tokens for gpt3.5-turbo --> 5'

In [46]:
# Shameless copy-pasta from OpenAI example
num_tokens = 0
tokens_per_message = 4 # every message follows <|start|>{role/name}\n{content}<|end|>\n
tokens_per_name = -1 # if there's a name, the role is omitted
for message in messages:
    num_tokens += tokens_per_message
    for key, value in message.items():
        num_tokens += len(enc.encode(value))
        if key == "name":
            num_tokens += tokens_per_name
num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
print(f"Number of tokens for latest message: {num_tokens}")

Number of tokens for latest message: 48


In [45]:
assert completion["usage"]["prompt_tokens"] == num_tokens, "Wrong implementation"

Our assert succeeds, but that's still a lot of work just to do a basic query! For a fairly robust implementation, we would need a few more things:
- Retries: OpenAI's APIs can be unstable, with queries frequently timing out
- Caching: you don't want to spend time and money regenerating a completion for a similar query from the same or another user
- Standardized output schema: if your use case demands structured output, such as a JSON or XML schema, you need to build the prompting and parsing for it yourself
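
To get a feel for what the first two bullets involve, here is a minimal sketch using only the standard library. `with_retries`, `flaky_completion` and `cached_completion` are names made up for illustration, the API call is simulated, and the cache only matches identical prompts (matching merely similar queries would need something smarter).

```python
import time
from functools import lru_cache
from typing import Callable


def with_retries(fn: Callable[[], str], attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fn, retrying with exponential backoff when it raises."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky_completion() -> str:
    """Hypothetical stand-in for an API call that times out twice."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "Joe Biden"


@lru_cache(maxsize=128)
def cached_completion(prompt: str) -> str:
    """Identical prompts are answered from the cache after the first call."""
    return with_retries(flaky_completion, sleep=lambda seconds: None)


print(cached_completion("Who is the US president?"))  # Joe Biden
print(cached_completion("Who is the US president?"))  # Joe Biden (from cache)
print(calls["count"])  # 3
```

That is already a fair amount of plumbing, and it still says nothing about output schemas.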

And these are just the basics. Phew!
![tired-meme](https://i.kym-cdn.com/entries/icons/original/000/039/399/ddw.jpg)

Also, OpenAI is not the only option: there are alternatives like Cohere, Anthropic, Falcon and Llama that everyone would want to at least try out, if not use in production. Some users report that models like Anthropic's `Claude-Instant-v1` outperform OpenAI's `gpt-3.5-turbo` ([read here](https://twitter.com/vladquant/status/1659679709154934784))

As mentioned above, any engineer or product person working with LLMs needs the ability to iterate fast and have multiple options to try out. `LangChain` is THAT library right now. (Almost) all the right abstractions required for LLMs are baked into `LangChain`.

### Working with OpenAI's APIs with LangChain

`LangChain` provides `llms` as the basic construct, which helps us easily swap between models (local and third-party).

Let's first try out the `OpenAI` wrapper:

In [25]:
from langchain.llms import OpenAI

In [26]:
?OpenAI

Init signature:
OpenAI(
    *,
    cache: Optional[bool] = None,
    verbose: bool = None,
    callbacks: Union[List[langchain.callbacks.base.BaseCallbackHandler], langchain.callbacks.base.BaseCallbackManager, NoneType] = None,
    callback_manager: Optional[langchain.callbacks.base.BaseCallbackManager]

In [53]:
gpt35 = OpenAI(model_name='text-davinci-003', openai_api_key=open_ai_key)

In [56]:
generation = gpt35.generate(prompts=[prompt])

In [57]:
generation.generations

[[Generation(text='\n\nThe President of the United States of America is Joe Biden.', generation_info={'finish_reason': 'stop', 'logprobs': None})]]

One could even do a variation of the above:

In [71]:
generation.generations[0][0].text

'\n\nThe President of the United States of America is Joe Biden.'

In [58]:
gpt35(prompt)

'\n\nThe current president of the United States of America is Joe Biden.'

Simple, carefree outputs, without parsing through the JSON that OpenAI returns. Is that it? Nope.

#### Swap 3rd party to local models 
Let's swap OpenAI for a fairly small open model: `flan-t5` from Google, accessed here through the Hugging Face Hub

In [59]:
from langchain import HuggingFaceHub

In [61]:
hf_token = getpass(prompt="Add huggingface token (Visit -> https://huggingface.co/settings/tokens):")

Add huggingface token (Visit -> https://huggingface.co/settings/tokens): ········


One can search for models here: [huggingface models](https://huggingface.co/models)

In [66]:
flan_t5 = HuggingFaceHub(repo_id="google/flan-t5-small", huggingfacehub_api_token=hf_token)

In [68]:
flan_t5(prompt)

'John F. Kennedy'

Ugggh! It is a fairly bad model. `flan-t5-xxl` might be a better one, yet `OpenAI` models still outperform the rest. At least we know that open models like these are available if we need local inference or our use cases involve sensitive data.

__NOTE__: 
- Since Meta "released" the Llama weights, there's been an unending procession of models touted as comparable to OpenAI's (Vicuna-13B and Falcon come to mind). But they need a lot of compute, whether you run them locally or on platforms like Replicate. So the next time you think about running these models, remember that an M1 MacBook Air or even an RTX 3060 might not be able to handle them due to hardware constraints.
- Not to digress, but there's a class of quantized models released by ggml.ai that run on M1/M2 Macs at least. More improvements are coming in, but they are still worse than the OpenAI/Anthropic/Cohere models.

#### Writing complex prompts with dynamic information?

In [72]:
prompt

"Can you tell me who's the president of the United States of America?"

The `prompt` above is the most basic example that one can throw at an LLM. In the AI summer before the Cambrian explosion of LLMs, one had to painstakingly create models specific to a task.

- Want to do English to Dutch translation? Train a model.
- Want to do NLP classification? Train a model.

LLMs kind of let you cheat your way through by just using one model. __One model to rule them all__

![Sauron](https://i0.wp.com/middle-earth.xenite.org/files/2013/12/sauron-and-the-one-ring.jpg?fit=360%2C247&ssl=1)

But there's a catch: you need to painstakingly craft a good prompt to get a relevant answer. When GPT-3 was first released, NLP tasks (summarization, QnA, translation) needed a bunch of examples stuffed into the prompt. That much information stuffing isn't required anymore, but you still need a way to pass some information in.

`LangChain`, with its `Prompt` construct, simplifies this information stuffing and helps us write dynamically generated queries, with faster iteration as a side effect.

Let's see a complex example, where I want to generate text on a topic based on how the popular Dragon Ball Z characters would talk. Let's write a prompt for `Vegeta, a character who is egotistical and sarcastic`

In [73]:
vegeta_prompt = "Write 50 words on Global Warming in the tone of Vegeta, a character who is egotistical and sarcastic"

In [74]:
gpt35(vegeta_prompt)

"\n\n1. Global Warming? Bah, how could a puny planet like this one possibly affect the universe's climate.\n2. Typical humans, thinking they can do anything they please and the universe will remain unchanged.\n3. Global Warming? I suppose it is the least of this planet's problems.\n4. I guess I should be grateful that Global Warming isn't any worse than it is.\n5. If I wanted to destroy the planet, I'd just wait until Global Warming does it for me.\n6. Global Warming? It's like the universe is trying to tell me something, but I'm not sure what.\n7. Don't worry about Global Warming, I'll just use my superior Saiyan strength to fix it.\n8. Global Warming? I don't even have time to think about it, I'm too busy trying to save the universe.\n9. I bet I could stop Global Warming with one glance of my powerful glare.\n10. Global Warming? What a pathetic attempt to ruin the planet. I'll take care of it."

Very impressive! Now if I have to write it in the tone of Gohan, who's a nerdy and serious kid, I'd have to copy-paste a lot of stuff. With `LangChain`'s `PromptTemplate`, we can avoid that duplication easily.



In [2]:
from langchain import PromptTemplate

In [83]:
template = "Write 50 words on Global Warming in the tone of {character}, character who is {personality}"

I can put a bunch of characters in a list and just write a loop!

In [88]:
characters = [
    {"character": "Vegeta", "personality": "egotistical and sarcastic"},
    {"character": "Gohan", "personality": "nerdy and serious"},
    {"character": "Chichi", "personality": "angry and strong woman"},
    {"character": "Bulma", "personality": "scientist and opinionated"}]
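
For intuition, here is a plain-Python sketch of the substitution a prompt template performs, using only `str.format`. It is an approximation for illustration, not how `PromptTemplate` is implemented, and it redefines a shortened `characters` list so that it runs on its own.

```python
template = (
    "Write 50 words on Global Warming in the tone of {character}, "
    "character who is {personality}"
)
characters = [
    {"character": "Vegeta", "personality": "egotistical and sarcastic"},
    {"character": "Gohan", "personality": "nerdy and serious"},
]

# str.format fills each {placeholder} from the matching dictionary key.
prompts = [template.format(**entry) for entry in characters]
for text in prompts:
    print(text)
```

Each dictionary yields one fully formed prompt, so adding a character is a one-line change.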

In [79]:
?PromptTemplate

Init signature:
PromptTemplate(
    *,
    input_variables: List[str],
    output_parser: Optional[langchain.schema.BaseOutputParser] = None,
    partial_variables: Mapping[str, Union[str, Callable[[], str]]] = None,
    template: str,
    template_format: str = 'f-string',

In [85]:
prompt = PromptTemplate(input_variables=["character", "personality"], template=template)

In [86]:
from langchain import LLMChain

In [87]:
llm_chain = LLMChain(
    prompt=prompt,
    llm=gpt35
)

In [90]:
result = llm_chain.generate(characters) # Multiple prompts in one simple function

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sorry about that!.
Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sorry about that!.
Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sorry about that!.
Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sorry about that!.
Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 10.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sor

`LangChain` also retries on my behalf automatically, without making me write extra code! How good is that!

In [109]:
[f"{characters[idx]['character']} ::: {gen[0].text}" for idx, gen in enumerate(result.generations)]

 'Gohan ::: \n\nGlobal warming is a serious issue that we must address. We need to reduce emissions of greenhouse gases and increase our use of renewable energy sources such as solar and wind power. We must also take steps to reduce deforestation, as trees are essential for carbon sequestration. We must also reduce our reliance on fossil fuels, as burning them is a major source of greenhouse gases. We must take action now, or the consequences of global warming will be devastating for future generations. We must educate ourselves and others on the causes and effects of global warming, and strive to make a difference. Together, we can make a real impact in reducing global warming.',
 'Chichi ::: \n\nGlobal Warming is a serious issue that needs to be addressed. It is having a devastating effect on the planet and the environment. We need to take drastic measures to reduce emissions and stop this problem from getting worse. We must stop burning fossil fuels and switch to renewable energy so

#### Prompt templates for chat based models (eg: gpt3.5-turbo/gpt4)

GPT-3.5 and GPT-4 are exposed through a chat-based interface that has three roles: `user`, `assistant` and `system`. `LangChain` provides primitives to help formulate such a query

In [6]:
from langchain.prompts import (
    ChatPromptTemplate,
    PromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

Let's use the prompt we used earlier to continue our Dutch conversations

In [7]:
messages = [
    {
        "role": "system",
        "content": "You are a Dutch language teacher who helps newbies learn Dutch faster. Please converse with the user as a new learner"
    },
    {
        "role": "user",
        "content": "What would be our first learning? Week of the days?"
    }
]

In [14]:
system_template = "You are a Dutch language teacher who helps newbies learn Dutch faster. Please converse with the user as a new learner"

In [18]:
system_msg = SystemMessage(content=system_template)

In [19]:
system_msg

SystemMessage(content='You are a Dutch language teacher who helps newbies learn Dutch faster. Please converse with the user as a new learner', additional_kwargs={})

In [54]:
human_message = HumanMessagePromptTemplate.from_template(template="{question}")

In [55]:
chat_template = ChatPromptTemplate.from_messages([system_msg, human_message])

In [56]:
chat_template

ChatPromptTemplate(input_variables=['question'], output_parser=None, partial_variables={}, messages=[SystemMessage(content='You are a Dutch language teacher who helps newbies learn Dutch faster. Please converse with the user as a new learner', additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], output_parser=None, partial_variables={}, template='{question}', template_format='f-string', validate_template=True), additional_kwargs={})])

In [34]:
from langchain.chat_models import ChatOpenAI

In [59]:
gpt35_turbo = ChatOpenAI(model_name='gpt-3.5-turbo', openai_api_key=open_ai_key)

In [45]:
from langchain import LLMChain

In [60]:
chain = LLMChain(llm=gpt35_turbo, prompt=chat_template)

In [61]:
chain.run({"question": "What are the days of a week called?"})

'De dagen van de week in het Nederlands zijn:\n\n- Maandag\n- Dinsdag\n- Woensdag\n- Donderdag\n- Vrijdag\n- Zaterdag\n- Zondag\n\nHeb je nog andere vragen over de Nederlandse taal?'

If we add a `{language}` variable to the template as well, we can support multiple languages:

In [81]:
system_template = "You are a language teacher who helps newbies learn new faster. Please converse with the user as a new learner and explaing in English"

In [83]:
system_msg = SystemMessage(content=system_template)

In [85]:
human_message = HumanMessagePromptTemplate.from_template(template="{question}. Provide example in {language}")

In [89]:
chat_template = ChatPromptTemplate.from_messages([system_msg, human_message])

In [90]:
chat_template

ChatPromptTemplate(input_variables=['question', 'language'], output_parser=None, partial_variables={}, messages=[SystemMessage(content='You are a language teacher who helps newbies learn new faster. Please converse with the user as a new learner and explaing in English', additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['language', 'question'], output_parser=None, partial_variables={}, template='{question}. Provide example in {language}', template_format='f-string', validate_template=True), additional_kwargs={})])

In [92]:
chain = LLMChain(llm=gpt35_turbo, prompt=chat_template)

In [94]:
chain.run(
    {"language": "Dutch", "question": "What are the days of a week called?"}
)

'The days of the week in English are:\n\n1. Monday\n2. Tuesday\n3. Wednesday\n4. Thursday\n5. Friday\n6. Saturday\n7. Sunday\n\nIn Dutch, they are called:\n\n1. Maandag\n2. Dinsdag\n3. Woensdag\n4. Donderdag\n5. Vrijdag\n6. Zaterdag\n7. Zondag\n\nFor example, if you want to say "Today is Monday" in Dutch, you would say "Vandaag is het maandag".'

If I want the response in a certain format, there are various output parsers available ([here](https://python.langchain.com/en/latest/modules/prompts/output_parsers.html))

We'll use the most common format, `json`, to get structured data back

In [95]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

A `ResponseSchema` describes one field that you want in the output. To take an example, say we want output like this:
```json
{"team": "Arsenal", "player_name": "Bukayo Saka"}
```

In [98]:
response_schemas = [
    ResponseSchema(name="team", description="Team played for"),
    ResponseSchema(name="player_name", description="Player name in the question")
]

In [99]:
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [100]:
format_instructions = output_parser.get_format_instructions()

In [101]:
format_instructions

'The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\\`\\`\\`json" and "\\`\\`\\`":\n\n```json\n{\n\t"team": string  // Team played for\n\t"player_name": string  // Player name in the question\n}\n```'

In [102]:
system_msg = SystemMessage(content="You are a knowledgeable football fan")

In [125]:
human_msg = HumanMessagePromptTemplate.from_template(
    "{format_instructions}\n{question}", 
    partial_variables={"format_instructions": format_instructions})

In [126]:
chat_template = ChatPromptTemplate.from_messages([system_msg, human_msg])

In [127]:
chain = LLMChain(llm=gpt35_turbo, prompt=chat_template)

In [130]:
result = chain.run({"question": "Which team does Bukayo Saka play for in the Premier League?", "format_instructions": format_instructions})

In [131]:
output_parser.parse(result)

{'team': 'Arsenal', 'player_name': 'Bukayo Saka'}

In [132]:
result = chain.run(
    {"question": "Which team does Bukayo Saka play for on the international level?", 
     "format_instructions": format_instructions})
output_parser.parse(result)

{'team': 'England', 'player_name': 'Bukayo Saka'}
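
Roughly speaking, a parser like this has to find the `json` snippet in the reply and decode it. Here is a minimal sketch of that idea, assuming the model follows the format instructions. It is not LangChain's actual implementation; `parse_json_snippet` is a hypothetical helper and the reply is written by hand.

```python
import json
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example


def parse_json_snippet(text: str) -> dict:
    """Pull the first json code snippet out of a model reply and decode it."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no json snippet found in the reply")
    return json.loads(match.group(1))


# Hypothetical model reply, written by hand for illustration.
reply = (
    "Sure!\n"
    + FENCE + "json\n"
    + '{"team": "Arsenal", "player_name": "Bukayo Saka"}\n'
    + FENCE
)
print(parse_json_snippet(reply))
# {'team': 'Arsenal', 'player_name': 'Bukayo Saka'}
```

If the model ignores the instructions, the sketch raises a `ValueError`, which is the point where a retry or a fixing step would kick in.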

All the above are smaller building blocks that `LangChain` provides. Now if we want to make it an actual chat application there's a bunch of work to do. 

`LangChain` makes it easy to add memory to your chain. We'll be looking at that in the next section

## Understanding Memory via ChatBot

We'll continue the Dutch-learning conversation here

In [135]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

In [173]:
gpt35_turbo = ChatOpenAI(
    model_name='gpt-3.5-turbo', 
    openai_api_key=open_ai_key
)

In [175]:
template = """You are a Dutch Language teacher. You help newbie learn Dutch in an easy, fun way with explainations in English

{chat_history}
You: {input}
DLT:"""
prompt = PromptTemplate(input_variables=["chat_history", "input"], template=template)

In [176]:
memory = ConversationBufferMemory(memory_key="chat_history")

In [177]:
conversation = ConversationChain(
    llm=gpt35_turbo,
    prompt=prompt,
    verbose=True,
    memory=memory
)

In [178]:
conversation.predict(input="What are the days of a week called?")

> Entering new ConversationChain chain...
Prompt after formatting:
You are a Dutch Language teacher. You help newbie learn Dutch in an easy, fun way with explainations in English


You: What are the days of a week called?
DLT:

> Finished chain.


"The days of the week in Dutch are as follows:\n\n- Maandag (Monday)\n- Dinsdag (Tuesday)\n- Woensdag (Wednesday)\n- Donderdag (Thursday)\n- Vrijdag (Friday)\n- Zaterdag (Saturday)\n- Zondag (Sunday)\n\nIt's important to note that the days of the week in Dutch are not capitalized unless they begin a sentence or are used in a title."

In [179]:
conversation.predict(input="How can I pronounce them?")

> Entering new ConversationChain chain...
Prompt after formatting:
You are a Dutch Language teacher. You help newbie learn Dutch in an easy, fun way with explainations in English

Human: What are the days of a week called?
AI: The days of the week in Dutch are as follows:

- Maandag (Monday)
- Dinsdag (Tuesday)
- Woensdag (Wednesday)
- Donderdag (Thursday)
- Vrijdag (Friday)
- Zaterdag (Saturday)
- Zondag (Sunday)

It's important to note that the days of the week in Dutch are not capitalized unless they begin a sentence or are used in a title.
You: How can I pronounce them?
DLT:

> Finished chain.


'Here are the pronunciations of the days of the week in Dutch:\n\n- Maandag: mahn-dahg\n- Dinsdag: dinz-dahg\n- Woensdag: wuhns-dahg\n- Donderdag: dahn-duhr-dahg\n- Vrijdag: vry-dahg\n- Zaterdag: zah-tuhr-dahg\n- Zondag: zohn-dahg\n\nRemember to emphasize the first syllable of each day, and try to pronounce the "g" sound at the end of each word with a slight guttural sound.'

As you can observe above, the bot we created understands the context of the second question. This is a very basic implementation that works well until the growing chat history exceeds the model's context window.

To get around that, you can use `ConversationBufferWindowMemory`, which uses only the last `k` interactions, or move to a more expansive kind of memory: vector-embedding-based stores, with example wrappers available for:
- Libraries: FAISS, Annoy
- Open Source Vector DBs: Chroma, Weaviate, Qdrant
- Cloud Vector DBs: Pinecone, Qdrant & Weaviate (yes, they offer both)
- Mixed usage products: Redis
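
The windowed memory idea is easy to picture in plain Python. Here is a minimal sketch, assuming we only care about the last `k` exchanges. `WindowMemory` is a class made up for illustration and is not LangChain's implementation.

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemory:
    """Keep only the last k (human, ai) exchanges of a conversation."""

    def __init__(self, k: int) -> None:
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))  # the oldest exchange drops off when full

    def as_history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


memory = WindowMemory(k=2)
memory.save("Hi", "Hallo!")
memory.save("Days of the week?", "maandag, dinsdag, ...")
memory.save("How do I say thanks?", "Dank je wel")
print(memory.as_history())
```

The first exchange is gone from the printed history: the prompt stays small, at the cost of forgetting anything older than `k` turns. That trade-off is what the vector stores listed above try to improve on.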

Some resources to get an overview about vectors:
- [Nirant's tweet thread about Vector DBs](https://twitter.com/NirantK/status/1644290469915164672)
- [Weaviate's blog about Vector DBs vs Vector Libraries](https://weaviate.io/blog/vector-library-vs-vector-database)
- [About Vector Embeddings by Weaviate](https://weaviate.io/blog/vector-embeddings-explained)

We'll start with a low-overhead vector store implementation using FAISS. Since FAISS only indexes vectors and doesn't store the documents themselves, we'll use the `InMemoryDocstore` provided by LangChain

In [180]:
from langchain.docstore import InMemoryDocstore

In [188]:
import faiss
from langchain.vectorstores import FAISS

We build an index of vector embeddings based on the conversations. In simple words, we'll convert all the conversation (text) into embeddings (series of numbers) through a third-party embedding model (OpenAI provides one; there are open-source models as well)

In [183]:
from langchain.embeddings import OpenAIEmbeddings

In [190]:
embedding_size = 1536

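How do we know the `embedding_size`? It is fixed by the embedding model: OpenAI's `text-embedding-ada-002`, which `OpenAIEmbeddings` uses here, returns vectors with 1536 dimensions.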

In [192]:
index = faiss.IndexFlatL2(embedding_size)
embedding_fn = OpenAIEmbeddings(openai_api_key=open_ai_key).embed_query
vectorstore = FAISS(embedding_fn, index, InMemoryDocstore({}), {})

In [193]:
embedding_fn # All the embeddings would be generated via this function

<bound method OpenAIEmbeddings.embed_query of OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base=None, openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key='<redacted>', openai_organization=None, allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6, request_timeout=None, headers=None)>

In [194]:
vectorstore # Where the embeddings would be stored in an index

<langchain.vectorstores.faiss.FAISS at 0x136b4f220>

In [195]:
index # The faiss index object

<faiss.swigfaiss.IndexFlatL2; proxy of <Swig Object of type 'faiss::IndexFlatL2 *' at 0x136b4f870> >

In the previous example we used `ConversationBufferMemory` to save our chat history. Here, we'll use `VectorStoreRetrieverMemory` in the same way

In [196]:
from langchain.memory import VectorStoreRetrieverMemory

In [199]:
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever, memory_key="chat_history")

Let's do a test run (example from the langchain docs)

In [202]:
memory.save_context({"input": "My favorite food is pizza"}, {"output": "thats good to know"})

In [204]:
memory.load_memory_variables({"prompt": "what's my favourite food?"})

{'chat_history': 'input: My favorite food is pizza\noutput: thats good to know'}

The magic of embeddings!
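
To demystify the magic a little: retrieval boils down to finding the stored vectors closest to the query vector. Below is a toy sketch of an exhaustive L2 search, which is the idea behind a flat L2 index. The 3-dimensional vectors are made up by hand for illustration (real ones come from the embedding model and have 1536 dimensions), and `l2_distance` and `nearest` are hypothetical helper names.

```python
import math
from typing import Dict, List, Tuple

# Made-up 3-dimensional "embeddings"; real ones have 1536 dimensions.
store: Dict[str, List[float]] = {
    "My favorite food is pizza": [0.9, 0.1, 0.0],
    "The days of the week in Dutch": [0.0, 0.2, 0.9],
    "I live in Amsterdam": [0.1, 0.9, 0.2],
}


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: List[float], k: int = 1) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored vector, return the k closest."""
    scored = [(text, l2_distance(query, vec)) for text, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Pretend this vector is the embedding of "what's my favourite food?"
query_vector = [0.8, 0.2, 0.1]
print(nearest(query_vector)[0][0])  # My favorite food is pizza
```

With `k=1`, as in the retriever above, only the single closest memory is pulled into the prompt.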

We'll create another conversation chain

In [200]:
conversation = ConversationChain(
    llm=gpt35_turbo,
    prompt=prompt,
    verbose=True,
    memory=memory
)

In [206]:
conversation.predict(input="What are the days of a week called?")

> Entering new ConversationChain chain...
Prompt after formatting:
You are a Dutch Language teacher. You help newbie learn Dutch in an easy, fun way with explainations in English

input: My favorite food is pizza
output: thats good to know
You: What are the days of a week called?
DLT:

> Finished chain.


'The days of the week in Dutch are maandag (Monday), dinsdag (Tuesday), woensdag (Wednesday), donderdag (Thursday), vrijdag (Friday), zaterdag (Saturday), and zondag (Sunday).'

In [207]:
conversation.predict(input="How can I pronounce them?")

> Entering new ConversationChain chain...
Prompt after formatting:
You are a Dutch Language teacher. You help newbie learn Dutch in an easy, fun way with explainations in English

input: What are the days of a week called?
response: The days of the week in Dutch are maandag (Monday), dinsdag (Tuesday), woensdag (Wednesday), donderdag (Thursday), vrijdag (Friday), zaterdag (Saturday), and zondag (Sunday).
You: How can I pronounce them?
DLT:

> Finished chain.


'Sure! Here\'s how to pronounce each day of the week:\n\n- maandag: mahn-dahg\n- dinsdag: dins-dahg\n- woensdag: woon-sdahg\n- donderdag: dahn-der-dahg\n- vrijdag: vry-dahg\n- zaterdag: zah-ter-dahg\n- zondag: zohn-dahg\n\nRemember, the "g" at the end of each day is pronounced like the "ch" in the Scottish word "loch".'

In [208]:
conversation.predict(input="What's my favourite food?")

> Entering new ConversationChain chain...
Prompt after formatting:
You are a Dutch Language teacher. You help newbie learn Dutch in an easy, fun way with explainations in English

input: My favorite food is pizza
output: thats good to know
You: What's my favourite food?
DLT:

> Finished chain.


'Your favorite food is pizza. \n\nIn Dutch, it would be: Jouw favoriete eten is pizza.'

It gave me an answer from memory, as well as a good Dutch answer. Coooool!

## Going deep with Cloud Vector Store using Document QnA

Using FAISS is all good, but to store all the documents permanently we need a proper database. We can go either way: local vector DBs or cloud providers (Weaviate, Pinecone).

Using a local vector DB usually means an extra component to set up. For our use case, we'll use the free tier of Weaviate. Most of the cloud providers have a free tier (Pinecone and Qdrant provide one too).

The first task is to create a free DB on [Weaviate Cloud](https://console.weaviate.cloud/)
![create-a-cluster](./assets/create-cluster.png)

Check the details here; we need to pick the `Cluster url`
![cluster-details](./assets/cluster-details.png)

In [217]:
WEAVIATE_URL = getpass("WEAVIATE_URL:")

WEAVIATE_URL: ········


The API key is separate for each cluster. There's another level of authentication and authorization that can be configured with Weaviate, but that is out of scope for this notebook.

Fetch the key by clicking on `Enabled Authentication` on the screen above:
![auth-enabled](./assets/auth-enabled.png)

In [218]:
import os
os.environ["WEAVIATE_API_KEY"] = getpass("WEAVIATE_API_KEY:")

WEAVIATE_API_KEY: ········


In [219]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Weaviate
from langchain.document_loaders import TextLoader

Let's quickly load up a PDF and send data to Weaviate, to check that we have the cluster URL and API key correct

__Sidenote__: There's a bunch of data loaders available with `langchain`: [details here](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html)

We'll be using the PDF data loader on a book about Elixir

In [220]:
book_path = "./assets/elixir_patterns.pdf"

In [221]:
from langchain.document_loaders import PyPDFLoader

In [222]:
loader = PyPDFLoader(book_path)
pages = loader.load_and_split()

In [225]:
pages[10: 12]

[Document(page_content='①A new b key is added to the provided map.\n②The initial map only contains an a key.\n③Our data prior to calling the do_work function.\n④Our data after to calling the do_work function.\nAs you can see from marker 4, the contents of my_data remain unchanged. This is the\nbeauty of immutability in action. The state of map bound to my_data remains\nunchanged and so we can confidently pass data from function to function without\nworrying about side effects. This in turn allows us to create programs composed\nprimarily of pure functions (functions without side effects).\nThe relevance of this when it comes to data structures in Erlang and Elixir is that all of\nthe data structures that we have access to must be immutable. Like many things in\nSoftware Engineering (and engineering in general), this does not come without its\ntrade-offs. In a naive implementation of a run-time that supports immutable data\nstructures, every time a piece of data is acted upon, a full co

#### Add section on `load_qa_chain`

#### Do `RetrievalChain` and `ConversationalRetrievalChain`

## Semantic Search via LangChain

## Query csv/excel data via LangChain (plus LlamaIndex)

## Introduction to Agents and Tools