## Why LangChain?

There are times when we blindly accept things presented to us. Many of us have accepted that `LangChain` is the thing we NEED to use for anything related to Large Language Models (LLMs). But why `LangChain`? That is the first question we want to answer in this notebook.

### Working with OpenAI's APIs without LangChain

Let's pick the most popular LLM provider in the market, OpenAI. The good folks at OpenAI provide a nice Python wrapper (`pip install openai`) around their REST endpoints ([link here](https://platform.openai.com/docs/api-reference)). Even without `LangChain`, we can work with the provided models easily. Let's see some examples below (they use the pre-1.0 interface of the `openai` package, e.g. `openai.Completion.create`):

In [10]:
from getpass import getpass
import openai
openai.api_key = getpass(prompt="Add your openai key:")

Add your openai key: ········


In [15]:
models = openai.Model.list()
[model["id"] for model in models["data"][:5]]

['whisper-1',
 'babbage',
 'davinci',
 'text-davinci-edit-001',
 'babbage-code-search-code']

With the key set, let's do a basic completion task:

In [21]:
prompt = "Can you tell me who's the president of the United States of America?"
completion = openai.Completion.create(model="text-davinci-003", prompt=prompt)

In [22]:
completion

<OpenAIObject text_completion id=cmpl-7OXVg6lSFnH9Y2oxmKntB543rX5M1 at 0x117e568e0> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\n\nThe current president of the United States is Joe Biden."
    }
  ],
  "created": 1686083040,
  "id": "cmpl-7OXVg6lSFnH9Y2oxmKntB543rX5M1",
  "model": "text-davinci-003",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 13,
    "prompt_tokens": 15,
    "total_tokens": 28
  }
}

Pulling just the generated text out of the response, we get the below:

In [19]:
completion["choices"][0]["text"]

'\n\nThe current President of the United States of America is Joe Biden.'

If we want to work with the newer `gpt-3.5-turbo` or GPT-4 models, we need a different prompt format, one that is compatible with the chat interface:

In [23]:
messages = [
    {
        "role": "system",
        "content": "You are a Dutch language teacher who helps newbies learn Dutch faster. Please converse with the user as a new learner"
    },
    {
        "role": "user",
        "content": "What would be our first learning? Week of the days?"
    }
]
completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
completion  # display the raw response object

<OpenAIObject chat.completion id=chatcmpl-7OXVm8IM6fEl5loIl1O2ctwu7HDCq at 0x117cf8900> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "That's a great idea! Let's start with the days of the week. In Dutch, the days of the week are:\n\n- maandag (Monday)\n- dinsdag (Tuesday)\n- woensdag (Wednesday)\n- donderdag (Thursday)\n- vrijdag (Friday)\n- zaterdag (Saturday)\n- zondag (Sunday)\n\nCan you try to pronounce them after me?",
        "role": "assistant"
      }
    }
  ],
  "created": 1686083046,
  "id": "chatcmpl-7OXVm8IM6fEl5loIl1O2ctwu7HDCq",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 83,
    "prompt_tokens": 48,
    "total_tokens": 131
  }
}

In [24]:
completion["choices"][0]["message"]["content"]

"That's a great idea! Let's start with the days of the week. In Dutch, the days of the week are:\n\n- maandag (Monday)\n- dinsdag (Tuesday)\n- woensdag (Wednesday)\n- donderdag (Thursday)\n- vrijdag (Friday)\n- zaterdag (Saturday)\n- zondag (Sunday)\n\nCan you try to pronounce them after me?"

Now, if I have to continue the conversation, I'd have to do a few things:
1. Save the latest response and append it to `messages`
```python
messages = messages + [{"role": "assistant", "content": completion["choices"][0]["message"]["content"]}]
```
2. Call the same `openai.ChatCompletion.create` function again with the updated `messages`
3. Rinse and repeat until I exhaust the context window: roughly `4k` tokens for `gpt-3.5-turbo` and `8k` tokens for the base `gpt-4`
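
The steps above can be sketched as a small loop. This is a minimal sketch, not a definitive implementation: `chat_turn` and `fake_send` are names I made up for illustration, and `fake_send` is an offline stand-in for a function that would wrap `openai.ChatCompletion.create` and return the assistant's text.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    messages: List[Message],
    user_text: str,
    send: Callable[[List[Message]], str],
) -> List[Message]:
    """Run one turn: add the user message, call the model, append its reply.

    `send` is a hypothetical callable that takes the full history and
    returns the assistant's reply as plain text.
    """
    history = messages + [{"role": "user", "content": user_text}]
    reply = send(history)  # the whole history is sent back on every call
    return history + [{"role": "assistant", "content": reply}]


def fake_send(history: List[Message]) -> str:
    """Offline stand-in for the API call, so the sketch runs without a key."""
    return f"(reply to: {history[-1]['content']})"


history = [{"role": "system", "content": "You are a Dutch language teacher."}]
for text in ["What are the days of the week?", "How do I say Monday?"]:
    history = chat_turn(history, text, fake_send)

print(len(history))  # 1 system + 2 user + 2 assistant messages -> 5
```

Note that the history only ever grows, which is exactly why the context window (and the bill) becomes a concern.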

Context windows of this size are large, but resending the whole conversation on every call gets expensive, since you pay per token. How do I track how much of the context window I'm using every time I call OpenAI? Use `tiktoken`, which tells you how many tokens you are sending to OpenAI for a specific model:

In [26]:
import tiktoken
enc = tiktoken.get_encoding("cl100k_base")

# To get the tokeniser corresponding to a specific model in the OpenAI API:
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [28]:
encoded_message = enc.encode("This is a good place")
encoded_message

[2028, 374, 264, 1695, 2035]

In [29]:
enc.decode(encoded_message)

'This is a good place'

In [34]:
f"Total tokens for gpt3.5-turbo --> {len(encoded_message)}"  # one token id per token

'Total tokens for gpt3.5-turbo --> 5'

In [46]:
# Shameless copy-pasta from OpenAI example
num_tokens = 0
tokens_per_message = 4 # every message follows <|start|>{role/name}\n{content}<|end|>\n
tokens_per_name = -1 # if there's a name, the role is omitted
for message in messages:
    num_tokens += tokens_per_message
    for key, value in message.items():
        num_tokens += len(enc.encode(value))
        if key == "name":
            num_tokens += tokens_per_name
num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
print(f"Number of tokens for latest message: {num_tokens}")

Number of tokens for latest message: 48


In [45]:
assert completion["usage"]["prompt_tokens"] == num_tokens, "Wrong implementation"

Our assert succeeds, but that's still a lot of work just to do a basic query! For a fairly robust implementation, we would need a few more things:
- Retries: OpenAI's APIs can be unstable, with a lot of queries timing out
- Caching: you don't want to spend time and money regenerating a completion for the same (or a very similar) query from the same or another user
- Standardized output schema: if your use case demands structured output (say, JSON or XML conforming to a schema), you need to build the prompting and parsing for it yourself
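
To get a feel for the first two bullets, here is a minimal hand-rolled sketch of retries with exponential backoff plus an exact-match in-memory cache. Everything here is an assumption for illustration: `with_retries` and `flaky_llm` are made-up names, and `flaky_llm` simulates an API that times out twice before answering. The cache only catches identical prompts, not "similar" ones.

```python
import time
from functools import lru_cache
from typing import Callable


def with_retries(
    fn: Callable[[str], str], max_attempts: int = 3, base_delay: float = 0.01
) -> Callable[[str], str]:
    """Wrap `fn` so that timeouts are retried with exponential backoff."""

    def wrapper(prompt: str) -> str:
        for attempt in range(max_attempts):
            try:
                return fn(prompt)
            except TimeoutError:
                if attempt == max_attempts - 1:
                    raise  # out of attempts, surface the error
                time.sleep(base_delay * 2**attempt)  # 0.01s, 0.02s, ...
        raise RuntimeError("max_attempts must be at least 1")

    return wrapper


calls = {"count": 0}


def flaky_llm(prompt: str) -> str:
    """Hypothetical stand-in for an API call: fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return f"answer to: {prompt}"


# Retries on the inside, exact-match cache on the outside.
robust_llm = lru_cache(maxsize=128)(with_retries(flaky_llm))

first = robust_llm("hello")   # two simulated timeouts, then a real answer
second = robust_llm("hello")  # served from the cache, no new call
print(first, calls["count"])
```

Even this toy version needs decisions about which errors to retry, how long to wait, and what counts as the "same" query.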

And these are just the basic tasks. Phew!
![tired-meme](https://i.kym-cdn.com/entries/icons/original/000/039/399/ddw.jpg)

Also, OpenAI isn't the only option: there are alternatives like Cohere, Anthropic, Falcon, and Llama that everyone would want to at least try out, if not use in production. By some accounts, models like Anthropic's `Claude-Instant-v1` outperform OpenAI's `gpt-3.5-turbo` ([read here](https://twitter.com/vladquant/status/1659679709154934784)).

As mentioned above, anyone working with LLMs, engineer or product person, needs the ability to iterate fast and try out multiple options. `LangChain` is THAT library right now. (Almost) all the right abstractions for LLMs are baked into `LangChain`.

### Working with OpenAI's APIs with LangChain

`LangChain` provides `llms` as the basic construct, which helps us easily swap between models (local and third-party).

Let's first try out the `OpenAI` wrapper:

In [48]:
from langchain.llms import OpenAI

In [49]:
?OpenAI

Init signature:
OpenAI(
    *,
    cache: Optional[bool] = None,
    verbose: bool = None,
    callbacks: Union[List[langchain.callbacks.base.BaseCallbackHandler], langchain.callbacks.base.BaseCallbackManager, NoneType] = None,
    callback_manager: Optional[langchain.callbacks.base.BaseCallbackManager]

In [53]:
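# Reuse the key set on the `openai` module earlier.
# Note: despite the variable name, this wraps the `text-davinci-003` completion model.
gpt35_turbo = OpenAI(model_name='text-davinci-003', openai_api_key=openai.api_key)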

In [56]:
generation = gpt35_turbo.generate(prompts=[prompt])

In [57]:
generation.generations

[[Generation(text='\n\nThe President of the United States of America is Joe Biden.', generation_info={'finish_reason': 'stop', 'logprobs': None})]]

We can also pull out just the text, or simply call the model directly:

In [71]:
generation.generations[0][0].text

'\n\nThe President of the United States of America is Joe Biden.'

In [58]:
gpt35_turbo(prompt)

'\n\nThe current president of the United States of America is Joe Biden.'

Simple, carefree outputs, without parsing through the JSON that OpenAI returns. Is that it? Nope.

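#### Swapping third-party models for local models
Let's swap OpenAI for a fairly small open model that could also be run locally: `flan-t5` from Google. (Here we call it through the hosted Hugging Face Hub API via `HuggingFaceHub`.)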

In [59]:
from langchain import HuggingFaceHub

In [61]:
hf_token = getpass(prompt="Add huggingface token (Visit -> https://huggingface.co/settings/tokens):")

Add huggingface token (Visit -> https://huggingface.co/settings/tokens): ········


One can search for models here: [huggingface models](https://huggingface.co/models)

In [66]:
flan_t5 = HuggingFaceHub(repo_id="google/flan-t5-small", huggingfacehub_api_token=hf_token)

In [68]:
flan_t5(prompt)

'John F. Kennedy'

Ugggh! That's a fairly bad answer. `flan-t5-xxl` might do better, yet `OpenAI` models still trump the rest. At least we know that such open models are available to us if we need local inference or if our use cases involve sensitive data.
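
Since both wrappers above are called the same way (a prompt string in, a text answer out), comparing models boils down to a loop. Here is a toy sketch of that idea, not LangChain's actual implementation: the two functions are hypothetical offline stand-ins that simply return the answers we saw above, and `ask_all` is a name made up for illustration.

```python
from typing import Callable, Dict

# Each "model" here is just a callable: prompt string in, answer string out.
LLM = Callable[[str], str]


def fake_openai(prompt: str) -> str:
    """Stand-in for the OpenAI wrapper (canned answer, no API call)."""
    return "Joe Biden"


def fake_flan_t5(prompt: str) -> str:
    """Stand-in for the flan-t5 wrapper (canned answer, no API call)."""
    return "John F. Kennedy"


def ask_all(prompt: str, llms: Dict[str, LLM]) -> Dict[str, str]:
    """Send the same prompt to every backend and collect the answers."""
    return {name: llm(prompt) for name, llm in llms.items()}


answers = ask_all(
    "Can you tell me who's the president of the United States of America?",
    {"openai": fake_openai, "flan-t5": fake_flan_t5},
)
for name, answer in answers.items():
    print(f"{name}: {answer}")
```

With real LangChain `llms` in the dictionary, the calling code would not need to change.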

__NOTE__:
- Since Meta "released" the Llama weights, there's been an unending procession of models claimed to be comparable to OpenAI's (Vicuna-13B and Falcon come to mind). But those need a fair amount of compute to run locally, or even on platforms like Replicate. So the next time you think about running these models, keep in mind that a MacBook Air M1 or even an RTX 3060 might not be able to run them due to hardware constraints.
- Not to digress, but there's a class of quantized models from the ggml ecosystem (ggml.ai) that run on M1s/M2s at least. More improvements are coming in, but they are still worse than OpenAI/Anthropic/Cohere.
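
To put rough numbers on the hardware constraint and on why quantization helps, here is a back-of-the-envelope sketch. It assumes memory is dominated by the weights alone and ignores activations, the KV cache, and runtime overhead, so real requirements are higher. The function name is mine.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# A 13B-parameter model (e.g. Vicuna-13B) at different precisions.
for bits in (16, 8, 4):
    print(f"13B params at {bits}-bit: ~{weight_memory_gb(13e9, bits):.1f} GB")
# 13B params at 16-bit: ~26.0 GB
# 13B params at 8-bit: ~13.0 GB
# 13B params at 4-bit: ~6.5 GB
```

Going from 16-bit to 4-bit weights cuts the footprint by 4x, which is what brings such models within reach of laptop-class memory.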

#### Writing complex prompts with dynamic information?

In [72]:
prompt

"Can you tell me who's the president of the United States of America?"

The `prompt` above is about the most basic example one can throw at an LLM. In the AI summer before the Cambrian explosion of LLMs, one had to painstakingly create a model specific to each task.

- Want to do English to Dutch translation? Train a model.
- Want to do NLP classification? Train a model.

LLMs kind of let you cheat your way through by just using one model. __One model to rule them all__

![Sauron](https://i0.wp.com/middle-earth.xenite.org/files/2013/12/sauron-and-the-one-ring.jpg?fit=360%2C247&ssl=1)

But there's a catch: you need to painstakingly craft a good prompt to get a relevant answer. When GPT-3 was first released, NLP tasks (summarization, QnA, translation) needed a bunch of examples stuffed into the prompt. That much information stuffing isn't required anymore, but you still need a way to pass some information in.

`LangChain`, with its `PromptTemplate` construct, simplifies this information stuffing and helps us write truly dynamic queries, with faster iteration as a side effect.

Let's see a more complex example, where I want to generate text on a topic in the way popular Dragon Ball Z characters would talk. Let's write a prompt for `Vegeta, a character who is egotistical and sarcastic`:

In [73]:
vegeta_prompt = "Write 50 words on Global Warming in the tone of Vegeta, a character who is egotistical and sarcastic"

In [74]:
gpt35_turbo(vegeta_prompt)

"\n\n1. Global Warming? Bah, how could a puny planet like this one possibly affect the universe's climate.\n2. Typical humans, thinking they can do anything they please and the universe will remain unchanged.\n3. Global Warming? I suppose it is the least of this planet's problems.\n4. I guess I should be grateful that Global Warming isn't any worse than it is.\n5. If I wanted to destroy the planet, I'd just wait until Global Warming does it for me.\n6. Global Warming? It's like the universe is trying to tell me something, but I'm not sure what.\n7. Don't worry about Global Warming, I'll just use my superior Saiyan strength to fix it.\n8. Global Warming? I don't even have time to think about it, I'm too busy trying to save the universe.\n9. I bet I could stop Global Warming with one glance of my powerful glare.\n10. Global Warming? What a pathetic attempt to ruin the planet. I'll take care of it."

Very impressive! Now, if I have to write it in the tone of Gohan, who's a nerdy and serious kid, I'd have to copy-paste a lot of stuff. But with `LangChain`'s `PromptTemplate`, we can cut that duplication easily.



In [75]:
from langchain import PromptTemplate

In [83]:
template = "Write 50 words on Global Warming in the tone of {character}, a character who is {personality}"

I can put a bunch of characters in a list and just write a loop!

In [88]:
characters = [
    {"character": "Vegeta", "personality": "egotistical and sarcastic"},
    {"character": "Gohan", "personality": "nerdy and serious"},
    {"character": "Chichi", "personality": "angry and strong woman"},
    {"character": "Bulma", "personality": "scientist and opinionated"}]
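
To see what filling a template amounts to, here is a plain-Python sketch using `str.format`, which uses the same `{placeholder}` style as the template above. This only illustrates the idea of substituting variables into a prompt. It is not how `PromptTemplate` is implemented, and it skips everything else `LangChain` adds on top.

```python
template = (
    "Write 50 words on Global Warming in the tone of {character}, "
    "a character who is {personality}"
)

characters = [
    {"character": "Vegeta", "personality": "egotistical and sarcastic"},
    {"character": "Gohan", "personality": "nerdy and serious"},
]

# Each dict supplies the values for the placeholders in the template.
prompts = [template.format(**entry) for entry in characters]

for p in prompts:
    print(p)
# Write 50 words on Global Warming in the tone of Vegeta, a character who is egotistical and sarcastic
# Write 50 words on Global Warming in the tone of Gohan, a character who is nerdy and serious
```

One template, many prompts: adding a new character is now a one-line change to the list.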

In [79]:
?PromptTemplate

Init signature:
PromptTemplate(
    *,
    input_variables: List[str],
    output_parser: Optional[langchain.schema.BaseOutputParser] = None,
    partial_variables: Mapping[str, Union[str, Callable[[], str]]] = None,
    template: str,
    template_format: str = 'f-string',

In [85]:
prompt = PromptTemplate(input_variables=["character", "personality"], template=template)

In [86]:
from langchain import LLMChain

In [87]:
llm_chain = LLMChain(
    prompt=prompt,
    llm=gpt35_turbo
)

In [90]:
result = llm_chain.generate(characters) # Multiple prompts in one simple function

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sorry about that!.
Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sorry about that!.
Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sorry about that!.
Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sorry about that!.
Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 10.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sor

`LangChain` also retries automatically on my behalf, without making me write any extra code! How good is that!

In [109]:
[f"{characters[idx]['character']} ::: {gen[0].text}" for idx, gen in enumerate(result.generations)]

 'Gohan ::: \n\nGlobal warming is a serious issue that we must address. We need to reduce emissions of greenhouse gases and increase our use of renewable energy sources such as solar and wind power. We must also take steps to reduce deforestation, as trees are essential for carbon sequestration. We must also reduce our reliance on fossil fuels, as burning them is a major source of greenhouse gases. We must take action now, or the consequences of global warming will be devastating for future generations. We must educate ourselves and others on the causes and effects of global warming, and strive to make a difference. Together, we can make a real impact in reducing global warming.',
 'Chichi ::: \n\nGlobal Warming is a serious issue that needs to be addressed. It is having a devastating effect on the planet and the environment. We need to take drastic measures to reduce emissions and stop this problem from getting worse. We must stop burning fossil fuels and switch to renewable energy so

## Understanding Memory via ChatBot

## Understanding Indexes via Document QnA

## Introduction to Agents and Tools

## Semantic Search via LangChain

## Query csv/excel data via LangChain (plus LlamaIndex)