<a href="https://colab.research.google.com/github/shahzaibkhan/learning-langchain/blob/main/Learning_LangChain_02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Learning Models with LangChain**

A model is a program that has been trained to perform a specific task.

Note:
- In this notebook, we will be using Language Models.
- We are not going to train a model.
- We will only be using pre-trained models.

These models are pre-trained on large amounts of data and require a lot of compute to run, which is why they are called Large Language Models (LLMs).



## **LLM (Large Language Model)**

Now let's talk about LLMs. LLMs are trained to perform language tasks such as text generation. Many LLMs are available, but here we cover only OpenAI's models.

Let's dive straight into using models with LangChain.

First, install the necessary libraries:


In [1]:
!pip install langchain
!pip install openai

# The tiktoken package is a Byte Pair Encoding (BPE) tokenizer. It is used with the OpenAI models and breaks text down into tokens.
# It is needed because an input string can be too long for the chosen OpenAI model,
# in which case the text has to be split and encoded into tokens.
!pip install tiktoken

Collecting langchain
  Downloading langchain-0.0.222-py3-none-any.whl (1.2 MB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.5.9-py3-none-any.whl (26 kB)
Collecting langchainplus-sdk>=0.0.17 (from langchain)
  Downloading langchainplus_sdk-0.0.20-py3-none-any.whl (25 kB)
Collecting openapi-schema-pydantic<2.0,>=1.2 (from langchain)
  Downloading openapi_schema_pydantic-1.2.4-py3-none-any.whl (90 kB)
Collecting marshmallow<4.0.0,>=3.3.0 (from dataclasses-json<0

In [2]:
# Set the OpenAI API key (replace the placeholder with your own key)
import os
os.environ["OPENAI_API_KEY"]="YOUR_API_KEY"

To use a model, create an `OpenAI` LLM object and call it with a prompt:

In [3]:
from langchain.llms import OpenAI
llm = OpenAI(temperature=1)
print(llm("Write a positive quote about life"))



"Life is not what happens to us, it's how we react to what happens—how we choose to see it, think about it, and feel about it." - Roy T. Bennett


### **Estimating number of tokens**

OpenAI models have a limited context length, which caps the amount of text that can be sent to the model. We therefore need to make sure the input stays below that limit before sending it. The number of tokens in a piece of text can be calculated as follows:

In [4]:
llm.get_num_tokens("Write a positive quote about life")

6
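Once the token count is known, checking it against the limit is simple arithmetic. The sketch below is one possible way to do that check; the function name `fits_in_context`, the limit of 4097 tokens, and the completion budget of 256 tokens are assumptions made for illustration, so look up the actual limit of the model you use.

```python
def fits_in_context(prompt_tokens: int, max_completion_tokens: int, context_limit: int) -> bool:
    """Return True if the prompt plus the tokens reserved for the reply fit in the context window."""
    return prompt_tokens + max_completion_tokens <= context_limit


# Assumed numbers for illustration only; they are not taken from any model's documentation here.
CONTEXT_LIMIT = 4097
COMPLETION_BUDGET = 256

short_prompt_tokens = 6    # the value returned by get_num_tokens in the cell above
long_prompt_tokens = 4000  # a hypothetical long document

print(fits_in_context(short_prompt_tokens, COMPLETION_BUDGET, CONTEXT_LIMIT))
print(fits_in_context(long_prompt_tokens, COMPLETION_BUDGET, CONTEXT_LIMIT))
```

The first check passes and the second fails, because 4000 prompt tokens plus 256 reserved tokens exceed the assumed limit. In that case the input would have to be shortened or split before calling the model.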

### **Streaming**

Streaming is an important concept for LLMs: it lets you display output as it is generated instead of waiting for the full response. The ChatGPT interface works this way, showing content as it arrives rather than after the entire output has been generated.

Here is a code example. In LangChain, streaming is handled with a callback handler:

In [5]:
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = llm("Write me a poem about life")



Life is a journey, a winding road
Full of surprises, both good and bad
It's a roller coaster, a wild ride
Sometimes you'll soar, sometimes you'll slide

Life is a challenge, a test of will
It's a game of chance, a chance to thrill
It's a chance to learn, to grow and to thrive
It's a chance to make a difference, to stay alive

Life is a blessing, a gift from above
It's a chance to love, to laugh and to live
It's a chance to make a mark, to leave a legacy
It's a chance to make a difference, to be free

Life is a journey, a winding road
Full of surprises, both good and bad
It's a roller coaster, a wild ride
But if you take the time, you'll find the joy inside.
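To make the callback idea concrete, here is a minimal plain-Python simulation that needs no API key. It is only a sketch: `PrintingHandler`, `fake_stream`, and `run_streaming` are hypothetical names, and the "model" simply yields words one at a time. The method name `on_llm_new_token` is chosen to mirror the hook that LangChain's streaming handlers respond to.

```python
import sys
from typing import Iterable, List


class PrintingHandler:
    """Hypothetical stand-in for a streaming callback handler."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str) -> None:
        # Called once per token: remember it and show it immediately.
        self.tokens.append(token)
        sys.stdout.write(token)
        sys.stdout.flush()


def fake_stream(text: str) -> Iterable[str]:
    """Imitate a model emitting tokens by yielding the text word by word."""
    for word in text.split(" "):
        yield word + " "


def run_streaming(text: str, handler: PrintingHandler) -> str:
    """Feed each token to the handler as it arrives, then return the full text."""
    for token in fake_stream(text):
        handler.on_llm_new_token(token)
    return "".join(handler.tokens)


handler = PrintingHandler()
full_text = run_streaming("Life is a journey", handler)
```

The handler prints every piece as soon as it is produced, while the caller still receives the complete text at the end, which is the same division of work as in the cell above.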

## **Chat Models**

OpenAI's GPT-3.5 model, which powers ChatGPT, falls into this category. The main differences between the previous LLMs and Chat Models are:

- Chat Models are roughly 10x cheaper per token for API calls (comparing `gpt-3.5-turbo` with `text-davinci-003` at the time of writing)
- Chat Models are built for multi-turn conversation, much like talking with a human, which the previous completion-style LLMs do not support directly

Since Chat Models can hold a conversation, they take a list of chat messages as input instead of plain text like an LLM.

Now let's discuss how we can use these Chat Models.
Let's do the necessary imports first:

In [6]:
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate, LLMChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [7]:
# Instead of the OpenAI class, we use the ChatOpenAI class to create our chat model
chat = ChatOpenAI(temperature=0)


**The input is a list of messages.**

Messages are classified into 3 types:

1. **System Message** - An initial prompt sent to the model to control its behavior.
2. **Human Message** - The user's input message.
3. **AI Message** - The response message returned by the model.

The chat model needs the list of all these messages in the conversation to understand the context and continue the conversation.

Now let's see an example where we define the system message and the user's input message and pass them to the chat model. The output generated will be an AI message.

We use a system prompt to make the model perform the task of paraphrasing. This technique of giving the model a prompt to make it perform a task is called Prompt Engineering.

In [8]:
messages = [
    SystemMessage(content="Consider yourself as an expert in English Language, who can help paraphrase sentences"),
    HumanMessage(content="I love programming.")
]
chat(messages)

AIMessage(content='I have a passion for coding.', additional_kwargs={}, example=False)
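Because the model needs the whole conversation each time, a multi-turn chat amounts to growing the message list and sending it again. The sketch below illustrates this with plain dictionaries in the role/content layout used by OpenAI's chat API rather than LangChain's message classes. The helper `add_turn` and the follow-up request "Make it more formal." are hypothetical and exist only for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unknown role: {role}")
    return history + [{"role": role, "content": content}]


history: List[Message] = []
history = add_turn(history, "system", "Consider yourself as an expert in English Language, who can help paraphrase sentences")
history = add_turn(history, "user", "I love programming.")
# The model's reply is stored too, so the next request carries the full context.
history = add_turn(history, "assistant", "I have a passion for coding.")
history = add_turn(history, "user", "Make it more formal.")

for message in history:
    print(f"{message['role']}: {message['content']}")
```

Here "system", "user", and "assistant" play the same parts as `SystemMessage`, `HumanMessage`, and `AIMessage`.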

### **Templates in Chat Models**

Earlier, we hard-coded the task in the system message. With a template, the system message takes an input variable `task`, which can be changed dynamically to perform different tasks. For this example, we will stick with the task of paraphrasing.



In [9]:
template="You are a helpful assistant that {task}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template="{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

chat(chat_prompt.format_prompt(task="paraphrases the sentence", text="I love programming.").to_messages())

AIMessage(content='I have a strong passion for coding.', additional_kwargs={}, example=False)
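Conceptually, formatting the prompt fills the `{task}` and `{text}` placeholders and produces one message per template. The following plain-Python sketch shows that substitution step with `str.format`. It is an approximation of the idea, not LangChain's implementation, and `format_messages` is a hypothetical helper.

```python
from typing import List, Tuple

system_template = "You are a helpful assistant that {task}."
human_template = "{text}"


def format_messages(task: str, text: str) -> List[Tuple[str, str]]:
    """Fill in both templates and return (role, content) pairs."""
    return [
        ("system", system_template.format(task=task)),
        ("human", human_template.format(text=text)),
    ]


messages = format_messages(task="paraphrases the sentence", text="I love programming.")
for role, content in messages:
    print(f"{role}: {content}")
```

Changing only the `task` argument, for example to "translates the sentence to French", would reuse the same templates for a different job.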

### **Chaining in Chat Models**
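Chaining, which lets us combine multiple steps for LLMs, works with Chat Models as well. Here an `LLMChain` ties the chat prompt template and the chat model together, so a single call fills in the template and sends it to the model: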


In [10]:
chain = LLMChain(llm=chat, prompt=chat_prompt)
chain.run(task="paraphrases the sentence", text="I love programming.")

'I have a strong passion for coding.'
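The chain above can be thought of as two steps run back to back: format the prompt, then call the model. Below is a rough miniature of that pattern, assuming a stand-in model so it runs without an API key. `TinyChain` and `fake_model` are hypothetical names, not part of LangChain.

```python
from typing import Callable


class TinyChain:
    """Hypothetical miniature of a prompt-plus-model chain."""

    def __init__(self, template: str, model: Callable[[str], str]) -> None:
        self.template = template
        self.model = model

    def run(self, **variables: str) -> str:
        prompt = self.template.format(**variables)  # step 1: fill in the template
        return self.model(prompt)                   # step 2: pass the prompt to the model


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; it only reports how much text it received."""
    return f"(model received {len(prompt)} characters)"


chain = TinyChain("You are a helpful assistant that {task}.\n{text}", fake_model)
result = chain.run(task="paraphrases the sentence", text="I love programming.")
print(result)
```

Swapping `fake_model` for a function that calls a real model would keep the calling code unchanged, which is the main convenience a chain provides.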

### **Streaming with Chat Models**
As discussed above, streaming is useful with LLMs. Now let's see how to do the same with Chat Models.

In [11]:
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
chat = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = chat([HumanMessage(content="Write me a poem about life")])

In the tapestry of life, we find our place,
A journey of wonder, a cosmic embrace.
With each passing moment, a story unfolds,
A symphony of emotions, both new and old.

Life, a kaleidoscope of colors and hues,
A canvas of dreams, where hope renews.
Through valleys of darkness and mountains of light,
We navigate the unknown, with all our might.

In the cradle of existence, we learn to grow,
From fragile beginnings, our spirits bestow.
We stumble and fall, but rise with grace,
For life's greatest lessons, we must embrace.

Through laughter and tears, we find our way,
Discovering purpose, day by day.
The sun may set, but dawn will arise,
A chance to begin anew, with wide-open eyes.

Life's tapestry weaves a pattern unique,
With threads of joy, and moments bittersweet.
We dance to the rhythm of time's gentle sway,
Embracing the beauty, come what may.

In the symphony of life, we find our song,
A melody of triumph, amidst the throng.
With love as our guide, we navigate strife,
For in this g

## **Embedding Models**

First we need to understand what an embedding is. An embedding is a vector of numbers associated with a piece of text, and it represents the properties of that text.

As an example, consider the words good, best, and bad. If we compute the embeddings of these words, we observe that the embeddings of good and best are close together, while the embedding of bad is farther away. The reason is that the embedding of a word captures its meaning, so words with similar meanings have similar embeddings.

Embeddings also have an **interesting property**, shown below. Let E(x) be the embedding of word x:

E(king) - E(male) + E(female) ~= E(queen)

This means that if we subtract the embedding of the word male from that of the word king and add the embedding of the word female, the result is quite close to the embedding of the word queen. As humans, we understand this intuitively: removing the male gender from king and adding the female gender gives queen. Embedding models give machines the ability to capture such complex relations.
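The arithmetic can be tried out on hand-made toy vectors. The two-dimensional values below are invented for illustration (first axis roughly "royalty", second axis roughly "gender") and are not real model embeddings, which have many more dimensions and only satisfy the relation approximately.

```python
from typing import Dict, List

Vector = List[float]

# Invented toy vectors: [royalty, gender], with gender +1 for male and -1 for female.
E: Dict[str, Vector] = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "male": [0.0, 1.0],
    "female": [0.0, -1.0],
}


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


result = add(sub(E["king"], E["male"]), E["female"])
print(result)             # [1.0, -1.0]
print(result == E["queen"])  # True
```

With these toy values the match is exact; with a real embedding model one would instead look for the word whose embedding is nearest to the result.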

Now that we have an idea of what embeddings are, the task of an embedding model is to create these embeddings for the text input provided. A model whose embeddings show properties like the ones discussed above, and more, is considered a good model.

Once these embeddings are generated, we can use them for tasks such as semantic search, similar to how apps like Chatbase, PDF.ai, and SiteGPT work. You can create embeddings for all your documents or webpages, and when a user asks a query, you can fetch the relevant pieces and send them to the user.
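A minimal sketch of the retrieval step follows, assuming the embeddings have already been computed. The three-dimensional vectors, the document names, and the helper `search` are all hypothetical. Cosine similarity is used as the closeness measure here, which is a common choice but not the only one.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented vectors standing in for real document embeddings.
doc_vectors: Dict[str, Vector] = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "account login": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of the query "How do I get my money back?"
query_vector: Vector = [0.8, 0.2, 0.1]


def search(query: Vector, docs: Dict[str, Vector], top_k: int = 1) -> List[str]:
    """Return the names of the top_k documents most similar to the query."""
    ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
    return ranked[:top_k]


best = search(query_vector, doc_vectors)
print(best)  # ['refund policy']
```

In a real application the vectors would come from an embedding model, and the matching text pieces would then be shown to the user or passed to an LLM.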

Now let's discuss it with the help of an example:


In [13]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
strings = ["This is for demonstration.", "This string is also for demonstration.", "This is another demo string.", "This one is last string."]
result = embeddings.embed_documents(strings)

# To check whether sentences are similar, we can compute the distance between their vectors.
# If the distance is small, the sentences have similar meanings.
print(result)

[[-0.012422735299856755, 0.01051919864741267, 0.00134972058359777, -0.0091197337988123, -0.008920757670124438, 0.014233417232726035, -0.008164648567375062, 0.0015603035522324844, -0.006944262607122784, -0.02400977253560505, 0.008761576953438653, 0.009046775853916, -0.013861995560459205, -0.0033311900730996834, -0.0033063181734290176, 0.004795322000542166, 0.016249708173391036, -0.016634394672551046, 0.014167092166937591, -0.029713750545151976, -0.012648242013653518, 0.013397718237295035, 0.005342506121603158, 0.01274109766455086, -0.02942191876556677, -0.003939724833448857, 0.020255758538289606, -0.026901556952380595, 0.022961831653270467, -0.020786360927242217, 0.0020411621418437655, 0.0057537232058720664, 0.013702814843773422, -0.04109518063574962, -0.011208981753051067, -0.011308470283056269, 0.01268140454654774, -0.022404699610531494, -0.0032234113755321963, 0.004612927138301412, -0.011819174966007841, 0.017377238017084705, 0.006413660218170171, -0.027074002495959558, -0.0120911090