<a href="https://colab.research.google.com/github/sudarshan-koirala/youtube-stuffs/blob/main/langchain/LangChain_Components.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LangChain Components
- [LangChain components Documentation](https://docs.langchain.com/docs/category/components)
- [Youtube Video covering this notebook](https://youtu.be/r1HjwBSS80g)

Goal: Walk you through the core components of LangChain.

```
🦜️🔗 LangChain is a framework for developing applications powered by language models
```

✍️ Authors NOTE:
- GPT-4 is not used. Most examples use `gpt-3.5-turbo` and `text-davinci-003`, which are the LangChain defaults for the specific tasks we perform.
- I am by no means an expert on this topic. I am learning it myself and decided to share this with the community in the hope that most of you benefit from it.
- For detailed information and documentation, refer to the official [LangChain documentation](https://python.langchain.com/en/latest/index.html).
- 🍿 For more information, you can watch the [LangChain playlist](https://www.youtube.com/playlist?list=PLz-qytj7eIWVd1a5SsQ1dzOjVDHdgC1Ck) on my YouTube channel.

# Table of Contents

>[LangChain Components](#scrollTo=6wF7awIeGVBq)

>[Setup](#scrollTo=-ACgSwMXHKm2)

>[Schema](#scrollTo=iAAEi6kvLdJ7)

>>[Text](#scrollTo=tWILVJuPMpsV)

>>[ChatMessages](#scrollTo=hLQFM01INFKX)

>>[Examples](#scrollTo=XH-Eiso8nT_l)

>>[Documents](#scrollTo=tkz677i4oOzN)

>[Models](#scrollTo=6a1EiiYrqqF-)

>>[Language Model](#scrollTo=mGG6egsEri1l)

>>[Chat Model](#scrollTo=DHRPcKeSuDaI)

>>[Text Embedding Model](#scrollTo=H8gu8licuFTu)

>>>[Huggingface Embeddings](#scrollTo=m_Mie2Ym2cad)

>>>[OpenAI Embeddings](#scrollTo=i-EMoIV22iSq)

>[Prompts](#scrollTo=Gu_VTMSW3JsY)

>>[PromptValue](#scrollTo=m6ScgA2Z3tHy)

>>[Prompt Template](#scrollTo=vRt3TQYD6u5K)

>>[Example Selectors](#scrollTo=ex8qtf0R8SCO)

>>[Output Parsers](#scrollTo=A1QTlrHF-mAA)

>[Indexes](#scrollTo=C16Vdu0EBbVe)

>>[Document Loaders](#scrollTo=Q9mjvnHiBx15)

>>[Text Splitters](#scrollTo=JJFGMG9nDKnp)

>>[Retriever](#scrollTo=t1xZIdmPE9fF)

>>[Vectorstore](#scrollTo=nPepas4qL4qm)

>[Memory](#scrollTo=MGiVicr4Oyxl)

>>[ChatMessageHistory](#scrollTo=3ms0nY-iPj9k)

>[Chains && Getting started chains](#scrollTo=O2jihk2hRs8F)

>>[Chain](#scrollTo=YuzerNVfkaq5)

>>[LLMChain](#scrollTo=bCyKB9Phk5Jw)

>>[Index-related Chains](#scrollTo=NT0RFq4smMlV)

>>>[Summarization Chain as an example](#scrollTo=h5NkVWW0ofnT)

>[Agents](#scrollTo=7JaldmU7rzqu)

>>[Tools](#scrollTo=G777EdMDssqf)

>>[Toolkit](#scrollTo=2DOwKB5AtsZ2)

>>[Agent](#scrollTo=KvV-3DiJvkud)

>>[Agent Executors](#scrollTo=MP9q0908urBQ)

>>>[Example 1 (PythonReplTool)](#scrollTo=yXvC_t2JxBRR)

>>>[Example 2 (Self Ask With Search)](#scrollTo=ELUdJHYRxGlA)



# Setup
- We will be using large language models and embeddings from OpenAI. Grab an API key from the [OpenAI API keys page](https://platform.openai.com/account/api-keys).

In [None]:
%%capture
!pip install langchain openai

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
%reload_ext watermark
%watermark -a "Sudarshan Koirala" -vmp langchain,openai

Author: Sudarshan Koirala

Python implementation: CPython
Python version       : 3.10.11
IPython version      : 7.34.0

langchain: 0.0.161
openai   : 0.27.6

Compiler    : GCC 9.4.0
OS          : Linux
Release     : 5.10.147+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit



In [None]:
import os
os.environ['OPENAI_API_KEY'] = "OPENAI_API_KEY"  # placeholder: replace with your own key

# 1. [Schema](https://docs.langchain.com/docs/components/schema/)
Basic data types and schemas (nuts and bolts of working with LLMs)

## Text
- When working with LLMs, text goes in and text comes out. It's the natural way to interact with them.

In [None]:
text = "What is LangChain?"

## ChatMessages
- Similar to text, but each message carries a type (role). In code, the corresponding classes imported from `langchain.schema` are `SystemMessage`, `HumanMessage`, and `AIMessage`.
    - SystemChatMessage -> Instructions to the AI system.
    - HumanChatMessage -> Messages that represent the user/human.
    - AIChatMessage -> Messages coming from the AI system.

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

In [None]:
chat = ChatOpenAI(temperature=0.1)

In [None]:
# get completions by passing in a single message
chat([HumanMessage(content="Translate this sentence from English to Nepali. I love programming.")])

AIMessage(content='मलाई प्रोग्रामिङ मन पर्छ।', additional_kwargs={}, example=False)

In [None]:
# pass multiple messages
messages = [
    SystemMessage(content="You are a helpful assistant that translates English to French."),
    HumanMessage(content="I love programming.")
]
chat(messages)

AIMessage(content="J'adore la programmation.", additional_kwargs={}, example=False)

In [None]:
# multiple sets of messages using .generate.
batch_messages = [
    [
        SystemMessage(content="You are a helpful assistant that translates English to French."),
        HumanMessage(content="I love programming.")
    ],
    [
        SystemMessage(content="You are a helpful assistant that translates English to French."),
        HumanMessage(content="I love artificial intelligence.")
    ],
]
result = chat.generate(batch_messages)
result

LLMResult(generations=[[ChatGeneration(text="J'adore la programmation.", generation_info=None, message=AIMessage(content="J'adore la programmation.", additional_kwargs={}, example=False))], [ChatGeneration(text="J'adore l'intelligence artificielle.", generation_info=None, message=AIMessage(content="J'adore l'intelligence artificielle.", additional_kwargs={}, example=False))]], llm_output={'token_usage': {'prompt_tokens': 57, 'completion_tokens': 20, 'total_tokens': 77}, 'model_name': 'gpt-3.5-turbo'})

## Examples
- Examples are input/output pairs: each one holds the input to a function and its expected output. They can be used for both training and evaluating models.

- These can be inputs/outputs for a model or for a chain. The two types serve different purposes.
- Examples for a model can be used to fine-tune a model.
- Examples for a chain can be used to evaluate the end-to-end chain, or maybe even to train a model that replaces the whole chain.
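
As a minimal sketch of the evaluation use case, the snippet below scores a function against a small list of input/output pairs. The `to_upper` function and the `accuracy` helper are hypothetical stand-ins for a model or chain; they are not part of LangChain.

```python
# Hypothetical example pairs: each maps an input to its expected output.
examples = [
    {"input": "happy", "output": "HAPPY"},
    {"input": "tall", "output": "TALL"},
    {"input": "sunny", "output": "Sunny"},  # deliberately mismatched
]

def to_upper(text: str) -> str:
    """Stand-in for the model or chain under evaluation."""
    return text.upper()

def accuracy(predict, pairs) -> float:
    """Fraction of pairs where predict(input) equals the expected output."""
    correct = sum(1 for pair in pairs if predict(pair["input"]) == pair["output"])
    return correct / len(pairs)

score = accuracy(to_upper, examples)
print(f"accuracy: {score:.2f}")
```

The same loop works for a chain: replace `to_upper` with any callable that takes the example input and returns the final output.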

## Documents
- A piece of unstructured data.
- Holds the page content and metadata (additional information about the data / text).
- You will see more in-depth examples of this in other parts as we explore. For instance, see the [ChatGPT loader example](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/chatgpt_loader.html).

In [None]:
from langchain.schema import Document

In [None]:
document = Document(page_content="The content of the data", metadata={
    'document_id' : 123,
    'document_create_time': '2023-05-26'
})
document

Document(page_content='The content of the data', metadata={'document_id': 123, 'document_create_time': '2023-05-26'})

# 2. [Models](https://docs.langchain.com/docs/components/models/)
**Different types of models used in LangChain.**

## Language Model
- Takes text in as input, returns text as an output.

In [None]:
from langchain.llms import OpenAI

In [None]:
llm = OpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [None]:
llm("What number comes after 2?")

'The number that comes after 2 is 3.'

You do not have to stick with one model; you can use others too. See the [OpenAI model endpoint compatibility](https://platform.openai.com/docs/models/model-endpoint-compatibility) page.

In [None]:
llm_davinci = OpenAI(model_name="text-davinci-003", temperature=0)

In [None]:
llm_davinci("What number comes after 2?")

'\n\n3'

## Chat Model
- A chat model takes a list of ChatMessages as an input and returns a ChatMessage.

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(temperature=0)

In [None]:
chat([HumanMessage(content="Translate this sentence from English to Nepali. I love programming.")])

AIMessage(content='मलाई प्रोग्रामिङ्ग मन पर्छ।', additional_kwargs={}, example=False)

As shown above in the ChatMessages section, we could also pass multiple messages or a batch of messages.

## Text Embedding Model
- A text embedding model takes a piece of text as input and returns a numerical representation of that text in the form of a list of floats.
- In other words, the text is converted to a vector that machines can work with.
- You can use different [embeddings](https://python.langchain.com/en/latest/modules/models/text_embedding.html), but here I will show you `OpenAIEmbeddings` and `HuggingFaceEmbeddings`.
- To use `HuggingFaceEmbeddings`, you need to install `sentence_transformers`.
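
Once text is a vector, comparing two texts becomes arithmetic. A common measure is cosine similarity. The sketch below implements it in plain Python on made-up three-dimensional vectors. Real embeddings are much longer, and the numbers here are illustrative assumptions, not model outputs.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors for illustration only; they are not real embeddings.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

# Related texts should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))
```

With real embeddings you would pass the lists returned by `embed_query` instead of the hand-written vectors.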

In [None]:
from langchain.embeddings import OpenAIEmbeddings, HuggingFaceEmbeddings

### Huggingface Embeddings

In [None]:
embeddings = HuggingFaceEmbeddings()

ValueError: ignored

In [None]:
%%capture
!pip install sentence_transformers

In [None]:
embeddings = HuggingFaceEmbeddings()


In [None]:
text = "This is a test document to check the embeddings."

In [None]:
text_embedding = embeddings.embed_query(text)

In [None]:
print(f'Embeddings length: {len(text_embedding)}')
print(f"Here's a sample: {text_embedding[:5]}...")

Embeddings length: 768
Here's a sample: [-0.0027273043524473906, -0.09290841221809387, -0.025437859818339348, 0.07693517953157425, 0.034590668976306915]...


### OpenAI Embeddings

In [None]:
openai_embeddings = OpenAIEmbeddings()

In [None]:
text_openai_embedding = openai_embeddings.embed_query(text)

In [None]:
print(f'Embeddings length: {len(text_openai_embedding)}')
print(f"Here's a sample: {text_openai_embedding[:5]}...")

Embeddings length: 1536
Here's a sample: [-0.020220063626766205, 0.00847809761762619, 0.004504404030740261, -0.011350567452609539, 0.007171223871409893]...


# 3. [Prompts](https://docs.langchain.com/docs/components/prompts/)
- In simple terms, it is the input to the model.

## PromptValue

In [None]:
from langchain.llms import OpenAI

In [None]:
llm = OpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [None]:
prompt = "What comes after 2?"

In [None]:
llm(prompt) #openai model

'3.'

## Prompt Template
- Better way of prompting.
- Contains a text string with placeholders (similar to an f-string in Python).

In [None]:
name = "Sudarshan"
f"Hello, {name}."

'Hello, Sudarshan.'

In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate

In [None]:
template = """
I want you to act as a naming consultant for new companies.
What is a good name for a company that makes {product}?
"""

prompt = PromptTemplate(
    input_variables=["product"],
    template=template,
)
prompt.format(product="colorful socks")

'\nI want you to act as a naming consultant for new companies.\nWhat is a good name for a company that makes colorful socks?\n'

## Example Selectors
- Let's look at one example with `LengthBasedExampleSelector`.

In [None]:
from langchain.prompts import PromptTemplate
from langchain.prompts import FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector

In [None]:
# These are a lot of examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

In [None]:
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
    # These are the examples it has available to choose from.
    examples=examples, 
    # This is the PromptTemplate being used to format the examples.
    example_prompt=example_prompt, 
    # This is the maximum length that the formatted examples should be.
    # Length is measured by the get_text_length function below.
    max_length=25,
    # This is the function used to get the length of a string, which is used
    # to determine which examples to include. It is commented out because
    # it is provided as a default value if none is specified.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:", 
    input_variables=["adjective"],
)

In [None]:
# An example with small input, so it selects all examples.
my_adjective = "big"
print(dynamic_prompt.format(adjective=my_adjective))

Give the antonym of every input

Input: happy
Output: sad

Input: tall
Output: short

Input: energetic
Output: lethargic

Input: sunny
Output: gloomy

Input: windy
Output: calm

Input: big
Output:


In [None]:
llm(dynamic_prompt.format(adjective=my_adjective))

'small'
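
To see why a short input keeps all five examples, here is a simplified sketch of length-based selection. It reuses the default length function quoted in the comment above, but `select_by_length` itself is a hypothetical helper written for illustration, not LangChain's implementation.

```python
import re

def get_text_length(text: str) -> int:
    """Count pieces after splitting on newlines and spaces."""
    return len(re.split("\n| ", text))

def select_by_length(examples, formatted, user_input, max_length):
    """Keep examples in order until the length budget is used up."""
    remaining = max_length - get_text_length(user_input)
    selected = []
    for example, text in zip(examples, formatted):
        cost = get_text_length(text)
        if cost > remaining:
            break
        selected.append(example)
        remaining -= cost
    return selected

pairs = [("happy", "sad"), ("tall", "short"), ("energetic", "lethargic"),
         ("sunny", "gloomy"), ("windy", "calm")]
examples = [{"input": i, "output": o} for i, o in pairs]
formatted = [f"Input: {i}\nOutput: {o}" for i, o in pairs]

# A one-word input leaves room for every example.
all_selected = select_by_length(examples, formatted, "big", max_length=25)

# A long input eats most of the budget, so fewer examples fit.
long_input = ("big and huge and massive and large and gigantic and tall "
              "and much bigger than everything else")
trimmed = select_by_length(examples, formatted, long_input, max_length=25)
print(len(all_selected), len(trimmed))
```

The longer the user input, the fewer examples are included, which keeps the final prompt within a fixed budget.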

## Output Parsers 
Format the output of a model.
- `get_format_instructions() -> str`: A method which returns a string containing instructions for how the output of a language model should be formatted.
- `parse(str) -> Any`: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.

Optional
- `parse_with_prompt(str) -> Any`: A method which takes in a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated such a response) and parses it into some structure.

In [None]:
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

In [None]:
output_parser = CommaSeparatedListOutputParser()

In [None]:
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions}
)

In [None]:
model = OpenAI(temperature=0)

In [None]:
_input = prompt.format(subject="ice cream flavors")  # avoid shadowing the built-in input()
output = model(_input)

In [None]:
output

'\n\nVanilla, Chocolate, Strawberry, Mint Chocolate Chip, Cookies and Cream'

In [None]:
output_parser.parse(output)

['Vanilla',
 'Chocolate',
 'Strawberry',
 'Mint Chocolate Chip',
 'Cookies and Cream']
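
The parser interface is small enough to sketch by hand. The class below is a hypothetical stand-in, not LangChain's `CommaSeparatedListOutputParser`. It implements the two required methods described at the start of this section, and the instruction wording is an assumption.

```python
class SimpleCommaListParser:
    """Hypothetical parser with the two required methods."""

    def get_format_instructions(self) -> str:
        # Text meant to be appended to the prompt so the model formats its reply.
        return "Your response should be a list of comma separated values, eg: `foo, bar, baz`"

    def parse(self, text: str) -> list:
        # Trim surrounding whitespace, split on commas, drop empty items.
        return [item.strip() for item in text.strip().split(",") if item.strip()]

parser = SimpleCommaListParser()
raw = "\n\nVanilla, Chocolate, Strawberry, Mint Chocolate Chip, Cookies and Cream"
flavors = parser.parse(raw)
print(flavors)
```

Keeping the formatting instructions and the parsing logic in one object means the prompt and the parser are less likely to drift apart.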

# 4. [Indexes](https://docs.langchain.com/docs/components/indexing/)
Ways to structure documents so that LLMs can best interact with them.

## [Document Loaders](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html)


- Document Loaders are responsible for loading a list of Document objects.
- Let's go through the `HuggingFaceDatasetLoader`.

In [None]:
from langchain.document_loaders import HuggingFaceDatasetLoader

In [None]:
dataset_name="imdb"
page_content_column="text"
loader=HuggingFaceDatasetLoader(dataset_name,page_content_column)

In [None]:
data = loader.load()

ImportError: ignored

In [None]:
%%capture
!pip install datasets

In [None]:
data = loader.load()
data[:2]

[Document(page_content='I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are 

## [Text Splitters](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html)
- Splitting up a document into smaller documents.
- LLMs have a limit on how much text they can accept. [Link to the OpenAI website](https://platform.openai.com/docs/models/gpt-3-5)
- We therefore need to split documents into chunks. Text splitters help us here.
- For this demo, let's go with the Markdown text splitter.

In [None]:
from langchain.text_splitter import MarkdownTextSplitter

In [None]:
markdown_text = """
# 🦜️🔗 LangChain

⚡ Building applications with LLMs through composability ⚡

## Quick Install

```bash
# Hopefully this code block isn't split
pip install langchain
```

As an open source project in a rapidly developing field, we are extremely open to contributions.
"""

In [None]:
markdown_splitter = MarkdownTextSplitter(chunk_size=100, chunk_overlap=0)

In [None]:
docs = markdown_splitter.create_documents([markdown_text])

In [None]:
print(len(docs))

3


In [None]:
docs

[Document(page_content='# 🦜️🔗 LangChain\n\n⚡ Building applications with LLMs through composability ⚡', metadata={}),
 Document(page_content="Quick Install\n\n```bash\n# Hopefully this code block isn't split\npip install langchain", metadata={}),
 Document(page_content='As an open source project in a rapidly developing field, we are extremely open to contributions.', metadata={})]
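
The roles of `chunk_size` and `chunk_overlap` can be illustrated without LangChain. The sketch below is a naive fixed-width splitter, and `split_text` is a hypothetical helper. The Markdown splitter used above behaves differently: it tries to break on Markdown structure rather than at fixed offsets.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Overlap repeats the tail of one chunk at the head of the next, which helps when a sentence or idea straddles a chunk boundary.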

## [Retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers.html)
- Makes it easy to combine documents with language models.
- For simplicity, I will go with the `VectorStore Retriever`.

In [None]:
%%capture
!wget https://static.nomic.ai/gpt4all/2023_GPT4All-J_Technical_Report_2.pdf

In [None]:
%%capture
!pip install pypdf tiktoken faiss-cpu

In [None]:
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader('/content/2023_GPT4All-J_Technical_Report_2.pdf')


In [None]:
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(texts, embeddings)

In [None]:
db

<langchain.vectorstores.faiss.FAISS at 0x7f936bad2020>

In [None]:
# You can also specify search kwargs like k to use when doing retrieval.
#retriever = db.as_retriever()
retriever = db.as_retriever(search_kwargs={"k": 2})

In [None]:
# By default, the vectorstore retriever uses similarity search
docs = retriever.get_relevant_documents("How is the training done?")

In [None]:
print(len(docs))

2


## [Vectorstore](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html)
- Vectorstore is the database to store vectors.

In [None]:
print (f"You have {len(texts)} documents")

You have 3 documents


In [None]:
embedding_list = embeddings.embed_documents([text.page_content for text in texts])

In [None]:
print (f"You have {len(embedding_list)} embeddings")
print (f"Here's a sample of one: {embedding_list[0][:3]}...")

You have 3 embeddings
Here's a sample of one: [-0.02944159085255252, -0.005345897996550137, 0.019034529986016152]...
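
To make the link between a vector store and the retriever's `k` setting concrete, here is a tiny in-memory sketch. `TinyVectorStore` is hypothetical, the two-dimensional vectors are made up, and ranking by Euclidean distance is an assumption chosen for simplicity.

```python
import math

class TinyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs and ranks by distance."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        self._items.append((vector, text))

    def search(self, query_vector, k=2):
        """Return the k stored texts whose vectors are closest to query_vector."""
        ranked = sorted(self._items, key=lambda item: math.dist(query_vector, item[0]))
        return [text for _, text in ranked[:k]]

store = TinyVectorStore()
# Made-up 2-d vectors standing in for real embeddings.
store.add([1.0, 0.0], "training details")
store.add([0.9, 0.1], "training data")
store.add([0.0, 1.0], "license")

top = store.search([1.0, 0.05], k=2)
print(top)
```

A retriever is then a thin wrapper: embed the query, call `search`, and return the matching documents.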


# 5. [Memory](https://docs.langchain.com/docs/components/memory/)
- Concept of storing and retrieving data in the process of a conversation.
- Helping LLMs remember things.
- Two types of memory.
    - Short term memory: generally refers to how to pass data in the context of a singular conversation (generally is previous ChatMessages or summaries of them)
    - Long term memory: deals with how to fetch and update information between conversations

## [ChatMessageHistory](https://python.langchain.com/en/latest/modules/memory/getting_started.html)

In [None]:
from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()

history.add_user_message("hi!")

history.add_ai_message("whats up?")

In [None]:
history.messages

[HumanMessage(content='hi!', additional_kwargs={}, example=False),
 AIMessage(content='whats up?', additional_kwargs={}, example=False)]

In [None]:
history.add_user_message("Fine, what about you?")
history.messages

[HumanMessage(content='hi!', additional_kwargs={}, example=False),
 AIMessage(content='whats up?', additional_kwargs={}, example=False),
 HumanMessage(content='Fine, what about you?', additional_kwargs={}, example=False)]

In [None]:
# we can just create a chat and add memory
from langchain.chat_models import ChatOpenAI
chat = ChatOpenAI()

In [None]:
ai_response = chat(history.messages)
ai_response

AIMessage(content="As an AI language model, I don't have feelings, but I'm functioning properly and ready to assist you with any task or question you may have. How can I help you today?", additional_kwargs={}, example=False)

In [None]:
history.add_ai_message(ai_response.content)
history.messages

[HumanMessage(content='hi!', additional_kwargs={}, example=False),
 AIMessage(content='whats up?', additional_kwargs={}, example=False),
 HumanMessage(content='Fine, what about you?', additional_kwargs={}, example=False),
 AIMessage(content="As an AI language model, I don't have feelings, but I'm functioning properly and ready to assist you with any task or question you may have. How can I help you today?", additional_kwargs={}, example=False)]
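
Because a model can only accept a limited amount of text, short-term memory is often trimmed before it is sent back. The sketch below keeps only the most recent messages. `WindowedHistory` is a hypothetical class written for illustration and is not part of LangChain.

```python
class WindowedHistory:
    """Hypothetical short-term memory that keeps only the last max_messages messages."""

    def __init__(self, max_messages: int):
        self.max_messages = max_messages
        self.messages = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Drop the oldest messages once the window is full.
        self.messages = self.messages[-self.max_messages:]

history_window = WindowedHistory(max_messages=2)
history_window.add("user", "hi!")
history_window.add("ai", "whats up?")
history_window.add("user", "Fine, what about you?")
print([m["content"] for m in history_window.messages])
```

Summarizing older messages instead of dropping them is the other approach mentioned at the start of this section.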

# 6. [Chains](https://docs.langchain.com/docs/components/chains/) && [Getting started chains](https://python.langchain.com/en/latest/modules/chains/getting_started.html)
- As the name suggests, we can combine different things into a chain. The most common type of chain is `LLMChain`.
- Chains allow us to combine multiple components to create a single, coherent application.

## Chain
- A chain is just an end-to-end wrapper around multiple individual components.
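
Stripped of the library, a chain is function composition. The sketch below wires a prompt template, a model call, and a clean-up step into one function. `fake_llm` is a stand-in that returns a canned string instead of calling an API, and all names here are hypothetical.

```python
def format_prompt(product: str) -> str:
    """Step 1: fill the prompt template."""
    return f"What is a good name for a company that makes {product}?"

def fake_llm(prompt: str) -> str:
    """Step 2: stand-in for a model call; returns a canned reply."""
    return "  Happy Hues Socks\n"

def clean_output(text: str) -> str:
    """Step 3: post-process the raw model output."""
    return text.strip()

def run_chain(product: str) -> str:
    """Chain the three steps: template -> model -> post-processing."""
    return clean_output(fake_llm(format_prompt(product)))

result = run_chain("colorful socks")
print(result)
```

The caller only supplies the input variable; everything else is hidden behind the single entry point.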



## LLMChain



In [None]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)

In [None]:
from langchain.chains import LLMChain
chain = LLMChain(prompt=prompt, llm=llm)

# Run the chain only specifying the input variable.
print(chain.run("colorful socks"))



Happy Hues Socks


In [None]:
chat = ChatOpenAI(temperature=0)
prompt_template = "Tell me a {adjective} joke"
llm_chain = LLMChain(
    llm=chat,
    prompt=PromptTemplate.from_template(prompt_template)
)

llm_chain(inputs={"adjective":"corny"})

{'adjective': 'corny',
 'text': 'Why did the tomato turn red? Because it saw the salad dressing!'}

**LLMChain with Huggingface Hub model**
- For this particular case, I will use a model from the [Hugging Face Hub](https://huggingface.co/google/flan-t5-xl).
- To work with a Hugging Face Hub model, we need a `huggingfacehub_api_token`. Go to the [Hugging Face access tokens page](https://huggingface.co/settings/tokens) and grab an API token.

In [None]:
%%capture
!pip install huggingface_hub

In [None]:
import os

os.environ['HUGGINGFACEHUB_API_TOKEN'] = "HUGGINGFACEHUB_API_TOKEN"  # placeholder: replace with your own token

In [None]:
# load the model

from langchain import HuggingFaceHub

repo_id = "google/flan-t5-xl"

llm_hf = HuggingFaceHub(repo_id=repo_id, model_kwargs={"temperature":0, "max_length":64})

In [None]:
# example using Huggingface
from langchain import PromptTemplate, LLMChain

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm_hf)

question = "Who won the FIFA World Cup in the year 1994? "

print(llm_chain.run(question))

The FIFA World Cup is a football tournament that is played every 4 years. The year 1994 was the 44th FIFA World Cup. The final answer: Brazil.


## [Index-related Chains](https://docs.langchain.com/docs/components/chains/index_related_chains)
- This category of chains is used for interacting with indexes. The purpose of these chains is to combine your own data (stored in the indexes) with LLMs. The best example of this is question answering over your own documents.
- There are a few methods (chain types), and which one to use depends on the specific use case. From simplest to most complex:
    - Stuffing (simplest and most straightforward: all documents are placed into a single prompt)
    - Map Reduce (runs a prompt on each chunk separately, then combines the per-chunk outputs in a further call)
    - Refine (runs a prompt on the first chunk, then updates that answer with each remaining chunk in turn)
    - Map-Rerank (runs a prompt on each chunk, asks for a score alongside each answer, and returns the highest-scoring answer)
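
The control flow of the first three methods can be sketched with a stand-in model. `fake_llm` records each prompt instead of calling an API, and every function here is a hypothetical illustration, not LangChain's implementation. Map-Rerank is left out for brevity.

```python
calls = []  # prompts sent to the stand-in model

def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call: records the prompt and returns a placeholder."""
    calls.append(prompt)
    return f"<summary {len(calls)}>"

def stuff(chunks):
    # One call with every chunk placed in a single prompt.
    return fake_llm("Summarize: " + " ".join(chunks))

def map_reduce(chunks):
    # Map: one call per chunk. Reduce: one more call over the partial summaries.
    partials = [fake_llm("Summarize: " + chunk) for chunk in chunks]
    return fake_llm("Combine: " + " ".join(partials))

def refine(chunks):
    # Summarize the first chunk, then revise that answer once per remaining chunk.
    answer = fake_llm("Summarize: " + chunks[0])
    for chunk in chunks[1:]:
        answer = fake_llm(f"Refine '{answer}' using: {chunk}")
    return answer

def count_calls(strategy, chunks):
    """Run a strategy and report how many model calls it made."""
    calls.clear()
    strategy(chunks)
    return len(calls)

chunks = ["chunk one", "chunk two", "chunk three"]
call_counts = {fn.__name__: count_calls(fn, chunks) for fn in (stuff, map_reduce, refine)}
print(call_counts)
```

The call counts hint at the trade-off: stuffing is cheapest but requires everything to fit in one prompt, while the other two spread the work over several smaller calls.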
    

### Summarization Chain as an example

In [None]:
from langchain import OpenAI, PromptTemplate, LLMChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0)

text_splitter = CharacterTextSplitter()

In [None]:
# let's print the texts from our earlier example
texts

[Document(page_content='GPT4All-J: An Apache-2 Licensed Assistant-Style Chatbot\nYuvanesh Anand\nyuvanesh@nomic.aiZach Nussbaum\nzach@nomic.aiBrandon Duderstadt\nbrandon@nomic.ai\nBenjamin M. Schmidt\nben@nomic.aiAdam Treat\ntreat.adam@gmail.comAndriy Mulyar\nandriy@nomic.ai\nAbstract\nGPT4All-J is an Apache-2 licensed chatbot\ntrained over a massive curated corpus of as-\nsistant interactions including word problems,\nmulti-turn dialogue, code, poems, songs, and\nstories. It builds on the March 2023 GPT4All\nrelease by training on a significantly larger\ncorpus, by deriving its weights from the\nApache-licensed GPT-J model rather than the\nGPL-licensed of LLaMA, and by demonstrat-\ning improved performance on creative tasks\nsuch as writing stories, poems, songs and\nplays. We openly release the training data,\ndata curation procedure, training code, and fi-\nnal model weights to promote open research\nand reproducibility. Additionally, we release\nPython bindings and a Chat UI to a q

In [None]:
from langchain.chains.summarize import load_summarize_chain

In [None]:
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(docs)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Model BoolQ PIQA HellaSwag WinoGrande ARC-e ARC-c OBQA
GPT4All-J 6.7B 73.4 74.8 63.4 64.7 54.9 36.0 40.2
GPT4All-J Lora 6.7B 68.6 75.8 66.2 63.5 56.4 35.7 40.2
GPT4All LLaMa Lora 7B 73.1 77.6 72.1 67.8 51.1 40.4 40.2
Dolly 6B 68.8 77.3 67.6 63.9 62.9 38.7 41.2
Dolly 12B 56.7 75.4 71.0 62.2 64.6 38.5 40.4
Alpaca 7B 73.9 77.2 73.9 66.1 59.8 43.3 43.4
Alpaca Lora 7B 74.3 79.3 74.0 68.8 56.6 43.9 42.6
GPT-J 6.7B 65.4 76.2 66.2 64.1 62.2 36.6 38.2
LLaMa 7B 73.1 77.4 73.0 66.9 52.5 41.4 42.4
Pythia 6.7B 63.5 76.3 64.0 61.1 61.3 35.2 37.2
Pythia 12B 67.7 76.6 67.3 63.8 63.9 34.8 38
Table 1: Zero-shot performance on Common Sense Reasoning tasks
(a) TSNE visualization of the final GPT4All-J training data,
ten-colored by extracted topic.
(b) Zoomed in view of Figure 1a. The region displayed con-
tains generations r

' This paper presents GPT4All-J, a model trained on a diverse set of topics, and evaluates it against other models on Common Sense Reasoning tasks. The evaluation results show that GPT4All-J outperforms the other models, and the authors release data and training details to accelerate open LLM research. Additionally, instruction-tuning is evaluated, which showed performance regressions over the base model, but some tasks showed performance improvements.'

**Try the other chain types and see how they perform. The right choice depends on your use case.**

# 7. [Agents](https://docs.langchain.com/docs/components/agents/)
- Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user's input. 
- This is where agents come in. An agent has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call.

## [Tools](https://python.langchain.com/en/latest/modules/agents/tools.html) 
- Tools are the means by which an agent interacts with the outside world.
- There are many tools, so refer to the link for more info.

## [Toolkit](https://docs.langchain.com/docs/components/agents/toolkit)
- Groups of tools that can be used/are necessary to solve a particular problem.
- Example 1 under Agent Executors below builds its agent with `create_python_agent` from `langchain.agents.agent_toolkits`.



## [Agent](https://python.langchain.com/en/latest/modules/agents/agents.html)
- There are different types of agents for specific tasks. Refer to documentation for more info.

## [Agent Executors](https://python.langchain.com/en/latest/modules/agents/agent_executors.html)
- Agent executors take an agent and tools and use the agent to decide which tools to call and in what order.
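
The executor loop itself is simple. The sketch below uses a scripted `fake_agent` in place of an LLM-driven agent and a toy `calculator` tool. All names are hypothetical, and this is only an outline of the idea, not LangChain's `AgentExecutor`.

```python
def calculator(expression: str) -> str:
    """Toy tool: adds two integers written as 'a+b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))

TOOLS = {"calculator": calculator}

def fake_agent(question: str, observations: list) -> dict:
    """Stand-in for the LLM-driven agent: picks the next step from what it has seen."""
    if not observations:
        return {"tool": "calculator", "tool_input": "2+3"}
    return {"final_answer": observations[-1]}

def run_executor(question: str, max_steps: int = 5) -> str:
    """Loop: ask the agent for an action, run the tool, feed back the observation."""
    observations = []
    for _ in range(max_steps):
        decision = fake_agent(question, observations)
        if "final_answer" in decision:
            return decision["final_answer"]
        tool = TOOLS[decision["tool"]]
        observations.append(tool(decision["tool_input"]))
    raise RuntimeError("agent did not finish within max_steps")

answer = run_executor("What is 2 + 3?")
print(answer)
```

The `max_steps` cap is a design choice worth keeping in any real loop, since an agent that never reaches a final answer would otherwise run forever.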

### Example 1 (PythonReplTool)

In [None]:
from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.python import PythonREPL
from langchain.llms.openai import OpenAI

In [None]:
agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)

In [None]:
agent_executor.run("What is the 10th fibonacci number?")



> Entering new AgentExecutor chain...
I need to calculate the 10th fibonacci number
Action: Python REPL
Action Input: def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(10))
Observation: 55
Thought: I now know the final answer
Final Answer: 55

> Finished chain.


'55'

### Example 2 (Self Ask With Search)
- Because we will be using `SerpAPIWrapper`, we need a SerpAPI API key and the `google-search-results` Python package.
- Visit -> https://serpapi.com/
- Once logged in, get api key from -> https://serpapi.com/manage-api-key

In [None]:
%%capture
!pip install google-search-results

In [None]:
os.environ['SERPAPI_API_KEY'] = "SERPAPI_API_KEY"  # placeholder: replace with your own key

In [None]:
from langchain import OpenAI, SerpAPIWrapper
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

llm = OpenAI(temperature=0)
search = SerpAPIWrapper()
tools = [
    Tool(
        name="Intermediate Answer",
        func=search.run,
        description="useful for when you need to ask with search"
    )
]

In [None]:
self_ask_with_search = initialize_agent(tools, llm, agent=AgentType.SELF_ASK_WITH_SEARCH, verbose=True)
self_ask_with_search.run("What is the hometown of the reigning men's French Open?")



> Entering new AgentExecutor chain...
Yes.
Follow up: Who is the reigning men's French Open champion?
Intermediate answer: No good search result found
Follow up: Who won the men's French Open in 2020?
Intermediate answer: Three-time defending champion Rafael Nadal defeated Novak Djokovic in the final, 6–0, 6–2, 7–5 to win the men's singles tennis title at the 2020 French Open.
Follow up: Where is Rafael Nadal from?
Intermediate answer: Manacor, Spain
So the final answer is: Manacor, Spain

> Finished chain.


'Manacor, Spain'

In [None]:
#if you want the intermediate answers, pass return_intermediate_steps=True.
self_ask_with_search = initialize_agent(tools, llm, agent=AgentType.SELF_ASK_WITH_SEARCH, return_intermediate_steps=True, verbose=True)
response = self_ask_with_search("What is the hometown of the reigning men's French Open?")



> Entering new AgentExecutor chain...
Yes.
Follow up: Who is the reigning men's French Open champion?
Intermediate answer: No good search result found
Follow up: What is the name of the current men's French Open champion?
Intermediate answer: Rafael Nadal
Follow up: Where is Rafael Nadal from?
Intermediate answer: Manacor, Spain
So the final answer is: Manacor, Spain

> Finished chain.


In [None]:
import json
print(json.dumps(response["intermediate_steps"], indent=2))

[
  [
    [
      "Intermediate Answer",
      "Who is the reigning men's French Open champion?",
      " Yes.\nFollow up: Who is the reigning men's French Open champion?"
    ],
    "No good search result found"
  ],
  [
    [
      "Intermediate Answer",
      "What is the name of the current men's French Open champion?",
      "Follow up: What is the name of the current men's French Open champion?"
    ],
    "Rafael Nadal"
  ],
  [
    [
      "Intermediate Answer",
      "Where is Rafael Nadal from?",
      "Follow up: Where is Rafael Nadal from?"
    ],
    "Manacor, Spain"
  ]
]


**Happy Learning and Chaining** 😎