<a href="https://colab.research.google.com/github/rinogrego/Learning-LLM/blob/main/explorations/LangChain-Exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LangChain Exploration

Exploring [LangChain Modules documentation](https://python.langchain.com/docs/modules/)

In [None]:
!pip install langchain langchain-community langchain-openai

from google.colab import output
output.clear()

In [None]:
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
HUGGINGFACE_API_WRITE = userdata.get('HUGGINGFACE_TOKEN_WRITE')
HUGGINGFACE_API_READ = userdata.get('HUGGINGFACE_API_TOKEN_READ')
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

## Model I/O

The core element of any language model application is...the model. LangChain gives you the building blocks to interface with any language model.

<img src="https://python.langchain.com/assets/images/model_io-e6fc0045b7eae0377a4ddeb90dc8cdb8.jpg" height="75%" width="75%">

### Quick Start

In [None]:
from langchain_community.llms import Ollama
from langchain_community.chat_models import ChatOllama

# Both classes talk to a local Ollama server (by default at http://localhost:11434).
llm = Ollama(model="llama2")
chat_model = ChatOllama()
chat_model

ChatOllama()

In [None]:
from langchain_core.messages import HumanMessage

text = "What would be a good company name for a company that makes colorful socks?"
messages = [HumanMessage(content=text)]

llm.invoke(text)
# >> Feetful of Fun

chat_model.invoke(messages)
# >> AIMessage(content="Socks O'Color")

ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x794454f5a3b0>: Failed to establish a new connection: [Errno 111] Connection refused'))
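
The traceback above shows the connection to `localhost:11434` being refused, which typically means no Ollama server is running in the current environment. A minimal sketch of a pre-flight check, using only the standard library, is shown below. The helper name `ollama_reachable` is hypothetical, and the host and port are taken from the error message.

```python
import socket


def ollama_reachable(host: str = "localhost", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if not ollama_reachable():
    print("Nothing is listening on localhost:11434; start Ollama before calling llm.invoke().")
```

The check only confirms that a TCP port is open; it does not verify that the listener is actually Ollama or that the `llama2` model has been pulled.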

### Concepts

### Prompts

#### Quick Start

In [None]:
from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "Tell me a {adjective} joke about {content}.\nTo be more specific, tell me more about {specific_content}"
)
prompt_template.format(
    adjective="funny",
    content="football",
    specific_content="english premier league",
    second_content="soccer"  # not in the template: no error is raised and the value is ignored
)

'Tell me a funny joke about football.\nTo be more specific, tell me more about english premier league'

In [None]:
# The template supports any number of variables, including no variables:
from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template("Tell me a joke")
prompt_template.format()

'Tell me a joke'

In [None]:
"""
    The prompt to chat models is a list of chat messages.
    Each chat message is associated with content, and an additional parameter
        called role.
    For example, in the OpenAI Chat Completions API, a chat message can be
        associated with an AI assistant, a human or a system role.

"""
from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful AI bot. Your name is {name}."),
        ("human", "Hello, how are you doing?"),
        ("ai", "I'm doing well, thanks!"),
        ("human", "{user_input}"),
    ]
)

messages = chat_template.format_messages(name="Bob", user_input="What is your name?")
messages

[SystemMessage(content='You are a helpful AI bot. Your name is Bob.'),
 HumanMessage(content='Hello, how are you doing?'),
 AIMessage(content="I'm doing well, thanks!"),
 HumanMessage(content='What is your name?')]

In [None]:
from langchain.prompts import HumanMessagePromptTemplate
from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

chat_template = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content=(
                "You are a helpful assistant that re-writes the user's text to "
                "sound more upbeat."
            )
        ),
        HumanMessagePromptTemplate.from_template("{text}"),
    ]
)
messages = chat_template.format_messages(text="I don't like eating tasty things")
print(messages)

[SystemMessage(content="You are a helpful assistant that re-writes the user's text to sound more upbeat."), HumanMessage(content="I don't like eating tasty things")]


#### Composition

In [None]:
from langchain.prompts import PromptTemplate

In [None]:
"""
    When working with string prompts, each template is joined together.
    You can work with either prompts directly or strings (the first element
        in the list needs to be a prompt).
"""
prompt = (
    PromptTemplate.from_template("Tell me a joke about {topic}")
    + ", make it funny"
    + "\n\nand in {language}"
)
prompt

PromptTemplate(input_variables=['language', 'topic'], template='Tell me a joke about {topic}, make it funny\n\nand in {language}')

In [None]:
prompt.format(topic="sports", language="spanish")

'Tell me a joke about sports, make it funny\n\nand in spanish'

In [None]:
from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI

In [None]:
from langchain_openai import ChatOpenAI
model = ChatOpenAI(openai_api_key=OPENAI_API_KEY)
prompt = PromptTemplate.from_template("Hi OpenAI. Explain about {topic} in {language} language")
chain = LLMChain(llm=model, prompt=prompt)
chain.run(topic="sports", language="indonesia")

'Olahraga adalah kegiatan fisik yang dilakukan untuk meningkatkan kesehatan, kebugaran, dan kebugaran tubuh. Olahraga juga dapat dilakukan sebagai hobi atau untuk mencapai prestasi dalam kompetisi.\n\nDi Indonesia, olahraga sangat populer dan memiliki beragam jenis yang digemari oleh masyarakat. Beberapa olahraga yang populer di Indonesia antara lain sepak bola, bulu tangkis, voli, basket, dan tinju.\n\nSepak bola adalah olahraga yang paling populer di Indonesia. Liga sepak bola Indonesia, Liga 1, merupakan salah satu liga teratas di Asia Tenggara. Tim nasional sepak bola Indonesia juga memiliki fans yang fanatik dan mendukung dengan antusiasme.\n\nBulu tangkis juga merupakan olahraga yang sangat populer di Indonesia. Indonesia telah menghasilkan beberapa pemain bulu tangkis terbaik di dunia, seperti Taufik Hidayat, Susi Susanti, dan Tontowi Ahmad/Liliyana Natsir.\n\nSelain itu, olahraga voli, basket, dan tinju juga memiliki penggemar yang besar di Indonesia. Banyak atlet Indonesia yan

In [None]:
# Chat Prompt Composition
"""
    A chat prompt is made up of a list of messages.
    Purely for developer experience, we’ve added a convenient way to
        create these prompts.
    In this pipeline, each new element is a new message in the final prompt.
"""

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

In [None]:
prompt = SystemMessage(content="You are a nice pirate")
prompt

SystemMessage(content='You are a nice pirate')

In [None]:
new_prompt = (
    prompt + HumanMessage(content="hi") + AIMessage(content="what?") + "{input}"
)
new_prompt

ChatPromptTemplate(input_variables=['input'], messages=[SystemMessage(content='You are a nice pirate'), HumanMessage(content='hi'), AIMessage(content='what?'), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='{input}'))])

In [None]:
new_prompt.format_messages()  # raises KeyError because no 'input' value is provided

KeyError: 'input'

In [None]:
new_prompt.format_messages(input="i said hi")

[SystemMessage(content='You are a nice pirate'),
 HumanMessage(content='hi'),
 AIMessage(content='what?'),
 HumanMessage(content='i said hi')]

In [None]:
## Using chain

# model = ChatOpenAI()
# chain = LLMChain(llm=model, prompt=new_prompt)
# chain.run("i said hi")

#### Selector


| Name | Description |
| -- | -- |
| Similarity | Uses semantic similarity between inputs and examples to decide which examples to choose. |
| MMR | Uses Max Marginal Relevance between inputs and examples to decide which examples to choose. |
| Length | Selects examples based on how many can fit within a certain length. |
| Ngram | Uses ngram overlap between inputs and examples to decide which examples to choose. |

##### Length

This example selector selects which examples to use based on length. This is useful when you are worried about constructing a prompt that will go over the length of the context window. For longer inputs, it will select fewer examples to include, while for shorter inputs it will select more.
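
The length budget described above can be sketched in plain Python. This is an illustration of the idea, not LangChain's implementation: the function names `text_length` and `select_by_length` and the toy data are assumptions, while the word-count rule mirrors the `get_text_length` default quoted in the selector code in this section.

```python
import re
from typing import Dict, List


def text_length(text: str) -> int:
    # Count "words" by splitting on newlines and spaces.
    return len(re.split("\n| ", text))


def select_by_length(examples: List[Dict[str, str]], query: str, max_length: int = 25) -> List[Dict[str, str]]:
    """Keep examples, in order, while they still fit into the remaining word budget."""
    remaining = max_length - text_length(query)
    selected = []
    for example in examples:
        cost = text_length(f"Input: {example['input']}\nOutput: {example['output']}")
        if cost > remaining:
            break  # the next example would exceed the budget
        selected.append(example)
        remaining -= cost
    return selected


toy_examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

short_query = "big"
long_query = (
    "big and huge and massive and large and gigantic and tall and "
    "much much much much much bigger than everything else"
)

print(len(select_by_length(toy_examples, short_query)))  # all five fit
print(len(select_by_length(toy_examples, long_query)))   # only one fits
```

Each formatted example costs four words here, so a one-word query leaves room for all five examples, while a long query uses up most of the 25-word budget before any example is added.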

In [None]:
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector

# Examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
    # The examples it has available to choose from.
    examples=examples,
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # The maximum length that the formatted examples should be.
    # Length is measured by the get_text_length function below.
    max_length=25,
    # The function used to get the length of a string, which is used
    # to determine which examples to include. It is commented out because
    # it is provided as a default value if none is specified.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

In [None]:
# An example with small input, so it selects all examples.
print(dynamic_prompt.format(adjective="big"))

Give the antonym of every input

Input: happy
Output: sad

Input: tall
Output: short

Input: energetic
Output: lethargic

Input: sunny
Output: gloomy

Input: windy
Output: calm

Input: big
Output:


In [None]:
# An example with long input, so it selects only one example.
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))

Give the antonym of every input

Input: happy
Output: sad

Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
Output:


In [None]:
# You can add an example to an example selector as well.
new_example = {"input": "big", "output": "small"}
dynamic_prompt.example_selector.add_example(new_example)
print(dynamic_prompt.format(adjective="enthusiastic"))

Give the antonym of every input

Input: happy
Output: sad

Input: tall
Output: short

Input: energetic
Output: lethargic

Input: sunny
Output: gloomy

Input: windy
Output: calm

Input: big
Output: small

Input: enthusiastic
Output:


##### MMR

The MaxMarginalRelevanceExampleSelector selects examples based on a combination of which examples are most similar to the inputs, while also optimizing for diversity. It does this by finding the examples with the embeddings that have the greatest cosine similarity with the inputs, and then iteratively adding them while penalizing them for closeness to already selected examples.
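
To make the trade-off concrete, here is a minimal pure-Python sketch of maximal marginal relevance over toy vectors. It is not the library's code: the function names, the `lambda_mult` weighting, and the 2-D "embeddings" are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of k candidates, balancing relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Penalty: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy vectors, made up for illustration.
query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],    # 0: very close to the query
    [1.0, 0.12],   # 1: near-duplicate of candidate 0
    [0.8, -0.6],   # 2: less similar, but points in a different direction
]

print(mmr_select(query, candidates, k=2))                   # [0, 2]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the penalty term vanishes and the selection reduces to plain similarity ranking, which picks the near-duplicate. With `lambda_mult=0.5` the near-duplicate is penalized and the more diverse candidate is chosen instead.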

##### Select by n-gram overlap

The `NGramOverlapExampleSelector` selects and orders examples based on which examples are most similar to the input, according to an ngram overlap score. The ngram overlap score is a float between 0.0 and 1.0, inclusive.

The selector allows a threshold score to be set. Examples with an ngram overlap score less than or equal to the threshold are excluded. The threshold defaults to -1.0, so by default no examples are excluded; they are only reordered. Setting the threshold to 0.0 excludes examples that have no ngram overlap with the input.
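
The threshold behaviour can be sketched with a simplified stand-in score. The unigram overlap below is an assumption made for illustration and is not the scoring function used by `NGramOverlapExampleSelector`; the helper names and example sentences are hypothetical as well.

```python
import re
from typing import Dict, List


def unigram_overlap(source: str, example: str) -> float:
    """Fraction of the example's words that also occur in the source (0.0 to 1.0)."""
    source_words = set(re.findall(r"\w+", source.lower()))
    example_words = re.findall(r"\w+", example.lower())
    if not example_words:
        return 0.0
    return sum(word in source_words for word in example_words) / len(example_words)


def select_by_overlap(
    source: str, examples: List[Dict[str, str]], threshold: float = -1.0
) -> List[Dict[str, str]]:
    scored = [(unigram_overlap(source, ex["input"]), ex) for ex in examples]
    # Scores less than or equal to the threshold are excluded.
    kept = [(score, ex) for score, ex in scored if score > threshold]
    # Most similar examples come first.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in kept]


overlap_examples = [
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."},
]

sentence = "Spot can run fast."
print([ex["input"] for ex in select_by_overlap(sentence, overlap_examples)])
print([ex["input"] for ex in select_by_overlap(sentence, overlap_examples, threshold=0.0)])
```

With the default threshold of -1.0 all three examples are kept and only reordered; with a threshold of 0.0 the example that shares no words with the input is dropped.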

##### Select by Similarity

This object selects examples based on similarity to the inputs. It does this by finding the examples with the embeddings that have the greatest cosine similarity with the inputs.

#### Example Selectors

If you have a large number of examples, you may need to select which ones to include in the prompt. The Example Selector is the class responsible for doing so.

In [None]:
from abc import ABC, abstractmethod
from typing import List, Dict, Any

class BaseExampleSelector(ABC):
    """Interface for selecting examples to include in prompts."""

    @abstractmethod
    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""

    @abstractmethod
    def add_example(self, example: Dict[str, str]) -> Any:
        """Add new example to store."""

The interface above declares two abstract methods, but the one that matters for selection is `select_examples`. It takes the input variables and returns a list of examples; how those examples are chosen is up to each specific implementation.

In [None]:
examples = [
    {"input": "hi", "output": "ciao"},
    {"input": "bye", "output": "arrivaderci"},
    {"input": "soccer", "output": "calcio"},
]

In [None]:
from langchain_core.example_selectors.base import BaseExampleSelector


class CustomExampleSelector(BaseExampleSelector):
    def __init__(self, examples):
        self.examples = examples

    def add_example(self, example):
        self.examples.append(example)

    def select_examples(self, input_variables):
        # This assumes the input variables contain an 'input' key
        new_word = input_variables["input"]
        new_word_length = len(new_word)

        # Initialize variables to store the best match and its length difference
        best_match = None
        smallest_diff = float("inf")

        # Iterate through each example
        for example in self.examples:
            # Calculate the length difference with the example's input word
            current_diff = abs(len(example["input"]) - new_word_length)

            # Update the best match if the current one is closer in length
            if current_diff < smallest_diff:
                smallest_diff = current_diff
                best_match = example
                print("Found best match:", best_match)

        return [best_match]

In [None]:
example_selector = CustomExampleSelector(examples)

In [None]:
example_selector.select_examples({"input": "okay"})

Found best match: {'input': 'hi', 'output': 'ciao'}
Found best match: {'input': 'bye', 'output': 'arrivaderci'}


[{'input': 'bye', 'output': 'arrivaderci'}]

In [None]:
example_selector.add_example({"input": "hand", "output": "mano"})

In [None]:
example_selector.select_examples({"input": "okay"})

Found best match: {'input': 'hi', 'output': 'ciao'}
Found best match: {'input': 'bye', 'output': 'arrivaderci'}
Found best match: {'input': 'hand', 'output': 'mano'}


[{'input': 'hand', 'output': 'mano'}]

In [None]:
# Use in a Prompt

from langchain_core.prompts.few_shot import FewShotPromptTemplate
from langchain_core.prompts.prompt import PromptTemplate

example_prompt = PromptTemplate.from_template("Input: {input} -> Output: {output}")

In [None]:
prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    suffix="Input: {input} -> Output:",
    prefix="Translate the following words from English to Italian:",
    input_variables=["input"],
)
print("INPUT: 'word'")
print(prompt.format(input="word"))
print()
print("INPUT: 'football'")
print(prompt.format(input="football"))

INPUT: 'word'
Found best match: {'input': 'hi', 'output': 'ciao'}
Found best match: {'input': 'bye', 'output': 'arrivaderci'}
Found best match: {'input': 'hand', 'output': 'mano'}
Translate the following words from English to Italian:

Input: hand -> Output: mano

Input: word -> Output:

INPUT: 'football'
Found best match: {'input': 'hi', 'output': 'ciao'}
Found best match: {'input': 'bye', 'output': 'arrivaderci'}
Found best match: {'input': 'soccer', 'output': 'calcio'}
Translate the following words from English to Italian:

Input: soccer -> Output: calcio

Input: football -> Output:


#### Few-shot prompt templates

A few-shot prompt template can be constructed from either a set of examples, or from an Example Selector object.

In [None]:
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
    },
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
""",
    },
    {
        "question": "Are both the directors of Jaws and Casino Royale from the same country?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
""",
    },
]

In [None]:
example_prompt = PromptTemplate(
    input_variables=["question", "answer"], template="Question: {question}\n{answer}"
)

print(example_prompt.format(**examples[0]))

Question: Who lived longer, Muhammad Ali or Alan Turing?

Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali



In [None]:
prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"],
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))

Question: Who lived longer, Muhammad Ali or Alan Turing?

Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali


Question: When was the founder of craigslist born?

Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952


Question: Who was the maternal grandfather of George Washington?

Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball W

#### Few-shot examples for chat models

##### Fixed Examples

In [None]:
from langchain.prompts import (
    ChatPromptTemplate,
    FewShotChatMessagePromptTemplate,
)

In [None]:
examples = [
    {"input": "2+2", "output": "4"},
    {"input": "2+3", "output": "5"},
]

In [None]:
# This is a prompt template used to format each individual example.
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

print(few_shot_prompt.format())

Human: 2+2
AI: 4
Human: 2+3
AI: 5


In [None]:
final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a wondrous wizard of math."),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)

In [None]:
from langchain_community.chat_models import ChatAnthropic

chain = final_prompt | ChatAnthropic(temperature=0.0)

chain.invoke({"input": "What's the square of a triangle?"})

ValidationError: 1 validation error for ChatAnthropic
__root__
  Did not find anthropic_api_key, please add an environment variable `ANTHROPIC_API_KEY` which contains it, or pass `anthropic_api_key` as a named parameter. (type=value_error)

##### Dynamic few-shot prompting

#### Types of `MessagePromptTemplate`

LangChain provides different types of `MessagePromptTemplate`. The most commonly used are `AIMessagePromptTemplate`, `SystemMessagePromptTemplate` and `HumanMessagePromptTemplate`, which create an AI message, system message and human message respectively.

However, in cases where the chat model supports chat messages with arbitrary roles, you can use `ChatMessagePromptTemplate`, which lets the user specify the role name.

In [None]:
from langchain.prompts import ChatMessagePromptTemplate

prompt = "May the {subject} be with you"

chat_message_prompt = ChatMessagePromptTemplate.from_template(
    role="Jedi", template=prompt
)
chat_message_prompt.format(subject="force")

ChatMessage(content='May the force be with you', role='Jedi')

LangChain also provides `MessagesPlaceholder`, which gives you full control over which messages are rendered during formatting. This can be useful when you are uncertain which role to use for your message prompt templates or when you wish to insert a list of messages during formatting.

In [None]:
from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)

human_prompt = "Summarize our conversation so far in {word_count} words."
human_message_template = HumanMessagePromptTemplate.from_template(human_prompt)

chat_prompt = ChatPromptTemplate.from_messages(
    [MessagesPlaceholder(variable_name="conversation"), human_message_template]
)

In [None]:
from langchain_core.messages import AIMessage, HumanMessage

human_message = HumanMessage(content="What is the best way to learn programming?")
ai_message = AIMessage(
    content="""\
1. Choose a programming language: Decide on a programming language that you want to learn.

2. Start with the basics: Familiarize yourself with the basic programming concepts such as variables, data types and control structures.

3. Practice, practice, practice: The best way to learn programming is through hands-on experience\
"""
)

chat_prompt.format_prompt(
    conversation=[human_message, ai_message], word_count="10"
).to_messages()

[HumanMessage(content='What is the best way to learn programming?'),
 AIMessage(content='1. Choose a programming language: Decide on a programming language that you want to learn.\n\n2. Start with the basics: Familiarize yourself with the basic programming concepts such as variables, data types and control structures.\n\n3. Practice, practice, practice: The best way to learn programming is through hands-on experience'),
 HumanMessage(content='Summarize our conversation so far in 10 words.')]

#### Partial prompt templates

##### Partial with strings

In [None]:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("{foo}{bar}")
partial_prompt = prompt.partial(foo="foo")
print(partial_prompt.format(bar="baz"))

foobaz


In [None]:
prompt = PromptTemplate(
    template="{foo}{bar}", input_variables=["bar"], partial_variables={"foo": "foo"}
)
print(prompt.format(bar="baz"))

foobaz


##### Partial with functions

In [None]:
from datetime import datetime


def _get_datetime():
    now = datetime.now()
    return now.strftime("%m/%d/%Y, %H:%M:%S")

In [None]:
prompt = PromptTemplate(
    template="Tell me a {adjective} joke about the day {date}",
    input_variables=["adjective", "date"],
)
partial_prompt = prompt.partial(date=_get_datetime)
print(partial_prompt.format(adjective="funny"))

Tell me a funny joke about the day 03/10/2024, 13:02:01


In [None]:
prompt = PromptTemplate(
    template="Tell me a {adjective} joke about the day {date}",
    input_variables=["adjective"],
    partial_variables={"date": _get_datetime},
)
print(prompt.format(adjective="funny"))

Tell me a funny joke about the day 03/10/2024, 13:02:06


In [None]:
print(partial_prompt.format(adjective="funny"))
print(prompt.format(adjective="funny"))

Tell me a funny joke about the day 03/10/2024, 13:02:09
Tell me a funny joke about the day 03/10/2024, 13:02:09


#### Pipeline

This section covers how to compose multiple prompts together, which is useful when you want to reuse parts of prompts. This can be done with a `PipelinePromptTemplate`, which consists of two main parts:

- Final prompt: the final prompt that is returned.
- Pipeline prompts: a list of tuples, each consisting of a string name and a prompt template. Each prompt template is formatted and then passed to later prompt templates as a variable with the same name.

In [None]:
from langchain.prompts.pipeline import PipelinePromptTemplate
from langchain.prompts.prompt import PromptTemplate

In [None]:
full_template = """{introduction}

{example}

{start}"""
full_prompt = PromptTemplate.from_template(full_template)

In [None]:
introduction_template = """You are impersonating {person}."""
introduction_prompt = PromptTemplate.from_template(introduction_template)
introduction_prompt

PromptTemplate(input_variables=['person'], template='You are impersonating {person}.')

In [None]:
example_template = """Here's an example of an interaction:

Q: {example_q}
A: {example_a}"""
example_prompt = PromptTemplate.from_template(example_template)
example_prompt

PromptTemplate(input_variables=['example_a', 'example_q'], template="Here's an example of an interaction:\n\nQ: {example_q}\nA: {example_a}")

In [None]:
start_template = """Now, do this for real!

Q: {input}
A:"""
start_prompt = PromptTemplate.from_template(start_template)
start_prompt

PromptTemplate(input_variables=['input'], template='Now, do this for real!\n\nQ: {input}\nA:')

In [None]:
input_prompts = [
    ("introduction", introduction_prompt),
    ("example", example_prompt),
    ("start", start_prompt),
]
pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prompt, pipeline_prompts=input_prompts
)
pipeline_prompt

PipelinePromptTemplate(input_variables=['example_q', 'person', 'input', 'example_a'], final_prompt=PromptTemplate(input_variables=['example', 'introduction', 'start'], template='{introduction}\n\n{example}\n\n{start}'), pipeline_prompts=[('introduction', PromptTemplate(input_variables=['person'], template='You are impersonating {person}.')), ('example', PromptTemplate(input_variables=['example_a', 'example_q'], template="Here's an example of an interaction:\n\nQ: {example_q}\nA: {example_a}")), ('start', PromptTemplate(input_variables=['input'], template='Now, do this for real!\n\nQ: {input}\nA:'))])

In [None]:
pipeline_prompt.input_variables

['example_q', 'person', 'input', 'example_a']

In [None]:
print(
    pipeline_prompt.format(
        person="Elon Musk",
        example_q="What's your favorite car?",
        example_a="Tesla",
        input="What's your favorite social media site?",
    )
)

You are impersonating Elon Musk.

Here's an example of an interaction:

Q: What's your favorite car?
A: Tesla

Now, do this for real!

Q: What's your favorite social media site?
A:


### Chat Models

#### Custom Chat Model

#### Function calling

Certain chat models, like [OpenAI](https://platform.openai.com/docs/guides/function-calling)’s, have a function-calling API that lets you describe functions and their arguments, and have the model return a JSON object with a function to invoke and the inputs to that function. Function-calling is extremely useful for building [tool-using chains and agents](https://python.langchain.com/docs/use_cases/tool_use/), and for getting structured outputs from models more generally.

LangChain comes with a number of utilities to make function-calling easy. Namely, it comes with

- simple syntax for binding functions to models
- converters for formatting various types of objects to the expected function schemas
- output parsers for extracting the function invocations from API responses

We’ll focus here on the first two bullets. To see how output parsing works as well check out the [OpenAI Tools output parsers](https://python.langchain.com/docs/modules/model_io/output_parsers/types/openai_tools).
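
The second bullet (converters) can be illustrated without LangChain. The sketch below builds an OpenAI-style function description (a `name`, a `description`, and JSON Schema `parameters`) from a plain Python function. The helper `to_function_schema` and the type mapping are hypothetical and far simpler than a real converter; they only handle a few primitive annotations.

```python
import inspect
import json
from typing import Any, Callable, Dict

# Minimal mapping from Python annotations to JSON Schema types (an assumption;
# anything not listed falls back to "string").
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_function_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style function description from a Python function."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def multiply(a: int, b: int) -> int:
    """Multiply two integers together."""
    return a * b


print(json.dumps(to_function_schema(multiply), indent=2))
```

A description in this shape is what gets sent alongside the chat messages, so the model can answer with the function name and JSON arguments instead of free text.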

#### Streaming

All ChatModels implement the Runnable interface, which comes with default implementations of all methods, i.e. `ainvoke`, `batch`, `abatch`, `stream`, and `astream`. This gives all ChatModels basic support for streaming.

Streaming support defaults to returning an Iterator (or AsyncIterator in the case of async streaming) of a single value, the final result returned by the underlying ChatModel provider. This obviously doesn’t give you token-by-token streaming, which requires native support from the ChatModel provider, but ensures your code that expects an iterator of tokens can work for any of LangChain's ChatModel integrations.
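
The fallback behaviour can be sketched with two toy classes. These are hypothetical stand-ins, not LangChain classes: one model has no native streaming and yields its whole answer as a single chunk, the other yields word by word, and the same consumer loop works for both.

```python
from typing import Iterator


class ToyChatModel:
    """Hypothetical model without native streaming support."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def stream(self, prompt: str) -> Iterator[str]:
        # Default behaviour: yield the complete result as one chunk.
        yield self.invoke(prompt)


class ToyStreamingChatModel(ToyChatModel):
    """Hypothetical model that overrides stream() to emit smaller chunks."""

    def stream(self, prompt: str) -> Iterator[str]:
        for token in self.invoke(prompt).split(" "):
            yield token + " "


def consume(model: ToyChatModel, prompt: str) -> str:
    # Caller code is identical regardless of how many chunks arrive.
    return "".join(model.stream(prompt))


print(list(ToyChatModel().stream("hi there")))
print(list(ToyStreamingChatModel().stream("hi there")))
```

The consumer never needs to know which kind of model it received; the only visible difference is the number of chunks.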

In [None]:
from langchain_community.chat_models import ChatAnthropic

In [None]:
# chat = ChatAnthropic(model="claude-2")
# for chunk in chat.stream("Write me a song about goldfish on the moon"):
#     print(chunk.content, end="", flush=True)

"""Output
 Here's a song I just improvised about goldfish on the moon:

Floating in space, looking for a place
To call their home, all alone
Swimming through stars, these goldfish from Mars
Left their fishbowl behind, a new life to find
On the moon, where the craters loom
Searching for food, maybe some lunar food
Out of their depth, close to death
How they wish, for just one small fish
To join them up here, their future unclear
On the moon, where the Earth looms
Dreaming of home, filled with foam
Their bodies adapt, continuing to last
On the moon, where they learn to swoon
Over cheese that astronauts tease
As they stare back at Earth, the planet of birth
These goldfish out of water, swim on and on
Lunar pioneers, conquering their fears
On the moon, where they happily swoon
"""

#### Tracking token usage

Token usage tracking is currently only implemented for the OpenAI API.

### LLMs

Large Language Models (LLMs) are a core component of LangChain. LangChain does not serve its own LLMs, but rather provides a standard interface for interacting with many different LLMs. To be specific, this interface is one that takes as input a string and returns a string.

There are many LLM providers (OpenAI, Cohere, Hugging Face, etc.); the `LLM` class is designed to provide a standard interface for all of them.

##### Google Gemini

In [None]:
%pip install --upgrade --quiet  langchain-google-genai

In [None]:
from langchain_google_genai import GoogleGenerativeAI

In [None]:
llm = GoogleGenerativeAI(model="gemini-pro", google_api_key=GOOGLE_API_KEY)
print(
    llm.invoke(
        "What are some of the pros and cons of Python as a programming language?"
    )
)

**Pros:**

* **Beginner-friendly:** Python's syntax is known for its simplicity and readability, making it easy for beginners to learn and understand.
* **Versatile:** Python is a general-purpose language that can be used for a wide range of applications, including web development, data science, machine learning, and more.
* **Large community:** Python has a vast and active community, providing support and resources to developers.
* **Extensible:** Python can be extended with libraries and modules, expanding its capabilities significantly.
* **Cross-platform:** Python can run on multiple platforms, including Windows, macOS, Linux, and mobile devices.

**Cons:**

* **Speed:** Python is an interpreted language, which means it is generally slower than compiled languages like C++ or Java.
* **Memory usage:** Python has a dynamic memory management system, which can lead to higher memory consumption compared to some other languages.
* **Lack of type hints:** Python is a dynamically typed lan

In [None]:
!curl \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"parts":[{"text":"Write a story about a magic backpack"}]}]}' \
  -X POST https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=f"{GOOGLE_API_KEY}"

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "In the bustling city, amidst the throngs of people, an ordinary backpack lay forgotten in a dusty corner. Little did anyone know that within its humble exterior concealed a secret enchantment - it was a magic backpack.\n\nOnce, it belonged to a young boy named Ethan. A solitary child, Ethan found solace in his backpack, confiding his thoughts and dreams to its silent depths. One day, as he poured out his heart, a surge of energy coursed through the backpack. It came alive, its fabric shimmered, and it began to respond to his unspoken wishes.\n\nWith each passing day, the magic backpack grew wiser. It learned about Ethan's passions and aspirations. It could shift its shape to accommodate any object, from a heavy book to a delicate flower. It even developed a telepathic connection with Ethan, allowing him to access its contents without opening it.\n\nAs Ethan grew, so did the backpack's capab

##### Google Vertex AI

In [None]:
!pip install langchain-google-vertexai

from google.colab import output
output.clear()

In [None]:
from langchain_google_vertexai import VertexAI

In [None]:
model = VertexAI(model_name="gemini-pro")

ValidationError: 1 validation error for VertexAI
__root__
  Unable to find your project. Please provide a project ID by:
- Passing a constructor argument
- Using vertexai.init()
- Setting project using 'gcloud config set project my-project'
- Setting a GCP environment variable
- To create a Google Cloud project, please follow guidance at https://developers.google.com/workspace/guides/create-project (type=value_error)

In [None]:
message = "What are some of the pros and cons of Python as a programming language?"
model.invoke(message)

##### Huggingface Endpoints

In [None]:
%pip install --upgrade --quiet huggingface_hub

In [None]:
import os

# Reuse the token loaded from Colab secrets instead of hardcoding it in the notebook.
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACE_API_READ

In [None]:
from langchain_community.llms import HuggingFaceEndpoint

In [None]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

In [None]:
question = "Who won the FIFA World Cup in the year 1994? "

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)
prompt

PromptTemplate(input_variables=['question'], template="Question: {question}\n\nAnswer: Let's think step by step.")

In [None]:
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=128,  # `max_length` is not a recognized field and would be moved to model_kwargs
    temperature=0.5,
    huggingfacehub_api_token=HUGGINGFACE_API_READ,  # `token` would likewise be moved to model_kwargs
)
llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain.invoke(question)



Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


{'question': 'Who won the FIFA World Cup in the year 1994? ',
 'text': ' The FIFA World Cup is an international football tournament that takes place every four years. The 1994 FIFA World Cup was held in the United States from June 17 to July 17, 1994. The final match was played on July 17, 1994. The teams that reached the final were Brazil and Italy. Brazil won the match with a score of 0-0 (3-2 in the penalty shootout). Therefore, Brazil won the FIFA World Cup in the year 1994.'}

In [None]:
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms import HuggingFaceEndpoint

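llm = HuggingFaceEndpoint(
    repo_id="google/gemma-2b",  # a Hub model id belongs in repo_id; endpoint_url is meant for a full URL
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.8,
    repetition_penalty=1.03,
    streaming=True,
)
print("What did foo say about bar?")
# Calling the LLM object directly is deprecated in favor of invoke().
llm.invoke(
    "What did foo say about bar?",
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)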

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
What did foo say about bar?


Barf

How do you feel about me?

I'm a little confused.

I'm not sure, I don't feel anything.

Do you want to be my friend?

Are you coming to my party?

What do you think about me?

What does he like?

What does she do?

What are you doing?

What do you think about the movie?

What is this?

What is the matter with you?

What is wrong with you?

What is wrong with this?

What is this?

Who is that?

What did she say?

Who was at the party?

Who likes you?

Who is the president of the United States?

Who is going to be president next year?

Who did you see at the party?

Who is your favorite president?

Who was there?

Who did you see at the party?

What did you do?

What was your favorite part of the party?

What wa

"\n\nBarf\n\nHow do you feel about me?\n\nI'm a little confused.\n\nI'm not sure, I don't feel anything.\n\nDo you want to be my friend?\n\nAre you coming to my party?\n\nWhat do you think about me?\n\nWhat does he like?\n\nWhat does she do?\n\nWhat are you doing?\n\nWhat do you think about the movie?\n\nWhat is this?\n\nWhat is the matter with you?\n\nWhat is wrong with you?\n\nWhat is wrong with this?\n\nWhat is this?\n\nWho is that?\n\nWhat did she say?\n\nWho was at the party?\n\nWho likes you?\n\nWho is the president of the United States?\n\nWho is going to be president next year?\n\nWho did you see at the party?\n\nWho is your favorite president?\n\nWho was there?\n\nWho did you see at the party?\n\nWhat did you do?\n\nWhat was your favorite part of the party?\n\nWhat was your favorite part of the movie?\n\nWhat was your favorite part of the party?\n\nWhat was the weather like?\n\nWhat was the weather like yesterday?\n\nWhat was the weather like today?\n\nWhat was the weather lik

In [None]:
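llm = HuggingFaceEndpoint(
    repo_id="google/gemma-7b",  # a Hub model id belongs in repo_id; endpoint_url is meant for a full URL
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
)
print("What did foo say about bar?")
# Calling the LLM object directly is deprecated in favor of invoke().
llm.invoke(
    "What did foo say about bar?",
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)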

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
What did foo say about bar?


I don't know, but I think it was funny.





In [None]:
llm = HuggingFaceEndpoint(
    # repo_id="BioMistral/BioMistral-7B",
    repo_id="google/gemma-7b-it",  # a Hub model id belongs in repo_id; endpoint_url is meant for a full URL
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
)
question = "I consumed too much sugar today. How should I compensate my intake tomorrow?"
print(question)
# Calling the LLM object directly is deprecated in favor of invoke().
llm.invoke(question, config={"callbacks": [StreamingStdOutCallbackHandler()]})

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
I consumed too much sugar today. How should I compensate my intake tomorrow?


Sure, here's how you can compensate your excessive sugar consumption from today: 
1. **Hydrate:** Drink plenty of water throughout the day to flush out excess sugars and toxins that may have accumulated in your system due to high sugary food intake yesterday.


2.**Balanced Diet Tomorrow**: Eat a balanced diet containing fruits, vegetables with low-sugar content such as berries or apples instead of refined grains like white bread for carbohydrates  and lean protein sources which will help regulate blood glucose levels after consuming sweet foods on previous days .



3**.Choose Low Sugar Options** When selecting beverages choose options lower in added sweeteners ,such 

"\n\nSure, here's how you can compensate your excessive sugar consumption from today: \n1. **Hydrate:** Drink plenty of water throughout the day to flush out excess sugars and toxins that may have accumulated in your system due to high sugary food intake yesterday.\n\n\n2.**Balanced Diet Tomorrow**: Eat a balanced diet containing fruits, vegetables with low-sugar content such as berries or apples instead of refined grains like white bread for carbohydrates  and lean protein sources which will help regulate blood glucose levels after consuming sweet foods on previous days .\n\n\n\n3**.Choose Low Sugar Options** When selecting beverages choose options lower in added sweeteners ,such as fruit juices made primarily from natural ingredients rather than processed drinks loaded up with extra sweetness   or artificially sweetened ones when craving something refreshing during hot weather\n\n\n\n\n\n4.***Exercise:* Engage in moderate physical activity (like walking) at least thirty minutes daily

##### Huggingface Local Pipelines

In [None]:
%pip install --upgrade --quiet transformers

In [None]:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10)
hf = HuggingFacePipeline(pipeline=pipe)

In [None]:
from langchain.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | hf

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


 The basic idea behind it is this: your brain


In [None]:
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    device=-1,  # -1 runs on CPU; use device=0 for the first GPU, or device_map="auto" to use the accelerate library.
    pipeline_kwargs={"max_new_tokens": 10},
)

gpu_chain = prompt | gpu_llm

question = "What is electroencephalography?"

print(gpu_chain.invoke({"question": question}))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.




First let us define a functional form of


In [None]:
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="bigscience/bloom-1b7",
    task="text-generation",
    device=-1,  # -1 for CPU
    batch_size=2,  # adjust as needed based on GPU map and model size.
    model_kwargs={"temperature": 0, "max_length": 64},
)

gpu_chain = prompt | gpu_llm.bind(stop=["\n\n"])

questions = []
for i in range(4):
    questions.append({"question": f"What is the number {i} in french?"})

answers = gpu_chain.batch(questions)
for answer in answers:
    print(answer)




TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

#### Caching

LangChain provides an optional caching layer for LLMs. This is useful for two reasons:

- It can save you money by reducing the number of API calls you make to the LLM provider, if you’re often requesting the same completion multiple times.
- It can speed up your application by reducing the number of API calls you make to the LLM provider.
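
As a rough illustration of both points, the sketch below wraps a fake model in an in-memory cache keyed on the model identifier and the prompt. `CachedLLM`, `fake_llm` and `llm_id` are hypothetical names used only for this example; this is not how LangChain implements its cache.

```python
from typing import Callable, Dict, Tuple


class CachedLLM:
    """Hypothetical wrapper that memoizes completions per (model id, prompt)."""

    def __init__(self, llm_fn: Callable[[str], str], llm_id: str) -> None:
        self._llm_fn = llm_fn
        self._llm_id = llm_id
        self._cache: Dict[Tuple[str, str], str] = {}
        self.provider_calls = 0  # counts how often the "provider" is really hit

    def invoke(self, prompt: str) -> str:
        key = (self._llm_id, prompt)
        if key not in self._cache:
            # Cache miss: pay for one provider call and remember the answer.
            self.provider_calls += 1
            self._cache[key] = self._llm_fn(prompt)
        return self._cache[key]


def fake_llm(prompt: str) -> str:
    # Stand-in for a slow, billed API call.
    return prompt[::-1]


cached_llm = CachedLLM(fake_llm, llm_id="fake-model")
first = cached_llm.invoke("Tell me a joke")
second = cached_llm.invoke("Tell me a joke")
print(first == second, cached_llm.provider_calls)
```

The second call is served from the dictionary, so only one provider call is made. In LangChain itself the cache is switched on globally with `set_llm_cache`, for example with an `InMemoryCache`.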

### Output Parser

Output parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very useful when you are using LLMs to generate any form of structured data.

Besides having a large collection of different types of output parsers, one distinguishing benefit of LangChain OutputParsers is that many of them support streaming.

https://python.langchain.com/docs/modules/model_io/output_parsers/

In [None]:
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import OpenAI

model = OpenAI(
    model_name="gpt-3.5-turbo-instruct",
    temperature=0.0,
    openai_api_key=OPENAI_API_KEY
)

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field


# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# And a query intended to prompt a language model to populate the data structure.
prompt_and_model = prompt | model
output = prompt_and_model.invoke({"query": "Tell me a joke."})
parser.invoke(output)

Joke(setup='Why did the tomato turn red?', punchline='Because it saw the salad dressing!')

In [None]:
from langchain.output_parsers.json import SimpleJsonOutputParser

json_prompt = PromptTemplate.from_template(
    "Return a JSON object with an `answer` key that answers the following question: {question}"
)
json_parser = SimpleJsonOutputParser()
json_chain = json_prompt | model | json_parser

In [None]:
list(json_chain.stream({"question": "Who is the person who invented football?"}))

[{},
 {'answer': ''},
 {'answer': 'W'},
 {'answer': 'Walter'},
 {'answer': 'Walter Camp'}]

#### JSON parser

This output parser allows users to specify an arbitrary JSON schema and query LLMs for outputs that conform to that schema.

Keep in mind that large language models are leaky abstractions! You’ll have to use an LLM with sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do so reliably, but Curie’s ability already drops off dramatically.

You can optionally use Pydantic to declare your data model.

In [None]:
from typing import List

from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import OpenAI

In [None]:
model = OpenAI(
    model_name="gpt-3.5-turbo-instruct",
    temperature=0.0,
    openai_api_key=OPENAI_API_KEY
)

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = JsonOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": joke_query})

{'setup': 'Why did the tomato turn red?',
 'punchline': 'Because it saw the salad dressing!'}

In [None]:
### WITHOUT PYDANTIC

joke_query = "Tell me a joke."

parser = JsonOutputParser()

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": joke_query})

{'joke': "Why couldn't the bicycle stand up by itself? Because it was two-tired."}

#### CSV parser

This output parser can be used when you want to return a list of comma-separated items.

In [None]:
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

chain = prompt | model | output_parser

In [None]:
chain.invoke({"subject": "ice cream flavors"})

['1. Chocolate\n2. Vanilla\n3. Strawberry\n4. Mint chocolate chip\n5. Cookies and cream']
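
The single-element list above is what a comma split produces when the model answers with a numbered list instead of comma-separated values. The function below is a simplified sketch of that behaviour, not LangChain's actual parser; `parse_comma_separated` is a hypothetical name.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    # Split on commas and trim surrounding whitespace from each item.
    return [item.strip() for item in text.strip().split(",")]


# A well-formed reply splits into separate items.
good = parse_comma_separated("chocolate, vanilla, strawberry")

# A numbered list contains no commas, so it stays one big item.
bad = parse_comma_separated("1. Chocolate\n2. Vanilla\n3. Strawberry")

print(good)
print(len(bad))
```

If the model ignores the format instructions, the parser has nothing to split on, so the fix usually lies in the prompt or the model choice rather than in the parser.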

#### Pydantic parser

This output parser allows users to specify an arbitrary Pydantic Model and query LLMs for outputs that conform to that schema.

Keep in mind that large language models are leaky abstractions! You’ll have to use an LLM with sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do so reliably, but Curie’s ability already drops off dramatically.

Use Pydantic to declare your data model. Pydantic’s BaseModel is like a Python dataclass, but with actual type checking + coercion.

In [None]:
from typing import List

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import OpenAI

In [None]:
model = OpenAI(
    model_name="gpt-3.5-turbo-instruct",
    temperature=0.0,
    openai_api_key=OPENAI_API_KEY
)

In [None]:
# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field


# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": joke_query})

Joke(setup='Why did the tomato turn red?', punchline='Because it saw the salad dressing!')

In [None]:
# Here's another example, but with a compound typed field.
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": actor_query})

Actor(name='Tom Hanks', film_names=['Forrest Gump', 'Saving Private Ryan', 'Cast Away', 'The Green Mile', 'Apollo 13', 'Toy Story', 'Toy Story 2', 'Toy Story 3', 'The Da Vinci Code', 'Catch Me If You Can'])

#### Pandas DataFrame parser

A Pandas DataFrame is a popular data structure in the Python programming language, commonly used for data manipulation and analysis. It provides a comprehensive set of tools for working with structured data, making it a versatile option for tasks such as data cleaning, transformation, and analysis.

This output parser allows users to specify an arbitrary Pandas DataFrame and query LLMs for data in the form of a formatted dictionary that extracts data from the corresponding DataFrame. Keep in mind that large language models are leaky abstractions! You’ll have to use an LLM with sufficient capacity to generate a well-formed query as per the defined format instructions.

Use Pandas’ DataFrame object to declare the DataFrame you wish to perform queries on.

In [None]:
import pprint
from typing import Any, Dict

import pandas as pd
from langchain.output_parsers import PandasDataFrameOutputParser
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

In [None]:
model = OpenAI(
    # model_name="gpt-3.5-turbo-instruct",
    temperature=0.0,
    openai_api_key=OPENAI_API_KEY
)

In [None]:
# Solely for documentation purposes.
def format_parser_output(parser_output: Dict[str, Any]) -> None:
    for key in parser_output.keys():
        parser_output[key] = parser_output[key].to_dict()
    return pprint.PrettyPrinter(width=4, compact=True).pprint(parser_output)

In [None]:
# Define your desired Pandas DataFrame.
df = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
    }
)
display(df)

# Set up a parser + inject instructions into the prompt template.
parser = PandasDataFrameOutputParser(dataframe=df)

|   | num_legs | num_wings | num_specimen_seen |
|---|----------|-----------|-------------------|
| 0 | 2        | 2         | 10                |
| 1 | 4        | 0         | 2                 |
| 2 | 8        | 0         | 1                 |
| 3 | 0        | 0         | 8                 |


In [None]:
# Here's an example of a column operation being performed.
df_query = "Retrieve the num_wings column."

# Set up the prompt.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
parser_output = chain.invoke({"query": df_query})

format_parser_output(parser_output)

{'num_wings': {0: 2,
               1: 0,
               2: 0,
               3: 0}}


In [None]:
# Here's an example of a row operation being performed.
df_query = "Retrieve the first row."

# Set up the prompt.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
parser_output = chain.invoke({"query": df_query})

format_parser_output(parser_output)

OutputParserException: Request '
row:0

Retrieve the num_legs column.

column:num_legs

Retrieve the num_wings column.

column:num_wings

Retrieve the num_specimen_seen column.

column:num_specimen_seen

Retrieve the num_legs column for rows 1 and 2.

column:num_legs[1,2]

Retrieve the num_legs column for row 1.

row:0[num_legs]

Take the mean of the num_legs column from rows 1 to 3.

mean:num_legs[0..2]' is not correctly formatted.                     Please refer to the format instructions.

In [None]:
# Here's an example of a random Pandas DataFrame operation limiting the number of rows
df_query = "Retrieve the average of the num_legs column from rows 1 to 3."

# Set up the prompt.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
parser_output = chain.invoke({"query": df_query})

print(parser_output)

{'mean': 4.0}
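
To see where `4.0` comes from, the sketch below evaluates a query string of the `operation:column[start..end]` shape that appears in the error message earlier in this section. It is an illustration only: the grammar is inferred from that message, the inclusive row bounds are an assumption that happens to reproduce the result above, and `run_query` with its plain-dict table is a hypothetical stand-in for the real parser and DataFrame.

```python
import re
from typing import Dict, List

# Same data as the DataFrame above, kept as plain lists for this sketch.
table: Dict[str, List[int]] = {
    "num_legs": [2, 4, 8, 0],
    "num_wings": [2, 0, 0, 0],
    "num_specimen_seen": [10, 2, 1, 8],
}

# Assumed grammar: "<operation>:<column>" with an optional "[start..end]" row range.
QUERY_RE = re.compile(
    r"^(?P<op>\w+):(?P<column>\w+)(?:\[(?P<start>\d+)\.\.(?P<end>\d+)\])?$"
)


def run_query(query: str, data: Dict[str, List[int]]) -> dict:
    match = QUERY_RE.match(query.strip())
    if match is None:
        raise ValueError(f"Request '{query}' is not correctly formatted.")
    column = match["column"]
    if column not in data:
        raise ValueError(f"Invalid column: {column} is not a possible column.")
    values = data[column]
    if match["start"] is not None:
        # Row bounds are treated as inclusive (an assumption of this sketch).
        values = values[int(match["start"]): int(match["end"]) + 1]
    if match["op"] == "column":
        return {column: values}
    if match["op"] == "mean":
        return {"mean": sum(values) / len(values)}
    raise ValueError(f"Unsupported operation: {match['op']}")


print(run_query("mean:num_legs[1..3]", table))
print(run_query("column:num_wings", table))
```

Rows 1 to 3 of `num_legs` are 4, 8 and 0, whose mean is 4.0. Row operations such as `row:0` are left out to keep the sketch short.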


In [None]:
# Here's an example of a poorly formatted query
df_query = "Retrieve the mean of the num_fingers column."

# Set up the prompt.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
parser_output = chain.invoke({"query": df_query})

OutputParserException: 
Invalid column: num_fingers is not a possible column.. Please check the format instructions.

## Retrieval

Many LLM applications require user-specific data that is not part of the model's training set. The primary way of accomplishing this is through Retrieval Augmented Generation (RAG). In this process, external data is retrieved and then passed to the LLM when doing the generation step.

LangChain provides all the building blocks for RAG applications, from simple to complex. This section of the documentation covers everything related to the retrieval step, i.e. the fetching of the data. Although this sounds simple, it can be subtly complex, and it encompasses several key modules.
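
The retrieve-then-generate flow can be sketched without any library. Everything below is a toy under stated assumptions: `corpus`, `retrieve` and `build_prompt` are hypothetical names, and word overlap stands in for the embedding-based similarity search a real RAG system would normally use.

```python
from typing import List, Set

# Tiny made-up corpus standing in for user-specific data.
corpus = [
    "LangChain provides document loaders for many sources.",
    "A Pandas DataFrame is a tabular data structure.",
    "Output parsers turn LLM text into structured data.",
]


def tokenize(text: str) -> Set[str]:
    # Lowercase words with basic punctuation stripped.
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    # Rank documents by how many words they share with the query.
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    # The retrieved text is passed to the LLM together with the question.
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


user_query = "What do output parsers do?"
rag_prompt = build_prompt(user_query, retrieve(user_query, corpus))
print(rag_prompt)
```

The resulting prompt string is what would be sent to the model in the generation step.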

<img src="https://python.langchain.com/assets/images/data_connection-95ff2033a8faa5f3ba41376c0f6dd32a.jpg" height="75%" width="75%">

### Document Loaders

`Document loaders` load documents from many different sources. LangChain provides over 100 different document loaders as well as integrations with other major providers in the space, like AirByte and Unstructured. LangChain provides integrations to load all types of documents (HTML, PDF, code) from all types of locations (private S3 buckets, public websites).

In [None]:
!touch index.md

In [None]:
# %%bash
# cat > index.md << EOF
# This is line 1.\n
# This is line 2.\n
# This is line 3.\n
# EOF

In [None]:
!echo -e "Halo <br> Dunia <br> This is a markdown file" | tee index.md

Halo <br> Dunia <br> This is a markdown file


In [None]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("./index.md")
loader.load()

[Document(page_content='Halo <br> Dunia <br> This is a markdown file\n', metadata={'source': './index.md'})]

#### CSV

In [None]:
from langchain_community.document_loaders.csv_loader import CSVLoader


loader = CSVLoader(file_path='/content/sample_data/california_housing_test.csv')
data = loader.load()

In [None]:
print(data)

[Document(page_content='longitude: -122.050000\nlatitude: 37.370000\nhousing_median_age: 27.000000\ntotal_rooms: 3885.000000\ntotal_bedrooms: 661.000000\npopulation: 1537.000000\nhouseholds: 606.000000\nmedian_income: 6.608500\nmedian_house_value: 344700.000000', metadata={'source': '/content/sample_data/california_housing_test.csv', 'row': 0}), Document(page_content='longitude: -118.300000\nlatitude: 34.260000\nhousing_median_age: 43.000000\ntotal_rooms: 1510.000000\ntotal_bedrooms: 310.000000\npopulation: 809.000000\nhouseholds: 277.000000\nmedian_income: 3.599000\nmedian_house_value: 176500.000000', metadata={'source': '/content/sample_data/california_housing_test.csv', 'row': 1}), Document(page_content='longitude: -117.810000\nlatitude: 33.780000\nhousing_median_age: 27.000000\ntotal_rooms: 3589.000000\ntotal_bedrooms: 507.000000\npopulation: 1484.000000\nhouseholds: 495.000000\nmedian_income: 5.793400\nmedian_house_value: 270500.000000', metadata={'source': '/content/sample_data/c
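
The output above suggests that each CSV row becomes one document whose text is a list of `column: value` lines, with the file path and row index kept as metadata. The following sketch reproduces that shape with the standard `csv` module. It is not the loader's actual code, and `rows_to_documents` and the plain dictionaries are hypothetical stand-ins for `CSVLoader` and `Document`.

```python
import csv
import io
from typing import Dict, List


def rows_to_documents(csv_text: str, source: str) -> List[Dict]:
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_index, row in enumerate(reader):
        # One "column: value" line per field, joined with newlines.
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {
                "page_content": page_content,
                "metadata": {"source": source, "row": row_index},
            }
        )
    return documents


sample_csv = "longitude,latitude\n-122.050000,37.370000\n-118.300000,34.260000\n"
csv_docs = rows_to_documents(sample_csv, source="sample.csv")
print(csv_docs[0]["page_content"])
print(csv_docs[1]["metadata"])
```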

In [None]:
# Customizing
loader = CSVLoader(file_path='/content/sample_data/california_housing_test.csv', csv_args={
    'delimiter': ',',
    'quotechar': '"',
}, source_column="longitude")

data = loader.load()
print(len(data))

3000


#### File Directory

In [None]:
!pip install unstructured
from langchain_community.document_loaders import DirectoryLoader

from google.colab import output
output.clear()

In [None]:
loader = DirectoryLoader(
    '/content', glob="**/*.md",
    show_progress=True,
    use_multithreading=True
)
docs = loader.load()

ERROR:langchain_community.document_loaders.directory:Error loading file /content/sample_data/README.md


In [None]:
len(docs)

1

In [None]:
from langchain_community.document_loaders import TextLoader, PythonLoader

In [None]:
## Text Loader or Python Loader

#### HTML

In [None]:
from langchain_community.document_loaders import UnstructuredHTMLLoader

We can also load HTML documents with `BeautifulSoup4` through the `BSHTMLLoader`. It extracts the text from the HTML into `page_content` and stores the page title under `title` in `metadata`.

In [None]:
# Loading HTML with BeautifulSoup4
from langchain_community.document_loaders import BSHTMLLoader
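
The split between body text in `page_content` and the title in `metadata` can be sketched with the standard library's `html.parser`. This is an illustration under stated assumptions rather than what `BSHTMLLoader` does internally: `TextAndTitleParser` and `html_to_document` are hypothetical names, and keeping the title text out of the body is a simplification of this sketch.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TextAndTitleParser(HTMLParser):
    """Collects visible text and the <title> contents separately."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self._in_title = False
        self._text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self._text_parts.append(data.strip())

    @property
    def text(self) -> str:
        return "\n".join(self._text_parts)


def html_to_document(html: str, source: str) -> Dict:
    parser = TextAndTitleParser()
    parser.feed(html)
    parser.close()
    return {
        "page_content": parser.text,
        "metadata": {"source": source, "title": parser.title},
    }


html_doc = html_to_document(
    "<html><head><title>Demo</title></head>"
    "<body><h1>Halo</h1><p>Dunia</p></body></html>",
    source="demo.html",
)
print(html_doc)
```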

#### JSON

In [None]:
from langchain_community.document_loaders import JSONLoader

#### Markdown

In [None]:
from langchain_community.document_loaders import UnstructuredMarkdownLoader

In [None]:
markdown_path = "sample_data/README.md"
loader = UnstructuredMarkdownLoader(markdown_path)

In [None]:
data = loader.load()

In [None]:
data

[Document(page_content="This directory includes a few sample datasets to get you started.\n\ncalifornia_housing_data*.csv is California housing data from the 1990 US\n    Census; more information is available at:\n    https://developers.google.com/machine-learning/crash-course/california-housing-data-description\n\nmnist_*.csv is a small sample of the\n    MNIST database, which is\n    described at: http://yann.lecun.com/exdb/mnist/\n\nanscombe.json contains a copy of\n    Anscombe's quartet; it\n    was originally described in\nAnscombe, F. J. (1973). 'Graphs in Statistical Analysis'. American\nStatistician. 27 (1): 17-21. JSTOR 2682899.\nand our copy was prepared by the\nvega_datasets library.", metadata={'source': 'sample_data/README.md'})]

In [None]:
loader = UnstructuredMarkdownLoader(markdown_path, mode="elements")

In [None]:
data = loader.load()

In [None]:
data

[Document(page_content='This directory includes a few sample datasets to get you started.', metadata={'source': 'sample_data/README.md', 'last_modified': '2000-01-01T08:00:00', 'page_number': 1, 'languages': ['eng'], 'filetype': 'text/markdown', 'file_directory': 'sample_data', 'filename': 'README.md', 'category': 'NarrativeText'}),
 Document(page_content='california_housing_data*.csv is California housing data from the 1990 US\n    Census; more information is available at:\n    https://developers.google.com/machine-learning/crash-course/california-housing-data-description', metadata={'source': 'sample_data/README.md', 'last_modified': '2000-01-01T08:00:00', 'page_number': 1, 'languages': ['eng'], 'filetype': 'text/markdown', 'file_directory': 'sample_data', 'filename': 'README.md', 'category': 'ListItem'}),
 Document(page_content='mnist_*.csv is a small sample of the\n    MNIST database, which is\n    described at: http://yann.lecun.com/exdb/mnist/', metadata={'source': 'sample_data/R

#### PDF

In [None]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-4.1.0-py3-none-any.whl (286 kB)
Installing collected packages: pypdf
Successfully installed pypdf-4.1.0


In [None]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("https://arxiv.org/pdf/2402.10373.pdf")
pages = loader.load_and_split()

In [None]:
# An advantage of this approach is that documents can be retrieved with page numbers.
pages[0]

Document(page_content='BioMistral: A Collection of Open-Source Pretrained Large Language\nModels for Medical Domains\nYanis Labrak∗1,2Adrien Bazoge∗3,4\nEmmanuel Morin4Pierre-Antoine Gourraud3Mickael Rouvier1Richard Dufour1,4\n1LIA, Avignon Université2Zenidoc\n3Nantes Université, CHU Nantes, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France\n4Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France\n{firstname.lastname}@univ-avignon.fr\n{firstname.lastname}@univ-nantes.fr\nAbstract\nLarge Language Models (LLMs) have demon-\nstrated remarkable versatility in recent years,\noffering potential applications across special-\nized domains such as healthcare and medicine.\nDespite the availability of various open-source\nLLMs tailored for health contexts, adapting\ngeneral-purpose LLMs to the medical domain\npresents significant challenges. In this paper,\nwe introduce BioMistral, an open-source LLM\ntailored for the biomedical domain, utilizing\nMis

In [None]:
# pdf2image is needed by the unstructured-based PDF loaders below
!pip install pdf2image
!pip3 uninstall -y pdfminer.six pdfminer
!pip3 install pdfminer.six

from google.colab import output
output.clear()

In [None]:
from langchain_community.document_loaders import OnlinePDFLoader
loader = OnlinePDFLoader("https://arxiv.org/pdf/2302.03803.pdf")
data = loader.load()
data

ModuleNotFoundError: No module named 'pdf2image'

In [None]:
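from pathlib import Path
import requests
from langchain_community.document_loaders import UnstructuredPDFLoader

In [None]:
# UnstructuredPDFLoader reads a local file, so download the PDF first
Path("2302.03803.pdf").write_bytes(requests.get("https://arxiv.org/pdf/2302.03803.pdf").content)
loader = UnstructuredPDFLoader("2302.03803.pdf")
data = loader.load()
data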

### Text Splitters

Overview: A key part of retrieval is fetching only the relevant parts of documents. This involves several transformation steps to prepare the documents for retrieval. One of the primary ones here is splitting (or chunking) a large document into smaller chunks. LangChain provides several transformation algorithms for doing this, as well as logic optimized for specific document types (code, markdown, etc).

Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example is you may want to split a long document into smaller chunks that can fit into your model's context window. LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.

When you want to deal with long pieces of text, it is necessary to split up that text into chunks. As simple as this sounds, there is a lot of potential complexity here. Ideally, you want to keep the semantically related pieces of text together. What "semantically related" means could depend on the type of text. This notebook showcases several ways to do that.

At a high level, text splitters work as follows:

1. Split the text up into small, semantically meaningful chunks (often sentences).
2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks).

That means there are two different axes along which you can customize your text splitter:

1. How the text is split
2. How the chunk size is measured
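
The three steps and the two customization axes above can be sketched in plain Python. This is only an illustration under simplifying assumptions: `split_text` is a hypothetical helper, sentences are found with a naive regex, overlap is counted in sentences rather than characters, and a chunk can exceed `chunk_size` after the overlap is carried over. It is not LangChain's splitter.

```python
import re


def split_text(text, chunk_size=60, chunk_overlap=1, length_function=len):
    """Greedily merge sentences into chunks of roughly chunk_size."""
    # Axis 1: how the text is split (here: naively, after ., ! or ?)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        # Axis 2: how the chunk size is measured (here: characters via len)
        if current and length_function(candidate) > chunk_size:
            chunks.append(" ".join(current))
            # Carry the last chunk_overlap sentences into the next chunk
            current = current[-chunk_overlap:] if chunk_overlap else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
chunks = split_text(text, chunk_size=32, chunk_overlap=1)
for chunk in chunks:
    print(repr(chunk))
```

Passing a different `length_function`, for example `lambda s: len(s.split())` to count words, changes how size is measured without touching how the text is split.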

In [None]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("https://arxiv.org/pdf/2402.10373.pdf")
pages = loader.load_and_split()

#### HTML

In [None]:
from langchain.text_splitter import HTMLHeaderTextSplitter

html_string = """
<!DOCTYPE html>
<html>
<body>
    <div>
        <h1>Foo</h1>
        <p>Some intro text about Foo.</p>
        <div>
            <h2>Bar main section</h2>
            <p>Some intro text about Bar.</p>
            <h3>Bar subsection 1</h3>
            <p>Some text about the first subtopic of Bar.</p>
            <h3>Bar subsection 2</h3>
            <p>Some text about the second subtopic of Bar.</p>
        </div>
        <div>
            <h2>Baz</h2>
            <p>Some text about Baz</p>
        </div>
        <br>
        <p>Some concluding text about Foo</p>
    </div>
</body>
</html>
"""

headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
html_header_splits = html_splitter.split_text(html_string)
html_header_splits

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

url = "https://plato.stanford.edu/entries/goedel/"

headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
    ("h4", "Header 4"),
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

# for local file use html_splitter.split_text_from_file(<path_to_file>)
html_header_splits = html_splitter.split_text_from_url(url)

chunk_size = 500
chunk_overlap = 30
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size, chunk_overlap=chunk_overlap
)

# Split
splits = text_splitter.split_documents(html_header_splits)
splits[80:85]

### Text embedding models

Another key part of retrieval is creating embeddings for documents. Embeddings capture the semantic meaning of the text, allowing you to quickly and efficiently find other pieces of a text that are similar. LangChain provides integrations with over 25 different embedding providers and methods, from open-source to proprietary API, allowing you to choose the one best suited for your needs. LangChain provides a standard interface, allowing you to easily swap between models.

Ref:
- https://python.langchain.com/docs/modules/data_connection/text_embedding/
- https://python.langchain.com/docs/integrations/text_embedding
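
Finding "similar" pieces of text usually comes down to comparing their embedding vectors, commonly with cosine similarity. A small sketch with made-up 3-dimensional vectors follows; the vectors and variable names are purely illustrative and do not come from any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" (real models return hundreds of dimensions)
monday = [0.9, 0.1, 0.0]
tuesday = [0.8, 0.2, 0.1]
pizza = [0.0, 0.2, 0.9]

print(cosine_similarity(monday, tuesday))
print(cosine_similarity(monday, pizza))
```

Texts with related meaning should score closer to 1 than unrelated ones, which is what similarity search relies on.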

#### Google Generative AI

https://python.langchain.com/docs/integrations/text_embedding/google_generative_ai

In [None]:
%pip install --upgrade --quiet  langchain-google-genai

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=GOOGLE_API_KEY)
vector = embeddings.embed_query("hello, world!")
vector[:5]

[0.05636945, 0.0048285457, -0.0762591, -0.023642512, 0.05329321]

In [None]:
# batch embedding
vectors = embeddings.embed_documents(
    [
        "Today is Monday",
        "Today is Tuesday",
        "Today is April Fools day",
    ]
)
len(vectors), len(vectors[0])

(3, 768)

`GoogleGenerativeAIEmbeddings` optionally supports a `task_type`, which currently must be one of:

- task_type_unspecified
- retrieval_query
- retrieval_document
- semantic_similarity
- classification
- clustering

By default, we use `retrieval_document` in the `embed_documents` method and `retrieval_query` in the `embed_query` method. If you provide a task type, we will use that for all methods.

In [None]:
# Different Task
query_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_query", google_api_key=GOOGLE_API_KEY
)
doc_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_document", google_api_key=GOOGLE_API_KEY
)

In [None]:
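# Placeholder inputs (these were never defined in the notebook; any strings work)
query, query_2 = "What is BioMistral?", "Which benchmarks was it evaluated on?"
answer_1 = "BioMistral is an open-source LLM tailored for the biomedical domain."

query_vecs = [query_embeddings.embed_query(q) for q in [query, query_2, answer_1]]
doc_vecs = [doc_embeddings.embed_query(q) for q in [query, query_2, answer_1]]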

#### Huggingface Hub

https://python.langchain.com/docs/integrations/text_embedding/huggingfacehub

In [None]:
!pip install sentence_transformers

from google.colab import output
output.clear()

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

In [None]:
embeddings = HuggingFaceEmbeddings()
text = "This is a test document."
print(text)
query_result = embeddings.embed_query(text)
print(type(query_result))
print("Embedding length:", len(query_result))
query_result[:3]

This is a test document.
<class 'list'>
Embedding length: 768


[-0.048951808363199234, -0.039862070232629776, -0.021562790498137474]

In [None]:
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=os.environ["HUGGINGFACEHUB_API_TOKEN"], model_name="sentence-transformers/all-MiniLM-l6-v2"
)

print(text)
query_result = embeddings.embed_query(text)
print(type(query_result))
print("Embedding length:", len(query_result))
query_result[:3]

This is a test document.
<class 'list'>
Embedding length: 384


[-0.03833853453397751, 0.12346471101045609, -0.028642931953072548]

In [None]:
doc_result = embeddings.embed_documents([text])
type(doc_result)

list

In [None]:
!pip install huggingface_hub

from google.colab import output
output.clear()

In [None]:
from langchain_community.embeddings import HuggingFaceHubEmbeddings

In [None]:
embeddings = HuggingFaceHubEmbeddings()
text = "This is a test document."
query_result = embeddings.embed_query(text)
query_result[:3]

[-0.048951830714941025, -0.03986202925443649, -0.021562786772847176]

In [None]:
batch_embeddings = embeddings.embed_documents(
    [
        "Hi there!",
        "Oh, hello!",
        "What's your name?",
        "My friends call me World",
        "Hello World!"
    ]
)
len(batch_embeddings), len(batch_embeddings[0])

(5, 768)

#### CacheBackedEmbedding

In [None]:
%pip install --upgrade --quiet  langchain-openai faiss-cpu

In [None]:
from langchain.embeddings import CacheBackedEmbeddings

In [None]:
from langchain.storage import LocalFileStore
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
# from langchain_openai import OpenAIEmbeddings

# underlying_embeddings = OpenAIEmbeddings()

underlying_embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=HUGGINGFACE_API_READ, model_name="sentence-transformers/all-MiniLM-l6-v2"
)

store = LocalFileStore("./cache/")

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings, store, namespace="sentence-transformers/all-MiniLM-l6-v2"
)

In [None]:
list(store.yield_keys())

[]

In [None]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("https://arxiv.org/pdf/2402.10373.pdf") # 17 pages
pages = loader.load_and_split()

In [None]:
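len(pages)  # more chunks than PDF pages: load_and_split() also applies a default text splitter, so long pages are split further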

32

In [None]:
# raw_documents = TextLoader("../../state_of_the_union.txt").load()
raw_documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

In [None]:
%%time
db = FAISS.from_documents(documents, cached_embedder)

CPU times: user 215 ms, sys: 0 ns, total: 215 ms
Wall time: 1.62 s


In [None]:
%%time
db2 = FAISS.from_documents(documents, cached_embedder)

CPU times: user 13.7 ms, sys: 0 ns, total: 13.7 ms
Wall time: 71.2 ms


In [None]:
# some embeddings that got created
list(store.yield_keys())[:5]

['sentence-transformers/all-MiniLM-l6-v244d996f6-3ca5-5c9c-8a4f-46746c643af9',
 'sentence-transformers/all-MiniLM-l6-v296978b68-265e-5385-9015-2299de01bdf5',
 'sentence-transformers/all-MiniLM-l6-v2be3cb656-f6b1-511b-a280-f671fef1768a',
 'sentence-transformers/all-MiniLM-l6-v24a7e93dd-a801-5111-81e4-65f090cbeee8',
 'sentence-transformers/all-MiniLM-l6-v283d1b6db-ff94-5358-816c-994a2af29348']
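
Judging by the keys listed above, each cached vector is stored under the namespace followed by an identifier derived from the text, which is why the second `FAISS.from_documents` call is so much faster. A rough sketch of that idea with a plain `dict` as the store follows. `CachedEmbedder`, `fake_embed`, and the SHA-1 key scheme are assumptions for illustration, not the internals of `CacheBackedEmbeddings`.

```python
import hashlib


class CachedEmbedder:
    """Wrap an embedding function and reuse stored vectors for repeated texts."""

    def __init__(self, embed_fn, store, namespace=""):
        self.embed_fn = embed_fn
        self.store = store  # any dict-like key-value store
        self.namespace = namespace
        self.calls = 0  # number of texts sent to the underlying model

    def _key(self, text):
        # Assumption: a SHA-1 digest of the text; the real key scheme may differ
        return self.namespace + hashlib.sha1(text.encode("utf-8")).hexdigest()

    def embed_documents(self, texts):
        missing = [t for t in texts if self._key(t) not in self.store]
        # dict.fromkeys drops duplicate texts while keeping order
        for text in dict.fromkeys(missing):
            self.store[self._key(text)] = self.embed_fn(text)
            self.calls += 1
        return [self.store[self._key(t)] for t in texts]


def fake_embed(text):
    # Hypothetical stand-in for a real embedding model
    return [float(len(text)), float(text.count(" "))]


store = {}
embedder = CachedEmbedder(fake_embed, store, namespace="fake-model")
first = embedder.embed_documents(["Hi there!", "Hello World!"])
second = embedder.embed_documents(["Hi there!", "Hello World!"])
print(embedder.calls, len(store))
```

Using the model name as the namespace keeps vectors from different embedding models from colliding in the same store.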

In [None]:
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryByteStore

store = InMemoryByteStore()

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings, store, namespace="sentence-transformers/all-MiniLM-l6-v2"
)

In [None]:
%%time
db3 = FAISS.from_documents(documents, cached_embedder)

CPU times: user 151 ms, sys: 0 ns, total: 151 ms
Wall time: 541 ms


In [None]:
%%time
db4 = FAISS.from_documents(documents, cached_embedder)

CPU times: user 7.72 ms, sys: 0 ns, total: 7.72 ms
Wall time: 13.6 ms


### Vector stores

With the rise of embeddings, there has emerged a need for databases to support efficient storage and searching of these embeddings. LangChain provides integrations with over 50 different vectorstores, from open-source local ones to cloud-hosted proprietary ones, allowing you to choose the one best suited for your needs. LangChain exposes a standard interface, allowing you to easily swap between vector stores.
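
Conceptually, a vector store keeps (embedding, text) pairs and returns the texts whose embeddings lie closest to the query's. The brute-force sketch below shows that contract. `TinyVectorStore` and the word-count `bag_of_words` "embedding" are hypothetical stand-ins, and real stores such as FAISS use far more efficient indexes.

```python
import math


class TinyVectorStore:
    """Keep (vector, text) pairs in a list and search them by brute force."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.items = []

    def add_texts(self, texts):
        for text in texts:
            self.items.append((self.embed_fn(text), text))

    def similarity_search(self, query, k=4):
        q = self.embed_fn(query)
        # Rank by Euclidean (L2) distance; smaller means more similar
        ranked = sorted(self.items, key=lambda item: math.dist(q, item[0]))
        return [text for _, text in ranked[:k]]


VOCAB = ["model", "medical", "pizza", "cheese"]


def bag_of_words(text):
    # Hypothetical stand-in for an embedding model: counts of a few known words
    words = text.lower().replace("?", "").replace(".", "").split()
    return [float(words.count(w)) for w in VOCAB]


store = TinyVectorStore(bag_of_words)
store.add_texts(["A medical language model.", "Pizza with extra cheese.", "A model of a pizza."])
print(store.similarity_search("Which medical model?", k=1))
```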

In [None]:
from langchain_community.document_loaders import PyPDFLoader

# load data
loader = PyPDFLoader("https://arxiv.org/pdf/2402.10373.pdf") # 17 pages
pages = loader.load_and_split()
raw_documents = loader.load()

# preprocess data
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

# load embedding
# embeddings = GoogleGenerativeAIEmbeddings(
#     model="models/embedding-001", google_api_key=os.environ["GOOGLE_API_KEY"]
# )
# embeddings = HuggingFaceEmbeddings()  # defaults to sentence-transformers/all-mpnet-base-v2
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=HUGGINGFACE_API_READ, model_name="sentence-transformers/all-MiniLM-l6-v2"
)
# embeddings = HuggingFaceHubEmbeddings()  # same default model, served through the Hub

# initiate database
db = FAISS.from_documents(documents, embeddings)

In [None]:
query = "What is BioMistral?"
docs = db.similarity_search(query)
print(docs[0].page_content)

MMLU
Clinical KG Medical Genetics Anatomy Pro Medicine College Biology College Medicine MedQA MedQA 5 opts PubMedQA MedMCQA Avg.
BioMistral 7B 59.9 ±1.2 64.0 ±1.6 56.5 ±1.8 60.4 ±0.5 59.0 ±1.5 54.7 ±1.0 50.6 ±0.3 42.8 ±0.3 77.5 ±0.1 48.1 ±0.2 57.3
Mistral 7B Instruct 62.9 ±0.2 57.0 ±0.8 55.6 ±1.0 59.4 ±0.6 62.5 ±1.0 57.2 ±2.1 42.0 ±0.2 40.9 ±0.4 75.7 ±0.4 46.1 ±0.1 55.9
BioMistral 7B Ensemble 62.8 ±0.5 62.7 ±0.5 57.5 ±0.3 63.5 ±0.8 64.3 ±1.6 55.7 ±1.5 50.6 ±0.3 43.6 ±0.5 77.5 ±0.2 48.8 ±0.0 58.7
BioMistral 7B DARE 62.3 ±1.3 67.0 ±1.6 55.8 ±0.9 61.4 ±0.3 66.9 ±2.3 58.0 ±0.5 51.1 ±0.3 45.2 ±0.3 77.7 ±0.1 48.7 ±0.1 59.4
BioMistral 7B TIES 60.1 ±0.9 65.0 ±2.4 58.5 ±1.0 60.5 ±1.1 60.4 ±1.5 56.5 ±1.9 49.5 ±0.1 43.2 ±0.1 77.5 ±0.2 48.1 ±0.1 57.9
BioMistral 7B SLERP 62.5 ±0.6 64.7 ±1.7 55.8 ±0.3 62.7 ±0.3 64.8 ±0.9 56.3 ±1.0 50.8 ±0.6 44.3 ±0.4 77.8 ±0.0 48.6 ±0.1 58.8
MedAlpaca 7B 53.1 ±0.9 58.0 ±2.2 54.1 ±1.6 58.8 ±0.3 58.1 ±1.3 48.6 ±0.5 40.1 ±0.4 33.7 ±0.7 73.6 ±0.3 37.0 ±0.3 51.5
PMC-LLaM

In [None]:
embedding_vector = embeddings.embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)

MMLU
Clinical KG Medical Genetics Anatomy Pro Medicine College Biology College Medicine MedQA MedQA 5 opts PubMedQA MedMCQA Avg.
BioMistral 7B 59.9 ±1.2 64.0 ±1.6 56.5 ±1.8 60.4 ±0.5 59.0 ±1.5 54.7 ±1.0 50.6 ±0.3 42.8 ±0.3 77.5 ±0.1 48.1 ±0.2 57.3
Mistral 7B Instruct 62.9 ±0.2 57.0 ±0.8 55.6 ±1.0 59.4 ±0.6 62.5 ±1.0 57.2 ±2.1 42.0 ±0.2 40.9 ±0.4 75.7 ±0.4 46.1 ±0.1 55.9
BioMistral 7B Ensemble 62.8 ±0.5 62.7 ±0.5 57.5 ±0.3 63.5 ±0.8 64.3 ±1.6 55.7 ±1.5 50.6 ±0.3 43.6 ±0.5 77.5 ±0.2 48.8 ±0.0 58.7
BioMistral 7B DARE 62.3 ±1.3 67.0 ±1.6 55.8 ±0.9 61.4 ±0.3 66.9 ±2.3 58.0 ±0.5 51.1 ±0.3 45.2 ±0.3 77.7 ±0.1 48.7 ±0.1 59.4
BioMistral 7B TIES 60.1 ±0.9 65.0 ±2.4 58.5 ±1.0 60.5 ±1.1 60.4 ±1.5 56.5 ±1.9 49.5 ±0.1 43.2 ±0.1 77.5 ±0.2 48.1 ±0.1 57.9
BioMistral 7B SLERP 62.5 ±0.6 64.7 ±1.7 55.8 ±0.3 62.7 ±0.3 64.8 ±0.9 56.3 ±1.0 50.8 ±0.6 44.3 ±0.4 77.8 ±0.0 48.6 ±0.1 58.8
MedAlpaca 7B 53.1 ±0.9 58.0 ±2.2 54.1 ±1.6 58.8 ±0.3 58.1 ±1.3 48.6 ±0.5 40.1 ±0.4 33.7 ±0.7 73.6 ±0.3 37.0 ±0.3 51.5
PMC-LLaM

### Retrievers

Once the data is in the database, you still need to retrieve it. LangChain supports many different retrieval algorithms, and retrieval is one of the places where it adds the most value. It supports basic methods that are easy to get started with, namely simple semantic search, and adds a collection of algorithms on top of this to increase performance. These include:

- [`Parent Document Retriever`](https://python.langchain.com/docs/modules/data_connection/retrievers/parent_document_retriever): This allows you to create multiple embeddings per parent document, allowing you to look up smaller chunks but return larger context.
- [`Self Query Retriever`](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query): User questions often contain a reference to something that isn't just semantic but rather expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the semantic part of a query from other metadata filters present in the query.
- [`Ensemble Retriever`](https://python.langchain.com/docs/modules/data_connection/retrievers/ensemble): Sometimes you may want to retrieve documents from multiple different sources, or using multiple different algorithms. The ensemble retriever allows you to easily do this.
- And more!

A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.

Retrievers accept a string query as input and return a list of `Document`s as output.

https://python.langchain.com/docs/modules/data_connection/retrievers/
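
Because a retriever only has to map a query string to a list of documents, it does not need embeddings at all. The toy class below scores plain strings by word overlap. `KeywordRetriever` is a hypothetical illustration that borrows the `get_relevant_documents` method name used in this notebook; it is not a LangChain class.

```python
class KeywordRetriever:
    """Return stored texts that share the most words with the query."""

    def __init__(self, texts, k=2):
        self.texts = texts
        self.k = k

    def get_relevant_documents(self, query):
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(t.lower().split())), t) for t in self.texts]
        # Keep only texts with at least one shared word, best match first
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [text for _, text in hits[: self.k]]


toy_retriever = KeywordRetriever(
    [
        "BioMistral is a medical language model",
        "FAISS is a vector index",
        "Pizza is a food",
    ]
)
results = toy_retriever.get_relevant_documents("what is BioMistral")
print(results)
```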

#### Vector store-backed retriever

In [None]:
from langchain_community.document_loaders import PyPDFLoader

# load data
loader = PyPDFLoader("https://arxiv.org/pdf/2402.10373.pdf") # 17 pages
pages = loader.load_and_split()
raw_documents = loader.load()

# preprocess data
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

# load embedding
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=HUGGINGFACE_API_READ, model_name="sentence-transformers/all-MiniLM-l6-v2"
)

# initiate database
db = FAISS.from_documents(documents, embeddings)

In [None]:
retriever = db.as_retriever()

In [None]:
docs = retriever.get_relevant_documents("what is BioMistral")

In [None]:
docs[0].page_content

'MMLU\nClinical KG Medical Genetics Anatomy Pro Medicine College Biology College Medicine MedQA MedQA 5 opts PubMedQA MedMCQA Avg.\nBioMistral 7B 59.9 ±1.2 64.0 ±1.6 56.5 ±1.8 60.4 ±0.5 59.0 ±1.5 54.7 ±1.0 50.6 ±0.3 42.8 ±0.3 77.5 ±0.1 48.1 ±0.2 57.3\nMistral 7B Instruct 62.9 ±0.2 57.0 ±0.8 55.6 ±1.0 59.4 ±0.6 62.5 ±1.0 57.2 ±2.1 42.0 ±0.2 40.9 ±0.4 75.7 ±0.4 46.1 ±0.1 55.9\nBioMistral 7B Ensemble 62.8 ±0.5 62.7 ±0.5 57.5 ±0.3 63.5 ±0.8 64.3 ±1.6 55.7 ±1.5 50.6 ±0.3 43.6 ±0.5 77.5 ±0.2 48.8 ±0.0 58.7\nBioMistral 7B DARE 62.3 ±1.3 67.0 ±1.6 55.8 ±0.9 61.4 ±0.3 66.9 ±2.3 58.0 ±0.5 51.1 ±0.3 45.2 ±0.3 77.7 ±0.1 48.7 ±0.1 59.4\nBioMistral 7B TIES 60.1 ±0.9 65.0 ±2.4 58.5 ±1.0 60.5 ±1.1 60.4 ±1.5 56.5 ±1.9 49.5 ±0.1 43.2 ±0.1 77.5 ±0.2 48.1 ±0.1 57.9\nBioMistral 7B SLERP 62.5 ±0.6 64.7 ±1.7 55.8 ±0.3 62.7 ±0.3 64.8 ±0.9 56.3 ±1.0 50.8 ±0.6 44.3 ±0.4 77.8 ±0.0 48.6 ±0.1 58.8\nMedAlpaca 7B 53.1 ±0.9 58.0 ±2.2 54.1 ±1.6 58.8 ±0.3 58.1 ±1.3 48.6 ±0.5 40.1 ±0.4 33.7 ±0.7 73.6 ±0.3 37.0 ±0.3 51.5

In [None]:
# MMR
retriever = db.as_retriever(search_type="mmr")
docs = retriever.get_relevant_documents("what is BioMistral")
# Similarity score threshold retrieval
retriever = db.as_retriever(
    search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5}
)
docs = retriever.get_relevant_documents("what is BioMistral")
# Specifying top k
retriever = db.as_retriever(search_kwargs={"k": 1})
docs = retriever.get_relevant_documents("what is BioMistral")
print(len(docs))



1
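
The `mmr` search type stands for maximal marginal relevance: each pick trades relevance to the query against similarity to documents already selected. A compact sketch on made-up 2-d vectors follows. The parameter name `lambda_mult` mirrors LangChain's MMR option, but the function itself is an illustrative assumption, not the library's implementation.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of k documents balancing relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalty: similarity to the closest already-selected document
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.3]
doc_vecs = [[1.0, 0.0], [1.0, 0.05], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5))  # diversity matters
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # pure relevance
```

With `lambda_mult=1.0` the redundancy penalty vanishes and the two near-duplicates are returned together; lowering it swaps the second one for the more distinct document.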


#### MultiQueryRetriever

In [None]:
!pip install chromadb


In [None]:
# Build a sample vectorDB
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load blog post
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

# VectorDB
embedding = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)

In [None]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

question = "What are the approaches to Task Decomposition?"
llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(), llm=llm
)

In [None]:
# Set logging for the queries
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [None]:
unique_docs = retriever_from_llm.get_relevant_documents(query=question)
len(unique_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be achieved through different methods?', '2. What strategies are commonly used for Task Decomposition?', '3. In what ways can Task Decomposition be approached and implemented effectively?']


5
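
The log above shows the pattern: an LLM rewrites the question into several variants, each variant is sent to the base retriever, and the unique union of the hits is returned. A minimal sketch of that flow follows, with hypothetical stand-ins (`fake_generate_queries`, `fake_retrieve`, `CORPUS`) replacing the LLM and the vector store.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Run retrieval for several query variants and return the unique union."""
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in unique_docs:  # de-duplicate while keeping order
                unique_docs.append(doc)
    return unique_docs


def fake_generate_queries(question):
    # Hypothetical stand-in for the LLM that rewrites the question
    return [
        "how can task decomposition be achieved",
        "approaches and methods for decomposition",
        "strategies to split tasks",
    ]


CORPUS = {
    "chain of thought prompting": ["approaches", "achieved"],
    "tree of thoughts search": ["strategies"],
    "vector store basics": ["embedding"],
}


def fake_retrieve(query):
    # Hypothetical stand-in for a retriever: match on hand-written tags
    return [doc for doc, tags in CORPUS.items() if any(tag in query.lower() for tag in tags)]


found = multi_query_retrieve(
    "What are the approaches to Task Decomposition?", fake_generate_queries, fake_retrieve
)
print(found)
```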

In [None]:
from typing import List

from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
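# Assumption: this LangChain release validates against pydantic v1 models, so use its v1 shim;
# a BaseModel imported from `pydantic` (v2) raises "subclass of BaseModel expected"
from langchain_core.pydantic_v1 import BaseModel, Field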


# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: List[str] = Field(description="Lines of text")


class LineListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = text.strip().split("\n")
        return LineList(lines=lines)


output_parser = LineListOutputParser()

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from a vector
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search.
    Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)

# Chain
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)

# Other inputs
question = "What are the approaches to Task Decomposition?"

ValidationError: 1 validation error for LineListOutputParser
pydantic_object
  subclass of BaseModel expected (type=type_error.subclass; expected_class=BaseModel)

In [None]:
# Run
retriever = MultiQueryRetriever(
    retriever=vectordb.as_retriever(), llm_chain=llm_chain, parser_key="lines"
)  # "lines" is the key (attribute name) of the parsed output

# Results
unique_docs = retriever.get_relevant_documents(
    query="What does the course say about regression?"
)
len(unique_docs)

#### Contextual compression

One challenge with retrieval is that usually you don’t know the specific queries your document storage system will face when you ingest data into the system. This means that the information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing” here refers to both compressing the contents of an individual document and filtering out documents wholesale.
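
Both kinds of compression described above, trimming a document's contents and dropping documents wholesale, can be sketched without an LLM. In the toy below, `compress_documents` and the keyword check `shares_keyword` are hypothetical; the keyword match merely stands in for the relevance judgement an LLM-based compressor would make.

```python
def compress_documents(docs, query, is_relevant):
    """Keep only the relevant sentences of each document; drop empty results."""
    compressed = []
    for doc in docs:
        sentences = [s.strip() for s in doc.split(".") if s.strip()]
        kept = [s for s in sentences if is_relevant(s, query)]
        if kept:  # documents with nothing relevant are filtered out wholesale
            compressed.append(". ".join(kept) + ".")
    return compressed


def shares_keyword(sentence, query):
    # Hypothetical stand-in for the LLM's relevance judgement
    stopwords = {"what", "which", "are", "is", "the", "in", "used"}
    keywords = set(query.lower().replace("?", "").split()) - stopwords
    return any(word in sentence.lower() for word in keywords)


retrieved = [
    "The Aorta dataset was subsampled. Figures use UMAP. The license is CC-BY.",
    "This preprint was not certified by peer review.",
]
compressed = compress_documents(retrieved, "Which dataset is used?", shares_keyword)
print(compressed)
```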

In [None]:
# Helper function for printing docs


def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

In [None]:
from langchain_community.document_loaders import TextLoader, PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# documents = TextLoader("../../state_of_the_union.txt").load()
documents = PyPDFLoader("https://www.biorxiv.org/content/10.1101/2023.10.16.562533v1.full.pdf").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
retriever = FAISS.from_documents(texts, embeddings).as_retriever()

docs = retriever.get_relevant_documents(
    "What are the dataset used from the paper?"
)
pretty_print_docs(docs)

Document 1:

A Simple But Hard-to-Beat Foundation Model for Genes and Cells Built From ChatGPT
(c)(a)(b)
(d)(e)(f)
Figure 4: (a)UMAP visualization of the subsampled Aorta dataset, colored by disease phenotype (three different
disease phenotypes: ascending only, ascending with descending thoracic aortic aneurysm, and ascending with root
aneurysm; one control phenotype comprising patients with healthy hearts after transplant) provided in the original
study [ 33].(b)Same as (a), but colored by cell types annotated by the original study [ 33].(c)Same as (a), but colored
by patient id. (d)UMAP visualization of GenePT-s embeddings of the same set of cells as (a), colored by disease
phenotype. (e)Same as (d), but colored by cell types. (f)Same as (d), but colored by patient id.
15. CC-BY-NC-ND 4.0 International license available under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyrigh

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import OpenAI

llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
    "What are the dataset used from the paper?"
)
pretty_print_docs(compressed_docs)



Document 1:

- subsampled Aorta dataset
- disease phenotype (three different disease phenotypes: ascending only, ascending with descending thoracic aortic aneurysm, and ascending with root aneurysm; one control phenotype comprising patients with healthy hearts after transplant)
- cell types annotated by the original study
- patient id
----------------------------------------------------------------------------------------------------
Document 2:

- We assessed whether GenePT-s embeddings were impacted by common batch effect such as patient variability on two datasets used in Theodoris et al. [1]: the cardiomyocyte dataset originally published by Chaffin et al. [34], and the Aorta dataset originally published in Li et al. [33].
- We further divided the genes into a 70%/30% train/test split and evaluated the prediction accuracy of using an ℓ2regularized logistic regression on the 15 classes.
- We further assessed the efficacy of GenePT embeddings in predicting gene-gene interactions (GGI

##### LLMChainFilter

The `LLMChainFilter` is a slightly simpler but more robust compressor that uses an LLM chain to decide which of the initially retrieved documents to filter out and which ones to return, without manipulating the document contents.

In [None]:
from langchain.retrievers.document_compressors import LLMChainFilter

_filter = LLMChainFilter.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
    "What are the dataset used from the paper?"
)
pretty_print_docs(compressed_docs)



Document 1:

A Simple But Hard-to-Beat Foundation Model for Genes and Cells Built From ChatGPT
(c)(a)(b)
(d)(e)(f)
Figure 4: (a)UMAP visualization of the subsampled Aorta dataset, colored by disease phenotype (three different
disease phenotypes: ascending only, ascending with descending thoracic aortic aneurysm, and ascending with root
aneurysm; one control phenotype comprising patients with healthy hearts after transplant) provided in the original
study [ 33].(b)Same as (a), but colored by cell types annotated by the original study [ 33].(c)Same as (a), but colored
by patient id. (d)UMAP visualization of GenePT-s embeddings of the same set of cells as (a), colored by disease
phenotype. (e)Same as (d), but colored by cell types. (f)Same as (d), but colored by patient id.
15. CC-BY-NC-ND 4.0 International license available under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyrigh

##### EmbeddingsFilter

Making an extra LLM call over each retrieved document is expensive and slow. The EmbeddingsFilter provides a cheaper and faster option by embedding the documents and query and only returning those documents which have sufficiently similar embeddings to the query.

In [None]:
from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain_openai import OpenAIEmbeddings

embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=embeddings_filter, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
    "What are the dataset used from the paper?"
)
pretty_print_docs(compressed_docs)

Document 1:

A Simple But Hard-to-Beat Foundation Model for Genes and Cells Built From ChatGPT
(c)(a)(b)
(d)(e)(f)
Figure 4: (a)UMAP visualization of the subsampled Aorta dataset, colored by disease phenotype (three different
disease phenotypes: ascending only, ascending with descending thoracic aortic aneurysm, and ascending with root
aneurysm; one control phenotype comprising patients with healthy hearts after transplant) provided in the original
study [ 33].(b)Same as (a), but colored by cell types annotated by the original study [ 33].(c)Same as (a), but colored
by patient id. (d)UMAP visualization of GenePT-s embeddings of the same set of cells as (a), colored by disease
phenotype. (e)Same as (d), but colored by cell types. (f)Same as (d), but colored by patient id.
15. CC-BY-NC-ND 4.0 International license available under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyrigh
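
The thresholding idea behind the `EmbeddingsFilter` can be sketched in plain Python. This is only an illustration, assuming cosine similarity as the measure; the three-dimensional vectors and the helper names `cosine_similarity` and `filter_by_similarity` are invented for the example and are not the library's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(query_vec, doc_vecs, threshold):
    """Return indices of documents whose similarity to the query meets the threshold."""
    return [
        i
        for i, vec in enumerate(doc_vecs)
        if cosine_similarity(query_vec, vec) >= threshold
    ]


# Hypothetical embeddings, made up for illustration only.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.7, 0.7, 0.0],  # 45 degrees away, similarity about 0.71
]

kept = filter_by_similarity(query_vec, doc_vecs, threshold=0.76)
print(kept)
```

With the same 0.76 threshold as the cell above, only the first vector survives; no LLM call is involved, which is where the cost saving comes from.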

##### Stringing compressors and document transformers together

Using the `DocumentCompressorPipeline` we can also easily combine multiple compressors in sequence. Along with compressors we can add `BaseDocumentTransformers` to our pipeline, which don’t perform any contextual compression but simply perform some transformation on a set of documents. For example `TextSplitter`s can be used as document transformers to split documents into smaller pieces, and the `EmbeddingsRedundantFilter` can be used to filter out redundant documents based on embedding similarity between documents.

Below we create a compressor pipeline by first splitting our docs into smaller chunks, then removing redundant documents, and then filtering based on relevance to the query.

In [None]:
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain_community.document_transformers import EmbeddingsRedundantFilter
from langchain_text_splitters import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0, separator=". ")
redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
pipeline_compressor = DocumentCompressorPipeline(
    transformers=[splitter, redundant_filter, relevant_filter]
)

In [None]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline_compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
    # The docs example query, "What did the president say about Ketanji Jackson Brown", is irrelevant to this paper
    # and returns no documents (presumably because nothing in the paper passes the relevance filter).
    "What are the dataset used from the paper?"
)
pretty_print_docs(compressed_docs)



Document 1:

We assessed whether GenePT-s embeddings were impacted by common batch effect such as patient
variability on two datasets used in Theodoris et al. [1]: the cardiomyocyte dataset originally published
by Chaffin et al. [34], and the Aorta dataset originally published in Li et al
----------------------------------------------------------------------------------------------------
Document 2:

We demonstrate the use of GenePT on a random 20% sample of the original Aorta dataset
----------------------------------------------------------------------------------------------------
Document 3:

We
compared the ROC-AUC for three methods on the test GGI dataset provided in Du et al
----------------------------------------------------------------------------------------------------
Document 4:

Notably, the original data exhibited significant patient batch
effects (see Figure 3(b) in the Appendix)
------------------------------------------------------------------------------------------

#### Ensemble Retriever

The `EnsembleRetriever` takes a list of retrievers as input, ensembles the results of their `get_relevant_documents()` methods, and reranks the results using the Reciprocal Rank Fusion algorithm.

By leveraging the strengths of different algorithms, the `EnsembleRetriever` can achieve better performance than any single algorithm.

The most common pattern is to combine a sparse retriever (like BM25) with a dense retriever (like embedding similarity), because their strengths are complementary. It is also known as “hybrid search”. The sparse retriever is good at finding relevant documents based on keywords, while the dense retriever is good at finding relevant documents based on semantic similarity.
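
To make the reranking step concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion in plain Python. The function name, the document ids, and the constant `c=60` (a commonly used value) are assumptions for illustration; this is not the `EnsembleRetriever` source code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, c=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives weight / (c + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a sparse (keyword) and a dense (embedding) retriever.
sparse_results = ["doc_a", "doc_b", "doc_c"]
dense_results = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([sparse_results, dense_results])
print(fused)
```

Documents that both retrievers rank highly (`doc_b`, `doc_a`) rise to the top, while documents found by only one retriever still appear further down.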

#### Long-Context Reorder

No matter the architecture of your model, there is a substantial performance degradation when you include 10+ retrieved documents. In brief: When models must access relevant information in the middle of long contexts, they tend to ignore the provided documents. See: https://arxiv.org/abs/2307.03172

To mitigate this, you can re-order documents after retrieval so that the most relevant ones sit at the beginning and end of the context, and the least relevant ones in the middle.
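
A minimal sketch of such a re-ordering, assuming the input list is sorted from most to least relevant. The function name is hypothetical and the exact placement may differ from LangChain's own `LongContextReorder`; the point is only that the least relevant items end up in the middle.

```python
def reorder_for_long_context(docs):
    """Place the most relevant docs at both ends and the least relevant in the middle.

    `docs` is assumed to be sorted from most to least relevant.
    """
    front, back = [], []
    for i, doc in enumerate(docs):
        # Alternate between the front half and the back half.
        if i % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # The back half is reversed so relevance rises again toward the end.
    return front + back[::-1]


ranked = [1, 2, 3, 4, 5]  # 1 = most relevant, 5 = least relevant
reordered = reorder_for_long_context(ranked)
print(reordered)
```

Here the two most relevant items (1 and 2) land at the first and last positions, and the least relevant item (5) sits in the centre.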

#### MultiVector Retriever

It can often be beneficial to store multiple vectors per document. LangChain has a base `MultiVectorRetriever` which makes querying this type of setup easy. Most of the complexity lies in how to create the multiple vectors per document.

The methods to create multiple vectors per document include:

- Smaller chunks: split a document into smaller chunks, and embed those (this is ParentDocumentRetriever).
- Summary: create a summary for each document, embed that along with (or instead of) the document.
- Hypothetical questions: create hypothetical questions that each document would be appropriate to answer, embed those along with (or instead of) the document.

Note that this also enables another method of adding embeddings: manually. This is useful because you can explicitly add questions or queries that should lead to a document being retrieved, giving you more control.

#### Parent Document Retriever

When splitting documents for retrieval, there are often conflicting desires:

1. You may want to have small documents, so that their embeddings can most accurately reflect their meaning. If too long, then the embeddings can lose meaning.
2. You want to have long enough documents that the context of each chunk is retained.

The `ParentDocumentRetriever` strikes that balance by splitting and storing small chunks of data. During retrieval, it first fetches the small chunks but then looks up the parent ids for those chunks and returns those larger documents.

Note that “parent document” refers to the document that a small chunk originated from. This can either be the whole raw document OR a larger chunk.
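
The child-to-parent lookup can be sketched without any vector store. In this illustration word overlap stands in for embedding similarity, and the parent texts, ids, and helper names are all made up; it is a toy model of the idea, not the `ParentDocumentRetriever` implementation.

```python
def split_into_chunks(text, chunk_size):
    """Split text into chunks of `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


# Hypothetical parent documents keyed by id.
parents = {
    "p1": "GenePT uses text embeddings of gene summaries to represent genes and cells",
    "p2": "The Aorta dataset was subsampled before the batch effect analysis was run",
}

# Index small chunks, each remembering the id of the parent it came from.
child_index = []
for parent_id, text in parents.items():
    for chunk in split_into_chunks(text, chunk_size=4):
        child_index.append((chunk, parent_id))


def word_overlap(query, chunk):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def retrieve_parents(query, k=2):
    """Rank the small chunks, then return the distinct parents of the top k."""
    ranked = sorted(child_index, key=lambda item: word_overlap(query, item[0]), reverse=True)
    parent_ids = []
    for _chunk, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]


result = retrieve_parents("aorta dataset", k=1)
print(result)
```

The match is made on a four-word chunk, but the caller receives the full parent text, which is the balance described above.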

#### Self-querying

Head to Integrations for documentation on vector stores with built-in support for self-querying.

A self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of stored documents and to execute those filters.
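
The two halves of a structured query (a semantic query string plus metadata filters) can be illustrated in plain Python. In this sketch the structured query is written by hand, standing in for what the query-constructing LLM chain would produce; the documents, the filter format, and the word-overlap ranking are all assumptions for illustration.

```python
import operator

# Hypothetical documents with metadata.
docs = [
    {"text": "A space crew explores a wormhole", "metadata": {"year": 2014, "genre": "sci-fi"}},
    {"text": "Toys come alive when humans leave", "metadata": {"year": 1995, "genre": "animated"}},
    {"text": "A hacker discovers reality is simulated", "metadata": {"year": 1999, "genre": "sci-fi"}},
]

# Hand-written stand-in for the LLM output for a request such as
# "sci-fi films released before 2005 about simulated reality".
structured_query = {
    "query": "simulated reality",
    "filter": [("genre", "eq", "sci-fi"), ("year", "lt", 2005)],
}

OPS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}


def matches(metadata, conditions):
    """True if the metadata satisfies every (field, op, value) condition."""
    return all(
        field in metadata and OPS[op](metadata[field], value)
        for field, op, value in conditions
    )


def run_structured_query(sq, documents):
    """Apply the metadata filter first, then rank the survivors by word overlap."""
    candidates = [d for d in documents if matches(d["metadata"], sq["filter"])]
    query_words = set(sq["query"].lower().split())
    return sorted(
        candidates,
        key=lambda d: len(query_words & set(d["text"].lower().split())),
        reverse=True,
    )


results = run_structured_query(structured_query, docs)
print([d["text"] for d in results])
```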

<img src="https://python.langchain.com/assets/images/self_querying-26ac0fc8692e85bc3cd9b8640509404f.jpg" width="75%" height="75%">

#### Time-weighted vector store retriever

This retriever uses a combination of semantic similarity and a time decay.

The algorithm for scoring them is:
```
semantic_similarity + (1.0 - decay_rate) ^ hours_passed
```
Notably, `hours_passed` refers to the hours passed since the object in the retriever **was last accessed**, not since it was created. This means that frequently accessed objects remain “fresh”.
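
The scoring formula above translates directly into a small function. The similarity values and decay rate below are invented to show how a recently accessed item can outrank a semantically closer but long-untouched one.

```python
def time_weighted_score(semantic_similarity, hours_passed, decay_rate):
    """Score = semantic_similarity + (1.0 - decay_rate) ** hours_passed."""
    return semantic_similarity + (1.0 - decay_rate) ** hours_passed


# Hypothetical values: a weaker match accessed an hour ago versus
# a stronger match that has not been accessed for 1000 hours.
fresh = time_weighted_score(0.5, hours_passed=1, decay_rate=0.01)
stale = time_weighted_score(0.8, hours_passed=1000, decay_rate=0.01)

print(fresh > stale)
```

With a `decay_rate` close to 0 the recency term barely shrinks, so items stay "fresh" for a long time; with a `decay_rate` close to 1 it vanishes almost immediately and the score is dominated by semantic similarity.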

### Indexing

The LangChain `Indexing API` syncs your data from any source into a vector store, helping you:

- Avoid writing duplicated content into the vector store
- Avoid re-writing unchanged content
- Avoid re-computing embeddings over unchanged content

All of this should save you time and money, as well as improve your vector search results.
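
The deduplication idea can be sketched with content hashes. The `TinyRecordManager` class, the `index_docs` function, and the returned counters are hypothetical simplifications and not the `SQLRecordManager` or `index` API; a plain list stands in for the vector store.

```python
import hashlib


class TinyRecordManager:
    """Remembers the hashes of everything that has already been indexed."""

    def __init__(self):
        self.seen_hashes = set()

    @staticmethod
    def content_hash(text, metadata):
        # Hash both the content and the metadata so a change in either is detected.
        payload = text + "|" + repr(sorted(metadata.items()))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def index_docs(docs, record_manager, store):
    """Write only unseen documents to the store and report what happened."""
    added, skipped = 0, 0
    for text, metadata in docs:
        doc_hash = record_manager.content_hash(text, metadata)
        if doc_hash in record_manager.seen_hashes:
            skipped += 1  # unchanged content: no re-write, no re-embedding
            continue
        store.append((text, metadata))  # a real store would embed the text here
        record_manager.seen_hashes.add(doc_hash)
        added += 1
    return {"num_added": added, "num_skipped": skipped}


store = []
manager = TinyRecordManager()
batch = [("kitty", {"source": "kitty.txt"}), ("doggy", {"source": "doggy.txt"})]

first_run = index_docs(batch, manager, store)
second_run = index_docs(batch, manager, store)
print(first_run, second_run)
```

Running the same batch twice adds the documents only once; the second run skips both, so nothing is re-embedded.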

In [None]:
%pip install langchain-elasticsearch

from google.colab import output
output.clear()

Collecting langchain-elasticsearch
  Downloading langchain_elasticsearch-0.1.0-py3-none-any.whl (17 kB)
Collecting elasticsearch<9.0.0,>=8.12.0 (from langchain-elasticsearch)
  Downloading elasticsearch-8.12.1-py3-none-any.whl (432 kB)
Collecting elastic-transport<9,>=8 (from elasticsearch<9.0.0,>=8.12.0->langchain-elasticsearch)
  Downloading elastic_transport-8.12.0-py3-none-any.whl (59 kB)
Installing collected packages: elastic-transport, elasticsearch, langchain-elasticsearch
Successfully installed elastic-transport-8.12.0 elasticsearch-8.12.1 langchain-elasticsearch-0.1.0


In [None]:
from langchain.indexes import SQLRecordManager, index
from langchain_core.documents import Document
from langchain_elasticsearch import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings

In [None]:
collection_name = "test_index"

embedding = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

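vectorstore = ElasticsearchStore(
    # With the connection arguments commented out, the constructor raises the ValueError shown below.
    # Restore them once an Elasticsearch server is reachable at this URL.
    # es_url="http://localhost:9200", index_name="test_index", embedding=embedding
)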

ValueError: URL must include a 'scheme', 'host', and 'port' component (ie 'https://localhost:9200')

`Suggestion`: Use a namespace that takes into account both the vector store and the collection name in the vector store; e.g., ‘redis/my_docs’, ‘chromadb/my_docs’ or ‘postgres/my_docs’.

In [None]:
namespace = f"elasticsearch/{collection_name}"
record_manager = SQLRecordManager(
    namespace, db_url="sqlite:///record_manager_cache.sql"
)

In [None]:
record_manager.create_schema()

In [None]:
# Let’s index some test documents:
doc1 = Document(page_content="kitty", metadata={"source": "kitty.txt"})
doc2 = Document(page_content="doggy", metadata={"source": "doggy.txt"})

In [None]:
# indexing into an empty vector store
def _clear():
    """Hacky helper method to clear content. See the `full` mode section to understand why it works."""
    index([], record_manager, vectorstore, cleanup="full", source_id_key="source")

In [None]:
_clear()

NameError: name 'vectorstore' is not defined

## Agents

The core idea of agents is to use a language model to choose a sequence of actions to take. In chains, a sequence of actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.

#### Custom Agent

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0, openai_api_key=OPENAI_API_KEY)

##### Define Tools

In [None]:
from langchain.agents import tool


@tool
def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)


get_word_length.invoke("abc")

3

In [None]:
tools = [get_word_length]

##### Create the Prompt

Now let us create the prompt. Because OpenAI function calling is fine-tuned for tool usage, we hardly need any instructions on how to reason or how to format the output. We will just have two input variables: `input` and `agent_scratchpad`. `input` should be a string containing the user objective. `agent_scratchpad` should be a sequence of messages that contains the previous agent tool invocations and the corresponding tool outputs.

In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are very powerful assistant, but don't know current events",
        ),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

In [None]:
llm_with_tools = llm.bind_tools(tools)

##### Create the Agent

In [None]:
from langchain.agents.format_scratchpad.openai_tools import (
    format_to_openai_tool_messages,
)
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)

In [None]:
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [None]:
list(agent_executor.stream({"input": "How many letters in the word eudction"}))



> Entering new AgentExecutor chain...

Invoking: `get_word_length` with `{'word': 'eudction'}`


8
The word "eudction" has 8 letters.

> Finished chain.


[{'actions': [OpenAIToolAgentAction(tool='get_word_length', tool_input={'word': 'eudction'}, log="\nInvoking: `get_word_length` with `{'word': 'eudction'}`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_lftfJP88VeiZKcxBB9EAAwKv', 'function': {'arguments': '{"word":"eudction"}', 'name': 'get_word_length'}, 'type': 'function'}]})], tool_call_id='call_lftfJP88VeiZKcxBB9EAAwKv')],
  'messages': [AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_lftfJP88VeiZKcxBB9EAAwKv', 'function': {'arguments': '{"word":"eudction"}', 'name': 'get_word_length'}, 'type': 'function'}]})]},
 {'steps': [AgentStep(action=OpenAIToolAgentAction(tool='get_word_length', tool_input={'word': 'eudction'}, log="\nInvoking: `get_word_length` with `{'word': 'eudction'}`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_lftfJP88VeiZKcxBB9EAAwKv', 'function': {'arguments'

In [None]:
# compare with base LLM
llm.invoke("How many letters in the word eudction")

AIMessage(content='There are 8 letters in the word "education".')

##### Adding memory
This is great - we have an agent! However, this agent is stateless - it doesn’t remember anything about previous interactions. This means you can’t ask follow up questions easily. Let’s fix that by adding in memory.

In order to do this, we need to do two things:

1. Add a place for memory variables to go in the prompt
2. Keep track of the chat history

First, let’s add a place for memory in the prompt. We do this by adding a placeholder for messages with the key `"chat_history"`. Notice that we put this ABOVE the new user input (to follow the conversation flow).

In [None]:
from langchain.prompts import MessagesPlaceholder

MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are very powerful assistant, but bad at calculating lengths of words.",
        ),
        MessagesPlaceholder(variable_name=MEMORY_KEY),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

In [None]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

In [None]:
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(
            x["intermediate_steps"]
        ),
        "chat_history": lambda x: x["chat_history"],
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [None]:
input1 = "how many letters in the word educa?"
input2 = "what about in the word length-of-time?"
result = agent_executor.invoke({"input": input1, "chat_history": chat_history})
chat_history.extend(
    [
        HumanMessage(content=input1),
        AIMessage(content=result["output"]),
    ]
)
agent_executor.invoke({"input": "is that a real word?", "chat_history": chat_history})



> Entering new AgentExecutor chain...

Invoking: `get_word_length` with `{'word': 'educa'}`


5
The word "educa" has 5 letters.

> Finished chain.


> Entering new AgentExecutor chain...
I made a mistake in my previous response. "Educa" is not a real word. Thank you for pointing that out.

> Finished chain.


{'input': 'is that a real word?',
 'chat_history': [HumanMessage(content='how many letters in the word educa?'),
  AIMessage(content='The word "educa" has 5 letters.')],
 'output': 'I made a mistake in my previous response. "Educa" is not a real word. Thank you for pointing that out.'}

In [None]:
chat_history

[HumanMessage(content='how many letters in the word educa?'),
 AIMessage(content='The word "educa" has 5 letters.')]

#### Streaming

Streaming is an important UX consideration for LLM apps, and agents are no exception. Streaming with agents is made more complicated by the fact that it’s not just tokens of the final answer that you will want to stream, but you may also want to stream back the intermediate steps an agent takes.

#### Structured Tools

#### Running Agent as an Iterator

It can be useful to run the agent as an iterator, to add human-in-the-loop checks as needed.

To demonstrate the `AgentExecutorIterator` functionality, we will set up a problem where an Agent must:

- Retrieve three prime numbers from a Tool
- Multiply these together.

In this simple problem we can demonstrate adding some logic to verify intermediate steps by checking whether their outputs are prime.
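
The pattern of checking each intermediate step while iterating can be illustrated in plain Python. The generator below is a hypothetical stand-in for an agent run and does not use the `AgentExecutorIterator` API; the step dictionaries and function names are assumptions made for the example.

```python
def is_prime(n):
    """Trial-division primality test, sufficient for small numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def agent_steps(numbers):
    """Stand-in for an agent run: one intermediate step per retrieved number, then the product."""
    product = 1
    for n in numbers:
        yield {"intermediate_step": n}
        product *= n
    yield {"output": product}


def run_with_checks(numbers):
    """Iterate over the steps and stop early if a retrieved number is not prime."""
    for step in agent_steps(numbers):
        if "intermediate_step" in step:
            value = step["intermediate_step"]
            if not is_prime(value):
                return f"aborted: {value} is not prime"
        else:
            return step["output"]


print(run_with_checks([3, 5, 7]))
print(run_with_checks([3, 4, 7]))
```

Because control returns to the caller after every step, this is also the natural place to ask a human for confirmation before the run continues.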

#### Returning Structured Outputs

This notebook covers how to have an agent return a structured output. By default, most of the agents return a single string. It can often be useful to have an agent return something with more structure.

A good example of this is an agent tasked with doing question-answering over some sources. Let’s say we want the agent to respond not only with the answer, but also a list of the sources used. We then want our output to roughly follow the schema below:

In [None]:
from typing import List
from langchain_core.pydantic_v1 import BaseModel, Field

class Response(BaseModel):
    """Final response to the question being asked"""
    answer: str = Field(description="The final answer to respond to the user")
    sources: List[int] = Field(description="List of page chunks that contain answer to the question. Only include a page chunk if it contains relevant information")

#### Handle Parsing Errors

Occasionally the LLM cannot determine what step to take because its outputs are not correctly formatted to be handled by the output parser. In this case, by default the agent errors. But you can easily control this functionality with `handle_parsing_errors`! Let’s explore how.

#### Access Intermediate Steps

In order to get more visibility into what an agent is doing, we can also return intermediate steps. This comes in the form of an extra key in the return value, which is a list of (action, observation) tuples.

#### Cap the max number of iterations

#### Timeouts for agents

This notebook walks through how to cap an agent executor after a certain amount of time. This can be useful for safeguarding against long running agent runs.



### Tools

Tools are interfaces that an agent can use to interact with the world. They combine a few things:

1. The name of the tool
2. A description of what the tool is
3. JSON schema of what the inputs to the tool are
4. The function to call
5. Whether the result of a tool should be returned directly to the user

It is useful to have all this information because this information can be used to build action-taking systems! The name, description, and JSON schema can be used to prompt the LLM so it knows how to specify what action to take, and then the function to call is equivalent to taking that action.

The simpler the input to a tool is, the easier it is for an LLM to use it. Many agents will only work with tools that have a single string input. For a list of agent types and which ones work with more complicated inputs, please see this [documentation](https://python.langchain.com/docs/modules/agents/agent_types).

Importantly, the name, description, and JSON schema (if used) are all used in the prompt. Therefore, it is really important that they are clear and describe exactly how the tool should be used. You may need to change the default name, description, or JSON schema if the LLM is not understanding how to use the tool.

In [None]:
!pip install wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11678 sha256=ff07f991c6cf3023f45fa8e25952c2a5374178b6f5c1f9fc696c603482474e96
  Stored in directory: /root/.cache/pip/wheels/5e/b6/c5/93f3dec388ae76edc830cb42901bb0232504dfc0df02fc50de
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [None]:
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

In [None]:
api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=100)
tool = WikipediaQueryRun(api_wrapper=api_wrapper)

In [None]:
print(tool.name)
print(tool.description)
print(tool.args)
print(tool.return_direct)

wikipedia
A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.
{'query': {'title': 'Query', 'type': 'string'}}
False


In [None]:
tool.run({"query": "langchain"})

'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '

In [None]:
tool.run("langchain")

'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '

##### Customizing Default Tools

In [None]:
from langchain_core.pydantic_v1 import BaseModel, Field


class WikiInputs(BaseModel):
    """Inputs to the wikipedia tool."""

    query: str = Field(
        description="query to look up in Wikipedia, should be 3 or less words"
    )

In [None]:
tool = WikipediaQueryRun(
    name="wiki-tool",
    description="look up things in wikipedia",
    args_schema=WikiInputs,
    api_wrapper=api_wrapper,
    return_direct=True,
)

In [None]:
print(tool.name)
print(tool.description)
print(tool.args)
print(tool.return_direct)

wiki-tool
look up things in wikipedia
{'query': {'title': 'Query', 'description': 'query to look up in Wikipedia, should be 3 or less words', 'type': 'string'}}
True


In [None]:
tool.run("langchain")

'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '

#### Defining Custom Tools

When constructing your own agent, you will need to provide it with a list of Tools that it can use. Besides the actual function that is called, the Tool consists of several components:

- `name` (str), is required and must be unique within a set of tools provided to an agent
- `description` (str), is optional but recommended, as it is used by an agent to determine tool use
- `args_schema` (Pydantic BaseModel), is optional but recommended, can be used to provide more information (e.g., few-shot examples) or validation for expected parameters.

There are multiple ways to define a tool. In this guide, we will walk through how to do this for two functions:

- A made up search function that always returns the string “LangChain”
- A multiplier function that will multiply two numbers by each other

The biggest difference here is that the first function only requires one input, while the second one requires multiple. Many agents only work with functions that require single inputs, so it’s important to know how to work with those. For the most part, defining these custom tools is the same, but there are some differences.

In [None]:
# Import things that are needed generically
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool, StructuredTool, tool

In [None]:
@tool
def search(query: str) -> str:
    """Look up things online."""
    return "LangChain"

In [None]:
print(search.name)
print(search.description)
print(search.args)

search
search(query: str) -> str - Look up things online.
{'query': {'title': 'Query', 'type': 'string'}}


In [None]:
@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

In [None]:
print(multiply.name)
print(multiply.description)
print(multiply.args)

multiply
multiply(a: int, b: int) -> int - Multiply two numbers.
{'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}


You can also customize the tool name and JSON args by passing them into the tool decorator.

In [None]:
class SearchInput(BaseModel):
    query: str = Field(description="should be a search query")


@tool("search-tool", args_schema=SearchInput, return_direct=True)
def search(query: str) -> str:
    """Look up things online."""
    return "LangChain"

In [None]:
print(search.name)
print(search.description)
print(search.args)
print(search.return_direct)

search-tool
search-tool(query: str) -> str - Look up things online.
{'query': {'title': 'Query', 'description': 'should be a search query', 'type': 'string'}}
True


##### Subclass BaseTool

You can also explicitly define a custom tool by subclassing the BaseTool class. This provides maximal control over the tool definition, but is a bit more work.

In [None]:
from typing import Optional, Type

from langchain.callbacks.manager import (
    AsyncCallbackManagerForToolRun,
    CallbackManagerForToolRun,
)


class SearchInput(BaseModel):
    query: str = Field(description="should be a search query")


class CalculatorInput(BaseModel):
    a: int = Field(description="first number")
    b: int = Field(description="second number")


class CustomSearchTool(BaseTool):
    name = "custom_search"
    description = "useful for when you need to answer questions about current events"
    args_schema: Type[BaseModel] = SearchInput

    def _run(
        self, query: str, run_manager: Optional[CallbackManagerForToolRun] = None
    ) -> str:
        """Use the tool."""
        return "LangChain"

    async def _arun(
        self, query: str, run_manager: Optional[AsyncCallbackManagerForToolRun] = None
    ) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("custom_search does not support async")


class CustomCalculatorTool(BaseTool):
    name = "Calculator"
    description = "useful for when you need to answer questions about math"
    args_schema: Type[BaseModel] = CalculatorInput
    return_direct: bool = True

    def _run(
        self, a: int, b: int, run_manager: Optional[CallbackManagerForToolRun] = None
    ) -> int:
        """Use the tool."""
        return a * b

    async def _arun(
        self,
        a: int,
        b: int,
        run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("Calculator does not support async")

In [None]:
search = CustomSearchTool()
print(search.name)
print(search.description)
print(search.args)

custom_search
useful for when you need to answer questions about current events
{'query': {'title': 'Query', 'description': 'should be a search query', 'type': 'string'}}


In [None]:
multiply = CustomCalculatorTool()
print(multiply.name)
print(multiply.description)
print(multiply.args)
print(multiply.return_direct)

Calculator
useful for when you need to answer questions about math
{'a': {'title': 'A', 'description': 'first number', 'type': 'integer'}, 'b': {'title': 'B', 'description': 'second number', 'type': 'integer'}}
True


#### StructuredTool dataclass

You can also use a `StructuredTool` dataclass. This method is a mix between the previous two. It’s more convenient than inheriting from the `BaseTool` class, but provides more functionality than just using a decorator.

In [None]:
def search_function(query: str):
    return "LangChain"


search = StructuredTool.from_function(
    func=search_function,
    name="Search",
    description="useful for when you need to answer questions about current events",
    # coroutine= ... <- you can specify an async method if desired as well
)

In [None]:
print(search.name)
print(search.description)
print(search.args)

Search
Search(query: str) - useful for when you need to answer questions about current events
{'query': {'title': 'Query', 'type': 'string'}}


In [None]:
class CalculatorInput(BaseModel):
    a: int = Field(description="first number")
    b: int = Field(description="second number")


def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


calculator = StructuredTool.from_function(
    func=multiply,
    name="Calculator",
    description="multiply numbers",
    args_schema=CalculatorInput,
    return_direct=True,
    # coroutine= ... <- you can specify an async method if desired as well
)

In [None]:
print(calculator.name)
print(calculator.description)
print(calculator.args)

Calculator
Calculator(a: int, b: int) -> int - multiply numbers
{'a': {'title': 'A', 'description': 'first number', 'type': 'integer'}, 'b': {'title': 'B', 'description': 'second number', 'type': 'integer'}}


##### Handling Tools Errors

In [None]:
from langchain_core.tools import ToolException


def search_tool1(s: str):
    raise ToolException("The search tool1 is not available.")

In [None]:
search = StructuredTool.from_function(
    func=search_tool1,
    name="Search_tool1",
    description="A bad tool",
)

search.run("test")

ToolException: The search tool1 is not available.

In [None]:
search = StructuredTool.from_function(
    func=search_tool1,
    name="Search_tool1",
    description="A bad tool",
    handle_tool_error=True,
)

search.run("test")

'The search tool1 is not available.'

We can also define a custom way to handle the tool error by passing a function as `handle_tool_error`:

In [None]:
def _handle_error(error: ToolException) -> str:
    return (
        "The following errors occurred during tool execution:"
        + error.args[0]
        + "Please try another tool."
    )


search = StructuredTool.from_function(
    func=search_tool1,
    name="Search_tool1",
    description="A bad tool",
    handle_tool_error=_handle_error,
)

search.run("test")

'The following errors occurred during tool execution:The search tool1 is not available.Please try another tool.'
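The three runs above differ only in the value of `handle_tool_error`. The following is a minimal, standard-library sketch of how such a dispatch could work; it is not LangChain's code. `ToyToolError` and `run_tool` are hypothetical names, and the plain-string case is an assumption that goes beyond what the cells above demonstrate.

```python
from typing import Callable, Union


class ToyToolError(Exception):
    """Hypothetical stand-in for langchain_core.tools.ToolException."""


def run_tool(
    func: Callable[[str], str],
    tool_input: str,
    handle_tool_error: Union[bool, str, Callable[[ToyToolError], str], None] = None,
) -> str:
    """Run func and turn a ToyToolError into a string when handling is enabled."""
    try:
        return func(tool_input)
    except ToyToolError as error:
        if not handle_tool_error:
            raise  # no handling configured: propagate the exception
        if isinstance(handle_tool_error, bool):
            # True: return the exception message itself
            return error.args[0] if error.args else "Tool execution error"
        if isinstance(handle_tool_error, str):
            return handle_tool_error  # assumed case: fixed fallback message
        return handle_tool_error(error)  # custom handler builds the message


def bad_search(query: str) -> str:
    raise ToyToolError("The search tool1 is not available.")


print(run_tool(bad_search, "test", handle_tool_error=True))
print(run_tool(bad_search, "test", handle_tool_error=lambda e: "Handled: " + e.args[0]))
```

Returning a string instead of raising lets the caller, typically an agent loop, see the failure as an ordinary observation and pick another tool.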

## Chains

Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step. The primary supported way to do this is with LCEL.

LCEL is great for constructing your own chains, but it’s also nice to have chains that you can use off-the-shelf. There are two types of off-the-shelf chains that LangChain supports:

- Chains that are built with LCEL. In this case, LangChain offers a higher-level constructor method. However, all that is being done under the hood is constructing a chain with LCEL.

- [Legacy] Chains constructed by subclassing from a legacy Chain class. These chains do not use LCEL under the hood but are rather standalone classes.
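To make the idea of "a sequence of calls" concrete, here is a toy sketch of pipe-style composition in plain Python. It only imitates the `|` syntax that LCEL uses; the `Step` class and the stand-in prompt, model, and parser are hypothetical and do not call LangChain or any real model.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right: the output of self feeds into other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for a prompt template, a model, and an output parser.
prompt = Step(lambda topic: f"Tell me a joke about {topic}.")
fake_model = Step(lambda text: {"content": text.upper()})
parser = Step(lambda message: message["content"])

chain = prompt | fake_model | parser
print(chain.invoke("socks"))  # TELL ME A JOKE ABOUT SOCKS.
```

Because the composed object is itself a `Step`, a chain can be nested inside a longer chain without special handling.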

## More

### [BETA] Memory

Most LLM applications have a conversational interface. An essential component of a conversation is being able to refer to information introduced earlier in the conversation. At a bare minimum, a conversational system should be able to access some window of past messages directly. A more complex system will need to have a world model that it is constantly updating, which allows it to do things like maintain information about entities and their relationships.

We call this ability to store information about past interactions "memory". LangChain provides a lot of utilities for adding memory to a system. These utilities can be used by themselves or incorporated seamlessly into a chain.
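The simplest form mentioned above, a window over past messages, can be sketched without any library. The `WindowedHistory` class below is a hypothetical illustration, not a LangChain class; it keeps the last `k` messages and renders them as text that could be prepended to the next prompt.

```python
from collections import deque
from typing import Deque, List, Tuple


class WindowedHistory:
    """Keep only the last `k` messages of a conversation."""

    def __init__(self, k: int = 4) -> None:
        # A bounded deque silently drops the oldest message when full.
        self.messages: Deque[Tuple[str, str]] = deque(maxlen=k)

    def add(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def window(self) -> List[Tuple[str, str]]:
        return list(self.messages)

    def as_prompt(self) -> str:
        return "\n".join(f"{role}: {content}" for role, content in self.messages)


history = WindowedHistory(k=2)
history.add("human", "My name is Bob.")
history.add("ai", "Nice to meet you, Bob.")
history.add("human", "What is my name?")
print(history.as_prompt())
```

With `k=2` the first message has already fallen out of the window, which shows the trade-off: a small window is cheap but can forget exactly the fact the user asks about.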

Most memory-related functionality in LangChain is marked as beta. This is for two reasons:

1. Most functionality (with some exceptions, see below) is not production ready.

2. Most functionality (with some exceptions, see below) works with Legacy chains, not the newer LCEL syntax.

The main exception to this is the `ChatMessageHistory` functionality. This functionality is largely production ready and does integrate with LCEL.

- [LCEL Runnables](https://python.langchain.com/docs/expression_language/how_to/message_history): For an overview of how to use `ChatMessageHistory` with LCEL runnables, see these docs

- [Integrations](https://python.langchain.com/docs/integrations/memory): For an introduction to the various `ChatMessageHistory` integrations, see these docs

### Callbacks

LangChain provides a callbacks system that allows you to hook into the various stages of your LLM application. This is useful for logging, monitoring, streaming, and other tasks.

You can subscribe to these events by using the `callbacks` argument available throughout the API. This argument is a list of handler objects, each of which is expected to implement one or more of the handler methods (for example `on_llm_start` or `on_chain_end`).
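As a rough illustration of the pattern, the sketch below passes a list of handler objects to a function and calls whichever event methods each handler defines. Everything here is hypothetical: the event names `on_start` and `on_end`, the handler classes, and `run_step` are stand-ins and not part of LangChain's API.

```python
class LoggingHandler:
    """Hypothetical handler that records both start and end events."""

    def __init__(self):
        self.events = []

    def on_start(self, payload):
        self.events.append(("start", payload))

    def on_end(self, payload):
        self.events.append(("end", payload))


class StartOnlyHandler:
    """Handlers may implement only the events they care about."""

    def __init__(self):
        self.seen = []

    def on_start(self, payload):
        self.seen.append(payload)


def dispatch(callbacks, event, payload):
    """Call `event` on every handler that defines it; skip the rest."""
    for handler in callbacks:
        method = getattr(handler, event, None)
        if callable(method):
            method(payload)


def run_step(text, callbacks=()):
    """Toy 'model call' that reverses the text and emits two events."""
    dispatch(callbacks, "on_start", text)
    result = text[::-1]
    dispatch(callbacks, "on_end", result)
    return result


logger, starts = LoggingHandler(), StartOnlyHandler()
print(run_step("abc", callbacks=[logger, starts]))
print(logger.events)
```

Looking methods up by name keeps the caller unaware of which handler cares about which event, which is why logging, monitoring, and streaming can be added without changing the step itself.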

# More To Explore

- LangChain basic functionalities
- Prompt Templates
- Simple & Sequential Chain
- Conversational Capabilities
- Crawling Capabilities
- PDF QnA
- Retrieval-Augmented Generation (RAG)

# Useful References/Materials
- Learn LangChain In 1 Hour With End To End LLM Project With Deployment In Huggingface Spaces
  - https://www.youtube.com/watch?v=qMIM7dECAkc&list=PLZoTAELRMXVORE4VF7WQ_fAl0L1Gljtar&index=5
- Chat With Multiple PDF Documents With Langchain And Google Gemini Pro #genai #googlegemini
  - https://www.youtube.com/watch?v=uus5eLz6smA&list=PLZoTAELRMXVORE4VF7WQ_fAl0L1Gljtar&index=14
- Retrieval-Augmented Generation chatbot, part 1: LangChain, Hugging Face, FAISS, AWS
  - https://www.youtube.com/watch?v=7kDaMz3Xnkw
- https://python.langchain.com/docs/integrations/text_embedding
- https://www.edenai.co/post/top-free-embedding-tools-apis-and-open-source-models

- LangChain: Run Language Models Locally - Hugging Face Models
  - https://www.youtube.com/watch?v=Xxxuw4_iCzw
- How I Built a Medical RAG Chatbot Using BioMistral|Langchain | FREE Colab|ALL OPENSOURCE #ai
  - https://www.youtube.com/watch?v=E53hc-jcUeE
- End To End LLM Langchain Project using Pinecone Vector Database #genai
  - https://www.youtube.com/watch?v=erUfLIi9OFM
- Prompt Engineering And LLM's With LangChain In One Shot-Generative AI
  - https://www.youtube.com/watch?v=t2bSApmPzU4&t=15s
- Learn How To Query Pdf using Langchain Open AI in 5 min
  - https://www.youtube.com/watch?v=5Ghv-F1wF_0&t=6s