<a href="https://colab.research.google.com/github/rafeekpro/Colab/blob/main/langchain_1hr_sprint.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LangChain 1-hour Sprint

![LangChain Logo](https://github.com/rastringer/promptcraft_notebooks/blob/main/images/langchain.png?raw=1)

LangChain is a framework for developing applications infused with LLM magic. In this notebook, we will cover some of its most useful and fun features, including:

* Templates
* Memory
* Working with APIs
* Chains
* Agents
* Vector stores

Let's start by installing some packages.

In [1]:
! pip install --upgrade google-cloud-aiplatform
# LangChain
! pip install langchain langchain-experimental langchain[docarray]
! pip install pypdf
! pip install pydantic==1.10.8
# Open source vector store
! pip install chromadb==0.3.26
! pip install typing-inspect==0.8.0 typing_extensions==4.5.0
# For dense vector representations of text
! pip install sentence-transformers

Collecting google-cloud-aiplatform
  Downloading google_cloud_aiplatform-1.36.4-py2.py3-none-any.whl (3.3 MB)
Installing collected packages: google-cloud-aiplatform
Successfully installed google-cloud-aiplatform-1.36.4

Collecting langchain
  Downloading langchain-0.0.344-py3-none-any.whl (1.9 MB)
Collecting langchain-experimental
  Downloading langchain_experimental-0.0.43-py3-none-any.whl (159 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.3-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langchain-core<0.1,>=0.0.8 (from langchain)
  Downloading langchain_core-0.0.8-py3-none-any.whl (177 kB)
Collecting langsmith<0.1.0,>=0.0.63 (from langchain)
  Downloading langsmith-0.0.67-py3-none-any.w

In [1]:
# Automatically restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

In [None]:
import vertexai

vertexai.init()

In [4]:
# Colab authentication.
import sys

if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
    print('Authenticated')

Authenticated


In [14]:

PROJECT_ID = "gncujlg-agbg-iberia-gen-ai"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}
# Code examples may misbehave if the model is changed.
MODEL_NAME = "text-bison@001"

In [15]:
# Utils
import time
from typing import List

# Langchain
import langchain
from pydantic import BaseModel

print(f"LangChain version: {langchain.__version__}")

# Vertex AI
from google.cloud import aiplatform
from langchain.chat_models import ChatVertexAI
from langchain.embeddings import VertexAIEmbeddings
from langchain.llms import VertexAI
from langchain.schema import HumanMessage, SystemMessage

print(f"Vertex AI SDK version: {aiplatform.__version__}")

LangChain version: 0.0.344
Vertex AI SDK version: 1.36.4


In [16]:
vertexai.init(project=PROJECT_ID,
              location=LOCATION)

In [17]:
# We will use chat for some tasks
chat = ChatVertexAI(
    max_output_tokens=1024,
    temperature=0.2,
    top_p=0.8,
    top_k=40,
    verbose=True)

# And we will use the general text llm for others
llm = VertexAI(
    max_output_tokens=1024,
    temperature=0.2,
    top_p=0.8,
    top_k=40,
    verbose=True)

The simplest use of LangChain is to create a chat consisting of a `SystemMessage` and a `HumanMessage`. These are similar to the `context` and `user_message` that we provide to the LLM when using the Python client libraries.

In [18]:
chat([HumanMessage(content="Hello")])

AIMessage(content=' Hello. How may I help you? \n')

In [None]:
res = chat(
    [
        SystemMessage(
            content="You are an expert chef that thinks of imaginative recipes when people give you ingredients."
        ),
        HumanMessage(content="I have some kidney beans and tomatoes, what would be an easy lunch?"),
    ]
)

print(res.content)

 Here is a lunch recipe using kidney beans and tomatoes:
Ingredients:
- 1 can of kidney beans, rinsed and drained
- 1 can of diced tomatoes, undrained
- 1/2 cup of chopped onion
- 1/2 cup of chopped green bell pepper
- 1/4 cup of chopped cilantro
- 1 tablespoon of olive oil
- 1 teaspoon of chili powder
- 1/2 teaspoon of ground cumin
- 1/4 teaspoon of salt
- 1/4 teaspoon of black pepper
Instructions:
1. Heat the olive oil in a large skillet over medium heat.
2. Add the onion and green bell pepper and cook until softened, about 5 minutes.
3. Add the kidney beans, tomatoes, chili powder, cumin, salt, and black pepper.
4. Bring to a boil, then reduce heat and simmer for 15 minutes, or until the beans are heated through.
5. Stir in the cilantro and serve.


## Prompt templates

Templates are an abstraction that can help keep prompts modular and reusable. This can be especially important in large applications which may require long and varied prompts.

Templates may include few-shot examples, instructions, or context.

In [19]:
# The template_string variable sets the context for the ChatPromptTemplate

template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""

In [20]:
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(template_string)

Chat prompts are modelled as a series of messages. Notice the `messages` attribute and the `format_messages` method in the following cells.

In [21]:
# Print out the template
prompt_template.messages

[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['style', 'text'], template='Translate the text that is delimited by triple backticks into a style that is {style}. text: ```{text}```\n'))]

In [22]:
# Let's check just the prompt
prompt_template.messages[0].prompt

PromptTemplate(input_variables=['style', 'text'], template='Translate the text that is delimited by triple backticks into a style that is {style}. text: ```{text}```\n')

In [23]:
# Helpful method to keep track of a template's inputs
prompt_template.messages[0].prompt.input_variables

['style', 'text']
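
To make the placeholder bookkeeping above more concrete, here is a minimal plain-Python sketch of how input variables can be discovered in a format string. It is an illustration only, not LangChain's implementation: the helper name `input_variables` is hypothetical, and the template is a simplified version of the one used in this notebook (without the backtick delimiters).

```python
from string import Formatter
from typing import List


def input_variables(template: str) -> List[str]:
    """Return the sorted, de-duplicated placeholder names in a format string."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so we filter those out.
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)


# Simplified template (assumption: same placeholders as the notebook's template)
simple_template = "Translate the text into a style that is {style}. text: {text}\n"

variables = input_variables(simple_template)
print(variables)

# Filling the template is then an ordinary str.format call
filled = simple_template.format(style="formal English", text="cheers, pal")
print(filled)
```

Keeping the list of expected variables next to the template makes it easier to spot a missing argument before the prompt is sent to the model.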

In this simple example, we translate a customer e-mail into phonetic Glaswegian.

In [24]:
translator_style = """A translator that writes in phonetic Glaswegian.
"""

In [25]:
customer_email = """
This smashing little coffee maker is simply brilliant! \
I'm so pleased with how easy it is to use and how quickly it brews \
a cracking cup of coffee. \
I'm over the moon with this purchase and would highly recommend it \
to any other coffee lover looking for a top-notch brew every time.
"""

In [26]:
# The format_messages method sets up the task specified in the template
customer_messages = prompt_template.format_messages(
                    style=translator_style,
                    text=customer_email)

In [27]:
# Call the LLM to translate the customer message into the requested style
customer_response = chat(customer_messages)
print(customer_response.content)

 A guid wee coffee maker this is, pure belter! Ah'm ower the moon wi' it, an' A wid recommend it tae ony coffee lover lookin' fur a braw brew ilka time.


## Parsing outputs

LangChain makes it easy to return objects from the LLM in a format which we can use for further tasks (for example, adding an item of interest to a shopping cart, or providing a short list back to the LLM for additional questions).

Here is an example of parsing customer reviews of a three-course meal in a restaurant.

In [28]:
customer_review = """\
The excellent barbecue cauliflower starter left \
a lasting impression -- gorgeous presentation and flavors, really geared the tastebuds into action. \
Moving on to the main course, pretty great also. \
Delicious and flavorful chickpea and vegetable curry. They really nailed the buttery consistency, \
depth and balance of the spices. \
The dessert was a bit bland. I opted for a vegan chocolate mousse, \
hoping for a decadent and indulgent finale to my meal. \
It was very visually appealing but was missing the smooth, velvety \
texture of a great mousse.
"""

review_template = """\
For the input text, extract the following details: \
starter: How did the reviewer find the first course? \
Rate either Poor, Good, or Excellent. \
Do the same for the main course and dessert

Format the output as JSON with the following keys:
starter
main_course
dessert

text: {text}
"""



In [29]:
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(review_template)
print(prompt_template)

input_variables=['text'] messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['text'], template='For the input text, extract the following details: starter: How did the reviewer find the first course? Rate either Poor, Good, or Excellent. Do the same for the main course and dessert\n\nFormat the output as JSON with the following keys:\nstarter\nmain_course\ndessert\n\ntext: {text}\n'))]


In [30]:
messages = prompt_template.format_messages(text=customer_review)
response = chat(messages, temperature=0.1)
print(response.content)

 ```JSON
{
  "starter": "Excellent",
  "main_course": "Good",
  "dessert": "Poor"
}
```


Though it looks like JSON, our output is actually a string type.

In [31]:
type(response.content)

str

Because it is a string, we cannot look up values by key as we would with a dictionary. The following call raises an error:

In [32]:
response.content.get("main_course")

AttributeError: 'str' object has no attribute 'get'

This is where LangChain's parser comes in. Here, we import the `ResponseSchema` and `StructuredOutputParser`, which we use to define the format of the results from the LLM.

In [33]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

starter_schema = ResponseSchema(name="starter", description="Review of the starter")
main_course_schema = ResponseSchema(name="main_course", description="Review of the main course")
dessert_schema = ResponseSchema(name="dessert", description="Review of the dessert")

response_schemas = [starter_schema, main_course_schema, dessert_schema]

In [34]:
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [36]:
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"starter": string  // Review of the starter
	"main_course": string  // Review of the main course
	"dessert": string  // Review of the dessert
}
```


Now we can update our prior review template to include the format instructions.

In [37]:
review_template_2 = """\
For the input text, extract the following details: \
starter: How did the reviewer find the first course? \
Rate either Poor, Good, or Excellent. \
Do the same for the main course and dessert

starter
main_course
dessert

text: {text}

{format_instructions}
"""
prompt = ChatPromptTemplate.from_template(template=review_template_2)

messages = prompt.format_messages(text=customer_review,
                                format_instructions=format_instructions)

In [38]:
print(messages[0].content)

For the input text, extract the following details: starter: How did the reviewer find the first course? Rate either Poor, Good, or Excellent. Do the same for the main course and dessert

starter
main_course
dessert

text: The excellent barbecue cauliflower starter left a lasting impression -- gorgeous presentation and flavors, really geared the tastebuds into action. Moving on to the main course, pretty great also. Delicious and flavorful chickpea and vegetable curry. They really nailed the buttery consistency, depth and balance of the spices. The dessert was a bit bland. I opted for a vegan chocolate mousse, hoping for a decadent and indulgent finale to my meal. It was very visually appealing but was missing the smooth, velvety texture of a great mousse.


The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"starter": string  // Review of the starter
	"main_course": string  // Review of the

In [39]:
response = chat(messages)

We have now run the updated prompt on the same review. The response starts as an `AIMessage`.

In [40]:
type(response)

langchain_core.messages.ai.AIMessage

Here we parse the content of the `AIMessage` into a Python dictionary.

In [41]:
output_dict = output_parser.parse(response.content)
output_dict

{'starter': 'Excellent', 'main_course': 'Good', 'dessert': 'Poor'}

Thanks to LangChain's parser, we now have a Python dictionary which we can use for further tasks, for example by passing part of the response as input to another function or process.

In [42]:
type(output_dict)

dict

In [43]:
output_dict.get("main_course")

'Good'
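
The idea behind the parsing step can be sketched with the standard library alone. This is a rough approximation under stated assumptions, not the code `StructuredOutputParser` actually runs: the helper `parse_fenced_json` is hypothetical, and the sample reply is copied from the earlier model output in this notebook.

```python
import json
import re

# A markdown code fence, built programmatically so this example
# does not contain a literal fence inside its own code block.
FENCE = "`" * 3


def parse_fenced_json(text: str) -> dict:
    """Extract the first fenced block (optionally tagged json/JSON) and decode it."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL | re.IGNORECASE)
    # Fall back to the raw text if the model did not wrap its answer in a fence
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


# Sample reply in the same shape as the earlier chat output
raw_reply = (
    " " + FENCE + "JSON\n"
    '{\n  "starter": "Excellent",\n  "main_course": "Good",\n  "dessert": "Poor"\n}\n'
    + FENCE
)

parsed = parse_fenced_json(raw_reply)
print(type(parsed))
print(parsed.get("main_course"))
```

The format instructions matter because they make the fenced layout predictable. Without them, a model may return prose around the JSON and the decode step fails.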

## API chains

Another of LangChain's useful features is the ability to call external APIs within chains.

In this example, we use the `open-meteo.com` API to get weather reports.

In [44]:
from langchain.chains import APIChain
from langchain.chains.api import open_meteo_docs

llm = VertexAI(temperature=0)
chain = APIChain.from_llm_and_api_docs(
    llm,
    open_meteo_docs.OPEN_METEO_DOCS,
    verbose=True,
    limit_to_domains=["https://api.open-meteo.com/"],
)
chain.run(
    "How is the weather today in Edinburgh, Scotland, in Celsius?"
    )

> Entering new APIChain chain...
https://api.open-meteo.com/v1/forecast?latitude=55.9533&longitude=-3.1883&hourly=temperature_2m&current_weather=true&temperature_unit=celsius&windspeed_unit=kmh&precipitation_unit=mm&timeformat=iso8601
{"latitude":55.96,"longitude":-3.18,"generationtime_ms":0.06794929504394531,"utc_offset_seconds":0,"timezone":"GMT","timezone_abbreviation":"GMT","elevation":69.0,"current_weather_units":{"time":"iso8601","interval":"seconds","temperature":"°C","windspeed":"km/h","winddirection":"°","is_day":"","weathercode":"wmo code"},"current_weather":{"time":"2023-12-01T13:45","interval":900,"temperature":-0.2,"windspeed":3.6,"winddirection":264,"is_day":1,"weathercode":1},"hourly_units":{"time":"iso8601","temperature_2m":"°C"},"hourly":{"time":["2023-12-01T00:00","2023-12-01T01:00","2023-12-01T02:00","2023-12-01T03:00","2023-12-01T04:00","2023-12-01T05:00","2023-12-01T06:00","2023-12-01T07:00","2023-12-01T08:00","2023-12-01T09

' The current temperature in Edinburgh, Scotland is -0.2 degrees Celsius.'
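
The verbose log above shows the two steps the chain performs: the LLM writes a request URL from the API docs, and the JSON response is then turned into an answer. The sketch below reproduces those steps by hand, without any network call or LLM. The query parameters are a subset of those visible in the logged URL; the helper `build_forecast_url` is hypothetical, and the sample payload is a trimmed copy of the logged response.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude: float, longitude: float) -> str:
    """Build a forecast URL using parameter names seen in the chain's log."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current_weather": "true",
        "temperature_unit": "celsius",
    }
    return BASE_URL + "?" + urlencode(params)


# Coordinates for Edinburgh, as chosen by the LLM in the logged request
url = build_forecast_url(55.9533, -3.1883)
print(url)

# Trimmed copy of the logged response; no request is sent here
sample_response = '{"current_weather": {"temperature": -0.2, "windspeed": 3.6}}'
temperature = json.loads(sample_response)["current_weather"]["temperature"]
print(f"Current temperature: {temperature} degrees Celsius")
```

In the real chain the LLM picks the parameters itself, which is why `limit_to_domains` is useful: it restricts which hosts a generated URL may target.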

### Wikipedia

We can combine the `wikipedia` pip package and LangChain's Wikipedia API wrapper to get query results from the encyclopedia.

In [45]:
!pip install wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11679 sha256=5d72768e247fd71677b8ca85165accf765c2b71c0395541610920b8aaf49c3f0
  Stored in directory: /root/.cache/pip/wheels/5e/b6/c5/93f3dec388ae76edc830cb42901bb0232504dfc0df02fc50de
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [46]:
from langchain.tools import WikipediaQueryRun
from langchain.utilities import WikipediaAPIWrapper

wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

wikipedia.run("To which bird family does the field sparrow belong?")

'Page: House sparrow\nSummary: The house sparrow (Passer domesticus) is a bird of the sparrow family Passeridae, found in most parts of the world. It is a small bird that has a typical length of 16 cm (6.3 in) and a mass of 24–39.5 g (0.85–1.39 oz). Females and young birds are coloured pale brown and grey, and males have brighter black, white, and brown markings. One of about 25 species in the genus Passer, the house sparrow is native to most of Europe, the Mediterranean Basin, and a large part of Asia. Its intentional or accidental introductions to many regions, including parts of Australasia, Africa, and the Americas, make it the most widely distributed wild bird.\nThe house sparrow is strongly associated with human habitation, and can live in urban or rural settings. Though found in widely varied habitats and climates, it typically avoids extensive woodlands, grasslands, polar regions, and hot, dry deserts far away from human development. For sustenance, the house sparrow routinely 

### Google search

In [47]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMRequestsChain, LLMChain

template = """Between >>> and <<< are the raw search result text from google.
Extract the answer to the question '{query}' or say "not found" if the information is not contained.
Use the format
Extracted:<answer or "not found">
>>> {requests_result} <<<
Extracted:"""

PROMPT = PromptTemplate(
    input_variables=["query", "requests_result"],
    template=template,
)


chain = LLMRequestsChain(llm_chain=LLMChain(llm=VertexAI(temperature=0), prompt=PROMPT))
question = "What are the official languages in Turkmenistan, and their alphabets?"
inputs = {
    "query": question,
    "url": "https://www.google.com/search?q=" + question.replace(" ", "+"),
}
chain(inputs)

{'query': 'What are the official languages in Turkmenistan, and their alphabets?',
 'url': 'https://www.google.com/search?q=What+are+the+official+languages+in+Turkmenistan,+and+their+alphabets?',
 'output': ' The official languages in Turkmenistan are Turkmen and Russian. Turkmen is written in a modified Latin alphabet, while Russian is written in the Cyrillic alphabet.'}
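
Note that replacing spaces with `+` leaves characters such as `,` and `?` unencoded in the query string, as the `url` value above shows. One possible alternative, offered here as a suggestion rather than as the notebook's approach, is the standard library's `urllib.parse.quote_plus`:

```python
from urllib.parse import quote_plus

question = "What are the official languages in Turkmenistan, and their alphabets?"

# The approach used in the cell above: only spaces are replaced
naive_url = "https://www.google.com/search?q=" + question.replace(" ", "+")

# quote_plus also percent-encodes reserved characters such as ',' and '?'
encoded_url = "https://www.google.com/search?q=" + quote_plus(question)

print(naive_url)
print(encoded_url)
```

Many servers tolerate the unencoded form, but the encoded URL is safer when the question contains characters like `&` or `#`, which would otherwise change the meaning of the query string.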

## Memory

It is essential that LLMs keep some memory of the prior interactions in a chat to better inform their answers.

LangChain offers several approaches and features in this regard. For all details, see the [Memory](https://python.langchain.com/docs/modules/memory/) section of the documentation.

### ConversationBufferWindowMemory

Maintains a list of the interactions in the conversation but uses only the last `k` of them. This keeps a sliding window of the most recent interactions, so the buffer does not grow too large.

In [48]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory, ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=3)

memory.save_context({"input": "Hi"},
                    {"output": "How are you?"})
memory.save_context({"input": "Fine thanks"},
                    {"output": "Great"})

memory.load_memory_variables({})

{'history': 'Human: Hi\nAI: How are you?\nHuman: Fine thanks\nAI: Great'}
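
The sliding-window behaviour can be illustrated with a `collections.deque`. The class below is a hypothetical stand-in written for this explanation, not LangChain's implementation, and the conversation turns are invented for the example.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k (human, ai) exchanges."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen silently discards the oldest item when full
        self.turns = deque(maxlen=k)

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


window = WindowMemory(k=2)
window.save_context("My favourite sport is fencing.", "Here are some tips.")
window.save_context("What equipment do I need?", "A mask, a glove, a jacket.")
window.save_context("Who are the greats?", "Valentina Vezzali, among others.")

# With k=2, the first exchange has already been dropped
window_history = window.history()
print(window_history)
```

Once an exchange falls out of the window, nothing in the prompt refers to it any more, so the model cannot recall it.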

### ConversationTokenBufferMemory

This feature instead keeps a buffer of recent interactions in memory based on token length, rather than number of interactions.

In [49]:
from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "All alone, she dreams of the stars!"},
                    {"output": "As she should!"})
memory.save_context({"input": "Baking cookies today?"},
                    {"output": "Behold the cookies!"})
memory.save_context({"input": "Chatbots everywhere?"},
                    {"output": "Certainly!"})

In [50]:
memory.load_memory_variables({})

{'history': 'Human: All alone, she dreams of the stars!\nAI: As she should!\nHuman: Baking cookies today?\nAI: Behold the cookies!\nHuman: Chatbots everywhere?\nAI: Certainly!'}
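
With `max_token_limit=100`, all three exchanges fit, so nothing is pruned in the output above. The sketch below shows what happens with a tighter budget. It is a simplified illustration under two assumptions: tokens are approximated by whitespace-separated words (LangChain asks the LLM's tokenizer instead), and the class `TokenBufferMemory` is hypothetical.

```python
from typing import List


class TokenBufferMemory:
    """Drop the oldest messages until the buffer fits a token budget."""

    def __init__(self, max_token_limit: int) -> None:
        self.max_token_limit = max_token_limit
        self.messages: List[str] = []

    @staticmethod
    def count_tokens(text: str) -> int:
        # Assumption: one whitespace-separated word counts as one token
        return len(text.split())

    def save_context(self, human: str, ai: str) -> None:
        self.messages += [f"Human: {human}", f"AI: {ai}"]
        # Prune from the front while the total exceeds the budget
        while sum(map(self.count_tokens, self.messages)) > self.max_token_limit:
            self.messages.pop(0)


token_buffer = TokenBufferMemory(max_token_limit=12)
token_buffer.save_context("All alone, she dreams of the stars!", "As she should!")
token_buffer.save_context("Baking cookies today?", "Behold the cookies!")
token_buffer.save_context("Chatbots everywhere?", "Certainly!")

print("\n".join(token_buffer.messages))
```

Because pruning works message by message, the buffer can start with an AI reply whose human question has already been removed.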

### Conversation summaries

LangChain can carry forward summaries of chat messages, or flush memory after a specified number of interactions or tokens.

Let's first look at flushing by number of interactions, using `ConversationBufferWindowMemory`.

We set `verbose=True` to show the prompts and information carried forward by the LLM.

In [51]:
from langchain.memory import ConversationBufferWindowMemory

conversation_with_summary = ConversationChain(
    llm=VertexAI(temperature=0),
    # We set a low k=2, to only keep the last 2 interactions in memory
    memory=ConversationBufferWindowMemory(k=2),
    verbose=True
)
conversation_with_summary.predict(input="My favourite sport is fencing. Any tips for how I can go pro?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: My favourite sport is fencing. Any tips for how I can go pro?
AI:

> Finished chain.


' Here are some tips on how to become a professional fencer:\n\n1. Start young. The earlier you start fencing, the more time you will have to develop your skills and reach your full potential.\n2. Train regularly. Fencing is a physically demanding sport, so you need to train regularly to stay in shape and improve your skills.\n3. Get good coaching. A good coach can help you develop the proper technique and improve your overall game.\n4. Compete in tournaments. The best way to improve your skills is to compete against other fencers. Tournaments also give you the opportunity to gain experience and exposure.'

In [52]:
conversation_with_summary.predict(input="What equipment do I need?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: My favourite sport is fencing. Any tips for how I can go pro?
AI:  Here are some tips on how to become a professional fencer:

1. Start young. The earlier you start fencing, the more time you will have to develop your skills and reach your full potential.
2. Train regularly. Fencing is a physically demanding sport, so you need to train regularly to stay in shape and improve your skills.
3. Get good coaching. A good coach can help you develop the proper technique and improve your overall game.
4. Compete in tournaments. The best way to improve your skills is to compete against other fencers. Tournaments also give you the opportunity to gain e

' Here is a list of the equipment you need for fencing:\n\n* A foil, epee, or sabre (depending on the type of fencing you want to do)\n* A mask\n* A glove\n* A jacket\n* A plastron\n* A pair of breeches\n* A pair of fencing shoes\n* A carrying bag for your equipment\n\nYou can purchase fencing equipment from a variety of online and retail stores. The cost of fencing equipment can vary depending on the brand and quality of the equipment.'

In [53]:
conversation_with_summary.predict(input="Who are the greats of the sport I can emulate?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: My favourite sport is fencing. Any tips for how I can go pro?
AI:  Here are some tips on how to become a professional fencer:

1. Start young. The earlier you start fencing, the more time you will have to develop your skills and reach your full potential.
2. Train regularly. Fencing is a physically demanding sport, so you need to train regularly to stay in shape and improve your skills.
3. Get good coaching. A good coach can help you develop the proper technique and improve your overall game.
4. Compete in tournaments. The best way to improve your skills is to compete against other fencers. Tournaments also give you the opportunity to gain e

" Here are some of the greatest fencers of all time:\n\n* Men's foil:\n    * Stefano Cerioni (Italy)\n    * Alexander Romankov (Russia)\n    * Benjamin Kleibrink (Germany)\n* Women's foil:\n    * Valentina Vezzali (Italy)\n    * Yelena Belova (Russia)\n    * Nam Hyun-hee (South Korea)\n* Men's épée:\n    * Éric Srecki (France)\n    * Pavel Kolobkov (Russia)\n    * Rubén Limardo (Venezuela)\n* Women'"

In [54]:
# Since we have now passed k=2, the LLM will be unable to answer
conversation_with_summary.predict(input="What is my favourite sport?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: What equipment do I need?
AI:  Here is a list of the equipment you need for fencing:

* A foil, epee, or sabre (depending on the type of fencing you want to do)
* A mask
* A glove
* A jacket
* A plastron
* A pair of breeches
* A pair of fencing shoes
* A carrying bag for your equipment

You can purchase fencing equipment from a variety of online and retail stores. The cost of fencing equipment can vary depending on the brand and quality of the equipment.
Human: Who are the greats of the sport I can emulate?
AI:  Here are some of the greatest fencers of all time:

* Men's foil:
    * Stefano Cerioni (Italy)
    * Alexander Romankov (Russia)
 

' I cannot answer that question as I do not have access to your personal information. \n'

### ConversationSummaryBufferMemory

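`ConversationSummaryBufferMemory` keeps the most recent interactions verbatim up to a specified token length and condenses older ones into a running summary. Note that the cell below uses `ConversationTokenBufferMemory`, which drops older interactions once the limit is exceeded rather than summarizing them.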
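
To illustrate the general idea of a summary buffer, here is a minimal sketch in which messages evicted from a token-limited buffer are folded into a running summary instead of being lost. Everything in it is an assumption made for illustration: the class `SummaryBufferMemory` and the function `clip_summarizer` are hypothetical, tokens are approximated by whitespace-separated words, and the "summary" is a crude clipping where a real system would call an LLM.

```python
from typing import Callable, List


class SummaryBufferMemory:
    """Keep recent messages verbatim; fold evicted ones into a summary."""

    def __init__(
        self,
        max_token_limit: int,
        summarize: Callable[[str, List[str]], str],
    ) -> None:
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.summary = ""
        self.messages: List[str] = []

    def save_context(self, human: str, ai: str) -> None:
        self.messages += [f"Human: {human}", f"AI: {ai}"]
        evicted: List[str] = []
        # Assumption: one whitespace-separated word counts as one token
        while sum(len(m.split()) for m in self.messages) > self.max_token_limit:
            evicted.append(self.messages.pop(0))
        if evicted:
            self.summary = self.summarize(self.summary, evicted)


def clip_summarizer(previous: str, evicted: List[str]) -> str:
    """Hypothetical stand-in for an LLM call: keep the first six words of each message."""
    clipped = [" ".join(m.split()[:6]) for m in evicted]
    return " | ".join(filter(None, [previous] + clipped))


summary_memory = SummaryBufferMemory(max_token_limit=8, summarize=clip_summarizer)
summary_memory.save_context(
    "I'm learning the Rust programming language",
    "Rust emphasizes memory safety.",
)
summary_memory.save_context(
    "What's the best book to help me?",
    "The Rust Programming Language.",
)

print("Summary:", summary_memory.summary)
print("Buffer:", summary_memory.messages)
```

The trade-off compared with a plain token buffer is an extra summarization step on eviction in exchange for retaining the gist of earlier turns.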

In [55]:
from langchain.chains import ConversationChain

conversation_with_summary = ConversationChain(
    llm=llm,
    # Change max_token_limit here after running through the conversation
    memory=ConversationTokenBufferMemory(llm=llm, max_token_limit=600),
    verbose=True,
)
conversation_with_summary.predict(input="Hi, how are you?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, how are you?
AI:

> Finished chain.


' I am an AI assistant incapable of experiencing feelings in the same way humans might.  I am, however,  able to assist you with a variety of writing activities. Is there anything I can help you with today?\n'

In [56]:
conversation_with_summary.predict(input="I'm learning the Rust programming language")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, how are you?
AI:  I am an AI assistant incapable of experiencing feelings in the same way humans might.  I am, however,  able to assist you with a variety of writing activities. Is there anything I can help you with today?

H: I'm learning the Rust programming language
AI:

> Finished chain.


' Rust is an exciting and modern systems programming language that emphasizes memory safety without garbage collection. It provides memory safety without garbage collection through a borrow checker and ownership system. Rust is known for its speed and performance, making it a great choice for systems programming tasks such as operating systems, embedded systems, and high-performance applications. It also has a rich type system and a powerful macro system, making it suitable for a wide range of tasks.\n'

In [57]:
conversation_with_summary.predict(input="What's the best book to help me?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, how are you?
AI:  I am an AI assistant incapable of experiencing feelings in the same way humans might.  I am, however,  able to assist you with a variety of writing activities. Is there anything I can help you with today?

Human: I'm learning the Rust programming language
AI:  Rust is an exciting and modern systems programming language that emphasizes memory safety without garbage collection. It provides memory safety without garbage collection through a borrow checker and ownership system. Rust is known for its speed and performance, making it a great choice for systems programming tasks such as operating systems, embedded systems, and

' There are several great books that can help you learn Rust. Here are a few recommendations:\n\n* The Rust Programming Language: This is the official book for Rust, and it provides a comprehensive introduction to the language. It covers everything from the basics of the language to more advanced topics such as memory management and concurrency.\n* Rust in Action: This book is a practical guide to Rust, with a focus on hands-on learning. It covers a wide range of topics, including the basics of the language, as well as more advanced topics such as macros and closures.\n* Programming Rust: This book is a comprehensive guide to Rust,'

In [58]:
# Notice the buffer here is updated and clears the earlier exchanges
# Depending on how chatty the LLM is feeling, the token limit may have
# already been reached, and this cell will yield a generic response.
conversation_with_summary.predict(input="Wish me luck!")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, how are you?
AI:  I am an AI assistant incapable of experiencing feelings in the same way humans might.  I am, however,  able to assist you with a variety of writing activities. Is there anything I can help you with today?

Human: I'm learning the Rust programming language
AI:  Rust is an exciting and modern systems programming language that emphasizes memory safety without garbage collection. It provides memory safety without garbage collection through a borrow checker and ownership system. Rust is known for its speed and performance, making it a great choice for systems programming tasks such as operating systems, embedded systems, and

" Good luck! Rust is a great language to learn, and I'm sure you'll enjoy it. If you run into any problems, there are many resources available online to help you. The Rust community is also very helpful, and they're always willing to answer questions."

The following cell should generate a reply that is clearly restricted to the general benefits of learning Haskell and that misses the previous context of someone trying to learn Rust.

Run this cell, then go back to the cell that creates `conversation_with_summary` with `ConversationTokenBufferMemory` and change `max_token_limit` to 700. Then re-run the entire conversation and notice how the model relates its output about learning Haskell to the context of someone trying to learn Rust.

In [59]:
conversation_with_summary.predict(input="Would knowing Haskell help me?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, how are you?
AI:  I am an AI assistant incapable of experiencing feelings in the same way humans might.  I am, however,  able to assist you with a variety of writing activities. Is there anything I can help you with today?

Human: I'm learning the Rust programming language
AI:  Rust is an exciting and modern systems programming language that emphasizes memory safety without garbage collection. It provides memory safety without garbage collection through a borrow checker and ownership system. Rust is known for its speed and performance, making it a great choice for systems programming tasks such as operating systems, embedded systems, and

" Knowing Haskell can definitely help you learn Rust. Haskell is a purely functional programming language, and Rust is a systems programming language with a strong emphasis on safety. While the two languages are very different in some ways, there are also some similarities. For example, both languages use pattern matching and have a strong emphasis on types. If you're familiar with Haskell, you'll have a head start on learning Rust.\n\nHere are some specific ways that knowing Haskell can help you learn Rust:\n\n* **Pattern matching:** Both Haskell and Rust use pattern matching to extract data from values. In Haskell, pattern matching is used for things like des"

## Chains

Complex applications often require chaining LLM calls together, or combining them with other components.

We will start with the basic `LLMChain`, then cover the following types of chains:

* Sequential chains
* Router chains

### LLMChain

An `LLMChain` fills a prompt template with the input values and passes the resulting prompt to the LLM.

In [60]:
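from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate

# Adjacent string literals avoid the stray indentation that a
# backslash continuation inside the string would embed in the prompt.
prompt = ChatPromptTemplate.from_template(
    "What is the best name to describe "
    "a company that makes {product}?"
)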

In [61]:
chain = LLMChain(llm=llm, prompt=prompt)
product = "A saw for laminate wood"
chain.run(product)

' Assistant: Some potential names for a company that makes a saw for laminate wood include:\n\n- Laminate Saw Solutions\n- The Laminate Saw Company\n- Precision Laminate Saws\n- Laminate Cutting Systems\n- The Laminate Saw Experts\n- Laminate Saw Technology\n- Advanced Laminate Saws\n- Laminate Saw Innovations\n- Laminate Saw Manufacturing\n- Laminate Saw Distribution'
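Conceptually, the chain above does two things: it formats the template, then calls the model. The following is a minimal sketch of that idea in plain Python, not LangChain's implementation. `fake_llm` and `run_chain` are hypothetical names, and the "model" just echoes its prompt.

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only echoes the prompt."""
    return f"[model reply to: {prompt}]"


def run_chain(template: str, llm, **variables) -> str:
    """Fill the template's placeholders, then pass the prompt to the LLM."""
    prompt = template.format(**variables)
    return llm(prompt)


template = "What is the best name to describe a company that makes {product}?"
reply = run_chain(template, fake_llm, product="A saw for laminate wood")
print(reply)
```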

### Sequential chain

A sequential chain makes a series of calls to an LLM. It enables a pipeline-style workflow in which the output from one call becomes the input to the next.

There are two types:

* `SimpleSequentialChain`, where each step has a single input and a single output, and each output becomes the input to the next step.

* `SequentialChain`, which allows for multiple inputs and outputs.
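The single-input, single-output case amounts to function composition. Here is a small sketch of that pipeline idea in plain Python. The step functions and `run_sequence` are hypothetical stand-ins for LLM-backed chains, not LangChain APIs.

```python
from functools import reduce


def pitch_step(title: str) -> str:
    """Stand-in for an LLM call that writes a pitch."""
    return f"Pitch for {title}"


def review_step(pitch: str) -> str:
    """Stand-in for an LLM call that reviews the pitch."""
    return f"Review of: {pitch}"


def run_sequence(steps, first_input):
    """Feed each step's output into the next step, in order."""
    return reduce(lambda value, step: step(value), steps, first_input)


result = run_sequence([pitch_step, review_step], "Portable iced coffee maker")
print(result)
```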

In [62]:
from langchain.chains import SimpleSequentialChain
from langchain.prompts import PromptTemplate

In [63]:
# This is an LLMChain to write a pitch for a new product
# Let's increase the temperature to allow some imagination

llm = VertexAI(temperature=0.7)
template = """You are an entrepreneur. Think of a ground breaking new product and write a short pitch.

Title: {title}
Entrepreneur: This is a pitch for the above product:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
pitch_chain = LLMChain(llm=llm, prompt=prompt_template)

In [64]:
template = """You are a panelist on Dragon's Den. Given a \
description of the product, you are to explain why you think it will \
succeed or fail in the market.

Product pitch: {pitch}
Review by Dragon's Den panelist:"""
prompt_template = PromptTemplate(input_variables=["pitch"], template=template)
review_chain = LLMChain(llm=llm, prompt=prompt_template)

In [65]:
# This is the overall chain where we run these two chains in sequence.
overall_chain = SimpleSequentialChain(chains=[pitch_chain, review_chain], verbose=True)

In [66]:
review = overall_chain.run("Portable iced coffee maker")

> Entering new SimpleSequentialChain chain...
 The Portable Iced Coffee Maker is the perfect solution for coffee lovers on the go. This revolutionary new product allows you to make fresh, iced coffee anywhere, anytime. With its sleek design and compact size, the Portable Iced Coffee Maker is perfect for taking with you to work, school, or on your travels. It’s easy to use, simply add your favorite ground coffee and water, and the Portable Iced Coffee Maker will do the rest. In just a few minutes, you’ll have a delicious, refreshing iced coffee that’s ready to enjoy. The Portable Iced Coffee Maker is the perfect way to stay cool and refreshed on a hot
 The Portable Iced Coffee Maker is a great product that has the potential to succeed in the market. Iced coffee is a popular beverage, and this product offers a convenient and portable way to make it. The product is also relatively inexpensive, which makes it a good value for consumers.

Here are som

### Router chain

A `RouterChain` dynamically selects the next chain to use for a given input.
Here, we use the `MultiPromptChain` to select the prompt best suited to a question and then answer with it.

This supports a modular architecture, in which inputs are triaged between the relevant prompt templates.

In [67]:
from langchain.chains.router import MultiPromptChain

korean_template = """
You are an expert in Korean history and culture.
Here is a question:
{input}
"""

spanish_template = """
You are an expert in Spanish history and culture.
Here is a question:
{input}
"""

chinese_template = """
You are an expert in Chinese history and culture.
Here is a question:
{input}
"""

In [68]:
prompt_infos = [
    {
        "name": "korean",
        "description": "Good for answering questions about Korean history and culture",
        "prompt_template": korean_template,
    },
    {
        "name": "spanish",
        "description": "Good for answering questions about Spanish history and culture",
        "prompt_template": spanish_template,
    },
    {
        "name": "chinese",
        "description": "Good for answering questions about Chinese history and culture",
        "prompt_template": chinese_template,
    },
]

In [69]:
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser

llm = VertexAI(temperature=0)

destination_chains = {}
for p_info in prompt_infos:
    name = p_info["name"]
    prompt_template = p_info["prompt_template"]
    prompt = ChatPromptTemplate.from_template(template=prompt_template)
    chain = LLMChain(llm=llm, prompt=prompt)
    destination_chains[name] = chain

destinations = [f"{p['name']}: {p['description']}" for p in prompt_infos]
destinations_str = "\n".join(destinations)

In [70]:
default_prompt = ChatPromptTemplate.from_template("{input}")
default_chain = LLMChain(llm=llm, prompt=default_prompt)

In [71]:
# Thanks to Deeplearning.ai for this template and for the
# Langchain short course at deeplearning.ai/short-courses/.

MULTI_PROMPT_ROUTER_TEMPLATE = """Given a raw text input to a \
language model select the model prompt best suited for the input. \
You will be given the names of the available prompts and a \
description of what the prompt is best suited for. \
You may also revise the original input if you think that revising \
it will ultimately lead to a better response from the language model.

<< FORMATTING >>
Return a markdown code snippet with a JSON object formatted to look like:
```json
{{{{
    "destination": string \ name of the prompt to use or "DEFAULT"
    "next_inputs": string \ a potentially modified version of the original input
}}}}
```

REMEMBER: "destination" MUST be one of the candidate prompt \
names specified below OR it can be "DEFAULT" if the input is not \
well suited for any of the candidate prompts.
REMEMBER: "next_inputs" can just be the original input \
if you don't think any modifications are needed.

<< CANDIDATE PROMPTS >>
{destinations}

<< INPUT >>
{{input}}

<< OUTPUT (remember to include the ```json)>>"""

In [72]:
router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(
    destinations=destinations_str
)
router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)

router_chain = LLMRouterChain.from_llm(llm, router_prompt)
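The router template is formatted twice: once by `.format(destinations=...)` above, and again when the prompt is filled with the user's input. That is why it uses quadruple braces around the JSON example and double braces around `input`. The sketch below shows the escaping with plain `str.format`. It assumes the prompt template applies `str.format`-style substitution, and `ROUTER_SNIPPET` is a shortened, hypothetical stand-in for the real template.

```python
# Shortened stand-in for the router template (hypothetical text).
ROUTER_SNIPPET = (
    'Return {{{{"destination": string}}}} '
    "for {destinations} given {{input}}"
)

# First pass: fill in the destinations. Each doubled brace collapses once.
after_first = ROUTER_SNIPPET.format(destinations="korean, spanish")

# Second pass: fill in the user's input. The remaining doubled braces
# collapse into the literal braces of the JSON example.
after_second = after_first.format(input="What was the Han Dynasty?")

print(after_first)
print(after_second)
```

After the first pass, the JSON example still carries doubled braces and `{input}` is a live placeholder. After the second pass, the model sees single literal braces.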

In [73]:
chain = MultiPromptChain(router_chain=router_chain,
                         destination_chains=destination_chains,
                         default_chain=default_chain, verbose=True
                        )
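The router prompt asks the model to reply with a fenced JSON snippet naming a destination and the input to forward. The following sketch shows one way such a reply could be parsed and dispatched. It is an illustration in plain Python, not the `RouterOutputParser` or `MultiPromptChain` implementation. The names `parse_router_output`, `route`, and the handler functions are hypothetical.

```python
import json
import re

# Built programmatically so this example contains no literal code fence.
FENCE = "`" * 3


def parse_router_output(text: str) -> dict:
    """Extract the JSON object from a fenced json snippet in the reply."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced json snippet found")
    return json.loads(match.group(1))


def route(text: str, destinations: dict, default):
    """Send next_inputs to the named destination, or to the default."""
    parsed = parse_router_output(text)
    handler = destinations.get(parsed["destination"], default)
    return handler(parsed["next_inputs"])


# Stand-ins for the destination chains.
destinations = {
    "korean": lambda q: f"korean expert answers: {q}",
    "spanish": lambda q: f"spanish expert answers: {q}",
}


def default(q):
    return f"general model answers: {q}"


reply = (
    FENCE + 'json\n{"destination": "spanish", '
    '"next_inputs": "Summarize Don Quixote"}\n' + FENCE
)
fallback = (
    FENCE + 'json\n{"destination": "DEFAULT", '
    '"next_inputs": "How can I fix a carburetor?"}\n' + FENCE
)
print(route(reply, destinations, default))
print(route(fallback, destinations, default))
```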

Notice in the outputs that each input is prefixed with the name of the chosen destination, e.g.
`chinese: {'input': ...`, showing that it was routed to the correct expert.

In [74]:
chain.run("What was the Han Dynasty?")

> Entering new MultiPromptChain chain...

chinese: {'input': 'What was the Han Dynasty?'}
> Finished chain.


" Model: \nThe Han Dynasty (206 BC – 220 AD) was the second imperial dynasty of China, preceded by the Qin Dynasty and succeeded by the Three Kingdoms period. It was founded by Liu Bang, known posthumously as Emperor Gaozu of Han, who overthrew the Qin Dynasty and became the first Han emperor. The dynasty was named after the Han River, which flows through the Guanzhong region where the dynasty was founded.\n\nThe Han Dynasty is considered one of the greatest eras in Chinese history. It was a time of great prosperity, cultural flourishing, and territorial expansion. The dynasty's population"

In [75]:
chain.run("What are some of typical dishes in Catalonia?")

> Entering new MultiPromptChain chain...

spanish: {'input': 'What are some of typical dishes in Catalonia?'}
> Finished chain.


" Model: \nCatalonia is a region in Spain with a rich culinary tradition. Some of the most typical dishes from Catalonia include:\n\n* Pa amb tomàquet: This is a simple but delicious dish of bread rubbed with tomato and drizzled with olive oil. It is often served with jamón serrano or cheese.\n* Escudella i carn d'olla: This is a hearty soup made with pork, chicken, beef, and vegetables. It is typically served with a side of rice or pasta.\n* Calçots: These are a type of green onion that is grilled and served with a romesco sauce. They"

In [76]:
chain.run("How would I greet a friend's parents in Korean?")

> Entering new MultiPromptChain chain...

korean: {'input': "How would I greet a friend's parents in Korean?"}
> Finished chain.


" Model Assistant: \n안녕하세요 (annyeonghaseyo) is the most common way to greet someone in Korean. It can be used in both formal and informal settings. When greeting a friend's parents, you would typically use the formal form of 안녕하세요, which is 안녕하십니까 (annyeong hasimnikka). You can also say 안녕하세요, 아저씨 (annyeonghaseyo, ajeossi) to a friend's father and 안녕하세요, 이모씨 (annyeonghaseyo, imohsi) to a friend's mother.\n\n"

In [77]:
chain.run("Summarize Don Quixote in a short paragraph")

> Entering new MultiPromptChain chain...

spanish: {'input': 'Summarize Don Quixote in a short paragraph'}
> Finished chain.


' Model: \nDon Quixote is a novel by Miguel de Cervantes Saavedra. It was published in two volumes in 1605 and 1615. The novel tells the story of Alonso Quijano, a retired gentleman who becomes obsessed with chivalric romances and decides to become a knight-errant. He changes his name to Don Quixote de la Mancha and sets out on a series of adventures with his squire, Sancho Panza. The novel is a satire of the chivalric romances of the time, and it also explores the themes of love, friendship, and loyalty.\n'

In [78]:
# No specialist chain for carburetor advice; this
# will be handled as any other input by the foundational model
chain.run("How can I fix a carburetor?")

> Entering new MultiPromptChain chain...

None: {'input': 'How can I fix a carburetor?'}
> Finished chain.


' Assistant: Here are some steps on how to fix a carburetor:\n\n1. Check the fuel level in the carburetor. If the fuel level is low, add more fuel.\n2. Check the air filter. If the air filter is clogged, replace it.\n3. Check the fuel lines. If the fuel lines are cracked or damaged, replace them.\n4. Check the carburetor for leaks. If there are any leaks, seal them with a carburetor repair kit.\n5. Adjust the carburetor. The carburetor may need to be adjusted to ensure that the engine is running properly.\n6. Test the carburetor. Start the'

## Agents and vectorstores

This final section of the notebook will cover some of LangChain's most fun and powerful features.

Agents have access to tools such as JSON, Wikipedia, web search, GitHub, or pandas DataFrames, and decide which to call depending on the user input.

See [here](https://python.langchain.com/docs/integrations/toolkits/) for a full list of agent toolkits.

We will use an LLM with embeddings to retrieve data, matching customer queries to products. This is known as Retrieval-Augmented Generation, or RAG.

We will use the Wayfair [WANDS](https://www.aboutwayfair.com/careers/tech-blog/wayfair-releases-wands-the-largest-and-richest-publicly-available-dataset-for-e-commerce-product-search-relevance) dataset of more than 42,000 products. Here are the steps:

* Download the data into a pandas dataframe and keep a smaller sample (the first 2,000 rows, minus any rows with missing values)

* Merge the product names and descriptions, then generate embeddings for the combined text

* Prompt an LLM to retrieve details and relevant documents related to queries.

<img src="https://assets.wfcdn.com/im/01139917/resize-h800-w800%5Ecompr-r85/2315/231567967/Capricornus+3+Seater+Sofa.jpg" width="250"/> <img src="https://assets.wfcdn.com/im/07725066/resize-h800-w800%5Ecompr-r85/1584/158440119/Vancasso+BOMOOTIUR+Stoneware+Dinnerware+-+Set+of+18.jpg" width="250"/>


In [79]:
!wget -q https://raw.githubusercontent.com/wayfair/WANDS/main/dataset/product.csv

In [80]:
import pandas as pd
product_df = pd.read_csv("product.csv", sep='\t')

We will work with the first 2,000 rows, minus any rows with missing values, to avoid long wait times for the embedding and lookup steps.

In [81]:
product_df = product_df[:2000].dropna()

In [82]:
product_df.head(3)

|   | product_id | product_name | product_class | category hierarchy | product_description | product_features | rating_count | average_rating | review_count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | solid wood platform bed | Beds | Furniture / Bedroom Furniture / Beds & Headboa... | good , deep sleep can be quite difficult to ha... | overallwidth-sidetoside:64.7\|dsprimaryproducts... | 15.0 | 4.5 | 15.0 |
| 1 | 1 | all-clad 7 qt . slow cooker | Slow Cookers | Kitchen & Tabletop / Small Kitchen Appliances ... | create delicious slow-cooked meals , from tend... | capacityquarts:7\|producttype : slow cooker\|pro... | 100.0 | 2.0 | 98.0 |
| 2 | 2 | all-clad electrics 6.5 qt . slow cooker | Slow Cookers | Kitchen & Tabletop / Small Kitchen Appliances ... | prepare home-cooked meals on any schedule with... | features : keep warm setting\|capacityquarts:6.... | 208.0 | 3.0 | 181.0 |


In [83]:
len(product_df)

1269

In [84]:
# Reduce the df to columns of interest
product_df = product_df.filter(["product_id", "product_name", "product_description", "average_rating"], axis=1)

In [85]:
product_df.head(3)

|   | product_id | product_name | product_description | average_rating |
|---|---|---|---|---|
| 0 | 0 | solid wood platform bed | good , deep sleep can be quite difficult to ha... | 4.5 |
| 1 | 1 | all-clad 7 qt . slow cooker | create delicious slow-cooked meals , from tend... | 2.0 |
| 2 | 2 | all-clad electrics 6.5 qt . slow cooker | prepare home-cooked meals on any schedule with... | 3.0 |


### Import and initialize pandas dataframe agent

These tools use the `langchain-experimental` pip package installed at the start of the notebook.

### Pandas agent

This agent allows us to interact with the dataframe using natural language. LangChain shows us the pandas queries it is composing to answer the questions.

In [86]:
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.agents.agent_types import AgentType

agent = create_pandas_dataframe_agent(VertexAI(temperature=0), product_df, verbose=True)

In [87]:
agent.run("how many rows are there?")

> Entering new AgentExecutor chain...
 Thought: The shape of the dataframe can be used to get the number of rows
Action: python_repl_ast
Action Input: df.shape[0]
Observation: 1269
Thought: The number of rows is 1269
Final Answer: 1269

> Finished chain.


'1269'
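The trace above follows a Thought / Action / Action Input / Observation pattern that ends with a Final Answer. As a rough illustration of how such text can be turned into a structured step, here is a small parser sketch in plain Python. It is not the parser LangChain uses, and `parse_react_step` is a hypothetical name.

```python
import re


def parse_react_step(text: str):
    """Return ("final", answer) or ("action", tool, tool_input)."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    action = re.search(
        r"Action:\s*(.*?)\nAction Input:\s*(.*)", text, re.DOTALL
    )
    if action:
        return ("action", action.group(1).strip(), action.group(2).strip())
    raise ValueError("could not find an action or a final answer")


step_one = (
    "Thought: The shape of the dataframe can be used to get the number of rows\n"
    "Action: python_repl_ast\n"
    "Action Input: df.shape[0]"
)
step_two = "Thought: The number of rows is 1269\nFinal Answer: 1269"

print(parse_react_step(step_one))
print(parse_react_step(step_two))
```

An agent loop would run the named tool on the action input, append the result as an observation, and ask the model for the next step until a final answer appears.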

In [88]:
agent.run("How many products have a rating of > 4?")

> Entering new AgentExecutor chain...
 Thought: To find the products with a rating > 4, I can use the `df['average_rating'] > 4` expression.
Action: python_repl_ast
Action Input: df['average_rating'] > 4
Observation: 0        True
1       False
2       False
3        True
4        True
        ...  
1991     True
1993     True
1995     True
1996     True
1999     True
Name: average_rating, Length: 1269, dtype: bool
Thought: To get the count of products with a rating > 4, I can use the `sum()` function on the boolean series returned by `df['average_rating'] > 4`.
Action: python_repl_ast
Action Input: sum(df['average_rating'] > 4)
Observation: 1093
Thought: There are 1093 products with a rating > 4.
Final Answer: 1093

> Finished chain.


'1093'

### CSV agent

We can also work directly on a .csv file.

In [89]:
product_df.to_csv("data.csv")

In [90]:
from langchain_experimental.agents.agent_toolkits import create_csv_agent

agent = create_csv_agent(
    VertexAI(temperature=0),
    "data.csv",
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

In [91]:
agent.run("How many rows are there?")

> Entering new AgentExecutor chain...
 Thought: The shape of the dataframe can be used to get the number of rows
Action: python_repl_ast
Action Input: df.shape[0]
Observation: 1269
Thought: The number of rows is 1269
Final Answer: 1269

> Finished chain.


'1269'

In [92]:
agent.run("Do any product descriptions mention cedar wood? Output them as JSON please")

> Entering new AgentExecutor chain...
 Thought: cedar wood is mentioned in the description of product 0
Action: python_repl_ast
Action Input: df.loc[df['product_description'].str.contains('cedar wood')]['product_description'].to_json()
Observation: {"19":"this farmhouse wall clock features cedar wood construction , a classic gray stiained center ring with a lightly distressed white outer ring , hand painted roman numerals , an inner accent ring with individual hour marks , vintage graphic in the center of the face , and antique style spade hands .","450":"make your life more comfortable with your very own adirondack folding square side table . our patio furniture are known for its sophistication and charm . this multi-functional square wooden side table is the perfect addition to any front porch patios , backyards , offices , and living rooms . our square side tables are made with natural beauty and durable structure . crafted with genuine cedar 

'{"19":"this farmhouse wall clock features cedar wood construction , a classic gray stiained center ring with a lightly distressed white outer ring , hand painted roman numerals , an inner accent ring with individual hour marks , vintage graphic in the center of the face , and antique style spade hands .","450":"make your life more comfortable with your very own adirondack folding square side table . our patio furniture are known for its sophistication and charm . this multi-functional square wooden side table is the perfect addition to any front porch patios , backyards , offices , and living rooms . our square side tables are made with natural'

In [93]:
agent.run("What is the square root of all ratings for product names featuring sofas")

> Entering new AgentExecutor chain...
 Thought: To get the square root of all ratings for product names featuring sofas, we can use the following steps:
1. Filter the dataframe to only include product names featuring sofas
2. Get the ratings for the filtered dataframe
3. Take the square root of the ratings
4. Sum the square roots of the ratings
Action: python_repl_ast
Action Input: 
df[df['product_name'].str.contains('sofa')]['average_rating'].apply(np.sqrt).sum()
Observation: NameError: name 'np' is not defined
Thought: To get the square root of all ratings for product names featuring sofas, we can use the following steps:
1. Filter the dataframe to only include product names featuring sofas
2. Get the ratings for the filtered dataframe
3. Take the square root of the ratings
4. Sum the square roots of the ratings
Action: python_repl_ast
Action Input: 
import numpy as np
df[df['product_name'].str.contains('sofa')]['average_rating

'67.68961317770061'

### Vector stores

We will explore embedding vectors and vector stores in more detail in [subsequent notebooks](rastringer.io.github.com/promptcraft). Let's see what's possible by concatenating our `product_name` and `product_description` columns and creating a text file from the result. We can then create embeddings and perform various retrieval and Q&A tasks.

We will use the open source [Chroma](https://docs.trychroma.com/) vector store.
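Before using a real vector store, it may help to see the core idea in miniature: turn each text into a vector, then rank texts by their similarity to the query vector. The sketch below uses word counts as a crude "embedding" and cosine similarity for ranking. This is a deliberate simplification: the notebook itself uses sentence-transformer embeddings and Chroma, and the helper names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query, texts, k=2):
    """Return the k texts most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        texts, key=lambda t: cosine(query_vector, embed(t)), reverse=True
    )
    return ranked[:k]


products = [
    "all-clad 7 qt slow cooker",
    "solid wood platform bed",
    "gel memory foam mattress topper",
]
top = similarity_search("is there a slow cooker", products, k=1)
print(top)
```

Learned embeddings capture meaning rather than exact word overlap, so a real vector store can match a query such as "crock pot" to a slow cooker, which this toy version cannot.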

In [94]:
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader

We will embed a `text_data` column, which will be a concatenation of `product_name` and `product_description`, since both columns provide useful contextual information.

In [95]:
product_df['text_data'] = product_df['product_name'] + " " + product_df['product_description']

In [96]:
product_df["text_data"]

0       solid wood platform bed good , deep sleep can ...
1       all-clad 7 qt . slow cooker create delicious s...
2       all-clad electrics 6.5 qt . slow cooker prepar...
3       all-clad all professional tools pizza cutter t...
4       baldwin prestige alcott passage knob with roun...
                              ...                        
1991    2 '' gel memory foam mattress topper the refre...
1993    0.5 '' polyester mattress pad this luxurious m...
1995    alwyn home two-sided 6.5 '' firm memory foam m...
1996    8 '' medium innerspring mattress if you want t...
1999    betances all in one bed blocker hypoallergenic...
Name: text_data, Length: 1269, dtype: object

In [97]:
# Save the "text_data" column to a text file
text_file_path = "combined_text_data.txt"
product_df['text_data'].to_csv(text_file_path, sep='\t', index=False, header=False)


In [98]:
# Load the document; it is split into chunks in the next step
loader = TextLoader("combined_text_data.txt")
documents = loader.load()

### Text splitter

Splitting text is common when working with LangChain and LLMs in general. This practice means we can feed large amounts of data to LLMs for parsing or embedding in chunks, or batches.

Ideally, we want to do so in a way that keeps meaningful chunks together. We will use the default recommended `RecursiveCharacterTextSplitter`. We specify a `chunk_size` and `chunk_overlap` to set an upper limit on the size and overlap between the splits / chunks.
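To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width splitter in plain Python. It is only a sketch: `RecursiveCharacterTextSplitter` tries to split on natural separators rather than at fixed offsets, and `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into chunks of at most chunk_size characters, where each
    chunk repeats the last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

The overlap means that a sentence cut at a chunk boundary still appears partly in both neighbouring chunks, which helps retrieval when the relevant text straddles a split.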

In [99]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)

docs = text_splitter.split_documents(documents)

In [100]:
len(docs)

528

In [101]:
from langchain.vectorstores import Chroma

# Clear any previous vector store
!rm -rf ./docs/chroma

In [102]:
# Takes ~3 mins to run
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(docs, embedding_function)


In [103]:
query = "Is there a slow cooker?"
# LangChain's similarity_search takes `k` for the number of results
docs = db.similarity_search(query, k=2)

In [104]:
docs[0]

Document(page_content="all-clad 7 qt . slow cooker create delicious slow-cooked meals , from tender meat to flavorful veggies , with this easy-to-use slow cooker . the unit features a nonstick cast-aluminum insert that moves seamlessly from the oven or stovetop to the electric base to the table . you can use the insert alone or with the slow cooker to make a variety of one-pot dishes from soup to desserts , and much more . you can even prepare your ingredients in the morning , place everything in the slow cooker , and walk away to come home to the aroma of a hot , healthy dinner at the end of a busy day . with its sleek stainless-steel finish , the slow cooker not only presents beautifully , but it ’ s also the perfect size to accommodate the whole family or a large group when entertaining .\nall-clad electrics 6.5 qt . slow cooker prepare home-cooked meals on any schedule with this essential slow cooker , featuring a dishwasher-safe insert and 26-hour timer .\nall-clad all professiona

In [105]:
query = "Recommend a durable door mat"
docs = db.similarity_search(query, k=2)

In [106]:
docs[0]

Document(page_content="keppler pineapple 36 in . x 15 in . non-slip outdoor boot tray when life becomes a revolving door of muddy feet , dirty paws , and whatever else mother nature can throw your way ; you need more than just a mat . you need a boot tray . built to tackle the rain , snow , and mud your lovable messes bring in on their feet , the boot tray features tough pet fiber , a durable rubber backing , and the unique water dam border ; keeping dust , dirt , grime and moisture contained and off your floors . we make this rugged beauty right here in the usa from industrial-grade materials designed for the most demanding environments , inside or outside . and because we use post-consumer materials in our rubber backing and fiber tops , you 'll appreciate this environmentally friendly solution for years to come .\nhulcott single curtain rod decorate your windows with this kerby knob single curtain rod & hardware set . the rich finish is perfect for any home or patio decor . durable 

### Retrieval

A `Retriever` is an interface that returns the documents most relevant to a query from an index, such as a vector store.

Here, we use `RetrievalQA` to combine this ability with a question-answering chain.

In [107]:
from langchain.chains import RetrievalQA

llm = VertexAI(
    model_name="text-bison@001",
    max_output_tokens=1024,
    temperature=0.1,
    top_p=0.8,
    top_k=40,
    verbose=True,
)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=db.as_retriever()
)

### Prompt

In [108]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. \
If you don't know the answer, just say that you don't know, \
don't try to make up an answer. Use three sentences maximum. \
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)
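The `{context}` and `{question}` placeholders are filled at query time: the retrieved documents are combined into the context and the user's query becomes the question. The sketch below shows that idea in plain Python. It assumes the retrieved documents are simply joined into one block of text, and `build_qa_prompt` is a hypothetical helper, not a LangChain function.

```python
def build_qa_prompt(template, retrieved_docs, question):
    """Join the retrieved documents and fill the prompt placeholders."""
    context = "\n\n".join(retrieved_docs)
    return template.format(context=context, question=question)


# Shortened stand-in for the prompt template (hypothetical text).
qa_template = (
    "Use the context to answer.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)
retrieved = [
    "torain comforter set: soft plush",
    "kucharski reversible comforter set: cotton",
]
qa_prompt = build_qa_prompt(
    qa_template, retrieved, "Can you recommend comfortable bed sheets?"
)
print(qa_prompt)
```

Because every retrieved document is placed in a single prompt, the number and size of the chunks returned by the retriever directly affect how much of the model's context window is used.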


In [109]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=db.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [110]:
question = "Can you recommend comfortable bed sheets?"
result = qa_chain({"query": question})
result["result"]

'The torain comforter set is a good choice. It is made of a short furry plush material that is very soft and comfortable. The kucharski reversible comforter set is also a good choice. It is made of 100% yarn-dyed cotton and is very soft and comfortable.'

In [111]:
question = "How about a Persian-style rug for my living room."
result = qa_chain({"query": question})
result["result"]

'The Mayna Shag Bright Ivory Rug is a great option for a Persian-style rug for your living room. It is made in Turkey and features a traditional theme. It is in a bright ivory hue, with cotton backing, and is fade and stain resistant, and safe on heated floors. To clean this piece, we recommend regular vacuuming and spot cleaning.'

## Summary

In this whirlwind tour of some of LangChain's features, we covered:

* Memory
* Chains
* Agents
* Vector stores

LangChain is a fast-evolving project. To explore more features and keep up-to-date with developments, please see the [website](https://www.langchain.com/) or [Python documentation](https://python.langchain.com/docs/get_started/introduction).

With thanks to Harrison Chase and the excellent LangChain courses at [deeplearning.ai](https://deeplearning.ai/short-courses).