# LangChain Challenge Lab

![LangChain Logo](https://github.com/rastringer/promptcraft_notebooks/blob/main/images/langchain.png?raw=1)

LangChain is a framework for developing applications infused with LLM magic. In this notebook, we will cover some of its most useful and fun features, including:

* Templates
* Memory
* Working with APIs
* Chains
* Agents
* Vector stores

Let's start by importing some packages. It takes a while so be patient and keep looking for the Done message at the end before moving on to the next step.

In [None]:
! pip install --upgrade google-cloud-aiplatform
# LangChain
! pip install langchain langchain-experimental langchain[docarray]
! pip install -U langchain-google-vertexai
! pip install pypdf
! pip install pydantic==1.10.8
# Open source vector store
! pip install chromadb==0.3.26
! pip install typing-inspect==0.8.0 typing_extensions==4.5.0
# For dense vector representations of text
! pip install sentence-transformers

print('Done')

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

In [None]:
import vertexai

vertexai.init()

In [None]:
# Utils
import time
from typing import List

# Langchain
import langchain
from pydantic import BaseModel

print(f"LangChain version: {langchain.__version__}")

# Vertex AI
from google.cloud import aiplatform
from langchain_google_vertexai import ChatVertexAI
from langchain.embeddings import VertexAIEmbeddings
from langchain_google_vertexai import VertexAI
from langchain.schema import HumanMessage, SystemMessage

print(f"Vertex AI SDK version: {aiplatform.__version__}")

### In the rest of the notebook we will need to do some tasks with a single prompt and some with a chat session. Set up a variable for each of those using the appropriate class. Use the following parameters for both.
 <font color='blue' face="Courier New" size="+2"> 
 max_output_tokens=1024
 <br>
    temperature=0.2
 <br>
    top_p=0.8
 <br>
    top_k=40
 <br>
    verbose=True
    </font>
<br>
<br>
<details><summary>Click for <b>hint</b></summary>
<p>
Take a look at some of the imports we did in the previous step to help identify which classes you need to initialize.
<br>
</p>
</details>

<details><summary>Click for <b>code</b></summary>
<p>

```python

chat = ChatVertexAI(
    max_output_tokens=1024,
    temperature=0.2,
    top_p=0.8,
    top_k=40,
    verbose=True)

llm = VertexAI(
    max_output_tokens=1024,
    temperature=0.2,
    top_p=0.8,
    top_k=40,
    verbose=True)
    
```
</p>
</details>

In [None]:
# We will use chat for some tasks
chat = 

# And we will use the general text llm for others
llm = 

### The simplest use of LangChain is to create a chat comprising a <font color='blue' face="Courier New" size="+3">SystemMessage</font> that sets the context (here, that the LLM should act as a creative, expert chef) and a <font color='blue' face="Courier New" size="+3">HumanMessage</font> in which you ask it to build a recipe from a few ingredients of your choice. This is similar to the <font color='blue' face="Courier New" size="+3">context</font> and <font color='blue' face="Courier New" size="+3">user_message</font> that we provide to the LLM when using the Python client libraries.
<br>
<details><summary>Click for <b>hint</b></summary>
<p>
Use the  <font color='blue' face="Courier New" size="+2">chat</font> function
<br>
It takes a <font color='blue' face="Courier New" size="+2">list</font> of messages
<br>
<br>
</p>
</details>

<details><summary>Click for <b>code</b></summary>
<p>

```python
res = chat(
    [
        SystemMessage(
            content="You are an expert chef that thinks of imaginative recipes when people give you ingredients."
        ),
        HumanMessage(content="I have some kidney beans and tomatoes, what would be an easy lunch?"),
    ]
)

print(res.content)
```
</p>
</details>

## Prompt templates

Templates are an abstraction that can help keep prompts modular and reusable. This can be especially important in large applications which may require long and varied prompts.

Templates may include few-shot examples, instructions, or context.
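To make the idea concrete, here is a minimal plain-Python sketch of what a template does: it holds a string with named placeholders, reports which placeholders it expects, and fills them in on demand. This is an illustration using only the standard library, not LangChain's implementation; the example style and text values are made up.

```python
# Plain-Python sketch of a prompt template (not LangChain's implementation).
from string import Formatter

template = "Translate the text into a style that is {style}. text: {text}"

# Discover the placeholder names, similar in spirit to `input_variables`.
input_variables = sorted(
    name for _, name, _, _ in Formatter().parse(template) if name
)

# Fill the placeholders to produce the final prompt string.
prompt = template.format(style="formal English", text="hiya, cheers for the brew")

print(input_variables)
print(prompt)
```

Keeping the template separate from its inputs is what makes the same prompt reusable across many styles and texts.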

### The <font color='blue' face="Courier New" size="+2">template_string</font> variable sets the context for the <font color='blue' face="Courier New" size="+2">ChatPromptTemplate</font>
### Initialize the <font color='blue' face="Courier New" size="+2">prompt_template</font> variable below using the provided template string.
<br>
<details><summary>Click for <b>hint</b></summary>
<p>
If you're not sure what method to use in the <font color='blue' face="Courier New" size="+2">ChatPromptTemplate</font> class use <font color='blue' face="Courier New" size="+2">dir</font> to explore its methods.
<br>
</p>
</details>

<details><summary>Click for <b>code</b></summary>
<p>

```python
from langchain.prompts import ChatPromptTemplate

template_string = """
Translate the text that is delimited by triple backticks into a style that is {style}. text: ```{text}```
"""
prompt_template = ChatPromptTemplate.from_template(template_string)
print(prompt_template)
```
</p>
</details>



In [None]:
from langchain.prompts import ChatPromptTemplate

template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""
prompt_template = 
print(prompt_template)

The chat prompts can be thought of as a series of messages. Notice the <font color='blue' face="Courier New" size="+2">.messages</font> attribute and the <font color='blue' face="Courier New" size="+2">.format_messages</font> method in the following cells.

In [None]:
# Print out the template
prompt_template.messages

In [None]:
#Let's check just the prompt
prompt_template.messages[0].prompt

In [None]:
# Helpful method to keep track of a template's inputs
prompt_template.messages[0].prompt.input_variables

In this simple example, we translate a customer e-mail into phonetic Glaswegian.

In [None]:
translator_style = """A translator that writes in phonetic Glaswegian."""

In [None]:
customer_email = """
This smashing little coffee maker is simply brilliant! \
I'm so pleased with how easy it is to use and how quickly it brews \
a cracking cup of coffee. \
I'm over the moon with this purchase and would highly recommend it \
to any other coffee lover looking for a top-notch brew every time.
"""

In [None]:
# The format_messages method sets up the task specified in the template
customer_messages = prompt_template.format_messages(
                    style=translator_style,
                    text=customer_email)

In [None]:
# Call the LLM to translate to the style of the customer message
customer_response = chat(customer_messages)
print(customer_response.content)

## Parsing outputs

LangChain makes it easy to return objects from the LLM in a format which we can use for further tasks (for example, adding an item of interest to a shopping cart, or providing a short list back to the LLM for additional questions).

Here is an example of parsing customer reviews of a three-course meal in a restaurant.

In [None]:
customer_review = """\
The excellent barbecue cauliflower starter left \
a lasting impression -- gorgeous presentation and flavors, really geared the tastebuds into action. \
Moving on to the main course, pretty great also. \
Delicious and flavorful chickpea and vegetable curry. They really nailed the buttery consistency, \
depth and balance of the spices. \
The dessert was a bit bland. I opted for a vegan chocolate mousse, \
hoping for a decadent and indulgent finale to my meal. \
It was very visually appealing but was missing the smooth, velvety \
texture of a great mousse.
"""

review_template = """\
For the input text, extract the following details: \
starter: How did the reviewer find the first course? \
Rate either Poor, Good, or Excellent. \
Do the same for the main course and dessert

Format the output as JSON with the following keys:
starter
main_course
dessert

text: {text}
"""



### Using the above text, create the <font color='blue' face="Courier New" size="+2">prompt_template</font> variable from the <font color='blue' face="Courier New" size="+2">review_template</font> variable and the <font color='blue' face="Courier New" size="+2">messages</font> variable from the <font color='blue' face="Courier New" size="+2">customer_review</font> variable and get a response back from a <font color='blue' face="Courier New" size="+2">chat</font> session.
<br>
<details><summary>Click for <b>hint</b></summary>
<p>
<br>
Use the <font color='blue' face="Courier New" size="+2">ChatPromptTemplate</font> class and the <font color='blue' face="Courier New" size="+2">from_template</font> method and <font color='blue' face="Courier New" size="+2">format_messages</font> method.
<br>
<br>
</p>
</details>

<details><summary>Click for <b>code</b></summary>
<p>

```python
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(review_template)
print(prompt_template)
messages = prompt_template.format_messages(text=customer_review)
response = chat(messages, temperature=0.1)
print(response.content)
```
</p>
</details>

In [None]:
from langchain.prompts import ChatPromptTemplate

prompt_template = 
print(prompt_template)

In [None]:
messages = 
response = chat(messages, temperature=0.1)
print(response.content)

Though it looks like JSON, our output is actually a string type.

In [None]:
type(response.content)

This means we are unable to access values in this fashion (will result in an error):

In [None]:
response.content.get("main_course")

This is where LangChain's parser comes in. Here, we import the <font color='blue' face="Courier New" size="+2">ResponseSchema</font> and <font color='blue' face="Courier New" size="+2">StructuredOutputParser</font>, which we use to define the format of the results from the LLM.
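Conceptually, the parsing step boils down to finding the JSON inside the model's reply and decoding it into a dictionary. The sketch below shows that idea with the standard library only. The reply text is hypothetical, the helper name `parse_fenced_json` is our own, and LangChain's parser may handle more cases than this.

```python
import json

# Three backticks, built this way so this example's own code fence stays intact.
FENCE = "`" * 3

# Hypothetical raw model reply: a JSON object wrapped in a markdown code fence.
raw_reply = (
    "Here is the result:\n"
    f"{FENCE}json\n"
    '{"starter": "Excellent", "main_course": "Good", "dessert": "Poor"}\n'
    f"{FENCE}"
)


def parse_fenced_json(text: str) -> dict:
    """Extract and decode the first fenced json block in `text`."""
    opener = f"{FENCE}json"
    start = text.index(opener) + len(opener)
    end = text.index(FENCE, start)
    return json.loads(text[start:end])


parsed = parse_fenced_json(raw_reply)
print(type(parsed).__name__, parsed["main_course"])
```

Once the reply is a real dictionary, `.get("main_course")` works as expected.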

In [None]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

starter_schema = ResponseSchema(name="starter", description="Review of the starter")
main_course_schema = ResponseSchema(name="main_course", description="Review of the main course")
dessert_schema = ResponseSchema(name="dessert", description="Review of the dessert")

response_schemas = [starter_schema, main_course_schema, dessert_schema]

In [None]:
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [None]:
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

### Now we can update our prior review template to include the format instructions
<br>
<details><summary>Click for <b>hint</b></summary>
<p>
Use the <font color='blue' face="Courier New" size="+2">format_messages</font> function and add the <font color='blue' face="Courier New" size="+2">format_instructions</font> parameter before sending the <font color='blue' face="Courier New" size="+2">messages</font> to the <font color='blue' face="Courier New" size="+2">chat</font> function


<br>

<br>
<br>
</p>
</details>

<details><summary>Click for <b>code</b></summary>
<p>

```python
review_template_2 = """\
For the input text, extract the following details: \
starter: How did the reviewer find the first course? \
Rate either Poor, Good, or Excellent. \
Do the same for the main course and dessert

starter
main_course
dessert

text: {text}

{format_instructions}
"""
prompt = ChatPromptTemplate.from_template(template=review_template_2)

messages = prompt.format_messages(text=customer_review,
                                format_instructions=format_instructions)
print(messages[0].content)
response = chat(messages)
                                
```
</p>
</details>

In [None]:
review_template_2 = """\
For the input text, extract the following details: \
starter: How did the reviewer find the first course? \
Rate either Poor, Good, or Excellent. \
Do the same for the main course and dessert

starter
main_course
dessert

text: {text}

{format_instructions}
"""
prompt = ChatPromptTemplate.from_template(template=review_template_2)

messages = 
print(messages[0].content)
response = chat(messages)

Let's try it on the same review

Our response starts as an <font color='blue' face="Courier New" size="+2">AIMessage</font>

In [None]:
type(response)

Here we parse the <font color='blue' face="Courier New" size="+2">AIMessage</font> into a Python dictionary

In [None]:
output_dict = output_parser.parse(response.content)
output_dict

Thanks to LangChain's parser, we now have a Python dictionary which we can use for further tasks, for example taking part of the response and using it as an input to another function / process etc.

In [None]:
type(output_dict)

In [None]:
output_dict.get("main_course")

## API chains

Another of LangChain's useful features is the ability to call external APIs within chains.

In this example, we use the <font color='blue' face="Courier New" size="+2">open-meteo.com</font> API to get weather reports.
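For a sense of what the chain has to produce, here is a sketch of the kind of request URL it might compose from the API docs. The coordinates are approximate values for Edinburgh that we supply by hand, and the exact URL the LLM generates may differ. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.open-meteo.com/v1/forecast"

# Approximate coordinates for Edinburgh, supplied by hand for illustration.
params = {
    "latitude": 55.95,
    "longitude": -3.19,
    "current_weather": "true",
    "temperature_unit": "fahrenheit",
}

# Build the query string; the chain would then fetch this URL and summarize the JSON.
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
```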

In [None]:
from langchain.chains import APIChain
from langchain.chains.api import open_meteo_docs

llm = VertexAI(temperature=0)
chain = APIChain.from_llm_and_api_docs(
    llm,
    open_meteo_docs.OPEN_METEO_DOCS,
    verbose=True,
    limit_to_domains=["https://api.open-meteo.com/"],
)
chain.run(
    "How is the weather today in Edinburgh, Scotland, in Fahrenheit?"
    )

### Wikipedia

We can combine the Wikipedia pip package and LangChain's Wikipedia API wrapper to get query results from the encyclopedia.

In [None]:
!pip install wikipedia

In [None]:
from langchain.tools import WikipediaQueryRun
from langchain.utilities import WikipediaAPIWrapper

wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

wikipedia.run("To which bird family does the field sparrow belong?")

### Google search


In [None]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMRequestsChain, LLMChain

template = """Between >>> and <<< are the raw search result text from google.
Extract the answer to the question '{query}' or say "not found" if the information is not contained.
Use the format
Extracted:<answer or "not found">
>>> {requests_result} <<<
Extracted:"""

PROMPT = PromptTemplate(
    input_variables=["query", "requests_result"],
    template=template,
)


chain = LLMRequestsChain(llm_chain=LLMChain(llm=VertexAI(temperature=0), prompt=PROMPT))
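question = "What are the official languages in Turkmenistan, and their alphabets?"
# URL-encode the whole question so punctuation such as "," and "?" is escaped too
from urllib.parse import quote_plus

inputs = {
    "query": question,
    "url": "https://www.google.com/search?q=" + quote_plus(question),
}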
chain(inputs)

## Memory

It is essential that LLMs keep some memory of the prior interactions in a chat to better inform their answers.

LangChain offers several approaches and features in this regard. For all details, see the [Memory](https://python.langchain.com/docs/modules/memory/) section of the documentation.

### ConversationBufferWindowMemory

Maintains a list of the interactions of the conversation over time, using only the last K interactions. This can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.
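The sliding-window idea can be sketched in a few lines of plain Python with a bounded `deque`. This is only an illustration of the behaviour, not how LangChain stores memory; the third exchange is invented for the example.

```python
from collections import deque

k = 2  # number of most recent exchanges to keep
window = deque(maxlen=k)  # older exchanges fall off the left automatically

exchanges = [
    ("Hi", "How are you?"),
    ("Fine thanks", "Great"),
    ("What's new?", "Not much"),
]
for human, ai in exchanges:
    window.append({"input": human, "output": ai})

# Render the surviving exchanges as the history that would be sent to the model.
history = "\n".join(f"Human: {e['input']}\nAI: {e['output']}" for e in window)
print(history)
```

With `k = 2`, the first exchange has already been dropped by the time the third is saved.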

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory, ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=3)

memory.save_context({"input": "Hi"},
                    {"output": "How are you?"})
memory.save_context({"input": "Fine thanks"},
                    {"output": "Great"})

memory.load_memory_variables({})

### ConversationTokenBufferMemory

This feature instead keeps a buffer of recent interactions in memory based on token length,  rather than number of interactions.
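A rough sketch of token-based pruning follows. It assumes a crude whitespace word count as the "tokenizer", whereas the real class asks the LLM's tokenizer for counts, so the numbers will not match. The helper names `count_tokens` and `prune` are our own.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in: whitespace-separated words (a real tokenizer differs).
    return len(text.split())


def prune(messages: List[str], max_token_limit: int) -> List[str]:
    """Drop the oldest messages until the total fits within the limit."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_token_limit:
        kept.pop(0)
    return kept


messages = [
    "All alone, she dreams of the stars!",
    "As she should!",
    "Baking cookies today?",
    "Behold the cookies!",
]
recent = prune(messages, max_token_limit=8)
print(recent)
```

Unlike the window approach, how many exchanges survive depends on how long each one is.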

In [None]:
from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "All alone, she dreams of the stars!"},
                    {"output": "As she should!"})
memory.save_context({"input": "Baking cookies today?"},
                    {"output": "Behold the cookies!"})
memory.save_context({"input": "Chatbots everywhere?"},
                    {"output": "Certainly!"})

In [None]:
memory.load_memory_variables({})

### Conversation summaries

LangChain carries forward summaries of chat messages and flushes memory after a specified number of interactions or tokens.

Let's first look at using the former, <font color='blue' face="Courier New" size="+2">ConversationBufferWindowMemory</font>.

We set <font color='blue' face="Courier New" size="+2">verbose=True</font> to show the prompts and information carried forward by the LLM.

In [None]:
from langchain.memory import ConversationBufferWindowMemory

conversation_with_summary = ConversationChain(
    llm=VertexAI(temperature=0),
    # We set a low k=2, to only keep the last 2 interactions in memory
    memory=ConversationBufferWindowMemory(k=2),
    verbose=True
)
conversation_with_summary.predict(input="My favourite sport is fencing. Any tips for how I can go pro?")

In [None]:
conversation_with_summary.predict(input="What equipment do I need?")

### Write some code to send the following two messages one after the other:
<br>
Who are the greats of the sport I can emulate?
<br>
What is my favourite sport?
<br>
<details><summary>Click for <b>hint</b></summary>
<p>
<font color='blue' face="Courier New" size="+2"> </font>
<br>

<br>
<br>
</p>
</details>

<details><summary>Click for <b>code</b></summary>
<p>

```python
conversation_with_summary.predict(input="Who are the greats of the sport I can emulate?")
conversation_with_summary.predict(input="What is my favourite sport?")
```
</p>
</details>

### Since we have exceeded the window of k=2 interactions, the LLM no longer sees our initial message and will be unable to answer the last question

### ConversationSummaryBufferMemory

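Keeps conversational memory up to a specified token length. Note that the cell below uses <font color='blue' face="Courier New" size="+2">ConversationTokenBufferMemory</font>, which drops the oldest exchanges once the token limit is reached.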

In [None]:
from langchain.chains import ConversationChain

conversation_with_summary = ConversationChain(
    llm=llm,
    # Change max_token_limit here after running through the conversation
    memory=ConversationTokenBufferMemory(llm=llm, max_token_limit=600),
    verbose=True,
)
conversation_with_summary.predict(input="Hi, how are you?")

In [None]:
conversation_with_summary.predict(input="I'm learning the Rust programming language")

In [None]:
conversation_with_summary.predict(input="What's the best book to help me?")

In [None]:
# Notice the buffer here is updated and clears the earlier exchanges
# Depending on how chatty the LLM is feeling, the token limit may have
# already been reached, and this cell will yield a generic response.
conversation_with_summary.predict(input="Wish me luck!")

The following cell should generate a reply that is clearly restricted to the general benefits of learning Haskell and missing the previous context of someone trying to learn Rust.

Run this cell, then go back to the cell above that sets <font color='blue' face="Courier New" size="+2">max_token_limit</font> and change it to 700. Then re-run the entire conversation and notice how the model relates its output about learning Haskell to the context of someone trying to learn Rust.

In [None]:
conversation_with_summary.predict(input="Would knowing Haskell help me?")

## Chains

Complex applications will require chaining LLMs together, or with other components.

We will cover the following types of chains:

**Sequential chains**

**Router chains**

### LLMChain

An LLMChain simply provides a prompt to the LLM.

In [None]:
from langchain.chains import LLMChain

prompt = ChatPromptTemplate.from_template(
    "What is the best name to describe \
    a company that makes {product}?"
)

In [None]:
chain = LLMChain(llm=llm, prompt=prompt)
product = "A saw for laminate wood"
chain.run(product)

### Sequential chain

A sequential chain makes a series of calls to an LLM. It enables a pipeline-style workflow in which the output from one call becomes the input to the next.

The two types include:

* <font color='blue' face="Courier New" size="+2">SimpleSequentialChain</font>, where predictably each step has a single input and output, which becomes the input to the next step.

* <font color='blue' face="Courier New" size="+2">SequentialChain</font>, which allows for multiple inputs and outputs.
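The single-input, single-output case can be sketched as plain function composition. The two functions below are stand-ins for LLM calls and return canned strings; the names `write_pitch`, `review_pitch` and `run_sequence` are our own and not part of LangChain.

```python
from functools import reduce
from typing import Callable, List

Step = Callable[[str], str]


def write_pitch(title: str) -> str:
    # Stand-in for an LLM call that writes a pitch.
    return f"Pitch for {title}: it saves time."


def review_pitch(pitch: str) -> str:
    # Stand-in for an LLM call that reviews the pitch.
    return f"Review of [{pitch}]: promising."


def run_sequence(steps: List[Step], first_input: str) -> str:
    """Feed each step's output to the next step, in order."""
    return reduce(lambda value, step: step(value), steps, first_input)


result = run_sequence([write_pitch, review_pitch], "Portable iced coffee maker")
print(result)
```

The order of the list matters: swapping the two steps would ask for a review before any pitch exists.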

In [None]:
from langchain.chains import SimpleSequentialChain
from langchain.prompts import PromptTemplate

In [None]:
# This is an LLMChain to write a pitch for a new product
# Let's increase the temperature to allow some imagination

llm = VertexAI(temperature=0.7)
template = """You are an entrepreneur. Think of a ground breaking new product and write a short pitch.

Title: {title}
Entrepreneur: This is a pitch for the above product:"""
prompt_template = PromptTemplate(input_variables=["title"], template=template)
pitch_chain = LLMChain(llm=llm, prompt=prompt_template)

### Based on the pattern you just saw, complete the next code block
<br>
<details><summary>Click for <b>hint</b></summary>
<p>
Add a <font color='blue' face="Courier New" size="+2">prompt_template</font> and <font color='blue' face="Courier New" size="+2">pitch_chain</font> as above.
<br>

<br>
<br>
</p>
</details>

<details><summary>Click for <b>code</b></summary>
<p>

```python
template = """You are a panelist on Dragon's Den. Given a \
description of the product, you are to explain why you think it will \
succeed or fail in the market.

Product pitch: {pitch}
Review by Dragon's Den panelist:"""
prompt_template = PromptTemplate(input_variables=["pitch"], template=template)
review_chain = LLMChain(llm=llm, prompt=prompt_template)
```
</p>
</details>


In [None]:
template = """You are a panelist on Dragon's Den. Given a \
description of the product, you are to explain why you think it will \
succeed or fail in the market.

Product pitch: {pitch}
Review by Dragon's Den panelist:"""


### Complete the code block below to run them in the right sequence.
<br>
<details><summary>Click for <b>hint</b></summary>
<p>
Supply <font color='blue' face="Courier New" size="+2">pitch_chain</font> and <font color='blue' face="Courier New" size="+2">review_chain</font> in the <font color='blue' face="Courier New" size="+2">chains parameter</font>.
<br>

<br>
<br>
</p>
</details>

<details><summary>Click for <b>code</b></summary>
<p>

```python
overall_chain = SimpleSequentialChain(chains=[pitch_chain, review_chain], verbose=True)
```
</p>
</details>


In [None]:
# This is the overall chain where we run these two chains in sequence.
overall_chain = SimpleSequentialChain(chains= , verbose=True)

In [None]:
review = overall_chain.run("Portable iced coffee maker")

### Router chain

A <font color='blue' face="Courier New" size="+2">RouterChain</font> dynamically selects the next chain to use for a given input.
Here we use the <font color='blue' face="Courier New" size="+2">MultiPromptChain</font> to select the prompt best suited to the question and then answer with it.

This supports a modular architecture, allowing inputs to be triaged effectively between relevant prompt templates.
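Stripped of the LLM, routing is a dispatch table plus a fallback. In the sketch below a simple keyword match stands in for the routing decision that the LLM makes in the real chain. The keyword lists and function names are invented for illustration.

```python
from typing import Callable, Dict


def make_expert(name: str) -> Callable[[str], str]:
    # Stand-in for a destination chain: tags the question with the expert's name.
    return lambda question: f"[{name}] {question}"


destinations: Dict[str, Callable[[str], str]] = {
    name: make_expert(name) for name in ("korean", "spanish", "chinese")
}
default_chain = make_expert("DEFAULT")

# Keyword matching stands in for the LLM's routing decision.
KEYWORDS = {
    "korean": ["korea"],
    "spanish": ["spain", "spanish", "catalonia", "quixote"],
    "chinese": ["china", "chinese", "han dynasty"],
}


def choose_destination(question: str) -> str:
    lowered = question.lower()
    for name, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return name
    return "DEFAULT"


def route(question: str) -> str:
    handler = destinations.get(choose_destination(question), default_chain)
    return handler(question)


print(route("What was the Han Dynasty?"))
print(route("How can I fix a carburetor?"))
```

Anything that matches no destination falls through to the default handler, which mirrors the role of the default chain.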

In [None]:
from langchain.chains.router import MultiPromptChain

korean_template = """
You are an expert in Korean history and culture.
Here is a question:
{input}
"""

spanish_template = """
You are an expert in Spanish history and culture.
Here is a question:
{input}
"""

chinese_template = """
You are an expert in Chinese history and culture.
Here is a question:
{input}
"""

In [None]:
prompt_infos = [
    {
        "name": "korean",
        "description": "Good for answering questions about Korean history and culture",
        "prompt_template": korean_template,
    },
    {
        "name": "spanish",
        "description": "Good for answering questions about Spanish history and culture",
        "prompt_template": spanish_template,
    },
     {
        "name": "chinese",
        "description": "Good for answering questions about Chinese history and culture",
        "prompt_template": chinese_template,
    },
]

In [None]:
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser

llm = VertexAI(temperature=0)

destination_chains = {}
for p_info in prompt_infos:
    name = p_info["name"]
    prompt_template = p_info["prompt_template"]
    prompt = ChatPromptTemplate.from_template(template=prompt_template)
    chain = LLMChain(llm=llm, prompt=prompt)
    destination_chains[name] = chain

destinations = [f"{p['name']}: {p['description']}" for p in prompt_infos]
destinations_str = "\n".join(destinations)

In [None]:
default_prompt = ChatPromptTemplate.from_template("{input}")
default_chain = LLMChain(llm=llm, prompt=default_prompt)

In [None]:
# Thanks to Deeplearning.ai for this template and for the
# Langchain short course at deeplearning.ai/short-courses/.

MULTI_PROMPT_ROUTER_TEMPLATE = """Given a raw text input to a \
language model select the model prompt best suited for the input. \
You will be given the names of the available prompts and a \
description of what the prompt is best suited for. \
You may also revise the original input if you think that revising \
it will ultimately lead to a better response from the language model.

<< FORMATTING >>
Return a markdown code snippet with a JSON object formatted to look like:
```json
{{{{
    "destination": string \ name of the prompt to use or "DEFAULT"
    "next_inputs": string \ a potentially modified version of the original input
}}}}
```

REMEMBER: "destination" MUST be one of the candidate prompt \
names specified below OR it can be "DEFAULT" if the input is not \
well suited for any of the candidate prompts.
REMEMBER: "next_inputs" can just be the original input \
if you don't think any modifications are needed.

<< CANDIDATE PROMPTS >>
{destinations}

<< INPUT >>
{{input}}

<< OUTPUT (remember to include the ```json)>>"""

In [None]:
router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(
    destinations=destinations_str
)
router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)

router_chain = LLMRouterChain.from_llm(llm, router_prompt)

In [None]:
chain = MultiPromptChain(router_chain=router_chain,
                         destination_chains=destination_chains,
                         default_chain=default_chain, verbose=True
                        )

Notice that in the outputs the country of speciality is prefixed, e.g.
<font color='blue' face="Courier New" size="+2">chinese: {'input': ...</font>, denoting the routing to the correct expert.

In [None]:
chain.run("What was the Han Dynasty?")

In [None]:
chain.run("What are some typical dishes in Catalonia?")

In [None]:
chain.run("How would I greet a friend's parents in Korean?")

In [None]:
chain.run("Summarize Don Quixote in a short paragraph")

In [None]:
# No specialist chain for carburetor advice; this
# will be handled as any other input by the foundational model
chain.run("How can I fix a carburetor?")

## Agents and vectorstores

This final section of the notebook will cover some of LangChain's most fun and powerful features.

Agents have access to tools such as JSON, Wikipedia, Web Search, GitHub or Pandas Dataframes, and can access their capabilities depending on user input.

See [here](https://python.langchain.com/docs/integrations/toolkits/) for a full list of agent toolkits.

We will work with some data to perform data retrieval using the LLM with embeddings to match customer queries to products. This is known as Retrieval-Augmented Generation, or RAG.

We will use the Wayfair [WANDS](https://www.aboutwayfair.com/careers/tech-blog/wayfair-releases-wands-the-largest-and-richest-publicly-available-dataset-for-e-commerce-product-search-relevance) dataset of more than 42,000 products. Here are the steps:

* Download the data into a pandas dataframe and take a smaller sample from the first 2,000 rows, dropping rows with missing values

* Merge then generate embeddings for the product titles and descriptions

* Prompt an LLM to retrieve details and relevant documents related to queries.

<img src="https://assets.wfcdn.com/im/01139917/resize-h800-w800%5Ecompr-r85/2315/231567967/Capricornus+3+Seater+Sofa.jpg" width="250"/> <img src="https://assets.wfcdn.com/im/07725066/resize-h800-w800%5Ecompr-r85/1584/158440119/Vancasso+BOMOOTIUR+Stoneware+Dinnerware+-+Set+of+18.jpg" width="250"/>


In [None]:
!wget -q https://raw.githubusercontent.com/wayfair/WANDS/main/dataset/product.csv

In [None]:
import pandas as pd
product_df = pd.read_csv("product.csv", sep='\t')

We will work with a subset of the first 2,000 items, dropping rows with missing values, to avoid longer wait times for the embedding and lookup processes.

In [None]:
product_df = product_df[:2000].dropna()

In [None]:
product_df.head(3)

In [None]:
len(product_df)

In [None]:
# Reduce the df to columns of interest
product_df = product_df.filter(["product_id", "product_name", "product_description", "average_rating"], axis=1)

In [None]:
product_df.head(3)

### Import and initialize pandas dataframe agent

These tools use the <font color='blue' face="Courier New" size="+2">langchain-experimental</font> pip package installed at the start of the notebook.

### Pandas agent

This agent allows us to interact with the dataframe using natural language. LangChain shows us the pandas queries it is composing to answer the questions.

In [None]:
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.agents.agent_types import AgentType

agent = create_pandas_dataframe_agent(VertexAI(temperature=0), product_df, verbose=True, allow_dangerous_code=True)

In [None]:
agent.run("how many rows are there?")

In [None]:
agent.run("How many products have a rating of > 4?")

### CSV agent

We can also work directly on a .csv file

In [None]:
# Write the reduced dataframe to disk so the CSV agent can load it
product_df.to_csv("data.csv")

In [None]:
from langchain_experimental.agents.agent_toolkits import create_csv_agent

agent = create_csv_agent(
    VertexAI(temperature=0),
    "data.csv",
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    allow_dangerous_code=True,
)

In [None]:
agent.run("How many rows are there?")

In [None]:
agent.run("Do any product descriptions mention wood? Output them as JSON")

In [None]:
agent.run("What is the square root of all ratings for product names featuring sofas")

### Vector stores

We will explore embedding vectors and vector stores in more detail in [subsequent notebooks](rastringer.io.github.com/promptcraft). Let's see what's possible by concatenating our <font color='blue' face="Courier New" size="+2">product_name</font> and <font color='blue' face="Courier New" size="+2">product_description</font> columns and creating a text file from the result. We can then create embeddings and perform various retrieval and Q&A tasks.

We will use the open source [Chroma](https://docs.trychroma.com/) vector store.
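Before using a real vector store, it may help to see the core operation in miniature: rank stored vectors by cosine similarity to a query vector and return the top k. The three-dimensional "embeddings" below are invented for illustration. Real embedding models emit hundreds of dimensions, and a store such as Chroma adds indexing and persistence on top.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy 3-d "embeddings" invented for illustration.
store = {
    "slow cooker": [0.9, 0.1, 0.0],
    "door mat": [0.0, 0.2, 0.9],
    "sofa": [0.1, 0.9, 0.2],
}


def similarity_search(query_vector, k=2):
    """Return the names of the k stored items closest to the query."""
    ranked = sorted(
        store, key=lambda name: cosine(store[name], query_vector), reverse=True
    )
    return ranked[:k]


top = similarity_search([0.8, 0.2, 0.1], k=2)
print(top)
```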

In [None]:
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader

We will embed a <font color='blue' face="Courier New" size="+2">text_data</font> column, which will be a concatenation of <font color='blue' face="Courier New" size="+2">product_name</font> and <font color='blue' face="Courier New" size="+2">product_description</font>, since both columns provide useful contextual information.

In [None]:
product_df['text_data'] = product_df['product_name'] + " " + product_df['product_description']

In [None]:
product_df["text_data"]

In [None]:
# Save the "text_data" column to a text file
text_file_path = "combined_text_data.txt"
product_df['text_data'].to_csv(text_file_path, sep='\t', index=False, header=False)


In [None]:
# load the document and split it into chunks
loader = TextLoader("combined_text_data.txt")
documents = loader.load()

### Text splitter

Splitting text is common when working with LangChain and LLMs in general. This practice means we can feed large amounts of data to LLMs for parsing or embedding in chunks, or batches.

Ideally, we want to do so in a way that keeps meaningful chunks together. We will use the default recommended <font color='blue' face="Courier New" size="+2">RecursiveCharacterTextSplitter</font>. We specify a <font color='blue' face="Courier New" size="+2">chunk_size</font> and <font color='blue' face="Courier New" size="+2">chunk_overlap</font> to set an upper limit on the size and overlap between the splits / chunks.
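To see how the two parameters interact, here is a simplified fixed-size splitter. It is only a sketch: it cuts purely by character position, whereas the recursive splitter also tries to break on natural separators such as paragraphs and sentences. The function name `split_with_overlap` is our own.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut `text` into chunks of at most `chunk_size` characters,
    where each chunk repeats the last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of some duplicated text.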

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)

docs = text_splitter.split_documents(documents)

In [None]:
len(docs)

In [None]:
from langchain.vectorstores import Chroma

# Clear any previous vector store
!rm -rf ./docs/chroma

In [None]:
# Takes ~3 mins to run
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(docs, embedding_function)

In [None]:
query = "Is there a slow cooker?"
docs = db.similarity_search(query, 2)

In [None]:
docs[0]

In [None]:
query = "Recommend a durable door mat"
docs = db.similarity_search(query, 2)

In [None]:
docs[0]

### Retrieval

A <font color='blue' face="Courier New" size="+2">Retriever</font> returns the documents in an index that are most relevant to a query.

Here, we use <font color='blue' face="Courier New" size="+2">RetrievalQA</font> to combine this ability with a question-answering chain.
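In outline, such a chain retrieves a few relevant chunks, pastes them into a prompt as context, and sends that prompt to the LLM. The sketch below shows only the prompt-building step, with hypothetical retrieved chunks and a helper name (`build_prompt`) of our own. It is an approximation of the idea rather than the chain's actual code.

```python
template = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

# Hypothetical retrieved chunks; in the notebook these come from the vector store.
retrieved = [
    "Cotton sheet set, 400 thread count.",
    "Linen duvet cover, breathable.",
]


def build_prompt(question: str, documents: list) -> str:
    """Join the retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(documents)
    return template.format(context=context, question=question)


prompt = build_prompt("Can you recommend comfortable bed sheets?", retrieved)
print(prompt)
```

Because the answer is grounded in whatever was retrieved, the quality of the retrieval step largely determines the quality of the answer.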

In [None]:
from langchain.chains import RetrievalQA

llm = VertexAI(
    model_name="text-bison@001",
    max_output_tokens=1024,
    temperature=0.1,
    top_p=0.8,
    top_k=40,
    verbose=True,
)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=db.as_retriever()
)

### Prompt

In [None]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. \
If you don't know the answer, just say that you don't know, \
don't try to make up an answer. Use three sentences maximum. \
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"],template=template,)


In [None]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=db.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [None]:
question = "Can you recommend comfortable bed sheets?"
result = qa_chain({"query": question})
result["result"]

### Try to have it answer this question: "How about a Persian-style rug for my living room."
<br>
<details><summary>Click for <b>code</b></summary>
<p>

```python
question = "How about a Persian-style rug for my living room."
result = qa_chain({"query": question})
result["result"]
```
</p>
</details>


## Summary

In this whirlwind tour of some of LangChain's features, we covered:

* Memory
* Chains
* Agents
* Vector stores

LangChain is a fast-evolving project. To explore more features and keep up-to-date with developments, please see the [website](https://www.langchain.com/) or [Python documentation](https://python.langchain.com/docs/get_started/introduction).

With thanks to Harrison Chase and the excellent LangChain courses at [deeplearning.ai](https://deeplearning.ai/short-courses).