<a href="https://colab.research.google.com/github/svnnynior/introduction-to-langchain-workshop/blob/main/th_pycon_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to LangChain

LangChain is a framework for developing applications powered by language models.

- GitHub: https://github.com/langchain-ai/langchain
- Docs: https://python.langchain.com/docs/get_started

## Outline
1. Main components -- Model, Prompt Template, Output Parser
2. Chains
3. Memory
4. Retriever (RAG)
5. Evaluation


## 0. Installation

In [1]:
!pip install langchain==0.0.349

Collecting langchain==0.0.349
  Using cached langchain-0.0.349-py3-none-any.whl (808 kB)
Installing collected packages: langchain
  Attempting uninstall: langchain
    Found existing installation: langchain 0.0.350
    Uninstalling langchain-0.0.350:
      Successfully uninstalled langchain-0.0.350
Successfully installed langchain-0.0.349


## 1. Main Components

### 1.1. Model

#### Model - Hugging Face

In [71]:
!pip install huggingface_hub==0.19.4

In [76]:
import os

from langchain.llms import HuggingFaceHub
from google.colab import userdata

# Read the Hugging Face API token from Colab's secret storage
os.environ["HUGGINGFACEHUB_API_TOKEN"] = userdata.get('HUGGINGFACEHUB_API_TOKEN')

repo_id = "google/flan-t5-xxl"  # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options


llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)



In [77]:
text = "What would be a good company name for a company that makes colorful socks?"
print(llm(text))

Socks for Less


#### Model - Local Llama

In [None]:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.20.tar.gz (8.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.7/8.7 MB 59.2 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.20-cp310-cp310-manylinux_2_35_x86_64.whl size=1987633 sha256=c3ea7b09418d30bbe545e16384b502a46551afd36b6643e1c15e6219ad10c9a2
  Stored in directory: /root/.cache/pip/wheels/ef/f2/d2/0becb03047a348d7bd9a5b91ec88f4654d6fa7d67ea4e84d43
Successfully built llama-cpp-python
Installing collected packages: llama-cpp-python
Successfully installed llama-cpp-python-0.2.20


In [None]:
!wget https://gpt4all.io/models/gguf/mistral-7b-instruct-v0.1.Q4_0.gguf

--2023-12-11 04:50:42--  https://gpt4all.io/models/gguf/mistral-7b-instruct-v0.1.Q4_0.gguf
Resolving gpt4all.io (gpt4all.io)... 104.26.0.159, 104.26.1.159, 172.67.71.169, ...
Connecting to gpt4all.io (gpt4all.io)|104.26.0.159|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4108916384 (3.8G)
Saving to: ‘mistral-7b-instruct-v0.1.Q4_0.gguf’


In [None]:
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/content/mistral-7b-instruct-v0.1.Q4_0.gguf",
    n_gpu_layers=200,
    verbose=True,
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


In [None]:
llm("The first man on the moon was ... Let's think step by step")

'. We know that Neil Armstrong and Edwin "Buzz" Aldrin were the first two humans to walk on the moon as part of the Apollo 11 mission in July 1969. So, if we go back a generation earlier, to the Apollo 8 mission in December 1965, the first man in space was Frank R. Grissom. The first woman in space was Valentina Tereshkova in June 1963. And the first American woman in space was Sally Ride in July 1983. Now we\'re getting close! If we go back to the early days of human flight, the Wright brothers made their first successful powered flight in December 1903. But that wasn\'t exactly on the moon or in space. So let\'s go further back.\n\nThe first man-made object to reach Earth from another celestial body was a meteorite, which is essentially space debris left behind by an asteroid or comet. Meteorites have been falling to Earth for billions of years. \n\nSo, the answer is: The first man on the moon was not the first human in space'

#### Model - OpenAI

In [78]:
!pip install openai==1.3.9

In [79]:
import os
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from google.colab import userdata

# Get it from https://platform.openai.com/account/api-keys
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

##### LLM Model

An LLM in LangChain is a text-completion model: it takes a string prompt as input and returns a string.

In [80]:
from langchain.llms import OpenAI
from google.colab import userdata

text = "What would be a good company name for a company that makes colorful socks?"

llm = OpenAI(temperature=0)
print(f"Using LLM model: {llm.model_name}")

Using LLM model: text-davinci-003


In [81]:
text = "What would be a good company name for a company that makes colorful socks?"

print(llm(text))



Rainbow Sock Co.


##### Chat Model

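A chat model takes a list of messages (for example a `HumanMessage`) as input and returns a message (an `AIMessage`) instead of a plain string.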

In [82]:
from langchain.schema import HumanMessage

chat = ChatOpenAI()
print(f"Using chat model: {chat.model_name}")

Using chat model: gpt-3.5-turbo


In [83]:
messages = [HumanMessage(content="What would be a good company name for a company that makes colorful socks?")]

chat(messages)

AIMessage(content='RainbowThreads')

#### Bonus: Using LLM as a question-answering model

In [8]:
text = """Question: What would be a good company name for a company that makes colorful socks?

Let's think step by step.

Answer: """

In [9]:
print(llm(text))


1. Brainstorm words related to socks: cozy, comfy, colorful, hosiery, foot, toes, etc.
2. Brainstorm words related to color: vivid, bright, rainbow, spectrum, etc.
3. Combine the two ideas: Vivid Toes, Bright Hosiery, Rainbow Socks, Colorful Comfort, etc.


### 1.2. Prompt Templates

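A prompt template is a reusable prompt with named placeholders (such as `{question}`) that are filled in at call time.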

In [10]:
from langchain.prompts import PromptTemplate

template = """Question: {question}

Let's think step by step, and then summarize the final answer in this format:

Answer: """

prompt = PromptTemplate(template=template, input_variables=["question"])

In [11]:
prompt_text = prompt.format(question="What is a good name for a company that makes video games")
print(prompt_text)

Question: What is a good name for a company that makes video games

Let's think step by step, and then summarize the final answer in this format:

Answer: 


In [12]:
print(llm(prompt_text))


Step 1: Brainstorm words related to video games. Examples: Play, Fun, Adventure, Challenge, Create, Digital, Entertainment.

Step 2: Combine words to create a unique name. Examples: Playful Entertainment, Digital Challenge, Create Adventure.

Step 3: Choose the best name.

Answer: Playful Entertainment.
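To make the idea concrete, here is a minimal plain-Python sketch of what filling a template amounts to. It is an illustration only, not LangChain's implementation; the helper name `format_prompt` is hypothetical.

```python
# Illustrative sketch: filling named placeholders with str.format.
# `format_prompt` is a hypothetical helper, not part of LangChain.
template = """Question: {question}

Let's think step by step, and then summarize the final answer in this format:

Answer: """


def format_prompt(template: str, **variables: str) -> str:
    """Replace every {placeholder} in the template with the given value."""
    return template.format(**variables)


prompt_text = format_prompt(
    template,
    question="What is a good name for a company that makes video games",
)
print(prompt_text)
```

The benefit of a template object over a bare string is mostly bookkeeping: it knows which variables it expects and can be reused with different inputs.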


### 1.3. Output Parser

An output parser converts the raw text returned by the model into a structured value, such as a Python dict.

In [13]:
customer_review = """\
This leaf blower is pretty amazing.  It has four settings:\
candle blower, gentle breeze, windy city, and tornado. \
It arrived in two days, just in time for my wife's \
anniversary present. \
I think my wife liked it so much she was speechless. \
So far I've been the only one using it, and I've been \
using it every other morning to clear the leaves on our lawn. \
It's slightly more expensive than the other leaf blowers \
out there, but I think it's worth it for the extra features.
"""

In [14]:
review_template = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

Format the output as JSON with the following keys:
gift
delivery_days
price_value

text: {text}
"""

In [15]:
prompt_template = PromptTemplate(template=review_template, input_variables=["text"],)

In [16]:
answer = llm(prompt_template.format(text=customer_review))
print(answer)



{
  "gift": true,
  "delivery_days": 2,
  "price_value": ["slightly more expensive than the other leaf blowers out there", "worth it for the extra features"]
}


In [17]:
# This raises an error because `answer` is a string, not a dict
answer['gift']

TypeError: ignored
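The error comes from indexing a string with a key. If the model happened to return bare JSON, one option would be to parse it with the standard library, as sketched below. This assumes the output is valid JSON, which nothing guarantees; the string is copied from the output above so the sketch is self-contained.

```python
import json

# A string shaped like the model output above (copied here for illustration).
answer = """
{
  "gift": true,
  "delivery_days": 2,
  "price_value": ["slightly more expensive than the other leaf blowers out there", "worth it for the extra features"]
}
"""

try:
    parsed = json.loads(answer)
except json.JSONDecodeError:
    # The model did not return valid JSON; handle that case explicitly.
    parsed = None

if parsed is not None:
    print(parsed["gift"])
    print(parsed["delivery_days"])
```

This works only as long as the model sticks to the format, which is the problem the output parsers in the next section try to address.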

#### Using output parsers

In [18]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

In [19]:
gift_schema = ResponseSchema(name="gift",
                             description="Was the item purchased\
                             as a gift for someone else? \
                             Answer True if yes,\
                             False if not or unknown.")
delivery_days_schema = ResponseSchema(name="delivery_days",
                                      description="How many days\
                                      did it take for the product\
                                      to arrive? If this \
                                      information is not found,\
                                      output -1.")
price_value_schema = ResponseSchema(name="price_value",
                                    description="Extract any\
                                    sentences about the value or \
                                    price, and output them as a \
                                    comma separated Python list.")

response_schemas = [gift_schema,
                    delivery_days_schema,
                    price_value_schema]

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [20]:
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"gift": string  // Was the item purchased                             as a gift for someone else?                              Answer True if yes,                             False if not or unknown.
	"delivery_days": string  // How many days                                      did it take for the product                                      to arrive? If this                                       information is not found,                                      output -1.
	"price_value": string  // Extract any                                    sentences about the value or                                     price, and output them as a                                     comma separated Python list.
}
```


In [21]:
review_template_with_instructions = """\
For the following text, extract the following information:

{format_instructions}

text: {text}
"""

In [22]:
prompt_template_with_output = PromptTemplate(template=review_template_with_instructions, input_variables=["text"], partial_variables={"format_instructions": format_instructions})

In [23]:
prompt_and_model = prompt_template_with_output | llm
output = prompt_and_model.invoke({"text": customer_review})

In [24]:
# Note: if the model's output does not follow the format instructions, parsing can raise an error.
# That is useful: we learn early that the model cannot do the task we asked for.

result = output_parser.invoke(output)
print(result)

{'gift': 'True', 'delivery_days': '2', 'price_value': "['slightly more expensive than the other leaf blowers out there', 'worth it for the extra features']"}


In [25]:
result["gift"]

'True'
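Note that every value in the parsed result is still a string (`'True'`, `'2'`, and a list rendered as text). A possible post-processing step, sketched here with a copy of the result shown above, converts them to Python types. The conversion rules are assumptions about this particular output, not something the parser does for us.

```python
import ast

# Copy of the parsed result shown above, so the sketch runs on its own.
result = {
    "gift": "True",
    "delivery_days": "2",
    "price_value": "['slightly more expensive than the other leaf blowers out there', 'worth it for the extra features']",
}

typed = {
    # Treat any casing of "true" as True, everything else as False.
    "gift": result["gift"].strip().lower() == "true",
    # Raises ValueError if the model returned something that is not an integer.
    "delivery_days": int(result["delivery_days"]),
    # literal_eval safely evaluates a Python literal such as a list of strings.
    "price_value": ast.literal_eval(result["price_value"]),
}
print(typed)
```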

### Bonus: Chaining Things Together

Because all of these objects implement the `Runnable` interface, they can be chained together.

More info: https://python.langchain.com/docs/expression_language/why
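As a rough mental model of how piping with `|` can work, here is a toy class that overloads Python's `__or__` operator. The `Step` class and the fake model are hypothetical and only illustrate the idea; LangChain's actual `Runnable` does considerably more.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


fill_prompt = Step(lambda inputs: f"Review: {inputs['text']}")
fake_llm = Step(lambda prompt: prompt.upper())  # stands in for a model call
parse = Step(lambda text: {"output": text})

toy_chain = fill_prompt | fake_llm | parse
print(toy_chain.invoke({"text": "great blower"}))
```

Each step only needs to agree with its neighbors on input and output types, which is why a prompt, a model, and a parser can be composed in one line.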

In [26]:
output_chain = prompt_template_with_output | llm | output_parser

In [27]:
output_chain.invoke({"text": customer_review})

{'gift': 'True',
 'delivery_days': '2',
 'price_value': "['slightly more expensive than the other leaf blowers out there', 'worth it for the extra features']"}

## 2. Chains

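A chain combines several components, such as a prompt template, a model, and an output parser, into a single callable pipeline.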

### Old way

LangChain provides `Chain` classes, such as `LLMChain`, that wire the components together for us.

In [28]:
from langchain.chains import LLMChain

llm_chain = LLMChain(prompt=prompt_template_with_output, llm=llm, output_parser=output_parser)

llm_chain.run(text=customer_review)

{'gift': 'True',
 'delivery_days': '2',
 'price_value': "['slightly more expensive than the other leaf blowers out there', 'worth it for the extra features']"}

### New way

LangChain Expression Language (LCEL) lets us compose any components that implement the `Runnable` interface by piping them together with `|`.

In [29]:
from langchain_core.runnables import RunnablePassthrough

chain = (
 { "text": RunnablePassthrough() }
 | prompt_template_with_output
 | llm
 | output_parser
)
chain.invoke(customer_review)

{'gift': 'True',
 'delivery_days': '2',
 'price_value': "['slightly more expensive than the other leaf blowers out there', 'worth it for the extra features']"}

## 3. Memory

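Models are stateless by default: each call only sees the prompt it receives. Memory stores earlier turns of the conversation so they can be added to later prompts.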

### Manipulating the memory

In [30]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")

In [31]:
print(memory.load_memory_variables({})['history'])

Human: hi!
AI: what's up?


In [32]:
memory.save_context({"input": "how yo doin'"}, {"output": "fine. thank you!"})

In [33]:
print(memory.load_memory_variables({})['history'])

Human: hi!
AI: what's up?
Human: how yo doin'
AI: fine. thank you!
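Conceptually, a buffer memory is just a list of turns that gets rendered into text for the prompt. The sketch below mimics the behaviour seen above with a hypothetical `BufferMemory` class; it is not LangChain's implementation.

```python
class BufferMemory:
    """Hypothetical minimal buffer: stores every turn and renders them as text."""

    def __init__(self) -> None:
        # Each entry is a (role, text) pair, kept in conversation order.
        self.turns = []

    def save_context(self, user_text: str, ai_text: str) -> None:
        self.turns.append(("Human", user_text))
        self.turns.append(("AI", ai_text))

    def history(self) -> str:
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


toy_memory = BufferMemory()
toy_memory.save_context("hi!", "what's up?")
toy_memory.save_context("how yo doin'", "fine. thank you!")
print(toy_memory.history())
```

Because the whole history is replayed into every prompt, this approach grows with the length of the conversation, which motivates the summarizing memory shown later.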


In [34]:
template = """You are a nice chatbot having a conversation with a human.

New human question: {question}
Response:"""
prompt = PromptTemplate.from_template(template)

#### Without memory

In [35]:
no_memory_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=False,
)

In [36]:
no_memory_chain({"question": "Hello, My name is Junior"})

{'question': 'Hello, My name is Junior',
 'text': ' Hi Junior, nice to meet you! How can I help you today?'}

In [37]:
no_memory_chain({"question": "I have just introduced myself. What is my name?"})

{'question': 'I have just introduced myself. What is my name?',
 'text': ' Nice to meet you! What is your name?'}

#### With memory - LLMChain

In [39]:
# Notice that "chat_history" is present in the prompt template
template = """You are a nice chatbot having a conversation with a human.

Previous conversation:
{chat_history}

New human question: {question}
Response:"""
prompt = PromptTemplate.from_template(template)
# Notice that we need to align the `memory_key`
memory = ConversationBufferMemory(memory_key="chat_history")
with_memory_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=False,
    memory=memory ## here - we are giving it a memory
)

In [40]:
with_memory_chain({"question": "ay yo!"})

{'question': 'ay yo!',
 'chat_history': '',
 'text': ' Hi there! How can I help you?'}

In [41]:
memory.save_context({"input": "how yo doin' My name is Junior. Nice to meet you."}, {"output": "Nice to meet you, Junior!. I am fine. Thank you!"})

In [42]:
with_memory_chain({"question": "what is my name again?"})

{'question': 'what is my name again?',
 'chat_history': "Human: ay yo!\nAI:  Hi there! How can I help you?\nHuman: how yo doin' My name is Junior. Nice to meet you.\nAI: Nice to meet you, Junior!. I am fine. Thank you!",
 'text': ' Your name is Junior.'}

#### With memory - ConversationChain

In [43]:
from langchain.chains import ConversationChain

conversation = ConversationChain(
    llm=llm,
    verbose=False,
)

In [44]:
conversation.predict(input="how yo doin' My name is Junior. Nice to meet you.")

" Hi Junior, nice to meet you too! I'm doing great, thank you for asking. How about you?"

In [45]:
conversation.predict(input="what is my name again?")

' Your name is Junior. Is there anything else I can help you with?'

### Bonus: Different types of Memory

More info: https://python.langchain.com/docs/modules/memory/types/

In [49]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationSummaryMemory(llm=llm) # Note that the ConversationSummaryMemory will need an LLM as an input to do the summarization
)

In [50]:
# Notice the prompt that ConversationChain formats for us

conversation.predict(input="how yo doin' My name is Junior. Nice to meet you.")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: how yo doin' My name is Junior. Nice to meet you.
AI:

> Finished chain.


" Hi Junior, nice to meet you too! I'm doing great, thank you for asking. How about you?"

In [51]:
conversation.predict(input="I like the color red. My favorite subject is Math.")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

The human introduces themselves as Junior and the AI responds, introducing itself and asking how Junior is doing.
Human: I like the color red. My favorite subject is Math.
AI:

> Finished chain.


" Hi Junior, I'm AI. Nice to meet you. How are you doing? I like the color red too. Math is a great subject. What do you like about it?"

In [52]:
conversation.predict(input="Can you guess my favorite subject?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

The human introduces themselves as Junior and the AI responds, introducing itself and asking how Junior is doing. Junior likes the color red and their favorite subject is Math, which the AI also likes. The AI then asks Junior what they like about Math.
Human: Can you guess my favorite subject?
AI:

> Finished chain.


' Yes, I believe your favorite subject is Math. Is that correct?'

## 4. Retriever (RAG)

More info: https://python.langchain.com/docs/modules/data_connection/

In [53]:
import requests

url = "https://raw.githubusercontent.com/hwchase17/chat-your-data/master/state_of_the_union.txt"
res = requests.get(url)
with open("state_of_the_union.txt", "w") as f:
  f.write(res.text)

In [54]:
# Document Loader
from langchain.document_loaders import TextLoader
loader = TextLoader('./state_of_the_union.txt')
documents = loader.load()

In [55]:
# Text Splitter
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
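The splitting step can be pictured as: break the text on a separator, then greedily merge neighbouring pieces until the next one would exceed `chunk_size`. The function below is a simplified sketch of that idea (the name `split_text` and the exact merging rule are assumptions; the real `CharacterTextSplitter` also handles overlap and other details).

```python
from typing import List


def split_text(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Split on `separator`, then merge pieces greedily up to `chunk_size` characters."""
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Still fits (or the chunk is empty, so an oversized piece stays whole).
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc\n\ndddd"
print(split_text(sample, chunk_size=10))
```

Smaller chunks give more precise retrieval but less context per chunk, so `chunk_size` is usually tuned per dataset.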

#### Hugging Face Embeddings

In [None]:
%pip install sentence_transformers faiss-cpu

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()

In [None]:
from langchain.vectorstores import FAISS

db = FAISS.from_documents(docs, embeddings)

In [None]:
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.


#### OpenAI Embeddings

In [56]:
!pip install chromadb==0.4.19 tiktoken==0.5.2

[0m

In [57]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In [60]:
from langchain.vectorstores import Chroma

db = Chroma.from_documents(docs, embeddings)

In [61]:
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.


#### Retrieval

In [62]:
retriever = db.as_retriever()

In [63]:
template = """Answer the question based only on the following context:

{context}

Question: {question}

Let's think step by step.

Answer:
"""

prompt = PromptTemplate.from_template(template)

def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm

)
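The dictionary at the head of the chain runs both branches on the same input: one retrieves and formats the context, the other passes the question through unchanged. Written out in plain Python with a fake retriever, the data flow looks roughly like this. The names `fake_retriever` and `build_prompt` are hypothetical and the returned documents are made up.

```python
TOY_TEMPLATE = """Answer the question based only on the following context:

{context}

Question: {question}

Let's think step by step.

Answer:
"""


def fake_retriever(question):
    """Stand-in for a real retriever; always returns the same made-up chunks."""
    return ["Chunk one about the court.", "Chunk two about the nominee."]


def join_docs(docs):
    return "\n\n".join(docs)


def build_prompt(question):
    # Equivalent of {"context": retriever | format_docs, "question": passthrough}
    inputs = {
        "context": join_docs(fake_retriever(question)),
        "question": question,
    }
    return TOY_TEMPLATE.format(**inputs)


toy_prompt = build_prompt("Who is Ketanji Brown Jackson?")
print(toy_prompt)
```

The resulting string is what the model receives, so the answer can only be as good as the retrieved chunks.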

In [64]:
answer = chain.invoke("Who is Ketanji Brown Jackson?")
print(answer)

Ketanji Brown Jackson is a Circuit Court of Appeals Judge who was nominated by the President 4 days ago to serve on the United States Supreme Court. She is a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. She has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.


In [67]:
import langchain

langchain.debug = True
chain.invoke("What does the speech say about Russia?")
langchain.debug = False

[chain/start] [1:chain:RunnableSequence] Entering Chain run with input:
{
  "input": "What does the speech say about Russia?"
}
[chain/start] [1:chain:RunnableSequence > 2:chain:RunnableParallel] Entering Chain run with input:
{
  "input": "What does the speech say about Russia?"
}
[chain/start] [1:chain:RunnableSequence > 2:chain:RunnableParallel > 3:chain:RunnableSequence] Entering Chain run with input:
{
  "input": "What does the speech say about Russia?"
}
[chain/start] [1:chain:RunnableSequence > 2:chain:RunnableParallel > 4:chain:RunnablePassthrough] Entering Chain run with input:
{
  "input": "What does the speech say about Russia?"
}
[chain/end] [1:chain:RunnableSequence > 2:chain:RunnableParallel > 4:chain:RunnablePassthrough] [1ms] Exiting Chain run with output:
{
  "output": "What does the speech say about Russia?"
}
[chain/start]

## 5. Evaluation

More info: https://python.langchain.com/docs/guides/evaluation/


### Generating test datasets & evaluate its accuracy

In [68]:
!pip install langchain[docarray]==0.0.349

Collecting docarray[hnswlib]<0.33.0,>=0.32.0 (from langchain[docarray]==0.0.349)
  Downloading docarray-0.32.1-py3-none-any.whl (215 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 215.3/215.3 kB 1.9 MB/s eta 0:00:00
Collecting orjson>=3.8.2 (from docarray[hnswlib]<0.33.0,>=0.32.0->langchain[docarray]==0.0.349)
  Downloading orjson-3.9.10-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (138 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 138.7/138.7 kB 11.4 MB/s eta 0:00:00
Collecting types-requests>=2.28.11.6 (from docarray[hnswlib]<0.33.0,>=0.32.0->langchain[docarray]==0.0.349)
  Downloading types_requests-2.31.0.10-py3-none-any.whl (14 kB)
Collecting hnswlib>=0.6.2 (from docarray[hnswlib]<0.33.0,>=0.32.0->langchain[docarray]==0.0.349)
  Downloading hnswlib-0.8.0.tar.gz (36 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  

In [84]:
import requests

url = "https://raw.githubusercontent.com/Ryota-Kawamura/LangChain-for-LLM-Application-Development/main/OutdoorClothingCatalog_1000.csv"
res = requests.get(url)
with open("OutdoorClothingCatalog_1000.csv", "w") as f:
  f.write(res.text)

In [85]:
openai_llm = OpenAI(temperature=0)
hf_llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
)



In [86]:
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.chains import RetrievalQA

file = "./OutdoorClothingCatalog_1000.csv"
loader = CSVLoader(file_path=file)
data = loader.load()

index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [87]:
data[10]

Document(page_content=": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.", metadata={'source': './OutdoorClothingCatalog_1000.csv', 'row': 10})

In [88]:
data[11]

Document(page_content=': 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion. With a slightly fitted style that falls at the hip and best with a midweight layer, this jacket is suitable for light activity up to 20° and moderate activity up to -30°. The soft and durable 100% polyester shell offers complete windproof protection and is insulated with warm, lofty goose down. Other features include welded baffles for a no-stitch construction and excellent stretch, an adjustable hood, an interior media port and mesh stash pocket and a hem drawcord. Machine wash and dry. Imported.', metadata={'source': './OutdoorClothingCatalog_1000.csv', 'row': 11})

In [90]:
test_data_examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]

# But how can we automate the generation of these questions and answers?

In [91]:
from langchain.evaluation.qa import QAGenerateChain

example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())

In [98]:
generated_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)



In [99]:
generated_examples[0]

{'qa_pairs': {'query': "What is the weight of one pair of Women's Campside Oxfords?",
  'answer': "The approximate weight of one pair of Women's Campside Oxfords is 1 lb. 1 oz."}}

In [94]:
data[0]

Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': './OutdoorClothingCatalog_1000.csv', 'row': 0})

In [95]:
# Define the model under test

qa = RetrievalQA.from_chain_type(
    llm=hf_llm,
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)

In [100]:
qa.run(test_data_examples[0]["query"])

> Entering new RetrievalQA chain...

> Finished chain.


'yes'

In [101]:
test_data = []
for example in generated_examples:
  test_data.append(example['qa_pairs'])

test_data

[{'query': "What is the weight of one pair of Women's Campside Oxfords?",
  'answer': "The approximate weight of one pair of Women's Campside Oxfords is 1 lb. 1 oz."},
 {'query': 'What are the dimensions of the small and medium sizes of the Recycled Waterhog Dog Mat, Chevron Weave?',
  'answer': 'The small size of the Recycled Waterhog Dog Mat, Chevron Weave has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 34.5".'},
 {'query': "What are some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece?",
  'answer': "Some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece include bright colors, ruffles, exclusive whimsical prints, four-way-stretch and chlorine-resistant fabric, UPF 50+ rated fabric for sun protection, crossover no-slip straps, fully lined bottom for a secure fit and maximum coverage, and it can be machine washed and line dried for best results."},
 {'query': "What is the composition of the swimtop's fa

In [102]:
predictions = qa.apply(test_data)
predictions

> Entering new RetrievalQA chain...

> Finished chain.

> Entering new RetrievalQA chain...

> Finished chain.

> Entering new RetrievalQA chain...

> Finished chain.

> Entering new RetrievalQA chain...

> Finished chain.

> Entering new RetrievalQA chain...

> Finished chain.


[{'query': "What is the weight of one pair of Women's Campside Oxfords?",
  'answer': "The approximate weight of one pair of Women's Campside Oxfords is 1 lb. 1 oz.",
  'result': '1 lb.1 oz.'},
 {'query': 'What are the dimensions of the small and medium sizes of the Recycled Waterhog Dog Mat, Chevron Weave?',
  'answer': 'The small size of the Recycled Waterhog Dog Mat, Chevron Weave has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 34.5".',
  'result': '18" x 28" and 22.5" x 34.5"'},
 {'query': "What are some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece?",
  'answer': "Some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece include bright colors, ruffles, exclusive whimsical prints, four-way-stretch and chlorine-resistant fabric, UPF 50+ rated fabric for sun protection, crossover no-slip straps, fully lined bottom for a secure fit and maximum coverage, and it can be machine washed and line dried for bes

`answer` is the expected (ground-truth) answer to the question.

`result` is what the model under test predicts.

Notice that `answer` and `result` are not an exact 1-to-1 match, BUT their content may still say the same thing.

This is why we need **ANOTHER** LLM to judge whether the expected answer and the predicted result mean the same thing.
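To see why plain string comparison falls short, here is a minimal sketch that uses the first two answer/result pairs from the output above. The helpers `normalize`, `exact_match`, and `naive_contains` are hypothetical names made up for this illustration; they are not part of LangChain.

```python
# Reference answers and model predictions copied from the output above.
pairs = [
    (
        "The approximate weight of one pair of Women's Campside Oxfords is 1 lb. 1 oz.",
        "1 lb.1 oz.",
    ),
    (
        'The small size of the Recycled Waterhog Dog Mat, Chevron Weave has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 34.5".',
        '18" x 28" and 22.5" x 34.5"',
    ),
]


def normalize(text: str) -> str:
    """Lowercase and drop all whitespace (hypothetical helper, not a LangChain API)."""
    return "".join(text.lower().split())


def exact_match(answer: str, result: str) -> bool:
    """Strict string equality between the reference answer and the prediction."""
    return answer == result


def naive_contains(answer: str, result: str) -> bool:
    """Check whether the normalized prediction appears inside the normalized answer."""
    return normalize(result) in normalize(answer)


exact = [exact_match(a, r) for a, r in pairs]
contains = [naive_contains(a, r) for a, r in pairs]

print("Exact match:   ", exact)
print("Naive contains:", contains)
```

Exact matching rejects both pairs, and the substring heuristic accepts only the first one, even though both predictions carry the same information as their reference answers. Rule-based checks like these tend to be brittle, which is the motivation for grading with an LLM.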

In [103]:
from langchain.evaluation.qa import QAEvalChain

eval_chain = QAEvalChain.from_llm(openai_llm)

In [104]:
graded_outputs = eval_chain.evaluate(test_data, predictions)

In [105]:
graded_outputs[0]

{'results': ' CORRECT'}
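Once every example has a verdict, it can be handy to reduce them to a single accuracy number. The sketch below assumes each graded output is a dict shaped like the one above, with a `results` key holding the verdict text. The `sample_graded` list is made-up illustrative data (including the ` INCORRECT` entry), and `is_correct` and `accuracy` are hypothetical helpers, not LangChain functions.

```python
# Made-up grader outputs in the same shape as `graded_outputs[0]` above.
# The INCORRECT entry is invented purely for illustration.
sample_graded = [
    {"results": " CORRECT"},
    {"results": " CORRECT"},
    {"results": " INCORRECT"},
]


def is_correct(graded: dict) -> bool:
    """Return True when the verdict text is exactly CORRECT.

    The verdict is compared for equality after stripping whitespace,
    because a substring test would also match "INCORRECT".
    """
    return graded["results"].strip().upper() == "CORRECT"


def accuracy(graded_list: list) -> float:
    """Fraction of graded outputs whose verdict is CORRECT (0.0 for an empty list)."""
    if not graded_list:
        return 0.0
    return sum(is_correct(g) for g in graded_list) / len(graded_list)


print(f"Accuracy: {accuracy(sample_graded):.2f}")
```

With real data, the same call would be `accuracy(graded_outputs)`, assuming the grader keeps returning verdicts in this format.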

In [106]:
# Walk through each prediction together with the verdict from the evaluator LLM
for i, (prediction, graded) in enumerate(zip(predictions, graded_outputs)):
    print(f"Example {i}:")
    print("Question: " + prediction['query'])
    print("Real Answer: " + prediction['answer'])
    print("Predicted Answer: " + prediction['result'])
    print("Verdict: " + graded['results'])
    print()

Example 0:
Question: What is the weight of one pair of Women's Campside Oxfords?
Real Answer: The approximate weight of one pair of Women's Campside Oxfords is 1 lb. 1 oz.
Predicted Answer: 1 lb.1 oz.
Verdict:  CORRECT

Example 1:
Question: What are the dimensions of the small and medium sizes of the Recycled Waterhog Dog Mat, Chevron Weave?
Real Answer: The small size of the Recycled Waterhog Dog Mat, Chevron Weave has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 34.5".
Predicted Answer: 18" x 28" and 22.5" x 34.5"
Verdict:  CORRECT

Example 2:
Question: What are some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece?
Real Answer: Some features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece include bright colors, ruffles, exclusive whimsical prints, four-way-stretch and chlorine-resistant fabric, UPF 50+ rated fabric for sun protection, crossover no-slip straps, fully lined bottom for a secure fit and maximum co

### Bonus: Evaluator

In [None]:
from langchain.evaluation import load_evaluator
from langchain.evaluation import EvaluatorType

evaluator = load_evaluator(EvaluatorType.CRITERIA, criteria="conciseness")

eval_result = evaluator.evaluate_strings(
    prediction="What's 2+2? That's an elementary question. The answer you're looking for is that two and two is four.",
    input="What's 2+2?",
)

print(f'Evaluation value (Y/N): {eval_result["value"]}')
print(f'Evaluation score: {eval_result["score"]}')
print(f'Evaluation reasoning: {eval_result["reasoning"]}')

Evaluation value (Y/N): N
Evaluation score: 0
Evaluation reasoning: The criterion is conciseness, which means the submission should be brief and to the point. 

Looking at the submission, the answer to the question "What's 2+2?" is given as "The answer you're looking for is that two and two is four." However, before providing the answer, the respondent adds an unnecessary comment: "That's an elementary question." This comment does not contribute to answering the question and thus makes the response less concise.

Therefore, the submission does not meet the criterion of conciseness.

N


In [None]:
eval_result = evaluator.evaluate_strings(
    prediction="four.",
    input="What's 2+2?",
)

print(f'Evaluation value (Y/N): {eval_result["value"]}')
print(f'Evaluation score: {eval_result["score"]}')
print(f'Evaluation reasoning: {eval_result["reasoning"]}')

Evaluation value (Y/N): Y
Evaluation score: 1
Evaluation reasoning: The criterion is conciseness, which means the submission should be brief and to the point. 

Looking at the submission, the answer to the question "What's 2+2?" is given as "four." This is a direct and succinct response to the question. 

The submission does not include any unnecessary information or details, making it concise. 

Therefore, the submission meets the criterion of conciseness. 

Y


#### A different type of evaluator -- Labeled Criteria

In [None]:
evaluator = load_evaluator("labeled_criteria", criteria="correctness")

# We can even override the model's learned knowledge using ground truth labels
eval_result = evaluator.evaluate_strings(
    input="What is the capital of the US?",
    prediction="Bangkok",
    reference="The capital of the US is Washington D.C.",
)

print(f'Evaluation value (Y/N): {eval_result["value"]}')
print(f'Evaluation score: {eval_result["score"]}')
print(f'Evaluation reasoning: {eval_result["reasoning"]}')

Evaluation value (Y/N): N
Evaluation score: 0
Evaluation reasoning: The criterion for this task is the correctness of the submitted answer. The input asks for the capital of the US. The submitted answer is "Bangkok", which is incorrect as the capital of the US is Washington D.C., as stated in the reference. Therefore, the submission does not meet the criterion of correctness.

N


In [None]:
evaluator = load_evaluator("labeled_criteria", criteria="correctness")

# Same evaluator and reference, but this time the prediction matches the ground truth label
eval_result = evaluator.evaluate_strings(
    input="What is the capital of the US?",
    prediction="Washington D.C.",
    reference="The capital of the US is Washington D.C.",
)

print(f'Evaluation value (Y/N): {eval_result["value"]}')
print(f'Evaluation score: {eval_result["score"]}')
print(f'Evaluation reasoning: {eval_result["reasoning"]}')

Evaluation value (Y/N): Y
Evaluation score: 1
Evaluation reasoning: The criterion for this task is the correctness of the submitted answer. This involves checking if the submission is accurate, factual, and directly answers the given input.

The input asks for the capital of the US. The submitted answer is Washington D.C.

Comparing this with the reference answer, which is also Washington D.C., it is clear that the submitted answer is correct. It is factual and accurate, as Washington D.C. is indeed the capital of the US.

Therefore, the submission meets the criterion.

Y
