# LangChain Use Cases

From the tutorial [gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials/blob/main/LangChain%20Cookbook%20Part%202%20-%20Use%20Cases.ipynb)

### Table of Contents
- [Summarization](#summarization)
- [Q&A over documents](#qa-over-documents)
- [Extraction](#extraction)
- [Evaluation](#evaluation)
- Querying tabular data
- Code understanding
- Interacting with APIs
- Chatbots
- Agents


There are also some wonderful projects & use cases: https://github.com/gkamradt/langchain-tutorials




### Preparation

For now, I am using the OpenAI API (with the key stored in the `OPENAI_API_KEY` environment variable) while learning LangChain.

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
 
# Set the OPENAI_API_KEY if not yet
# openai_api_key = os.getenv('OPENAI_API_KEY', 'YourAPIKeyIfNotSet')

openai_api_key = os.getenv('OPENAI_API_KEY')
# print(openai_api_key)

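ModuleNotFoundError: No module named 'dotenv'

This error means the `python-dotenv` package is not installed; install it with `pip install python-dotenv` and rerun the cell.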

### Summarization

Use LLMs to summarize large amounts of information, such as texts (books, articles, documents in any field), tables, financial reports, user manuals, podcasts, online posts, etc.

#### 1. Summaries of short text

Simply prompt the model with instructions:


In [13]:
from langchain.llms import OpenAI
from langchain import PromptTemplate

# Init a model: specify the model name and pass in the key
# text-davinci-003 is already the default model, actually
llm = OpenAI(
    temperature=0, model_name="text-davinci-003", openai_api_key=openai_api_key
)

# Create the template
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

# Create a PromptTemplate object for later use, from the template and its `text` variable
prompt = PromptTemplate(input_variables=["text"], template=template)

confusing_text = """
Shortly after graduating from Cambridge, he got a job as a naturalist on the ship H.M.S. Beagle, which was about to start on a scientific and cartographic survey of the South American coast. The journey started in December 1831 and was to last almost five years, during which time he amassed considerable documentation.
Darwin was particularly struck by the fauna of the south seas, notably by the tortoises he found on the Galapagos Islands, a group of Pacific Islands where nature seemed different from nature in other lands. The Galapagos tortoises, Darwin observed, differed from island to island, and this, he deduced, implied different forms of evolution, since the animals obviously came from the same origins. Darwin was also struck by the iguanas he found, and observed that those which lived in water had heads suitable for finding food among stones, whilst those that lived on land had a sharper profile, more fit for a herbivorous animal.
"""


In [14]:

print("===== Prompt Shows =====")

final_prompt=prompt.format(text=confusing_text)
print(final_prompt)

print("===== Prompt Ends =====")


===== Prompt Shows =====

%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:

Shortly after graduating from Cambridge, he got a job as a naturalist on the ship H.M.S. Beagle, which was about to start on a scientific and cartographic survey of the South American coast. The journey started in December 1831 and was to last almost five years, during which time he amassed considerable documentation.
Darwin was particularly struck by the fauna of the south seas, notably by the tortoises he found on the Galapagos Islands, a group of Pacific Islands where nature seemed different from nature in other lands. The Galapagos tortoises, Darwin observed, differed from island to island, and this, he deduced, implied different forms of evolution, since the animals obviously came from the same origins. Darwin was also struck by the iguanas he found, and observed that those which lived in water had heads suitable for finding food among stones, whilst those that lived on land had a sharper profile, more fit for a herbivorous animal.

===== Prompt Ends =====

In [15]:
output=llm(final_prompt)
print(output)



Darwin was a scientist who went on a long trip on a boat called the H.M.S. Beagle. He went to the Galapagos Islands and saw lots of animals like tortoises and iguanas. He noticed that the animals were different from island to island, which made him think that they had changed over time. He also noticed that the iguanas had different shaped heads depending on where they lived.


#### 2. Summaries of long text


Longer texts can run into token limits, which cap how much text you can pass to the model at once. To work around this, we can use LangChain tools such as text splitters and chains.


In [18]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

with open('theheartofabrokenstory.txt','r') as file:
    text=file.read()
print(text[-300:])

 Horgenschlag knew it, Doris Hillman and things were filing away Shirley Lester in the back of his mind. And Shirley Lester, the thought of her, no longer was available.  And that’s why I never wrote a boy-meets-girl story for Collier’s. In a boy-meets-girl story the boy should always meet the girl.


In [19]:
num_token=llm.get_num_tokens(text)

print(f'There are {num_token} tokens in this file')

There are 4472 tokens in this file


As seen above, 4,472 tokens is more than text-davinci-003 can take in one request (its context window is about 4,097 tokens, shared between prompt and completion). So the text is split, or "chunked", into smaller parts. There are many ways to do this ([read about them here](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html)); this doc uses `RecursiveCharacterTextSplitter()`.
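To make the size/overlap idea concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is a simplification written for illustration (the helper name `chunk_text` is hypothetical): the real `RecursiveCharacterTextSplitter` additionally tries the separators `"\n\n"`, `"\n"`, `" "`, `""` in order so that it prefers to break at paragraph or line boundaries, which this sketch ignores.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    # Each window starts chunk_size - chunk_overlap characters after the previous one,
    # so neighboring chunks share chunk_overlap characters
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence cut at a boundary still has some surrounding context in the next chunk.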

In [38]:
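# ["\n\n", "\n", " ", ""] is the default list of separators; passing it explicitly proved necessary here
# chunk_size and chunk_overlap are measured in characters (the default length function is len)
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n", " ", ""], chunk_size=5000, chunk_overlap=350)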

docs = text_splitter.create_documents([text])

print(f'Now we have {len(docs)} docs for the original text file; e.g., the first chunk contains {llm.get_num_tokens(str(docs[0]))} tokens.')


Now we have 4 docs for the original text file; e.g., the first chunk contains 1027 tokens.


Next, use a **chain** to combine them. Here we use the `map_reduce` chain type, which first summarizes each chunk on its own (map) and then summarizes those partial summaries into one concise final summary (reduce). See the prompt that `map_reduce` uses [here.](https://github.com/hwchase17/langchain/blob/master/langchain/chains/summarize/map_reduce_prompt.py)
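The map and reduce steps can be sketched in plain Python. This is an illustrative sketch, not LangChain's implementation: `map_reduce_summarize` is a hypothetical helper, and `first_sentence` is a toy stand-in for an LLM call so the example runs without an API key.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    # Map: summarize every chunk independently (these calls could run in parallel)
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce: join the partial summaries and summarize them one more time
    return summarize("\n".join(partial_summaries))


def first_sentence(text: str) -> str:
    # Toy "summarizer" (hypothetical stand-in for an LLM): keep only the first sentence
    return text.split(".")[0].strip() + "."


chunks = [
    "Darwin sailed on the Beagle. The voyage took almost five years.",
    "He studied Galapagos tortoises. They differed from island to island.",
]
print(map_reduce_summarize(chunks, first_sentence))  # Darwin sailed on the Beagle.
```

With N chunks this sketch makes N + 1 summarizer calls; in the real chain each of those is an LLM request.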

For more on the different chain types and workarounds for token limits, watch [this video](https://www.youtube.com/watch?v=f9_BWhCI4Zo).

In [39]:
chain = load_summarize_chain(llm=llm, chain_type="map_reduce", verbose=True)

output = chain.run(docs)
print(output)




> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"The Heart of a Broken Story Esquire XVI, September 1941  The only real difficulty in concocting a boy-meets-girl story is that, somehow, he must EVERY day Justin Horgenschlag, thirty-dollar-a-week printer’s assistant, saw at close quarters approximately sixty women whom he had never seen before. Thus in the few years he had lived in New York, Horgenschlag had seen at close quarters about 75,120 different women. Of these 75,120 women, roughly 25,000 were under thirty years of age and over fifteen years of age. Of the 25,000 only 5,000 weighed between one hundred five and one hundred twenty-five pounds. Of these 5,000 only 1,000 were not ugly. Only 500 were reasonably attractive; only 100 of these were quite attractive; only 25 could have inspired a long, slow whistle. And with only 1 did Horgenschlag fall 

### Q&A over documents

[Official documentation about this.](https://python.langchain.com/en/latest/use_cases/question_answering.html)

The two basic components:
1. Tell it the context around the question.
2. Tell it the question.

In other words, `llm(your context + your question) = your answer`.
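Concatenating the two strings works, but labeling each part tends to make the prompt easier for the model to read. A minimal sketch, assuming plain string formatting; the `build_qa_prompt` helper and its `Context:`/`Question:` labels are my own choice, not a LangChain template.

```python
def build_qa_prompt(context: str, question: str) -> str:
    # Label the context and the question and separate them with blank lines,
    # so the question does not run straight into the last sentence of the context
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


print(build_qa_prompt(
    "re.split() splits a string on a regular-expression pattern.",
    "What should I use to split text with regular expressions?",
))
```

`llm(build_qa_prompt(context, question))` would then take the place of a bare `context + question` call.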

Good use cases include chatting with your own documents, asking questions about academic papers, creating study guides, and looking up medical reference information.

#### 1. Simple Q&A

Simple as the explained two components above.


In [42]:
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

context="""
split(): This method is available for strings in Python and is used to split a string into a list of substrings based on a delimiter. The delimiter is specified as an argument to the split() method. Some special features of split() include: It can split a string into substrings based on whitespace by default.
You can specify a custom delimiter, such as a specific character or sequence of characters.
You can limit the number of splits using the optional maxsplit parameter.

splitlines(): This method is also available for strings in Python and is used to split a string into a list of lines. It recognizes different line-ending characters such as \\n, \\r, or \\r\\n. Some special features of splitlines() include: It splits the string at line boundaries.
By default, it removes the line-ending characters from the resulting list of lines. You can preserve them by passing True as an argument.

re.split(): This method is part of the re module in Python and provides more advanced splitting capabilities using regular expressions. Some special features of re.split() include: It allows you to split a string based on a complex pattern defined by a regular expression.
You can use capture groups in the regular expression to include the delimiter in the resulting list.
It provides more flexibility in handling various splitting scenarios."""

question = "What should I use when I want to split text involving regular expressions?"

# Separate the context and the question with a blank line so they don't run together
output = llm(context + "\n\n" + question)

# Strip the leading and trailing whitespace from the output
print(output.strip())

If you want to split text involving regular expressions, you should use the re.split() method from the re module in Python. This method provides more advanced splitting capabilities using regular expressions and allows you to split a string based on a complex pattern defined by a regular expression. You can also use capture groups in the regular expression to include the delimiter in the resulting list.


Things get harder when you first have to select the right data to put into the context. That step is called "[document retrieval](https://python.langchain.com/en/latest/modules/indexes/retrievers.html)" and is closely related to AI memory.

#### 2. Using embeddings

This means using embeddings to find answers in a longer text: split the text into chunks, embed the chunks, and store them in a database for querying. [Here is a tutorial that does this for a book](https://www.youtube.com/watch?v=h0DHDp1FbmQ).

A key point is selecting relevant chunks by **pulling similar texts based on comparisons of vector embeddings**.
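The "pull similar texts" step boils down to ranking chunk vectors by how close they are to the query vector. Below is a toy sketch with made-up 3-dimensional vectors (real sentence embeddings such as `all-mpnet-base-v2` have 768 dimensions) using cosine similarity. As far as I know, LangChain's `FAISS` wrapper ranks by Euclidean (L2) distance by default; for normalized vectors the two give the same ordering. The chunk ids and vectors are hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # cos(a, b) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query: List[float], chunk_vectors: Dict[str, List[float]], k: int = 2) -> List[str]:
    # Rank chunk ids by similarity to the query, most similar first
    ranked = sorted(
        chunk_vectors,
        key=lambda chunk_id: cosine_similarity(query, chunk_vectors[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up embeddings for three hypothetical chunks
chunk_vectors = {
    "justin_job": [0.9, 0.1, 0.0],
    "new_york": [0.1, 0.9, 0.0],
    "weather": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "What was Justin's occupation?"
print(top_k_chunks(query, chunk_vectors))  # ['justin_job', 'new_york']
```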

*Now I am going to use self-hosted models.*

**Update: So far the local models have serious performance and accuracy issues, and more study is required. The OpenAI API, by contrast, gives a smooth experience with zero issues. Strange; the performance gap should not be this big.**

In [99]:
from langchain import OpenAI

# The vectorstore we'll be using
from langchain.vectorstores import FAISS

# The LangChain component we'll use to get the documents
from langchain.chains import RetrievalQA
from langchain.chains.question_answering import load_qa_chain

# The easy document loader for text
from langchain.document_loaders import TextLoader

# The embedding engine that will convert our text to vectors, doc: https://api.python.langchain.com/en/latest/modules/embeddings.html#langchain.embeddings.HuggingFaceEmbeddings.embed_documents
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.embeddings.openai import OpenAIEmbeddings


# Loading the text splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

# For loading local models
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import os

model_path="/Users/hann/Projects/RES_LLM/private-llm/models/"
#  models are ggml-mpt-7b-chat.bin, ggml-gpt4all-l13b-snoozy.bin, ggml-gpt4all-j-v1.3-groovy.bin, ggml-mpt-7b-instruct.bin, ggml-v3-13b-hermes-q5_1.bin
models={
    "groovy": os.path.join(model_path, "ggml-gpt4all-j-v1.3-groovy.bin"),
    "snoozy": os.path.join(model_path, "ggml-gpt4all-l13b-snoozy.bin"),
    "7bchat": os.path.join(model_path, "ggml-mpt-7b-chat.bin"),
    "7binstruct": os.path.join(model_path, "ggml-mpt-7b-instruct.bin"),
    "hermes": os.path.join(model_path, "ggml-v3-13b-hermes-q5_1.bin")
}

#  "groovy", "snoozy",  "7bchat", "7binstruct", "hermes"
md = "7bchat"
llm = GPT4All(model=models[md], callbacks=[StreamingStdOutCallbackHandler()])
# llm = OpenAI(temperature=0, openai_api_key=openai_api_key)



Found model file at  /Users/hann/Projects/RES_LLM/private-llm/models/ggml-mpt-7b-chat.bin
mpt_model_load: loading model from '/Users/hann/Projects/RES_LLM/private-llm/models/ggml-mpt-7b-chat.bin' - please wait ...
mpt_model_load: n_vocab        = 50432
mpt_model_load: n_ctx          = 2048
mpt_model_load: n_embd         = 4096
mpt_model_load: n_head         = 32
mpt_model_load: n_layer        = 32
mpt_model_load: alibi_bias_max = 8.000000
mpt_model_load: clip_qkv       = 0.000000
mpt_model_load: ftype          = 2
mpt_model_load: ggml ctx size = 5653.09 MB
mpt_model_load: kv self size  = 1024.00 MB
mpt_model_load: ........................ done
mpt_model_load: model size =  4629.02 MB / num tensors = 194


In [100]:
# Document loading
loader = TextLoader('theheartofabrokenstory.txt')
doc = loader.load()
print (f"You have {len(doc)} document")
print (f"You have {len(doc[0].page_content)} characters in that document")

You have 1 document
You have 16523 characters in that document


In [101]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=300)
docs = text_splitter.split_documents(doc)

# Get the total number of characters so we can see the average later
num_total_characters = sum([len(x.page_content) for x in docs])

print (f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")

Now you have 12 documents that have an average of 1,599 characters (smaller pieces)


In [102]:
# Loading embedding engine
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

from dotenv import load_dotenv
load_dotenv()
openai_api_key = os.getenv('OPENAI_API_KEY')
# print(openai_api_key)

# embeddings_openai = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Embed the chunks and store them with the raw text in an in-memory FAISS index. With the HuggingFace embeddings above this runs locally; OpenAIEmbeddings would make API calls to OpenAI instead
docsearch = FAISS.from_documents(docs, embeddings)

Now create the retrieval engine and ask questions. The retriever will get similar documents and combine the question with the context. Then the LLM will answer the question.

**The chain from the original tutorial has a problem (or is outdated); here is a working version:**

In [103]:
# qa = RetrievalQA.from_chain_type(llm=llm, chain_type="map_reduce", retriever=docsearch.as_retriever())
qa_chain = load_qa_chain(llm, chain_type="stuff")
qa = RetrievalQA(combine_documents_chain=qa_chain, retriever=docsearch.as_retriever())
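The `"stuff"` chain type used here simply places ("stuffs") all retrieved chunks into one prompt and makes a single LLM call, whereas `map_reduce` calls the LLM once per chunk plus once more to combine the answers. The trade-off is that everything must fit into the model's context window, which the load log above reports as `n_ctx = 2048` tokens for the local MPT model. A rough sketch of the prompt assembly; the `stuff_prompt` helper and its wording are assumptions loosely modeled on a typical QA prompt, not copied from LangChain.

```python
from typing import List


def stuff_prompt(retrieved_chunks: List[str], question: str) -> str:
    # Join every retrieved chunk into one context block, separated by blank lines
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    # One prompt, therefore one LLM call, no matter how many chunks were retrieved
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


example_prompt = stuff_prompt(
    ["Justin Horgenschlag was a printer's assistant.", " He lived in New York. "],
    "What was the occupation of Justin?",
)
print(example_prompt)
```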


In [104]:
# query = "What does the author describe as good work?"
qa.run("What was the occupation of Justin?")

 The man described in these extracts is an office worker.  Justin's job title appears as " head printer" and we can infer from this that his profession or career must have been something related

' The man described in these extracts is an office worker.  Justin\'s job title appears as " head printer" and we can infer from this that his profession or career must have been something related'

"If you wanted to do more, you would hook this up to a cloud vector database, use a tool like Metal, and start managing your documents with external data sources."

### Extraction

[Official documentation about this.](https://python.langchain.com/en/latest/use_cases/extraction.html)

Extraction is the process of parsing data from a piece of text. This is commonly used with output parsing in order to structure our data.

For example: extracting a structured row from a sentence to insert into a database, extracting multiple rows from a long document to insert into a database, or extracting parameters from a user query to make an API call.

A popular library for extraction is [Kor](https://eyurtsev.github.io/kor/); check it out for more advanced usage.

In [40]:
# To help construct our Chat Messages
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate

# We will be using a chat model, defaults to gpt-3.5-turbo
from langchain.chat_models import ChatOpenAI

# To parse outputs and get structured data back
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

# chat_model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)

# For loading local models
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import os

model_path="/Users/hann/Projects/RES_LLM/private-llm/models/"
#  models are ggml-mpt-7b-chat.bin, ggml-gpt4all-l13b-snoozy.bin, ggml-gpt4all-j-v1.3-groovy.bin, ggml-mpt-7b-instruct.bin, ggml-v3-13b-hermes-q5_1.bin
models={
    "groovy": os.path.join(model_path, "ggml-gpt4all-j-v1.3-groovy.bin"),
    "snoozy": os.path.join(model_path, "ggml-gpt4all-l13b-snoozy.bin"),
    "7bchat": os.path.join(model_path, "ggml-mpt-7b-chat.bin"),
    "7binstruct": os.path.join(model_path, "ggml-mpt-7b-instruct.bin"),
    "hermes": os.path.join(model_path, "ggml-v3-13b-hermes-q5_1.bin")
}
#  "groovy", "snoozy",  "7bchat", "7binstruct", "hermes"
md = "groovy"

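# 1 for a local GPT4All model, 0 for the OpenAI API
# Note: GPT4All is a plain (non-chat) LLM, so the message-based calls below may not work with it
use_local = 1 

# Comment out this line to use the local model
use_local = 0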

if use_local==0:
    chat_model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)
else:
    chat_model = GPT4All(model=models[md])

#### 1. Vanilla extraction

A simple example:

In [41]:
instructions = """
You will be given a sentence with fruit names, extract those fruit names and assign an emoji to them
Return the fruit name and emojis in a python dictionary
"""

fruit_names = """
Apple, Pear, this is an kiwi
"""

# Make your prompt which combines the instructions w/ the fruit names
prompt = str(instructions + fruit_names)

# Call the LLM
output = chat_model([HumanMessage(content=prompt)])
# output = chat_model(prompt)

print (output)
print (type(output))

content="{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}" additional_kwargs={} example=False
<class 'langchain.schema.AIMessage'>


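Turn this into a proper Python dictionary. `ast.literal_eval` is used instead of `eval` because it only accepts Python literals and will not execute arbitrary code returned by the model:

In [42]:
import ast

output_dict = ast.literal_eval(output.content)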

print (output_dict)
print (type(output_dict))

{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}
<class 'dict'>


#### Using LangChain's Response Schema
LangChain's response schema does two things for us:

1. It autogenerates a prompt with bona fide format instructions. This is great because I don't need to worry about the prompt engineering side; I'll leave that up to LangChain!
2. It reads the output from the LLM and turns it into a proper Python object for me.

Here I define the schema I want. I'm going to pull out the song and artist that a user wants to play from a pseudo chat message.

In [None]:
# The schema I want it to output
response_schemas = [
    ResponseSchema(name="artist", description="The name of the musical artist"),
    ResponseSchema(name="song", description="The name of the song that the artist plays")
]

# The parser that will look for the LLM output in my schema and return it back to me
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# The format instructions that LangChain makes. Let's look at them
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
	"artist": string  // The name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```


In [44]:
# The prompt template that brings it all together
# Note: This is a different prompt template than before because we are using a Chat Model

prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("Given a command from the user, extract the artist and song names \n \
                                                    {format_instructions}\n{user_prompt}")  
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions}
)

fruit_query = prompt.format_prompt(user_prompt="I really like So Young by Portugal. The Man")
print (fruit_query.messages[0].content)

Given a command from the user, extract the artist and song names 
                                                     The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
	"artist": string  // The name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```
I really like So Young by Portugal. The Man


In [45]:
fruit_output = chat_model(fruit_query.to_messages())
output = output_parser.parse(fruit_output.content)

print (output)
print (type(output))

{'artist': 'Portugal. The Man', 'song': 'So Young'}
<class 'dict'>
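Under the hood, the parser's job is roughly: find the JSON inside the markdown code fence, load it, and check that the expected keys are there. A minimal sketch of that idea with the standard library; `parse_structured_reply` is a hypothetical helper, not LangChain's actual `StructuredOutputParser.parse`, which may handle more edge cases.

```python
import json
import re
from typing import List

FENCE = "`" * 3  # three backticks, built up so this snippet nests safely in Markdown

# Optional "json" tag after the opening fence, then capture everything up to the closing fence
FENCED_JSON = re.compile(re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL)


def parse_structured_reply(text: str, required_keys: List[str]) -> dict:
    match = FENCED_JSON.search(text)
    # Fall back to the whole reply if the model forgot the code fence
    payload = match.group(1) if match else text
    data = json.loads(payload.strip())
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


reply = FENCE + 'json\n{\n\t"artist": "Portugal. The Man",\n\t"song": "So Young"\n}\n' + FENCE
print(parse_structured_reply(reply, ["artist", "song"]))  # {'artist': 'Portugal. The Man', 'song': 'So Young'}
```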


### Evaluation

[Official documentation about this.](https://python.langchain.com/en/latest/use_cases/evaluation.html)

Evaluation is the process of running quality checks on the output. Deterministic code normally has tests we can run, but judging LLM output is harder because of the unpredictability and variability of natural language. LangChain provides tools that help with this.

E.g., run quality checks on your summarization or question-answering pipelines.

In [137]:
# Embeddings, store, and retrieval
# from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains.question_answering import load_qa_chain

# Model and doc loader
from langchain import OpenAI
from langchain.document_loaders import TextLoader

# Eval!
from langchain.evaluation.qa import QAEvalChain

# For loading local models
from langchain.llms import GPT4All
# from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import os

model_path="/Users/hann/Projects/RES_LLM/private-llm/models/"
#  models are ggml-mpt-7b-chat.bin, ggml-gpt4all-l13b-snoozy.bin, ggml-gpt4all-j-v1.3-groovy.bin, ggml-mpt-7b-instruct.bin, ggml-v3-13b-hermes-q5_1.bin
models={
    "groovy": os.path.join(model_path, "ggml-gpt4all-j-v1.3-groovy.bin"),
    "snoozy": os.path.join(model_path, "ggml-gpt4all-l13b-snoozy.bin"),
    "7bchat": os.path.join(model_path, "ggml-mpt-7b-chat.bin"),
    "7binstruct": os.path.join(model_path, "ggml-mpt-7b-instruct.bin"),
    "hermes": os.path.join(model_path, "ggml-v3-13b-hermes-q5_1.bin")
}
#  "groovy", "snoozy",  "7bchat", "7binstruct", "hermes"
md = "7binstruct"

# 1 for a local GPT4All model, 0 for the OpenAI API
use_local = 1 

# Uncomment this line to use the OpenAI API instead
# use_local = 0

if use_local==0:
    llm = OpenAI(temperature=0, openai_api_key=openai_api_key)
else:
    llm = GPT4All(model=models[md])

Found model file at  /Users/hann/Projects/RES_LLM/private-llm/models/ggml-mpt-7b-instruct.bin
mpt_model_load: loading model from '/Users/hann/Projects/RES_LLM/private-llm/models/ggml-mpt-7b-instruct.bin' - please wait ...
mpt_model_load: n_vocab        = 50432
mpt_model_load: n_ctx          = 2048
mpt_model_load: n_embd         = 4096
mpt_model_load: n_head         = 32
mpt_model_load: n_layer        = 32
mpt_model_load: alibi_bias_max = 8.000000
mpt_model_load: clip_qkv       = 0.000000
mpt_model_load: ftype          = 2
mpt_model_load: ggml ctx size = 5653.09 MB
mpt_model_load: kv self size  = 1024.00 MB
mpt_model_load: ........................ done
mpt_model_load: model size =  4629.02 MB / num tensors = 194


In [138]:
# The same short story as before
loader = TextLoader('theheartofabrokenstory.txt')
doc = loader.load()

print (f"You have {len(doc)} document")
print (f"You have {len(doc[0].page_content)} characters in that document")

You have 1 document
You have 16523 characters in that document


In [139]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

# Get the total number of characters so we can see the average later
num_total_characters = sum([len(x.page_content) for x in docs])

print (f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")

Now you have 8 documents that have an average of 2,310 characters (smaller pieces)


In [140]:
# Loading embedding engine
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

# Embeddings and docstore
# embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
docsearch = FAISS.from_documents(docs, embeddings)

Make the retrieval chain. Note the new `input_key` parameter: it tells the chain which key in the supplied dictionary holds the prompt/query. I set it to `question` to match the key in the dicts below.

In [141]:
chain = RetrievalQA.from_chain_type(llm=llm, chain_type="map_reduce", retriever=docsearch.as_retriever(), input_key="question")

# qa_chain = load_qa_chain(llm, chain_type="stuff")
# chain = RetrievalQA(combine_documents_chain=qa_chain, retriever=docsearch.as_retriever(),input_key="question")

Now I'll pass the LLM a list of questions along with ground-truth answers that I know are correct (I validated them by hand).

In [142]:
question_answers = [
    {'question' : "What is the occupation of Justin?", 'answer' : 'printer’s assistant'},
    {'question' : "In which city did the story happen?", 'answer' : 'New York'}
]

The result is supposed to look like this:

```python
[{'question': 'What is the occupation of Justin?',
  'answer': 'printer’s assistant',
  'result': ' Justin is a printer’s assistant.'},
 {'question': 'In which city did the story happen?',
  'answer': 'New York',
  'result': ' The story happened in New York.'}]
```

I'll use `chain.apply` to run each question separately.

The nice part is that I get my list of question/answer dictionaries back, each with an extra `result` key holding the LLM's output.

Note: I deliberately made the 2nd question ambiguous and hard to answer in one pass, so the LLM would likely get it wrong.
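Conceptually, `chain.apply` here runs the chain once per input dict and returns each input together with the new `result` key. A rough sketch of that bookkeeping; `apply_qa` and the toy `fake_answer` function are hypothetical stand-ins so the example runs without a model.

```python
from typing import Callable, Dict, List


def apply_qa(examples: List[Dict[str, str]], answer_fn: Callable[[str], str],
             input_key: str = "question", output_key: str = "result") -> List[Dict[str, str]]:
    results = []
    for example in examples:
        record = dict(example)  # copy, so the ground-truth list is not modified
        # input_key plays the same role as RetrievalQA's input_key="question"
        record[output_key] = answer_fn(example[input_key])
        results.append(record)
    return results


def fake_answer(question: str) -> str:
    # Hypothetical stand-in for the retrieval chain
    return " New York" if "city" in question else " I don't know."


sample_examples = [
    {"question": "What is the occupation of Justin?", "answer": "printer’s assistant"},
    {"question": "In which city did the story happen?", "answer": "New York"},
]
for row in apply_qa(sample_examples, fake_answer):
    print(row)
```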

In [143]:
predictions = chain.apply(question_answers)
predictions

 "I suppose I'm just one Of The Millions Who Was Never Meanth To Give Orders."
 "I suppose me as having been one Of The Millions Who Was Never Meanth To Give Orders."
 "I suppose I'm just one Of The Millions Who Was Never Meanth To Give Orders."


MPT: reached the end of the context window so resizing
Exception ignored on calling ctypes callback function: <function LLModel._recalculate_callback at 0x113c49e50>
Traceback (most recent call last):
  File "/Users/hann/miniconda3/envs/llm/lib/python3.8/site-packages/gpt4all/pyllmodel.py", line 334, in _recalculate_callback
    @staticmethod
KeyboardInterrupt: 


We then have the LLM compare my ground-truth answer (the `answer` key) with the LLM's output (the `result` key).

Or simply, we are asking the LLM to grade itself. What a wild world we live in.

In [None]:
# Start your eval chain
eval_chain = QAEvalChain.from_llm(llm)

# Have it grade itself. The code below helps the eval_chain know where the different parts are
graded_outputs = eval_chain.evaluate(question_answers,
                                     predictions,
                                     question_key="question",
                                     prediction_key="result",
                                     answer_key='answer')
graded_outputs

 INCORRECT
 INCORRECT


[{'text': ' INCORRECT'}, {'text': ' INCORRECT'}]
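Since the same local model both answers and grades here, it can help to cross-check with a grader that involves no LLM at all. The sketch below swaps in a much cruder technique than `QAEvalChain`: a normalized substring match between the ground-truth `answer` and the model's `result`. It cannot recognize paraphrases, so treat it only as a sanity check; `normalize` and `grade_by_substring` are hypothetical helpers.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    # Lowercase, turn curly apostrophes into straight ones, replace punctuation with spaces
    text = text.lower().replace("\u2019", "'")
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of whitespace into single spaces
    return " ".join(text.split())


def grade_by_substring(examples: List[Dict[str, str]]) -> List[str]:
    grades = []
    for example in examples:
        hit = normalize(example["answer"]) in normalize(example["result"])
        grades.append("CORRECT" if hit else "INCORRECT")
    return grades


sample_predictions = [
    {"answer": "printer’s assistant", "result": " Justin is a printer’s assistant."},
    {"answer": "New York", "result": " I suppose I'm just one Of The Millions Who Was Never Meanth To Give Orders."},
]
print(grade_by_substring(sample_predictions))  # ['CORRECT', 'INCORRECT']
```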