# RAG

Let's evaluate your architecture on a Q&A dataset built from the LangChain Python docs.

Common RAG architectures have two main components:
1. Retriever: provides information from a knowledge base. Vector search is simple and powerful, but the retriever can be backed by any database or arbitrary search engine.
2. Response generator: synthesizes a response to the user input based on a mixture of learned knowledge and the retrieved information.
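
The two components can be sketched in plain Python. This is a toy illustration, not the notebook's implementation: `keyword_retriever`, `template_generator`, and `make_rag_chain` are hypothetical names, the retriever ranks by word overlap instead of vector search, and the generator only formats what an LLM would receive.

```python
from typing import Callable, List

Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]


def keyword_retriever(knowledge_base: List[str], k: int = 2) -> Retriever:
    """Toy retriever: rank documents by how many query words they share."""

    def retrieve(query: str) -> List[str]:
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(doc.lower().split())), doc)
            for doc in knowledge_base
        ]
        # Highest overlap first; drop documents with no overlap at all.
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in ranked[:k] if score > 0]

    return retrieve


def template_generator(question: str, docs: List[str]) -> str:
    """Stand-in for an LLM call: it only shows what the model would receive."""
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Question: {question}\nContext:\n{context}"


def make_rag_chain(retriever: Retriever, generator: Generator) -> Callable[[str], str]:
    """Wire the two components together; either one can be swapped out."""
    return lambda question: generator(question, retriever(question))


docs = [
    "LCEL is the LangChain Expression Language",
    "FAISS is a vector store",
    "Chroma is a vector store",
]
chain = make_rag_chain(keyword_retriever(docs, k=1), template_generator)
print(chain("what is LCEL"))
```

Because the chain only depends on the two callables, replacing the retriever or the generator does not require touching the other.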

Focusing on retrievers for unstructured data, you still have some additional design decisions to make:

- What chunk size(s) to use for each document: too large and your system will be able to consider fewer documents at a time. Too small and the chunks themselves lack important context needed to interpret their content.
- How to index a single chunk: generating a single vector from an embedding model may be fine, or you can generate additional vectors based on summaries, hypothetical questions, or other related content. Some may even consider incorporating a keyword index or other structured metadata to better support different types of searches.
- How to assemble the retrieved chunks: once you've fetched the k-best list of "relevant" documents, you may want to do things like:
  - re-integrate the document into its parent context.
  - rerank the documents based on other criteria
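
To make the chunk-size tradeoff concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an assumption-laden illustration: `chunk_text` is a hypothetical helper, and it counts whitespace-separated words rather than model tokens.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = "one two three four five six seven"
print(chunk_text(text, chunk_size=3, overlap=1))
```

A larger `chunk_size` keeps more context per chunk but leaves room for fewer chunks in the prompt; the overlap is one way to soften the loss of context at chunk boundaries.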
 
All of these options come with tradeoffs in cost, response quality, and time. This may seem overwhelming at first! The good news is that the retrieval and response mechanisms are modular: the better the retrieved information, the better the response, and the better the LLM, the better it is able to integrate that knowledge.

This notebook provides a RAG gym/playground you can use to evaluate different RAG strategies on a Q&A dataset generated from LangChain's python docs. The intent is to make it easy to experiment with different techniques to see their tradeoffs and make the appropriate decision for your use case.

## Pre-requisites

We will install quite a few prerequisites for this example since we are comparing various techniques and models.

In [1]:
# %pip install -U langchain_benchmarks
# %pip install -U langchain langsmith langchainhub chromadb openai huggingface pandas

In [2]:
# %pip install -e ../../../../langchain/libs/langchain

For this code to work, please configure LangSmith environment variables with your credentials.

In [3]:
%env LANGCHAIN_ENDPOINT=http://localhost:1984
%env LANGCHAIN_TRACING_V2=true

env: LANGCHAIN_ENDPOINT=http://localhost:1984
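
Outside a notebook, the same configuration can be applied from Python. This is a sketch under assumptions: the endpoint mirrors the local value used above, and `LANGCHAIN_API_KEY` is the credential variable you would typically need for a hosted LangSmith instance.

```python
import os

# Values mirror the %env cell above; adjust the endpoint for your deployment.
os.environ.setdefault("LANGCHAIN_ENDPOINT", "http://localhost:1984")
os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")

# Never hard-code a real key in a notebook; only check that one is present.
if "LANGCHAIN_API_KEY" not in os.environ:
    print("LANGCHAIN_API_KEY is not set; a hosted instance will likely reject requests.")
```

`setdefault` leaves any value you already exported in your shell untouched.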


In [4]:
%load_ext autoreload
%autoreload 2

In [5]:
from langchain_benchmarks import clone_public_dataset
from langchain_benchmarks.rag import registry

In [6]:
registry

ID,Name,Dataset ID,Description
0,LangChain Docs Q&A,452ccafc-18e1-4314-885b-edd735f17b9d,Questions and answers based on a snapshot of the LangChain python docs. The environment provides the documents and the retriever information. Each example is composed of a question and reference answer. Success is measured based on the accuracy of the answer relative to the reference answer. We also measure the faithfulness of the model's response relative to the retrieved documents (if any).


In [7]:
langchain_docs = registry[0]

In [8]:
langchain_docs

0,1
ID,0
Name,LangChain Docs Q&A
Dataset ID,452ccafc-18e1-4314-885b-edd735f17b9d
Description,Questions and answers based on a snapshot of the LangChain python docs. The environment provides th...


In [9]:
clone_public_dataset(langchain_docs.dataset_id, dataset_name=langchain_docs.name)

Dataset LangChain Docs Q&A already exists. Skipping.
You can access the dataset at http://localhost/o/00000000-0000-0000-0000-000000000000/datasets/1e4bf58b-1a61-44fb-bb84-4c5c0e2b4b5b.


In [10]:
from langchain.embeddings import HuggingFaceEmbeddings, OpenAIEmbeddings
from langchain.vectorstores import FAISS, Chroma

In [11]:
embeddings = HuggingFaceEmbeddings(model_name="thenlper/gte-base")

In [12]:
retriever_factory = langchain_docs.retriever_factories["basic"]

In [13]:
# Indexes the documents with the specified embeddings.
# Note that this does not apply any chunking to the docs.
retriever = retriever_factory(embeddings)

In [14]:
# Factory for creating a chain.
chain_factory = langchain_docs.architecture_factories["chat-langchain"]

In [15]:
# Example
chain_factory(retriever).invoke({"question": "what's lcel?"})

'LCEL stands for LangChain Expression Language. It is a declarative language that allows you to easily compose chains together in LangChain. LCEL was designed to support putting prototypes into production with no code changes. Here are some key features of LCEL:\n\n- Streaming support: Chains built with LCEL can stream tokens straight from an LLM to a streaming output parser, providing incremental chunks of output.\n- Async support: LCEL chains can be called with both synchronous and asynchronous APIs, enabling the same code to be used for prototypes and in production.\n- Optimized parallel execution: LCEL automatically executes steps in parallel when possible, reducing latency.\n- Retries and fallbacks: Configurable retries and fallbacks improve the reliability of LCEL chains.\n- Access intermediate results: LCEL allows you to access intermediate results of complex chains, which can be useful for end-users or debugging.\n- Input and output schemas: LCEL chains have Pydantic and JSONSc

In [16]:
# Example with Code Llama on Ollama
from langchain.chat_models import ChatOllama

ollama = ChatOllama(model="codellama")

In [17]:
chain_factory(retriever, llm=ollama).invoke({"question": "what's lcel?"})

"\nLCEL stands for LangChain Expression Language. It's a markup language used to describe natural language conversations using chains of Runnables (the core LCEL interface). The goal is to make it easy to chain together different Runnable components in a sequence that accomplishes a specific task or generates a response. \n\nRunnables are the building blocks of LCEL, and there are many built-in ones. Chains of Runnables can be used to do everything from generating text to executing code. "

Both calls return an answer, so the chain runs end to end with either model.

### Evaluate

Let's evaluate a retriever now.

In [18]:
from functools import partial

from langchain_benchmarks.rag import RAG_EVALUATION
from langsmith.client import Client

In [19]:
client = Client()

In [34]:
test_run = client.run_on_dataset(
    dataset_name=langchain_docs.name,
    llm_or_chain_factory=partial(chain_factory, retriever),
    evaluation=RAG_EVALUATION,
    verbose=True,
)

[----->                                            ] 10/86

In [None]:
test_run.get_aggregate_feedback()

# Comparing with other indexing strategies

## Parent Document Retriever

Let's try a parent document retriever.

In [48]:
retriever_factory = langchain_docs.retriever_factories["parent-doc"]

# Indexes the documents with the specified embeddings
retriever = retriever_factory(embeddings)
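
The general idea behind parent document retrieval can be sketched as follows: search over small chunks, but return the full document each matching chunk came from. This is a toy illustration with hypothetical helpers (`build_child_index`, `retrieve_parents`) that uses word overlap in place of embeddings; it is not the factory's actual implementation.

```python
from typing import Dict, List, Tuple


def build_child_index(parents: Dict[str, str], chunk_size: int) -> List[Tuple[str, str]]:
    """Split each parent into word chunks, remembering which parent each came from."""
    index = []
    for parent_id, text in parents.items():
        words = text.split()
        for start in range(0, len(words), chunk_size):
            index.append((parent_id, " ".join(words[start : start + chunk_size])))
    return index


def retrieve_parents(
    query: str, index: List[Tuple[str, str]], parents: Dict[str, str], k: int = 1
) -> List[str]:
    """Score the small chunks, then return the full parents they belong to."""
    query_words = set(query.lower().split())
    ranked = sorted(
        index,
        key=lambda item: len(query_words & set(item[1].lower().split())),
        reverse=True,
    )
    parent_ids: List[str] = []
    for parent_id, _chunk in ranked:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)  # de-duplicate, keeping rank order
        if len(parent_ids) == k:
            break
    return [parents[parent_id] for parent_id in parent_ids]


parents = {
    "lcel": "LCEL composes runnables into chains and supports streaming and async calls",
    "faiss": "FAISS stores vectors and supports fast similarity search over embeddings",
}
index = build_child_index(parents, chunk_size=4)
print(retrieve_parents("similarity search", index, parents))
```

Returning whole parents restores context that a small chunk lacks, at the price of much longer prompts.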

In [50]:
parent_doc_test_run = client.run_on_dataset(
    dataset_name=langchain_docs.name,
    llm_or_chain_factory=partial(chain_factory, retriever),
    evaluation=RAG_EVALUATION,
    verbose=True,
)

View the evaluation results for project 'test-long-leg-61' at:
http://localhost/o/00000000-0000-0000-0000-000000000000/projects/p/dd556cd9-a156-4e5f-96bb-1c0e480fa230?eval=true

View all tests for Dataset LangChain Docs Q&A at:
http://localhost/o/00000000-0000-0000-0000-000000000000/datasets/1e4bf58b-1a61-44fb-bb84-4c5c0e2b4b5b
[->                                                ] 4/86

Chain failed for example 789d3416-cdaf-41be-b3c7-9da71513aeeb with inputs {'question': 'Will this work?\n\n```\nfrom langchain.chat_models import ChatOpenAI\n\nllm = ChatOpenAI(model="claude-2")\nllm.predict("Hi")\n```'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 17895 tokens. Please reduce the length of the messages.


[------->                                          ] 14/86

Chain failed for example 0f0322dd-a92f-4042-afd3-4782a9dc9c2f with inputs {'question': 'how to run a runnable'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 56235 tokens. Please reduce the length of the messages.
Chain failed for example a04d94f6-f12d-48e9-9128-0b2894c768e5 with inputs {'question': 'what does runnable.predict() mean?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 31952 tokens. Please reduce the length of the messages.


[--------->                                        ] 17/86

Chain failed for example a88b6a8d-fe1f-4402-818a-668d7481e04e with inputs {'question': 'What class type is returned by initialize_agent'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 48201 tokens. Please reduce the length of the messages.


[--------->                                        ] 18/86

Chain failed for example 1f97ba11-f475-4974-b179-3be47f5882ec with inputs {'question': 'How do i run llama 2 in langchain'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 21917 tokens. Please reduce the length of the messages.


[------------>                                     ] 22/86

Chain failed for example 1a957869-5979-400c-805f-bc089782cb40 with inputs {'question': 'Let\'s say I have a chain like:\n\nmodel_call_1 = (\n  RunnablePassthrough()\n  | prompt\n  | model_parser\n)\n# ...\nchain = model_call_1 #| { "attr": model_call_2 } | model_call_3\nHow can I print out the filled out prompts for each model call?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 27114 tokens. Please reduce the length of the messages.


[------------>                                     ] 23/86

Chain failed for example 39e0baee-6514-4c18-be15-00364c95bead with inputs {'question': 'soooo, is it possible to pass any kwargs for similarity thresholds or K-top documents in the multivectorretriever as we do in normal retrievers?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 26092 tokens. Please reduce the length of the messages.


[-------------->                                   ] 26/86

Chain failed for example 9b046e8c-4541-4205-a06c-adc19fce2c46 with inputs {'question': 'what does runnable mean'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 48701 tokens. Please reduce the length of the messages.
Chain failed for example 023d8950-a200-4747-9d5a-3c63bb61f7eb with inputs {'question': 'what method should subclasses override if they can start producing output while input is still being generated'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 24838 tokens. Please reduce the length of the messages.


[---------------->                                 ] 29/86

Chain failed for example e0ad7e00-3243-493f-a59f-dc105bb99d96 with inputs {'question': 'How can I use OpenAI functions to get structured outputs in a chain?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 37297 tokens. Please reduce the length of the messages.


[---------------->                                 ] 30/86

Chain failed for example 9d712d95-e9f6-4769-b86a-c773ef852321 with inputs {'question': 'how can I create a vectorstore from the texts in the list'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 17836 tokens. Please reduce the length of the messages.
Chain failed for example b4cb6d14-ff16-45d7-9458-6e605a87165a with inputs {'question': 'what does this do? return RunnableBranch(\n        (\n            RunnableLambda(lambda x: bool(x.get("chat_history"))).with_config(\n                run_name="HasChatHistoryCheck"\n            ),\n            conversation_chain.with_config(run_name="RetrievalChainWithHistory"),\n        ),\n        (\n            RunnableLambda(itemgetter("question")).with_config(\n                run_name="Itemgetter:question"\n            )\n            | retriever\n        ).with_config(run_name="RetrievalChainWithNoHistory"),\n    ).with_config(run_name="RouteDependingOnChatHistory")

[------------------>                               ] 33/86

Chain failed for example a6dd2f88-6ec2-4c6c-bf1e-81e6907f43f2 with inputs {'question': 'how do I search and filter metadata in redis vectorstore?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 19817 tokens. Please reduce the length of the messages.
Chain failed for example 31e43b7c-a552-4d64-a3c6-086af9534405 with inputs {'question': 'how do I control the maximum number requests that can be made at the same time when making batch calls?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 34360 tokens. Please reduce the length of the messages.


[--------------------->                            ] 38/86

Chain failed for example f6829b0f-e90d-400e-968e-726dd5ee66a9 with inputs {'question': 'whats the difference between run_id and example_id'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 24044 tokens. Please reduce the length of the messages.


[---------------------->                           ] 39/86

Chain failed for example f0cc53b4-f427-4f82-8b89-85f29f21fefd with inputs {'question': 'which document laaoder should i use for a loading a single web apage?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 59012 tokens. Please reduce the length of the messages.


[---------------------->                           ] 40/86

Chain failed for example b815e7d6-9547-4e93-83c2-84fdb3c02e97 with inputs {'question': 'What is a chain?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 31729 tokens. Please reduce the length of the messages.


[----------------------->                          ] 41/86

Chain failed for example bd080ab4-780b-4e85-9835-ee8cab4fa2d9 with inputs {'question': 'What are some ways of doing retrieval augmented generation?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 22376 tokens. Please reduce the length of the messages.


[------------------------>                         ] 43/86

Chain failed for example 301af9d5-8e5c-4ec4-9cf5-25a88703173a with inputs {'question': 'how many llm api calls are made in OpenAIFunctionsAgent'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 45603 tokens. Please reduce the length of the messages.


[-------------------------->                       ] 47/86

Chain failed for example 14530d34-51f5-419c-bf62-67f6e1333278 with inputs {'question': "What's function calling"}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 27732 tokens. Please reduce the length of the messages.


[--------------------------->                      ] 49/86

Chain failed for example 7dfa162f-3c38-4c2d-b771-323d169d5312 with inputs {'question': 'im getting a rate limit error for my llm - how do I have it automatically go to a different model?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 18269 tokens. Please reduce the length of the messages.


[------------------------------>                   ] 54/86

Chain failed for example 8522ca3e-b5c1-4634-8b7f-d9e2b78961dd with inputs {'question': 'How do I use Qdrant as a vector store in the conversational retrieval chain?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 17075 tokens. Please reduce the length of the messages.
Chain failed for example 2f67cf5e-9855-4357-a980-19b16bfb0b12 with inputs {'question': 'my agent keeps getting an OutputParserException is something i can set to make it take care of these?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 47406 tokens. Please reduce the length of the messages.


[--------------------------------->                ] 59/86

Chain failed for example 70063d0e-3bc3-4b3e-bd81-d24f7c07b186 with inputs {'question': 'I want to return the source documents of my Weaviate retriever. Show me how'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 39044 tokens. Please reduce the length of the messages.


[------------------------------------>             ] 64/86

Chain failed for example 26d170f9-c0b2-4ccb-a11c-480416041cea with inputs {'question': 'What is html2texttransformer? Does it omit urls?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 48291 tokens. Please reduce the length of the messages.
Chain failed for example 674c0c0e-92b5-4c48-91cf-14768624b0bd with inputs {'question': 'what are the main methods supported by Runnables'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 19708 tokens. Please reduce the length of the messages.


[-------------------------------------->           ] 67/86

Chain failed for example b1e2b78a-0d96-4354-a4a5-c580ad91daef with inputs {'question': 'I want to save the configuration for a given LLM. Show me how to do that.'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 55564 tokens. Please reduce the length of the messages.


[---------------------------------------->         ] 71/86

Chain failed for example 2440f23d-053d-4bba-b3c4-c1b90d376bb9 with inputs {'question': 'What serialization format is used to serialize chains to and from disk?'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 21008 tokens. Please reduce the length of the messages.


[--------------------------------------------->    ] 79/86

Chain failed for example d2c50cf2-a942-4cc1-852a-d7bd449ec4fe with inputs {'question': "What's an LLMChain"}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 51937 tokens. Please reduce the length of the messages.


[---------------------------------------------->   ] 81/86

Chain failed for example 47dfa9ba-7a29-48a8-b5c2-2017bd24ac6f with inputs {'question': "What's a runnable lambda?"}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 49897 tokens. Please reduce the length of the messages.
Chain failed for example 07453ed9-93ca-4e29-a494-2b145b774dc5 with inputs {'question': 'How do I deal with openai rate limiting by having a backup model? Show me with code.'}
Error Type: InvalidRequestError, Message: This model's maximum context length is 16385 tokens. However, your messages resulted in 20758 tokens. Please reduce the length of the messages.


[------------------------------------------------->] 86/86
 Eval quantiles:
        embedding_cosine_distance  faithfulness  score_string:accuracy  \
count                   55.000000     18.000000              55.000000   
unique                        NaN           NaN                    NaN   
top                           NaN           NaN                    NaN   
freq                          NaN           NaN                    NaN   
mean                     0.109461      0.861111               0.589091   
std                      0.068414      0.259272               0.330921   
min                      0.031590      0.100000               0.100000   
25%                      0.056286      0.775000               0.300000   
50%                      0.090218      1.000000               0.700000   
75%                      0.140353      1.000000               1.000000   
max                      0.278385      1.000000               1.000000   

                                   

In [51]:
parent_doc_test_run.get_aggregate_feedback()

Unnamed: 0,embedding_cosine_distance,faithfulness,score_string:accuracy,error,execution_time
count,55.0,18.0,55.0,31,86.0
unique,,,,31,
top,,,,This model's maximum context length is 16385 t...,
freq,,,,1,
mean,0.109461,0.861111,0.589091,,7.540131
std,0.068414,0.259272,0.330921,,3.710658
min,0.03159,0.1,0.1,,2.353036
25%,0.056286,0.775,0.3,,5.116402
50%,0.090218,1.0,0.7,,6.406112
75%,0.140353,1.0,1.0,,8.658157
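
The failures logged above report prompts that exceed the model's 16385-token context window. One possible mitigation, sketched here under assumptions, is to cap the retrieved context before it reaches the model. `estimate_tokens` and `fit_to_budget` are hypothetical helpers, and the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Very rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def fit_to_budget(docs: List[str], max_tokens: int) -> List[str]:
    """Keep documents in rank order until the next one would exceed the budget."""
    kept: List[str] = []
    used = 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > max_tokens:
            break
        kept.append(doc)
        used += cost
    return kept


ranked_docs = ["a" * 40, "b" * 40, "c" * 40]  # roughly 10 tokens each
print(len(fit_to_budget(ranked_docs, max_tokens=25)))
```

In practice the budget also has to leave room for the question, the prompt template, and the model's answer.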


## HyDE

HyDE (Hypothetical Document Embeddings) here refers to the technique of using an LLM to generate example queries that may be used to retrieve a doc. By doing so, the resulting embeddings are "more aligned" with the embeddings generated from the query. This comes with an additional indexing cost, since each document requires an additional call to an LLM while indexing.
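
The indexing step described above can be sketched as follows. Everything here is an assumption for illustration: `fake_question_llm` stands in for a real LLM call, and word overlap stands in for embedding similarity.

```python
from typing import Callable, Dict, List, Tuple


def fake_question_llm(doc: str) -> List[str]:
    """Hypothetical stand-in for an LLM that writes questions a doc could answer."""
    topic = doc.split()[0]
    return [f"what is {topic}", f"how do i use {topic}"]


def build_question_index(
    docs: Dict[str, str], generate: Callable[[str], List[str]]
) -> List[Tuple[str, str]]:
    """One generation call per document: this is the extra indexing cost."""
    return [(q, doc_id) for doc_id, doc in docs.items() for q in generate(doc)]


def search(query: str, index: List[Tuple[str, str]], docs: Dict[str, str]) -> str:
    """Match the query against generated questions, then return the source doc."""
    query_words = set(query.lower().split())
    _question, doc_id = max(
        index, key=lambda item: len(query_words & set(item[0].lower().split()))
    )
    return docs[doc_id]


hyde_docs = {
    "a": "LCEL composes runnables into chains",
    "b": "FAISS stores vectors for similarity search",
}
question_index = build_question_index(hyde_docs, fake_question_llm)
print(search("how do I use faiss", question_index, hyde_docs))
```

The query is compared with question-shaped text rather than with the document itself, which is the sense in which the two sides become more aligned.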

In [52]:
retriever_factory = langchain_docs.retriever_factories["hyde"]

retriever = retriever_factory(embeddings)


KeyboardInterrupt



In [None]:
hyde_test_run = client.run_on_dataset(
    dataset_name=langchain_docs.name,
    llm_or_chain_factory=partial(chain_factory, retriever),
    evaluation=RAG_EVALUATION,
    verbose=True,
)

In [None]:
hyde_test_run.get_aggregate_feedback()

# Comparing Embeddings

We've been using off-the-shelf GTE-Base embeddings so far to retrieve the docs, but
what happens if we switch this up? Let's compare to OpenAI's embeddings.

In [53]:
from langchain.embeddings.openai import OpenAIEmbeddings

openai_embeddings = OpenAIEmbeddings()

In [54]:
openai_retriever = langchain_docs.retriever_factories["basic"](openai_embeddings)
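
The factory pattern used here keeps the retrieval logic fixed while the embedding function varies. A toy sketch of that idea, with two hypothetical embedding functions (`word_embedding`, `char_embedding`) and cosine distance as the similarity measure:

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Embedding = Dict[str, float]


def word_embedding(text: str) -> Embedding:
    """Toy embedding: word counts."""
    return dict(Counter(text.lower().split()))


def char_embedding(text: str) -> Embedding:
    """Toy embedding: letter counts, ignoring spaces."""
    return dict(Counter(text.lower().replace(" ", "")))


def cosine_distance(a: Embedding, b: Embedding) -> float:
    """1 - cosine similarity; 1.0 when either vector is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return 1.0 - dot / norm if norm else 1.0


def make_retriever(docs: List[str], embed: Callable[[str], Embedding]) -> Callable[[str], str]:
    """Same retrieval logic for any embedding function."""
    vectors = [(doc, embed(doc)) for doc in docs]

    def retrieve(query: str) -> str:
        query_vector = embed(query)
        return min(vectors, key=lambda pair: cosine_distance(query_vector, pair[1]))[0]

    return retrieve


corpus = ["vector store", "prompt template"]
for embed in (word_embedding, char_embedding):
    print(embed.__name__, make_retriever(corpus, embed)("vector"))
```

Real embedding models differ in what they place close together, so the same swap can change which documents are retrieved and, in turn, the evaluation scores.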

In [55]:
openai_embeddings_test_run = client.run_on_dataset(
    dataset_name=langchain_docs.name,
    llm_or_chain_factory=partial(chain_factory, openai_retriever),
    evaluation=RAG_EVALUATION,
    verbose=True,
)

View the evaluation results for project 'test-ample-bag-78' at:
http://localhost/o/00000000-0000-0000-0000-000000000000/projects/p/81c75e74-c54c-4c5e-9a5d-e841c743ad6d?eval=true

View all tests for Dataset LangChain Docs Q&A at:
http://localhost/o/00000000-0000-0000-0000-000000000000/datasets/1e4bf58b-1a61-44fb-bb84-4c5c0e2b4b5b
[--------->                                        ] 18/86

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborte

[------------------------------------------------->] 86/86
 Eval quantiles:
        embedding_cosine_distance  faithfulness  score_string:accuracy error  \
count                   86.000000     80.000000              86.000000     0   
unique                        NaN           NaN                    NaN     0   
top                           NaN           NaN                    NaN   NaN   
freq                          NaN           NaN                    NaN   NaN   
mean                     0.160311      0.480000               0.366279   NaN   
std                      0.064935      0.253782               0.280172   NaN   
min                      0.062004      0.100000               0.100000   NaN   
25%                      0.112516      0.300000               0.100000   NaN   
50%                      0.145869      0.500000               0.300000   NaN   
75%                      0.193581      0.550000               0.500000   NaN   
max                      0.323163      1.000

In [None]:
openai_embeddings_test_run

# Comparing Models

We used OpenAI's gpt-3.5-turbo in our previous tests, but let's try some other models.

You can swap in any LangChain LLM within the response generator below.
We'll try Anthropic's claude-2 first.

In [None]:
# from langchain.chat_models import ChatOllama

# # A llama2-based model with 128k context
# # (in theory) In practice, we will see how well
# # it actually leverages that context.
# ollama = ChatOllama(model="yarn-llama2:7b-128k")

In [59]:
from langchain.chat_models import ChatAnthropic

In [64]:
# We'll go back to the GTE embeddings for now

retriever_factory = langchain_docs.retriever_factories["basic"]
# Since claude-2 has a longer context window, we can fetch more documents by default
# Since this only impacts the retriever and NOT the index, this call will load from the cache
retriever = retriever_factory(embeddings, search_kwargs={"k": 10})

In [65]:
llm = ChatAnthropic(model="claude-2", temperature=1)

In [66]:
claude_test_run = client.run_on_dataset(
    dataset_name=langchain_docs.name,
    llm_or_chain_factory=partial(chain_factory, llm=llm, retriever=retriever),
    evaluation=RAG_EVALUATION,
    verbose=True,
)

View the evaluation results for project 'test-drab-street-64' at:
http://localhost/o/00000000-0000-0000-0000-000000000000/projects/p/9c4c3af8-0877-403b-a255-d6acb7279110?eval=true

View all tests for Dataset LangChain Docs Q&A at:
http://localhost/o/00000000-0000-0000-0000-000000000000/datasets/1e4bf58b-1a61-44fb-bb84-4c5c0e2b4b5b
[>                                                 ] 0/86

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[------------------------------------------>       ] 74/86

Chain failed for example 963ab58f-bb81-4121-8da4-c5074a5d900a with inputs {'question': 'What is the purpose of caching embeddings?'}
Error Type: InternalServerError, Message: Error code: 500 - {'error': {'type': 'api_error', 'message': 'Internal server error'}}


[------------------------------------------------> ] 84/86

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')).


[------------------------------------------------> ] 85/86

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')).


[------------------------------------------------->] 86/86
 Eval quantiles:
        embedding_cosine_distance  faithfulness  score_string:accuracy  \
count                   85.000000     79.000000              85.000000   
unique                        NaN           NaN                    NaN   
top                           NaN           NaN                    NaN   
freq                          NaN           NaN                    NaN   
mean                     0.173510      0.532911               0.338824   
std                      0.061614      0.342959               0.264617   
min                      0.068957      0.100000               0.100000   
25%                      0.123281      0.300000               0.100000   
50%                      0.169280      0.500000               0.300000   
75%                      0.208775      1.000000               0.500000   
max                      0.322887      1.000000               1.000000   

                                   

## Changing the prompt in the response generator

The default prompt was tested primarily on OpenAI's gpt-3.5 model. When switching models, you may get better results if you modify the prompt. Let's try a simple one.

In [82]:
from langchain import hub
from langchain.schema.output_parser import StrOutputParser

In [80]:
prompt = hub.pull("wfh/rag-simple")
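
The contents of the pulled prompt are not shown in this notebook. As a rough idea of what a simple RAG prompt does, here is a hypothetical template and formatter; the wording of `SIMPLE_RAG_TEMPLATE` is an assumption and the actual hub prompt may differ.

```python
from typing import List

# Hypothetical template; the actual hub prompt may be worded differently.
SIMPLE_RAG_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)


def format_prompt(question: str, docs: List[str]) -> str:
    """Join the retrieved documents and fill in the template."""
    context = "\n\n".join(docs)
    return SIMPLE_RAG_TEMPLATE.format(context=context, question=question)


print(format_prompt("what's lcel?", ["LCEL is the LangChain Expression Language."]))
```

A shorter prompt like this leaves more of the behavior up to the model, which is one reason results can shift when the model changes.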

In [84]:
generator = prompt | ChatAnthropic(model="claude-2", temperature=1) | StrOutputParser()
# Use the same GTE-based retriever as the evaluation run below
new_chain = chain_factory(response_synthesizer=generator, retriever=retriever)

In [None]:
claude_simple_prompt_test_run = client.run_on_dataset(
    dataset_name=langchain_docs.name,
    llm_or_chain_factory=partial(
        chain_factory, response_synthesizer=generator, retriever=retriever
    ),
    evaluation=RAG_EVALUATION,
    verbose=True,
)

View the evaluation results for project 'test-long-seat-30' at:
http://localhost/o/00000000-0000-0000-0000-000000000000/projects/p/c03ed2a9-a1c8-493e-b42d-66e5a1ce6567?eval=true

View all tests for Dataset LangChain Docs Q&A at:
http://localhost/o/00000000-0000-0000-0000-000000000000/datasets/1e4bf58b-1a61-44fb-bb84-4c5c0e2b4b5b
[>                                                 ] 0/86