# Question Answering with LangChain, OpenAI, and MultiQuery Retriever

This interactive workbook demonstrates example of Elasticsearch's [MultiQuery Retriever](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.multi_query.MultiQueryRetriever.html) to generate similar queries for a given user input and apply all queries to retrieve a larger set of relevant documents from a vectorstore.

Before we begin, we first split the fictional workplace documents into passages with `langchain` and uses OpenAI to transform these passages into embeddings and then store these into Elasticsearch.

We will then ask a question, generate similar questions using langchain and OpenAI, retrieve relevant passages from the vector store, and use langchain and OpenAI again to provide a summary for the questions.

## Install packages and import modules

In [1]:
!python3 -m pip install -qU jq lark langchain langchain-elasticsearch langchain_openai tiktoken

from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_elasticsearch import ElasticsearchStore
from langchain_openai.llms import OpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever
from getpass import getpass

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/754.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━━━━━[0m [32m337.9/754.3 kB[0m [31m10.0 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m747.5/754.3 kB[0m [31m13.5 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m754.3/754.3 kB[0m [31m9.7 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/111.0 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m111.0/111.0 kB[0m [31m9.9 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/1.0 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.0/1.0 MB[0m [31m23.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━

## Connect to Elasticsearch

ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial.

We'll use the **Cloud ID** to identify our deployment, because we are using Elastic Cloud deployment. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.

We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our elastic cloud deployment, This would help create and index data easily.  We would also send list of documents that we created in the previous step

In [7]:
# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")

# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
ELASTIC_API_KEY = getpass("Elastic Api Key: ")

# https://platform.openai.com/api-keys
OPENAI_API_KEY = getpass("OpenAI API key: ")

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

vectorstore = ElasticsearchStore(
    es_cloud_id=ELASTIC_CLOUD_ID,
    es_api_key=ELASTIC_API_KEY,
    index_name="index_store", #give it a meaningful name,
    embedding=embeddings,
)

Elastic Cloud ID: ··········
Elastic Api Key: ··········
OpenAI API key: ··········


## Indexing Data into Elasticsearch
Let's download the sample dataset and deserialize the document.

In [8]:
from urllib.request import urlopen
import json

url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/example-apps/chatbot-rag-app/data/data.json"

response = urlopen(url)
data = json.load(response)

with open("temp.json", "w") as json_file:
    json.dump(data, json_file)

### Split Documents into Passages

We’ll chunk documents into passages in order to improve the retrieval specificity and to ensure that we can provide multiple passages within the context window of the final question answering prompt.

Here we are chunking documents into 800 token passages with an overlap of 400 tokens.

Here we are using a simple splitter but Langchain offers more advanced splitters to reduce the chance of context being lost.

In [9]:
from langchain.document_loaders import JSONLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from elasticsearch import Elasticsearch


#when re-running, Elasticsearch may retain old mappings. Clear the index before reindexing
es = Elasticsearch(cloud_id=ELASTIC_CLOUD_ID, api_key=ELASTIC_API_KEY)
es.indices.delete(index="index_store", ignore_unavailable=True)

#coerces data to string
def metadata_func(record: dict, metadata: dict) -> dict:
    #Populate the metadata dictionary with keys name, summary, url, category, and updated_at.
    metadata["name"] = str(record.get("name", ""))
    metadata["summary"] = str(record.get("summary", ""))
    metadata["url"] = str(record.get("url", ""))
    metadata["category"] = str(record.get("category", ""))
    metadata["updated_at"] = str(record.get("updated_at", ""))

    return metadata


# For more loaders https://python.langchain.com/docs/modules/data_connection/document_loaders/
# And 3rd party loaders https://python.langchain.com/docs/modules/data_connection/document_loaders/#third-party-loaders
loader = JSONLoader(
    file_path="temp.json",
    jq_schema=".[]",
    content_key="content",
    metadata_func=metadata_func,
)

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500, chunk_overlap=50 #define chunk size and chunk overlap
)
docs = loader.load_and_split(text_splitter=text_splitter)

### Bulk Import Passages

Now that we have split each document into the chunk size of 800, we will now index data to elasticsearch using [ElasticsearchStore.from_documents](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents).

We will use Cloud ID, Password and Index name values set in the `Create cloud deployment` step.

In [15]:
# documents = vectorstore.from_documents(
#     docs,
#     embeddings,
#     index_name="index_store",
#     es_cloud_id=ELASTIC_CLOUD_ID,
#     es_api_key=ELASTIC_API_KEY,
# )

# vectorstore = ElasticsearchStore.from_documents(
#     docs,
#     embedding=embeddings,
#     index_name="index_store",
#     es_cloud_id=ELASTIC_CLOUD_ID,
#     es_api_key=ELASTIC_API_KEY,
# )

#creates an errors catch to determine issue
for i, doc in enumerate(docs):
    try:
        vectorstore = ElasticsearchStore.from_documents(
            [doc],
            embedding=embeddings,
            index_name="index_store",
            es_cloud_id=ELASTIC_CLOUD_ID,
            es_api_key=ELASTIC_API_KEY,
        )
    except Exception as e:
        print(f"❌ Failed to index doc {i}: {doc.metadata}")
        print("Error:", e)

llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)

retriever = MultiQueryRetriever.from_llm(vectorstore.as_retriever(), llm)

ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:2905] failed to parse field [metadata.updated_at] of type [date] in document with id '81dcfa67-2efc-4022-9d0e-86acd1a84eea'. Preview of field's value: ''


❌ Failed to index doc 8: {'source': '/content/temp.json', 'seq_num': 6, 'name': 'Swe Career Matrix', 'summary': '\nThis career leveling matrix provides a framework for understanding the various roles and responsibilities of Software Engineers, as well as the skills and experience required for each level. It is intended to support employee development, facilitate performance evaluations, and provide a clear career progression path.', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EVYuEyRhHh5Aqc3a39sqbGcBkqKIHRWtJBjjUjNs6snpMg?e=nv1mf4', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:2161] failed to parse field [metadata.updated_at] of type [date] in document with id '05d27919-98fe-47d9-9d69-9484db43d2df'. Preview of field's value: ''


❌ Failed to index doc 9: {'source': '/content/temp.json', 'seq_num': 6, 'name': 'Swe Career Matrix', 'summary': '\nThis career leveling matrix provides a framework for understanding the various roles and responsibilities of Software Engineers, as well as the skills and experience required for each level. It is intended to support employee development, facilitate performance evaluations, and provide a clear career progression path.', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EVYuEyRhHh5Aqc3a39sqbGcBkqKIHRWtJBjjUjNs6snpMg?e=nv1mf4', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:2579] failed to parse field [metadata.updated_at] of type [date] in document with id '93d3d3ac-7bf7-4190-88d6-2354b085b77d'. Preview of field's value: ''


❌ Failed to index doc 10: {'source': '/content/temp.json', 'seq_num': 7, 'name': 'Sales Engineering Collaboration', 'summary': ": This guide provides an overview of how engineers can effectively collaborate with the sales team to ensure the success of a tech company. It includes understanding the sales team's role, communicating and collaborating on projects, engaging customers, and providing mutual respect and support.", 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EW21-KJnfHBFoRiF49_uJMcBfHyPKimuPOFsCcJypQWaBQ?e=mGdIqe', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:1892] failed to parse field [metadata.updated_at] of type [date] in document with id '64d19d76-076a-4627-81db-ace886e4d1e9'. Preview of field's value: ''


❌ Failed to index doc 11: {'source': '/content/temp.json', 'seq_num': 7, 'name': 'Sales Engineering Collaboration', 'summary': ": This guide provides an overview of how engineers can effectively collaborate with the sales team to ensure the success of a tech company. It includes understanding the sales team's role, communicating and collaborating on projects, engaging customers, and providing mutual respect and support.", 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EW21-KJnfHBFoRiF49_uJMcBfHyPKimuPOFsCcJypQWaBQ?e=mGdIqe', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:1621] failed to parse field [metadata.updated_at] of type [date] in document with id 'a20c360d-c74d-437d-8499-d362cac4b0ab'. Preview of field's value: ''


❌ Failed to index doc 12: {'source': '/content/temp.json', 'seq_num': 8, 'name': 'Intellectual Property Policy', 'summary': "This Intellectual Property Policy outlines guidelines and procedures for the ownership, protection, and utilization of intellectual property generated by employees during their employment. It establishes the company's ownership of work generated on company time, while recognizing employee ownership of work generated outside of company time without the use of company resources. The policy", 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EWz3cYEVdzBNsiHsYbKhms4BVYGhravyrUw3T3lzxL4pTg?e=mPIgbO', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:2905] failed to parse field [metadata.updated_at] of type [date] in document with id '4a088bf1-0643-4dff-9acc-2d07edeec99f'. Preview of field's value: ''


❌ Failed to index doc 13: {'source': '/content/temp.json', 'seq_num': 8, 'name': 'Intellectual Property Policy', 'summary': "This Intellectual Property Policy outlines guidelines and procedures for the ownership, protection, and utilization of intellectual property generated by employees during their employment. It establishes the company's ownership of work generated on company time, while recognizing employee ownership of work generated outside of company time without the use of company resources. The policy", 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EWz3cYEVdzBNsiHsYbKhms4BVYGhravyrUw3T3lzxL4pTg?e=mPIgbO', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:3202] failed to parse field [metadata.updated_at] of type [date] in document with id '82eccbea-8177-405e-b649-735a92daa23e'. Preview of field's value: ''


❌ Failed to index doc 14: {'source': '/content/temp.json', 'seq_num': 9, 'name': 'Code Of Conduct', 'summary': 'This code of conduct outlines the principles and values that all employees are expected to uphold in their interactions with colleagues, customers, partners, and other stakeholders. It sets out core values such as integrity, respect, accountability, collaboration and excellence. Employees must comply with all applicable laws, regulations, and organizational', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/ER3xmeKaZ_pAqPeJWyyNR0QBg6QmoWIGPhwfEyCABWHrPA?e=cvzrgV', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:2181] failed to parse field [metadata.updated_at] of type [date] in document with id '11297f33-dcef-4fdd-980b-a9f0f883b4b7'. Preview of field's value: ''


❌ Failed to index doc 15: {'source': '/content/temp.json', 'seq_num': 9, 'name': 'Code Of Conduct', 'summary': 'This code of conduct outlines the principles and values that all employees are expected to uphold in their interactions with colleagues, customers, partners, and other stakeholders. It sets out core values such as integrity, respect, accountability, collaboration and excellence. Employees must comply with all applicable laws, regulations, and organizational', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/ER3xmeKaZ_pAqPeJWyyNR0QBg6QmoWIGPhwfEyCABWHrPA?e=cvzrgV', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:2906] failed to parse field [metadata.updated_at] of type [date] in document with id 'ebf785ca-60e9-46a1-b0ba-3a8ae8c64f14'. Preview of field's value: ''


❌ Failed to index doc 16: {'source': '/content/temp.json', 'seq_num': 10, 'name': 'Office Pet Policy', 'summary': 'This policy outlines the guidelines and procedures for bringing pets into the workplace. It covers approval process, pet behavior and supervision, allergies and phobias, cleanliness and hygiene, liability, restricted areas, and policy review. Employees must obtain prior approval from their supervisor and the HR department before bringing their', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/ETf-69wBeaZJpAn3CY7ExRABQWvav-p24VOnB6C0A4l2pQ?e=X72WuK', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:1390] failed to parse field [metadata.updated_at] of type [date] in document with id 'cbf9bad2-1ba3-45cf-a246-3ccec5254c5b'. Preview of field's value: ''


❌ Failed to index doc 17: {'source': '/content/temp.json', 'seq_num': 10, 'name': 'Office Pet Policy', 'summary': 'This policy outlines the guidelines and procedures for bringing pets into the workplace. It covers approval process, pet behavior and supervision, allergies and phobias, cleanliness and hygiene, liability, restricted areas, and policy review. Employees must obtain prior approval from their supervisor and the HR department before bringing their', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/ETf-69wBeaZJpAn3CY7ExRABQWvav-p24VOnB6C0A4l2pQ?e=X72WuK', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:3620] failed to parse field [metadata.updated_at] of type [date] in document with id 'bd1b94b7-d08a-4ae4-907b-2ad6731c7fd0'. Preview of field's value: ''


❌ Failed to index doc 18: {'source': '/content/temp.json', 'seq_num': 11, 'name': 'Performance Management Policy', 'summary': 'This Performance Management Policy outlines a consistent and transparent process for evaluating, recognizing, and rewarding employees. It includes goal setting, ongoing feedback, performance evaluations, ratings, promotions, and rewards. The policy applies to all employees and encourages open communication and professional growth.', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/ERsxt9p1uehJqeJu4JlxkakBavbKwcldrYv_hpv3xHikAw?e=pf5R2C', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:1099] failed to parse field [metadata.updated_at] of type [date] in document with id 'c71916f5-84ce-4b8a-b2d3-4ed7f772cb80'. Preview of field's value: ''


❌ Failed to index doc 19: {'source': '/content/temp.json', 'seq_num': 11, 'name': 'Performance Management Policy', 'summary': 'This Performance Management Policy outlines a consistent and transparent process for evaluating, recognizing, and rewarding employees. It includes goal setting, ongoing feedback, performance evaluations, ratings, promotions, and rewards. The policy applies to all employees and encourages open communication and professional growth.', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/ERsxt9p1uehJqeJu4JlxkakBavbKwcldrYv_hpv3xHikAw?e=pf5R2C', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:2734] failed to parse field [metadata.updated_at] of type [date] in document with id 'a751ad5f-7545-4df0-98fc-d61e04e4743d'. Preview of field's value: ''


❌ Failed to index doc 20: {'source': '/content/temp.json', 'seq_num': 12, 'name': 'Sales Organization Overview', 'summary': '\nOur sales organization is divided into four regions: The Americas, Europe, Asia-Pacific, and Middle East & Africa. Each region is led by an Area Vice-President and consists of dedicated account managers, sales representatives, and support staff. They collaborate with other departments to ensure the delivery of high', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EYsr1eqgn9hMslMJFLR-k54BBX-O3iC26bK7xNEBtYIBkg?e=xeAjiT', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:3274] failed to parse field [metadata.updated_at] of type [date] in document with id '0290a469-d0f6-467c-95ce-ef804a163a86'. Preview of field's value: ''


❌ Failed to index doc 21: {'source': '/content/temp.json', 'seq_num': 13, 'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EaAFec6004tAg21g4i67rfgBBRqCm1yY7AZLLQyyaMtsEQ?e=wTMb4z', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:1290] failed to parse field [metadata.updated_at] of type [date] in document with id '6d61406f-ea54-4de9-801c-2561e4c9b461'. Preview of field's value: ''


❌ Failed to index doc 22: {'source': '/content/temp.json', 'seq_num': 13, 'name': 'Compensation Framework For It Teams', 'summary': 'This document outlines a compensation framework for IT teams. It includes job levels, compensation bands, and performance-based incentives to ensure fair and competitive wages. Regular market benchmarking will be conducted to adjust the bands according to industry trends.', 'url': 'https://enterprisesearch.sharepoint.com/:t:/s/MSBuilddemo/EaAFec6004tAg21g4i67rfgBBRqCm1yY7AZLLQyyaMtsEQ?e=wTMb4z', 'category': 'sharepoint', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:2785] failed to parse field [metadata.updated_at] of type [date] in document with id '11497881-4d78-48c7-8036-c6ddd434fec2'. Preview of field's value: ''


❌ Failed to index doc 23: {'source': '/content/temp.json', 'seq_num': 14, 'name': 'Updating Your Tax Elections Forms', 'summary': ': This guide gives a step-by-step explanation of how to update your TD1 Personal Tax Credits Return form. Access the form from the CRA website and choose the correct version based on your province or territory of residence. Download and open the form in Adobe Reader, fill out the form by entering', 'url': './github/Updating Your Tax Elections Forms.txt', 'category': 'github', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:741] failed to parse field [metadata.updated_at] of type [date] in document with id '3258bfd6-b384-4ded-8dd7-00edd3f15f17'. Preview of field's value: ''


❌ Failed to index doc 24: {'source': '/content/temp.json', 'seq_num': 14, 'name': 'Updating Your Tax Elections Forms', 'summary': ': This guide gives a step-by-step explanation of how to update your TD1 Personal Tax Credits Return form. Access the form from the CRA website and choose the correct version based on your province or territory of residence. Download and open the form in Adobe Reader, fill out the form by entering', 'url': './github/Updating Your Tax Elections Forms.txt', 'category': 'github', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:2925] failed to parse field [metadata.updated_at] of type [date] in document with id 'db715f08-94f4-43a9-85ea-b23715ca3bb8'. Preview of field's value: ''


❌ Failed to index doc 25: {'source': '/content/temp.json', 'seq_num': 15, 'name': 'New Employee Onboarding Guide', 'summary': '\nThis onboarding guide provides essential information to new employees on our company culture and values, key onboarding steps, tax elections and documents, benefits enrollment, and setting up their workspace.', 'url': './github/New Employee Onboarding guide.txt', 'category': 'github', 'updated_at': ''}
Error: 1 document(s) failed to index.


ERROR:langchain_community.vectorstores.elasticsearch:Error adding texts: 1 document(s) failed to index.
ERROR:langchain_community.vectorstores.elasticsearch:First error reason: [1:2115] failed to parse field [metadata.updated_at] of type [date] in document with id 'bbd5f815-03d3-41b4-ad09-81718b67c037'. Preview of field's value: ''


❌ Failed to index doc 26: {'source': '/content/temp.json', 'seq_num': 15, 'name': 'New Employee Onboarding Guide', 'summary': '\nThis onboarding guide provides essential information to new employees on our company culture and values, key onboarding steps, tax elections and documents, benefits enrollment, and setting up their workspace.', 'url': './github/New Employee Onboarding guide.txt', 'category': 'github', 'updated_at': ''}
Error: 1 document(s) failed to index.


# Question Answering with MultiQuery Retriever

Now that we have the passages stored in Elasticsearch, we can now ask a question to get the relevant passages.

In [16]:
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.schema import format_document

import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

LLM_CONTEXT_PROMPT = ChatPromptTemplate.from_template(
    """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Be as verbose and educational in your response as possible.

    context: {context}
    Question: "{question}"
    Answer:
    """
)

LLM_DOCUMENT_PROMPT = PromptTemplate.from_template(
    """
---
SOURCE: {name}
{page_content}
---
"""
)


def _combine_documents(
    docs, document_prompt=LLM_DOCUMENT_PROMPT, document_separator="\n\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)


_context = RunnableParallel(
    context=retriever | _combine_documents,
    question=RunnablePassthrough(),
)

chain = _context | LLM_CONTEXT_PROMPT | llm

ans = chain.invoke("what is the nasa sales team?")

print("---- Answer ----")
print(ans)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you provide information on the sales team at NASA?', '2. How does the sales team operate within NASA?', '3. What are the responsibilities of the NASA sales team?']


---- Answer ----
I'm sorry, I don't know the answer to that question. However, based on the retrieved context, it seems that the sales team being referred to is the sales team of a tech company, not NASA. The context mentions conducting regular sales team meetings and quarterly reviews of the sales strategy, which are not typically associated with NASA. Additionally, the context mentions achieving growth and success in target markets, which is not a goal typically associated with NASA.


In [18]:
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.schema import format_document

import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

LLM_CONTEXT_PROMPT = ChatPromptTemplate.from_template(
    """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Be as verbose and educational in your response as possible.

    context: {context}
    Question: "{question}"
    Answer:
    """
)

LLM_DOCUMENT_PROMPT = PromptTemplate.from_template(
    """
---
SOURCE: {name}
{page_content}
---
"""
)


def _combine_documents(
    docs, document_prompt=LLM_DOCUMENT_PROMPT, document_separator="\n\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)


_context = RunnableParallel(
    context=retriever | _combine_documents,
    question=RunnablePassthrough(),
)

chain = _context | LLM_CONTEXT_PROMPT | llm

ans = chain.invoke("When is it acceptable for an employee to use a sick day?")

print("---- Answer ----")
print(ans)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What are the guidelines for an employee to take a sick day?', '2. In what situations is it appropriate for an employee to use a sick day?', '3. Can you provide information on when an employee is allowed to use a sick day?']


---- Answer ----

It is acceptable for an employee to use a sick day when they are feeling unwell and unable to work. This could include physical illness, mental health concerns, or other personal reasons. Employees should follow the company's sick leave policy and communicate with their supervisor and HR department if they need to take a sick day. It is important for employees to prioritize their health and well-being, and the company encourages them to take necessary breaks and seek support when needed.


In [17]:
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.schema import format_document

import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

LLM_CONTEXT_PROMPT = ChatPromptTemplate.from_template(
    """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Be as verbose and educational in your response as possible.

    context: {context}
    Question: "{question}"
    Answer:
    """
)

LLM_DOCUMENT_PROMPT = PromptTemplate.from_template(
    """
---
SOURCE: {name}
{page_content}
---
"""
)


def _combine_documents(
    docs, document_prompt=LLM_DOCUMENT_PROMPT, document_separator="\n\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)


_context = RunnableParallel(
    context=retriever | _combine_documents,
    question=RunnablePassthrough(),
)

chain = _context | LLM_CONTEXT_PROMPT | llm

ans = chain.invoke("are the benefits and drawbacks of working from home?")

print("---- Answer ----")
print(ans)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What are the pros and cons of remote work?', '2. Can you explain the advantages and disadvantages of working remotely?', '3. What are the positive and negative aspects of working from home?']


---- Answer ----

The benefits of working from home include increased flexibility, reduced commute time and costs, and the ability to create a comfortable and personalized workspace. It also allows for a better work-life balance and can improve overall well-being. However, there are also potential drawbacks, such as feelings of isolation, difficulty separating work and personal life, and potential distractions at home. It is important for employees to maintain effective communication and establish clear expectations and goals to ensure productivity and collaboration while working from home.


**Generate at least two new iteratioins of the previous cells - Be creative.** Did you master Multi-
Query Retriever concepts through this lab?

I understand the concept of the multi-query retriever which is that the questions create multiple level of vectors to retrieve similar data. By formulating multiple questions from the original question and rephrasing it or elaborating, the model is capable of finding more vectors of similarity. This is also done more efficiently through chunking the data. The retriever returns the best matching chunks per query, not entire documents. Each query can examine a chunk and determine if it is relevant.

