diff --git a/README.md b/README.md index 3d3516cc..af6ddc74 100644 --- a/README.md +++ b/README.md @@ -30,7 +30,9 @@ The [`notebooks`](notebooks/README.md) folder contains a range of executable Pyt ### LangChain - [`question-answering.ipynb`](./notebooks/generative-ai/question-answering.ipynb) -- [`langchain-self-query-retriever.ipynb`](./notebooks/langchain/langchain-self-query-retriever.ipynb) +- [`langchain-self-query-retriever.ipynb`](./notebooks/langchain/self-query-retriever-examples/langchain-self-query-retriever.ipynb) +- [`Question Answering with Self Query Retriever`](./notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb) +- [`BM25 and Self-querying retriever with elasticsearch and LangChain`](./notebooks/langchain/self-query-retriever-examples/chatbot-with-bm25-only-example.ipynb) - [`langchain-vector-store.ipynb`](./notebooks/langchain/langchain-vector-store.ipynb) - [`langchain-vector-store-using-elser.ipynb`](./notebooks/langchain/langchain-vector-store-using-elser.ipynb) - [`langchain-using-own-model.ipynb`](./notebooks/langchain/langchain-using-own-model.ipynb) diff --git a/notebooks/langchain/README.md b/notebooks/langchain/README.md deleted file mode 100644 index 5ed4e0f8..00000000 --- a/notebooks/langchain/README.md +++ /dev/null @@ -1,26 +0,0 @@ -# LangChain notebooks - -This folder contains notebooks that demonstrate how to use Elasticsearch with the LangChain framework for building applications powered by language models. 
- - \ No newline at end of file diff --git a/notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb b/notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb new file mode 100644 index 00000000..176f5d87 --- /dev/null +++ b/notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb @@ -0,0 +1,289 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Chatbot Example with Self Query Retriever\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb)\n", + "\n", + "This workbook demonstrates how to use the Elasticsearch [Self-query retriever](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html) to convert a question into a structured query and apply that structured query to an Elasticsearch index. 
\n", + "\n", + "Before we begin, we create a list of documents and then, using [`ElasticsearchStore.from_documents`](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents), create a `vectorstore` and index the data into Elasticsearch.\n", + "\n", + "\n", + "We will then run an example query demonstrating the Elasticsearch-powered self-query retriever.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install packages and import modules\n" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" + ] + } + ], + "source": [ + "!python3 -m pip install -qU lark elasticsearch langchain openai\n", + "\n", + "from langchain.schema import Document\n", + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.vectorstores import ElasticsearchStore\n", + "from langchain.llms import OpenAI\n", + "from langchain.retrievers.self_query.base import SelfQueryRetriever\n", + "from langchain.chains.query_constructor.base import AttributeInfo\n", + "from getpass import getpass" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create documents \n", + "Next, we will create a list of documents with movie summaries using [langchain Schema Document](https://api.python.langchain.com/en/latest/schema/langchain.schema.document.Document.html), containing each 
document's `page_content` and `metadata` .\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [], + "source": [ + "docs = [\n", + " Document(\n", + " page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n", + " metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\", \"director\": \"Steven Spielberg\", \"title\": \"Jurassic Park\"},\n", + " ),\n", + " Document(\n", + " page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n", + " metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2, \"title\": \"Inception\"},\n", + " ),\n", + " Document(\n", + " page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n", + " metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6, \"title\": \"Paprika\"},\n", + " ),\n", + " Document(\n", + " page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n", + " metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3, \"title\": \"Little Women\"},\n", + " ),\n", + " Document(\n", + " page_content=\"Toys come alive and have a blast doing so\",\n", + " metadata={\"year\": 1995, \"genre\": \"animated\", \"director\": \"John Lasseter\", \"rating\": 8.3, \"title\": \"Toy Story\"},\n", + " ),\n", + " Document(\n", + " page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n", + " metadata={\n", + " \"year\": 1979,\n", + " \"rating\": 9.9,\n", + " \"director\": \"Andrei Tarkovsky\",\n", + " \"genre\": \"science fiction\",\n", + " \"rating\": 9.9,\n", + " \"title\": \"Stalker\",\n", + " },\n", + " ),\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Connect to Elasticsearch\n", + "\n", + "ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this 
notebook. If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial. \n", + "\n", + "We'll use the **Cloud ID** to identify our deployment, because we are using an Elastic Cloud deployment. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n", + "\n", + "\n", + "We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our Elastic Cloud deployment. This makes it easy to create the index and index data. We also pass in the list of documents that we created in the previous step." + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": {}, + "outputs": [], + "source": [ + "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n", + "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n", + "\n", + "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n", + "ELASTIC_API_KEY = getpass(\"Elastic Api Key: \")\n", + "\n", + "# https://platform.openai.com/api-keys\n", + "OPENAI_API_KEY = getpass(\"OpenAI API key: \")\n", + "\n", + "embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)\n", + "\n", + "\n", + "vectorstore = ElasticsearchStore.from_documents(\n", + " docs, \n", + " embeddings, \n", + " index_name=\"elasticsearch-self-query-demo\", \n", + " es_cloud_id=ELASTIC_CLOUD_ID, \n", + " es_api_key=ELASTIC_API_KEY\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup query retriever\n", + "\n", + "Next, we will instantiate the self-query retriever by providing some information about our document attributes and a short description of the document contents. 
\n", + "\n", + "We will then instantiate the retriever with [SelfQueryRetriever.from_llm](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html)" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": {}, + "outputs": [], + "source": [ + "# Add details about metadata fields\n", + "metadata_field_info = [\n", + " AttributeInfo(\n", + " name=\"genre\",\n", + " description=\"The genre of the movie. Can be either 'science fiction' or 'animated'.\",\n", + " type=\"string or list[string]\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"year\",\n", + " description=\"The year the movie was released\",\n", + " type=\"integer\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"director\",\n", + " description=\"The name of the movie director\",\n", + " type=\"string\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n", + " ),\n", + "]\n", + "\n", + "document_content_description = \"Brief summary of a movie\"\n", + "\n", + "# Set up the OpenAI LLM with sampling temperature 0\n", + "llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)\n", + "\n", + "# Instantiate the retriever\n", + "retriever = SelfQueryRetriever.from_llm(\n", + " llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Question Answering with Self-Query Retriever\n", + "\n", + "We will now demonstrate how to use the self-query retriever for RAG." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "AIMessage(content='Inception (2010)')" + ] + }, + "execution_count": 77, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from langchain.chat_models import ChatOpenAI\n", + "from langchain.schema.runnable import RunnableParallel, RunnablePassthrough\n", + "from langchain.prompts import ChatPromptTemplate, PromptTemplate\n", + "from langchain.schema import format_document\n", + "\n", + "LLM_CONTEXT_PROMPT = ChatPromptTemplate.from_template(\"\"\"\n", + "Use the following context movies that matched the user question. Use the movies below only to answer the user's question.\n", + "\n", + "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n", + "\n", + "----\n", + "{context}\n", + "----\n", + "Question: {question}\n", + "Answer:\n", + "\"\"\")\n", + "\n", + "DOCUMENT_PROMPT = PromptTemplate.from_template(\"\"\"\n", + "---\n", + "title: {title} \n", + "year: {year} \n", + "director: {director} \n", + "---\n", + "\"\"\")\n", + "\n", + "def _combine_documents(\n", + " docs, document_prompt=DOCUMENT_PROMPT, document_separator=\"\\n\\n\"\n", + "):\n", + " doc_strings = [format_document(doc, document_prompt) for doc in docs]\n", + " return document_separator.join(doc_strings)\n", + "\n", + "\n", + "_context = RunnableParallel(\n", + " context=retriever | _combine_documents,\n", + " question=RunnablePassthrough(),\n", + ")\n", + "\n", + "chain = (_context | LLM_CONTEXT_PROMPT | llm)\n", + "\n", + "chain.invoke(\"What movies are about dreams and was released after the year 1992 but before 2007?\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.11.4 64-bit", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + 
"name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.3" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/langchain/self-query-retriever-examples/chatbot-with-bm25-only-example.ipynb b/notebooks/langchain/self-query-retriever-examples/chatbot-with-bm25-only-example.ipynb new file mode 100644 index 00000000..5ca8cf9b --- /dev/null +++ b/notebooks/langchain/self-query-retriever-examples/chatbot-with-bm25-only-example.ipynb @@ -0,0 +1,427 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# BM25 and Self-querying retriever with elasticsearch and LangChain\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/langchain/self-query-retriever-examples/chatbot-with-bm25-only-example.ipynb)\n", + "\n", + "This workbook demonstrates how to use the Elasticsearch [Self-query retriever](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html) to convert an unstructured query into a structured query, which we then run as a BM25 search. 
\n", + "\n", + "In this example, we will:\n", + "- Ingest a sample dataset of movies outside of LangChain\n", + "- Customise the retrieval strategy in ElasticsearchStore to use only BM25\n", + "- Use the self-query retriever to transform a question into a structured query\n", + "- Use the retrieved documents and a RAG strategy to answer the question" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install packages\n" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" + ] + } + ], + "source": [ + "!python3 -m pip install -qU lark elasticsearch langchain openai" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Sample Dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "docs = [\n", + " {\n", + " \"text\": \"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n", + " \"metadata\": {\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\", \"director\": \"Steven Spielberg\", \"title\": \"Jurassic Park\"},\n", + " },\n", + " {\n", + " \"text\": \"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n", + " \"metadata\": {\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2, \"title\": \"Inception\"},\n", + " },\n", + " {\n", + " \"text\": \"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n", + " \"metadata\": {\"year\": 2006, 
\"director\": \"Satoshi Kon\", \"rating\": 8.6, \"title\": \"Paprika\"},\n", + " },\n", + " {\n", + " \"text\": \"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n", + " \"metadata\": {\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3, \"title\": \"Little Women\"},\n", + " },\n", + " {\n", + " \"text\": \"Toys come alive and have a blast doing so\",\n", + " \"metadata\": {\"year\": 1995, \"genre\": \"animated\", \"director\": \"John Lasseter\", \"rating\": 8.3, \"title\": \"Toy Story\"},\n", + " },\n", + " {\n", + " \"text\": \"Three men walk into the Zone, three men walk out of the Zone\",\n", + " \"metadata\": {\n", + " \"year\": 1979,\n", + " \"rating\": 9.9,\n", + " \"director\": \"Andrei Tarkovsky\",\n", + " \"genre\": \"science fiction\",\n", + " \"rating\": 9.9,\n", + " \"title\": \"Stalker\",\n", + " }\n", + " }\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Connect to Elasticsearch\n", + "\n", + "ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial. \n", + "\n", + "We'll use the **Cloud ID** to identify our deployment, because we are using an Elastic Cloud deployment. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n", + "\n", + "\n", + "We will use the Elasticsearch Python client to connect to our Elastic Cloud deployment. The same connection is later passed to [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html), and the documents we created in the previous step are indexed with the bulk helper." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [], + "source": [ + "from elasticsearch import Elasticsearch\n", + "from getpass import getpass\n", + "\n", + "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n", + "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n", + "\n", + "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n", + "ELASTIC_API_KEY = getpass(\"Elastic Api Key: \")\n", + "\n", + "# https://platform.openai.com/api-keys\n", + "OPENAI_API_KEY = getpass(\"OpenAI API key: \")\n", + "\n", + "client = Elasticsearch(\n", + " cloud_id=ELASTIC_CLOUD_ID,\n", + " api_key=ELASTIC_API_KEY,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Indexing data into Elasticsearch\n", + "\n", + "We have chosen to index the data outside of LangChain to demonstrate that it's possible to use LangChain for RAG and to use self-query retrieval on any Elasticsearch index." + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [], + "source": [ + "from elasticsearch import helpers\n", + "\n", + "# create the index\n", + "client.indices.create(index=\"movies_self_query\")\n", + "\n", + "operations = [\n", + " {\n", + " \"_index\": \"movies_self_query\",\n", + " \"_id\": i,\n", + " \"text\": doc[\"text\"],\n", + " \"metadata\": doc[\"metadata\"]\n", + " } for i, doc in enumerate(docs)\n", + "]\n", + "\n", + "# Add the documents to the index directly\n", + "response = helpers.bulk(\n", + " client,\n", + " operations,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup query retriever\n", + "\n", + "Next, we will instantiate the self-query retriever by providing some information about our document attributes and a short description of the document contents. 
\n", + "\n", + "We will then instantiate the retriever with [SelfQueryRetriever.from_llm](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html)" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.vectorstores.elasticsearch import ApproxRetrievalStrategy\n", + "from typing import List, Union\n", + "from langchain.retrievers.self_query.base import SelfQueryRetriever\n", + "from langchain.chains.query_constructor.base import AttributeInfo\n", + "from langchain.llms import OpenAI\n", + "from langchain.vectorstores.elasticsearch import ElasticsearchStore\n", + "\n", + "# Add details about metadata fields\n", + "metadata_field_info = [\n", + " AttributeInfo(\n", + " name=\"genre\",\n", + " description=\"The genre of the movie. Can be either 'science fiction' or 'animated'.\",\n", + " type=\"string or list[string]\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"year\",\n", + " description=\"The year the movie was released\",\n", + " type=\"integer\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"director\",\n", + " description=\"The name of the movie director\",\n", + " type=\"string\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n", + " ),\n", + "]\n", + "\n", + "document_content_description = \"Brief summary of a movie\"\n", + "\n", + "# Set up the OpenAI LLM with sampling temperature 0\n", + "llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)\n", + "\n", + "class BM25RetrievalStrategy(ApproxRetrievalStrategy):\n", + "\n", + " def __init__(\n", + " self\n", + " ):\n", + " pass\n", + "\n", + " def query(\n", + " self,\n", + " query: Union[str, None],\n", + " filter: List[dict],\n", + " **kwargs,\n", + " ):\n", + " \n", + " if query:\n", + " query_clause = [{\n", + " \"multi_match\": {\n", + " \"query\": query,\n", + " \"fields\": [\"text\"],\n", + " 
\"fuzziness\": \"AUTO\",\n", + " }\n", + " }]\n", + " else:\n", + " query_clause = []\n", + "\n", + "\n", + " bm25_query = {\n", + " \"query\": {\n", + " \"bool\": {\n", + " \"filter\": filter,\n", + " \"must\": query_clause\n", + " }\n", + " },\n", + " }\n", + "\n", + " print(\"query\", bm25_query)\n", + "\n", + " return bm25_query\n", + "\n", + "\n", + "vectorstore = ElasticsearchStore(\n", + " index_name=\"movies_self_query\",\n", + " es_connection=client,\n", + " strategy=BM25RetrievalStrategy()\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## BM25 Only Retriever \n", + "One option is to customise the query to use a BM25-only retrieval method. We can do this with a custom retrieval strategy (the `BM25RetrievalStrategy` above) or a `custom_query` function, so that the query uses only `multi_match`.\n", + "\n", + "In the example below, the self-query retriever uses the LLM to transform the question into a keyword query and a filter query (for example, query: dinosaur, filter: genre and year range). The custom query then performs a BM25-based search using the keyword query and the filter query.\n", + "\n", + "This means that you don't have to vectorise all the documents if you want to build a question-answering use case on an existing Elasticsearch index. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query {'query': {'bool': {'filter': [{'bool': {'must': [{'match': {'metadata.genre': {'query': 'science fiction'}}}, {'range': {'metadata.year': {'gt': 1992}}}, {'range': {'metadata.year': {'lt': 2007}}}]}}], 'must': [{'multi_match': {'query': 'dinosaur', 'fields': ['text'], 'fuzziness': 'AUTO'}}]}}}\n", + "docs: [Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction', 'director': 'Steven Spielberg', 'title': 'Jurassic Park'})]\n" + ] + }, + { + "data": { + "text/plain": [ + "'Steven Spielberg directed Jurassic Park in 1993.'" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from langchain.schema.runnable import RunnableParallel, RunnablePassthrough\n", + "from langchain.prompts import ChatPromptTemplate, PromptTemplate\n", + "from langchain.schema import format_document\n", + "\n", + "def custom_query(query_body, query):\n", + " filters = query_body.get(\"knn\", {}).get(\"filter\", [])\n", + " \n", + " print(f\"filters: {filters}\")\n", + " print(f\"query: {query}\")\n", + "\n", + " if query.strip() != \"\":\n", + " query_clause = [{\n", + " \"multi_match\": {\n", + " \"query\": query,\n", + " \"fields\": [\"text\"],\n", + " \"fuzziness\": \"AUTO\",\n", + " }\n", + " }]\n", + " else:\n", + " query_clause = []\n", + "\n", + "\n", + " return {\n", + " \"query\": {\n", + " \"bool\": {\n", + " \"filter\": filters,\n", + " \"must\": query_clause\n", + " }\n", + " },\n", + " }\n", + "\n", + "retriever = SelfQueryRetriever.from_llm(\n", + " llm, \n", + " vectorstore, \n", + " document_content_description, \n", + " metadata_field_info, \n", + " verbose=True\n", + ")\n", + "\n", + "LLM_CONTEXT_PROMPT = ChatPromptTemplate.from_template(\"\"\"\n", + 
"Use the following context movies that matched the user question. Use the movies below only to answer the user's question.\n", + "\n", + "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n", + "\n", + "----\n", + "{context}\n", + "----\n", + "Question: {question}\n", + "Answer:\n", + "\"\"\")\n", + "\n", + "DOCUMENT_PROMPT = PromptTemplate.from_template(\"\"\"\n", + "---\n", + "title: {title} \n", + "year: {year} \n", + "director: {director} \n", + "---\n", + "\"\"\")\n", + "\n", + "def _combine_documents(\n", + " docs, document_prompt=DOCUMENT_PROMPT, document_separator=\"\\n\\n\"\n", + "):\n", + " print(\"docs:\", docs)\n", + " doc_strings = [format_document(doc, document_prompt) for doc in docs]\n", + " return document_separator.join(doc_strings)\n", + "\n", + "\n", + "_context = RunnableParallel(\n", + " context=retriever | _combine_documents,\n", + " question=RunnablePassthrough(),\n", + ")\n", + "\n", + "chain = (_context | LLM_CONTEXT_PROMPT | llm)\n", + "\n", + "chain.invoke(\"Which director directed movies about dinosaurs that was released after the year 1992 but before 2007?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "ObjectApiResponse({'acknowledged': True})" + ] + }, + "execution_count": 64, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "client.indices.delete(index=\"movies_self_query\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.11.4 64-bit", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.3" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": 
"b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/langchain/langchain-self-query-retriever.ipynb b/notebooks/langchain/self-query-retriever-examples/langchain-self-query-retriever.ipynb similarity index 92% rename from notebooks/langchain/langchain-self-query-retriever.ipynb rename to notebooks/langchain/self-query-retriever-examples/langchain-self-query-retriever.ipynb index a810c267..37c027a7 100644 --- a/notebooks/langchain/langchain-self-query-retriever.ipynb +++ b/notebooks/langchain/self-query-retriever-examples/langchain-self-query-retriever.ipynb @@ -5,7 +5,7 @@ "metadata": {}, "source": [ "# Self-querying retriever with elasticsearch and langchain\n", - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/langchain/langchain-self-query-retriever.ipynb)\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/langchain/self-query-retriever-examples/langchain-self-query-retriever.ipynb)\n", "\n", "This workbook demonstrates example of Elasticsearch's [Self-query retriever](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html) to convert unstructured query into a structured query and apply structured query to a vectorstore. 
\n", "\n", @@ -24,7 +24,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 30, "metadata": {}, "outputs": [ { @@ -60,30 +60,30 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "docs = [\n", " Document(\n", " page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n", - " metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\"},\n", + " metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\", \"director\": \"Steven Spielberg\", \"title\": \"Jurassic Park\"},\n", " ),\n", " Document(\n", " page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n", - " metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2},\n", + " metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2, \"title\": \"Inception\"},\n", " ),\n", " Document(\n", " page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n", - " metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6},\n", + " metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6, \"title\": \"Paprika\"},\n", " ),\n", " Document(\n", " page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n", - " metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3},\n", + " metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3, \"title\": \"Little Women\"},\n", " ),\n", " Document(\n", " page_content=\"Toys come alive and have a blast doing so\",\n", - " metadata={\"year\": 1995, \"genre\": \"animated\"},\n", + " metadata={\"year\": 1995, \"genre\": \"animated\", \"director\": \"John Lasseter\", \"rating\": 8.3, \"title\": \"Toy Story\"},\n", " ),\n", " Document(\n", " page_content=\"Three men walk into the 
Zone, three men walk out of the Zone\",\n", @@ -93,6 +93,7 @@ " \"director\": \"Andrei Tarkovsky\",\n", " \"genre\": \"science fiction\",\n", " \"rating\": 9.9,\n", + " \"title\": \"Stalker\",\n", " },\n", " ),\n", "]" @@ -114,7 +115,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 68, "metadata": {}, "outputs": [], "source": [ @@ -152,7 +153,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -160,7 +161,7 @@ "metadata_field_info = [\n", " AttributeInfo(\n", " name=\"genre\",\n", - " description=\"The genre of the movie\",\n", + " description=\"The genre of the movie. Can be either 'science fiction' or 'animated'.\",\n", " type=\"string or list[string]\",\n", " ),\n", " AttributeInfo(\n", @@ -202,7 +203,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 34, "metadata": {}, "outputs": [ { @@ -214,7 +215,7 @@ " Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'year': 2019, 'director': 'Greta Gerwig', 'rating': 8.3})]" ] }, - "execution_count": 10, + "execution_count": 34, "metadata": {}, "output_type": "execute_result" } @@ -237,7 +238,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 35, "metadata": {}, "outputs": [ { @@ -246,7 +247,7 @@ "[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'})]" ] }, - "execution_count": 11, + "execution_count": 35, "metadata": {}, "output_type": "execute_result" } @@ -268,7 +269,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 36, "metadata": {}, "outputs": [], "source": [ @@ -295,7 +296,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 37, "metadata": {}, "outputs": [ { @@ -305,7 +306,7 @@ " Document(page_content='A 
psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6})]" ] }, - "execution_count": 13, + "execution_count": 37, "metadata": {}, "output_type": "execute_result" } @@ -328,7 +329,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 38, "metadata": {}, "outputs": [ { @@ -337,7 +338,7 @@ "[Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6})]" ] }, - "execution_count": 14, + "execution_count": 38, "metadata": {}, "output_type": "execute_result" }