From 51866a9872f4c49140246fe1e8ba0e2d8295427c Mon Sep 17 00:00:00 2001 From: Craig Chi Date: Fri, 16 Feb 2024 09:41:02 -0800 Subject: [PATCH] feat: add notebooks for memory, document loader, and vector store --- docs/chat_message_history.ipynb | 259 +++++++---- docs/document_loader.ipynb | 431 +++++++++++------- docs/vector_store.ipynb | 751 +++++++++++++++++--------------- 3 files changed, 841 insertions(+), 600 deletions(-) diff --git a/docs/chat_message_history.ipynb b/docs/chat_message_history.ipynb index 8b1a4cf..914af7e 100644 --- a/docs/chat_message_history.ipynb +++ b/docs/chat_message_history.ipynb @@ -1,79 +1,184 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Google DATABASE\n", - "\n", - "[Google DATABASE](https://cloud.google.com/DATABASE).\n", - "\n", - "Save chat messages into `DATABASE`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Pre-reqs" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "%pip install PACKAGE_NAME" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "from PACKAGE import LOADER" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Basic Usage" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "6-0_o3DxsFGi" + }, + "source": [ + "Google Database\n", + "\n", + "Use [Google 
Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) to store chat message history for LangChain." ] }, + { + "cell_type": "markdown", + "metadata": { + "id": "dWakBoPnsFGj" + }, + "source": [ + "## Pre-reqs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EudfLv_UsFGk" + }, + "source": [ + "### Setting Up a Memorystore for Redis Instance\n", + "\n", + "Before proceeding, an active Memorystore for Redis instance is needed to store chat message history:\n", + "\n", + "* Create a Memorystore for Redis Instance (>= 5.0): If an instance doesn't exist, follow the instructions at https://cloud.google.com/memorystore/docs/redis/create-instance-console to create a new one. Ensure the version is greater than or equal to 5.0.\n", + "* Obtain Endpoint: Note the endpoint associated with the instance." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "J5nxjYxHsFGk" + }, + "source": [ + "### Installing the LangChain Memorystore for Redis Module\n", + "\n", + "Interaction with the Memorystore for Redis instance from LangChain requires installing the necessary module:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [], + "id": "iLwVMVkYsFGk" + }, + "outputs": [], + "source": [ + "# Install Memorystore for Redis for LangChain module\n", + "%pip install langchain_google_memorystore_redis" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2L7kMu__sFGl" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A2fT1iEhsFGl" + }, + "source": [ + "### Initialize a MemorystoreChatMessageHistory\n", + "\n", + "Each chat message history object must have a unique session ID. If the session ID already has messages stored in Redis, they can be retrieved."
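The session-scoped behavior described above (append messages under a session ID, read them back, clear them) can be sketched in plain Python. This is an illustrative stand-in using a dict in place of a Redis client, not the library's implementation; the class and method names mirror the notebook but are hypothetical here.

```python
# Minimal sketch of session-scoped chat history. A dict stands in for the
# Redis client; the real MemorystoreChatMessageHistory talks to Memorystore.
class InMemoryChatMessageHistory:
    def __init__(self, store: dict, session_id: str):
        self.store = store
        self.session_id = session_id
        self.store.setdefault(session_id, [])  # reuse existing messages if any

    def add_user_message(self, text: str) -> None:
        self.store[self.session_id].append({"type": "human", "content": text})

    def add_ai_message(self, text: str) -> None:
        self.store[self.session_id].append({"type": "ai", "content": text})

    @property
    def messages(self) -> list:
        return list(self.store[self.session_id])

    def clear(self) -> None:
        self.store[self.session_id] = []


store = {}
history = InMemoryChatMessageHistory(store, session_id="session1")
history.add_ai_message("Hey! I am AI!")
history.add_user_message("Hey! I am human!")
```

Reconnecting with the same `session_id` against the same store would see the previously appended messages, which is the retrieval behavior the notebook relies on.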
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YEDKWR6asFGl" + }, + "outputs": [], + "source": [ + "import redis\n", + "from langchain_google_memorystore_redis import MemorystoreChatMessageHistory\n", + "\n", + "# Connect to a Memorystore for Redis instance\n", + "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n", + "\n", + "message_history = MemorystoreChatMessageHistory(redis_client, session_id='session1')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EmoJcTgosFGl" + }, + "source": [ + "### Add Messages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gB1PGe6wsFGm" + }, + "outputs": [], + "source": [ + "message_history.add_ai_message('Hey! I am AI!')\n", + "message_history.add_user_message('Hey! I am human!')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "02xxvmzTsFGm" + }, + "source": [ + "### Retrieve All Messages Stored in the Session" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BvS3UFsysFGm" + }, + "outputs": [], + "source": [ + "message_history.messages" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sFJdt3ubsFGo" + }, + "source": [ + "### Clear Messages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "H5I7K3MTsFGo" + }, + "outputs": [], + "source": [ + "message_history.clear()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + }, + "colab": { + "provenance": [] + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/docs/document_loader.ipynb 
b/docs/document_loader.ipynb index 5f48d1d..6462ae5 100644 --- a/docs/document_loader.ipynb +++ b/docs/document_loader.ipynb @@ -1,172 +1,265 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Google DATABASE\n", - "\n", - "[Google DATABASE](https://cloud.google.com/DATABASE).\n", - "\n", - "Load documents from `DATABASE`." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "6-0_o3DxsFGi" + }, + "source": [ + "Google Database\n", + "\n", + "Use [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) to store Documents for LangChain." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dWakBoPnsFGj" + }, + "source": [ + "## Pre-reqs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EudfLv_UsFGk" + }, + "source": [ + "### Setting Up a Memorystore for Redis Instance\n", + "\n", + "Before proceeding, an active Memorystore for Redis instance is needed to store documents:\n", + "\n", + "* Create a Memorystore for Redis Instance (>= 5.0): If an instance doesn't exist, follow the instructions at https://cloud.google.com/memorystore/docs/redis/create-instance-console to create a new one. Ensure the version is greater than or equal to 5.0.\n", + "* Obtain Endpoint: Note the endpoint associated with the instance."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "J5nxjYxHsFGk" + }, + "source": [ + "### Installing the LangChain Memorystore for Redis Module\n", + "\n", + "Interaction with the Memorystore for Redis instance from LangChain requires installing the necessary module:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [], + "id": "iLwVMVkYsFGk" + }, + "outputs": [], + "source": [ + "# Install Memorystore for Redis for LangChain module\n", + "%pip install langchain_google_memorystore_redis" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2L7kMu__sFGl" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A2fT1iEhsFGl" + }, + "source": [ + "### Initialize a MemorystoreDocumentLoader\n", + "\n", + "Initialize a loader that loads all documents stored in the Memorystore for Redis instance with a specific prefix." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YEDKWR6asFGl" + }, + "outputs": [], + "source": [ + "import redis\n", + "from langchain_google_memorystore_redis import MemorystoreDocumentLoader\n", + "\n", + "# Connect to a Memorystore for Redis instance\n", + "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n", + "prefix = \"doc:\"\n", + "\n", + "loader = MemorystoreDocumentLoader(\n", + " client=redis_client,\n", + " key_prefix=prefix,\n", + " content_fields=set([\"page_content\"]),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EmoJcTgosFGl" + }, + "source": [ + "### Load Documents\n", + "\n", + "Load all documents stored in the Memorystore for Redis instance at once." 
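The prefix-based loading just described can be sketched in plain Python: keep every key that starts with `key_prefix` and read the content field from each stored hash. A dict stands in for Redis here; the keys and field names are illustrative, not the library's storage format.

```python
# Illustrative sketch of prefix-scoped document loading.
# A dict of hashes stands in for the Redis instance.
fake_redis = {
    "doc:1": {"page_content": "hello world"},
    "doc:2": {"page_content": "goodbye"},
    "other:1": {"page_content": "ignored"},  # different prefix, skipped
}


def load_documents(store: dict, key_prefix: str, content_field: str) -> list:
    # Keep only keys under the prefix, mirroring the loader's key_prefix.
    return [
        fields[content_field]
        for key, fields in sorted(store.items())
        if key.startswith(key_prefix)
    ]


docs = load_documents(fake_redis, "doc:", "page_content")
```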
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gB1PGe6wsFGm" + }, + "outputs": [], + "source": [ + "documents = loader.load()" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Lazy Load Documents\n", + "\n", + "Load the documents one by one with the `lazy_load` generator." + ], + "metadata": { + "id": "Vbs8gIa24YvJ" + } + }, + { + "cell_type": "code", + "source": [ + "for document in loader.lazy_load():\n", + "    # Do something\n", + "    print(document)" + ], + "metadata": { + "id": "nPhpvLtA4kBM" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "02xxvmzTsFGm" + }, + "source": [ + "## Customize Document Page Content & Metadata\n", + "\n", + "When initializing a loader with more than one content field, the `page_content` of the loaded Documents will contain a JSON-encoded string with top-level fields equal to the specified fields in `content_fields`.\n", + "\n", + "If the `metadata_fields` are specified, the `metadata` field of the loaded Documents will only have the top-level fields equal to the specified `metadata_fields`. If any of the values of the metadata fields is stored as a JSON-encoded string, it will be decoded prior to being loaded into the metadata fields." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BvS3UFsysFGm" + }, + "outputs": [], + "source": [ + "loader = MemorystoreDocumentLoader(\n", + "    client=redis_client,\n", + "    key_prefix=prefix,\n", + "    content_fields=set([\"content_field_1\", \"content_field_2\"]),\n", + "    metadata_fields=set([\"title\", \"author\"]),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sFJdt3ubsFGo" + }, + "source": [ + "## Save Documents\n", + "\n", + "You can save a list of Documents into a Memorystore for Redis instance as shown below. The Documents will be stored under randomly generated keys with the specified prefix `key_prefix`.
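The multi-field behavior described above can be sketched directly: with more than one content field, `page_content` becomes a JSON-encoded object, while metadata is restricted to the requested fields. The stored hash and field names below are illustrative, not the library's actual storage.

```python
import json

# A hypothetical stored hash with two content fields and two metadata fields.
stored_hash = {
    "content_field_1": "first part",
    "content_field_2": "second part",
    "title": "A Title",
    "author": "Someone",
}

content_fields = {"content_field_1", "content_field_2"}
metadata_fields = {"title", "author"}

# page_content: JSON-encoded object whose top-level keys are the content fields.
page_content = json.dumps(
    {k: v for k, v in stored_hash.items() if k in content_fields}, sort_keys=True
)

# metadata: only the requested metadata fields survive.
metadata = {k: v for k, v in stored_hash.items() if k in metadata_fields}
```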
Alternatively, you can designate the suffixes of the keys by specifying `ids` in the `add_documents` method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "H5I7K3MTsFGo" + }, + "outputs": [], + "source": [ + "import redis\n", + "from langchain_google_memorystore_redis import MemorystoreDocumentSaver\n", + "\n", + "# Connect to a Memorystore for Redis instance\n", + "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n", + "prefix = \"doc:\"\n", + "\n", + "saver = MemorystoreDocumentSaver(\n", + "    client=redis_client,\n", + "    key_prefix=prefix,\n", + "    content_field=\"page_content\",\n", + ")\n", + "saver.add_documents(documents)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Clean up Saved Documents\n", + "\n", + "Delete all of the keys with the specified prefix in the Memorystore for Redis instance. You can also specify the suffixes of the keys if you know them.\n", + "\n" + ], + "metadata": { + "id": "FLVI7Kp7mhL-" + } + }, + { + "cell_type": "code", + "source": [ + "saver.delete()" + ], + "metadata": { + "id": "1ArfDYUGmrP3" + }, + "execution_count": null, + "outputs": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + }, + "colab": { + "provenance": [ + { + "file_id": "1kuFhDfyzOdzS1apxQ--1efXB1pJ79yVY", + "timestamp": 1708033015250 + } + ] + } },
"outputs": [], - "source": [ - "from PACKAGE import LOADER" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Basic Usage" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Load from table" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "loader = LOADER()\n", - "\n", - "data = loader.load()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Load from query" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "loader = LOADER()\n", - "\n", - "data = loader.load()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Customize Document Page Content & Metadata" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "loader = LOADER()\n", - "\n", - "data = loader.load()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Customize Page Content Format" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Save Documents to table" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "saver = SAVER()\n", - "saver.add_documents(docs)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Customize Connection & Authentication" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from google.cloud.DATABASE import Client\n", - "\n", - "creds = \"\"\n", - "client = Client(creds=creds)\n", - "loader = LOADER(\n", - " client=client,\n", - ")" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - 
"mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - } - }, - "nbformat": 4, - "nbformat_minor": 4 + "nbformat": 4, + "nbformat_minor": 0 } \ No newline at end of file diff --git a/docs/vector_store.ipynb b/docs/vector_store.ipynb index 564e338..b2a5538 100644 --- a/docs/vector_store.ipynb +++ b/docs/vector_store.ipynb @@ -1,356 +1,399 @@ { - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Google Database\n", - "\n", - "Use [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) as a vector store for LangChain." - ] + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "2Jp10hX_jSLi" + }, + "source": [ + "Google Database\n", + "\n", + "Use [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) as a vector store for LangChain." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5MR6o8SQjSLm" + }, + "source": [ + "## Pre-reqs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U8J90K-1jSLm" + }, + "source": [ + "### Setting Up a Memorystore for Redis Instance\n", + "\n", + "Before proceeding, an active Memorystore for Redis instance is needed to store vectors:\n", + "\n", + "* Create a Memorystore for Redis Instance (v7.2): If an instance doesn't exist, follow the instructions at https://cloud.google.com/memorystore/docs/redis/create-instance-console to create a new one. Ensure version 7.2 is selected.\n", + "* Obtain Endpoint: Note the endpoint associated with the instance." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KEZwJVIijSLn" + }, + "source": [ + "### Installing the LangChain Memorystore for Redis Module\n", + "\n", + "Interaction with the Memorystore for Redis instance from LangChain requires installing the necessary module:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [], + "id": "JR8gi7LwjSLn" + }, + "outputs": [], + "source": [ + "# Install Memorystore for Redis for LangChain module\n", + "%pip install langchain_google_memorystore_redis" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Hxa67-4HjSLp" + }, + "source": [ + "## Basic Usage" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vX4_KmZhjSLp" + }, + "source": [ + "### Initialize a Vector Index" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fuma9hiKjSLq" + }, + "outputs": [], + "source": [ + "import redis\n", + "from langchain_google_memorystore_redis import (\n", + "    DistanceStrategy,\n", + "    HNSWConfig,\n", + "    RedisVectorStore,\n", + ")\n", + "\n", + "# Connect to a Memorystore for Redis instance\n", + "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n", + "\n", + "# Configure HNSW index with descriptive parameters\n", + "index_config = HNSWConfig(\n", + "    name=\"my_vector_index\", distance_strategy=DistanceStrategy.COSINE, vector_size=128\n", + ")\n", + "\n", + "# Initialize/create the vector store index\n", + "RedisVectorStore.init_index(client=redis_client, index_config=index_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mK8WgZLgjSLq" + }, + "source": [ + "### Prepare Documents\n", + "\n", + "Text needs processing and numerical representation before interacting with a vector store.
This involves:\n", + "\n", + "* Loading Text: The TextLoader obtains text data from a file (e.g., \"state_of_the_union.txt\").\n", + "* Text Splitting: The CharacterTextSplitter breaks the text into smaller chunks for embedding models." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ysSzPrPtjSLq" + }, + "outputs": [], + "source": [ + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain_community.document_loaders import TextLoader\n", + "\n", + "loader = TextLoader(\"./state_of_the_union.txt\")\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jF01qnlDjSLr" + }, + "source": [ + "### Add Documents to the Vector Store\n", + "\n", + "After text preparation and embedding generation, the following methods insert them into the Redis vector store." 
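The index initialized earlier uses `DistanceStrategy.COSINE`. As a point of reference, cosine distance is `1 - cos(theta)` between two vectors, so identical directions score 0.0 and orthogonal directions score 1.0. A minimal sketch of that metric:

```python
import math


def cosine_distance(a, b):
    # distance = 1 - (a . b) / (|a| * |b|); 0.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

The HNSW index approximates nearest-neighbor search under this distance rather than computing it exhaustively.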
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ygn-RPg9jSLr" + }, + "source": [ + "#### Method 1: Classmethod for Direct Insertion\n", + "\n", + "This approach combines embedding creation and insertion into a single step using the from_documents classmethod:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "n8CGv5ItjSLr" + }, + "outputs": [], + "source": [ + "from langchain_community.embeddings.fake import FakeEmbeddings\n", + "\n", + "embeddings = FakeEmbeddings(size=128)\n", + "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n", + "rvs = RedisVectorStore.from_documents(\n", + " docs, embedding=embeddings, client=redis_client, index_name=\"my_vector_index\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GyBYICjbjSLr" + }, + "source": [ + "#### Method 2: Instance-Based Insertion\n", + "This approach offers flexibility when working with a new or existing RedisVectorStore:\n", + "\n", + "* [Optional] Create a RedisVectorStore Instance: Instantiate a RedisVectorStore object for customization. If you already have an instance, proceed to the next step.\n", + "* Add Text with Metadata: Provide raw text and metadata to the instance. Embedding generation and insertion into the vector store are handled automatically." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OUTTz4HajSLs" + }, + "outputs": [], + "source": [ + "rvs = RedisVectorStore(\n", + " client=redis_client, index_name=\"my_vector_index\", embedding_service=embeddings\n", + ")\n", + "ids = rvs.add_texts(\n", + " texts=[d.page_content for d in docs], metadatas=[d.metadata for d in docs]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OGYt_pjOjSLs" + }, + "source": [ + "### Perform a Similarity Search (KNN)\n", + "\n", + "With the vector store populated, it's possible to search for text semantically similar to a query. 
Here's how to use KNN (K-Nearest Neighbors) with default settings:\n", + "\n", + "* Formulate the Query: A natural language question expresses the search intent (e.g., \"What did the president say about Ketanji Brown Jackson\").\n", + "* Retrieve Similar Results: The `similarity_search` method finds items in the vector store closest to the query in meaning." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NFIn1s04jSLs" + }, + "outputs": [], + "source": [ + "import pprint\n", + "\n", + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "knn_results = rvs.similarity_search(query=query)\n", + "pprint.pprint(knn_results)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "of9yyc3NjSLs" + }, + "source": [ + "### Perform a Range-Based Similarity Search\n", + "\n", + "Range queries provide more control by specifying a desired similarity threshold along with the query text:\n", + "\n", + "* Formulate the Query: A natural language question defines the search intent.\n", + "* Set Similarity Threshold: The `distance_threshold` parameter determines how close a match must be to be considered relevant.\n", + "* Retrieve Results: The `similarity_search_with_score` method finds items from the vector store that fall within the specified similarity threshold."
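The semantics behind both search styles can be sketched as brute-force KNN over cosine distance: rank every stored vector by distance to the query vector and keep the closest k (the real store uses the HNSW index instead of an exhaustive scan). The tiny corpus below is purely illustrative.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def knn(query_vec, items, k=4):
    # items: list of (text, vector) pairs; return the k closest texts.
    ranked = sorted(items, key=lambda item: cosine_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


corpus = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])]
results = knn([1.0, 0.0], corpus, k=2)
```

A range query is the same ranking with a cutoff: keep only results whose distance falls under the threshold instead of keeping a fixed k.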
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oyE54puWjSLs" + }, + "outputs": [], + "source": [ + "rq_results = rvs.similarity_search_with_score(query=query, distance_threshold=0.8)\n", + "pprint.pprint(rq_results)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IgeR83SWjSLs" + }, + "source": [ + "### Perform a Maximal Marginal Relevance (MMR) Search\n", + "\n", + "MMR queries aim to find results that are both relevant to the query and diverse from each other, reducing redundancy in search results.\n", + "\n", + "* Formulate the Query: A natural language question defines the search intent.\n", + "* Balance Relevance and Diversity: The lambda_mult parameter controls the trade-off between strict relevance and promoting variety in the results.\n", + "* Retrieve MMR Results: The `max_marginal_relevance_search` method returns items that optimize the combination of relevance and diversity based on the lambda setting." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pZySwTDYjSLt" + }, + "outputs": [], + "source": [ + "mmr_results = rvs.max_marginal_relevance_search(query=query, lambda_mult=0.90)\n", + "pprint.pprint(mmr_results)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3r4jQeNfjSLt" + }, + "source": [ + "## Use the Vector Store as a Retriever\n", + "\n", + "For seamless integration with other LangChain components, a vector store can be converted into a Retriever. This offers several advantages:\n", + "\n", + "* LangChain Compatibility: Many LangChain tools and methods are designed to directly interact with retrievers.\n", + "* Ease of Use: The `as_retriever()` method converts the vector store into a format that simplifies querying." 
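The MMR trade-off described above can be sketched as a greedy selection loop: each pick maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the similarity to anything already selected. This is a simplified illustration over precomputed similarity scores, not the library's implementation.

```python
def mmr(query_sims, pairwise_sims, k=2, lambda_mult=0.9):
    # query_sims[i]: similarity of candidate i to the query.
    # pairwise_sims[i][j]: similarity between candidates i and j.
    selected = []
    candidates = list(range(len(query_sims)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((pairwise_sims[i][j] for j in selected), default=0.0)
            return lambda_mult * query_sims[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Candidates 0 and 1 are near-duplicates; with lambda_mult=0.5 the
# redundancy penalty pushes the second pick to the diverse candidate 2.
picked = mmr(
    query_sims=[0.9, 0.89, 0.5],
    pairwise_sims=[[1.0, 0.99, 0.1], [0.99, 1.0, 0.1], [0.1, 0.1, 1.0]],
    k=2,
    lambda_mult=0.5,
)
```

With `lambda_mult` close to 1.0 the loop degenerates to plain relevance ranking, which is why the notebook's 0.90 setting stays close to KNN results while still discouraging exact duplicates.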
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LdbtJgNvjSLt" + }, + "outputs": [], + "source": [ + "retriever = rvs.as_retriever()\n", + "results = retriever.invoke(query)\n", + "pprint.pprint(results)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "89uoKEoAjSLt" + }, + "source": [ + "## Clean up" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rxAGb_4DjSLt" + }, + "source": [ + "### Delete Documents from the Vector Store\n", + "\n", + "Occasionally, it's necessary to remove documents (and their associated vectors) from the vector store. The `delete` method provides this functionality." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "csW5Smw2jSLt" + }, + "outputs": [], + "source": [ + "rvs.delete(ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wtWr_fDjjSLu" + }, + "source": [ + "### Delete a Vector Index\n", + "\n", + "There might be circumstances where the deletion of an existing vector index is necessary. Common reasons include:\n", + "\n", + "* Index Configuration Changes: If index parameters need modification, it's often required to delete and recreate the index.\n", + "* Storage Management: Removing unused indices can help free up space within the Redis instance.\n", + "\n", + "Caution: Vector index deletion is an irreversible operation. Be certain that the stored vectors and search functionality are no longer required before proceeding." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vzY_iiRLjSLu" + }, + "outputs": [], + "source": [ + "# Delete the vector index\n", + "RedisVectorStore.drop_index(client=redis_client, index_name=\"my_vector_index\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + }, + "colab": { + "provenance": [] + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Pre-reqs" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Setting Up a Memorystore for Redis Instance\n", - "\n", - "Before proceeding, an active Memorystore for Redis instance is needed to store vectors:\n", - "\n", - "* Create a Memorystore for Reids Instance (v7.2): If an instance doesn't exist, follow the instructions at https://cloud.google.com/memorystore/docs/redis/create-instance-console to create a new one. Ensure version 7.2 is selected.\n", - "* Obtain Endpoint: Note the endpoint associated with the instance." 
- ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Installing the LangChain Memorystore for Redis Module\n", - "\n", - "Interaction with the Memorystore for Redis instance from LangChain requires installing the necessary module:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "# Install Memorystore for Redis for LangChain module\n", - "%pip install langchainlangchain_google_memorystore_redis" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Basic Usage" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Initialize a Vector Index" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "import redis\n", - "from langchain_google_memorystore_redis import (\n", - " DistanceStrategy,\n", - " HNSWConfig,\n", - " RedisVectorStore,\n", - ")\n", - "\n", - "# Connect to a Memorystore for Redis instance\n", - "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n", - "\n", - "# Configure HNSW index with descriptive parameters\n", - "index_config = HNSWConfig(\n", - " name=\"my_vector_index\", distance_strategy=DistanceStrategy.COSINE, vector_size=128\n", - ")\n", - "\n", - "# Initialize/create the vector store index\n", - "RedisVectorStore.init_index(client=redis_client, index_config=index_config)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare Documents\n", - "\n", - "Text needs processing and numerical representation before interacting with a vector store. This involves:\n", - "\n", - "* Loading Text: The TextLoader obtains text data from a file (e.g., \"state_of_the_union.txt\").\n", - "* Text Splitting: The CharacterTextSplitter breaks the text into smaller chunks for embedding models." 
- ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.text_splitter import CharacterTextSplitter\n", - "from langchain_community.document_loaders import TextLoader\n", - "\n", - "loader = TextLoader(\"./state_of_the_union.txt\")\n", - "documents = loader.load()\n", - "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", - "docs = text_splitter.split_documents(documents)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Add Documents to the Vector Store\n", - "\n", - "After text preparation and embedding generation, the following methods insert them into the Redis vector store." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Method 1: Classmethod for Direct Insertion\n", - "\n", - "This approach combines embedding creation and insertion into a single step using the from_documents classmethod:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from langchain_community.embeddings.fake import FakeEmbeddings\n", - "\n", - "embeddings = FakeEmbeddings(size=128)\n", - "redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n", - "rvs = RedisVectorStore.from_documents(\n", - " docs, embedding=embeddings, client=redis_client, index_name=\"my_vector_index\"\n", - ")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Method 2: Instance-Based Insertion\n", - "This approach offers flexibility when working with a new or existing RedisVectorStore:\n", - "\n", - "* [Optional] Create a RedisVectorStore Instance: Instantiate a RedisVectorStore object for customization. If you already have an instance, proceed to the next step.\n", - "* Add Text with Metadata: Provide raw text and metadata to the instance. 
Embedding generation and insertion into the vector store are handled automatically." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "rvs = RedisVectorStore(\n", - " client=redis_client, index_name=\"my_vector_index\", embeddings=embeddings\n", - ")\n", - "ids = rvs.add_texts(\n", - " texts=[d.page_content for d in docs], metadatas=[d.metadata for d in docs]\n", - ")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Perform a Similarity Search (KNN)\n", - "\n", - "With the vector store populated, it's possible to search for text semantically similar to a query. Here's how to use KNN (K-Nearest Neighbors) with default settings:\n", - "\n", - "* Formulate the Query: A natural language question expresses the search intent (e.g., \"What did the president say about Ketanji Brown Jackson\").\n", - "* Retrieve Similar Results: The `similarity_search` method finds items in the vector store closest to the query in meaning." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pprint\n", - "\n", - "query = \"What did the president say about Ketanji Brown Jackson\"\n", - "knn_results = rvs.similarity_search(query=query)\n", - "pprint.pprint(knn_results)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Perform a Range-Based Similarity Search\n", - "\n", - "Range queries provide more control by specifying a desired similarity threshold along with the query text:\n", - "\n", - "* Formulate the Query: A natural language question defines the search intent.\n", - "* Set Similarity Threshold: The distance_threshold parameter determines how close a match must be considered relevant.\n", - "* Retrieve Results: The `similarity_search_with_score` method finds items from the vector store that fall within the specified similarity threshold." 
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "rq_results = rvs.similarity_search_with_score(query=query, distance_threshold=0.8)\n",
- "pprint.pprint(rq_results)"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Perform a Maximal Marginal Relevance (MMR) Search\n",
- "\n",
- "MMR queries aim to find results that are both relevant to the query and diverse from each other, reducing redundancy in search results.\n",
- "\n",
- "* Formulate the Query: A natural language question defines the search intent.\n",
- "* Balance Relevance and Diversity: The `lambda_mult` parameter controls the trade-off between strict relevance and promoting variety in the results.\n",
- "* Retrieve MMR Results: The `max_marginal_relevance_search` method returns items that optimize the combination of relevance and diversity based on the lambda setting."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "mmr_results = rvs.max_marginal_relevance_search(query=query, lambda_mult=0.90)\n",
- "pprint.pprint(mmr_results)"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Use the Vector Store as a Retriever\n",
- "\n",
- "For seamless integration with other LangChain components, a vector store can be converted into a Retriever. This offers several advantages:\n",
- "\n",
- "* LangChain Compatibility: Many LangChain tools and methods are designed to directly interact with retrievers.\n",
- "* Ease of Use: The `as_retriever()` method converts the vector store into a format that simplifies querying."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "retriever = rvs.as_retriever()\n",
- "results = retriever.invoke(query)\n",
- "pprint.pprint(results)"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Clean up"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Delete Documents from the Vector Store\n",
- "\n",
- "Occasionally, it's necessary to remove documents (and their associated vectors) from the vector store. The `delete` method provides this functionality."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "rvs.delete(ids)"
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Delete a Vector Index\n",
- "\n",
- "There might be circumstances where the deletion of an existing vector index is necessary. Common reasons include:\n",
- "\n",
- "* Index Configuration Changes: If index parameters need modification, it's often required to delete and recreate the index.\n",
- "* Storage Management: Removing unused indices can help free up space within the Redis instance.\n",
- "\n",
- "Caution: Vector index deletion is an irreversible operation. Be certain that the stored vectors and search functionality are no longer required before proceeding."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 22,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Delete the vector index\n",
- "RedisVectorStore.drop_index(client=redis_client, index_name=\"my_vector_index\")"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.11.6"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
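As background for reviewing this patch: the KNN and MMR searches exercised in `vector_store.ipynb` can be illustrated without a running Redis instance. The sketch below is a hypothetical, self-contained approximation of what those queries do conceptually. `cosine`, `knn_search`, and `mmr_search` are illustrative helpers, not part of the `langchain_google_memorystore_redis` API, and cosine similarity plus the standard MMR trade-off formula are assumptions here, not necessarily the library's exact scoring.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def knn_search(query_vec, doc_vecs, k=4):
    # Conceptual analogue of similarity_search: rank every stored vector
    # by similarity to the query and keep the k closest.
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def mmr_search(query_vec, doc_vecs, k=2, lambda_mult=0.9):
    # Conceptual analogue of max_marginal_relevance_search: greedily pick
    # the document maximizing lambda_mult * relevance minus
    # (1 - lambda_mult) * redundancy with already-selected documents.
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda_mult` close to 1 the MMR selection tracks plain KNN; lowering it penalizes near-duplicates of already-selected results, which is the trade-off the notebook's `lambda_mult=0.90` call tunes.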