From 7275b863900d2587a756aed9307f1bf1624754ad Mon Sep 17 00:00:00 2001 From: Vishwaraj Anand Date: Sun, 6 Jul 2025 09:09:32 +0000 Subject: [PATCH 1/3] chore: add documentation for Hybrid Search --- README.md | 1 + examples/pg_vectorstore_how_to.ipynb | 101 +++++++++++++++++++++++++++ 2 files changed, 102 insertions(+) diff --git a/README.md b/README.md index 1c3b5bac..e0b0f5f1 100644 --- a/README.md +++ b/README.md @@ -78,6 +78,7 @@ print(docs) > [!TIP] > All synchronous functions have corresponding asynchronous functions +> PGVectorStore also supports Hybrid Search, which combines multiple search strategies to improve search results. ## ChatMessageHistory diff --git a/examples/pg_vectorstore_how_to.ipynb b/examples/pg_vectorstore_how_to.ipynb index bbdd7237..bd6d4593 100644 --- a/examples/pg_vectorstore_how_to.ipynb +++ b/examples/pg_vectorstore_how_to.ipynb @@ -686,6 +686,107 @@ "1. For new records, added via `VectorStore` embeddings are automatically generated." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Hybrid Search Vector Store\n", + "\n", + "A Hybrid Search Vector Store combines multiple lookup strategies to provide more comprehensive and relevant search results. Specifically, it leverages both dense embedding vector search (for semantic similarity) and TSV (Text Search Vector) based keyword search (for lexical matching). This approach is particularly powerful for applications requiring efficient searching through customized text and metadata, especially when a specialized embedding model isn't feasible or necessary.\n", + "\n", + "By integrating both semantic and lexical capabilities, hybrid search helps overcome the limitations of each individual method:\n", + "\n", + "* **Semantic Search**: Excellent for understanding the meaning of a query, even if the exact keywords aren't present. 
However, it can sometimes miss highly relevant documents that contain the precise keywords but have a slightly different semantic context.\n", + "\n", + "* **Keyword Search**: Highly effective at finding documents with exact keyword matches, and generally fast. Its weakness lies in its inability to understand synonyms, misspellings, or conceptual relationships.\n", + "\n", + "With a `HybridSearchConfig` provided, the `PGVectorStore` class can efficiently manage a hybrid search vector store using PostgreSQL as the backend, automatically handling the creation and population of the necessary TSV columns when possible.\n", + "\n", + "\n", + "Assume the same pre-existing `products` table as above in the PostgreSQL database, which stores product details for an e-commerce venture.\n", + "\n", + "Here is how this table is mapped to `PGVectorStore`:\n", + "\n", + "- **`id_column=\"product_id\"`**: The `product_id` column uniquely identifies each row in the `products` table.\n", + "\n", + "- **`content_column=\"description\"`**: The `description` column contains text descriptions of each product. This text is used by the `embedding_service` to create vectors that are stored in `embedding_column` and represent the semantic meaning of each description.\n", + "\n", + "- **`embedding_column=\"embed\"`**: The `embed` column stores the vectors created from the product descriptions. These vectors are used to find products with similar descriptions.\n", + "\n", + "- **`metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"]`**: These columns are treated as metadata for each product. Metadata provides additional information about a product, such as its name, category, price, quantity available, SKU (Stock Keeping Unit), and an image URL. 
This information is useful for displaying product details in search results or for filtering and categorization.\n", + "\n", + "- **`metadata_json_column=\"metadata\"`**: The `metadata` column can store any additional information about the products in a flexible JSON format. This allows for storing varied and complex data that doesn't fit into the standard columns.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "from langchain_postgres.v2 import PGVectorStore\n", + "from langchain_postgres.v2.hybrid_search_config import HybridSearchConfig\n", + "from langchain_postgres.v2.hybrid_search_config import reciprocal_rank_fusion\n", + "\n", + "TABLE_NAME=\"hybrid_search_products\"\n", + "\n", + "hybrid_search_config = HybridSearchConfig(\n", + " tsv_column=\"hybrid_description\",\n", + " tsv_lang=\"pg_catalog.english\",\n", + " fusion_function=reciprocal_rank_fusion,\n", + " fusion_function_parameters={\n", + " \"rrf_k\": 60,\n", + " \"fetch_top_k\": 10,\n", + " },\n", + ")\n", + "\n", + "# If a hybrid search config is provided during vector store table creation,\n", + "# the specified TSV column will be automatically created.\n", + "await pg_engine.ainit_vectorstore_table(\n", + " table_name=TABLE_NAME,\n", + " # schema_name=SCHEMA_NAME,\n", + " vector_size=VECTOR_SIZE,\n", + " id_column=\"product_id\",\n", + " content_column=\"description\",\n", + " embedding_column=\"embed\",\n", + " metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n", + " metadata_json_column=\"metadata\",\n", + " hybrid_search_config=hybrid_search_config,\n", + " store_metadata=True\n", + ")\n", + "\n", + "\n", + "# If a hybrid search config is NOT provided during init_vectorstore_table (above),\n", + "# but only provided during PGVectorStore creation, the specified TSV column\n", + "# is not present and TSV vectors are created dynamically on-the-go for hybrid search.\n", + 
"vs_hybrid = await PGVectorStore.create(\n", + " pg_engine,\n", + " table_name=TABLE_NAME,\n", + " # schema_name=SCHEMA_NAME,\n", + " embedding_service=embedding,\n", + " # Connect to existing VectorStore by customizing below column names\n", + " id_column=\"product_id\",\n", + " content_column=\"description\",\n", + " embedding_column=\"embed\",\n", + " metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n", + " metadata_json_column=\"metadata\",\n", + " hybrid_search_config=hybrid_search_config,\n", + ")\n", + "\n", + "# Optionally, create an index on hybrid search column name\n", + "await vs_hybrid.aapply_hybrid_search_index()\n", + "\n", + "# Fetch documents from the previopusly created store to fetch product documents\n", + "docs = await custom_store.asimilarity_search(\"products\", k=5)\n", + "# Add data normally to the vector store, which will also add the tsv values in tsv_column\n", + "await vs_hybrid.aadd_documents(docs)\n", + "\n", + "# Use hybrid search\n", + "hybrid_docs = await vs_hybrid.asimilarity_search(\"products\", k=5)\n", + "print(hybrid_docs)" + ] + }, { "cell_type": "markdown", "metadata": {}, From c3de14d7202bcaa5f3c626e9903f71d0571fceb3 Mon Sep 17 00:00:00 2001 From: Vishwaraj Anand Date: Mon, 7 Jul 2025 12:00:33 +0530 Subject: [PATCH 2/3] chore: lint fixes --- examples/pg_vectorstore_how_to.ipynb | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/examples/pg_vectorstore_how_to.ipynb b/examples/pg_vectorstore_how_to.ipynb index bd6d4593..28c487e0 100644 --- a/examples/pg_vectorstore_how_to.ipynb +++ b/examples/pg_vectorstore_how_to.ipynb @@ -724,12 +724,11 @@ "metadata": {}, "outputs": [], "source": [ - "\n", "from langchain_postgres.v2 import PGVectorStore\n", "from langchain_postgres.v2.hybrid_search_config import HybridSearchConfig\n", "from langchain_postgres.v2.hybrid_search_config import reciprocal_rank_fusion\n", "\n", - "TABLE_NAME=\"hybrid_search_products\"\n", + 
"TABLE_NAME = \"hybrid_search_products\"\n", "\n", "hybrid_search_config = HybridSearchConfig(\n", " tsv_column=\"hybrid_description\",\n", @@ -753,7 +752,7 @@ " metadata_columns=[\"name\", \"category\", \"price_usd\", \"quantity\", \"sku\", \"image_url\"],\n", " metadata_json_column=\"metadata\",\n", " hybrid_search_config=hybrid_search_config,\n", - " store_metadata=True\n", + " store_metadata=True,\n", ")\n", "\n", "\n", From b510e4c52689bc9a4127a544bcfd0d6ca15fa8f0 Mon Sep 17 00:00:00 2001 From: Vishwaraj Anand Date: Mon, 7 Jul 2025 06:34:59 +0000 Subject: [PATCH 3/3] chore: lint fixes --- examples/pg_vectorstore_how_to.ipynb | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/examples/pg_vectorstore_how_to.ipynb b/examples/pg_vectorstore_how_to.ipynb index 28c487e0..8b429fc8 100644 --- a/examples/pg_vectorstore_how_to.ipynb +++ b/examples/pg_vectorstore_how_to.ipynb @@ -725,8 +725,10 @@ "outputs": [], "source": [ "from langchain_postgres.v2 import PGVectorStore\n", - "from langchain_postgres.v2.hybrid_search_config import HybridSearchConfig\n", - "from langchain_postgres.v2.hybrid_search_config import reciprocal_rank_fusion\n", + "from langchain_postgres.v2.hybrid_search_config import (\n", + " HybridSearchConfig,\n", + " reciprocal_rank_fusion,\n", + ")\n", "\n", "TABLE_NAME = \"hybrid_search_products\"\n", "\n",