From 6a6850d6632da01c5c05708eb6554c94fb445c2b Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Fri, 19 Jan 2024 16:52:07 -0500 Subject: [PATCH 01/18] enhance readme outline and content --- README.md | 251 ++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 215 insertions(+), 36 deletions(-) diff --git a/README.md b/README.md index 8866f83d..6772f4f6 100644 --- a/README.md +++ b/README.md @@ -1,10 +1,8 @@ -# RedisVL: Python Client Library for Redis as a Vector Database +# RedisVL
- Home    - Documentation    - More Projects    + 🔥 Redis Vector Library: the AI-native Redis Python client   

@@ -13,7 +11,7 @@
[![Codecov](https://img.shields.io/codecov/c/github/RedisVentures/RedisVL/dev?label=Codecov&logo=codecov&token=E30WxqBeJJ)](https://codecov.io/gh/RedisVentures/RedisVL) -[![License](https://img.shields.io/badge/License-BSD-3--blue.svg)](https://opensource.org/licenses/mit/) +[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) ![Language](https://img.shields.io/github/languages/top/RedisVentures/RedisVL) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) ![GitHub last commit](https://img.shields.io/github/last-commit/RedisVentures/RedisVL) @@ -22,64 +20,245 @@
-RedisVL provides a powerful Python client library for using Redis as a Vector Database. Leverage the speed and reliability of Redis along with vector-based semantic search capabilities to supercharge your application! +
+
+ Home    + Documentation    + More Projects    +
+
+
+ + +## Introduction + +`redisvl` is a Python client library, tailor-made for AI applications leveraging [Redis](https://redis.com). It's designed for use in: + +- Information retrieval & semantic search apps +- Real-time RAG pipelines +- Recommendation engines + +Enhance your AI applications with Redis's **speed**, **flexibility**, and **reliability**, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. + +## 🚀 Why RedisVL? + +The emergence of the modern GenAI stack, including **vector databases** and **LLMs**, has become increasingly popular due to accelerated innovation & research in information retrieval, the ubiquity of tools & frameworks (e.g. [LangChain](), [LlamaIndex](), [EmbedChain]()), and the never-ending stream of business problems addressable by AI. + +However, organizations struggle with delivering reliable solutions **quickly** (*time to value*) at **scale** (*beyond a demo*). + +[Redis](https://redis.io) has been a staple for over a decade in the NoSQL world, and boasts a number of flexible [data structures]() and [processing engines]() to handle realtime application workloads like [caching](), [session management](), [job queueing]() and [search](). + +`redisvl` bridges the gap between the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. Built on the back of the popular Python client, [`redis-py`](), it extends the core features of Redis into a grammar that is more aligned to the needs of today's AI/ML engineers or scientists. + +## 💪 Getting Started + +### Installation + +Install `redisvl` into your Python (>=3.8) environment using `pip`: + +```bash +pip install redisvl +``` +> For more instructions, visit the `redisvl` [installation guide](https://www.redisvl.com/overview/installation.html). + +### Setting up Redis + +Choose from multiple Redis deployment options: -**Note**: This supported by Redis, Inc. 
on a good faith effort basis. To report bugs, request features, or receive assistance, please [file an issue](https://github.com/RedisVentures/redisvl/issues). +1. [Redis Cloud](https://redis.com/try-free): Managed cloud database (free tier available) +2. [Redis Stack](https://redis.io/docs/getting-started/install-stack/docker/): Docker image for development + ```bash + docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest + ``` +3. [Redis Enterprise](https://redis.com/redis-enterprise/): Commercial, self-hosted database ------------- +> Enhance your experience and obersvability with the free [Redis Insight GUI](https://redis.com/redis-enterprise/redis-insight/). -## 🚀 What is RedisVL? +## Features and Usage -Vector databases have become increasingly popular in recent years due to their ability to store and retrieve vectors efficiently. However, most vector databases are complex to use and require a lot of time and effort to set up. RedisVL aims to solve this problem by providing a simple and intuitive interface for using Redis as a vector database. -RedisVL provides a client library that enables you to harness the power and flexibility of Redis as a vector database. This library simplifies the process of storing, retrieving, and performing complex semantic and hybrid searches over vectors in Redis. It also provides a robust index management system that allows you to create, update, and delete indices with ease. +### 🗃️ Index Management +1. [Design an `IndexSchema`](https://www.redisvl.com/user_guide/getting_started_01.html#define-an-indexschema) that models your dataset with built-in Redis [data structures](https://www.redisvl.com/user_guide/hash_vs_json_05.html) (*Hash or JSON*) and indexable fields (*e.g. text, tags, numerics, geo, and vectors*). 
[Load a schema](https://www.redisvl.com/user_guide/getting_started_01.html#example-schema-creation) from YAML file or from a Python dictionary: + ```python + from redisvl.schema import IndexSchema + ``` + Load schema from a [YAML file](schemas/schema.yaml): + ```python + schema = IndexSchema.from_yaml("schemas/schema.yaml") + ``` + Or load directly from a Python dictionary: + ```python + schema = IndexSchema.from_dict({ + "index": { + "name": "my-index", + "prefix": "docs", + }, + "fields": [ + {"name": "content", "type": "text"}, + { + "name": "content-embedding", + "type": "vector", + "attrs": { + "algorithm": "flat", + "datatype": "float32", + "dims": 4, + "distance_metric": "cosine" + } + } + ] + }) + ``` -### Capabilities +2. [Create a SearchIndex](https://www.redisvl.com/user_guide/getting_started_01.html#create-a-searchindex) class with an input schema and client connection in order to perform admin and search operations on your index in Redis: + ```python + from redis import Redis + from redisvl.index import SearchIndex -RedisVL has a host of powerful features designed to streamline your vector database operations. + # Establish Redis connection and define index + client = Redis.from_url("redis://localhost:6379") + index = SearchIndex(schema, client) -1. **Index Management**: RedisVL allows for indices to be created, updated, and deleted with ease. A schema for each index can be defined in yaml or directly in python code and used throughout the lifetime of the index. - - [Getting Started with SearchIndex](https://www.redisvl.com/user_guide/getting_started_01.html) - - [``rvl`` Command Line Interface](https://www.redisvl.com/user_guide/cli.html) + # Create the index in Redis + index.create() + ``` -2. **Embedding Creation**: RedisVLs [Vectorizers](https://www.redisvl.com/user_guide/vectorizers_04.html) integrate with common embedding model services to simplify the process of vectorizing unstructured data. 
- - [OpenAI](https://www.redisvl.com/api/vectorizer.html#openaitextvectorizer) - - [HuggingFace](https://www.redisvl.com/api/vectorizer.html#hftextvectorizer) - - [GCP VertexAI](https://www.redisvl.com/api/vectorizer.html#vertexaitextvectorizer) +### 🔍 Realtime Search -3. **Vector Search**: RedisVL provides robust search capabilities that enable you quickly define complex search queries with flexible abstractions. - - [VectorQuery](https://www.redisvl.com/api/query.html#vectorquery) - Flexible vector queries with filters - - [RangeQuery](https://www.redisvl.com/api/query.html#rangequery) - Vector search within a defined range - - [CountQuery](https://www.redisvl.com/api/query.html#countquery) - Count the number of records given attributes - - [FilterQuery](https://www.redisvl.com/api/query.html#filterquery) - Filter records given attributes +Define queries and perform advanced searches over your indices, including the combination of vectors, metadata filters, and more. -3. **[Hybrid (Filtered) queries](https://www.redisvl.com/user_guide/hybrid_queries_02.html)** that utilize tag, geographic, numeric, and other filters like full-text search are also supported. +- [VectorQuery](https://www.redisvl.com/api/query.html#vectorquery) - Flexible vector queries with customizable filters enabling semantic search: -4. **Semantic Caching**: [`LLMCache`](https://www.redisvl.com/user_guide/llmcache_03.html) is a semantic caching interface built directly into RedisVL. Semantic caching is a popular technique to increase the QPS and reduce the cost of using LLM models in production. + ```python + from redisvl.query import VectorQuery -5. [**JSON Storage**](https://www.redisvl.com/user_guide/hash_vs_json_05.html): RedisVL supports storing JSON objects, including vectors, in Redis. 
+ query = VectorQuery( + vector=[0.16, -0.34, 0.98, 0.23], + vector_field_name="content-embedding", + num_results=3 + ) -## Installation + results = index.query(query) + ``` -Install `redisvl` using `pip`: + Incorporate complex metadata filters on your queries: + ```python + from redisvl.query.filter import Text + + # define a text filter + text_filter = Text("content") % "foo" + + # update query definition + query.set_filter(text_filter) + + # execute + results = index.query(query) + ``` + +- [RangeQuery](https://www.redisvl.com/api/query.html#rangequery) - Vector search within a defined range paired with customizable filters +- [FilterQuery](https://www.redisvl.com/api/query.html#filterquery) - Standard search using filters and the full-text search +- [CountQuery](https://www.redisvl.com/api/query.html#countquery) - Count the number of indexed records given attributes + +> Read more about building advanced queries [here](https://www.redisvl.com/user_guide/hybrid_queries_02.html). + + +### 🖥️ Command Line Interface +Create, destroy, and manage Redis index configurations from a purpose-built CLI interface: `rvl`. ```bash -pip install redisvl +$ rvl -h + +usage: rvl [] + +Commands: + index Index manipulation (create, delete, etc.) + version Obtain the version of RedisVL + stats Obtain statistics about an index +``` + +> Read more about using the `redisvl` CLI [here](https://www.redisvl.com/user_guide/cli.html). 
+ +### ⚡ Community Integrations +Integrate with popular embedding models and providers to greatly simplify the process of vectorizing unstructured data for your index and queries: +- [Cohere](https://www.redisvl.com/api/vectorizer.html#coheretextvectorizer) +- [OpenAI](https://www.redisvl.com/api/vectorizer.html#openaitextvectorizer) +- [HuggingFace](https://www.redisvl.com/api/vectorizer.html#hftextvectorizer) +- [GCP VertexAI](https://www.redisvl.com/api/vectorizer.html#vertexaitextvectorizer) + +```python +from redisvl.vectorize import CohereTextVectorizer + +# set COHERE_API_KEY in your environment +co = CohereTextVectorizer() + +embedding = co.embed( + text="What is the capital city of France?", + input_type="search_query" +) + +embeddings = co.embed_many( + texts=["my document chunk content", "my other document chunk content"], + input_type="search_document" +) ``` -For more instructions, see the [installation guide](https://www.redisvl.com/overview/installation.html). +> Learn more about using `redisvl` Vectorizers in your workflows [here](https://www.redisvl.com/user_guide/vectorizers_04.html). + +### 💫 Beyond Vector Search +Modern GenAI applications require much more than RAG-style vector search in order +to perform well in production. `redisvl` provides some common extensions that +improve applications working with LLMs: + +- **LLM Semantic Caching** is designed to increase the request QPS, reduce the cost of using LLM models in production, and drive towards more compliant + consistent responses, robust to nuanced input. 
+ + ```python + from redisvl.llmcache import SemanticCache + + # init cache with TTL (expiration) policy and semantic distance threshold + llmcache = SemanticCache( + name="llmcache", + ttl=360, + redis_url="redis://localhost:6379" + ) + llmcache.set_threshold(0.2) # can be changed on-demand -## Getting Started + # store user queries and LLM responses in the semantic cache + llmcache.store( + prompt="What is the capital city of France?", + response="Paris", + metadata={} + ) -To get started with RedisVL, check out the + # quickly check the cache with a slightly different prompt (before invoking an LLM) + response = llmcache.check(prompt="What is France's capital city?") + print(response[0]["response"]) + ``` + ```stdout + >>> "Paris" + ``` + + > Learn more about Semantic Caching in `redisvl` [here](https://www.redisvl.com/user_guide/llmcache_03.html). + +- **LLM Session Management** +COMING SOON + +- **LLM Contextual Access Control** COMING SOON + + +## Helpful Links + +To get started with `redisvl`, check out: - [Getting Started Guide](https://www.redisvl.com/user_guide/getting_started_01.html) - [API Reference](https://www.redisvl.com/api/index.html) - [Example Gallery](https://www.redisvl.com/examples/index.html) + - [Official Redis Vector Search Docs](https://redis.io/docs/interact/search-and-query/advanced-concepts/vectors/) + +## 🫱🏼‍🫲🏽 Contributing -## Contributing +Please help us by contributing PRs, opening GitHub issues for bugs or new feature ideas, improving documentation, or increasing test coverage. [Read more about how to contribute to RedisVL!](CONTRIBUTING.md) -Please help us by contributing PRs or opening GitHub issues for desired behaviors or discovered bugs. [Read more about how to contribute to RedisVL!](CONTRIBUTING.md) +## 🚧 Maintenance +**Note**: This project is supported by [Redis, Inc.](https://redis.com) on a good faith effort basis. 
To report bugs, request features, or receive assistance, please [file an issue](https://github.com/RedisVentures/redisvl/issues). From 20b06cf41fad1f792ff39732d7cc267ab4267bba Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Thu, 1 Feb 2024 13:34:54 -0500 Subject: [PATCH 02/18] updates --- README.md | 109 ++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 72 insertions(+), 37 deletions(-) diff --git a/README.md b/README.md index 6772f4f6..95ab2cc9 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,14 @@ -# RedisVL -
-
- 🔥 Redis Vector Library: the AI-native Redis Python client    -
+
+ 🔥 Redis Vector Library: +
+ the AI-native Redis Python client +

+
[![Codecov](https://img.shields.io/codecov/c/github/RedisVentures/RedisVL/dev?label=Codecov&logo=codecov&token=E30WxqBeJJ)](https://codecov.io/gh/RedisVentures/RedisVL) @@ -30,29 +31,29 @@
-## Introduction +# Introduction `redisvl` is a Python client library, tailor-made for AI applications leveraging [Redis](https://redis.com). It's designed for use in: -- Information retrieval & semantic search apps +- Information retrieval & vector similarity search - Real-time RAG pipelines - Recommendation engines Enhance your AI applications with Redis's **speed**, **flexibility**, and **reliability**, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. -## 🚀 Why RedisVL? +# 🚀 Why RedisVL? -The emergence of the modern GenAI stack, including **vector databases** and **LLMs**, has become increasingly popular due to accelerated innovation & research in information retrieval, the ubiquity of tools & frameworks (e.g. [LangChain](), [LlamaIndex](), [EmbedChain]()), and the never-ending stream of business problems addressable by AI. +The emergence of the modern GenAI stack, including **vector databases** and **LLMs**, has become increasingly popular due to accelerated innovation & research in information retrieval, the ubiquity of tools & frameworks (e.g. [LangChain](https://github.com/langchain-ai/langchain), [LlamaIndex](https://www.llamaindex.ai/), [EmbedChain](https://github.com/embedchain/embedchain)), and the never-ending stream of business problems addressable by AI. -However, organizations struggle with delivering reliable solutions **quickly** (*time to value*) at **scale** (*beyond a demo*). +However, organizations still struggle with delivering reliable solutions **quickly** (*time to value*) at **scale** (*beyond a demo*). -[Redis](https://redis.io) has been a staple for over a decade in the NoSQL world, and boasts a number of flexible [data structures]() and [processing engines]() to handle realtime application workloads like [caching](), [session management](), [job queueing]() and [search](). 
+[Redis](https://redis.io) has been a staple for over a decade in the NoSQL world, and boasts a number of flexible [data structures](https://redis.io/docs/data-types/) and [processing engines](https://redis.io/docs/interact/) to handle realtime application workloads like caching, session management, and search. Most notably, Redis has been used as a vector database for RAG, LLM cache, and chat session memory store for conversational AI applications -`redisvl` bridges the gap between the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. Built on the back of the popular Python client, [`redis-py`](), it extends the core features of Redis into a grammar that is more aligned to the needs of today's AI/ML engineers or scientists. +`redisvl` **bridges the gap between** the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. Built on the back of the popular Python client, [`redis-py`](https://github.com/redis/redis-py/tree/master), it extends the core caching and search features of Redis into a grammar that is more aligned to the needs of today's AI/ML engineers or scientists. -## 💪 Getting Started +# 💪 Getting Started -### Installation +## Installation Install `redisvl` into your Python (>=3.8) environment using `pip`: @@ -61,7 +62,7 @@ pip install redisvl ``` > For more instructions, visit the `redisvl` [installation guide](https://www.redisvl.com/overview/installation.html). -### Setting up Redis +## Setting up Redis Choose from multiple Redis deployment options: @@ -76,30 +77,53 @@ Choose from multiple Redis deployment options: > Enhance your experience and obersvability with the free [Redis Insight GUI](https://redis.com/redis-enterprise/redis-insight/). -## Features and Usage +## What's included? -### 🗃️ Index Management -1. 
[Design an `IndexSchema`](https://www.redisvl.com/user_guide/getting_started_01.html#define-an-indexschema) that models your dataset with built-in Redis [data structures](https://www.redisvl.com/user_guide/hash_vs_json_05.html) (*Hash or JSON*) and indexable fields (*e.g. text, tags, numerics, geo, and vectors*). [Load a schema](https://www.redisvl.com/user_guide/getting_started_01.html#example-schema-creation) from YAML file or from a Python dictionary: +### 🗃️ Redis Index Management +1. [Design an `IndexSchema`](https://www.redisvl.com/user_guide/getting_started_01.html#define-an-indexschema) that models your dataset with built-in Redis [data structures](https://www.redisvl.com/user_guide/hash_vs_json_05.html) (*Hash or JSON*) and indexable fields (*e.g. text, tags, numerics, geo, and vectors*). - ```python - from redisvl.schema import IndexSchema + [Load a schema](https://www.redisvl.com/user_guide/getting_started_01.html#example-schema-creation) from a [YAML file](schemas/schema.yaml): + ```yaml + version: '0.1.0' + + index: + name: user-index-v1 + prefix: user + key_separator: ':' + storage_type: json + + fields: + - name: user + type: tag + - name: credit_score + type: tag + - name: embedding + type: vector + attrs: + algorithm: flat + dims: 3 + distance_metric: cosine + datatype: float32 ``` - Load schema from a [YAML file](schemas/schema.yaml): ```python + from redisvl.schema import IndexSchema + schema = IndexSchema.from_yaml("schemas/schema.yaml") ``` Or load directly from a Python dictionary: ```python schema = IndexSchema.from_dict({ "index": { - "name": "my-index", - "prefix": "docs", + "name": "user-index-v1", + "prefix": "user", + "storage_type": "json" }, "fields": [ - {"name": "content", "type": "text"}, + {"name": "user", "type": "tag"}, + {"name": "credit_score", "type": "tag"}, { - "name": "content-embedding", + "name": "embedding", "type": "vector", "attrs": { "algorithm": "flat", @@ -124,6 +148,19 @@ Choose from multiple Redis deployment 
options: # Create the index in Redis index.create() ``` + > Async-compliant search index class also available: `AsyncSearchIndex` + +3. [Load](https://www.redisvl.com/user_guide/getting_started_01.html#load-data-to-searchindex) +and [fetch](https://www.redisvl.com/user_guide/getting_started_01.html#fetch-an-object-from-redis) data to/from your Redis instance: + ```python + data = {"user": "john", "credit_score": "high", "embedding": [0.23, 0.49, -0.18, 0.95]} + + # load list of dictionaries, specify the id-field + index.load([data], id_field="user") + + # fetch + john = index.fetch("john") + ``` ### 🔍 Realtime Search @@ -136,7 +173,7 @@ Define queries and perform advanced searches over your indices, including the co query = VectorQuery( vector=[0.16, -0.34, 0.98, 0.23], - vector_field_name="content-embedding", + vector_field_name="embedding", num_results=3 ) @@ -145,15 +182,15 @@ Define queries and perform advanced searches over your indices, including the co Incorporate complex metadata filters on your queries: ```python - from redisvl.query.filter import Text + from redisvl.query.filter import Tag - # define a text filter - text_filter = Text("content") % "foo" + # define a tag match filter + tag_filter = Tag("user") == "john" # update query definition - query.set_filter(text_filter) + query.set_filter(tag_filter) - # execute + # execute query results = index.query(query) ``` @@ -188,7 +225,7 @@ Integrate with popular embedding models and providers to greatly simplify the pr - [GCP VertexAI](https://www.redisvl.com/api/vectorizer.html#vertexaitextvectorizer) ```python -from redisvl.vectorize import CohereTextVectorizer +from redisvl.utils.vectorize import CohereTextVectorizer # set COHERE_API_KEY in your environment co = CohereTextVectorizer() @@ -209,12 +246,12 @@ embeddings = co.embed_many( ### 💫 Beyond Vector Search Modern GenAI applications require much more than RAG-style vector search in order to perform well in production. 
`redisvl` provides some common extensions that -improve applications working with LLMs: +aim to improve applications working with LLMs: - **LLM Semantic Caching** is designed to increase the request QPS, reduce the cost of using LLM models in production, and drive towards more compliant + consistent responses, robust to nuanced input. ```python - from redisvl.llmcache import SemanticCache + from redisvl.extensions.llmcache import SemanticCache # init cache with TTL (expiration) policy and semantic distance threshold llmcache = SemanticCache( @@ -241,9 +278,7 @@ improve applications working with LLMs: > Learn more about Semantic Caching in `redisvl` [here](https://www.redisvl.com/user_guide/llmcache_03.html). -- **LLM Session Management** -COMING SOON - +- **LLM Session Management** COMING SOON - **LLM Contextual Access Control** COMING SOON From 587dfe30f715cae34e2dc9622ef5eccd3006007e Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Thu, 1 Feb 2024 13:36:20 -0500 Subject: [PATCH 03/18] bump font --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 95ab2cc9..6e5fe294 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@
- 🔥 Redis Vector Library: + 🔥 Redis Vector Library:
the AI-native Redis Python client
From c11dba362cbb3c5588d02b2f890a92b64a7f5112 Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Thu, 1 Feb 2024 13:36:34 -0500 Subject: [PATCH 04/18] Remove colon --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 6e5fe294..8c8fb2e7 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@
- 🔥 Redis Vector Library: + 🔥 Redis Vector Library
the AI-native Redis Python client
From 7a53295dba810c781eafa2a9aa8a112159e7ef8e Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Thu, 1 Feb 2024 13:37:17 -0500 Subject: [PATCH 05/18] try larger pixel --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8c8fb2e7..3dd98ed1 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@
- 🔥 Redis Vector Library + 🔥 Redis Vector Library
the AI-native Redis Python client
From 086847bd2fc66696bec70610525a25e36ff8d2c1 Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Thu, 1 Feb 2024 14:37:49 -0500 Subject: [PATCH 06/18] feedback --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 3dd98ed1..77e85a16 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ - Real-time RAG pipelines - Recommendation engines -Enhance your AI applications with Redis's **speed**, **flexibility**, and **reliability**, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. +Enhance your AI applications with Redis' **speed**, **flexibility**, and **reliability**, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. # 🚀 Why RedisVL? @@ -296,4 +296,4 @@ To get started with `redisvl`, check out: Please help us by contributing PRs, opening GitHub issues for bugs or new feature ideas, improving documentation, or increasing test coverage. [Read more about how to contribute to RedisVL!](CONTRIBUTING.md) ## 🚧 Maintenance -**Note**: This project is supported by [Redis, Inc.](https://redis.com) on a good faith effort basis. To report bugs, request features, or receive assistance, please [file an issue](https://github.com/RedisVentures/redisvl/issues). +RedisVL is supported by [Redis, Inc](https://redis.com) on a good faith effort basis. To report bugs, request features, or receive assistance, please [file an issue](https://github.com/RedisVentures/redisvl/issues). 
From b7e344c01495b5a152bcd9dda41693f0d7f2a465 Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Thu, 1 Feb 2024 16:03:12 -0500 Subject: [PATCH 07/18] updates --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 77e85a16..c463a7c2 100644 --- a/README.md +++ b/README.md @@ -47,9 +47,9 @@ The emergence of the modern GenAI stack, including **vector databases** and **LL However, organizations still struggle with delivering reliable solutions **quickly** (*time to value*) at **scale** (*beyond a demo*). -[Redis](https://redis.io) has been a staple for over a decade in the NoSQL world, and boasts a number of flexible [data structures](https://redis.io/docs/data-types/) and [processing engines](https://redis.io/docs/interact/) to handle realtime application workloads like caching, session management, and search. Most notably, Redis has been used as a vector database for RAG, LLM cache, and chat session memory store for conversational AI applications +[Redis](https://redis.io) has been a staple for over a decade in the NoSQL world, and boasts a number of flexible [data structures](https://redis.io/docs/data-types/) and [processing engines](https://redis.io/docs/interact/) to handle realtime application workloads like caching, session management, and search. Most notably, Redis has been used as a vector database for RAG, as an LLM cache, and chat session memory store for conversational AI applications. -`redisvl` **bridges the gap between** the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. Built on the back of the popular Python client, [`redis-py`](https://github.com/redis/redis-py/tree/master), it extends the core caching and search features of Redis into a grammar that is more aligned to the needs of today's AI/ML engineers or scientists. 
+`redisvl` **bridges the gap between** the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. Built on the back of the popular Python client, [`redis-py`](https://github.com/redis/redis-py/tree/master), it extends the core caching and search features of Redis into a grammar that is more aligned to the needs of today's AI/ML Engineers or Data Scientists. # 💪 Getting Started From fc894afc345af3a26f0261910977abc6ec2b8b6f Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Thu, 1 Feb 2024 16:03:43 -0500 Subject: [PATCH 08/18] spell check --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c463a7c2..601605e7 100644 --- a/README.md +++ b/README.md @@ -74,7 +74,7 @@ Choose from multiple Redis deployment options: ``` 3. [Redis Enterprise](https://redis.com/redis-enterprise/): Commercial, self-hosted database -> Enhance your experience and obersvability with the free [Redis Insight GUI](https://redis.com/redis-enterprise/redis-insight/). +> Enhance your experience and observability with the free [Redis Insight GUI](https://redis.com/redis-enterprise/redis-insight/). ## What's included? 
From 0576f8d22f197af0ab82b0db718bb6a5e559fb93 Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Mon, 5 Feb 2024 09:55:30 -0500 Subject: [PATCH 09/18] update getting started --- docs/user_guide/getting_started_01.ipynb | 156 ++++++++--------------- 1 file changed, 50 insertions(+), 106 deletions(-) diff --git a/docs/user_guide/getting_started_01.ipynb b/docs/user_guide/getting_started_01.ipynb index 6a25811e..45596510 100644 --- a/docs/user_guide/getting_started_01.ipynb +++ b/docs/user_guide/getting_started_01.ipynb @@ -215,7 +215,7 @@ { "data": { "text/plain": [ - "" + "" ] }, "execution_count": 4, @@ -249,7 +249,7 @@ { "data": { "text/plain": [ - "" + "" ] }, "execution_count": 5, @@ -304,8 +304,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "\u001b[32m16:13:33\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n", - "\u001b[32m16:13:33\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. user_simple\n" + "\u001b[32m09:54:16\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n", + "\u001b[32m09:54:16\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. 
user_simple\n" ] } ], @@ -331,15 +331,15 @@ "│ user_simple │ HASH │ ['user_simple_docs'] │ [] │ 0 │\n", "╰──────────────┴────────────────┴──────────────────────┴─────────────────┴────────────╯\n", "Index Fields:\n", - "╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮\n", - "│ Name │ Attribute │ Type │ Field Option │ Option Value │\n", - "├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤\n", - "│ user │ user │ TAG │ SEPARATOR │ , │\n", - "│ credit_score │ credit_score │ TAG │ SEPARATOR │ , │\n", - "│ job │ job │ TEXT │ WEIGHT │ 1 │\n", - "│ age │ age │ NUMERIC │ │ │\n", - "│ user_embedding │ user_embedding │ VECTOR │ │ │\n", - "╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯\n" + "╭────────────────┬────────────────┬─────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮\n", + "│ Name │ Attribute │ Type │ Field Option │ Option Value │ Field Option │ Option Value │ Field Option │ Option Value │ Field Option │ Option Value │\n", + "├────────────────┼────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼────────────────┤\n", + "│ user │ user │ TAG │ SEPARATOR │ , │ │ │ │ │ │ │\n", + "│ credit_score │ credit_score │ TAG │ SEPARATOR │ , │ │ │ │ │ │ │\n", + "│ job │ job │ TEXT │ WEIGHT │ 1 │ │ │ │ │ │ │\n", + "│ age │ age │ NUMERIC │ │ │ │ │ │ │ │ │\n", + "│ user_embedding │ user_embedding │ VECTOR │ algorithm │ FLAT │ data_type │ FLOAT32 │ dim │ 3 │ distance_metric │ COSINE │\n", + "╰────────────────┴────────────────┴─────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴─────────────────┴────────────────╯\n" ] } ], @@ -365,7 +365,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "['user_simple_docs:297be8ec3c6444a4b73c10e77daadb4a', 
'user_simple_docs:ac0cc4c7ee4d4cd18e9002dbaf1b5cbc', 'user_simple_docs:6c746e3f02d94d9087e0d207cfed5701']\n" + "['user_simple_docs:99c6166a36744c3c998eccccd9fcfdbd', 'user_simple_docs:55ff82cbcc054ed6b91132f15fcec786', 'user_simple_docs:e8a36c9e75294c7697dabea0ebf17cd9']\n" ] } ], @@ -382,49 +382,6 @@ ">By default, `load` will create a unique Redis \"key\" as a combination of the index key `prefix` and a UUID. You can also customize the key by providing direct keys or pointing to a specified `id_field` on load." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Fetch an object from Redis\n", - "\n", - "Fetch one of the previously written objects:" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Fetching data for user 297be8ec3c6444a4b73c10e77daadb4a\n" - ] - }, - { - "data": { - "text/plain": [ - "{'user': 'john',\n", - " 'age': '1',\n", - " 'job': 'engineer',\n", - " 'credit_score': 'high',\n", - " 'user_embedding': b'\\xcd\\xcc\\xcc=\\xcd\\xcc\\xcc=\\x00\\x00\\x00?'}" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "_id = keys[0].strip(f\"{index.prefix}:\") # strip the key prefix\n", - "\n", - "print(f\"Fetching data for user {_id}\")\n", - "index.fetch(id=_id)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -442,7 +399,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "['user_simple_docs:714e5ec6d4a946c082fe006d311e8d49']\n" + "['user_simple_docs:f53ce588049a4636b5ecd8b0a81ac8ff']\n" ] } ], @@ -524,17 +481,40 @@ "source": [ "## Using an Asynchronous Redis Client\n", "\n", - "The `SearchIndex` class allows for queries, index creation, and data loading to be done asynchronously. 
This is the\n", - "recommended route for working with `redisvl` in production-like settings.\n", - "\n", - "In order to enable it, you must either pass the `use_async` flag to the index\n", - "initializer, or provide an existing async redis client connection." + "The `AsyncSearchIndex` class along with an async Redis python client allows for queries, index creation, and data loading to be done asynchronously. This is the\n", + "recommended route for working with `redisvl` in production-like settings." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from redisvl.index import AsyncSearchIndex\n", + "from redis.asyncio import Redis\n", + "\n", + "client = Redis.from_url(\"redis://localhost:6379\")\n", + "\n", + "index = AsyncSearchIndex.from_dict(schema)\n", + "index.set_client(client)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, "outputs": [ { "data": { @@ -550,10 +530,6 @@ } ], "source": [ - "from redisvl.index import AsyncSearchIndex\n", - "\n", - "index = AsyncSearchIndex.from_dict(schema, redis_url=\"redis://localhost:6379\")\n", - "\n", "# execute the vector query async\n", "results = await index.query(query)\n", "result_print(results)" @@ -563,7 +539,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Update a `SearchIndex`\n", + "## Updating a schema\n", "In some scenarios, it makes sense to update the index schema. With Redis and `redisvl`, this is easy because Redis can keep the underlying data in place while you change or make updates to the index configuration." 
] }, @@ -572,39 +548,8 @@ "metadata": {}, "source": [ "So for our scenario, let's imagine we want to reindex this data in 2 ways:\n", - "- by using a `Tag` type for job field instead of `Text`\n", - "- by using an `hnsw` index for the `Vector` field instead of `flat`" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'index': {'name': 'user_simple', 'prefix': 'user_simple_docs'},\n", - " 'fields': [{'name': 'user', 'type': 'tag'},\n", - " {'name': 'credit_score', 'type': 'tag'},\n", - " {'name': 'job', 'type': 'text'},\n", - " {'name': 'age', 'type': 'numeric'},\n", - " {'name': 'user_embedding',\n", - " 'type': 'vector',\n", - " 'attrs': {'dims': 3,\n", - " 'distance_metric': 'cosine',\n", - " 'algorithm': 'flat',\n", - " 'datatype': 'float32'}}]}" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Inspect the previous schema\n", - "schema" + "- by using a `Tag` type for `job` field instead of `Text`\n", + "- by using an `hnsw` vector index for the `user_embedding` field instead of a `flat` vector index" ] }, { @@ -613,11 +558,10 @@ "metadata": {}, "outputs": [], "source": [ - "# We need to modify this schema to have what we want\n", + "# Modify this schema to have what we want\n", "\n", "index.schema.remove_field(\"job\")\n", "index.schema.remove_field(\"user_embedding\")\n", - "\n", "index.schema.add_fields([\n", " {\"name\": \"job\", \"type\": \"tag\"},\n", " {\n", @@ -642,7 +586,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "16:13:34 redisvl.index INFO Index already exists, overwriting.\n" + "09:54:18 redisvl.index.index INFO Index already exists, overwriting.\n" ] } ], @@ -659,7 +603,7 @@ { "data": { "text/html": [ - "
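<table><tr><th>vector_distance</th><th>user</th><th>age</th><th>job</th><th>credit_score</th></tr>
<tr><td>0</td><td>mary</td><td>2</td><td>doctor</td><td>low</td></tr>
<tr><td>0</td><td>john</td><td>1</td><td>engineer</td><td>high</td></tr>
<tr><td>0.0566299557686</td><td>tyler</td><td>9</td><td>engineer</td><td>high</td></tr></table>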
" + "
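<table><tr><th>vector_distance</th><th>user</th><th>age</th><th>job</th><th>credit_score</th></tr>
<tr><td>0</td><td>john</td><td>1</td><td>engineer</td><td>high</td></tr>
<tr><td>0</td><td>mary</td><td>2</td><td>doctor</td><td>low</td></tr>
<tr><td>0.0566299557686</td><td>tyler</td><td>9</td><td>engineer</td><td>high</td></tr></table>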
" ], "text/plain": [ "" @@ -713,7 +657,7 @@ "│ offsets_per_term_avg │ 0 │\n", "│ records_per_doc_avg │ 5 │\n", "│ sortable_values_size_mb │ 0 │\n", - "│ total_indexing_time │ 0.138 │\n", + "│ total_indexing_time │ 0.254 │\n", "│ total_inverted_index_blocks │ 11 │\n", "│ vector_index_sz_mb │ 0.0201416 │\n", "╰─────────────────────────────┴─────────────╯\n" From a89549f4aa1750773481584467dc95391fe81933 Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Mon, 5 Feb 2024 10:03:16 -0500 Subject: [PATCH 10/18] update redisvl extensions section on the readme --- README.md | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index 601605e7..7899ca56 100644 --- a/README.md +++ b/README.md @@ -148,17 +148,17 @@ Choose from multiple Redis deployment options: # Create the index in Redis index.create() ``` - > Async-compliant search index class also available: `AsyncSearchIndex` + > Async compliant search index class also available: `AsyncSearchIndex` 3. 
[Load](https://www.redisvl.com/user_guide/getting_started_01.html#load-data-to-searchindex) and [fetch](https://www.redisvl.com/user_guide/getting_started_01.html#fetch-an-object-from-redis) data to/from your Redis instance: ```python data = {"user": "john", "credit_score": "high", "embedding": [0.23, 0.49, -0.18, 0.95]} - # load list of dictionaries, specify the id-field + # load list of dictionaries, specify the "id" field index.load([data], id_field="user") - # fetch + # fetch by "id" john = index.fetch("john") ``` @@ -176,7 +176,7 @@ Define queries and perform advanced searches over your indices, including the co vector_field_name="embedding", num_results=3 ) - + # run the vector search query against the embedding field results = index.query(query) ``` @@ -198,7 +198,7 @@ Define queries and perform advanced searches over your indices, including the co - [FilterQuery](https://www.redisvl.com/api/query.html#filterquery) - Standard search using filters and the full-text search - [CountQuery](https://www.redisvl.com/api/query.html#countquery) - Count the number of indexed records given attributes -> Read more about building advanced queries [here](https://www.redisvl.com/user_guide/hybrid_queries_02.html). +> Read more about building advanced Redis queries [here](https://www.redisvl.com/user_guide/hybrid_queries_02.html). ### 🖥️ Command Line Interface @@ -244,11 +244,10 @@ embeddings = co.embed_many( > Learn more about using `redisvl` Vectorizers in your workflows [here](https://www.redisvl.com/user_guide/vectorizers_04.html). ### 💫 Beyond Vector Search -Modern GenAI applications require much more than RAG-style vector search in order -to perform well in production. `redisvl` provides some common extensions that +In order to perform well in production, modern GenAI applications require much more than vector search for retrieval. 
`redisvl` provides some common extensions that aim to improve applications working with LLMs: -- **LLM Semantic Caching** is designed to increase the request QPS, reduce the cost of using LLM models in production, and drive towards more compliant + consistent responses, robust to nuanced input. +- **LLM Semantic Caching** is designed to increase application throughput and reduce the cost of using LLM models in production by leveraging previously generated knowledge. ```python from redisvl.extensions.llmcache import SemanticCache @@ -256,7 +255,7 @@ aim to improve applications working with LLMs: # init cache with TTL (expiration) policy and semantic distance threshhold llmcache = SemanticCache( name="llmcache", - ttl=360, + ttl=360, redis_url="redis://localhost:6379" ) llmcache.set_threshold(0.2) # can be changed on-demand @@ -278,8 +277,8 @@ aim to improve applications working with LLMs: > Learn more about Semantic Caching in `redisvl` [here](https://www.redisvl.com/user_guide/llmcache_03.html). -- **LLM Session Management** COMING SOON -- **LLM Contextual Access Control** COMING SOON +- **LLM Session Management (COMING SOON)** aims to improve personalization and accuracy of the LLM application by providing user chat session information and conversational memory. +- **LLM Contextual Access Control (COMING SOON)** aims to improve security concerns by preventing malicious, irrelevant, or problematic user input from reaching LLMs and infrastructure. 
## Helpful Links From 37d3763a331ee82c0b4922b63d1ba7ba896bcd1f Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Mon, 5 Feb 2024 23:27:26 -0500 Subject: [PATCH 11/18] updates to schema and index documentation --- docs/api/cache.rst | 2 +- docs/api/query.rst | 36 ------------- docs/api/schema.rst | 75 +++++++++++++-------------- docs/api/searchindex.rst | 2 - docs/api/vectorizer.rst | 8 +-- redisvl/index/index.py | 96 +++++++++++++++++++++++++++------- redisvl/query/query.py | 108 +++++++++++++++++++++++---------------- redisvl/schema/schema.py | 47 ++++++++++++----- 8 files changed, 218 insertions(+), 156 deletions(-) diff --git a/docs/api/cache.rst b/docs/api/cache.rst index 024cff28..9c08be5d 100644 --- a/docs/api/cache.rst +++ b/docs/api/cache.rst @@ -8,7 +8,7 @@ SemanticCache .. _semantic_cache_api: -.. currentmodule:: redisvl.llmcache.semantic +.. currentmodule:: redisvl.extensions.llmcache .. autosummary:: diff --git a/docs/api/query.rst b/docs/api/query.rst index e3c9b98e..9146160a 100644 --- a/docs/api/query.rst +++ b/docs/api/query.rst @@ -10,17 +10,8 @@ VectorQuery .. currentmodule:: redisvl.query -.. autosummary:: - - VectorQuery.__init__ - VectorQuery.set_filter - VectorQuery.get_filter - VectorQuery.query - VectorQuery.params - .. autoclass:: VectorQuery - :show-inheritance: :members: :inherited-members: @@ -31,17 +22,8 @@ RangeQuery .. currentmodule:: redisvl.query -.. autosummary:: - - RangeQuery.__init__ - RangeQuery.set_filter - RangeQuery.get_filter - RangeQuery.query - RangeQuery.params - .. autoclass:: RangeQuery - :show-inheritance: :members: :inherited-members: @@ -52,17 +34,8 @@ FilterQuery .. currentmodule:: redisvl.query -.. autosummary:: - - FilterQuery.__init__ - FilterQuery.set_filter - FilterQuery.get_filter - FilterQuery.query - FilterQuery.params - .. autoclass:: FilterQuery - :show-inheritance: :members: :inherited-members: @@ -73,16 +46,7 @@ CountQuery .. currentmodule:: redisvl.query -.. 
autosummary:: - - CountQuery.__init__ - CountQuery.set_filter - CountQuery.get_filter - CountQuery.query - CountQuery.params - .. autoclass:: CountQuery - :show-inheritance: :members: :inherited-members: diff --git a/docs/api/schema.rst b/docs/api/schema.rst index 78fd948b..e8f7df3d 100644 --- a/docs/api/schema.rst +++ b/docs/api/schema.rst @@ -3,48 +3,49 @@ Schema *********** +Schema in RedisVL provides a structured format to define index settings and +field configurations using the following three components: + +.. list-table:: + :widths: 20 80 + :header-rows: 1 + + * - Component + - Description + * - `version` + - The version of the schema spec. Current supported version is `0.1.0`. + * - `index` + - Index specific settings like name, key prefix, key separator, and storage type. + * - `fields` + - Subset of fields within your data to include in the index and any custom settings. + IndexSchema =========== -.. _searchindex_api: +.. _indexschema_api: .. currentmodule:: redisvl.schema -.. autosummary:: - - IndexSchema.index - IndexSchema.fields - IndexSchema.version - IndexSchema.field_names - IndexSchema.redis_fields - IndexSchema.add_field - IndexSchema.add_fields - IndexSchema.remove_field - IndexSchema.from_yaml - IndexSchema.to_yaml - IndexSchema.from_dict - IndexSchema.to_dict - .. autoclass:: IndexSchema - :show-inheritance: - :inherited-members: - :members: - - -IndexInfo -========= - -.. currentmodule:: redisvl.schema - -.. autosummary:: - - IndexInfo.name - IndexInfo.prefix - IndexInfo.key_separator - IndexInfo.storage_type - - -.. autoclass:: IndexInfo - :show-inheritance: - :inherited-members: :members: + :exclude-members: generate_fields,validate_and_create_fields + +Supported Field Types +===================== + +.. list-table:: + :widths: 20 80 + :header-rows: 1 + + * - Field Type + - Description + * - `vector` + - Vector embeddings data typically generated from another AI/ML model to represent unstructured data. 
+ * - `text` + - Full text data that enable full text search and filtering operations. + * - `tag` + - Label-like fields that are used for exact matches and filtering operations. + * - `numeric` + - Numeric fields used for range filters. + * - `geo` + - Geographic coordinates used for geo search. diff --git a/docs/api/searchindex.rst b/docs/api/searchindex.rst index 11a5f930..0b9ce2fb 100644 --- a/docs/api/searchindex.rst +++ b/docs/api/searchindex.rst @@ -21,7 +21,6 @@ SearchIndex .. currentmodule:: redisvl.index .. autoclass:: SearchIndex - :show-inheritance: :inherited-members: :members: @@ -33,6 +32,5 @@ AsyncSearchIndex .. currentmodule:: redisvl.index .. autoclass:: AsyncSearchIndex - :show-inheritance: :inherited-members: :members: diff --git a/docs/api/vectorizer.rst b/docs/api/vectorizer.rst index 9b21d432..f84fbc23 100644 --- a/docs/api/vectorizer.rst +++ b/docs/api/vectorizer.rst @@ -8,7 +8,7 @@ HFTextVectorizer .. _hftextvectorizer_api: -.. currentmodule:: redisvl.vectorize.text.huggingface +.. currentmodule:: redisvl.utils.vectorize.text.huggingface .. autosummary:: @@ -27,7 +27,7 @@ OpenAITextVectorizer .. _openaitextvectorizer_api: -.. currentmodule:: redisvl.vectorize.text.openai +.. currentmodule:: redisvl.utils.vectorize.text.openai .. autosummary:: @@ -48,7 +48,7 @@ VertexAITextVectorizer .. _vertexaitextvectorizer_api: -.. currentmodule:: redisvl.vectorize.text.vertexai +.. currentmodule:: redisvl.utils.vectorize.text.vertexai .. autosummary:: @@ -67,7 +67,7 @@ CohereTextVectorizer .. _coheretextvectorizer_api: -.. currentmodule:: redisvl.vectorize.text.cohere +.. currentmodule:: redisvl.utils.vectorize.text.cohere .. 
autosummary:: diff --git a/redisvl/index/index.py b/redisvl/index/index.py index be146e74..d765cf26 100644 --- a/redisvl/index/index.py +++ b/redisvl/index/index.py @@ -207,7 +207,8 @@ def key_separator(self) -> str: @property def storage_type(self) -> StorageType: - """The underlying storage type for the search index: hash or json.""" + """The underlying storage type for the search index; either + hash or json.""" return self.schema.index.storage_type @property @@ -228,8 +229,8 @@ def from_yaml(cls, schema_path: str, **kwargs): .. code-block:: python from redisvl.index import SearchIndex + index = SearchIndex.from_yaml("schemas/schema.yaml") - index.connect(redis_url="redis://localhost:6379") """ schema = IndexSchema.from_yaml(schema_path) return cls(schema=schema, **kwargs) @@ -249,6 +250,7 @@ def from_dict(cls, schema_dict: Dict[str, Any], **kwargs): .. code-block:: python from redisvl.index import SearchIndex + index = SearchIndex.from_dict({ "index": { "name": "my-index", @@ -259,7 +261,6 @@ def from_dict(cls, schema_dict: Dict[str, Any], **kwargs): {"name": "doc-id", "type": "tag"} ] }) - index.connect(redis_url="redis://localhost:6379") """ schema = IndexSchema.from_dict(schema_dict) @@ -274,12 +275,12 @@ def set_client(self, client: Union[redis.Redis, aredis.Redis]): raise NotImplementedError def disconnect(self): - """Reset the Redis connection.""" + """Disconnect from the Redis database.""" self._redis_client = None return self def key(self, id: str) -> str: - """Create a redis key as a combination of an index key prefix (optional) + """Construct a redis key as a combination of an index key prefix (optional) and specified id. The id is typically either a unique identifier, or @@ -301,10 +302,11 @@ def key(self, id: str) -> str: class SearchIndex(BaseSearchIndex): - """A class for interacting with Redis as a vector database. + """A search index class for interacting with Redis as a vector database. 
- This class is a wrapper around the redis-py client that provides - purpose-built methods for interacting with Redis as a vector database. + The SearchIndex is instantiated with a reference to a Redis database and an + IndexSchema (YAML path or dictionary object) that describes the various + settings and field configurations. .. code-block:: python @@ -384,7 +386,7 @@ def set_client(self, client: redis.Redis): return self def create(self, overwrite: bool = False, drop: bool = False) -> None: - """Create an index in Redis with the given schema and properties. + """Create an index in Redis with the current schema and properties. Args: overwrite (bool, optional): Whether to overwrite the index if it @@ -460,8 +462,12 @@ def load( preprocess: Optional[Callable] = None, batch_size: Optional[int] = None, ) -> List[str]: - """Load a batch of objects to Redis. Returns the list of keys loaded to - Redis. + """Load objects to the Redis database. Returns the list of keys loaded + to Redis. + + RedisVL automatically handles constructing the object keys, batching, + optional preprocessing steps, and setting optional expiration + (TTL policies) on keys. Args: data (Iterable[Any]): An iterable of objects to store. @@ -487,7 +493,22 @@ def load( .. code-block:: python - keys = index.load([{"test": "foo"}, {"test": "bar"}]) + data = [{"test": "foo"}, {"test": "bar"}] + + # simple case + keys = index.load(data) + + # set 360 second ttl policy on data + keys = index.load(data, ttl=360) + + # load data with predefined keys + keys = index.load(data, keys=["rvl:foo", "rvl:bar"]) + + # load data with preprocessing step + def add_field(d): + d["new_field"] = 123 + return d + keys = index.load(data, preprocess=add_field) """ try: return self._storage.write( @@ -559,6 +580,14 @@ def query(self, query: BaseQuery) -> List[Dict[str, Any]]: .. 
code-block:: python + from redisvl.query import VectorQuery + + query = VectorQuery( + vector=[0.16, -0.34, 0.98, 0.23], + vector_field_name="embedding", + num_results=3 + ) + results = index.query(query) """ @@ -639,10 +668,12 @@ def info(self) -> Dict[str, Any]: class AsyncSearchIndex(BaseSearchIndex): - """A class for interacting with Redis as a vector database in async mode. + """A search index class for interacting with Redis as a vector database in + async-mode. - This class is a wrapper around the redis-py async client that provides - purpose-built methods for interacting with Redis as a vector database. + The AsyncSearchIndex is instantiated with a reference to a Redis database + and an IndexSchema (YAML path or dictionary object) that describes the + various settings and field configurations. .. code-block:: python @@ -653,7 +684,7 @@ class AsyncSearchIndex(BaseSearchIndex): index.connect(redis_url="redis://localhost:6379") # create the index - await index.create(overwrite=True) + await index.create(overwrite=True) # data is an iterable of dictionaries await index.load(data) @@ -723,7 +754,7 @@ def set_client(self, client: aredis.Redis): return self async def create(self, overwrite: bool = False, drop: bool = False) -> None: - """Asynchronously create an index in Redis with the given schema + """Asynchronously create an index in Redis with the current schema and properties. Args: @@ -802,6 +833,10 @@ async def load( """Asynchronously load objects to Redis with concurrency control. Returns the list of keys loaded to Redis. + RedisVL automatically handles constructing the object keys, batching, + optional preprocessing steps, and setting optional expiration + (TTL policies) on keys. + Args: data (Iterable[Any]): An iterable of objects to store. id_field (Optional[str], optional): Specified field used as the id @@ -826,7 +861,22 @@ async def load( .. 
code-block:: python - keys = await index.aload([{"test": "foo"}, {"test": "bar"}]) + data = [{"test": "foo"}, {"test": "bar"}] + + # simple case + keys = await index.load(data) + + # set 360 second ttl policy on data + keys = await index.load(data, ttl=360) + + # load data with predefined keys + keys = await index.load(data, keys=["rvl:foo", "rvl:bar"]) + + # load data with preprocessing step + async def add_field(d): + d["new_field"] = 123 + return d + keys = await index.load(data, preprocess=add_field) """ try: @@ -897,7 +947,15 @@ async def query(self, query: BaseQuery) -> List[Dict[str, Any]]: .. code-block:: python - results = await aindex.query(query) + from redisvl.query import VectorQuery + + query = VectorQuery( + vector=[0.16, -0.34, 0.98, 0.23], + vector_field_name="embedding", + num_results=3 + ) + + results = await index.query(query) """ return await self._query(query) diff --git a/redisvl/query/query.py b/redisvl/query/query.py index 82111d04..a76a85b2 100644 --- a/redisvl/query/query.py +++ b/redisvl/query/query.py @@ -25,7 +25,7 @@ def __str__(self) -> str: return " ".join([str(x) for x in self.query.get_args()]) def set_filter(self, filter_expression: Optional[FilterExpression] = None): - """Set the filter for the query. + """Set the filter expression for the query. Args: filter_expression (Optional[FilterExpression], optional): The filter @@ -45,7 +45,7 @@ def set_filter(self, filter_expression: Optional[FilterExpression] = None): ) def get_filter(self) -> FilterExpression: - """Get the filter for the query. + """Get the filter expression for the query. Returns: FilterExpression: The filter for the query. @@ -53,21 +53,19 @@ def get_filter(self) -> FilterExpression: return self._filter def set_paging(self, first: int, limit: int): - """Set the paging parameters for the query to limit the results between - fist and num_results. + """Set the paging parameters for the query to limit the number of + results. 
Args: first (int): The zero-indexed offset for which to fetch query results - limit (int): _description_ + limit (int): The max number of results to return, starting at the offset Raises: - TypeError: _description_ - TypeError: _description_ + TypeError: If first or limit are NOT integers. """ - if not isinstance(first, int): - raise TypeError("first must be of type int") - if not isinstance(limit, int): - raise TypeError("limit must be of type int") + if not isinstance(first, int) or not isinstance(limit, int): + raise TypeError("Paging params must both be integers") + self._first = first self._limit = limit @@ -87,7 +85,7 @@ def __init__( dialect: int = 2, params: Optional[Dict[str, Any]] = None, ): - """Query for a simple count operation provided some filter expression. + """A query for a simple count operation provided some filter expression. Args: filter_expression (FilterExpression): The filter expression to query for. @@ -100,9 +98,11 @@ def __init__( from redisvl.query import CountQuery from redisvl.query.filter import Tag + t = Tag("brand") == "Nike" - q = CountQuery(filter_expression=t) - count = index.query(q) + query = CountQuery(filter_expression=t) + + count = index.query(query) """ super().__init__(num_results=0, dialect=dialect) self.set_filter(filter_expression) @@ -113,7 +113,7 @@ def query(self) -> Query: """The loaded Redis-Py query. Returns: - redis.commands.search.query.Query: The query object. + redis.commands.search.query.Query: The Redis-Py query object. """ base_query = str(self._filter) query = Query(base_query).no_content().paging(0, 0).dialect(self._dialect) @@ -138,7 +138,7 @@ def __init__( dialect: int = 2, params: Optional[Dict[str, Any]] = None, ): - """Query for a filter expression. + """A query for running a filtered search with a filter expression.
Args: filter_expression (FilterExpression): The filter expression to @@ -157,6 +157,7 @@ def __init__( from redisvl.query import FilterQuery from redisvl.query.filter import Tag + t = Tag("brand") == "Nike" q = FilterQuery(return_fields=["brand", "price"], filter_expression=t) @@ -170,7 +171,7 @@ def query(self) -> Query: """Return a Redis-Py Query object representing the query. Returns: - redis.commands.search.query.Query: The query object. + redis.commands.search.query.Query: The Redis-Py query object. """ base_query = str(self._filter) query = ( @@ -223,21 +224,31 @@ def __init__( return_score: bool = True, dialect: int = 2, ): - """Query for vector fields. - - Read more: https://redis.io/docs/interact/search-and-query/search/vectors/#knn-search + """A query for running a vector search along with an optional filter + expression. Args: - vector (List[float]): The vector to query for. - vector_field_name (str): The name of the vector field. - return_fields (List[str]): The fields to return. - filter_expression (FilterExpression, optional): A filter to apply to the query. Defaults to None. - dtype (str, optional): The dtype of the vector. Defaults to "float32". - num_results (Optional[int], optional): The number of results to return. Defaults to 10. - return_score (bool, optional): Whether to return the score. Defaults to True. + vector (List[float]): The vector to perform the vector search with. + vector_field_name (str): The name of the vector field to search + against in the database. + return_fields (List[str]): The declared fields to return with search + results. + filter_expression (FilterExpression, optional): A filter to apply + along with the vector search. Defaults to None. + dtype (str, optional): The dtype of the vector. Defaults to + "float32". + num_results (int, optional): The top k results to return from the + vector search. Defaults to 10. + return_score (bool, optional): Whether to return the vector + distance. Defaults to True. 
+ dialect (int, optional): The RediSearch query dialect. + Defaults to 2. Raises: TypeError: If filter_expression is not of type redisvl.query.FilterExpression + + Note: + Learn more about vector queries in Redis: https://redis.io/docs/interact/search-and-query/search/vectors/#knn-search """ super().__init__( vector, @@ -255,7 +266,7 @@ def query(self) -> Query: """Return a Redis-Py Query object representing the query. Returns: - redis.commands.search.query.Query: The query object. + redis.commands.search.query.Query: The Redis-Py query object. """ base_query = f"{str(self._filter)}=>[KNN {self._num_results} @{self._field} ${self.VECTOR_PARAM} AS {self.DISTANCE_ID}]" query = ( @@ -297,26 +308,35 @@ def __init__( return_score: bool = True, dialect: int = 2, ): - """Vector query by distance range. - - Range queries are for filtering vector search results - by the distance between a vector field value and a query - vector, in terms of the index distance metric. - - Read more: https://redis.io/docs/interact/search-and-query/search/vectors/#range-query + """A query for running a filtered vector search based on semantic + distance threshold. Args: - vector (List[float]): The vector to query for. - vector_field_name (str): The name of the vector field. - return_fields (List[str]): The fields to return. - filter_expression (FilterExpression, optional): A filter to apply to the query. Defaults to None. - dtype (str, optional): The dtype of the vector. Defaults to "float32". - distance_threshold (str, float): The threshold for vector distance. Defaults to 0.2. - num_results (int): The MAX number of results to return. defaults to 10. - return_score (bool, optional): Whether to return the score. Defaults to True. + vector (List[float]): The vector to perform the range query with. + vector_field_name (str): The name of the vector field to search + against in the database. + return_fields (List[str]): The declared fields to return with search + results. 
+ filter_expression (FilterExpression, optional): A filter to apply + along with the range query. Defaults to None. + dtype (str, optional): The dtype of the vector. Defaults to + "float32". + distance_threshold (str, float): The threshold for vector distance. + A smaller threshold indicates a stricter semantic search. + Defaults to 0.2. + num_results (int): The MAX number of results to return. + Defaults to 10. + return_score (bool, optional): Whether to return the vector + distance. Defaults to True. + dialect (int, optional): The RediSearch query dialect. + Defaults to 2. Raises: TypeError: If filter_expression is not of type redisvl.query.FilterExpression + + Note: + Learn more about vector range queries: https://redis.io/docs/interact/search-and-query/search/vectors/#range-query + """ super().__init__( vector, @@ -354,7 +374,7 @@ def query(self) -> Query: """Return a Redis-Py Query object representing the query. Returns: - redis.commands.search.query.Query: The query object. + redis.commands.search.query.Query: The Redis-Py query object. """ base_query = f"@{self._field}:[VECTOR_RANGE ${self.DISTANCE_THRESHOLD_PARAM} ${self.VECTOR_PARAM}]" diff --git a/redisvl/schema/schema.py b/redisvl/schema/schema.py index 60d72bae..630c5660 100644 --- a/redisvl/schema/schema.py +++ b/redisvl/schema/schema.py @@ -28,11 +28,30 @@ class StorageType(Enum): class IndexInfo(BaseModel): - """ - Represents the basic configuration information for an index in Redis. + """Index info includes the essential details regarding index settings, + such as its name, prefix, key separator, and storage type in Redis. + + In yaml format, the index info section looks like: + + .. code-block:: yaml + + index: + name: user-index + prefix: user + key_separator: ':' + storage_type: json + + In dict format, the index info section looks like: + + .. 
code-block:: python + + {"index": { + "name": "user-index", + "prefix": "user", + "key_separator": ":", + "storage_type": "json" + }} - This class includes the essential details required to define an index, such as - its name, prefix, key separator, and storage type. """ name: str @@ -54,12 +73,8 @@ def dict(self, *args, **kwargs) -> Dict[str, Any]: class IndexSchema(BaseModel): - """Represents a schema definition for a search index in Redis, primarily - used in RedisVL for organizing and querying vector and metadata fields. - - This schema provides a structured format to define the layout and types of - fields stored in Redis, including details such as storage type, field - definitions, and key formatting conventions. + """A schema definition for a search index in Redis, used in RedisVL for + configuring index settings and organizing vector and metadata fields. The class offers methods to create an index schema from a YAML file or a Python dictionary, supporting flexible schema definitions and easy @@ -74,6 +89,7 @@ class IndexSchema(BaseModel): index: name: user-index prefix: user + key_separator: ":" storage_type: json fields: @@ -89,20 +105,25 @@ class IndexSchema(BaseModel): distance_metric: cosine datatype: float32 - Loading the schema with RedisVL using yaml or dict format: + Loading the schema for RedisVL from yaml is as simple as: .. code-block:: python from redisvl.schema import IndexSchema - # From YAML schema = IndexSchema.from_yaml("schema.yaml") - # From Dict + Loading the schema for RedisVL from dict is as simple as: + + .. 
code-block:: python + + from redisvl.schema import IndexSchema + schema = IndexSchema.from_dict({ "index": { "name": "user-index", "prefix": "user", + "key_separator": ":", "storage_type": "json", }, "fields": [ From f9147256de804d62a9131aa400e191a5abaf6015 Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Mon, 5 Feb 2024 23:32:05 -0500 Subject: [PATCH 12/18] fix formatting --- redisvl/index/index.py | 2 +- tests/integration/test_llmcache.py | 30 ++++++++--- tests/integration/test_search_results.py | 5 +- tests/unit/test_async_search_index.py | 4 +- tests/unit/test_search_index.py | 13 ++++- tests/unit/test_utils.py | 69 ++++++++++++++++-------- 6 files changed, 90 insertions(+), 33 deletions(-) diff --git a/redisvl/index/index.py b/redisvl/index/index.py index d765cf26..40847318 100644 --- a/redisvl/index/index.py +++ b/redisvl/index/index.py @@ -668,7 +668,7 @@ def info(self) -> Dict[str, Any]: class AsyncSearchIndex(BaseSearchIndex): - """A search index class for interacting with Redis as a vector database in + """A search index class for interacting with Redis as a vector database in async-mode. 
The AsyncSearchIndex is instantiated with a reference to a Redis database diff --git a/tests/integration/test_llmcache.py b/tests/integration/test_llmcache.py index de7495b0..d08cbaa8 100644 --- a/tests/integration/test_llmcache.py +++ b/tests/integration/test_llmcache.py @@ -1,11 +1,12 @@ +from collections import namedtuple from time import sleep import pytest from redisvl.extensions.llmcache import SemanticCache -from redisvl.utils.vectorize import HFTextVectorizer from redisvl.index.index import SearchIndex -from collections import namedtuple +from redisvl.utils.vectorize import HFTextVectorizer + @pytest.fixture def vectorizer(): @@ -19,11 +20,13 @@ def cache(vectorizer): cache_instance.clear() # Clear cache after each test cache_instance._index.delete(True) # Clean up index + @pytest.fixture def cache_no_cleanup(vectorizer): cache_instance = SemanticCache(vectorizer=vectorizer, distance_threshold=0.2) yield cache_instance + @pytest.fixture def cache_with_ttl(vectorizer): cache_instance = SemanticCache(vectorizer=vectorizer, distance_threshold=0.2, ttl=2) @@ -31,13 +34,17 @@ def cache_with_ttl(vectorizer): cache_instance.clear() # Clear cache after each test cache_instance._index.delete(True) # Clean up index + @pytest.fixture def cache_with_redis_client(vectorizer, client): - cache_instance = SemanticCache(vectorizer=vectorizer, redis_client=client, distance_threshold=0.2) + cache_instance = SemanticCache( + vectorizer=vectorizer, redis_client=client, distance_threshold=0.2 + ) yield cache_instance cache_instance.clear() # Clear cache after each test cache_instance._index.delete(True) # Clean up index + # Test basic store and check functionality def test_store_and_check(cache, vectorizer): prompt = "This is a test prompt." 
@@ -94,11 +101,13 @@ def test_check_invalid_input(cache): with pytest.raises(TypeError): cache.check(prompt="test", return_fields="bad value") + # Test handling invalid input for check method def test_bad_ttl(cache): with pytest.raises(ValueError): cache.set_ttl(2.5) + # Test storing with metadata def test_store_with_metadata(cache, vectorizer): prompt = "This is another test prompt." @@ -115,17 +124,21 @@ def test_store_with_metadata(cache, vectorizer): assert check_result[0]["metadata"] == metadata assert check_result[0]["prompt"] == prompt + # Test storing with invalid metadata def test_store_with_invalid_metadata(cache, vectorizer): prompt = "This is another test prompt." response = "This is another test response." - metadata = namedtuple('metadata', 'source')(**{'source': 'test'}) + metadata = namedtuple("metadata", "source")(**{"source": "test"}) vector = vectorizer.embed(prompt) - with pytest.raises(TypeError, match=r"If specified, cached metadata must be a dictionary."): + with pytest.raises( + TypeError, match=r"If specified, cached metadata must be a dictionary." + ): cache.store(prompt, response, vector=vector, metadata=metadata) + # Test setting and getting the distance threshold def test_distance_threshold(cache): initial_threshold = cache.distance_threshold @@ -135,12 +148,14 @@ def test_distance_threshold(cache): assert cache.distance_threshold == new_threshold assert cache.distance_threshold != initial_threshold + # Test out of range distance threshold def test_distance_threshold_out_of_range(cache): out_of_range_threshold = -1 with pytest.raises(ValueError): cache.set_threshold(out_of_range_threshold) + # Test storing and retrieving multiple items def test_multiple_items(cache, vectorizer): prompts_responses = { @@ -161,10 +176,12 @@ def test_multiple_items(cache, vectorizer): assert check_result[0]["response"] == expected_response assert "metadata" not in check_result[0] + # Test retrieving underlying SearchIndex for the cache. 
def test_get_index(cache): assert isinstance(cache.index, SearchIndex) + # Test basic functionality with cache created with user-provided Redis client def test_store_and_check_with_provided_client(cache_with_redis_client, vectorizer): prompt = "This is a test prompt." @@ -179,7 +196,8 @@ def test_store_and_check_with_provided_client(cache_with_redis_client, vectorize assert response == check_result[0]["response"] assert "metadata" not in check_result[0] + # Test deleting the cache def test_delete(cache_no_cleanup, vectorizer): cache_no_cleanup.delete() - assert not cache_no_cleanup.index.exists() \ No newline at end of file + assert not cache_no_cleanup.index.exists() diff --git a/tests/integration/test_search_results.py b/tests/integration/test_search_results.py index 15b2048a..ed7096b7 100644 --- a/tests/integration/test_search_results.py +++ b/tests/integration/test_search_results.py @@ -4,6 +4,7 @@ from redisvl.query import FilterQuery from redisvl.query.filter import Tag + @pytest.fixture def filter_query(): return FilterQuery( @@ -11,6 +12,7 @@ def filter_query(): filter_expression=Tag("credit_score") == "high", ) + @pytest.fixture def index(sample_data): fields_spec = [ @@ -57,6 +59,7 @@ def index(sample_data): # clean up index.delete(drop=True) + def test_process_results_unpacks_json_properly(index, filter_query): results = index.query(filter_query) - assert len(results) == 4 \ No newline at end of file + assert len(results) == 4 diff --git a/tests/unit/test_async_search_index.py b/tests/unit/test_async_search_index.py index 1676ff7d..ff1c7d9f 100644 --- a/tests/unit/test_async_search_index.py +++ b/tests/unit/test_async_search_index.py @@ -1,9 +1,9 @@ import pytest from redisvl.index import AsyncSearchIndex +from redisvl.query import VectorQuery from redisvl.redis.utils import convert_bytes from redisvl.schema import IndexSchema, StorageType -from redisvl.query import VectorQuery fields = [{"name": "test", "type": "tag"}] @@ -148,6 +148,7 @@ async def 
test_check_index_exists_before_delete(async_client, async_index): with pytest.raises(ValueError): await async_index.delete() + @pytest.mark.asyncio async def test_check_index_exists_before_search(async_client, async_index): async_index.set_client(async_client) @@ -163,6 +164,7 @@ async def test_check_index_exists_before_search(async_client, async_index): with pytest.raises(ValueError): await async_index.search(query.query, query_params=query.params) + @pytest.mark.asyncio async def test_check_index_exists_before_info(async_client, async_index): async_index.set_client(async_client) diff --git a/tests/unit/test_search_index.py b/tests/unit/test_search_index.py index 4e83c9ab..8a27e5b3 100644 --- a/tests/unit/test_search_index.py +++ b/tests/unit/test_search_index.py @@ -1,9 +1,9 @@ import pytest from redisvl.index import SearchIndex +from redisvl.query import VectorQuery from redisvl.redis.utils import convert_bytes from redisvl.schema import IndexSchema, StorageType -from redisvl.query import VectorQuery fields = [{"name": "test", "type": "tag"}] @@ -12,14 +12,17 @@ def index_schema(): return IndexSchema.from_dict({"index": {"name": "my_index"}, "fields": fields}) + @pytest.fixture def index(index_schema): return SearchIndex(schema=index_schema) + @pytest.fixture def index_from_yaml(): return SearchIndex.from_yaml("schemas/test_json_schema.yaml") + def test_search_index_properties(index_schema, index): assert index.schema == index_schema # custom settings @@ -31,6 +34,7 @@ def test_search_index_properties(index_schema, index): assert index.storage_type == index_schema.index.storage_type == StorageType.HASH assert index.key("foo").startswith(index.prefix) + def test_search_index_from_yaml(index_from_yaml): assert index_from_yaml.name == "json-test" assert index_from_yaml.client == None @@ -39,6 +43,7 @@ def test_search_index_from_yaml(index_from_yaml): assert index_from_yaml.storage_type == StorageType.JSON assert 
index_from_yaml.key("foo").startswith(index_from_yaml.prefix) + def test_search_index_no_prefix(index_schema): # specify an explicitly empty prefix... index_schema.index.prefix = "" @@ -129,6 +134,7 @@ def test_no_id_field(client, index): with pytest.raises(ValueError): index.load(bad_data, id_field="key") + def test_check_index_exists_before_delete(client, index): index.set_client(client) index.create(overwrite=True, drop=True) @@ -136,6 +142,7 @@ def test_check_index_exists_before_delete(client, index): with pytest.raises(ValueError): index.delete() + def test_check_index_exists_before_search(client, index): index.set_client(client) index.create(overwrite=True, drop=True) @@ -150,6 +157,7 @@ def test_check_index_exists_before_search(client, index): with pytest.raises(ValueError): index.search(query.query, query_params=query.params) + def test_check_index_exists_before_info(client, index): index.set_client(client) index.create(overwrite=True, drop=True) @@ -158,6 +166,7 @@ def test_check_index_exists_before_info(client, index): with pytest.raises(ValueError): index.info() + def test_index_needs_valid_schema(): with pytest.raises(ValueError, match=r"Must provide a valid IndexSchema object"): - index = SearchIndex(schema="Not A Valid Schema") \ No newline at end of file + index = SearchIndex(schema="Not A Valid Schema") diff --git a/tests/unit/test_utils.py b/tests/unit/test_utils.py index 9a3c45db..ca535c5a 100644 --- a/tests/unit/test_utils.py +++ b/tests/unit/test_utils.py @@ -1,45 +1,58 @@ -import pytest import numpy as np -from redisvl.redis.utils import make_dict, buffer_to_array, convert_bytes, array_to_buffer +import pytest + +from redisvl.redis.utils import ( + array_to_buffer, + buffer_to_array, + convert_bytes, + make_dict, +) + def test_even_number_of_elements(): """Test with an even number of elements""" - values = ['key1', 'value1', 'key2', 'value2'] - expected = {'key1': 'value1', 'key2': 'value2'} + values = ["key1", "value1", "key2", "value2"] + 
expected = {"key1": "value1", "key2": "value2"} assert make_dict(values) == expected + def test_odd_number_of_elements(): """Test with an odd number of elements - expecting the last element to be ignored""" - values = ['key1', 'value1', 'key2'] - expected = {'key1': 'value1'} # 'key2' has no pair, so it's ignored + values = ["key1", "value1", "key2"] + expected = {"key1": "value1"} # 'key2' has no pair, so it's ignored assert make_dict(values) == expected + def test_different_data_types(): """Test with different data types as keys and values""" - values = [1, 'one', 2.0, 'two'] - expected = {1: 'one', 2.0: 'two'} + values = [1, "one", 2.0, "two"] + expected = {1: "one", 2.0: "two"} assert make_dict(values) == expected + def test_empty_list(): """Test with an empty list""" values = [] expected = {} assert make_dict(values) == expected + def test_with_complex_objects(): """Test with complex objects like lists and dicts as values""" - key = 'a list' + key = "a list" value = [1, 2, 3] values = [key, value] expected = {key: value} assert make_dict(values) == expected + def test_simple_byte_buffer_to_floats(): """Test conversion of a simple byte buffer into floats""" buffer = np.array([1.0, 2.0, 3.0], dtype=np.float32).tobytes() expected = [1.0, 2.0, 3.0] assert buffer_to_array(buffer, dtype=np.float32) == expected + def test_different_data_types(): """Test conversion with different data types""" # Integer test @@ -52,54 +65,63 @@ def test_different_data_types(): expected = [1.0, 2.0, 3.0] assert buffer_to_array(buffer, dtype=np.float64) == expected + def test_empty_byte_buffer(): """Test conversion of an empty byte buffer""" - buffer = b'' + buffer = b"" expected = [] assert buffer_to_array(buffer, dtype=np.float32) == expected + def test_plain_bytes_to_string(): """Test conversion of plain bytes to string""" - data = b'hello world' - expected = 'hello world' + data = b"hello world" + expected = "hello world" assert convert_bytes(data) == expected + def 
test_bytes_in_dict(): """Test conversion of bytes in a dictionary, including nested dictionaries""" - data = {'key': b'value', 'nested': {'nkey': b'nvalue'}} - expected = {'key': 'value', 'nested': {'nkey': 'nvalue'}} + data = {"key": b"value", "nested": {"nkey": b"nvalue"}} + expected = {"key": "value", "nested": {"nkey": "nvalue"}} assert convert_bytes(data) == expected + def test_bytes_in_list(): """Test conversion of bytes in a list, including nested lists""" - data = [b'item1', b'item2', ['nested', b'nested item']] - expected = ['item1', 'item2', ['nested', 'nested item']] + data = [b"item1", b"item2", ["nested", b"nested item"]] + expected = ["item1", "item2", ["nested", "nested item"]] assert convert_bytes(data) == expected + def test_bytes_in_tuple(): """Test conversion of bytes in a tuple, including nested tuples""" - data = (b'item1', b'item2', ('nested', b'nested item')) - expected = ('item1', 'item2', ('nested', 'nested item')) + data = (b"item1", b"item2", ("nested", b"nested item")) + expected = ("item1", "item2", ("nested", "nested item")) assert convert_bytes(data) == expected + def test_non_bytes_data(): """Test handling of non-bytes data types""" - data = 'already a string' - expected = 'already a string' + data = "already a string" + expected = "already a string" assert convert_bytes(data) == expected + def test_bytes_with_invalid_utf8(): """Test handling bytes that cannot be decoded with UTF-8""" - data = b'\xff\xff' # Invalid in UTF-8 + data = b"\xff\xff" # Invalid in UTF-8 expected = data assert convert_bytes(data) == expected + def test_simple_list_to_bytes_default_dtype(): """Test conversion of a simple list of floats to bytes using the default dtype""" array = [1.0, 2.0, 3.0] expected = np.array(array, dtype=np.float32).tobytes() assert array_to_buffer(array) == expected + def test_list_to_bytes_non_default_dtype(): """Test conversion with a non-default dtype""" array = [1.0, 2.0, 3.0] @@ -107,12 +129,14 @@ def 
test_list_to_bytes_non_default_dtype(): expected = np.array(array, dtype=dtype).tobytes() assert array_to_buffer(array, dtype=dtype) == expected + def test_empty_list_to_bytes(): """Test conversion of an empty list""" array = [] expected = np.array(array, dtype=np.float32).tobytes() assert array_to_buffer(array) == expected + @pytest.mark.parametrize("dtype", [np.int32, np.float64]) def test_conversion_with_various_dtypes(dtype): """Test conversion of a list of floats to bytes with various dtypes""" @@ -120,8 +144,9 @@ def test_conversion_with_various_dtypes(dtype): expected = np.array(array, dtype=dtype).tobytes() assert array_to_buffer(array, dtype=dtype) == expected + def test_conversion_with_invalid_floats(): """Test conversion with invalid float values (numpy should handle them)""" - array = [float('inf'), float('-inf'), float('nan')] + array = [float("inf"), float("-inf"), float("nan")] result = array_to_buffer(array) assert len(result) > 0 # Simple check to ensure it returns anything From ad364dcd5ebe88adde2d87aa39e0618e720ea0ec Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Tue, 6 Feb 2024 08:58:21 -0500 Subject: [PATCH 13/18] use paginate instead of query_batch --- redisvl/index/index.py | 115 ++++++++++++++++++-------------- tests/integration/test_query.py | 12 ++-- 2 files changed, 71 insertions(+), 56 deletions(-) diff --git a/redisvl/index/index.py b/redisvl/index/index.py index 40847318..ff6b1359 100644 --- a/redisvl/index/index.py +++ b/redisvl/index/index.py @@ -593,45 +593,53 @@ def query(self, query: BaseQuery) -> List[Dict[str, Any]]: """ return self._query(query) - def query_batch(self, query: BaseQuery, batch_size: int = 30) -> Generator: - """Execute a query on the index while batching results. + def paginate(self, query: BaseQuery, page_size: int = 30) -> Generator: + """Execute a given query against the index and return results in + paginated batches. 
- This method takes a BaseQuery object directly, handles optional paging - support, and post-processing of the search results. + This method accepts a RedisVL query instance, enabling pagination of + results which allows for subsequent processing over each batch with a + generator. Args: - query (BaseQuery): The query to run. - batch_size (int): The size of batches to return on each iteration. + query (BaseQuery): The search query to be executed. + page_size (int, optional): The number of results to return in each + batch. Defaults to 30. - Returns: - List[Result]: A list of search results. + Yields: + A generator yielding batches of search results. Raises: - TypeError: If the batch size is not an integer - ValueError: If the batch size is less than or equal to zero. - - .. code-block:: python + TypeError: If the page_size argument is not of type int. + ValueError: If the page_size argument is less than or equal to zero. - for batch in index.query_batch(query, batch_size=10): - # process batched results + Example: + # Iterate over paginated search results in batches of 10 + for result_batch in index.paginate(query, page_size=10): + # Process each batch of results pass + Note: + The page_size parameter controls the number of items each result + batch contains. Adjust this value based on performance + considerations and the expected volume of search results. 
+ """ - if not isinstance(batch_size, int): - raise TypeError("batch_size must be an integer") + if not isinstance(page_size, int): + raise TypeError("page_size must be an integer") - if batch_size <= 0: - raise ValueError("batch_size must be greater than 0") + if page_size <= 0: + raise ValueError("page_size must be greater than 0") - first = 0 + offset = 0 while True: - query.set_paging(first, batch_size) - batch_results = self._query(query) - if not batch_results: + query.set_paging(offset, page_size) + results = self._query(query) + if not results: break - yield batch_results - # increment the pagination tracker - first += batch_size + yield results + # Increment the offset for the next batch of pagination + offset += page_size def listall(self) -> List[str]: """List all search indices in Redis database. @@ -959,46 +967,53 @@ async def query(self, query: BaseQuery) -> List[Dict[str, Any]]: """ return await self._query(query) - async def query_batch( - self, query: BaseQuery, batch_size: int = 30 - ) -> AsyncGenerator: - """Execute a query on the index with batching. + async def paginate(self, query: BaseQuery, page_size: int = 30) -> AsyncGenerator: + """Execute a given query against the index and return results in + paginated batches. - This method takes a BaseQuery object directly, handles optional paging - support, and post-processing of the search results. + This method accepts a RedisVL query instance, enabling async pagination + of results which allows for subsequent processing over each batch with a + generator. Args: - query (BaseQuery): The query to run. - batch_size (int): The size of batches to return on each iteration. + query (BaseQuery): The search query to be executed. + page_size (int, optional): The number of results to return in each + batch. Defaults to 30. - Returns: - List[Result]: A list of search results. + Yields: + An async generator yielding batches of search results. 
Raises: - TypeError: If the batch size is not an integer - ValueError: If the batch size is less than or equal to zero. + TypeError: If the page_size argument is not of type int. + ValueError: If the page_size argument is less than or equal to zero. - .. code-block:: python - - async for batch in index.query_batch(query, batch_size=10): - # process batched results + Example: + # Iterate over paginated search results in batches of 10 + async for result_batch in index.paginate(query, page_size=10): + # Process each batch of results pass + + Note: + The page_size parameter controls the number of items each result + batch contains. Adjust this value based on performance + considerations and the expected volume of search results. + """ - if not isinstance(batch_size, int): - raise TypeError("batch_size must be an integer") + if not isinstance(page_size, int): + raise TypeError("page_size must be an integer") - if batch_size <= 0: - raise ValueError("batch_size must be greater than 0") + if page_size <= 0: + raise ValueError("page_size must be greater than 0") first = 0 while True: - query.set_paging(first, batch_size) - batch_results = await self._query(query) - if not batch_results: + query.set_paging(first, page_size) + results = await self._query(query) + if not results: break - yield batch_results + yield results # increment the pagination tracker - first += batch_size + first += page_size async def listall(self) -> List[str]: """List all search indices in Redis database. 
diff --git a/tests/integration/test_query.py b/tests/integration/test_query.py index fb5b3f74..35aea066 100644 --- a/tests/integration/test_query.py +++ b/tests/integration/test_query.py @@ -300,10 +300,10 @@ def test_filter_combinations(index, query): search(query, index, n & t & g, 1, age_range=(18, 99), location="-122.4194,37.7749") -def test_query_batch_vector_query(index, vector_query, sample_data): +def test_paginate_vector_query(index, vector_query, sample_data): batch_size = 2 all_results = [] - for i, batch in enumerate(index.query_batch(vector_query, batch_size), start=1): + for i, batch in enumerate(index.paginate(vector_query, batch_size), start=1): all_results.extend(batch) assert len(batch) <= batch_size @@ -313,10 +313,10 @@ def test_query_batch_vector_query(index, vector_query, sample_data): assert i == expected_iterations -def test_query_batch_filter_query(index, filter_query): +def test_paginate_filter_query(index, filter_query): batch_size = 3 all_results = [] - for i, batch in enumerate(index.query_batch(filter_query, batch_size), start=1): + for i, batch in enumerate(index.paginate(filter_query, batch_size), start=1): all_results.extend(batch) assert len(batch) <= batch_size @@ -327,10 +327,10 @@ def test_query_batch_filter_query(index, filter_query): assert all(item["credit_score"] == "high" for item in all_results) -def test_query_batch_range_query(index, range_query): +def test_paginate_range_query(index, range_query): batch_size = 1 all_results = [] - for i, batch in enumerate(index.query_batch(range_query, batch_size), start=1): + for i, batch in enumerate(index.paginate(range_query, batch_size), start=1): all_results.extend(batch) assert len(batch) <= batch_size From d1442e3f5a1801f58da26be069b6a2c026988c51 Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Tue, 6 Feb 2024 13:15:19 -0500 Subject: [PATCH 14/18] docstring updates --- docs/api/cache.rst | 13 ------ docs/api/filter.rst | 52 +++------------------- docs/api/schema.rst | 89 
++++++++++++++++++++++++++++-------- docs/api/vectorizer.rst | 36 ++------------- redisvl/query/filter.py | 99 ++++++++++++++++++++++------------------- 5 files changed, 133 insertions(+), 156 deletions(-) diff --git a/docs/api/cache.rst b/docs/api/cache.rst index 9c08be5d..7e34ee1f 100644 --- a/docs/api/cache.rst +++ b/docs/api/cache.rst @@ -10,19 +10,6 @@ SemanticCache .. currentmodule:: redisvl.extensions.llmcache -.. autosummary:: - - SemanticCache.__init__ - SemanticCache.check - SemanticCache.store - SemanticCache.clear - SemanticCache.delete - SemanticCache.distance_threshold - SemanticCache.set_threshold - SemanticCache.ttl - SemanticCache.set_ttl - - .. autoclass:: SemanticCache :show-inheritance: :members: diff --git a/docs/api/filter.rst b/docs/api/filter.rst index ddc00092..bcd11ab3 100644 --- a/docs/api/filter.rst +++ b/docs/api/filter.rst @@ -16,20 +16,10 @@ Tag .. currentmodule:: redisvl.query.filter -.. autosummary:: - - Tag.__init__ - Tag.__eq__ - Tag.__ne__ - Tag.__str__ - - .. autoclass:: Tag - :show-inheritance: :members: :special-members: - :inherited-members: - + :exclude-members: __hash__ Text @@ -38,19 +28,11 @@ Text .. currentmodule:: redisvl.query.filter -.. autosummary:: - - Text.__init__ - Text.__eq__ - Text.__ne__ - Text.__mod__ - Text.__str__ - .. autoclass:: Text - :show-inheritance: :members: :special-members: + :exclude-members: __hash__ Num @@ -59,22 +41,11 @@ Num .. currentmodule:: redisvl.query.filter -.. autosummary:: - - Num.__init__ - Num.__eq__ - Num.__ne__ - Num.__lt__ - Num.__le__ - Num.__gt__ - Num.__ge__ - Num.__str__ - .. autoclass:: Num - :show-inheritance: :members: :special-members: + :exclude-members: __hash__ Geo @@ -82,17 +53,10 @@ Geo .. currentmodule:: redisvl.query.filter -.. autosummary:: - - Geo.__init__ - Geo.__eq__ - Geo.__ne__ - Geo.__str__ - .. autoclass:: Geo - :show-inheritance: :members: :special-members: + :exclude-members: __hash__ GeoRadius @@ -100,11 +64,7 @@ GeoRadius .. 
currentmodule:: redisvl.query.filter -.. autosummary:: - - GeoRadius.__init__ - .. autoclass:: GeoRadius - :show-inheritance: :members: - :special-members: \ No newline at end of file + :special-members: + :exclude-members: __hash__ diff --git a/docs/api/schema.rst b/docs/api/schema.rst index e8f7df3d..ebe4ca8a 100644 --- a/docs/api/schema.rst +++ b/docs/api/schema.rst @@ -1,4 +1,3 @@ - *********** Schema *********** @@ -19,6 +18,7 @@ field configurations using the following three components: * - `fields` - Subset of fields within your data to include in the index and any custom settings. + IndexSchema =========== @@ -28,24 +28,75 @@ IndexSchema .. autoclass:: IndexSchema :members: - :exclude-members: generate_fields,validate_and_create_fields + :exclude-members: generate_fields,validate_and_create_fields,redis_fields -Supported Field Types -===================== -.. list-table:: - :widths: 20 80 - :header-rows: 1 +Defining Fields +=============== - * - Field Type - - Description - * - `vector` - - Vector embeddings data typically generated from another AI/ML model to represent unstructured data. - * - `text` - - Full text data that enable full text search and filtering operations. - * - `tag` - - Label-like fields that are used for exact matches and filtering operations. - * - `numeric` - - Numeric fields used for range filters. - * - `geo` - - Geographic coordinates used for geo search. +Fields in the schema can be defined in YAML format or as a Python dictionary, specifying a name, type, an optional path, and attributes for customization. + +**YAML Example**: + +.. code-block:: yaml + + - name: title + type: text + path: $.document.title + attrs: + weight: 1.0 + no_stem: false + withsuffixtrie: true + +**Python Dictionary Example**: + +.. 
code-block:: python + + { + "name": "location", + "type": "geo", + "attrs": { + "sortable": True + } + } + +Supported Field Types and Attributes +==================================== + +Each field type supports specific attributes that customize its behavior. Below are the field types and their available attributes: + +**Text Field Attributes**: + +- `weight`: Importance of the field in result calculation. +- `no_stem`: Disables stemming during indexing. +- `withsuffixtrie`: Optimizes queries by maintaining a suffix trie. +- `phonetic_matcher`: Enables phonetic matching. +- `sortable`: Allows sorting on this field. + +**Tag Field Attributes**: + +- `separator`: Character for splitting text into individual tags. +- `case_sensitive`: Case sensitivity in tag matching. +- `withsuffixtrie`: Suffix trie optimization for queries. +- `sortable`: Enables sorting based on the tag field. + +**Numeric and Geo Field Attributes**: + +- Both numeric and geo fields support the `sortable` attribute, enabling sorting on these fields. + +**Common Vector Field Attributes**: + +- `dims`: Dimensionality of the vector. +- `algorithm`: Indexing algorithm (`flat` or `hnsw`). +- `datatype`: Float datatype of the vector (`float32` or `float64`). +- `distance_metric`: Metric for measuring query relevance (`COSINE`, `L2`, `IP`). + +**HNSW Vector Field Specific Attributes**: + +- `m`: Max outgoing edges per node in each layer. +- `ef_construction`: Max edge candidates during build time. +- `ef_runtime`: Max top candidates during search. +- `epsilon`: Range search boundary factor.
+ +Note: + See fully documented Redis-supported fields and options here: https://redis.io/commands/ft.create/ \ No newline at end of file diff --git a/docs/api/vectorizer.rst b/docs/api/vectorizer.rst index f84fbc23..61dd432c 100644 --- a/docs/api/vectorizer.rst +++ b/docs/api/vectorizer.rst @@ -1,7 +1,7 @@ -********** -Vectorizer -********** +*********** +Vectorizers +*********** HFTextVectorizer ================ @@ -10,15 +10,8 @@ HFTextVectorizer .. currentmodule:: redisvl.utils.vectorize.text.huggingface -.. autosummary:: - - HFTextVectorizer.__init__ - HFTextVectorizer.embed - HFTextVectorizer.embed_many - .. autoclass:: HFTextVectorizer :show-inheritance: - :inherited-members: :members: @@ -29,17 +22,8 @@ OpenAITextVectorizer .. currentmodule:: redisvl.utils.vectorize.text.openai -.. autosummary:: - - OpenAITextVectorizer.__init__ - OpenAITextVectorizer.embed - OpenAITextVectorizer.embed_many - OpenAITextVectorizer.aembed - OpenAITextVectorizer.aembed_many - .. autoclass:: OpenAITextVectorizer :show-inheritance: - :inherited-members: :members: @@ -50,15 +34,8 @@ VertexAITextVectorizer .. currentmodule:: redisvl.utils.vectorize.text.vertexai -.. autosummary:: - - VertexAITextVectorizer.__init__ - VertexAITextVectorizer.embed - VertexAITextVectorizer.embed_many - .. autoclass:: VertexAITextVectorizer :show-inheritance: - :inherited-members: :members: @@ -69,14 +46,7 @@ CohereTextVectorizer .. currentmodule:: redisvl.utils.vectorize.text.cohere -.. autosummary:: - - CohereTextVectorizer.__init__ - CohereTextVectorizer.embed - CohereTextVectorizer.embed_many - .. 
autoclass:: CohereTextVectorizer :show-inheritance: - :inherited-members: :members: diff --git a/redisvl/query/filter.py b/redisvl/query/filter.py index 7ddea504..4ef2c0e6 100644 --- a/redisvl/query/filter.py +++ b/redisvl/query/filter.py @@ -80,7 +80,7 @@ def wrapper(instance: Any, *args: List[Any], **kwargs: Dict[str, Any]) -> Any: class Tag(FilterField): - """A Tag is a FilterField representing a tag in a Redis index.""" + """A Tag filter can be applied to Tag fields""" OPERATORS: Dict[FilterOperator, str] = { FilterOperator.EQ: "==", @@ -94,14 +94,6 @@ class Tag(FilterField): } SUPPORTED_VAL_TYPES = (list, set, tuple, str, type(None)) - def __init__(self, field: str): - """Create a Tag FilterField. - - Args: - field (str): The name of the tag field in the index to be queried against - """ - super().__init__(field) - def _set_tag_value( self, other: Union[List[str], Set[str], str], operator: FilterOperator ): @@ -129,7 +121,8 @@ def __eq__(self, other: Union[List[str], str]) -> "FilterExpression": .. code-block:: python from redisvl.query.filter import Tag - filter = Tag("brand") == "nike" + + f = Tag("brand") == "nike" """ self._set_tag_value(other, FilterOperator.EQ) return FilterExpression(str(self)) @@ -144,7 +137,7 @@ def __ne__(self, other) -> "FilterExpression": .. code-block:: python from redisvl.query.filter import Tag - filter = Tag("brand") != "nike" + f = Tag("brand") != "nike" """ self._set_tag_value(other, FilterOperator.NE) @@ -155,7 +148,7 @@ def _formatted_tag_value(self) -> str: return "|".join([self.escaper.escape(tag) for tag in self._value]) def __str__(self) -> str: - """Return the Redis Query syntax for a Tag filter expression.""" + """Return the Redis Query string for the Tag filter""" if not self._value: return "*" @@ -221,15 +214,16 @@ class Geo(FilterField): @check_operator_misuse def __eq__(self, other) -> "FilterExpression": - """Create a Geographic equality filter expression. 
+ """Create a geographic filter within a specified GeoRadius. Args: - other (GeoSpec): The geographic spec to filter on. + other (GeoRadius): The geographic spec to filter on. .. code-block:: python from redisvl.query.filter import Geo, GeoRadius - filter = Geo("location") == GeoRadius(-122.4194, 37.7749, 1, unit="m") + + f = Geo("location") == GeoRadius(-122.4194, 37.7749, 1, unit="m") """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.EQ) # type: ignore @@ -237,22 +231,23 @@ def __eq__(self, other) -> "FilterExpression": @check_operator_misuse def __ne__(self, other) -> "FilterExpression": - """Create a Geographic inequality filter expression. + """Create a geographic filter outside of a specified GeoRadius. Args: - other (GeoSpec): The geographic spec to filter on. + other (GeoRadius): The geographic spec to filter on. .. code-block:: python from redisvl.query.filter import Geo, GeoRadius - filter = Geo("location") != GeoRadius(-122.4194, 37.7749, 1, unit="m") + + f = Geo("location") != GeoRadius(-122.4194, 37.7749, 1, unit="m") """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.NE) # type: ignore return FilterExpression(str(self)) def __str__(self) -> str: - """Return the Redis Query syntax for a Geographic filter expression.""" + """Return the Redis Query string for the Geo filter""" if not self._value: return "*" @@ -292,7 +287,7 @@ def __eq__(self, other: int) -> "FilterExpression": .. code-block:: python from redisvl.query.filter import Num - filter = Num("zipcode") == 90210 + f = Num("zipcode") == 90210 """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.EQ) @@ -307,7 +302,8 @@ def __ne__(self, other: int) -> "FilterExpression": .. code-block:: python from redisvl.query.filter import Num - filter = Num("zipcode") != 90210 + + f = Num("zipcode") != 90210 """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.NE) @@ -322,7 +318,8 @@ def __gt__(self, other: int) -> "FilterExpression": .. 
code-block:: python from redisvl.query.filter import Num - filter = Num("age") > 18 + + f = Num("age") > 18 """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.GT) @@ -337,7 +334,8 @@ def __lt__(self, other: int) -> "FilterExpression": .. code-block:: python from redisvl.query.filter import Num - filter = Num("age") < 18 + + f = Num("age") < 18 """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.LT) @@ -352,7 +350,8 @@ def __ge__(self, other: int) -> "FilterExpression": .. code-block:: python from redisvl.query.filter import Num - filter = Num("age") >= 18 + + f = Num("age") >= 18 """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.GE) @@ -367,14 +366,15 @@ def __le__(self, other: int) -> "FilterExpression": .. code-block:: python from redisvl.query.filter import Num - filter = Num("age") <= 18 + + f = Num("age") <= 18 """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.LE) return FilterExpression(str(self)) def __str__(self) -> str: - """Return the Redis Query syntax for a Numeric filter expression.""" + """Return the Redis Query string for the Numeric filter""" if not self._value: return "*" @@ -414,7 +414,8 @@ def __eq__(self, other: str) -> "FilterExpression": .. code-block:: python from redisvl.query.filter import Text - filter = Text("job") == "engineer" + + f = Text("job") == "engineer" """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.EQ) @@ -432,7 +433,8 @@ def __ne__(self, other: str) -> "FilterExpression": .. code-block:: python from redisvl.query.filter import Text - filter = Text("job") != "engineer" + + f = Text("job") != "engineer" """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.NE) @@ -450,16 +452,18 @@ def __mod__(self, other: str) -> "FilterExpression": .. 
code-block:: python from redisvl.query.filter import Text - filter = Text("job") % "engine*" # suffix wild card match - filter = Text("job") % "%%engine%%" # fuzzy match w/ Levenshtein Distance - filter = Text("job") % "engineer|doctor" # contains either term in field - filter = Text("job") % "engineer doctor" # contains both terms in field + + f = Text("job") % "engine*" # suffix wild card match + f = Text("job") % "%%engine%%" # fuzzy match w/ Levenshtein Distance + f = Text("job") % "engineer|doctor" # contains either term in field + f = Text("job") % "engineer doctor" # contains both terms in field """ self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.LIKE) return FilterExpression(str(self)) def __str__(self) -> str: + """Return the Redis Query string for the Text filter""" if not self._value: return "*" @@ -470,37 +474,42 @@ def __str__(self) -> str: class FilterExpression: - """A FilterExpression is a logical expression of FilterFields. + """A FilterExpression is a logical combination of filters in RedisVL. FilterExpressions can be combined using the & and | operators to create - complex logical expressions that evaluate to the Redis Query language. + complex expressions that evaluate to the Redis Query language. This presents an interface by which users can create complex queries without having to know the Redis Query language. - Filter expressions are not created directly. Instead they are built - by combining FilterFields using the & and | operators. - .. code-block:: python from redisvl.query.filter import Tag, Num + brand_is_nike = Tag("brand") == "nike" price_is_over_100 = Num("price") < 100 - filter = brand_is_nike & price_is_over_100 - print(str(filter)) - (@brand:{nike} @price:[-inf (100)]) + f = brand_is_nike & price_is_over_100 + + print(str(f)) + + >>> (@brand:{nike} @price:[-inf (100)]) This can be combined with the VectorQuery class to create a query: .. 
code-block:: python from redisvl.query import VectorQuery + v = VectorQuery( - ... vector=[0.1, 0.1, 0.5, ...], - ... vector_field_name="product_embedding", - ... return_fields=["product_id", "brand", "price"], - ... filter_expression=filter, - ... ) + vector=[0.1, 0.1, 0.5, ...], + vector_field_name="product_embedding", + return_fields=["product_id", "brand", "price"], + filter_expression=f, + ) + + Note: + Filter expressions are typically not called directly. Instead they are + built by combining filter statements using the & and | operators. """ From e3e107e6c60769a5c91565c6e325c7edb6abe9aa Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Wed, 7 Feb 2024 13:02:50 -0500 Subject: [PATCH 15/18] refactor connection check modules --- redisvl/extensions/llmcache/semantic.py | 2 +- redisvl/redis/connection.py | 28 ++++++++++++++++++------- 2 files changed, 22 insertions(+), 8 deletions(-) diff --git a/redisvl/extensions/llmcache/semantic.py b/redisvl/extensions/llmcache/semantic.py index c4e65426..10d356b9 100644 --- a/redisvl/extensions/llmcache/semantic.py +++ b/redisvl/extensions/llmcache/semantic.py @@ -310,7 +310,7 @@ def store( key = cache.store( prompt="What is the captial city of France?", response="Paris", - metadata={"city": "Paris", "country": "Fance"} + metadata={"city": "Paris", "country": "France"} ) """ # Vectorize prompt if necessary and create cache payload diff --git a/redisvl/redis/connection.py b/redisvl/redis/connection.py index c88efaf0..b5351ecd 100644 --- a/redisvl/redis/connection.py +++ b/redisvl/redis/connection.py @@ -1,5 +1,5 @@ import os -from typing import Optional +from typing import Any, Dict, List, Optional from redis import ConnectionPool, Redis from redis.asyncio import Redis as AsyncRedis @@ -101,7 +101,10 @@ def get_async_redis_connection(url: Optional[str] = None, **kwargs) -> AsyncRedi return AsyncRedis.from_url(get_address_from_env(), **kwargs) @staticmethod - def validate_redis_modules(client: Redis) -> None: + def 
validate_redis_modules( + client: Redis, + redis_required_modules: Optional[List[Dict[str, Any]]] = None + ) -> None: """Validates if the required Redis modules are installed. Args: @@ -111,11 +114,14 @@ def validate_redis_modules(client: Redis) -> None: ValueError: If required Redis modules are not installed. """ RedisConnectionFactory._validate_redis_modules( - convert_bytes(client.module_list()) + convert_bytes(client.module_list()), redis_required_modules ) @staticmethod - def validate_async_redis_modules(client: AsyncRedis) -> None: + def validate_async_redis_modules( + client: AsyncRedis, + redis_required_modules: Optional[List[Dict[str, Any]]] = None + ) -> None: """ Validates if the required Redis modules are installed. @@ -128,21 +134,29 @@ def validate_async_redis_modules(client: AsyncRedis) -> None: temp_client = Redis( connection_pool=ConnectionPool(**client.connection_pool.connection_kwargs) ) - RedisConnectionFactory.validate_redis_modules(temp_client) + RedisConnectionFactory.validate_redis_modules( + temp_client, redis_required_modules + ) @staticmethod - def _validate_redis_modules(installed_modules) -> None: + def _validate_redis_modules( + installed_modules, + redis_required_modules: Optional[List[Dict[str, Any]]] = None + ) -> None: """ Validates if required Redis modules are installed. Args: installed_modules: List of installed modules. + redis_required_modules: List of required modules. Raises: ValueError: If required Redis modules are not installed. 
""" installed_modules = {module["name"]: module for module in installed_modules} - for required_module in REDIS_REQUIRED_MODULES: + redis_required_modules = redis_required_modules or REDIS_REQUIRED_MODULES + + for required_module in redis_required_modules: if required_module["name"] in installed_modules: installed_version = installed_modules[required_module["name"]]["ver"] if int(installed_version) >= int(required_module["ver"]): # type: ignore From 2e1ba0392ec41476665bfe94c9ba6ee7f5d024a6 Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Thu, 8 Feb 2024 12:06:19 -0500 Subject: [PATCH 16/18] fix formatting --- redisvl/redis/connection.py | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/redisvl/redis/connection.py b/redisvl/redis/connection.py index b5351ecd..98d76482 100644 --- a/redisvl/redis/connection.py +++ b/redisvl/redis/connection.py @@ -102,8 +102,7 @@ def get_async_redis_connection(url: Optional[str] = None, **kwargs) -> AsyncRedi @staticmethod def validate_redis_modules( - client: Redis, - redis_required_modules: Optional[List[Dict[str, Any]]] = None + client: Redis, redis_required_modules: Optional[List[Dict[str, Any]]] = None ) -> None: """Validates if the required Redis modules are installed. @@ -120,7 +119,7 @@ def validate_redis_modules( @staticmethod def validate_async_redis_modules( client: AsyncRedis, - redis_required_modules: Optional[List[Dict[str, Any]]] = None + redis_required_modules: Optional[List[Dict[str, Any]]] = None, ) -> None: """ Validates if the required Redis modules are installed. @@ -140,8 +139,7 @@ def validate_async_redis_modules( @staticmethod def _validate_redis_modules( - installed_modules, - redis_required_modules: Optional[List[Dict[str, Any]]] = None + installed_modules, redis_required_modules: Optional[List[Dict[str, Any]]] = None ) -> None: """ Validates if required Redis modules are installed. 
From fd24036eff625f6dc9e6e7096024a66cd59b1c32 Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Thu, 8 Feb 2024 12:17:19 -0500 Subject: [PATCH 17/18] readme sanitation --- README.md | 30 ++++++++++++------------------ 1 file changed, 12 insertions(+), 18 deletions(-) diff --git a/README.md b/README.md index 7899ca56..d61a2e6f 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,8 @@ -
-
- 🔥 Redis Vector Library -
- the AI-native Redis Python client -
-
-
- - +

🔥 Redis Vector Library

+ the AI-native Redis Python client +
[![Codecov](https://img.shields.io/codecov/c/github/RedisVentures/RedisVL/dev?label=Codecov&logo=codecov&token=E30WxqBeJJ)](https://codecov.io/gh/RedisVentures/RedisVL) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) @@ -33,13 +26,15 @@ # Introduction -`redisvl` is a Python client library, tailor-made for AI applications leveraging [Redis](https://redis.com). It's designed for use in: +The Python Redis Vector Library (RedisVL) is a tailor-made client for AI applications leveraging [Redis](https://redis.com). + +It's specifically designed for: - Information retrieval & vector similarity search - Real-time RAG pipelines - Recommendation engines -Enhance your AI applications with Redis' **speed**, **flexibility**, and **reliability**, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. +Enhance your applications with Redis' **speed**, **flexibility**, and **reliability**, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. # 🚀 Why RedisVL? @@ -49,7 +44,7 @@ However, organizations still struggle with delivering reliable solutions **quick [Redis](https://redis.io) has been a staple for over a decade in the NoSQL world, and boasts a number of flexible [data structures](https://redis.io/docs/data-types/) and [processing engines](https://redis.io/docs/interact/) to handle realtime application workloads like caching, session management, and search. Most notably, Redis has been used as a vector database for RAG, as an LLM cache, and chat session memory store for conversational AI applications. -`redisvl` **bridges the gap between** the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. 
Built on the back of the popular Python client, [`redis-py`](https://github.com/redis/redis-py/tree/master), it extends the core caching and search features of Redis into a grammar that is more aligned to the needs of today's AI/ML Engineers or Data Scientists. +The vector library **bridges the gap between** the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. Built on the back of the popular Python client, [`redis-py`](https://github.com/redis/redis-py/tree/master), it abstracts the features of Redis into a grammar that is more aligned to the needs of today's AI/ML Engineers or Data Scientists. # 💪 Getting Started @@ -90,7 +85,6 @@ Choose from multiple Redis deployment options: index: name: user-index-v1 prefix: user - key_separator: ':' storage_type: json fields: @@ -275,7 +269,7 @@ aim to improve applications working with LLMs: >>> "Paris" ``` - > Learn more about Semantic Caching in `redisvl` [here](https://www.redisvl.com/user_guide/llmcache_03.html). + > Learn more about Semantic Caching [here](https://www.redisvl.com/user_guide/llmcache_03.html). - **LLM Session Management (COMING SOON)** aims to improve personalization and accuracy of the LLM application by providing user chat session information and conversational memory. - **LLM Contextual Access Control (COMING SOON)** aims to improve security concerns by preventing malicious, irrelevant, or problematic user input from reaching LLMs and infrastructure.
@@ -283,7 +277,7 @@ aim to improve applications working with LLMs: ## Helpful Links -To get started with `redisvl`, check out: +To get started, check out the following guides: - [Getting Started Guide](https://www.redisvl.com/user_guide/getting_started_01.html) - [API Reference](https://www.redisvl.com/api/index.html) - [Example Gallery](https://www.redisvl.com/examples/index.html) @@ -292,7 +286,7 @@ To get started with `redisvl`, check out: ## 🫱🏼‍🫲🏽 Contributing -Please help us by contributing PRs, opening GitHub issues for bugs or new feature ideas, improving documentation, or increasing test coverage. [Read more about how to contribute to RedisVL!](CONTRIBUTING.md) +Please help us by contributing PRs, opening GitHub issues for bugs or new feature ideas, improving documentation, or increasing test coverage. [Read more about how to contribute!](CONTRIBUTING.md) ## 🚧 Maintenance -RedisVL is supported by [Redis, Inc](https://redis.com) on a good faith effort basis. To report bugs, request features, or receive assistance, please [file an issue](https://github.com/RedisVentures/redisvl/issues). +This project is supported by [Redis, Inc](https://redis.com) on a good faith effort basis. To report bugs, request features, or receive assistance, please [file an issue](https://github.com/RedisVentures/redisvl/issues). From 7500f308889cc5e13f54bf6db4ec4aac04a6bcd8 Mon Sep 17 00:00:00 2001 From: Tyler Hutcherson Date: Thu, 8 Feb 2024 15:38:07 -0500 Subject: [PATCH 18/18] docstring fix for schema --- redisvl/schema/schema.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/redisvl/schema/schema.py b/redisvl/schema/schema.py index 630c5660..36c1ab80 100644 --- a/redisvl/schema/schema.py +++ b/redisvl/schema/schema.py @@ -135,7 +135,7 @@ class IndexSchema(BaseModel): "attrs": { "algorithm": "flat", "dims": 3, - "distance_metrics": "cosine", + "distance_metric": "cosine", "datatype": "float32" } }
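
Review note on the connection-check refactor earlier in this series (PATCH 15/18): the new `redis_required_modules` fallback can be reasoned about without a live Redis server. The block below is a minimal standalone sketch, not the library's code. `check_modules`, `DEFAULT_REQUIRED_MODULES`, and the module names and version numbers are hypothetical stand-ins, and the early `return` plus the final `ValueError` are assumed from the docstring, since the hunk shown in the patch ends at the version comparison.

```python
from typing import Any, Dict, List, Optional

# Hypothetical default for illustration only; the real REDIS_REQUIRED_MODULES
# constant is defined elsewhere in redisvl and is not shown in this series.
DEFAULT_REQUIRED_MODULES: List[Dict[str, Any]] = [{"name": "search", "ver": 20600}]


def check_modules(
    installed_modules: List[Dict[str, Any]],
    required_modules: Optional[List[Dict[str, Any]]] = None,
) -> None:
    """Raise ValueError unless one required module is installed at a sufficient version."""
    installed = {module["name"]: module for module in installed_modules}
    # Same fallback pattern as the patch: a non-empty caller list wins,
    # otherwise the default list is used.
    required = required_modules or DEFAULT_REQUIRED_MODULES

    for module in required:
        if module["name"] in installed:
            installed_version = installed[module["name"]]["ver"]
            if int(installed_version) >= int(module["ver"]):
                return  # assumed: the first satisfied requirement is enough

    raise ValueError("Required Redis modules are not installed.")


# Default requirement satisfied by a newer "search" module: no exception.
check_modules([{"name": "search", "ver": 20810}])

# A caller-supplied requirement replaces the default list.
check_modules(
    [{"name": "ReJSON", "ver": 20404}],
    required_modules=[{"name": "ReJSON", "ver": 20400}],
)
```

One consequence of the `or` fallback worth keeping in mind: an empty list is falsy, so passing `[]` behaves the same as passing `None` and the default requirements still apply.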