diff --git a/README.md b/README.md
index 8866f83d..d61a2e6f 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,22 @@
-# RedisVL: Python Client Library for Redis as a Vector Database
+# 🔥 Redis Vector Library
+
+**the AI-native Redis Python client**
+
+[Codecov](https://codecov.io/gh/RedisVentures/RedisVL)
+[License: MIT](https://opensource.org/licenses/MIT)
+[Code style: black](https://github.com/psf/black)
+[PyPI](https://pypi.org/project/redisvl/)
+
-
+# Introduction
-[](https://codecov.io/gh/RedisVentures/RedisVL)
-[](https://opensource.org/licenses/mit/)
-
-[](https://github.com/psf/black)
-
-
-[](https://pypi.org/project/redisvl/)
+The Python Redis Vector Library (RedisVL) is a tailor-made client for AI applications leveraging [Redis](https://redis.com).
-
+It's specifically designed for:
-RedisVL provides a powerful Python client library for using Redis as a Vector Database. Leverage the speed and reliability of Redis along with vector-based semantic search capabilities to supercharge your application!
+- Information retrieval & vector similarity search
+- Real-time RAG pipelines
+- Recommendation engines
-**Note**: This supported by Redis, Inc. on a good faith effort basis. To report bugs, request features, or receive assistance, please [file an issue](https://github.com/RedisVentures/redisvl/issues).
+Enhance your applications with Redis' **speed**, **flexibility**, and **reliability**, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search.
+# 🚀 Why RedisVL?
-------------
+The modern GenAI stack, including **vector databases** and **LLMs**, has become increasingly popular due to accelerated innovation & research in information retrieval, the ubiquity of tools & frameworks (e.g. [LangChain](https://github.com/langchain-ai/langchain), [LlamaIndex](https://www.llamaindex.ai/), [EmbedChain](https://github.com/embedchain/embedchain)), and the never-ending stream of business problems addressable by AI.
+However, organizations still struggle with delivering reliable solutions **quickly** (*time to value*) at **scale** (*beyond a demo*).
-## 🚀 What is RedisVL?
+[Redis](https://redis.io) has been a staple of the NoSQL world for over a decade, and boasts a number of flexible [data structures](https://redis.io/docs/data-types/) and [processing engines](https://redis.io/docs/interact/) to handle real-time application workloads like caching, session management, and search. Most notably, Redis has been used as a vector database for RAG, as an LLM cache, and as a chat session memory store for conversational AI applications.
-Vector databases have become increasingly popular in recent years due to their ability to store and retrieve vectors efficiently. However, most vector databases are complex to use and require a lot of time and effort to set up. RedisVL aims to solve this problem by providing a simple and intuitive interface for using Redis as a vector database.
+The vector library **bridges the gap between** the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. Built on the back of the popular Python client, [`redis-py`](https://github.com/redis/redis-py/tree/master), it abstracts the features of Redis into a grammar that is more aligned with the needs of today's AI/ML engineers and data scientists.
-RedisVL provides a client library that enables you to harness the power and flexibility of Redis as a vector database. This library simplifies the process of storing, retrieving, and performing complex semantic and hybrid searches over vectors in Redis. It also provides a robust index management system that allows you to create, update, and delete indices with ease.
+# 💪 Getting Started
+## Installation
-### Capabilities
+Install `redisvl` into your Python (>=3.8) environment using `pip`:
-RedisVL has a host of powerful features designed to streamline your vector database operations.
+```bash
+pip install redisvl
+```
+> For more instructions, visit the `redisvl` [installation guide](https://www.redisvl.com/overview/installation.html).
-1. **Index Management**: RedisVL allows for indices to be created, updated, and deleted with ease. A schema for each index can be defined in yaml or directly in python code and used throughout the lifetime of the index.
- - [Getting Started with SearchIndex](https://www.redisvl.com/user_guide/getting_started_01.html)
- - [``rvl`` Command Line Interface](https://www.redisvl.com/user_guide/cli.html)
+## Setting up Redis
-2. **Embedding Creation**: RedisVLs [Vectorizers](https://www.redisvl.com/user_guide/vectorizers_04.html) integrate with common embedding model services to simplify the process of vectorizing unstructured data.
- - [OpenAI](https://www.redisvl.com/api/vectorizer.html#openaitextvectorizer)
- - [HuggingFace](https://www.redisvl.com/api/vectorizer.html#hftextvectorizer)
- - [GCP VertexAI](https://www.redisvl.com/api/vectorizer.html#vertexaitextvectorizer)
+Choose from multiple Redis deployment options:
-3. **Vector Search**: RedisVL provides robust search capabilities that enable you quickly define complex search queries with flexible abstractions.
- - [VectorQuery](https://www.redisvl.com/api/query.html#vectorquery) - Flexible vector queries with filters
- - [RangeQuery](https://www.redisvl.com/api/query.html#rangequery) - Vector search within a defined range
- - [CountQuery](https://www.redisvl.com/api/query.html#countquery) - Count the number of records given attributes
- - [FilterQuery](https://www.redisvl.com/api/query.html#filterquery) - Filter records given attributes
-3. **[Hybrid (Filtered) queries](https://www.redisvl.com/user_guide/hybrid_queries_02.html)** that utilize tag, geographic, numeric, and other filters like full-text search are also supported.
+1. [Redis Cloud](https://redis.com/try-free): Managed cloud database (free tier available)
+2. [Redis Stack](https://redis.io/docs/getting-started/install-stack/docker/): Docker image for development
+ ```bash
+ docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
+ ```
+3. [Redis Enterprise](https://redis.com/redis-enterprise/): Commercial, self-hosted database
-4. **Semantic Caching**: [`LLMCache`](https://www.redisvl.com/user_guide/llmcache_03.html) is a semantic caching interface built directly into RedisVL. Semantic caching is a popular technique to increase the QPS and reduce the cost of using LLM models in production.
+> Enhance your experience and observability with the free [Redis Insight GUI](https://redis.com/redis-enterprise/redis-insight/).
-5. [**JSON Storage**](https://www.redisvl.com/user_guide/hash_vs_json_05.html): RedisVL supports storing JSON objects, including vectors, in Redis.
-## Installation
+## What's included?
+
+
+### 🗃️ Redis Index Management
+1. [Design an `IndexSchema`](https://www.redisvl.com/user_guide/getting_started_01.html#define-an-indexschema) that models your dataset with built-in Redis [data structures](https://www.redisvl.com/user_guide/hash_vs_json_05.html) (*Hash or JSON*) and indexable fields (*e.g. text, tags, numerics, geo, and vectors*).
+
+ [Load a schema](https://www.redisvl.com/user_guide/getting_started_01.html#example-schema-creation) from a [YAML file](schemas/schema.yaml):
+ ```yaml
+ version: '0.1.0'
+
+ index:
+ name: user-index-v1
+ prefix: user
+ storage_type: json
+
+ fields:
+ - name: user
+ type: tag
+ - name: credit_score
+ type: tag
+ - name: embedding
+ type: vector
+ attrs:
+ algorithm: flat
+ dims: 3
+ distance_metric: cosine
+ datatype: float32
+ ```
+ ```python
+ from redisvl.schema import IndexSchema
+
+ schema = IndexSchema.from_yaml("schemas/schema.yaml")
+ ```
+ Or load directly from a Python dictionary:
+ ```python
+ schema = IndexSchema.from_dict({
+ "index": {
+ "name": "user-index-v1",
+ "prefix": "user",
+ "storage_type": "json"
+ },
+ "fields": [
+ {"name": "user", "type": "tag"},
+ {"name": "credit_score", "type": "tag"},
+ {
+ "name": "embedding",
+ "type": "vector",
+ "attrs": {
+ "algorithm": "flat",
+ "datatype": "float32",
+ "dims": 4,
+ "distance_metric": "cosine"
+ }
+ }
+ ]
+ })
+ ```
+
+2. [Create a SearchIndex](https://www.redisvl.com/user_guide/getting_started_01.html#create-a-searchindex) class with an input schema and client connection in order to perform admin and search operations on your index in Redis:
+ ```python
+ from redis import Redis
+ from redisvl.index import SearchIndex
+
+ # Establish Redis connection and define index
+ client = Redis.from_url("redis://localhost:6379")
+ index = SearchIndex(schema, client)
+
+ # Create the index in Redis
+ index.create()
+ ```
+    > An async-compatible search index class is also available: `AsyncSearchIndex`
+
+3. [Load](https://www.redisvl.com/user_guide/getting_started_01.html#load-data-to-searchindex)
+and [fetch](https://www.redisvl.com/user_guide/getting_started_01.html#fetch-an-object-from-redis) data to/from your Redis instance:
+ ```python
+ data = {"user": "john", "credit_score": "high", "embedding": [0.23, 0.49, -0.18, 0.95]}
-Install `redisvl` using `pip`:
+ # load list of dictionaries, specify the "id" field
+ index.load([data], id_field="user")
+
+ # fetch by "id"
+ john = index.fetch("john")
+ ```
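+
+    You can also set an expiration (TTL) policy on loaded records. A minimal sketch, where the 360-second value is illustrative:
+    ```python
+    # load data with a 360 second TTL policy on each key
+    keys = index.load([data], id_field="user", ttl=360)
+    ```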
+
+### 🔍 Realtime Search
+
+Define queries and perform advanced searches over your indices, combining vectors, metadata filters, and more.
+
+- [VectorQuery](https://www.redisvl.com/api/query.html#vectorquery) - Flexible vector queries with customizable filters enabling semantic search:
+
+ ```python
+ from redisvl.query import VectorQuery
+
+ query = VectorQuery(
+ vector=[0.16, -0.34, 0.98, 0.23],
+ vector_field_name="embedding",
+ num_results=3
+ )
+ # run the vector search query against the embedding field
+ results = index.query(query)
+ ```
+
+ Incorporate complex metadata filters on your queries:
+ ```python
+ from redisvl.query.filter import Tag
+
+ # define a tag match filter
+ tag_filter = Tag("user") == "john"
+
+ # update query definition
+ query.set_filter(tag_filter)
+
+ # execute query
+ results = index.query(query)
+ ```
+
+- [RangeQuery](https://www.redisvl.com/api/query.html#rangequery) - Vector search within a defined range paired with customizable filters
+- [FilterQuery](https://www.redisvl.com/api/query.html#filterquery) - Standard search using filters and full-text search
+- [CountQuery](https://www.redisvl.com/api/query.html#countquery) - Count the number of indexed records given attributes
+
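+Below is a minimal sketch combining these query types with metadata filters. It reuses the `index` and example values from above; variable names are illustrative:
+
+```python
+from redisvl.query import RangeQuery, FilterQuery, CountQuery
+from redisvl.query.filter import Tag
+
+# vector search constrained by a semantic distance threshold
+range_query = RangeQuery(
+    vector=[0.16, -0.34, 0.98, 0.23],
+    vector_field_name="embedding",
+    distance_threshold=0.2,
+    num_results=3
+)
+
+# standard filtered search over metadata fields
+high_credit = Tag("credit_score") == "high"
+filter_query = FilterQuery(
+    return_fields=["user", "credit_score"],
+    filter_expression=high_credit
+)
+
+# count the records matching a filter
+count_query = CountQuery(filter_expression=high_credit)
+
+# run any of these against the index just like a VectorQuery
+results = index.query(range_query)
+```
+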
+> Read more about building advanced Redis queries [here](https://www.redisvl.com/user_guide/hybrid_queries_02.html).
+
+
+### 🖥️ Command Line Interface
+Create, destroy, and manage Redis index configurations from a purpose-built CLI interface: `rvl`.
```bash
-pip install redisvl
+$ rvl -h
+
+usage: rvl <command> [<args>]
+
+Commands:
+ index Index manipulation (create, delete, etc.)
+ version Obtain the version of RedisVL
+ stats Obtain statistics about an index
```
-For more instructions, see the [installation guide](https://www.redisvl.com/overview/installation.html).
+> Read more about using the `redisvl` CLI [here](https://www.redisvl.com/user_guide/cli.html).
-## Getting Started
+### ⚡ Community Integrations
+Integrate with popular embedding models and providers to greatly simplify the process of vectorizing unstructured data for your index and queries:
+- [Cohere](https://www.redisvl.com/api/vectorizer.html#coheretextvectorizer)
+- [OpenAI](https://www.redisvl.com/api/vectorizer.html#openaitextvectorizer)
+- [HuggingFace](https://www.redisvl.com/api/vectorizer.html#hftextvectorizer)
+- [GCP VertexAI](https://www.redisvl.com/api/vectorizer.html#vertexaitextvectorizer)
-To get started with RedisVL, check out the
+```python
+from redisvl.utils.vectorize import CohereTextVectorizer
+
+# set COHERE_API_KEY in your environment
+co = CohereTextVectorizer()
+
+embedding = co.embed(
+ text="What is the capital city of France?",
+ input_type="search_query"
+)
+
+embeddings = co.embed_many(
+    texts=["my document chunk content", "my other document chunk content"],
+    input_type="search_document"
+)
+```
+
+> Learn more about using `redisvl` Vectorizers in your workflows [here](https://www.redisvl.com/user_guide/vectorizers_04.html).
+
+### 💫 Beyond Vector Search
+In order to perform well in production, modern GenAI applications require much more than vector search for retrieval. `redisvl` provides some common extensions that
+aim to improve applications working with LLMs:
+
+- **LLM Semantic Caching** is designed to increase application throughput and reduce the cost of using LLMs in production by leveraging previously generated knowledge.
+
+ ```python
+ from redisvl.extensions.llmcache import SemanticCache
+
+    # init cache with TTL (expiration) policy and semantic distance threshold
+ llmcache = SemanticCache(
+ name="llmcache",
+ ttl=360,
+ redis_url="redis://localhost:6379"
+ )
+ llmcache.set_threshold(0.2) # can be changed on-demand
+
+ # store user queries and LLM responses in the semantic cache
+ llmcache.store(
+ prompt="What is the capital city of France?",
+ response="Paris",
+ metadata={}
+ )
+
+    # quickly check the cache with a slightly different prompt (before invoking an LLM)
+ response = llmcache.check(prompt="What is France's capital city?")
+ print(response[0]["response"])
+ ```
+ ```stdout
+ >>> "Paris"
+ ```
+
+ > Learn more about Semantic Caching [here](https://www.redisvl.com/user_guide/llmcache_03.html).
+
+- **LLM Session Management (COMING SOON)** aims to improve personalization and accuracy of the LLM application by providing user chat session information and conversational memory.
+- **LLM Contextual Access Control (COMING SOON)** aims to address security concerns by preventing malicious, irrelevant, or problematic user input from reaching LLMs and infrastructure.
+
+
+## Helpful Links
+
+To get started, check out the following guides:
- [Getting Started Guide](https://www.redisvl.com/user_guide/getting_started_01.html)
- [API Reference](https://www.redisvl.com/api/index.html)
- [Example Gallery](https://www.redisvl.com/examples/index.html)
+ - [Official Redis Vector Search Docs](https://redis.io/docs/interact/search-and-query/advanced-concepts/vectors/)
+
+## 🫱🏼🫲🏽 Contributing
-## Contributing
+Please help us by contributing PRs, opening GitHub issues for bugs or new feature ideas, improving documentation, or increasing test coverage. [Read more about how to contribute!](CONTRIBUTING.md)
-Please help us by contributing PRs or opening GitHub issues for desired behaviors or discovered bugs. [Read more about how to contribute to RedisVL!](CONTRIBUTING.md)
+## 🚧 Maintenance
+This project is supported by [Redis, Inc](https://redis.com) on a good faith effort basis. To report bugs, request features, or receive assistance, please [file an issue](https://github.com/RedisVentures/redisvl/issues).
diff --git a/docs/api/cache.rst b/docs/api/cache.rst
index 024cff28..7e34ee1f 100644
--- a/docs/api/cache.rst
+++ b/docs/api/cache.rst
@@ -8,20 +8,7 @@ SemanticCache
.. _semantic_cache_api:
-.. currentmodule:: redisvl.llmcache.semantic
-
-.. autosummary::
-
- SemanticCache.__init__
- SemanticCache.check
- SemanticCache.store
- SemanticCache.clear
- SemanticCache.delete
- SemanticCache.distance_threshold
- SemanticCache.set_threshold
- SemanticCache.ttl
- SemanticCache.set_ttl
-
+.. currentmodule:: redisvl.extensions.llmcache
.. autoclass:: SemanticCache
:show-inheritance:
diff --git a/docs/api/filter.rst b/docs/api/filter.rst
index ddc00092..bcd11ab3 100644
--- a/docs/api/filter.rst
+++ b/docs/api/filter.rst
@@ -16,20 +16,10 @@ Tag
.. currentmodule:: redisvl.query.filter
-.. autosummary::
-
- Tag.__init__
- Tag.__eq__
- Tag.__ne__
- Tag.__str__
-
-
.. autoclass:: Tag
- :show-inheritance:
:members:
:special-members:
- :inherited-members:
-
+ :exclude-members: __hash__
Text
@@ -38,19 +28,11 @@ Text
.. currentmodule:: redisvl.query.filter
-.. autosummary::
-
- Text.__init__
- Text.__eq__
- Text.__ne__
- Text.__mod__
- Text.__str__
-
.. autoclass:: Text
- :show-inheritance:
:members:
:special-members:
+ :exclude-members: __hash__
Num
@@ -59,22 +41,11 @@ Num
.. currentmodule:: redisvl.query.filter
-.. autosummary::
-
- Num.__init__
- Num.__eq__
- Num.__ne__
- Num.__lt__
- Num.__le__
- Num.__gt__
- Num.__ge__
- Num.__str__
-
.. autoclass:: Num
- :show-inheritance:
:members:
:special-members:
+ :exclude-members: __hash__
Geo
@@ -82,17 +53,10 @@ Geo
.. currentmodule:: redisvl.query.filter
-.. autosummary::
-
- Geo.__init__
- Geo.__eq__
- Geo.__ne__
- Geo.__str__
-
.. autoclass:: Geo
- :show-inheritance:
:members:
:special-members:
+ :exclude-members: __hash__
GeoRadius
@@ -100,11 +64,7 @@ GeoRadius
.. currentmodule:: redisvl.query.filter
-.. autosummary::
-
- GeoRadius.__init__
-
.. autoclass:: GeoRadius
- :show-inheritance:
:members:
- :special-members:
\ No newline at end of file
+ :special-members:
+ :exclude-members: __hash__
diff --git a/docs/api/query.rst b/docs/api/query.rst
index e3c9b98e..9146160a 100644
--- a/docs/api/query.rst
+++ b/docs/api/query.rst
@@ -10,17 +10,8 @@ VectorQuery
.. currentmodule:: redisvl.query
-.. autosummary::
-
- VectorQuery.__init__
- VectorQuery.set_filter
- VectorQuery.get_filter
- VectorQuery.query
- VectorQuery.params
-
.. autoclass:: VectorQuery
- :show-inheritance:
:members:
:inherited-members:
@@ -31,17 +22,8 @@ RangeQuery
.. currentmodule:: redisvl.query
-.. autosummary::
-
- RangeQuery.__init__
- RangeQuery.set_filter
- RangeQuery.get_filter
- RangeQuery.query
- RangeQuery.params
-
.. autoclass:: RangeQuery
- :show-inheritance:
:members:
:inherited-members:
@@ -52,17 +34,8 @@ FilterQuery
.. currentmodule:: redisvl.query
-.. autosummary::
-
- FilterQuery.__init__
- FilterQuery.set_filter
- FilterQuery.get_filter
- FilterQuery.query
- FilterQuery.params
-
.. autoclass:: FilterQuery
- :show-inheritance:
:members:
:inherited-members:
@@ -73,16 +46,7 @@ CountQuery
.. currentmodule:: redisvl.query
-.. autosummary::
-
- CountQuery.__init__
- CountQuery.set_filter
- CountQuery.get_filter
- CountQuery.query
- CountQuery.params
-
.. autoclass:: CountQuery
- :show-inheritance:
:members:
:inherited-members:
diff --git a/docs/api/schema.rst b/docs/api/schema.rst
index 78fd948b..ebe4ca8a 100644
--- a/docs/api/schema.rst
+++ b/docs/api/schema.rst
@@ -1,50 +1,102 @@
-
***********
Schema
***********
+Schema in RedisVL provides a structured format to define index settings and
+field configurations using the following three components:
+
+.. list-table::
+ :widths: 20 80
+ :header-rows: 1
+
+ * - Component
+ - Description
+ * - `version`
+ - The version of the schema spec. Current supported version is `0.1.0`.
+ * - `index`
+ - Index specific settings like name, key prefix, key separator, and storage type.
+ * - `fields`
+ - Subset of fields within your data to include in the index and any custom settings.
+
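+A minimal sketch of a schema dictionary combining these three components (the index name, prefix, and field are illustrative):
+
+.. code-block:: python
+
+    {
+        "version": "0.1.0",
+        "index": {
+            "name": "docs-index",
+            "prefix": "docs",
+            "storage_type": "hash"
+        },
+        "fields": [
+            {"name": "doc-id", "type": "tag"}
+        ]
+    }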
+
IndexSchema
===========
-.. _searchindex_api:
+.. _indexschema_api:
.. currentmodule:: redisvl.schema
-.. autosummary::
-
- IndexSchema.index
- IndexSchema.fields
- IndexSchema.version
- IndexSchema.field_names
- IndexSchema.redis_fields
- IndexSchema.add_field
- IndexSchema.add_fields
- IndexSchema.remove_field
- IndexSchema.from_yaml
- IndexSchema.to_yaml
- IndexSchema.from_dict
- IndexSchema.to_dict
-
.. autoclass:: IndexSchema
- :show-inheritance:
- :inherited-members:
:members:
+ :exclude-members: generate_fields,validate_and_create_fields,redis_fields
-IndexInfo
-=========
+Defining Fields
+===============
-.. currentmodule:: redisvl.schema
+Fields in the schema can be defined in YAML format or as a Python dictionary, specifying a name, type, an optional path, and attributes for customization.
-.. autosummary::
+**YAML Example**:
- IndexInfo.name
- IndexInfo.prefix
- IndexInfo.key_separator
- IndexInfo.storage_type
+.. code-block:: yaml
+ - name: title
+ type: text
+ path: $.document.title
+ attrs:
+ weight: 1.0
+ no_stem: false
+ withsuffixtrie: true
-.. autoclass:: IndexInfo
- :show-inheritance:
- :inherited-members:
- :members:
+**Python Dictionary Example**:
+
+.. code-block:: python
+
+    {
+        "name": "location",
+        "type": "geo",
+        "attrs": {
+            "sortable": True
+        }
+    }
+
+Supported Field Types and Attributes
+====================================
+
+Each field type supports specific attributes that customize its behavior. Below are the field types and their available attributes:
+
+**Text Field Attributes**:
+
+- `weight`: Importance of the field in result calculation.
+- `no_stem`: Disables stemming during indexing.
+- `withsuffixtrie`: Optimizes queries by maintaining a suffix trie.
+- `phonetic_matcher`: Enables phonetic matching.
+- `sortable`: Allows sorting on this field.
+
+**Tag Field Attributes**:
+
+- `separator`: Character for splitting text into individual tags.
+- `case_sensitive`: Case sensitivity in tag matching.
+- `withsuffixtrie`: Suffix trie optimization for queries.
+- `sortable`: Enables sorting based on the tag field.
+
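+A minimal sketch of a tag field using these attributes (the field name and values are illustrative):
+
+.. code-block:: python
+
+    {
+        "name": "categories",
+        "type": "tag",
+        "attrs": {
+            "separator": "|",
+            "case_sensitive": False,
+            "sortable": True
+        }
+    }
+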
+**Numeric and Geo Field Attributes**:
+
+- Both numeric and geo fields support the `sortable` attribute, enabling sorting on these fields.
+
+**Common Vector Field Attributes**:
+
+- `dims`: Dimensionality of the vector.
+- `algorithm`: Indexing algorithm (`flat` or `hnsw`).
+- `datatype`: Float datatype of the vector (`float32` or `float64`).
+- `distance_metric`: Metric for measuring query relevance (`COSINE`, `L2`, `IP`).
+
+**HNSW Vector Field Specific Attributes**:
+
+- `m`: Max outgoing edges per node in each layer.
+- `ef_construction`: Max edge candidates during build time.
+- `ef_runtime`: Max top candidates during search.
+- `epsilon`: Range search boundary factor.
+
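+A minimal sketch of an HNSW vector field combining the common and HNSW-specific attributes (the dimension and parameter values are illustrative):
+
+.. code-block:: python
+
+    {
+        "name": "embedding",
+        "type": "vector",
+        "attrs": {
+            "algorithm": "hnsw",
+            "dims": 384,
+            "distance_metric": "cosine",
+            "datatype": "float32",
+            "m": 16,
+            "ef_construction": 200,
+            "ef_runtime": 10,
+            "epsilon": 0.01
+        }
+    }
+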
+Note:
+ See fully documented Redis-supported fields and options here: https://redis.io/commands/ft.create/
\ No newline at end of file
diff --git a/docs/api/searchindex.rst b/docs/api/searchindex.rst
index 11a5f930..0b9ce2fb 100644
--- a/docs/api/searchindex.rst
+++ b/docs/api/searchindex.rst
@@ -21,7 +21,6 @@ SearchIndex
.. currentmodule:: redisvl.index
.. autoclass:: SearchIndex
- :show-inheritance:
:inherited-members:
:members:
@@ -33,6 +32,5 @@ AsyncSearchIndex
.. currentmodule:: redisvl.index
.. autoclass:: AsyncSearchIndex
- :show-inheritance:
:inherited-members:
:members:
diff --git a/docs/api/vectorizer.rst b/docs/api/vectorizer.rst
index 9b21d432..61dd432c 100644
--- a/docs/api/vectorizer.rst
+++ b/docs/api/vectorizer.rst
@@ -1,24 +1,17 @@
-**********
-Vectorizer
-**********
+***********
+Vectorizers
+***********
HFTextVectorizer
================
.. _hftextvectorizer_api:
-.. currentmodule:: redisvl.vectorize.text.huggingface
-
-.. autosummary::
-
- HFTextVectorizer.__init__
- HFTextVectorizer.embed
- HFTextVectorizer.embed_many
+.. currentmodule:: redisvl.utils.vectorize.text.huggingface
.. autoclass:: HFTextVectorizer
:show-inheritance:
- :inherited-members:
:members:
@@ -27,19 +20,10 @@ OpenAITextVectorizer
.. _openaitextvectorizer_api:
-.. currentmodule:: redisvl.vectorize.text.openai
-
-.. autosummary::
-
- OpenAITextVectorizer.__init__
- OpenAITextVectorizer.embed
- OpenAITextVectorizer.embed_many
- OpenAITextVectorizer.aembed
- OpenAITextVectorizer.aembed_many
+.. currentmodule:: redisvl.utils.vectorize.text.openai
.. autoclass:: OpenAITextVectorizer
:show-inheritance:
- :inherited-members:
:members:
@@ -48,17 +32,10 @@ VertexAITextVectorizer
.. _vertexaitextvectorizer_api:
-.. currentmodule:: redisvl.vectorize.text.vertexai
-
-.. autosummary::
-
- VertexAITextVectorizer.__init__
- VertexAITextVectorizer.embed
- VertexAITextVectorizer.embed_many
+.. currentmodule:: redisvl.utils.vectorize.text.vertexai
.. autoclass:: VertexAITextVectorizer
:show-inheritance:
- :inherited-members:
:members:
@@ -67,16 +44,9 @@ CohereTextVectorizer
.. _coheretextvectorizer_api:
-.. currentmodule:: redisvl.vectorize.text.cohere
-
-.. autosummary::
-
- CohereTextVectorizer.__init__
- CohereTextVectorizer.embed
- CohereTextVectorizer.embed_many
+.. currentmodule:: redisvl.utils.vectorize.text.cohere
.. autoclass:: CohereTextVectorizer
:show-inheritance:
- :inherited-members:
:members:
diff --git a/docs/user_guide/getting_started_01.ipynb b/docs/user_guide/getting_started_01.ipynb
index 6a25811e..45596510 100644
--- a/docs/user_guide/getting_started_01.ipynb
+++ b/docs/user_guide/getting_started_01.ipynb
@@ -215,7 +215,7 @@
{
"data": {
"text/plain": [
- ""
+ ""
]
},
"execution_count": 4,
@@ -249,7 +249,7 @@
{
"data": {
"text/plain": [
- ""
+ ""
]
},
"execution_count": 5,
@@ -304,8 +304,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "\u001b[32m16:13:33\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n",
- "\u001b[32m16:13:33\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. user_simple\n"
+ "\u001b[32m09:54:16\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m Indices:\n",
+ "\u001b[32m09:54:16\u001b[0m \u001b[34m[RedisVL]\u001b[0m \u001b[1;30mINFO\u001b[0m 1. user_simple\n"
]
}
],
@@ -331,15 +331,15 @@
"│ user_simple │ HASH │ ['user_simple_docs'] │ [] │ 0 │\n",
"╰──────────────┴────────────────┴──────────────────────┴─────────────────┴────────────╯\n",
"Index Fields:\n",
- "╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮\n",
- "│ Name │ Attribute │ Type │ Field Option │ Option Value │\n",
- "├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤\n",
- "│ user │ user │ TAG │ SEPARATOR │ , │\n",
- "│ credit_score │ credit_score │ TAG │ SEPARATOR │ , │\n",
- "│ job │ job │ TEXT │ WEIGHT │ 1 │\n",
- "│ age │ age │ NUMERIC │ │ │\n",
- "│ user_embedding │ user_embedding │ VECTOR │ │ │\n",
- "╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯\n"
+ "╭────────────────┬────────────────┬─────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮\n",
+ "│ Name │ Attribute │ Type │ Field Option │ Option Value │ Field Option │ Option Value │ Field Option │ Option Value │ Field Option │ Option Value │\n",
+ "├────────────────┼────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼────────────────┤\n",
+ "│ user │ user │ TAG │ SEPARATOR │ , │ │ │ │ │ │ │\n",
+ "│ credit_score │ credit_score │ TAG │ SEPARATOR │ , │ │ │ │ │ │ │\n",
+ "│ job │ job │ TEXT │ WEIGHT │ 1 │ │ │ │ │ │ │\n",
+ "│ age │ age │ NUMERIC │ │ │ │ │ │ │ │ │\n",
+ "│ user_embedding │ user_embedding │ VECTOR │ algorithm │ FLAT │ data_type │ FLOAT32 │ dim │ 3 │ distance_metric │ COSINE │\n",
+ "╰────────────────┴────────────────┴─────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴─────────────────┴────────────────╯\n"
]
}
],
@@ -365,7 +365,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "['user_simple_docs:297be8ec3c6444a4b73c10e77daadb4a', 'user_simple_docs:ac0cc4c7ee4d4cd18e9002dbaf1b5cbc', 'user_simple_docs:6c746e3f02d94d9087e0d207cfed5701']\n"
+ "['user_simple_docs:99c6166a36744c3c998eccccd9fcfdbd', 'user_simple_docs:55ff82cbcc054ed6b91132f15fcec786', 'user_simple_docs:e8a36c9e75294c7697dabea0ebf17cd9']\n"
]
}
],
@@ -382,49 +382,6 @@
">By default, `load` will create a unique Redis \"key\" as a combination of the index key `prefix` and a UUID. You can also customize the key by providing direct keys or pointing to a specified `id_field` on load."
]
},
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Fetch an object from Redis\n",
- "\n",
- "Fetch one of the previously written objects:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Fetching data for user 297be8ec3c6444a4b73c10e77daadb4a\n"
- ]
- },
- {
- "data": {
- "text/plain": [
- "{'user': 'john',\n",
- " 'age': '1',\n",
- " 'job': 'engineer',\n",
- " 'credit_score': 'high',\n",
- " 'user_embedding': b'\\xcd\\xcc\\xcc=\\xcd\\xcc\\xcc=\\x00\\x00\\x00?'}"
- ]
- },
- "execution_count": 10,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "_id = keys[0].strip(f\"{index.prefix}:\") # strip the key prefix\n",
- "\n",
- "print(f\"Fetching data for user {_id}\")\n",
- "index.fetch(id=_id)"
- ]
- },
{
"cell_type": "markdown",
"metadata": {},
@@ -442,7 +399,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "['user_simple_docs:714e5ec6d4a946c082fe006d311e8d49']\n"
+ "['user_simple_docs:f53ce588049a4636b5ecd8b0a81ac8ff']\n"
]
}
],
@@ -524,17 +481,40 @@
"source": [
"## Using an Asynchronous Redis Client\n",
"\n",
- "The `SearchIndex` class allows for queries, index creation, and data loading to be done asynchronously. This is the\n",
- "recommended route for working with `redisvl` in production-like settings.\n",
- "\n",
- "In order to enable it, you must either pass the `use_async` flag to the index\n",
- "initializer, or provide an existing async redis client connection."
+ "The `AsyncSearchIndex` class along with an async Redis python client allows for queries, index creation, and data loading to be done asynchronously. This is the\n",
+ "recommended route for working with `redisvl` in production-like settings."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from redisvl.index import AsyncSearchIndex\n",
+ "from redis.asyncio import Redis\n",
+ "\n",
+ "client = Redis.from_url(\"redis://localhost:6379\")\n",
+ "\n",
+ "index = AsyncSearchIndex.from_dict(schema)\n",
+ "index.set_client(client)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
"outputs": [
{
"data": {
@@ -550,10 +530,6 @@
}
],
"source": [
- "from redisvl.index import AsyncSearchIndex\n",
- "\n",
- "index = AsyncSearchIndex.from_dict(schema, redis_url=\"redis://localhost:6379\")\n",
- "\n",
"# execute the vector query async\n",
"results = await index.query(query)\n",
"result_print(results)"
@@ -563,7 +539,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Update a `SearchIndex`\n",
+ "## Updating a schema\n",
"In some scenarios, it makes sense to update the index schema. With Redis and `redisvl`, this is easy because Redis can keep the underlying data in place while you change or make updates to the index configuration."
]
},
@@ -572,39 +548,8 @@
"metadata": {},
"source": [
"So for our scenario, let's imagine we want to reindex this data in 2 ways:\n",
- "- by using a `Tag` type for job field instead of `Text`\n",
- "- by using an `hnsw` index for the `Vector` field instead of `flat`"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "{'index': {'name': 'user_simple', 'prefix': 'user_simple_docs'},\n",
- " 'fields': [{'name': 'user', 'type': 'tag'},\n",
- " {'name': 'credit_score', 'type': 'tag'},\n",
- " {'name': 'job', 'type': 'text'},\n",
- " {'name': 'age', 'type': 'numeric'},\n",
- " {'name': 'user_embedding',\n",
- " 'type': 'vector',\n",
- " 'attrs': {'dims': 3,\n",
- " 'distance_metric': 'cosine',\n",
- " 'algorithm': 'flat',\n",
- " 'datatype': 'float32'}}]}"
- ]
- },
- "execution_count": 15,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# Inspect the previous schema\n",
- "schema"
+ "- by using a `Tag` type for `job` field instead of `Text`\n",
+ "- by using an `hnsw` vector index for the `user_embedding` field instead of a `flat` vector index"
]
},
{
@@ -613,11 +558,10 @@
"metadata": {},
"outputs": [],
"source": [
- "# We need to modify this schema to have what we want\n",
+ "# Modify this schema to have what we want\n",
"\n",
"index.schema.remove_field(\"job\")\n",
"index.schema.remove_field(\"user_embedding\")\n",
- "\n",
"index.schema.add_fields([\n",
" {\"name\": \"job\", \"type\": \"tag\"},\n",
" {\n",
@@ -642,7 +586,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "16:13:34 redisvl.index INFO Index already exists, overwriting.\n"
+ "09:54:18 redisvl.index.index INFO Index already exists, overwriting.\n"
]
}
],
@@ -659,7 +603,7 @@
{
"data": {
"text/html": [
- "| vector_distance | user | age | job | credit_score |
|---|
| 0 | mary | 2 | doctor | low |
| 0 | john | 1 | engineer | high |
| 0.0566299557686 | tyler | 9 | engineer | high |
"
+ "| vector_distance | user | age | job | credit_score |
|---|
| 0 | john | 1 | engineer | high |
| 0 | mary | 2 | doctor | low |
| 0.0566299557686 | tyler | 9 | engineer | high |
"
],
"text/plain": [
""
@@ -713,7 +657,7 @@
"│ offsets_per_term_avg │ 0 │\n",
"│ records_per_doc_avg │ 5 │\n",
"│ sortable_values_size_mb │ 0 │\n",
- "│ total_indexing_time │ 0.138 │\n",
+ "│ total_indexing_time │ 0.254 │\n",
"│ total_inverted_index_blocks │ 11 │\n",
"│ vector_index_sz_mb │ 0.0201416 │\n",
"╰─────────────────────────────┴─────────────╯\n"
diff --git a/redisvl/extensions/llmcache/semantic.py b/redisvl/extensions/llmcache/semantic.py
index c4e65426..10d356b9 100644
--- a/redisvl/extensions/llmcache/semantic.py
+++ b/redisvl/extensions/llmcache/semantic.py
@@ -310,7 +310,7 @@ def store(
key = cache.store(
prompt="What is the captial city of France?",
response="Paris",
- metadata={"city": "Paris", "country": "Fance"}
+ metadata={"city": "Paris", "country": "France"}
)
"""
# Vectorize prompt if necessary and create cache payload
diff --git a/redisvl/index/index.py b/redisvl/index/index.py
index be146e74..ff6b1359 100644
--- a/redisvl/index/index.py
+++ b/redisvl/index/index.py
@@ -207,7 +207,8 @@ def key_separator(self) -> str:
@property
def storage_type(self) -> StorageType:
- """The underlying storage type for the search index: hash or json."""
+ """The underlying storage type for the search index; either
+ hash or json."""
return self.schema.index.storage_type
@property
@@ -228,8 +229,8 @@ def from_yaml(cls, schema_path: str, **kwargs):
.. code-block:: python
from redisvl.index import SearchIndex
+
index = SearchIndex.from_yaml("schemas/schema.yaml")
- index.connect(redis_url="redis://localhost:6379")
"""
schema = IndexSchema.from_yaml(schema_path)
return cls(schema=schema, **kwargs)
@@ -249,6 +250,7 @@ def from_dict(cls, schema_dict: Dict[str, Any], **kwargs):
.. code-block:: python
from redisvl.index import SearchIndex
+
index = SearchIndex.from_dict({
"index": {
"name": "my-index",
@@ -259,7 +261,6 @@ def from_dict(cls, schema_dict: Dict[str, Any], **kwargs):
{"name": "doc-id", "type": "tag"}
]
})
- index.connect(redis_url="redis://localhost:6379")
"""
schema = IndexSchema.from_dict(schema_dict)
@@ -274,12 +275,12 @@ def set_client(self, client: Union[redis.Redis, aredis.Redis]):
raise NotImplementedError
def disconnect(self):
- """Reset the Redis connection."""
+ """Disconnect from the Redis database."""
self._redis_client = None
return self
def key(self, id: str) -> str:
- """Create a redis key as a combination of an index key prefix (optional)
+ """Construct a redis key as a combination of an index key prefix (optional)
and specified id.
The id is typically either a unique identifier, or
@@ -301,10 +302,11 @@ def key(self, id: str) -> str:
class SearchIndex(BaseSearchIndex):
- """A class for interacting with Redis as a vector database.
+ """A search index class for interacting with Redis as a vector database.
- This class is a wrapper around the redis-py client that provides
- purpose-built methods for interacting with Redis as a vector database.
+ The SearchIndex is instantiated with a reference to a Redis database and an
+ IndexSchema (YAML path or dictionary object) that describes the various
+ settings and field configurations.
.. code-block:: python
@@ -384,7 +386,7 @@ def set_client(self, client: redis.Redis):
return self
def create(self, overwrite: bool = False, drop: bool = False) -> None:
- """Create an index in Redis with the given schema and properties.
+ """Create an index in Redis with the current schema and properties.
Args:
overwrite (bool, optional): Whether to overwrite the index if it
@@ -460,8 +462,12 @@ def load(
preprocess: Optional[Callable] = None,
batch_size: Optional[int] = None,
) -> List[str]:
- """Load a batch of objects to Redis. Returns the list of keys loaded to
- Redis.
+ """Load objects to the Redis database. Returns the list of keys loaded
+ to Redis.
+
+ RedisVL automatically handles constructing the object keys, batching,
+ optional preprocessing steps, and setting optional expiration
+ (TTL policies) on keys.
Args:
data (Iterable[Any]): An iterable of objects to store.
@@ -487,7 +493,22 @@ def load(
.. code-block:: python
- keys = index.load([{"test": "foo"}, {"test": "bar"}])
+ data = [{"test": "foo"}, {"test": "bar"}]
+
+ # simple case
+ keys = index.load(data)
+
+ # set 360 second ttl policy on data
+ keys = index.load(data, ttl=360)
+
+ # load data with predefined keys
+ keys = index.load(data, keys=["rvl:foo", "rvl:bar"])
+
+ # load data with preprocessing step
+ def add_field(d):
+ d["new_field"] = 123
+ return d
+ keys = index.load(data, preprocess=add_field)
"""
try:
return self._storage.write(
@@ -559,50 +580,66 @@ def query(self, query: BaseQuery) -> List[Dict[str, Any]]:
.. code-block:: python
+ from redisvl.query import VectorQuery
+
+ query = VectorQuery(
+ vector=[0.16, -0.34, 0.98, 0.23],
+ vector_field_name="embedding",
+ num_results=3
+ )
+
results = index.query(query)
"""
return self._query(query)
- def query_batch(self, query: BaseQuery, batch_size: int = 30) -> Generator:
- """Execute a query on the index while batching results.
+ def paginate(self, query: BaseQuery, page_size: int = 30) -> Generator:
+ """Execute a given query against the index and return results in
+ paginated batches.
- This method takes a BaseQuery object directly, handles optional paging
- support, and post-processing of the search results.
+ This method accepts a RedisVL query instance, enabling pagination of
+ results which allows for subsequent processing over each batch with a
+ generator.
Args:
- query (BaseQuery): The query to run.
- batch_size (int): The size of batches to return on each iteration.
+ query (BaseQuery): The search query to be executed.
+ page_size (int, optional): The number of results to return in each
+ batch. Defaults to 30.
- Returns:
- List[Result]: A list of search results.
+ Yields:
+ A generator yielding batches of search results.
Raises:
- TypeError: If the batch size is not an integer
- ValueError: If the batch size is less than or equal to zero.
-
- .. code-block:: python
+ TypeError: If the page_size argument is not of type int.
+ ValueError: If the page_size argument is less than or equal to zero.
- for batch in index.query_batch(query, batch_size=10):
- # process batched results
+ Example:
+ # Iterate over paginated search results in batches of 10
+ for result_batch in index.paginate(query, page_size=10):
+ # Process each batch of results
pass
+ Note:
+ The page_size parameter controls the number of items each result
+ batch contains. Adjust this value based on performance
+ considerations and the expected volume of search results.
+
"""
- if not isinstance(batch_size, int):
- raise TypeError("batch_size must be an integer")
+ if not isinstance(page_size, int):
+ raise TypeError("page_size must be an integer")
- if batch_size <= 0:
- raise ValueError("batch_size must be greater than 0")
+ if page_size <= 0:
+ raise ValueError("page_size must be greater than 0")
- first = 0
+ offset = 0
while True:
- query.set_paging(first, batch_size)
- batch_results = self._query(query)
- if not batch_results:
+ query.set_paging(offset, page_size)
+ results = self._query(query)
+ if not results:
break
- yield batch_results
- # increment the pagination tracker
- first += batch_size
+ yield results
+ # Increment the offset for the next batch of pagination
+ offset += page_size
def listall(self) -> List[str]:
"""List all search indices in Redis database.
@@ -639,10 +676,12 @@ def info(self) -> Dict[str, Any]:
class AsyncSearchIndex(BaseSearchIndex):
- """A class for interacting with Redis as a vector database in async mode.
+ """A search index class for interacting with Redis as a vector database in
+    async mode.
- This class is a wrapper around the redis-py async client that provides
- purpose-built methods for interacting with Redis as a vector database.
+ The AsyncSearchIndex is instantiated with a reference to a Redis database
+ and an IndexSchema (YAML path or dictionary object) that describes the
+ various settings and field configurations.
.. code-block:: python
@@ -653,7 +692,7 @@ class AsyncSearchIndex(BaseSearchIndex):
index.connect(redis_url="redis://localhost:6379")
# create the index
- await index.create(overwrite=True)
+ await index.create(overwrite=True)
# data is an iterable of dictionaries
await index.load(data)
@@ -723,7 +762,7 @@ def set_client(self, client: aredis.Redis):
return self
async def create(self, overwrite: bool = False, drop: bool = False) -> None:
- """Asynchronously create an index in Redis with the given schema
+ """Asynchronously create an index in Redis with the current schema
and properties.
Args:
@@ -802,6 +841,10 @@ async def load(
"""Asynchronously load objects to Redis with concurrency control.
Returns the list of keys loaded to Redis.
+ RedisVL automatically handles constructing the object keys, batching,
+ optional preprocessing steps, and setting optional expiration
+ (TTL policies) on keys.
+
Args:
data (Iterable[Any]): An iterable of objects to store.
id_field (Optional[str], optional): Specified field used as the id
@@ -826,7 +869,22 @@ async def load(
.. code-block:: python
- keys = await index.aload([{"test": "foo"}, {"test": "bar"}])
+ data = [{"test": "foo"}, {"test": "bar"}]
+
+ # simple case
+ keys = await index.load(data)
+
+ # set 360 second ttl policy on data
+ keys = await index.load(data, ttl=360)
+
+ # load data with predefined keys
+ keys = await index.load(data, keys=["rvl:foo", "rvl:bar"])
+
+ # load data with preprocessing step
+ async def add_field(d):
+ d["new_field"] = 123
+ return d
+ keys = await index.load(data, preprocess=add_field)
"""
try:
@@ -897,50 +955,65 @@ async def query(self, query: BaseQuery) -> List[Dict[str, Any]]:
.. code-block:: python
- results = await aindex.query(query)
+ from redisvl.query import VectorQuery
+
+ query = VectorQuery(
+ vector=[0.16, -0.34, 0.98, 0.23],
+ vector_field_name="embedding",
+ num_results=3
+ )
+
+ results = await index.query(query)
"""
return await self._query(query)
- async def query_batch(
- self, query: BaseQuery, batch_size: int = 30
- ) -> AsyncGenerator:
- """Execute a query on the index with batching.
+ async def paginate(self, query: BaseQuery, page_size: int = 30) -> AsyncGenerator:
+ """Execute a given query against the index and return results in
+ paginated batches.
- This method takes a BaseQuery object directly, handles optional paging
- support, and post-processing of the search results.
+ This method accepts a RedisVL query instance, enabling async pagination
+ of results which allows for subsequent processing over each batch with a
+ generator.
Args:
- query (BaseQuery): The query to run.
- batch_size (int): The size of batches to return on each iteration.
+ query (BaseQuery): The search query to be executed.
+ page_size (int, optional): The number of results to return in each
+ batch. Defaults to 30.
- Returns:
- List[Result]: A list of search results.
+ Yields:
+ An async generator yielding batches of search results.
Raises:
- TypeError: If the batch size is not an integer
- ValueError: If the batch size is less than or equal to zero.
+ TypeError: If the page_size argument is not of type int.
+ ValueError: If the page_size argument is less than or equal to zero.
- .. code-block:: python
-
- async for batch in index.query_batch(query, batch_size=10):
- # process batched results
+ Example:
+ # Iterate over paginated search results in batches of 10
+ async for result_batch in index.paginate(query, page_size=10):
+ # Process each batch of results
pass
+
+ Note:
+ The page_size parameter controls the number of items each result
+ batch contains. Adjust this value based on performance
+ considerations and the expected volume of search results.
+
"""
- if not isinstance(batch_size, int):
- raise TypeError("batch_size must be an integer")
+ if not isinstance(page_size, int):
+ raise TypeError("page_size must be an integer")
- if batch_size <= 0:
- raise ValueError("batch_size must be greater than 0")
+ if page_size <= 0:
+ raise ValueError("page_size must be greater than 0")
first = 0
while True:
- query.set_paging(first, batch_size)
- batch_results = await self._query(query)
- if not batch_results:
+ query.set_paging(first, page_size)
+ results = await self._query(query)
+ if not results:
break
- yield batch_results
+ yield results
# increment the pagination tracker
- first += batch_size
+ first += page_size
async def listall(self) -> List[str]:
"""List all search indices in Redis database.
diff --git a/redisvl/query/filter.py b/redisvl/query/filter.py
index 7ddea504..4ef2c0e6 100644
--- a/redisvl/query/filter.py
+++ b/redisvl/query/filter.py
@@ -80,7 +80,7 @@ def wrapper(instance: Any, *args: List[Any], **kwargs: Dict[str, Any]) -> Any:
class Tag(FilterField):
- """A Tag is a FilterField representing a tag in a Redis index."""
+ """A Tag filter can be applied to Tag fields"""
OPERATORS: Dict[FilterOperator, str] = {
FilterOperator.EQ: "==",
@@ -94,14 +94,6 @@ class Tag(FilterField):
}
SUPPORTED_VAL_TYPES = (list, set, tuple, str, type(None))
- def __init__(self, field: str):
- """Create a Tag FilterField.
-
- Args:
- field (str): The name of the tag field in the index to be queried against
- """
- super().__init__(field)
-
def _set_tag_value(
self, other: Union[List[str], Set[str], str], operator: FilterOperator
):
@@ -129,7 +121,8 @@ def __eq__(self, other: Union[List[str], str]) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Tag
- filter = Tag("brand") == "nike"
+
+ f = Tag("brand") == "nike"
"""
self._set_tag_value(other, FilterOperator.EQ)
return FilterExpression(str(self))
@@ -144,7 +137,7 @@ def __ne__(self, other) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Tag
- filter = Tag("brand") != "nike"
+ f = Tag("brand") != "nike"
"""
self._set_tag_value(other, FilterOperator.NE)
@@ -155,7 +148,7 @@ def _formatted_tag_value(self) -> str:
return "|".join([self.escaper.escape(tag) for tag in self._value])
def __str__(self) -> str:
- """Return the Redis Query syntax for a Tag filter expression."""
+ """Return the Redis Query string for the Tag filter"""
if not self._value:
return "*"
@@ -221,15 +214,16 @@ class Geo(FilterField):
@check_operator_misuse
def __eq__(self, other) -> "FilterExpression":
- """Create a Geographic equality filter expression.
+ """Create a geographic filter within a specified GeoRadius.
Args:
- other (GeoSpec): The geographic spec to filter on.
+ other (GeoRadius): The geographic spec to filter on.
.. code-block:: python
from redisvl.query.filter import Geo, GeoRadius
- filter = Geo("location") == GeoRadius(-122.4194, 37.7749, 1, unit="m")
+
+ f = Geo("location") == GeoRadius(-122.4194, 37.7749, 1, unit="m")
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.EQ) # type: ignore
@@ -237,22 +231,23 @@ def __eq__(self, other) -> "FilterExpression":
@check_operator_misuse
def __ne__(self, other) -> "FilterExpression":
- """Create a Geographic inequality filter expression.
+ """Create a geographic filter outside of a specified GeoRadius.
Args:
- other (GeoSpec): The geographic spec to filter on.
+ other (GeoRadius): The geographic spec to filter on.
.. code-block:: python
from redisvl.query.filter import Geo, GeoRadius
- filter = Geo("location") != GeoRadius(-122.4194, 37.7749, 1, unit="m")
+
+ f = Geo("location") != GeoRadius(-122.4194, 37.7749, 1, unit="m")
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.NE) # type: ignore
return FilterExpression(str(self))
def __str__(self) -> str:
- """Return the Redis Query syntax for a Geographic filter expression."""
+ """Return the Redis Query string for the Geo filter"""
if not self._value:
return "*"
@@ -292,7 +287,7 @@ def __eq__(self, other: int) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Num
- filter = Num("zipcode") == 90210
+ f = Num("zipcode") == 90210
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.EQ)
@@ -307,7 +302,8 @@ def __ne__(self, other: int) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Num
- filter = Num("zipcode") != 90210
+
+ f = Num("zipcode") != 90210
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.NE)
@@ -322,7 +318,8 @@ def __gt__(self, other: int) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Num
- filter = Num("age") > 18
+
+ f = Num("age") > 18
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.GT)
@@ -337,7 +334,8 @@ def __lt__(self, other: int) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Num
- filter = Num("age") < 18
+
+ f = Num("age") < 18
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.LT)
@@ -352,7 +350,8 @@ def __ge__(self, other: int) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Num
- filter = Num("age") >= 18
+
+ f = Num("age") >= 18
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.GE)
@@ -367,14 +366,15 @@ def __le__(self, other: int) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Num
- filter = Num("age") <= 18
+
+ f = Num("age") <= 18
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.LE)
return FilterExpression(str(self))
def __str__(self) -> str:
- """Return the Redis Query syntax for a Numeric filter expression."""
+ """Return the Redis Query string for the Numeric filter"""
if not self._value:
return "*"
@@ -414,7 +414,8 @@ def __eq__(self, other: str) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Text
- filter = Text("job") == "engineer"
+
+ f = Text("job") == "engineer"
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.EQ)
@@ -432,7 +433,8 @@ def __ne__(self, other: str) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Text
- filter = Text("job") != "engineer"
+
+ f = Text("job") != "engineer"
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.NE)
@@ -450,16 +452,18 @@ def __mod__(self, other: str) -> "FilterExpression":
.. code-block:: python
from redisvl.query.filter import Text
- filter = Text("job") % "engine*" # suffix wild card match
- filter = Text("job") % "%%engine%%" # fuzzy match w/ Levenshtein Distance
- filter = Text("job") % "engineer|doctor" # contains either term in field
- filter = Text("job") % "engineer doctor" # contains both terms in field
+
+ f = Text("job") % "engine*" # suffix wild card match
+ f = Text("job") % "%%engine%%" # fuzzy match w/ Levenshtein Distance
+ f = Text("job") % "engineer|doctor" # contains either term in field
+ f = Text("job") % "engineer doctor" # contains both terms in field
"""
self._set_value(other, self.SUPPORTED_VAL_TYPES, FilterOperator.LIKE)
return FilterExpression(str(self))
def __str__(self) -> str:
+ """Return the Redis Query string for the Text filter"""
if not self._value:
return "*"
@@ -470,37 +474,42 @@ def __str__(self) -> str:
class FilterExpression:
- """A FilterExpression is a logical expression of FilterFields.
+ """A FilterExpression is a logical combination of filters in RedisVL.
FilterExpressions can be combined using the & and | operators to create
- complex logical expressions that evaluate to the Redis Query language.
+ complex expressions that evaluate to the Redis Query language.
This presents an interface by which users can create complex queries
without having to know the Redis Query language.
- Filter expressions are not created directly. Instead they are built
- by combining FilterFields using the & and | operators.
-
.. code-block:: python
from redisvl.query.filter import Tag, Num
+
brand_is_nike = Tag("brand") == "nike"
price_is_over_100 = Num("price") < 100
- filter = brand_is_nike & price_is_over_100
- print(str(filter))
- (@brand:{nike} @price:[-inf (100)])
+ f = brand_is_nike & price_is_over_100
+
+ print(str(f))
+
+ >>> (@brand:{nike} @price:[-inf (100)])
This can be combined with the VectorQuery class to create a query:
.. code-block:: python
from redisvl.query import VectorQuery
+
v = VectorQuery(
- ... vector=[0.1, 0.1, 0.5, ...],
- ... vector_field_name="product_embedding",
- ... return_fields=["product_id", "brand", "price"],
- ... filter_expression=filter,
- ... )
+ vector=[0.1, 0.1, 0.5, ...],
+ vector_field_name="product_embedding",
+ return_fields=["product_id", "brand", "price"],
+ filter_expression=f,
+ )
+
+ Note:
+        Filter expressions are typically not constructed directly. Instead they are
+ built by combining filter statements using the & and | operators.
"""
diff --git a/redisvl/query/query.py b/redisvl/query/query.py
index 82111d04..a76a85b2 100644
--- a/redisvl/query/query.py
+++ b/redisvl/query/query.py
@@ -25,7 +25,7 @@ def __str__(self) -> str:
return " ".join([str(x) for x in self.query.get_args()])
def set_filter(self, filter_expression: Optional[FilterExpression] = None):
- """Set the filter for the query.
+ """Set the filter expression for the query.
Args:
filter_expression (Optional[FilterExpression], optional): The filter
@@ -45,7 +45,7 @@ def set_filter(self, filter_expression: Optional[FilterExpression] = None):
)
def get_filter(self) -> FilterExpression:
- """Get the filter for the query.
+ """Get the filter expression for the query.
Returns:
FilterExpression: The filter for the query.
@@ -53,21 +53,19 @@ def get_filter(self) -> FilterExpression:
return self._filter
def set_paging(self, first: int, limit: int):
- """Set the paging parameters for the query to limit the results between
- fist and num_results.
+ """Set the paging parameters for the query to limit the number of
+ results.
Args:
first (int): The zero-indexed offset for which to fetch query results
- limit (int): _description_
+            limit (int): The maximum number of results to return, starting at the offset
Raises:
- TypeError: _description_
- TypeError: _description_
+            TypeError: If first or limit is not an integer.
"""
- if not isinstance(first, int):
- raise TypeError("first must be of type int")
- if not isinstance(limit, int):
- raise TypeError("limit must be of type int")
+ if not isinstance(first, int) or not isinstance(limit, int):
+ raise TypeError("Paging params must both be integers")
+
self._first = first
self._limit = limit
@@ -87,7 +85,7 @@ def __init__(
dialect: int = 2,
params: Optional[Dict[str, Any]] = None,
):
- """Query for a simple count operation provided some filter expression.
+ """A query for a simple count operation provided some filter expression.
Args:
filter_expression (FilterExpression): The filter expression to query for.
@@ -100,9 +98,11 @@ def __init__(
from redisvl.query import CountQuery
from redisvl.query.filter import Tag
+
t = Tag("brand") == "Nike"
- q = CountQuery(filter_expression=t)
- count = index.query(q)
+ query = CountQuery(filter_expression=t)
+
+ count = index.query(query)
"""
super().__init__(num_results=0, dialect=dialect)
self.set_filter(filter_expression)
@@ -113,7 +113,7 @@ def query(self) -> Query:
"""The loaded Redis-Py query.
Returns:
- redis.commands.search.query.Query: The query object.
+ redis.commands.search.query.Query: The Redis-Py query object.
"""
base_query = str(self._filter)
query = Query(base_query).no_content().paging(0, 0).dialect(self._dialect)
@@ -138,7 +138,7 @@ def __init__(
dialect: int = 2,
params: Optional[Dict[str, Any]] = None,
):
- """Query for a filter expression.
+        """A query for running a filtered search with a filter expression.
Args:
filter_expression (FilterExpression): The filter expression to
@@ -157,6 +157,7 @@ def __init__(
from redisvl.query import FilterQuery
from redisvl.query.filter import Tag
+
t = Tag("brand") == "Nike"
q = FilterQuery(return_fields=["brand", "price"], filter_expression=t)
@@ -170,7 +171,7 @@ def query(self) -> Query:
"""Return a Redis-Py Query object representing the query.
Returns:
- redis.commands.search.query.Query: The query object.
+ redis.commands.search.query.Query: The Redis-Py query object.
"""
base_query = str(self._filter)
query = (
@@ -223,21 +224,31 @@ def __init__(
return_score: bool = True,
dialect: int = 2,
):
- """Query for vector fields.
-
- Read more: https://redis.io/docs/interact/search-and-query/search/vectors/#knn-search
+ """A query for running a vector search along with an optional filter
+ expression.
Args:
- vector (List[float]): The vector to query for.
- vector_field_name (str): The name of the vector field.
- return_fields (List[str]): The fields to return.
- filter_expression (FilterExpression, optional): A filter to apply to the query. Defaults to None.
- dtype (str, optional): The dtype of the vector. Defaults to "float32".
- num_results (Optional[int], optional): The number of results to return. Defaults to 10.
- return_score (bool, optional): Whether to return the score. Defaults to True.
+ vector (List[float]): The vector to perform the vector search with.
+ vector_field_name (str): The name of the vector field to search
+ against in the database.
+ return_fields (List[str]): The declared fields to return with search
+ results.
+ filter_expression (FilterExpression, optional): A filter to apply
+ along with the vector search. Defaults to None.
+ dtype (str, optional): The dtype of the vector. Defaults to
+ "float32".
+ num_results (int, optional): The top k results to return from the
+ vector search. Defaults to 10.
+ return_score (bool, optional): Whether to return the vector
+ distance. Defaults to True.
+ dialect (int, optional): The RediSearch query dialect.
+ Defaults to 2.
Raises:
TypeError: If filter_expression is not of type redisvl.query.FilterExpression
+
+ Note:
+ Learn more about vector queries in Redis: https://redis.io/docs/interact/search-and-query/search/vectors/#knn-search
"""
super().__init__(
vector,
@@ -255,7 +266,7 @@ def query(self) -> Query:
"""Return a Redis-Py Query object representing the query.
Returns:
- redis.commands.search.query.Query: The query object.
+ redis.commands.search.query.Query: The Redis-Py query object.
"""
base_query = f"{str(self._filter)}=>[KNN {self._num_results} @{self._field} ${self.VECTOR_PARAM} AS {self.DISTANCE_ID}]"
query = (
@@ -297,26 +308,35 @@ def __init__(
return_score: bool = True,
dialect: int = 2,
):
- """Vector query by distance range.
-
- Range queries are for filtering vector search results
- by the distance between a vector field value and a query
- vector, in terms of the index distance metric.
-
- Read more: https://redis.io/docs/interact/search-and-query/search/vectors/#range-query
+ """A query for running a filtered vector search based on semantic
+ distance threshold.
Args:
- vector (List[float]): The vector to query for.
- vector_field_name (str): The name of the vector field.
- return_fields (List[str]): The fields to return.
- filter_expression (FilterExpression, optional): A filter to apply to the query. Defaults to None.
- dtype (str, optional): The dtype of the vector. Defaults to "float32".
- distance_threshold (str, float): The threshold for vector distance. Defaults to 0.2.
- num_results (int): The MAX number of results to return. defaults to 10.
- return_score (bool, optional): Whether to return the score. Defaults to True.
+ vector (List[float]): The vector to perform the range query with.
+ vector_field_name (str): The name of the vector field to search
+ against in the database.
+ return_fields (List[str]): The declared fields to return with search
+ results.
+ filter_expression (FilterExpression, optional): A filter to apply
+ along with the range query. Defaults to None.
+ dtype (str, optional): The dtype of the vector. Defaults to
+ "float32".
+ distance_threshold (str, float): The threshold for vector distance.
+ A smaller threshold indicates a stricter semantic search.
+ Defaults to 0.2.
+ num_results (int): The maximum number of results to return.
+ Defaults to 10.
+ return_score (bool, optional): Whether to return the vector
+ distance. Defaults to True.
+ dialect (int, optional): The RediSearch query dialect.
+ Defaults to 2.
Raises:
TypeError: If filter_expression is not of type redisvl.query.FilterExpression
+
+ Note:
+ Learn more about vector range queries: https://redis.io/docs/interact/search-and-query/search/vectors/#range-query
+
"""
super().__init__(
vector,
@@ -354,7 +374,7 @@ def query(self) -> Query:
"""Return a Redis-Py Query object representing the query.
Returns:
- redis.commands.search.query.Query: The query object.
+ redis.commands.search.query.Query: The Redis-Py query object.
"""
base_query = f"@{self._field}:[VECTOR_RANGE ${self.DISTANCE_THRESHOLD_PARAM} ${self.VECTOR_PARAM}]"
diff --git a/redisvl/redis/connection.py b/redisvl/redis/connection.py
index c88efaf0..98d76482 100644
--- a/redisvl/redis/connection.py
+++ b/redisvl/redis/connection.py
@@ -1,5 +1,5 @@
import os
-from typing import Optional
+from typing import Any, Dict, List, Optional
from redis import ConnectionPool, Redis
from redis.asyncio import Redis as AsyncRedis
@@ -101,7 +101,9 @@ def get_async_redis_connection(url: Optional[str] = None, **kwargs) -> AsyncRedi
return AsyncRedis.from_url(get_address_from_env(), **kwargs)
@staticmethod
- def validate_redis_modules(client: Redis) -> None:
+ def validate_redis_modules(
+ client: Redis, redis_required_modules: Optional[List[Dict[str, Any]]] = None
+ ) -> None:
"""Validates if the required Redis modules are installed.
Args:
@@ -111,11 +113,14 @@ def validate_redis_modules(client: Redis) -> None:
ValueError: If required Redis modules are not installed.
"""
RedisConnectionFactory._validate_redis_modules(
- convert_bytes(client.module_list())
+ convert_bytes(client.module_list()), redis_required_modules
)
@staticmethod
- def validate_async_redis_modules(client: AsyncRedis) -> None:
+ def validate_async_redis_modules(
+ client: AsyncRedis,
+ redis_required_modules: Optional[List[Dict[str, Any]]] = None,
+ ) -> None:
"""
Validates if the required Redis modules are installed.
@@ -128,21 +133,28 @@ def validate_async_redis_modules(client: AsyncRedis) -> None:
temp_client = Redis(
connection_pool=ConnectionPool(**client.connection_pool.connection_kwargs)
)
- RedisConnectionFactory.validate_redis_modules(temp_client)
+ RedisConnectionFactory.validate_redis_modules(
+ temp_client, redis_required_modules
+ )
@staticmethod
- def _validate_redis_modules(installed_modules) -> None:
+ def _validate_redis_modules(
+ installed_modules, redis_required_modules: Optional[List[Dict[str, Any]]] = None
+ ) -> None:
"""
Validates if required Redis modules are installed.
Args:
installed_modules: List of installed modules.
+ redis_required_modules: List of required module names and versions; defaults to REDIS_REQUIRED_MODULES when not provided.
Raises:
ValueError: If required Redis modules are not installed.
"""
installed_modules = {module["name"]: module for module in installed_modules}
- for required_module in REDIS_REQUIRED_MODULES:
+ redis_required_modules = redis_required_modules or REDIS_REQUIRED_MODULES
+
+ for required_module in redis_required_modules:
if required_module["name"] in installed_modules:
installed_version = installed_modules[required_module["name"]]["ver"]
if int(installed_version) >= int(required_module["ver"]): # type: ignore
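
The new optional `redis_required_modules` argument makes the module check overridable per call. A minimal sketch, where the module name and version are illustrative rather than the library defaults:

```python
from redis import Redis

from redisvl.redis.connection import RedisConnectionFactory

client = Redis.from_url("redis://localhost:6379")

# Each entry mirrors the shape consumed by _validate_redis_modules:
# a module "name" and a minimum integer "ver". Values here are illustrative.
custom_modules = [{"name": "search", "ver": 20600}]

# Raises ValueError if a listed module is missing or older than required.
RedisConnectionFactory.validate_redis_modules(
    client, redis_required_modules=custom_modules
)
```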
diff --git a/redisvl/schema/schema.py b/redisvl/schema/schema.py
index 60d72bae..36c1ab80 100644
--- a/redisvl/schema/schema.py
+++ b/redisvl/schema/schema.py
@@ -28,11 +28,30 @@ class StorageType(Enum):
class IndexInfo(BaseModel):
- """
- Represents the basic configuration information for an index in Redis.
+ """Index info includes the essential details regarding index settings,
+ such as its name, prefix, key separator, and storage type in Redis.
+
+ In yaml format, the index info section looks like:
+
+ .. code-block:: yaml
+
+ index:
+ name: user-index
+ prefix: user
+ key_separator: ':'
+ storage_type: json
+
+ In dict format, the index info section looks like:
+
+ .. code-block:: python
+
+ {"index": {
+ "name": "user-index",
+ "prefix": "user",
+ "key_separator": ":",
+ "storage_type": "json"
+ }}
- This class includes the essential details required to define an index, such as
- its name, prefix, key separator, and storage type.
"""
name: str
@@ -54,12 +73,8 @@ def dict(self, *args, **kwargs) -> Dict[str, Any]:
class IndexSchema(BaseModel):
- """Represents a schema definition for a search index in Redis, primarily
- used in RedisVL for organizing and querying vector and metadata fields.
-
- This schema provides a structured format to define the layout and types of
- fields stored in Redis, including details such as storage type, field
- definitions, and key formatting conventions.
+ """A schema definition for a search index in Redis, used in RedisVL for
+ configuring index settings and organizing vector and metadata fields.
The class offers methods to create an index schema from a YAML file or a
Python dictionary, supporting flexible schema definitions and easy
@@ -74,6 +89,7 @@ class IndexSchema(BaseModel):
index:
name: user-index
prefix: user
+ key_separator: ":"
storage_type: json
fields:
@@ -89,20 +105,25 @@ class IndexSchema(BaseModel):
distance_metric: cosine
datatype: float32
- Loading the schema with RedisVL using yaml or dict format:
+ Loading the schema for RedisVL from yaml is as simple as:
.. code-block:: python
from redisvl.schema import IndexSchema
- # From YAML
schema = IndexSchema.from_yaml("schema.yaml")
- # From Dict
+ Loading the schema for RedisVL from dict is as simple as:
+
+ .. code-block:: python
+
+ from redisvl.schema import IndexSchema
+
schema = IndexSchema.from_dict({
"index": {
"name": "user-index",
"prefix": "user",
+ "key_separator": ":",
"storage_type": "json",
},
"fields": [
@@ -114,7 +135,7 @@ class IndexSchema(BaseModel):
"attrs": {
"algorithm": "flat",
"dims": 3,
- "distance_metrics": "cosine",
+ "distance_metric": "cosine",
"datatype": "float32"
}
}
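
As a quick end-to-end sketch, the corrected dict form of the schema can be handed straight to a `SearchIndex`; the index settings mirror the docstring example, while the field names are illustrative:

```python
from redisvl.index import SearchIndex
from redisvl.schema import IndexSchema

schema = IndexSchema.from_dict({
    "index": {
        "name": "user-index",
        "prefix": "user",
        "key_separator": ":",
        "storage_type": "json",
    },
    "fields": [
        {"name": "user", "type": "tag"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {
                "algorithm": "flat",
                "dims": 3,
                "distance_metric": "cosine",
                "datatype": "float32",
            },
        },
    ],
})

index = SearchIndex(schema=schema)  # attach a client and create() when ready
```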
diff --git a/tests/integration/test_llmcache.py b/tests/integration/test_llmcache.py
index de7495b0..d08cbaa8 100644
--- a/tests/integration/test_llmcache.py
+++ b/tests/integration/test_llmcache.py
@@ -1,11 +1,12 @@
+from collections import namedtuple
from time import sleep
import pytest
from redisvl.extensions.llmcache import SemanticCache
-from redisvl.utils.vectorize import HFTextVectorizer
from redisvl.index.index import SearchIndex
-from collections import namedtuple
+from redisvl.utils.vectorize import HFTextVectorizer
+
@pytest.fixture
def vectorizer():
@@ -19,11 +20,13 @@ def cache(vectorizer):
cache_instance.clear() # Clear cache after each test
cache_instance._index.delete(True) # Clean up index
+
@pytest.fixture
def cache_no_cleanup(vectorizer):
cache_instance = SemanticCache(vectorizer=vectorizer, distance_threshold=0.2)
yield cache_instance
+
@pytest.fixture
def cache_with_ttl(vectorizer):
cache_instance = SemanticCache(vectorizer=vectorizer, distance_threshold=0.2, ttl=2)
@@ -31,13 +34,17 @@ def cache_with_ttl(vectorizer):
cache_instance.clear() # Clear cache after each test
cache_instance._index.delete(True) # Clean up index
+
@pytest.fixture
def cache_with_redis_client(vectorizer, client):
- cache_instance = SemanticCache(vectorizer=vectorizer, redis_client=client, distance_threshold=0.2)
+ cache_instance = SemanticCache(
+ vectorizer=vectorizer, redis_client=client, distance_threshold=0.2
+ )
yield cache_instance
cache_instance.clear() # Clear cache after each test
cache_instance._index.delete(True) # Clean up index
+
# Test basic store and check functionality
def test_store_and_check(cache, vectorizer):
prompt = "This is a test prompt."
@@ -94,11 +101,13 @@ def test_check_invalid_input(cache):
with pytest.raises(TypeError):
cache.check(prompt="test", return_fields="bad value")
+
# Test handling of an invalid TTL value
def test_bad_ttl(cache):
with pytest.raises(ValueError):
cache.set_ttl(2.5)
+
# Test storing with metadata
def test_store_with_metadata(cache, vectorizer):
prompt = "This is another test prompt."
@@ -115,17 +124,21 @@ def test_store_with_metadata(cache, vectorizer):
assert check_result[0]["metadata"] == metadata
assert check_result[0]["prompt"] == prompt
+
# Test storing with invalid metadata
def test_store_with_invalid_metadata(cache, vectorizer):
prompt = "This is another test prompt."
response = "This is another test response."
- metadata = namedtuple('metadata', 'source')(**{'source': 'test'})
+ metadata = namedtuple("metadata", "source")(**{"source": "test"})
vector = vectorizer.embed(prompt)
- with pytest.raises(TypeError, match=r"If specified, cached metadata must be a dictionary."):
+ with pytest.raises(
+ TypeError, match=r"If specified, cached metadata must be a dictionary."
+ ):
cache.store(prompt, response, vector=vector, metadata=metadata)
+
# Test setting and getting the distance threshold
def test_distance_threshold(cache):
initial_threshold = cache.distance_threshold
@@ -135,12 +148,14 @@ def test_distance_threshold(cache):
assert cache.distance_threshold == new_threshold
assert cache.distance_threshold != initial_threshold
+
# Test out of range distance threshold
def test_distance_threshold_out_of_range(cache):
out_of_range_threshold = -1
with pytest.raises(ValueError):
cache.set_threshold(out_of_range_threshold)
+
# Test storing and retrieving multiple items
def test_multiple_items(cache, vectorizer):
prompts_responses = {
@@ -161,10 +176,12 @@ def test_multiple_items(cache, vectorizer):
assert check_result[0]["response"] == expected_response
assert "metadata" not in check_result[0]
+
# Test retrieving underlying SearchIndex for the cache.
def test_get_index(cache):
assert isinstance(cache.index, SearchIndex)
+
# Test basic functionality with cache created with user-provided Redis client
def test_store_and_check_with_provided_client(cache_with_redis_client, vectorizer):
prompt = "This is a test prompt."
@@ -179,7 +196,8 @@ def test_store_and_check_with_provided_client(cache_with_redis_client, vectorize
assert response == check_result[0]["response"]
assert "metadata" not in check_result[0]
+
# Test deleting the cache
def test_delete(cache_no_cleanup, vectorizer):
cache_no_cleanup.delete()
- assert not cache_no_cleanup.index.exists()
\ No newline at end of file
+ assert not cache_no_cleanup.index.exists()
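
For readers skimming these tests, a minimal sketch of the `SemanticCache` flow they exercise. The vectorizer model name (and its `model` keyword) is an assumption here; like the tests above, the sketch passes a precomputed vector to `store`:

```python
from redisvl.extensions.llmcache import SemanticCache
from redisvl.utils.vectorize import HFTextVectorizer

# Model name is illustrative; any supported sentence-transformers model works.
vectorizer = HFTextVectorizer(model="sentence-transformers/all-MiniLM-L6-v2")
cache = SemanticCache(vectorizer=vectorizer, distance_threshold=0.2)

prompt = "This is a test prompt."
response = "This is a test response."
vector = vectorizer.embed(prompt)

cache.store(prompt, response, vector=vector, metadata={"source": "test"})

# Semantically similar prompts hit the cache within the distance threshold.
hits = cache.check(prompt="This is also a test prompt.")
if hits:
    print(hits[0]["response"])
```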
diff --git a/tests/integration/test_query.py b/tests/integration/test_query.py
index fb5b3f74..35aea066 100644
--- a/tests/integration/test_query.py
+++ b/tests/integration/test_query.py
@@ -300,10 +300,10 @@ def test_filter_combinations(index, query):
search(query, index, n & t & g, 1, age_range=(18, 99), location="-122.4194,37.7749")
-def test_query_batch_vector_query(index, vector_query, sample_data):
+def test_paginate_vector_query(index, vector_query, sample_data):
batch_size = 2
all_results = []
- for i, batch in enumerate(index.query_batch(vector_query, batch_size), start=1):
+ for i, batch in enumerate(index.paginate(vector_query, batch_size), start=1):
all_results.extend(batch)
assert len(batch) <= batch_size
@@ -313,10 +313,10 @@ def test_query_batch_vector_query(index, vector_query, sample_data):
assert i == expected_iterations
-def test_query_batch_filter_query(index, filter_query):
+def test_paginate_filter_query(index, filter_query):
batch_size = 3
all_results = []
- for i, batch in enumerate(index.query_batch(filter_query, batch_size), start=1):
+ for i, batch in enumerate(index.paginate(filter_query, batch_size), start=1):
all_results.extend(batch)
assert len(batch) <= batch_size
@@ -327,10 +327,10 @@ def test_query_batch_filter_query(index, filter_query):
assert all(item["credit_score"] == "high" for item in all_results)
-def test_query_batch_range_query(index, range_query):
+def test_paginate_range_query(index, range_query):
batch_size = 1
all_results = []
- for i, batch in enumerate(index.query_batch(range_query, batch_size), start=1):
+ for i, batch in enumerate(index.paginate(range_query, batch_size), start=1):
all_results.extend(batch)
assert len(batch) <= batch_size
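
These renamed tests track the `query_batch` → `paginate` rename on the index. The call pattern is unchanged apart from the name; the page size here is illustrative:

```python
# `index` is an existing SearchIndex and `vector_query` a prepared VectorQuery.
all_results = []
for batch in index.paginate(vector_query, 2):  # yields results two at a time
    all_results.extend(batch)
```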
diff --git a/tests/integration/test_search_results.py b/tests/integration/test_search_results.py
index 15b2048a..ed7096b7 100644
--- a/tests/integration/test_search_results.py
+++ b/tests/integration/test_search_results.py
@@ -4,6 +4,7 @@
from redisvl.query import FilterQuery
from redisvl.query.filter import Tag
+
@pytest.fixture
def filter_query():
return FilterQuery(
@@ -11,6 +12,7 @@ def filter_query():
filter_expression=Tag("credit_score") == "high",
)
+
@pytest.fixture
def index(sample_data):
fields_spec = [
@@ -57,6 +59,7 @@ def index(sample_data):
# clean up
index.delete(drop=True)
+
def test_process_results_unpacks_json_properly(index, filter_query):
results = index.query(filter_query)
- assert len(results) == 4
\ No newline at end of file
+ assert len(results) == 4
diff --git a/tests/unit/test_async_search_index.py b/tests/unit/test_async_search_index.py
index 1676ff7d..ff1c7d9f 100644
--- a/tests/unit/test_async_search_index.py
+++ b/tests/unit/test_async_search_index.py
@@ -1,9 +1,9 @@
import pytest
from redisvl.index import AsyncSearchIndex
+from redisvl.query import VectorQuery
from redisvl.redis.utils import convert_bytes
from redisvl.schema import IndexSchema, StorageType
-from redisvl.query import VectorQuery
fields = [{"name": "test", "type": "tag"}]
@@ -148,6 +148,7 @@ async def test_check_index_exists_before_delete(async_client, async_index):
with pytest.raises(ValueError):
await async_index.delete()
+
@pytest.mark.asyncio
async def test_check_index_exists_before_search(async_client, async_index):
async_index.set_client(async_client)
@@ -163,6 +164,7 @@ async def test_check_index_exists_before_search(async_client, async_index):
with pytest.raises(ValueError):
await async_index.search(query.query, query_params=query.params)
+
@pytest.mark.asyncio
async def test_check_index_exists_before_info(async_client, async_index):
async_index.set_client(async_client)
diff --git a/tests/unit/test_search_index.py b/tests/unit/test_search_index.py
index 4e83c9ab..8a27e5b3 100644
--- a/tests/unit/test_search_index.py
+++ b/tests/unit/test_search_index.py
@@ -1,9 +1,9 @@
import pytest
from redisvl.index import SearchIndex
+from redisvl.query import VectorQuery
from redisvl.redis.utils import convert_bytes
from redisvl.schema import IndexSchema, StorageType
-from redisvl.query import VectorQuery
fields = [{"name": "test", "type": "tag"}]
@@ -12,14 +12,17 @@
def index_schema():
return IndexSchema.from_dict({"index": {"name": "my_index"}, "fields": fields})
+
@pytest.fixture
def index(index_schema):
return SearchIndex(schema=index_schema)
+
@pytest.fixture
def index_from_yaml():
return SearchIndex.from_yaml("schemas/test_json_schema.yaml")
+
def test_search_index_properties(index_schema, index):
assert index.schema == index_schema
# custom settings
@@ -31,6 +34,7 @@ def test_search_index_properties(index_schema, index):
assert index.storage_type == index_schema.index.storage_type == StorageType.HASH
assert index.key("foo").startswith(index.prefix)
+
def test_search_index_from_yaml(index_from_yaml):
assert index_from_yaml.name == "json-test"
assert index_from_yaml.client == None
@@ -39,6 +43,7 @@ def test_search_index_from_yaml(index_from_yaml):
assert index_from_yaml.storage_type == StorageType.JSON
assert index_from_yaml.key("foo").startswith(index_from_yaml.prefix)
+
def test_search_index_no_prefix(index_schema):
# specify an explicitly empty prefix...
index_schema.index.prefix = ""
@@ -129,6 +134,7 @@ def test_no_id_field(client, index):
with pytest.raises(ValueError):
index.load(bad_data, id_field="key")
+
def test_check_index_exists_before_delete(client, index):
index.set_client(client)
index.create(overwrite=True, drop=True)
@@ -136,6 +142,7 @@ def test_check_index_exists_before_delete(client, index):
with pytest.raises(ValueError):
index.delete()
+
def test_check_index_exists_before_search(client, index):
index.set_client(client)
index.create(overwrite=True, drop=True)
@@ -150,6 +157,7 @@ def test_check_index_exists_before_search(client, index):
with pytest.raises(ValueError):
index.search(query.query, query_params=query.params)
+
def test_check_index_exists_before_info(client, index):
index.set_client(client)
index.create(overwrite=True, drop=True)
@@ -158,6 +166,7 @@ def test_check_index_exists_before_info(client, index):
with pytest.raises(ValueError):
index.info()
+
def test_index_needs_valid_schema():
with pytest.raises(ValueError, match=r"Must provide a valid IndexSchema object"):
- index = SearchIndex(schema="Not A Valid Schema")
\ No newline at end of file
+ index = SearchIndex(schema="Not A Valid Schema")
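
A short sketch of the index lifecycle these tests cover: build an index from the YAML schema fixture, attach a client, create it, run a prepared query, and drop it. The schema path comes from the tests; the vector field name, return fields, and dimensions are illustrative and must match whatever the schema actually declares:

```python
from redis import Redis

from redisvl.index import SearchIndex
from redisvl.query import VectorQuery

index = SearchIndex.from_yaml("schemas/test_json_schema.yaml")
index.set_client(Redis.from_url("redis://localhost:6379"))
index.create(overwrite=True, drop=True)

# Lower-level search with a prepared Redis-Py query and its params.
query = VectorQuery(
    vector=[0.1, 0.2, 0.3],
    vector_field_name="embedding",
    return_fields=["user"],
    num_results=3,
)
results = index.search(query.query, query_params=query.params)

index.delete()  # drop the index when done
```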
diff --git a/tests/unit/test_utils.py b/tests/unit/test_utils.py
index 9a3c45db..ca535c5a 100644
--- a/tests/unit/test_utils.py
+++ b/tests/unit/test_utils.py
@@ -1,45 +1,58 @@
-import pytest
import numpy as np
-from redisvl.redis.utils import make_dict, buffer_to_array, convert_bytes, array_to_buffer
+import pytest
+
+from redisvl.redis.utils import (
+ array_to_buffer,
+ buffer_to_array,
+ convert_bytes,
+ make_dict,
+)
+
def test_even_number_of_elements():
"""Test with an even number of elements"""
- values = ['key1', 'value1', 'key2', 'value2']
- expected = {'key1': 'value1', 'key2': 'value2'}
+ values = ["key1", "value1", "key2", "value2"]
+ expected = {"key1": "value1", "key2": "value2"}
assert make_dict(values) == expected
+
def test_odd_number_of_elements():
"""Test with an odd number of elements - expecting the last element to be ignored"""
- values = ['key1', 'value1', 'key2']
- expected = {'key1': 'value1'} # 'key2' has no pair, so it's ignored
+ values = ["key1", "value1", "key2"]
+ expected = {"key1": "value1"} # 'key2' has no pair, so it's ignored
assert make_dict(values) == expected
+
def test_different_data_types():
"""Test with different data types as keys and values"""
- values = [1, 'one', 2.0, 'two']
- expected = {1: 'one', 2.0: 'two'}
+ values = [1, "one", 2.0, "two"]
+ expected = {1: "one", 2.0: "two"}
assert make_dict(values) == expected
+
def test_empty_list():
"""Test with an empty list"""
values = []
expected = {}
assert make_dict(values) == expected
+
def test_with_complex_objects():
"""Test with complex objects like lists and dicts as values"""
- key = 'a list'
+ key = "a list"
value = [1, 2, 3]
values = [key, value]
expected = {key: value}
assert make_dict(values) == expected
+
def test_simple_byte_buffer_to_floats():
"""Test conversion of a simple byte buffer into floats"""
buffer = np.array([1.0, 2.0, 3.0], dtype=np.float32).tobytes()
expected = [1.0, 2.0, 3.0]
assert buffer_to_array(buffer, dtype=np.float32) == expected
+
def test_different_data_types():
"""Test conversion with different data types"""
# Integer test
@@ -52,54 +65,63 @@ def test_different_data_types():
expected = [1.0, 2.0, 3.0]
assert buffer_to_array(buffer, dtype=np.float64) == expected
+
def test_empty_byte_buffer():
"""Test conversion of an empty byte buffer"""
- buffer = b''
+ buffer = b""
expected = []
assert buffer_to_array(buffer, dtype=np.float32) == expected
+
def test_plain_bytes_to_string():
"""Test conversion of plain bytes to string"""
- data = b'hello world'
- expected = 'hello world'
+ data = b"hello world"
+ expected = "hello world"
assert convert_bytes(data) == expected
+
def test_bytes_in_dict():
"""Test conversion of bytes in a dictionary, including nested dictionaries"""
- data = {'key': b'value', 'nested': {'nkey': b'nvalue'}}
- expected = {'key': 'value', 'nested': {'nkey': 'nvalue'}}
+ data = {"key": b"value", "nested": {"nkey": b"nvalue"}}
+ expected = {"key": "value", "nested": {"nkey": "nvalue"}}
assert convert_bytes(data) == expected
+
def test_bytes_in_list():
"""Test conversion of bytes in a list, including nested lists"""
- data = [b'item1', b'item2', ['nested', b'nested item']]
- expected = ['item1', 'item2', ['nested', 'nested item']]
+ data = [b"item1", b"item2", ["nested", b"nested item"]]
+ expected = ["item1", "item2", ["nested", "nested item"]]
assert convert_bytes(data) == expected
+
def test_bytes_in_tuple():
"""Test conversion of bytes in a tuple, including nested tuples"""
- data = (b'item1', b'item2', ('nested', b'nested item'))
- expected = ('item1', 'item2', ('nested', 'nested item'))
+ data = (b"item1", b"item2", ("nested", b"nested item"))
+ expected = ("item1", "item2", ("nested", "nested item"))
assert convert_bytes(data) == expected
+
def test_non_bytes_data():
"""Test handling of non-bytes data types"""
- data = 'already a string'
- expected = 'already a string'
+ data = "already a string"
+ expected = "already a string"
assert convert_bytes(data) == expected
+
def test_bytes_with_invalid_utf8():
"""Test handling bytes that cannot be decoded with UTF-8"""
- data = b'\xff\xff' # Invalid in UTF-8
+ data = b"\xff\xff" # Invalid in UTF-8
expected = data
assert convert_bytes(data) == expected
+
def test_simple_list_to_bytes_default_dtype():
"""Test conversion of a simple list of floats to bytes using the default dtype"""
array = [1.0, 2.0, 3.0]
expected = np.array(array, dtype=np.float32).tobytes()
assert array_to_buffer(array) == expected
+
def test_list_to_bytes_non_default_dtype():
"""Test conversion with a non-default dtype"""
array = [1.0, 2.0, 3.0]
@@ -107,12 +129,14 @@ def test_list_to_bytes_non_default_dtype():
expected = np.array(array, dtype=dtype).tobytes()
assert array_to_buffer(array, dtype=dtype) == expected
+
def test_empty_list_to_bytes():
"""Test conversion of an empty list"""
array = []
expected = np.array(array, dtype=np.float32).tobytes()
assert array_to_buffer(array) == expected
+
@pytest.mark.parametrize("dtype", [np.int32, np.float64])
def test_conversion_with_various_dtypes(dtype):
"""Test conversion of a list of floats to bytes with various dtypes"""
@@ -120,8 +144,9 @@ def test_conversion_with_various_dtypes(dtype):
expected = np.array(array, dtype=dtype).tobytes()
assert array_to_buffer(array, dtype=dtype) == expected
+
def test_conversion_with_invalid_floats():
"""Test conversion with invalid float values (numpy should handle them)"""
- array = [float('inf'), float('-inf'), float('nan')]
+ array = [float("inf"), float("-inf"), float("nan")]
result = array_to_buffer(array)
assert len(result) > 0 # Simple check to ensure it returns anything
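
Finally, the utilities exercised above convert between Python lists and the raw byte buffers Redis stores for vector fields; a small round-trip sketch using values exactly representable in float32:

```python
import numpy as np

from redisvl.redis.utils import array_to_buffer, buffer_to_array

vec = [1.0, 2.0, 3.0]
buf = array_to_buffer(vec, dtype=np.float32)           # bytes ready for a Redis hash field
assert buffer_to_array(buf, dtype=np.float32) == vec   # exact round trip for these values
```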