Redis metadata filtering and specification, index customization #8612

Spartee · 2023-08-02T04:58:55Z

Description

The previous Redis implementation did not let users specify the index configuration (e.g. changing the underlying algorithm) or add metadata to use for querying (e.g. hybrid or "filtered" search).

This PR introduces the ability to specify custom index attributes and metadata attributes as well as use that metadata in filtered queries. Overall, more structure was introduced to the Redis implementation that should allow for easier maintainability moving forward.

Example data

Suppose we have the following sample data

metadata = [
    {
        "user": "john",
        "age": 18,
        "job": "engineer",
        "credit_score": "high",
    },
    {
        "user": "derrick",
        "age": 14,
        "job": "doctor",
        "credit_score": "low",
    },
    {
        "user": "nancy",
        "age": 94,
        "job": "doctor",
        "credit_score": "high",
    },
    {
        "user": "tyler",
        "age": 100,
        "job": "engineer",
        "credit_score": "high",
    },
    {
        "user": "tim",
        "age": 12,
        "job": "dermatologist",
        "credit_score": "high",
    },
    {
        "user": "taimur",
        "age": 15,
        "job": "CEO",
        "credit_score": "low",
    },
    {
        "user": "joe",
        "age": 35,
        "job": "dentist",
        "credit_score": "medium",
    },
]

texts = ["foo", "foo", "foo", "foo", "bar", "bar", "bar"]

New Features

The following features are now available with the Redis integration into Langchain

Index schema generation

The schema for the index will now be automatically generated if not specified by the user. For example, the data above has multiple metadata categories. Take the following example:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis

embeddings = OpenAIEmbeddings()


rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users"
)

Loading the data through this and the other from_documents and from_texts methods will now generate an index schema in Redis like the following.

You can view the index schema with the redisvl tool (redisvl.com).

$ rvl index info -i users

Index Information:

Index Name	Storage Type	Prefixes	Index Options	Indexing
users	HASH	['doc:users']	[]	0
Index Fields:
Name	Attribute	Type	Field Option	Option Value
----------------	----------------	---------	----------------	----------------
user	user	TEXT	WEIGHT	1
job	job	TEXT	WEIGHT	1
credit_score	credit_score	TEXT	WEIGHT	1
content	content	TEXT	WEIGHT	1
age	age	NUMERIC
content_vector	content_vector	VECTOR

Custom Metadata specification

The metadata schema generation has the following rules

All text fields are indexed as text fields.
All numeric fields are indexed as numeric fields.
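The two rules above can be sketched in a few lines. This is a minimal illustration, not the code in this PR: `infer_schema` is a hypothetical helper name, and the output shape is assumed to mirror the `generated_schema` dictionary printed in the log output of this description.

```python
from numbers import Number
from typing import Any, Dict, List


def infer_schema(metadata: Dict[str, Any]) -> Dict[str, List[Dict[str, str]]]:
    """Hypothetical helper: classify each metadata key as text or numeric."""
    schema: Dict[str, List[Dict[str, str]]] = {"text": [], "numeric": []}
    for key, value in metadata.items():
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, Number) and not isinstance(value, bool):
            schema["numeric"].append({"name": key})
        elif isinstance(value, str):
            schema["text"].append({"name": key})
    # drop empty categories so only populated field types remain
    return {kind: fields for kind, fields in schema.items() if fields}


sample = {"user": "john", "age": 18, "job": "engineer", "credit_score": "high"}
generated = infer_schema(sample)
print(generated)
```

Note that under these rules `credit_score` lands in the text category, which is why a tag field needs an explicit override.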

To index a text field as a tag field instead, specify overrides like the following for the example data:

# this can also be a path to a yaml file
index_schema = {
    "text": [{"name": "user"}, {"name": "job"}],
    "tag": [{"name": "credit_score"}],
    "numeric": [{"name": "age"}],
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users2",
    index_schema=index_schema
)

This will change the index specification to

Index Information:

Index Name	Storage Type	Prefixes	Index Options	Indexing
users2	HASH	['doc:users2']	[]	0
Index Fields:
Name	Attribute	Type	Field Option	Option Value
----------------	----------------	---------	----------------	----------------
user	user	TEXT	WEIGHT	1
job	job	TEXT	WEIGHT	1
content	content	TEXT	WEIGHT	1
credit_score	credit_score	TAG	SEPARATOR	,
age	age	NUMERIC
content_vector	content_vector	VECTOR

and log a warning that the generated schema does not match the specified schema.

index_schema does not match generated schema from metadata.
index_schema: {'text': [{'name': 'user'}, {'name': 'job'}], 'tag': [{'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}
generated_schema: {'text': [{'name': 'user'}, {'name': 'job'}, {'name': 'credit_score'}], 'numeric': [{'name': 'age'}]}

As long as this is on purpose, this is fine.
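The override-and-warn behaviour described here could look roughly like the sketch below. `resolve_schema` is a hypothetical name and the comparison is a plain dictionary equality check, which is an assumption about how the mismatch is detected; only the warning text is taken from the log output above.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)

Schema = Dict[str, List[Dict[str, str]]]


def resolve_schema(index_schema: Optional[Schema], generated_schema: Schema) -> Schema:
    """Hypothetical helper: prefer the user's schema, warn if it differs."""
    if index_schema is None:
        # nothing specified, so use the generated schema
        return generated_schema
    if index_schema != generated_schema:
        logger.warning(
            "index_schema does not match generated schema from metadata.\n"
            "index_schema: %s\ngenerated_schema: %s",
            index_schema,
            generated_schema,
        )
    return index_schema


generated = {
    "text": [{"name": "user"}, {"name": "job"}, {"name": "credit_score"}],
    "numeric": [{"name": "age"}],
}
override = {
    "text": [{"name": "user"}, {"name": "job"}],
    "tag": [{"name": "credit_score"}],
    "numeric": [{"name": "age"}],
}
chosen = resolve_schema(override, generated)
```

The user-supplied schema always wins in this sketch; the warning only flags the difference.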

The schema can be defined as a YAML file or a dictionary:

text:
  - name: user
  - name: job
tag:
  - name: credit_score
numeric:
  - name: age

and you pass in a path like

from pathlib import Path

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users3",
    index_schema=Path("sample1.yml").resolve()
)

This creates the same schema as the dictionary example:

Index Information:

Index Name	Storage Type	Prefixes	Index Options	Indexing
users3	HASH	['doc:users3']	[]	0
Index Fields:
Name	Attribute	Type	Field Option	Option Value
----------------	----------------	---------	----------------	----------------
user	user	TEXT	WEIGHT	1
job	job	TEXT	WEIGHT	1
content	content	TEXT	WEIGHT	1
credit_score	credit_score	TAG	SEPARATOR	,
age	age	NUMERIC
content_vector	content_vector	VECTOR

Custom Vector Indexing Schema

Users with large use cases may want to change how the vector index created by Langchain is configured.

To use Redis's vector indexing features for such cases, you can now pass in index attribute modifiers, such as changing the indexing algorithm to HNSW:

vector_schema = {
    "algorithm": "HNSW"
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users3",
    vector_schema=vector_schema
)

A more complex example may look like

vector_schema = {
    "algorithm": "HNSW",
    "ef_construction": 200,
    "ef_runtime": 20
}

rds, keys = Redis.from_texts_return_keys(
    texts,
    embeddings,
    metadatas=metadata,
    redis_url="redis://localhost:6379",
    index_name="users3",
    vector_schema=vector_schema
)

All names correspond to the arguments you would set if using Redis-py or RedisVL. (put in doc link later)
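One way to picture how a partial `vector_schema` combines with defaults is a simple dictionary merge. This is a sketch under assumptions: `with_vector_defaults` is a hypothetical helper, the `dims`, `distance_metric` and `datatype` values are the defaults quoted in the review comments on this PR, and the `FLAT` algorithm default and the `name` key are assumed for illustration.

```python
from typing import Dict, Optional, Union

VectorSchema = Dict[str, Union[str, int]]

# dims, distance_metric and datatype are quoted in the review diff;
# "FLAT" and the "name" key are assumptions made for this sketch.
DEFAULT_VECTOR_SCHEMA: VectorSchema = {
    "name": "content_vector",
    "algorithm": "FLAT",
    "dims": 1536,
    "distance_metric": "COSINE",
    "datatype": "FLOAT32",
}


def with_vector_defaults(overrides: Optional[VectorSchema] = None) -> VectorSchema:
    """Hypothetical helper: user-supplied keys win over the defaults."""
    merged = dict(DEFAULT_VECTOR_SCHEMA)  # copy so the defaults stay untouched
    merged.update(overrides or {})
    return merged


hnsw = with_vector_defaults(
    {"algorithm": "HNSW", "ef_construction": 200, "ef_runtime": 20}
)
print(hnsw)
```

Keys the user does not mention keep their default values, so a one-key override such as `{"algorithm": "HNSW"}` is enough to switch algorithms.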

Better Querying

Both vector queries and Range (limit) queries are now available and metadata is returned by default. The outputs are shown.

>>> query = "foo"
>>> results = rds.similarity_search(query, k=1)
>>> print(results)
[Document(page_content='foo', metadata={'user': 'derrick', 'job': 'doctor', 'credit_score': 'low', 'age': '14', 'id': 'doc:users:657a47d7db8b447e88598b83da879b9d', 'score': '7.15255737305e-07'})]

>>> results = rds.similarity_search_with_score(query, k=1, return_metadata=False)
>>> print(results) # no metadata, but with scores
[(Document(page_content='foo', metadata={}), 7.15255737305e-07)]

>>> results = rds.similarity_search_limit_score(query, k=6, score_threshold=0.0001)
>>> print(len(results)) # range query (only above threshold even if k is higher)
4

Custom metadata filtering

A big advantage of Redis in this space is being able to filter on data stored alongside the vector itself. With the example above, the following is now possible in langchain. The comparison operators are overloaded to describe a new expression language that mimics that of redisvl. This allows for arbitrarily long sequences of filters that resemble SQL commands and can be used directly with vector queries and range queries.

There are two interfaces by which to do so and both are shown.

>>> from langchain.vectorstores.redis import RedisFilter, RedisNum, RedisText

>>> age_filter = RedisFilter.num("age") > 18
>>> age_filter = RedisNum("age") > 18 # equivalent
>>> results = rds.similarity_search(query, filter=age_filter)
>>> print(len(results))
3

>>> job_filter = RedisFilter.text("job") == "engineer" 
>>> job_filter = RedisText("job") == "engineer" # equivalent
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2

# fuzzy match text search
>>> job_filter = RedisFilter.text("job") % "eng*"
>>> results = rds.similarity_search(query, filter=job_filter)
>>> print(len(results))
2


# combined filters (AND)
>>> combined = age_filter & job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
1

# combined filters (OR)
>>> combined = age_filter | job_filter
>>> results = rds.similarity_search(query, filter=combined)
>>> print(len(results))
4

All the above filter results can be checked against the data above.
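To make the operator-overloading idea concrete, here is a stripped-down sketch that compiles filters to RediSearch query strings. The class names (`FilterExpr`, `Num`, `Text`) are hypothetical stand-ins, not the `RedisFilter` classes from this PR, and no escaping of special characters is attempted.

```python
class FilterExpr:
    """Hypothetical stand-in for a compiled filter; wraps a query string."""

    def __init__(self, query: str) -> None:
        self.query = query

    def __and__(self, other: "FilterExpr") -> "FilterExpr":
        # space-separated clauses are an intersection in RediSearch
        return FilterExpr(f"({self.query} {other.query})")

    def __or__(self, other: "FilterExpr") -> "FilterExpr":
        # "|" is the RediSearch union operator
        return FilterExpr(f"({self.query} | {other.query})")

    def __str__(self) -> str:
        return self.query


class Num:
    def __init__(self, field: str) -> None:
        self.field = field

    def __gt__(self, value: float) -> FilterExpr:
        # "(" marks an exclusive bound in a numeric range
        return FilterExpr(f"@{self.field}:[({value} +inf]")

    def __lt__(self, value: float) -> FilterExpr:
        return FilterExpr(f"@{self.field}:[-inf ({value}]")


class Text:
    def __init__(self, field: str) -> None:
        self.field = field

    def __eq__(self, value: object) -> FilterExpr:  # type: ignore[override]
        # quoted value: exact phrase match
        return FilterExpr(f'@{self.field}:"{value}"')

    def __mod__(self, pattern: str) -> FilterExpr:
        # unquoted value: pattern is passed through as written
        return FilterExpr(f"@{self.field}:{pattern}")


age_filter = Num("age") > 18
job_filter = Text("job") == "engineer"
print(age_filter & job_filter)
print(age_filter | job_filter)
```

Because every operator returns another `FilterExpr`, filters compose to any depth before being handed to a query.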

TODO

more tests
docstrings
docs

Other

Issue: [Feature] Redis Vectorestore - similarity_search filter by metadata #3967
Dependencies: No added dependencies
Tag maintainer: @hwchase17 @baskaryan @rlancemartin
Twitter handle: @sampartee


tylerhutcherson

nice -- left a few comments and questions


baskaryan

would be good to have someone from redis side review the redis-specific logic!


baskaryan · 2023-08-22T20:54:50Z

libs/langchain/langchain/vectorstores/redis/base.py

+        "distance_metric": "COSINE",
+        "datatype": "FLOAT32",
+    }
+
    def __init__(
        self,
        redis_url: str,
        index_name: str,
        embedding_function: Callable,


since we're already making breaking changes, i'd suggest we update this to be

Suggested change

-        embedding_function: Callable,
+        embedding: Embeddings,

which is the interface all newer VectorStores have

libs/langchain/langchain/vectorstores/redis/base.py

Add the ability to clean the metadata before it goes into redis enabling document_loaders that return lists of strings to create categorical values for Tags in Redis indices. Also, added docstrings and updated the jupyter notebook

baskaryan · 2023-08-24T15:05:32Z

fixed some of the lint issues here #9705 if you want to merge that into this pr

tylerhutcherson · 2023-08-24T16:56:53Z

libs/langchain/langchain/vectorstores/redis/base.py

+        base_query = f"({query_prefix})=>[KNN {k} @{vector_key} $vector AS score]"
+
+        query = (
+            Query(base_query).return_fields(*return_fields).sort_by("score").dialect(2)


still need to add .paging(0, k) here

Suggested change

-            Query(base_query).return_fields(*return_fields).sort_by("score").dialect(2)
+            Query(base_query)
+            .return_fields(*return_fields)
+            .sort_by("score")
+            .paging(0, k)
+            .dialect(2)
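For context on why the paging call matters: without an explicit LIMIT, FT.SEARCH returns at most 10 results, so a KNN query with k greater than 10 would be silently truncated. The sketch below spells out the raw command arguments such a query corresponds to. `knn_search_args` is a hypothetical helper, and the `PARAMS` clause carrying the vector blob is left out to keep the example self-contained.

```python
from typing import List, Sequence


def knn_search_args(
    index: str,
    k: int,
    vector_key: str,
    return_fields: Sequence[str],
    query_prefix: str = "*",
) -> List[str]:
    """Hypothetical helper: build FT.SEARCH arguments for a KNN query."""
    base_query = f"({query_prefix})=>[KNN {k} @{vector_key} $vector AS score]"
    args = ["FT.SEARCH", index, base_query]
    args += ["RETURN", str(len(return_fields)), *return_fields]
    args += ["SORTBY", "score", "ASC"]
    # LIMIT offset num: without this, only the first 10 hits come back
    args += ["LIMIT", "0", str(k)]
    args += ["DIALECT", "2"]
    return args


args = knn_search_args("users", 20, "content_vector", ["content", "score"])
print(" ".join(args))
```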

alonre24

Added a few RediSearch syntax-related comments

alonre24 · 2023-08-24T21:23:24Z

libs/langchain/langchain/vectorstores/redis/base.py

+                from langchain.vectorstores import Redis
+                from langchain.embeddings import OpenAIEmbeddings
+                embeddings = OpenAIEmbeddings()
+                redisearch, keys = RediSearch.from_texts_return_keys(


Isn't it Redis.from_texts_return_keys... ?

alonre24 · 2023-08-25T13:14:00Z

libs/langchain/langchain/vectorstores/redis/filters.py

+    OPERATOR_MAP = {
+        RedisFilterOperator.EQ: '@%s:"%s"',
+        RedisFilterOperator.NE: '(-@%s:"%s")',
+        RedisFilterOperator.LIKE: "@%s:%s",


I'm not sure that LIKE represents what you meant it would be. The difference between @%s:"%s" and @%s:%s is that the first one allows you to find exact matches for phrases, while the last one also looks for exact matches but for each token separately. Also, using @%s:%s (without quotes) will support stemming by default.
To get a behavior that is more similar to SQL's LIKE operator, we can use prefix, infix or suffix matching (see query docs).
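The distinction drawn in this comment can be shown with plain query strings. The helper names below are hypothetical, values are not escaped, and whether suffix and infix patterns are accepted depends on the RediSearch version and index options, so treat this as a sketch of the syntax rather than a drop-in replacement for `OPERATOR_MAP`.

```python
def exact_phrase(field: str, value: str) -> str:
    # quoted: the whole phrase must match
    return f'@{field}:"{value}"'


def token_match(field: str, value: str) -> str:
    # unquoted: each token is matched separately, with stemming by default
    return f"@{field}:{value}"


def prefix_match(field: str, value: str) -> str:
    # closest to SQL's LIKE 'value%'
    return f"@{field}:{value}*"


def suffix_match(field: str, value: str) -> str:
    # closest to SQL's LIKE '%value'
    return f"@{field}:*{value}"


def infix_match(field: str, value: str) -> str:
    # closest to SQL's LIKE '%value%'
    return f"@{field}:*{value}*"


for build in (exact_phrase, token_match, prefix_match, suffix_match, infix_match):
    print(build.__name__, build("job", "eng"))
```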

alonre24 · 2023-08-25T13:18:02Z

libs/langchain/langchain/vectorstores/redis/base.py

+            Query(query_string)
+            .return_fields(*return_fields)
+            .sort_by("distance")
+            .paging(0, k)


One of the advantages of range queries is that you can get all the results that are within a given range (not just the top k ones), and in particular you can see how many results are within the range. To utilize this feature I would suggest having k as an optional argument (and perhaps using an upper bound such as 1000 for paging), if that makes sense to you.

agreed -- was going to suggest using K as an optional upper bound. @alonre24 isn't paging a default param 0-10 though?

Maybe something like:

query = (
    Query(query_string)
    .return_fields(*return_fields)
    .sort_by("distance")
    .dialect(2)
)
if k:
    query = query.paging(0, k)

Yes, the default paging is 10, that's why I suggest also choosing an upper bound, so we have:

query = query.paging(0, k if k else UPPER_BOUND)

It's part of the higher-level interface abstraction, so it would have to be set. I've started this conversation with the lc folks though. Most likely the best route is another method that exposes this feature better.
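The optional-k idea from this thread boils down to a small fallback. In the sketch below, `paging_window` is a hypothetical helper and 1000 is just the figure floated above, not a library constant.

```python
from typing import Optional, Tuple

UPPER_BOUND = 1000  # value suggested in the thread, assumed here for illustration


def paging_window(k: Optional[int] = None, upper_bound: int = UPPER_BOUND) -> Tuple[int, int]:
    """Hypothetical helper: return (offset, num) for a range query."""
    # fall back to the upper bound so the default page size of 10 never applies
    return (0, k if k else upper_bound)


print(paging_window())   # no k given: use the upper bound
print(paging_window(6))  # explicit k: cap at 6
```

The returned pair could then be passed on as `query.paging(*paging_window(k))`.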

alonre24 · 2023-08-25T13:19:04Z

libs/langchain/langchain/vectorstores/redis/base.py

+
+        # if it's a list of strings, we assume it's a tag
+        if isinstance(value, (list, tuple)):
+            if not value or isinstance(value[0], str):


why if not value is ok?

same thing as saying len(value) == 0.

>>> x = []
>>> if not x:
...     print("it's like an emptiness check")
...
it's like an emptiness check

looks cleaner than len check or try/except

alonre24 · 2023-08-25T13:19:27Z

libs/langchain/langchain/vectorstores/redis/base.py

+        # if it's a list/tuple of strings, we join it
+        elif isinstance(value, (list, tuple)):
+            if not value or isinstance(value[0], str):
+                clean_meta[key] = ",".join(value)


Note that if there are tag values within the list that contain a comma, they will be split when indexed in RediSearch. Consider validating that there are no such values, or allowing the user to specify a different separator (API allows it).

So currently, they could specify it in the schema and clean up the data themselves beforehand, but for the automatically generated metadata (how most will use it because they just use the data loaders), it's defaulting to , right now. Been thinking I should have a default here anyway. Good call out.
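A guard along the lines discussed here might look like the following. This is a sketch, not the metadata-cleaning code in this PR: `join_tags` is a hypothetical helper, and raising `ValueError` is one possible policy (stripping or escaping the separator would be others).

```python
from typing import Sequence


def join_tags(values: Sequence[str], separator: str = ",") -> str:
    """Hypothetical helper: join tag values, refusing ambiguous input."""
    offending = [value for value in values if separator in value]
    if offending:
        # a value containing the separator would be split into several tags
        raise ValueError(
            f"tag values {offending!r} contain the separator {separator!r}"
        )
    return separator.join(values)


print(join_tags(["doctor", "engineer"]))
print(join_tags(["R&D, EMEA", "sales"], separator="|"))
```

Whatever separator is used when joining has to match the SEPARATOR option declared for the tag field in the index.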

alonre24 · 2023-08-25T13:20:49Z

libs/langchain/langchain/vectorstores/redis/filters.py

+
+        >>> from langchain.vectorstores.redis import RedisTag, RedisNum
+        >>> brand_is_nike = RedisTag("brand") == "nike"
+        >>> price_is_over_100 = RedisNum("price") < 100


Suggested change

-        >>> price_is_over_100 = RedisNum("price") < 100
+        >>> price_is_under_100 = RedisNum("price") < 100

Co-authored-by: Naresh Rangan <naresh.rangan0@walmart.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>

meiravgri

I know it was already merged but I had a few comments :)

meiravgri · 2023-08-26T07:13:18Z

libs/langchain/langchain/vectorstores/redis/base.py

+        "dims": 1536,
+        "distance_metric": "COSINE",
+        "datatype": "FLOAT32",


I suggest to update the vector field attributes according to the embedding model
i.e
self._schema = self._get_schema_with_defaults(index_schema, vector_schema, embedding)
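One way to derive the vector dimension from the embedding model, as suggested here, is to embed a short probe string and measure the result. In this sketch `vector_schema_for` and `fake_embed` are hypothetical; the callable stands in for an embedding model's query method, and the probe costs one embedding call.

```python
from typing import Callable, Dict, List, Optional, Union

VectorSchema = Dict[str, Union[str, int]]


def vector_schema_for(
    embed_query: Callable[[str], List[float]],
    overrides: Optional[VectorSchema] = None,
) -> VectorSchema:
    """Hypothetical helper: size the vector field from the model's output."""
    dims = len(embed_query("probe"))  # one throwaway embedding call
    schema: VectorSchema = {
        "dims": dims,
        "distance_metric": "COSINE",
        "datatype": "FLOAT32",
    }
    schema.update(overrides or {})  # explicit user settings still win
    return schema


def fake_embed(text: str) -> List[float]:
    # stand-in for a real embedding model with 384-dimensional output
    return [0.0] * 384


schema = vector_schema_for(fake_embed)
print(schema)
```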

meiravgri · 2023-08-26T07:27:08Z

libs/langchain/langchain/vectorstores/redis/base.py

+            index_schema (Optional[Union[Dict[str, str], str, os.PathLike]], optional):
+                Optional fields to index within the metadata. Overrides generated
+                schema. Defaults to None.
+            vector_schema (Optional[Dict[str, Union[str, int]]], optional): Optional
+                vector schema to use. Defaults to None.


I wouldn't say that it defaults to None. I think that the default schema used if index_schema or/and vector_schema are not passed should be documented.

meiravgri · 2023-08-26T07:35:59Z

libs/langchain/langchain/vectorstores/redis/base.py

+                    )
+            else:
+                # use the generated schema
+                index_schema = generated_schema


I wonder what was the idea behind indexing all the metadata fields? They are anyway stored and loaded from redis key space during the query.
Was it to allow the hybrid search?

meiravgri · 2023-08-26T09:11:06Z

libs/langchain/langchain/vectorstores/redis/schema.py

+
+    # filled by default_vector_schema
+    vector: Optional[List[Union[FlatVectorField, HNSWVectorField]]] = None
+    content_key: str = "content"


If I understand correctly, having a field called "content" is mandatory.
Correct me if I'm wrong, but the only place I found where giving this field a user-defined name might be a problem is in similarity_search_with_score, where we return result.content
(in similarity_search() it is handled better IMO by using getattr(result, content_key) instead)
Another solution is to define the return fields as
Query().return_field(self.content_field, as_field="content")

Also, where is the field name enforced in from_existing_index()?

meiravgri · 2023-08-26T09:22:20Z

libs/langchain/tests/integration_tests/vectorstores/test_redis.py

    assert output == TEST_RESULT
    assert drop(docsearch.index_name)


 def test_redis_from_existing(texts: List[str]) -> None:
    """Test adding a new document"""
-    Redis.from_texts(
+    docsearch = Redis.from_texts(


I'd add tests that create or connect to an existing index that doesn't include the default fields names

dosubot bot added Ɑ: vector store Related to vector store module 🤖:improvement Medium size change to existing code to handle new use-cases labels Aug 2, 2023

tylerhutcherson reviewed Aug 2, 2023


baskaryan reviewed Aug 3, 2023


Spartee marked this pull request as ready for review August 7, 2023 02:18

baskaryan reviewed Aug 8, 2023


baskaryan reviewed Aug 22, 2023


Naresh Rangan and others added 18 commits August 22, 2023 21:17

Redis Meta data Filter changes

194515f

Add support to Numeric and Tags

32ba2c8

Remove Print statement

45935c0

Add score_threshold filter to the query

7e12c63

Enable maleable index and query formation

8b5b2ff

Restructure

5e3937b

fix custom vector schema

508c0e9

Re-formatting

90da194

mypy and address review

7839f9f

formatting

eaba95c

Refactor redis vector store

07ba934

Add tests for new capability

07a2410

remove prints

dce67b1

interim commit

8adaa78

interim commit

0fbff02

Add metadata cleaning, docstrings, docs

215d944

Add the ability to clean the metadata before it goes into redis enabling document_loaders that return lists of strings to create categorical values for Tags in Redis indices. Also, added docstrings and updated the jupyter notebook

Merge branch 'master' into redis-refactor

8acc988

Fix some linting errors

48b7df5

Spartee force-pushed the redis-refactor branch from 0341cc4 to 48b7df5 Compare August 24, 2023 10:07

lint

b8060a6

tylerhutcherson reviewed Aug 24, 2023


Address from_existing issues

6bbbe82


Sam Partee added 2 commits August 25, 2023 02:24

minor context_key bug fix

23a3705

Fix semantic cache

a992889

alonre24 reviewed Aug 25, 2023


Sam Partee added 2 commits August 25, 2023 15:24

Address Embeddings interface and retriever tests

e0dcf56

Merge branch 'fix-semantic-cache' into redis-refactor

03d9a23


Sam Partee added 3 commits August 25, 2023 15:44

interim commit for cache fix

dee06ff

interim commit for cache fix (dirty)

6864df2

Cache tests passing with real embeddings

1ce1dda


baskaryan and others added 3 commits August 25, 2023 17:13

lint

93ec5a6

Merge remote-tracking branch 'upstream/bagatur/redis_refactor' into r…

9c20f22

…edis-refactor

linting fixes

4588b0a

baskaryan merged commit a28eea5 into langchain-ai:master Aug 26, 2023
26 checks passed

meiravgri reviewed Aug 26, 2023


ramosmario mentioned this pull request Sep 4, 2023

Issue: RedisVectorStoreRetriever not accessible #10186

Closed

baskaryan mentioned this pull request Sep 5, 2023

Redis Meta data Filter changes #8464

Closed

This was referenced Sep 12, 2023

[Feature] Redis Vectorestore - similarity_search filter by metadata #3967

Closed

[Question]: How to add OR metadata filter for RedisVectorStore retriever? run-llama/llama_index#7535

Closed

efriis mentioned this pull request Nov 7, 2023

Allow ids instead of keys for Redis vectorstore #6443

Closed

tylerhutcherson deleted the redis-refactor branch November 13, 2023 17:39
