Contributor

@Spartee Spartee commented Jul 25, 2023

Introduce new Query design with Filters and notebooks to show examples

Includes:

  • documentation updates
  • examples
  • semantic cache and SearchIndex updates
  • tests

@Spartee Spartee added documentation Improvements or additions to documentation enhancement New feature or request labels Jul 25, 2023
@Spartee Spartee requested a review from tylerhutcherson July 25, 2023 10:00
@Spartee Spartee added the query query or filter related label Jul 25, 2023
codecov-commenter commented Jul 25, 2023

Codecov Report

Merging #22 (a24de64) into main (139a72e) will increase coverage by 1.29%.
The diff coverage is 92.80%.

@@            Coverage Diff             @@
##             main      #22      +/-   ##
==========================================
+ Coverage   86.76%   88.05%   +1.29%     
==========================================
  Files          11       11              
  Lines         461      536      +75     
==========================================
+ Hits          400      472      +72     
- Misses         61       64       +3     
Files Changed                   Coverage Δ
redisvl/llmcache/base.py        100.00% <ø> (+9.09%) ⬆️
redisvl/llmcache/semantic.py    86.95% <76.19%> (-6.89%) ⬇️
redisvl/query.py                95.29% <95.23%> (-4.71%) ⬇️
redisvl/index.py                91.05% <100.00%> (+0.62%) ⬆️
redisvl/schema.py               89.77% <100.00%> (+0.11%) ⬆️

... and 1 file with indirect coverage changes

Collaborator

@tylerhutcherson tylerhutcherson left a comment

wooo!

def hash_input(self, prompt: str):
    """Hashes the input using SHA256."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

Collaborator

don't want to keep the decorator option?

Contributor Author

I'll explain: it needs more work on the edge cases. I found too many ways to break it to be comfortable releasing it.

"prompt_vector",
"FLAT",
{"DIM": 768, "TYPE": "FLOAT32", "DISTANCE_METRIC": "COSINE"},
),
Collaborator

Right now it looks like the default fields are "fixed", but you can alter the embeddings provider. Would this cause a potential clash with the schema (embedding dimensions, etc.)?
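
One way to surface such a clash early would be a dimension check before anything is written to the index. A minimal sketch, assuming the vector field attributes are the dict shown in the diff above; `check_embedding_dim` is a hypothetical helper for illustration and not part of redisvl:

```python
from typing import Dict, Sequence, Union

# Vector field attributes as they appear in the default cache schema in this diff.
DEFAULT_VECTOR_ATTRS: Dict[str, Union[int, str]] = {
    "DIM": 768,
    "TYPE": "FLOAT32",
    "DISTANCE_METRIC": "COSINE",
}


def check_embedding_dim(
    vector: Sequence[float], attrs: Dict[str, Union[int, str]]
) -> None:
    """Raise ValueError if the vector length differs from the schema's DIM.

    Hypothetical helper for illustration; not part of redisvl.
    """
    expected = int(attrs["DIM"])
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, schema expects {expected}"
        )
```

Calling this on the first embedding a provider returns would fail fast for a provider whose output size does not match the fixed schema, instead of failing later at index or query time.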

Contributor Author

There is a TODO in there about this. Struggling with the intersection between hand-holding and custom capability.

if not vector:
    vector = self._provider.embed(prompt) # type: ignore

v = VectorQuery(
Collaborator

I'd suggest we use a range query for this use case - less client side processing. If you agree -- we will want to flip the polarity of the threshold such that smaller is more "strict"
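
To illustrate the flipped polarity, here is a minimal sketch in plain Python. It assumes cosine distance and that `similarity` is `1 - distance` (an assumption, since that helper is not shown in this diff). All function names are hypothetical, and the second filter only mimics in Python what a server-side range query would do:

```python
from typing import List


def similarity_from_distance(distance: float) -> float:
    """Convert a cosine distance to a similarity (assumed: 1 - distance)."""
    return 1.0 - distance


def distance_threshold(similarity_threshold: float) -> float:
    """Flip the polarity: a higher similarity floor becomes a lower distance cap."""
    return 1.0 - similarity_threshold


def client_side_filter(distances: List[float], similarity_threshold: float) -> List[float]:
    """Current style: fetch nearest neighbours, then filter by similarity in the client."""
    return [
        d for d in distances
        if similarity_from_distance(d) > similarity_threshold
    ]


def range_style_filter(distances: List[float], max_distance: float) -> List[float]:
    """Range-query style: keep only results within a distance radius."""
    return [d for d in distances if d < max_distance]
```

With a similarity threshold of 0.9, the equivalent distance threshold is about 0.1, so smaller values are stricter, and both filters keep the same results away from the boundary.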

Contributor Author

Premature optimization is the root of all evil. Next release.

  self._refresh_ttl(doc.id)
- sim = similarity(doc.vector_score)
+ sim = similarity(doc.vector_distance)
  if sim > self.threshold:
Collaborator

this is the part we could do without if we use range query

Collaborator

Could also be added later

Contributor Author

see previous comment

query.sort_by("vector_score")
return query

class Filter:
Collaborator

might suggest keeping a separate filter module

Contributor Author

Thought about this, but then everyone has to do

from redisvl.filter import TagFilter
from redisvl.query import VectorQuery

instead of

from redisvl.query import TagFilter, VectorQuery

It also made type hints easier... but I hear you. What do you think now?

Contributor Author

Actually, the first looks cleaner. I'll do this.

n = NumericFilter("age", 18, 100)
t ^= n
v.set_filter(t)

Collaborator

only other input on tests would be to test some more tricky character scenarios where we'd have to properly escape them to make it work
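
As a starting point for such tests, here is a minimal sketch of escaping a tag value before building a filter string. Redis tag queries expect punctuation and whitespace inside `{...}` to be backslash-escaped; the exact character set below and both helper names are assumptions for illustration, not redisvl's API:

```python
# Characters assumed to need a backslash escape inside a tag query.
# This set is illustrative; check it against the Redis query syntax docs.
SPECIAL_CHARS = set(",.<>{}[]\"':;!@#$%^&*()-+=~|/\\ ")


def escape_tag_value(value: str) -> str:
    """Prefix every special character in a tag value with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


def tag_clause(field: str, value: str) -> str:
    """Build an '@field:{value}' clause with the value escaped (hypothetical helper)."""
    return f"@{field}:{{{escape_tag_value(value)}}}"


if __name__ == "__main__":
    # Tricky inputs worth covering in tests: e-mail addresses, hyphens, spaces.
    for raw in ["first.last@example.com", "new-york", "hello world"]:
        print(tag_clause("user", raw))
```

Values like these would make reasonable test cases, since an unescaped `.`, `@`, `-`, or space is likely to be tokenized or rejected rather than matched literally.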

Contributor Author

Good call. Examples? Feel like you know more about this than I do.

@Spartee Spartee merged commit ff083c7 into main Jul 26, 2023
@Spartee Spartee deleted the query branch July 26, 2023 20:49

4 participants