-
Notifications
You must be signed in to change notification settings - Fork 62
Update query interface #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #22 +/- ##
==========================================
+ Coverage 86.76% 88.05% +1.29%
==========================================
Files 11 11
Lines 461 536 +75
==========================================
+ Hits 400 472 +72
- Misses 61 64 +3
|
tylerhutcherson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wooo!
| def hash_input(self, prompt: str): | ||
| """Hashes the input using SHA256.""" | ||
| return hashlib.sha256(prompt.encode("utf-8")).hexdigest() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't want to kep the decorator option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll explain, needs more work on the edge cases. I found too many ways to break it for me to be comfortable for a release with it.
| "prompt_vector", | ||
| "FLAT", | ||
| {"DIM": 768, "TYPE": "FLOAT32", "DISTANCE_METRIC": "COSINE"}, | ||
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now it looks like the default fields is "fixed" but you can alter the embeddings provider. would this cause a potential clash with the schema (embedding dimensions etc)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a TODO in there about this. Struggling with the intersection between hand-holding and custom capability.
| if not vector: | ||
| vector = self._provider.embed(prompt) # type: ignore | ||
|
|
||
| v = VectorQuery( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest we use a range query for this use case - less client side processing. If you agree -- we will want to flip the polarity of the threshold such that smaller is more "strict"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
premature Optimization is the root of all evil. Next release.
| self._refresh_ttl(doc.id) | ||
| sim = similarity(doc.vector_score) | ||
| sim = similarity(doc.vector_distance) | ||
| if sim > self.threshold: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the part we could do without if we use range query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also be added later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comment
| query.sort_by("vector_score") | ||
| return query | ||
|
|
||
| class Filter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might suggest keeping a separate filter module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about this, but then everyone has to do
from redisvl.filter import TagFilter
from redisvl.query import VectorQueryinstead of
from redisvl.query import TagFilter, VectorQueryIt also made type hints easier... but I hear you.. what do you think now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the first looks cleaner..I'll do this
| n = NumericFilter("age", 18, 100) | ||
| t ^= n | ||
| v.set_filter(t) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only other input on tests would be to test some more tricky character scenarios where we'd have to properly escape them to make it work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Examples? Feel like you know more about this than I do.
Introduce new Query design with Filters and notebooks to show examples
includes