-
Notifications
You must be signed in to change notification settings - Fork 3
use qdrant vectors in hybrid search #2718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
OpenAPI ChangesShow/hide No detectable change.Unexpected changes? Ensure your branch is up-to-date with |
11eb260 to
2bca7f2
Compare
shanbady
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally it works great! just some minor cleanup suggestions/comments
vector_search/utils.py
Outdated
| } | ||
|
|
||
|
|
||
| def get_vector_for_learning_resource(readable_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be cleaner to add a "with_vectors" as an optional parameter to retrieve_points_matching_params and use that:
retrieve_points_matching_params({"readable_id": "course-234"}, with_vectors=True)[0]
learning_resources_search/api.py
Outdated
| } | ||
| encoder = dense_encoder() | ||
| query_vector = encoder.embed_query(text) | ||
| vector_query = {"knn": {"vector_embedding": {"vector": query_vector, "k": 5}}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be good to have "k" and "pagination_depth" defined in constants or in settings.py (or omitted altogether if they default to something reasonable)
learning_resources_search/api.py
Outdated
| "combination": { | ||
| "technique": "arithmetic_mean", | ||
| "parameters": {"weights": [0.6, 0.2, 0.2]}, | ||
| "parameters": {"weights": [0.8, 0.2]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these also seem like magic numbers we may tweak often. consider moving to settings or as a constant. since all of search_pipeline is static json we could do something like:
OPENSEARCH_HYBRID_PIPELINE_CONFIGURATION = {...}
| } | ||
| } | ||
|
|
||
| if object_type == COMBINED_INDEX: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo the naming of these indexes are getting a bit confusing (constants.COMBINED_INDEX vs constants.ALL_INDEXES vs constants.BOTH_INDEXES) might be better to call this HYBRID_INDEX or HYBRID_COMBINED_INDEX etc
shanbady
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
0a3af13 to
3586b9e
Compare
What are the relevant tickets?
closes https://github.com/mitodl/hq/issues/9148
Description (What does it do?)
This pr updates the hybrid search index in opensearch to reuse vectors from qdrant. The hybrid search is currently behind a flag (search_mode=hybrid)
How can this be tested?
set
OPENAI_API_KEY to the value from rc
and
QDRANT_ENCODER=vector_search.encoders.litellm.LiteLLMEncoder
run
docker-compose run web ./manage.py generate_embeddings --skip-contentfilesto ensure you have resource embedding in qdrantrun
docker-compose run web ./manage.py recreate_index --combined_hybridGot to
http://open.odl.local:8062/search
search for "intro to ai course" you should have no results. The search with search mode not set should behave normally
select search_mode=hybrid from the admin options panel (login required) or just add it to the url
search for "intro to ai course" you should get results. Facets and base search should also still work