
removing unused payload fields #3154

Merged
shanbady merged 11 commits into main from shanbady/remove-unused-qdrant-payloads on Apr 6, 2026

shanbady commented Apr 2, 2026

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/10782

Description (What does it do?)

This PR removes unnecessary fields from the payload index in Qdrant. Clients in mit-learn and learn-ai that call the content file vector endpoint filter on only a few fields. In Qdrant, each field added to the payload index carries significant storage and memory costs, plus a performance hit, because Qdrant has to maintain additional HNSW graph data for it. We should therefore be intentional about which fields are actually indexed.
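
To illustrate what "being intentional" means for an existing collection, here is a minimal sketch that compares a configured index mapping against what a collection currently indexes and works out which payload indexes to create or drop. It assumes the configuration is a mapping from payload field name to Qdrant schema type; `EXAMPLE_INDEXES`, the field names, and `plan_index_changes` are hypothetical stand-ins, not the actual contents of QDRANT_CONTENT_FILE_INDEXES or code from this PR.

```python
# Hypothetical stand-in for a trimmed index configuration.
# Field names and schema types are placeholders, not the real constants.
EXAMPLE_INDEXES = {
    "readable_id": "keyword",
    "platform": "keyword",
    "key": "keyword",
}


def plan_index_changes(configured, existing):
    """Return (to_create, to_drop) so that `existing` ends up matching `configured`.

    `configured` maps field name -> schema type (the desired state).
    `existing` maps field name -> schema type currently indexed in the collection.
    A field whose schema type differs is dropped and then recreated.
    """
    to_create = {}
    to_drop = []
    for field, schema in configured.items():
        if existing.get(field) != schema:
            to_create[field] = schema
            if field in existing:
                to_drop.append(field)  # wrong type: drop before recreating
    for field in existing:
        if field not in configured:
            to_drop.append(field)  # indexed but no longer configured
    return to_create, sorted(to_drop)


# Hypothetical current state of a collection with extra, unused indexes.
existing = {"readable_id": "keyword", "run_title": "text", "edx_module_id": "keyword"}
to_create, to_drop = plan_index_changes(EXAMPLE_INDEXES, existing)
print(to_create)  # {'platform': 'keyword', 'key': 'keyword'}
print(to_drop)    # ['edx_module_id', 'run_title']
```

With qdrant-client, each planned change could then be applied through `QdrantClient.create_payload_index` and `QdrantClient.delete_payload_index`; keeping the planning step pure makes it easy to review before touching a live cluster.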

This PR also contains a small bugfix for an issue where learning resource responses were not sorted by score.
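
Conceptually, the fix means ordering hits by descending similarity score before building the response. A minimal sketch, assuming each hit is a dict with a numeric "score" key; `sort_by_score` and the data shape are hypothetical and not the code changed in this PR.

```python
def sort_by_score(hits):
    """Return hits ordered from highest to lowest score.

    sorted() is stable even with reverse=True, so hits with equal
    scores keep their original relative order.
    """
    return sorted(hits, key=lambda hit: hit["score"], reverse=True)


hits = [
    {"id": "a", "score": 0.41},
    {"id": "b", "score": 0.87},
    {"id": "c", "score": 0.63},
]
print([hit["id"] for hit in sort_by_score(hits)])  # ['b', 'c', 'a']
```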

How can this be tested?

I have validated the live performance impact on the RC Qdrant cluster. To test locally:

  1. Check out this branch.
  2. Restart celery.
  3. Run python manage.py generate_embeddings --courses
  4. Open the content files collection info in your local Qdrant dashboard and look at the "payload" list. Ensure that the fields match those configured in the constants file (vector_search/constants.py).
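
Step 4 can also be checked programmatically. The sketch below assumes the parsed JSON body of Qdrant's REST call GET /collections/{collection_name}, where indexed fields appear as keys of result["payload_schema"]; the helper name and the field names in the sample body are hypothetical.

```python
import json


def payload_index_mismatch(collection_info, configured_fields):
    """Compare indexed payload fields in a Qdrant collection with the configured set.

    `collection_info` is the parsed JSON body of GET /collections/{collection_name};
    indexed fields are the keys of result["payload_schema"].
    Returns (missing, unexpected): configured fields that are not indexed,
    and indexed fields that are not configured.
    """
    indexed = set(collection_info["result"].get("payload_schema", {}))
    configured = set(configured_fields)
    return sorted(configured - indexed), sorted(indexed - configured)


# Sample response body, trimmed to the relevant part (field names are hypothetical).
body = json.loads("""
{"result": {"payload_schema": {
    "readable_id": {"data_type": "keyword", "points": 120},
    "platform": {"data_type": "keyword", "points": 120}
}}, "status": "ok"}
""")
missing, unexpected = payload_index_mismatch(body, ["readable_id", "platform", "key"])
print(missing, unexpected)  # ['key'] []
```

Both lists should be empty after regenerating embeddings on this branch.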

Checklist:

  • Double-check that no calls to our vector search endpoint /api/v0/vector_content_files_search/ use any of the fields removed in this PR as a query parameter.
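
One way to do this check is to scan logged request URLs for the endpoint and flag any query parameter that matches a removed field. A minimal sketch using the standard library, assuming you have request URLs available (for example from access logs); `find_removed_field_usage` and the removed field names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

VECTOR_SEARCH_PATH = "/api/v0/vector_content_files_search/"


def find_removed_field_usage(urls, removed_fields):
    """Return {url: sorted removed params} for vector search calls using removed fields."""
    removed = set(removed_fields)
    hits = {}
    for url in urls:
        parts = urlsplit(url)
        if parts.path != VECTOR_SEARCH_PATH:
            continue  # only the vector search endpoint matters here
        params = parse_qs(parts.query, keep_blank_values=True)
        used = removed & set(params)
        if used:
            hits[url] = sorted(used)
    return hits


urls = [
    "/api/v0/vector_content_files_search/?q=loops&platform=ocw",
    "/api/v0/vector_content_files_search/?q=loops&run_title=Fall",
    "/api/v1/other/?run_title=Fall",
]
print(find_removed_field_usage(urls, ["run_title"]))
# {'/api/v0/vector_content_files_search/?q=loops&run_title=Fall': ['run_title']}
```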

shanbady added the Needs Review label Apr 2, 2026
shanbady marked this pull request as ready for review April 2, 2026 14:54
Copilot AI review requested due to automatic review settings April 2, 2026 14:54
Comment thread vector_search/constants.py

Copilot AI left a comment

Pull request overview

This PR aims to reduce Qdrant storage/memory overhead by removing unused payload indexes for the content_files embeddings collection, keeping only the fields intended for frequent filtering/faceting.

Changes:

  • Removed multiple fields from QDRANT_CONTENT_FILE_INDEXES to reduce payload index footprint.
  • Added an inline note to encourage intentional selection of indexed fields.

Comment thread vector_search/constants.py
Comment thread vector_search/constants.py
Comment thread vector_search/constants.py

github-actions bot commented Apr 2, 2026

OpenAPI Changes

5 changes: 0 errors, 5 warnings, 0 info

Unexpected changes? Ensure your branch is up-to-date with main (consider rebasing).

Comment thread vector_search/constants.py
Comment thread vector_search/tasks.py Outdated
Comment thread vector_search/constants.py
abeglova self-assigned this Apr 6, 2026

abeglova left a comment

👍

shanbady merged commit 15d7683 into main Apr 6, 2026
14 checks passed
shanbady deleted the shanbady/remove-unused-qdrant-payloads branch April 6, 2026 19:27
odlbot mentioned this pull request Apr 6, 2026

Labels

Needs Review An open Pull Request that is ready for review

3 participants