Skip to content

vector learning resources sortby/sorting support#3228

Merged
shanbady merged 11 commits into
mainfrom
shanbady/vector-search-sortby
Apr 23, 2026
Merged

vector learning resources sortby/sorting support#3228
shanbady merged 11 commits into
mainfrom
shanbady/vector-search-sortby

Conversation

@shanbady
Copy link
Copy Markdown
Contributor

@shanbady shanbady commented Apr 21, 2026

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/10924

Description (What does it do?)

This PR adds a "sortby" parameter for the vector learning resources search endpoint so we can sort by different fields as we do with the regular opensearch endpoints.

How can this be tested?

  1. checkout this branch
  2. generate embeddings for just courses via python manage.py generate_embeddings --courses --skip-contentfiles (run this even if you have some embedded since there are new indexes)
  3. create and login as a superuser
  4. visit your local search interface
  5. in the "admin options" make sure hybrid search is enabled
  6. use the "sort" dropdown to sort. the results should be ordered appropriately

Additional Context

I made a deliberate decision to only be able to sort by actual field names of payloads we index rather than alias things like "popular"== "sort by views descending" and "upcoming" == "sort by next_start_date" etc. we leave this translation on the frontend since doing this on the backend adds unnecessary complexity.

At the end of the day, the sort by mapping to payload looks like this:
best -> This is the default order (by relevance) qdrant returns
popular -> Sort by "views" descending
upcoming -> Sort by next_start_date (ascending)
recently added -> Sort by "created_on" date

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 21, 2026

OpenAPI Changes

2 changes: 0 error, 1 warning, 1 info

View full changelog

Unexpected changes? Ensure your branch is up-to-date with main (consider rebasing).

@shanbady shanbady marked this pull request as ready for review April 21, 2026 21:11
Copilot AI review requested due to automatic review settings April 21, 2026 21:11
@shanbady shanbady added the Needs Review An open Pull Request that is ready for review label Apr 21, 2026
@shanbady shanbady changed the title Shanbady/vector search sortby vector learning resources sortby/sorting support Apr 21, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds backend + frontend support for a sortby query parameter on the Qdrant-backed vector learning-resources search endpoint, enabling sorting by selected indexed payload fields (e.g., views, created_on, next_start_date) and propagating the contract through OpenAPI and generated clients.

Changes:

  • Add sortby request validation and pass-through into Qdrant query/scroll execution for vector learning-resources search.
  • Extend Qdrant resource payload indexing/mapping to include sortable fields and add test coverage for order-by formatting/behavior.
  • Update OpenAPI + generated TS client types and map UI sort options to the vector API’s sortby values.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
vector_search/views.py Implements order-by handling in Qdrant query/scroll paths and refactors search execution helpers.
vector_search/views_test.py Adds tests covering _format_order_by and sortby propagation into Qdrant calls.
vector_search/serializers.py Adds sortby to learning-resources vector search request; removes sortby from content-files request.
vector_search/constants.py Adds sortable fields to resource param map/indexes and introduces computed sort-field allowlists.
openapi/specs/v0.yaml Documents sortby for vector learning-resources search and updates related enums; removes sortby from content-files endpoint docs.
frontends/main/src/page-components/SearchDisplay/SearchDisplay.tsx Maps UI sort dropdown values to vector API sortby values.
frontends/api/src/generated/v0/api.ts Regenerates API client to include vector learning-resources sortby and remove content-files sortby.

Comment thread openapi/specs/v0.yaml
Comment thread vector_search/views.py Outdated
Comment thread vector_search/constants.py
Comment thread vector_search/serializers.py
Comment on lines 169 to 171
aggregation_choices = [
(key, key.replace("_", " ").title()) for key in QDRANT_RESOURCE_PARAM_MAP
]
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding next_start_date, views, and created_on to QDRANT_RESOURCE_PARAM_MAP implicitly makes them available as aggregations choices (and updates the OpenAPI enum). Faceting on datetime/high-cardinality fields can be expensive and often isn’t meaningful—if these fields are only intended for sorting, consider excluding them from aggregation_choices (e.g., maintain a separate aggregation allowlist) so the API surface stays tight.

Copilot uses AI. Check for mistakes.
Comment on lines +50 to +52
"next_start_date": "next_start_date",
"views": "views",
"created_on": "created_on",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Adding fields to QDRANT_RESOURCE_PARAM_MAP for sorting unintentionally exposes them as aggregation options. Aggregating on these non-KEYWORD fields will cause the entire search request to fail.
Severity: HIGH

Suggested Fix

Separate the list of fields available for sorting from those available for aggregation. Create a new constant for sortable fields that includes 'next_start_date', 'views', and 'created_on', and ensure aggregation_choices in the serializer is derived from a list containing only KEYWORD-indexed fields.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: vector_search/constants.py#L50-L52

Potential issue: Adding 'next_start_date', 'views', and 'created_on' to
`QDRANT_RESOURCE_PARAM_MAP` makes them available as choices for the `aggregations`
parameter. The `async_qdrant_aggregations` function uses Qdrant's `facet()` method,
which is only compatible with `KEYWORD`-indexed fields. Since these new fields are
indexed as `INTEGER` or `DATETIME`, requesting an aggregation on any of them will cause
Qdrant to return an error. This error is not handled gracefully and will cause the
entire vector search API request to fail with a 4xx response.

Did we get this right? 👍 / 👎 to inform future reviews.

@ChristopherChudzicki ChristopherChudzicki self-assigned this Apr 22, 2026
Copy link
Copy Markdown
Contributor

@ChristopherChudzicki ChristopherChudzicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a deliberate decision to only be able to sort by actual field names of payloads we index rather than alias things like "popular"== "sort by views descending" and "upcoming" == "sort by next_start_date" etc. we leave this translation on the frontend since doing this on the backend adds unnecessary complexity.

Makes sense 👍

Sorting & Cutoffs: I think this is a separate issue, but if it's not tracked, we may want to track it somewhere... Consider these queries:

IIRC, our opensearch endpoint has a relevance cutoff. So if you search for "linear algebra", totally irrelevant things don't show up in the results.

When totally irrelevant things DO show up, it screws up the sorting. When I sort by "Popularity", I end with results that are popular but completely irrelevant to my search term.

Should we add a cutoff for the vector search, too? (I'm guessing we don't have one, based on results above).

One other question

Comment thread frontends/main/src/page-components/SearchDisplay/SearchDisplay.tsx Outdated
*/
const toVectorSearchParams = (
params: ReturnType<typeof getSearchParams>,
params: ReturnType<typeof getSearchParams> & { sortby?: string },
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this type change is unnecessary, it's already in the returntype of getSearchParams

}


QDRANT_CONTENT_FILES_SORTBY_FIELDS = [
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shanbady Did you mean to add this? I saw below you said ContentFileVectorSearchRequestSerializer doesn't support sorting by design.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea we want to support sorting on INTEGERFIELD, FLOATFIELD (contentfile vector endpoint doesnt index any at the moment)

@shanbady shanbady merged commit 15423c5 into main Apr 23, 2026
14 checks passed
@shanbady shanbady deleted the shanbady/vector-search-sortby branch April 23, 2026 15:43
This was referenced Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Needs Review An open Pull Request that is ready for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants