vector learning resources sortby/sorting support#3228
Conversation
OpenAPI Changes2 changes: 0 error, 1 warning, 1 info Unexpected changes? Ensure your branch is up-to-date with |
There was a problem hiding this comment.
Pull request overview
Adds backend + frontend support for a sortby query parameter on the Qdrant-backed vector learning-resources search endpoint, enabling sorting by selected indexed payload fields (e.g., views, created_on, next_start_date) and propagating the contract through OpenAPI and generated clients.
Changes:
- Add
sortbyrequest validation and pass-through into Qdrant query/scroll execution for vector learning-resources search. - Extend Qdrant resource payload indexing/mapping to include sortable fields and add test coverage for order-by formatting/behavior.
- Update OpenAPI + generated TS client types and map UI sort options to the vector API’s
sortbyvalues.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| vector_search/views.py | Implements order-by handling in Qdrant query/scroll paths and refactors search execution helpers. |
| vector_search/views_test.py | Adds tests covering _format_order_by and sortby propagation into Qdrant calls. |
| vector_search/serializers.py | Adds sortby to learning-resources vector search request; removes sortby from content-files request. |
| vector_search/constants.py | Adds sortable fields to resource param map/indexes and introduces computed sort-field allowlists. |
| openapi/specs/v0.yaml | Documents sortby for vector learning-resources search and updates related enums; removes sortby from content-files endpoint docs. |
| frontends/main/src/page-components/SearchDisplay/SearchDisplay.tsx | Maps UI sort dropdown values to vector API sortby values. |
| frontends/api/src/generated/v0/api.ts | Regenerates API client to include vector learning-resources sortby and remove content-files sortby. |
| aggregation_choices = [ | ||
| (key, key.replace("_", " ").title()) for key in QDRANT_RESOURCE_PARAM_MAP | ||
| ] |
There was a problem hiding this comment.
Adding next_start_date, views, and created_on to QDRANT_RESOURCE_PARAM_MAP implicitly makes them available as aggregations choices (and updates the OpenAPI enum). Faceting on datetime/high-cardinality fields can be expensive and often isn’t meaningful—if these fields are only intended for sorting, consider excluding them from aggregation_choices (e.g., maintain a separate aggregation allowlist) so the API surface stays tight.
| "next_start_date": "next_start_date", | ||
| "views": "views", | ||
| "created_on": "created_on", |
There was a problem hiding this comment.
Bug: Adding fields to QDRANT_RESOURCE_PARAM_MAP for sorting unintentionally exposes them as aggregation options. Aggregating on these non-KEYWORD fields will cause the entire search request to fail.
Severity: HIGH
Suggested Fix
Separate the list of fields available for sorting from those available for aggregation. Create a new constant for sortable fields that includes 'next_start_date', 'views', and 'created_on', and ensure aggregation_choices in the serializer is derived from a list containing only KEYWORD-indexed fields.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: vector_search/constants.py#L50-L52
Potential issue: Adding 'next_start_date', 'views', and 'created_on' to
`QDRANT_RESOURCE_PARAM_MAP` makes them available as choices for the `aggregations`
parameter. The `async_qdrant_aggregations` function uses Qdrant's `facet()` method,
which is only compatible with `KEYWORD`-indexed fields. Since these new fields are
indexed as `INTEGER` or `DATETIME`, requesting an aggregation on any of them will cause
Qdrant to return an error. This error is not handled gracefully and will cause the
entire vector search API request to fail with a 4xx response.
Did we get this right? 👍 / 👎 to inform future reviews.
ChristopherChudzicki
left a comment
There was a problem hiding this comment.
I made a deliberate decision to only be able to sort by actual field names of payloads we index rather than alias things like "popular"== "sort by views descending" and "upcoming" == "sort by next_start_date" etc. we leave this translation on the frontend since doing this on the backend adds unnecessary complexity.
Makes sense 👍
Sorting & Cutoffs: I think this is a separate issue, but if it's not tracked, we may want to track it somewhere... Consider these queries:
- http://open.odl.local:8062/search?q=linear+algebra
- 36 results for me locally
- vs http://open.odl.local:8062/search?q=linear+algebra&vector_search=true
- 3000+ results for me locally
IIRC, our opensearch endpoint has a relevance cutoff. So if you search for "linear algebra", totally irrelevant things don't show up in the results.
When totally irrelevant things DO show up, it screws up the sorting. When I sort by "Popularity", I end with results that are popular but completely irrelevant to my search term.
Should we add a cutoff for the vector search, too? (I'm guessing we don't have one, based on results above).
One other question
| */ | ||
| const toVectorSearchParams = ( | ||
| params: ReturnType<typeof getSearchParams>, | ||
| params: ReturnType<typeof getSearchParams> & { sortby?: string }, |
There was a problem hiding this comment.
this type change is unnecessary, it's already in the returntype of getSearchParams
| } | ||
|
|
||
|
|
||
| QDRANT_CONTENT_FILES_SORTBY_FIELDS = [ |
There was a problem hiding this comment.
@shanbady Did you mean to add this? I saw below you said ContentFileVectorSearchRequestSerializer doesn't support sorting by design.
There was a problem hiding this comment.
yea we want to support sorting on INTEGERFIELD, FLOATFIELD (contentfile vector endpoint doesnt index any at the moment)
What are the relevant tickets?
Closes https://github.com/mitodl/hq/issues/10924
Description (What does it do?)
This PR adds a "sortby" parameter for the vector learning resources search endpoint so we can sort by different fields as we do with the regular opensearch endpoints.
How can this be tested?
python manage.py generate_embeddings --courses --skip-contentfiles(run this even if you have some embedded since there are new indexes)Additional Context
I made a deliberate decision to only be able to sort by actual field names of payloads we index rather than alias things like "popular"== "sort by views descending" and "upcoming" == "sort by next_start_date" etc. we leave this translation on the frontend since doing this on the backend adds unnecessary complexity.
At the end of the day, the sort by mapping to payload looks like this:
best -> This is the default order (by relevance) qdrant returns
popular -> Sort by "views" descending
upcoming -> Sort by next_start_date (ascending)
recently added -> Sort by "created_on" date