@abeglova abeglova (Contributor) commented Sep 4, 2024

What are the relevant tickets?

closes https://github.com/mitodl/hq/issues/5351

Description (What does it do?)

Currently OpenSearch calculates term frequencies per index (more precisely, per shard), which means that scores are not consistent across learning resource types, and resource types with small indexes (programs) are penalized.
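To illustrate why a small index can be penalized, here is a minimal sketch of the BM25 inverse document frequency (IDF) formula Lucene uses, `ln(1 + (N - n + 0.5) / (n + 0.5))`, where `N` is the document count and `n` the number of documents containing the term. The counts below are hypothetical, chosen only to assume that the programs index is small and that a larger share of its documents match the query term. "Pooled" approximates what the dfs_query_then_fetch pre-query does: it scores every index with statistics gathered across all of them.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Lucene-style BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for one query term in two separate indexes.
COURSES = {"doc_count": 10_000, "doc_freq": 500}
PROGRAMS = {"doc_count": 200, "doc_freq": 40}

# Default behaviour: each index is scored with only its own statistics.
local_course_idf = bm25_idf(**COURSES)
local_program_idf = bm25_idf(**PROGRAMS)

# DFS-style behaviour: statistics are pooled across every index in the query.
pooled_idf = bm25_idf(
    COURSES["doc_count"] + PROGRAMS["doc_count"],
    COURSES["doc_freq"] + PROGRAMS["doc_freq"],
)

print(f"course (local):  {local_course_idf:.2f}")
print(f"program (local): {local_program_idf:.2f}")
print(f"pooled:          {pooled_idf:.2f}")
```

With these made-up numbers, pooling raises the IDF applied to program matches from about 1.6 to about 2.9, close to what courses already get, so a program and a course with similar text would score similarly.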

Setting search_type=dfs_query_then_fetch makes OpenSearch run a pre-query that collects term frequencies from all the indexes used in the query, so programs are given more reasonable scores. However, the documentation warns that this mode is off by default because it is slower. For now I am adding use_dfs_query_then_fetch as a parameter so we can test performance on RC before committing to the change. I have not noticed performance issues locally.
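A minimal sketch of how a boolean request flag could be mapped onto the OpenSearch `search_type` query parameter of a multi-index `_search` request. The helper names (`parse_bool_param`, `build_search_path`) and the index names are hypothetical, not taken from this PR; with opensearch-py the same value could instead be passed as the `search_type` keyword of `search()`.

```python
from urllib.parse import urlencode


def parse_bool_param(value) -> bool:
    """Interpret query-string flags such as "True", "true" or "1" (hypothetical helper)."""
    return str(value).strip().lower() in {"true", "1", "yes"}


def build_search_path(indexes, use_dfs_query_then_fetch=False) -> str:
    """Build a REST path that searches several indexes in one request."""
    path = "/" + ",".join(indexes) + "/_search"
    if use_dfs_query_then_fetch:
        # Ask OpenSearch to gather term statistics from every shard before scoring.
        path += "?" + urlencode({"search_type": "dfs_query_then_fetch"})
    return path


# e.g. ?use_dfs_query_then_fetch=True from the test URLs in this PR
flag = parse_bool_param("True")
print(build_search_path(["course", "program"], flag))
```

Leaving the flag off keeps the default query_then_fetch behaviour, which is what makes it cheap to compare both modes on RC.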

If this doesn't work, we will probably need to get rid of the resource-specific OpenSearch indexes and store all learning resources in one index. We might want to do that anyway: it will make queries simpler, which might also make them faster. We had separate indexes in open-discussions because different resources had different data fields. Now that learning resources are standardized, there isn't really a good reason to have separate indexes per learning resource type.
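For comparison, a minimal sketch of what a query against a single combined index might look like, assuming each document stores a hypothetical `resource_type` field and searchable `title`/`description` fields. Resource types become a `terms` filter inside a `bool` query; filter clauses do not affect scoring, and (within a single-shard index) all resource types share the same term statistics.

```python
def build_single_index_query(text, resource_types=None):
    """Build a query body for one combined learning-resource index (field names are hypothetical)."""
    query = {
        "bool": {
            "must": [
                {"multi_match": {"query": text, "fields": ["title^3", "description"]}}
            ]
        }
    }
    if resource_types:
        # Restrict results without changing how the matching documents are scored.
        query["bool"]["filter"] = [{"terms": {"resource_type": list(resource_types)}}]
    return {"query": query}


print(build_single_index_query("machine learning", ["program"]))
```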

How can this be tested?

Go to http://open.odl.local:8062/search/?q=Machine+Learning&resource_category=program and verify that you see "Professional Certificate Program in Machine Learning & Artificial Intelligence" and "Machine Learning, Modeling, and Simulation: Engineering Problem-Solving in the Age of AI". If you don't, run the backpopulate commands to populate them.

Go to http://open.odl.local:8062/search/?q=Machine+Learning. You will not see programs in the first few pages. For me, "Professional Certificate Program in Machine Learning & Artificial Intelligence" is on the third page and "Machine Learning, Modeling, and Simulation: Engineering Problem-Solving in the Age of AI" is not in the first 8 pages.

Go to http://open.odl.local:8062/search/?q=machine+learning&use_dfs_query_then_fetch=True. Verify that you see the programs in the results. For me, "Professional Certificate Program in Machine Learning & Artificial Intelligence" is on page one and "Machine Learning, Modeling, and Simulation: Engineering Problem-Solving in the Age of AI" is on page two.
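When comparing the two URLs above, it can help to turn a result position into a page number. A minimal sketch, assuming the ranked titles have already been collected into a list; the helper name and the default page size are hypothetical, so set `page_size` to whatever the search page actually shows.

```python
def result_page(titles, target, page_size=20):
    """Return the 1-based page on which `target` appears, or None if it is absent."""
    try:
        position = titles.index(target)
    except ValueError:
        return None
    # Integer division maps positions 0..page_size-1 to page 1, and so on.
    return position // page_size + 1


ranked = ["Course A", "Course B", "Program C"]
print(result_page(ranked, "Program C", page_size=2))
```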

@abeglova abeglova force-pushed the ab/dfs_query_then_fetch branch from 4da5d95 to f92a62a September 5, 2024 14:23
@abeglova abeglova marked this pull request as ready for review September 5, 2024 15:37
@mbertrand mbertrand self-assigned this Sep 5, 2024
@mbertrand mbertrand (Member) left a comment

"Machine Learning, Modeling, and Simulation: Engineering Problem-Solving in the Age of AI" went from page 4 to page 3.

"Professional Certificate Program in Machine Learning" went from page 6 to 2.

👍

@abeglova abeglova force-pushed the ab/dfs_query_then_fetch branch from 0ba5660 to 74ff4e7 September 5, 2024 16:55
@abeglova abeglova merged commit 875b53c into main Sep 5, 2024
This was referenced Sep 6, 2024
@rhysyngsun rhysyngsun deleted the ab/dfs_query_then_fetch branch February 7, 2025 20:36
3 participants