-
Notifications
You must be signed in to change notification settings - Fork 3
OCW completeness penalty in search #1512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ChristopherChudzicki
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 worked as described. Left a few suggestions for consideration.
learning_resources_search/api.py
Outdated
| if ( | ||
| yearly_decay_percent | ||
| and yearly_decay_percent > 0 | ||
| and max_incompleteness_penalty | ||
| and max_incompleteness_penalty > 0 | ||
| ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can drop the inequalities—they're redundant.
On the whole, rather than having if A and B / elif A / elif B, could we do
source = "_score"
params = {}
if yearly_decay_percent:
source = f"{source} * {completeness_term}"
params["max_incompleteness_penalty"] = max_incompleteness_penalty
if yearly_decay_percent:
source = f"{source} * {staleness_term}"
params["decay"] = 1 - (yearly_decay_percent / 100)
params["offset"] = "0"
params["scale"] = "354d"
params["origin"] = datetime.now(tz=UTC).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
script_query["script_score"]["script"] = { "source": source, "params": params }| staleness_term = ( | ||
| "decayDateLinear(params.origin, params.scale, params.offset, params.decay," | ||
| " doc['resource_age_date'].value)" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below, we have a ternary for the staleness term based on doc['resource_age_date'].size(). Do we need that? I would assume decayDateLinear would be the appropriate value (1?) when doc['resource_age_date'].value is zero. So dropping the ternary shouldnt affect the value, right?
If we really do need the ternary, maybe we could put it here in the definition of staleness_term:
staleness_term = (
"("
" doc['resource_age_date'].size() == 0 ? 1 :"
" decayDateLinear(params.origin, params.scale, params.offset, params.decay,"
" doc['resource_age_date'].value)"
")"
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need the tertiary. resource_age_date is a date or null, not a number and decayDateLinear throws an error if doc['resource_age_date'].value is null. resource_age_date is null for resources with runs in the future. Checking for null with .size() is weird, but that's how opensearch painless scripts work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move it to the staleness_term
learning_resources_search/api.py
Outdated
|
|
||
| yearly_decay_percent = search_params.get("yearly_decay_percent") | ||
| min_score = search_params.get("min_score") | ||
| max_incompleteness_penalty = search_params.get("max_incompleteness_penalty") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opinion: I think all the formulas etc are simpler with 0 to 1 values rather than percentages. Could consider doing
max_incompleteness_penalty = search_params.get("max_incompleteness_penalty", 0) / 100
learning_resources_search/api.py
Outdated
| source = "_score" | ||
| params = {} | ||
|
|
||
| if max_incompleteness_penalty and max_incompleteness_penalty > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if max_incompleteness_penalty and max_incompleteness_penalty > 0: | |
| if max_incompleteness_penalty: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👆 here and elsewhere
5e7f0e4 to
2d1b396
Compare
2d1b396 to
00dd4fd
Compare
What are the relevant tickets?
closes https://github.com/mitodl/hq/issues/5259
Description (What does it do?)
This pr adds a slider to the admin dashboard that allows users to set the
max_incompleteness_penaltyto reduce the prominence of incomplete ocw courses in search results. Defaultmax_incompleteness_penaltyis set to zero for now because we need to recreate the index to add completeness and a non-zero default will break search for non-admin users while we are waiting for the reindex to finish.The score after the completeness adjustmet is given by
Where
completenessis a decimal from 0 to 1and
max_incompleteness_penaltyis a percent from 0 to 100Completeness was added in this PR #1461
How can this be tested?
Before you do anything else, verify that search works normally for non-admins (or non logged in users) and also for admins as long as max_incompleteness_penalty is not set
Run
./manage.py backpopulate_ocw_scores
./manage.py backpopulate_ocw_data --course-name sp-248-neet-ways-of-thinking-fall-2023
./manage.py recreate_index
Go to http://open.odl.local:8062/search/?q=ways+of+thinking
"NEET Ways of Thinking", an ocw course with low completeness will be the first result
Log in as an admin
In the admin section of the search params you should see the "Maximum Incompleteness Penalty" slider. As you increase the incompleteness penalty, "NEET Ways of Thinking" should move down in the results