OCW completeness penalty in search #1512

abeglova · 2024-09-03T22:13:30Z

What are the relevant tickets?

closes https://github.com/mitodl/hq/issues/5259

Description (What does it do?)

This PR adds a slider to the admin dashboard that lets users set max_incompleteness_penalty, which reduces the prominence of incomplete OCW courses in search results. The default max_incompleteness_penalty is zero for now: the index has to be recreated to add completeness, and a non-zero default would break search for non-admin users while the reindex is running.

The score after the completeness adjustment is given by

relevance_score * (completeness * max_incompleteness_penalty / 100 + (100 - max_incompleteness_penalty) / 100)

where completeness is a decimal from 0 to 1
and max_incompleteness_penalty is a percent from 0 to 100.
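
To make the formula concrete, here is a minimal Python sketch of the adjustment. The function name `adjusted_score` is hypothetical and only for illustration; the real adjustment runs inside the search engine's scoring script, not in Python.

```python
import math


def adjusted_score(
    relevance_score: float,
    completeness: float,
    max_incompleteness_penalty: float,
) -> float:
    """Apply the completeness adjustment from the PR description.

    completeness is a decimal from 0 to 1;
    max_incompleteness_penalty is a percent from 0 to 100.
    """
    penalty = max_incompleteness_penalty / 100
    return relevance_score * (completeness * penalty + (1 - penalty))


# A penalty of 0 leaves the score unchanged, whatever the completeness.
assert adjusted_score(10.0, 0.25, 0) == 10.0
# A fully complete course is never penalized.
assert math.isclose(adjusted_score(10.0, 1.0, 40), 10.0)
# With a 100% penalty the score scales directly with completeness.
assert math.isclose(adjusted_score(10.0, 0.25, 100), 2.5)
```

So max_incompleteness_penalty caps how much of the score an entirely incomplete course can lose: at 40, a course with completeness 0 keeps 60% of its relevance score.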

Completeness was added in PR #1461.

How can this be tested?

Before you do anything else, verify that search works normally for non-admins (or users who are not logged in), and also for admins as long as max_incompleteness_penalty is not set.

Run
./manage.py backpopulate_ocw_scores
./manage.py backpopulate_ocw_data --course-name sp-248-neet-ways-of-thinking-fall-2023
./manage.py recreate_index

Go to http://open.odl.local:8062/search/?q=ways+of+thinking
"NEET Ways of Thinking", an OCW course with low completeness, will be the first result.

Log in as an admin
In the admin section of the search params you should see the "Maximum Incompleteness Penalty" slider. As you increase the incompleteness penalty, "NEET Ways of Thinking" should move down in the results

ChristopherChudzicki

👍 worked as described. Left a few suggestions for consideration.

ChristopherChudzicki · 2024-09-04T15:14:09Z

learning_resources_search/api.py

+        if (
+            yearly_decay_percent
+            and yearly_decay_percent > 0
+            and max_incompleteness_penalty
+            and max_incompleteness_penalty > 0
+        ):


We can drop the inequalities; they're redundant.

On the whole, rather than having if A and B / elif A / elif B, could we do

    source = "_score"
    params = {}

    if max_incompleteness_penalty:
        source = f"{source} * {completeness_term}"
        params["max_incompleteness_penalty"] = max_incompleteness_penalty

    if yearly_decay_percent:
        source = f"{source} * {staleness_term}"
        params["decay"] = 1 - (yearly_decay_percent / 100)
        params["offset"] = "0"
        params["scale"] = "354d"
        params["origin"] = datetime.now(tz=UTC).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

    script_query["script_score"]["script"] = {
        "source": source,
        "params": params,
    }

ChristopherChudzicki · 2024-09-04T15:14:13Z

learning_resources_search/api.py

+        staleness_term = (
+            "decayDateLinear(params.origin, params.scale, params.offset, params.decay,"
+            " doc['resource_age_date'].value)"
+        )


Below, we have a ternary for the staleness term based on doc['resource_age_date'].size(). Do we need that? I would assume decayDateLinear would return the appropriate value (1?) when doc['resource_age_date'].value is zero. So dropping the ternary shouldn't affect the value, right?

If we really do need the ternary, maybe we could put it here in the definition of staleness_term:

    staleness_term = (
        "("
        " doc['resource_age_date'].size() == 0 ? 1 :"
        " decayDateLinear(params.origin, params.scale, params.offset, params.decay,"
        " doc['resource_age_date'].value)"
        ")"
    )

We do need the ternary. resource_age_date is a date or null, not a number, and decayDateLinear throws an error if doc['resource_age_date'].value is null. resource_age_date is null for resources with runs in the future. Checking for null with .size() is weird, but that's how OpenSearch Painless scripts work.

I can move it to the staleness_term
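
For reference, a self-contained sketch of how the two suggestions in this thread could fit together: each adjustment appends its own term to the script source, and the null guard lives inside the staleness term. This is an illustration rather than the code that was merged. The function name `build_script`, the `COMPLETENESS_TERM` expression and the `completeness` field name are assumptions, as is passing the penalty to the script as a 0-to-1 fraction.

```python
from datetime import datetime, timezone

# Assumed Painless expression for the completeness adjustment; the field name
# `completeness` and the exact expression are illustrative, not from the PR.
COMPLETENESS_TERM = (
    "(doc['completeness'].value * params.max_incompleteness_penalty"
    " + (1 - params.max_incompleteness_penalty))"
)

# Staleness term with the null guard folded in, as proposed in the thread.
STALENESS_TERM = (
    "(doc['resource_age_date'].size() == 0 ? 1 :"
    " decayDateLinear(params.origin, params.scale, params.offset, params.decay,"
    " doc['resource_age_date'].value))"
)


def build_script(yearly_decay_percent=None, max_incompleteness_penalty=None):
    """Return the script_score "script" dict, or None if nothing applies."""
    source = "_score"
    params = {}

    if max_incompleteness_penalty:
        source = f"{source} * {COMPLETENESS_TERM}"
        # Converted to a 0-to-1 fraction here (an assumption of this sketch).
        params["max_incompleteness_penalty"] = max_incompleteness_penalty / 100

    if yearly_decay_percent:
        source = f"{source} * {STALENESS_TERM}"
        params["decay"] = 1 - (yearly_decay_percent / 100)
        params["offset"] = "0"
        params["scale"] = "354d"
        now = datetime.now(tz=timezone.utc)
        params["origin"] = now.strftime("%Y-%m-%dT%H:%M:%S.%fZ")

    if not params:
        return None
    return {"source": source, "params": params}


script = build_script(yearly_decay_percent=2.5, max_incompleteness_penalty=50)
print(script["source"])
```

Because each branch only appends its own factor and parameters, the three-way if A and B / elif A / elif B structure is no longer needed, and a third adjustment would be one more `if` block.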

ChristopherChudzicki · 2024-09-04T15:21:08Z

learning_resources_search/api.py


    yearly_decay_percent = search_params.get("yearly_decay_percent")
    min_score = search_params.get("min_score")
+    max_incompleteness_penalty = search_params.get("max_incompleteness_penalty")


Opinion: I think all the formulas etc are simpler with 0 to 1 values rather than percentages. Could consider doing

max_incompleteness_penalty = search_params.get("max_incompleteness_penalty", 0) / 100
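
A small sketch of what the 0-to-1 version might look like. The helper names `penalty_fraction` and `completeness_multiplier` are hypothetical. One caveat with the one-liner above: `.get(key, 0)` only falls back to 0 when the key is absent, so a key that is present with value None would make the division raise a TypeError. The sketch uses `or 0` to cover that case, on the assumption that search_params can carry None values.

```python
def penalty_fraction(search_params: dict) -> float:
    """Return max_incompleteness_penalty as a 0-to-1 fraction."""
    # `or 0` covers both a missing key and a key whose value is None.
    percent = search_params.get("max_incompleteness_penalty") or 0
    return percent / 100


def completeness_multiplier(completeness: float, penalty: float) -> float:
    """Score multiplier with both inputs on a 0-to-1 scale.

    Algebraically the same as completeness * penalty + (1 - penalty).
    """
    return 1 - penalty * (1 - completeness)


print(penalty_fraction({"max_incompleteness_penalty": 25}))
print(completeness_multiplier(0.5, penalty_fraction({})))
```

With fractions, the multiplier reads directly as "lose up to `penalty` of the score, in proportion to how incomplete the course is".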

ChristopherChudzicki · 2024-09-04T18:33:50Z

learning_resources_search/api.py

+        source = "_score"
+        params = {}
+
+        if max_incompleteness_penalty and max_incompleteness_penalty > 0:


Suggested change

-        if max_incompleteness_penalty and max_incompleteness_penalty > 0:
+        if max_incompleteness_penalty:

👆 here and elsewhere

abeglova marked this pull request as ready for review September 4, 2024 12:28

abeglova changed the title ~~Ab/completeness discount~~ OCW completeness penalty in search Sep 4, 2024

ChristopherChudzicki self-assigned this Sep 4, 2024

ChristopherChudzicki approved these changes Sep 4, 2024

ChristopherChudzicki added the Waiting on author label Sep 4, 2024

ChristopherChudzicki approved these changes Sep 4, 2024

abeglova force-pushed the ab/completeness-discount branch from 5e7f0e4 to 2d1b396 Compare September 5, 2024 13:23

Add completeness discount to search

00dd4fd

abeglova force-pushed the ab/completeness-discount branch from 2d1b396 to 00dd4fd Compare September 5, 2024 13:53

abeglova merged commit 42d155f into main Sep 5, 2024

This was referenced Sep 5, 2024

Release 0.18.2 #1520

Closed

Release 0.18.2 #1523

Closed

Release 0.18.2 #1524

Merged

rhysyngsun deleted the ab/completeness-discount branch February 7, 2025 20:36
