@abeglova
Contributor

@abeglova abeglova commented Sep 3, 2024

What are the relevant tickets?

closes https://github.com/mitodl/hq/issues/5259

Description (What does it do?)

This PR adds a slider to the admin dashboard that lets users set max_incompleteness_penalty, which reduces the prominence of incomplete OCW courses in search results. The default max_incompleteness_penalty is zero for now: the index has to be recreated to add completeness, and a non-zero default would break search for non-admin users until the reindex finishes.

The score after the completeness adjustment is given by

relevance_score * (completeness * max_incompleteness_penalty / 100 + (100 - max_incompleteness_penalty) / 100)

where completeness is a decimal from 0 to 1
and max_incompleteness_penalty is a percentage from 0 to 100.
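
As a rough illustration, the adjustment can be read as a linear interpolation between "no penalty" (factor 1) and "scale by completeness" (factor equal to completeness). The function name below is hypothetical and the sketch runs in plain Python rather than in the search script itself:

```python
def apply_completeness_penalty(relevance_score, completeness, max_incompleteness_penalty):
    """Scale a relevance score by the completeness adjustment.

    completeness: decimal from 0 to 1.
    max_incompleteness_penalty: percentage from 0 to 100.
    """
    penalty = max_incompleteness_penalty / 100
    # penalty == 0 leaves the score unchanged;
    # penalty == 1 multiplies the score by completeness.
    return relevance_score * (completeness * penalty + (1 - penalty))


if __name__ == "__main__":
    for percent in (0, 50, 100):
        print(percent, apply_completeness_penalty(10.0, 0.5, percent))
```

With the penalty at 0 the score is untouched, and at 100 a course with completeness 0.5 keeps half of its relevance score.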

Completeness was added in PR #1461.

How can this be tested?

Before you do anything else, verify that search works normally for non-admins (or users who are not logged in), and also for admins as long as max_incompleteness_penalty is not set.

Run
./manage.py backpopulate_ocw_scores
./manage.py backpopulate_ocw_data --course-name sp-248-neet-ways-of-thinking-fall-2023
./manage.py recreate_index

Go to http://open.odl.local:8062/search/?q=ways+of+thinking
"NEET Ways of Thinking", an OCW course with low completeness, will be the first result.

Log in as an admin
In the admin section of the search params you should see the "Maximum Incompleteness Penalty" slider. As you increase the incompleteness penalty, "NEET Ways of Thinking" should move down in the results.

@abeglova abeglova marked this pull request as ready for review September 4, 2024 12:28
@abeglova abeglova changed the title from "Ab/completeness discount" to "OCW completeness penalty in search" Sep 4, 2024
@ChristopherChudzicki ChristopherChudzicki self-assigned this Sep 4, 2024
Contributor

@ChristopherChudzicki ChristopherChudzicki left a comment

👍 worked as described. Left a few suggestions for consideration.

Comment on lines 570 to 575
        if (
            yearly_decay_percent
            and yearly_decay_percent > 0
            and max_incompleteness_penalty
            and max_incompleteness_penalty > 0
        ):
Contributor

We can drop the inequalities—they're redundant.

On the whole, rather than having if A and B / elif A / elif B, could we do

        source = "_score"
        params = {}
        if max_incompleteness_penalty:
            source = f"{source} * {completeness_term}"
            params["max_incompleteness_penalty"] = max_incompleteness_penalty
        if yearly_decay_percent:
            source = f"{source} * {staleness_term}"
            params["decay"] = 1 - (yearly_decay_percent / 100)
            params["offset"] = "0"
            params["scale"] = "354d"
            params["origin"] = datetime.now(tz=UTC).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

        script_query["script_score"]["script"] = { "source": source, "params": params }
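
For reference, here is a self-contained sketch of that incremental approach, with each term gated on its own parameter. The function name `build_score_script`, the `completeness` field name, and the exact Painless text of the completeness term are assumptions for illustration; the PR defines its own terms.

```python
from datetime import datetime, timezone

# Assumed Painless expressions (illustrative only, not copied from the PR).
COMPLETENESS_TERM = (
    "((doc['completeness'].value * params.max_incompleteness_penalty"
    " + 100.0 - params.max_incompleteness_penalty) / 100.0)"
)
STALENESS_TERM = (
    "(doc['resource_age_date'].size() == 0 ? 1 :"
    " decayDateLinear(params.origin, params.scale, params.offset, params.decay,"
    " doc['resource_age_date'].value))"
)


def build_score_script(yearly_decay_percent=None, max_incompleteness_penalty=None):
    """Build the script dict by multiplying in only the terms that are enabled."""
    source = "_score"
    params = {}
    if max_incompleteness_penalty:
        source = f"{source} * {COMPLETENESS_TERM}"
        params["max_incompleteness_penalty"] = max_incompleteness_penalty
    if yearly_decay_percent:
        source = f"{source} * {STALENESS_TERM}"
        params["decay"] = 1 - (yearly_decay_percent / 100)
        params["offset"] = "0"
        params["scale"] = "354d"
        params["origin"] = datetime.now(tz=timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%S.%fZ"
        )
    return {"source": source, "params": params}


if __name__ == "__main__":
    print(build_score_script(yearly_decay_percent=10, max_incompleteness_penalty=50))
```

With neither parameter set, the script reduces to plain `_score` with empty params, so no separate branch is needed for that case.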

Comment on lines 565 to 567
        staleness_term = (
            "decayDateLinear(params.origin, params.scale, params.offset, params.decay,"
            " doc['resource_age_date'].value)"
        )
Contributor

@ChristopherChudzicki ChristopherChudzicki Sep 4, 2024

Below, we have a ternary for the staleness term based on doc['resource_age_date'].size(). Do we need that? I would assume decayDateLinear would return the appropriate value (1?) when doc['resource_age_date'].value is zero. So dropping the ternary shouldn't affect the value, right?

If we really do need the ternary, maybe we could put it here in the definition of staleness_term:

        staleness_term = (
            "("
            " doc['resource_age_date'].size() == 0 ? 1 :"
            " decayDateLinear(params.origin, params.scale, params.offset, params.decay,"
            " doc['resource_age_date'].value)"
            ")"
        )

Contributor Author

@abeglova abeglova Sep 4, 2024

We do need the ternary. resource_age_date is a date or null, not a number, and decayDateLinear throws an error if doc['resource_age_date'].value is null. resource_age_date is null for resources with runs in the future. Checking for null with .size() is weird, but that's how OpenSearch Painless scripts work.
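
To make the guarded behaviour concrete, here is a plain-Python imitation of the staleness term. It assumes decayDateLinear follows the usual linear decay definition (1 at the origin, `decay` at a distance of `scale`, floored at 0); the function names are hypothetical and this is not the Painless code itself.

```python
from datetime import datetime, timedelta, timezone


def linear_date_decay(origin, scale, offset, decay, value):
    """Linear decay: 1.0 at the origin, `decay` at distance `scale`, never below 0.

    Requires decay < 1, otherwise the slope width is undefined.
    """
    distance = max(timedelta(0), abs(value - origin) - offset)
    slope_width = scale / (1.0 - decay)  # distance at which the factor reaches 0
    return max(0.0, (slope_width - distance) / slope_width)


def staleness_factor(resource_age_date, origin, yearly_decay_percent):
    """Mirror the ternary: a missing date means no staleness penalty."""
    if resource_age_date is None:
        return 1.0
    return linear_date_decay(
        origin=origin,
        scale=timedelta(days=354),
        offset=timedelta(0),
        decay=1 - yearly_decay_percent / 100,
        value=resource_age_date,
    )


if __name__ == "__main__":
    now = datetime(2024, 9, 5, tzinfo=timezone.utc)
    print(staleness_factor(None, now, 10))
    print(staleness_factor(now - timedelta(days=354), now, 10))
```

A resource with no resource_age_date gets a factor of 1, which matches the `size() == 0 ? 1 : ...` branch of the ternary.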

Contributor Author

I can move it to the staleness_term


yearly_decay_percent = search_params.get("yearly_decay_percent")
min_score = search_params.get("min_score")
max_incompleteness_penalty = search_params.get("max_incompleteness_penalty")
Contributor

Opinion: I think all the formulas etc are simpler with 0 to 1 values rather than percentages. Could consider doing

max_incompleteness_penalty = search_params.get("max_incompleteness_penalty", 0) / 100
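
A small sketch of what the 0-to-1 version might look like. The helper names are hypothetical, and the `or 0` fallback is an assumption that the key may be present with a None value, in which case `.get(..., 0) / 100` would raise a TypeError.

```python
def penalty_fraction(search_params):
    """Read the slider value (a percentage) and normalize it to a 0-to-1 fraction."""
    raw = search_params.get("max_incompleteness_penalty") or 0
    return raw / 100


def completeness_factor(completeness, penalty):
    """Multiplier applied to the relevance score.

    Algebraically equal to completeness * penalty + (1 - penalty).
    """
    return 1 - penalty * (1 - completeness)


if __name__ == "__main__":
    fraction = penalty_fraction({"max_incompleteness_penalty": 50})
    print(fraction, completeness_factor(0.5, fraction))
```

With fractions, the factor reads directly as "remove up to `penalty` of the score, in proportion to how incomplete the course is".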

source = "_score"
params = {}

if max_incompleteness_penalty and max_incompleteness_penalty > 0:
Contributor

Suggested change
- if max_incompleteness_penalty and max_incompleteness_penalty > 0:
+ if max_incompleteness_penalty:

Contributor

👆 here and elsewhere
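
A quick check of why the inequality is redundant, assuming the value is either None or a non-negative number (as a 0 to 100 slider would produce). The helper names are hypothetical. Note that the two checks would differ for a negative value, which is truthy but not greater than zero.

```python
def check_with_inequality(value):
    """The original form: truthiness plus an explicit > 0 comparison."""
    return bool(value and value > 0)


def check_truthiness_only(value):
    """The suggested form: rely on None and 0 both being falsy."""
    return bool(value)


if __name__ == "__main__":
    for value in (None, 0, 0.0, 1, 50, 100):
        print(value, check_with_inequality(value), check_truthiness_only(value))
```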

@abeglova abeglova force-pushed the ab/completeness-discount branch from 5e7f0e4 to 2d1b396 September 5, 2024 13:23
@abeglova abeglova force-pushed the ab/completeness-discount branch from 2d1b396 to 00dd4fd September 5, 2024 13:53
@abeglova abeglova merged commit 42d155f into main Sep 5, 2024
This was referenced Sep 5, 2024
@rhysyngsun rhysyngsun deleted the ab/completeness-discount branch February 7, 2025 20:36

3 participants