
Use the median instead of the mean for recall costs and learn cost #107

Closed
Expertium opened this issue Apr 20, 2024 · 8 comments · Fixed by #109

Comments

@Expertium
Contributor

Expertium commented Apr 20, 2024

            recall_costs = recall_card_revlog.groupby(by="review_rating")[
                "review_duration"
            ].mean()

I suggest replacing the mean with the median here, as the median is not sensitive to outliers. Relevant problem: https://forums.ankiweb.net/t/clarify-what-optimal-retention-means/42803/50?u=expertium
This could help mitigate problems where, for example, the user went away to make dinner and, as a result, the review time ended up being orders of magnitude greater than usual, skewing the mean. And don't forget to modify how the learn cost is calculated as well; the median should be used for all time costs.
Additionally, to make the estimate even more robust (granted, the median is already robust), we can remove all times >20 minutes before calculating the median, since obviously nobody spends that much time per card.
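For illustration, a minimal sketch of what that could look like, assuming (as in the snippet above) a `recall_card_revlog` dataframe with `review_rating` and `review_duration` columns and that `review_duration` is stored in milliseconds; the cutoff constant and surrounding names are placeholders, not the actual implementation:

    # Hypothetical sketch: drop implausibly long reviews, then take the
    # per-rating median instead of the mean.
    MAX_DURATION_MS = 20 * 60 * 1000  # 20 minutes, assuming durations are in ms

    plausible = recall_card_revlog[
        recall_card_revlog["review_duration"] <= MAX_DURATION_MS
    ]
    recall_costs = plausible.groupby(by="review_rating")[
        "review_duration"
    ].median()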

@user1823
Contributor

user1823 commented Apr 21, 2024

Additionally, we should remove all times that are exactly equal to 0 before calculating the median.

The reason is the same as the one given in the comment linked above.

I am requesting this again because I think that skipping these entries during the median calculation needs to be handled differently than it was for the mean calculation.
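A sketch of how that exclusion might look, reusing the names from the snippet above; the point is that the zeros have to be filtered out explicitly before the median is taken:

    # Hypothetical sketch: remove zero durations before computing the median;
    # they can't be handled the same way they were in the mean-based calculation.
    nonzero = recall_card_revlog[recall_card_revlog["review_duration"] > 0]
    recall_costs = nonzero.groupby(by="review_rating")["review_duration"].median()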

@L-M-Sherlock
Member

Should we also consider this?

[screenshot of the "Maximum answer seconds" deck option]

@user1823
Contributor

> Should we also consider this?

If I understand this function correctly, the recorded answer times would never be greater than this setting. So, do we need to consider it?

@L-M-Sherlock
Member

L-M-Sherlock commented Apr 21, 2024

> If I understand this function correctly, the recorded answer times would never be greater than this setting. So, do we need to consider it?

I mean, if the values reach this limit, should we include them?

@user1823
Contributor

user1823 commented Apr 21, 2024

When calculating the median, the exact magnitudes of the lowest and highest values don't matter. So, I don't think we need to remove the entries that are equal to the maximum limit.

Rather, removing those entries would cause the median to become unexpectedly small.

You might think that contradicts Expertium's suggestion of excluding answer times > 20 min. But that suggestion deals with a specific situation where the user has set the maximum answer time to an unreasonably high value.
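A quick numeric illustration of this point, with made-up answer times and a hypothetical 60-second cap:

    import statistics

    # Two reviews hit the (made-up) 60-second "Maximum answer seconds" cap.
    capped = [3, 5, 8, 60, 60]
    uncapped = [3, 5, 8, 240, 900]  # what they might have been without a cap

    # The median is identical either way, so keeping capped entries is harmless...
    assert statistics.median(capped) == statistics.median(uncapped) == 8

    # ...whereas dropping the capped entries shifts the median downward.
    print(statistics.median([3, 5, 8]))  # 5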

@Expertium
Contributor Author

Expertium commented Apr 21, 2024

I agree that all times equal to zero should be removed. As for "Maximum answer seconds", I think it's fine to keep capped values, but only if they don't exceed 20 minutes.

@Expertium
Contributor Author

@L-M-Sherlock just a reminder

@L-M-Sherlock
Member

L-M-Sherlock commented Apr 22, 2024

I have checked the code that needs to be updated, and found that it's a little harder than I initially envisioned. The hardest part is the forget_cost.

    if Relearning in state_count and Relearning in state_block:
        forget_cost = round(
            state_duration[Relearning] / state_block[Relearning] / 1000
            + recall_cost,
            1,
        )

I will re-design this part when I have more available time.
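Purely as an illustration of one possible direction (not necessarily what was implemented in #109): if the relearning time of each individual lapse were collected as its own entry, the median of those per-lapse totals could replace the current summed-duration-divided-by-block-count average. The list name below is made up for the sketch.

    import numpy as np

    # Hypothetical sketch: `per_lapse_relearning_ms` is an assumed list with one
    # entry per lapse, holding the total relearning time (in ms) spent on that
    # lapse; `recall_cost` is the median recall time in seconds, as above.
    if len(per_lapse_relearning_ms) > 0:
        forget_cost = round(
            np.median(per_lapse_relearning_ms) / 1000 + recall_cost,
            1,
        )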
