Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review logs that requires special treatment when calculating the memory states from review history #63

Closed
L-M-Sherlock opened this issue Sep 11, 2023 · 37 comments · Fixed by #64

Comments

@L-M-Sherlock
Copy link
Member

L-M-Sherlock commented Sep 11, 2023

Reviews of the same card in the same day

FSRS doesn't accept interval < 1, so we need to filter out those review logs with the same date and only keep the first one.

Reviews in filtered deck when reschedule cards based on my answers in this deck is disabled.

These revlogs should be filtered out from review history.

These revlogs' review_kind is REVLOG_CRAM, so it's easy to filter out them.

Manual reset

There are two kinds of manual reset: Set Due Date and Forget. Their review_kind is REVLOG_RESCHED. But the fronter's ease (factor) is nonzero. The latter's ease (factor) is zero.

Set Due Date

These revlogs should be filtered out from review history, too. Because I think it's equal to CTRL+J (Reschedule repetition) in SuperMemo: https://supermemopedia.com/wiki/Ctrl%2BJ_vs._Ctrl%2BShift%2BR

Forget

We should remove the review logs in front of forget and forget itself. Then we need to treat the first review log after forget as the first learning, whose rating will be used to initialize the memory state.

Incomplete revlogs

Due to some legacies, The manual reset were not recorded in the old version of Anki/AnkiDroid. So we need to deal with this case via another way. Normally, the previous revlog of a REVLOG_LRN shouldn't be REVLOG_REV or REVLOG_RELRN. If so, we need to remove those revlogs. In a word, we should find out the the first REVLOG_LRN in the last successive REVLOG_LRNs

Besides, some users may use Transfer scheduling data from one card to another, which would generate more weird review history.

We need to deal with it case by case. Another more convenient method is to skip these cards.

@user1823 and @Expertium, if I miss anything, please let me know.

@dae
Copy link
Collaborator

dae commented Sep 11, 2023

I thought we were covering such cases in convert_to_fsrs_items() and the ported version in Anki. Is that not the case?

@L-M-Sherlock
Copy link
Member Author

L-M-Sherlock commented Sep 11, 2023

There are some difference between the filters in FSRS Optimizer (implemented in Pandas) and FSRS Helper (implemented in std lib). I cannot guarantee that convert_to_fsrs_items() has handled all situations.

@dae
Copy link
Collaborator

dae commented Sep 11, 2023

Well, perhaps we should :-) If we have one function that deals with all of this, we'll know our inputs are safe to feed into any of our API.

@L-M-Sherlock
Copy link
Member Author

Yeah. And it's tricky, because I'm not very familiar with the legacies of the system of revlogs. I will collect some cases for test.

@L-M-Sherlock
Copy link
Member Author

But I don't know the structure of input from Anki. Could you give me an example?

@dae
Copy link
Collaborator

dae commented Sep 11, 2023

@L-M-Sherlock
Copy link
Member Author

L-M-Sherlock commented Sep 11, 2023

Yes. Is it possbile to output the review history of a card in its Rust structure? I can write an add-on to print the review logs of a card, but it's a list of object in Python rather than Rust.

@dae
Copy link
Collaborator

dae commented Sep 11, 2023

There's partial info available via card stats (assuming card id 1):

mw.col.backend.card_stats(1).revlog

To see the full Rust structure, try the above after running Anki with the following patch:

diff --git a/rslib/src/stats/card.rs b/rslib/src/stats/card.rs
index d058c2233..7884a02c9 100644
--- a/rslib/src/stats/card.rs
+++ b/rslib/src/stats/card.rs
@@ -94,6 +94,7 @@ fn average_and_total_secs_strings(revlog: &[RevlogEntry]) -> (f32, f32) {
 fn stats_revlog_entry(
     entry: &RevlogEntry,
 ) -> anki_proto::stats::card_stats_response::StatsRevlogEntry {
+    dbg!(entry);
     anki_proto::stats::card_stats_response::StatsRevlogEntry {
         time: entry.id.as_secs().0,
         review_kind: entry.review_kind.into(),

These exceptions might be better handled not in the fsrs-rs crate, but in the Anki crate, since they're Anki-specific. Anki's port of the conversion code is here: https://github.com/ankitects/anki/blob/3742fa9f0c3118de1c23bf27198975eea62e09bb/rslib/src/scheduler/fsrs/weights.rs#L98

@L-M-Sherlock
Copy link
Member Author

I plan to write some temporary tests in fsrs-rs and remove them after you implement the conversion code in Anki.

@dae
Copy link
Collaborator

dae commented Sep 11, 2023

That's fine if you find it easier, but it might be more efficient to add the tests directly to the anki crate if you're able to build it. Up to you though - either way works for me. :-)

@L-M-Sherlock
Copy link
Member Author

@user1823
Copy link
Contributor

Jarrett: There are some difference between the filters in FSRS Optimizer (implemented in Pandas) and FSRS Helper (implemented in std lib). I cannot guarantee that convert_to_fsrs_items() has handled all situations.

Damien: Well, perhaps we should :-) If we have one function that deals with all of this, we'll know our inputs are safe to feed into any of our API.

@dae, I think that there must be some differences between the optimizer and the re-scheduler. The reason:

  • The output of the optimizer affects the scheduling of ALL the cards. So, we should remove the revlogs that can corrupt the final weights.
  • The output of the re-scheduler only affects that particular card. So, we can make some clever approaches to ensure that most of the cards get rescheduled (and not skipped) and that too appropriately.

@user1823
Copy link
Contributor

user1823 commented Sep 11, 2023

I made some comparisons of the approaches used in the optimizer and the rescheduler for dealing with the special cases. Here is the result of my analysis.

Reviews of the same card on the same day

Should be dealt similarly in the rescheduler and the optimizer - keep only the first review and filter out the rest

Reviews in filtered deck when reschedule cards based on my answers in this deck is disabled.

Should be dealt similarly in the rescheduler and the optimizer - Filter out these reviews
But, when reschedule cards based on my answers in this deck is enabled, these reviews should not be removed and treated as normal reviews.

Manual Scheduling

Should be dealt similarly in the rescheduler and the optimizer

  • Forget card:
    • Newer versions of Anki (manual revlog + Ease = 0): Remove all the reviews up to this rating
    • Older versions of Anki (no revlog, LEARN ratings seen after REVIEW/RELEARN ratings): Remove all the reviews before the last continuous group of LEARN ratings
      • Add-on code: https://github.com/open-spaced-repetition/fsrs4anki-helper/blob/a26bc62148f4c09db1c9dab508d8ea79413c0244/schedule/reschedule.py#L391-L401
      • Optimizer code (this repo):
        // Find the index of the first RevlogEntry in the last continuous group where review_kind = LEARNING
        // 寻找最后一组连续 review_kind = LEARNING 的第一个 RevlogEntry 的索引
        let mut index_to_keep = 0;
        let mut i = entries.len();
        while i > 0 {
        i -= 1;
        if entries[i].review_kind == LEARNING {
        index_to_keep = i;
        } else if index_to_keep != 0 {
        // Found a continuous group of review_kind = LEARNING, exit the loop
        // 找到了连续的 review_kind = LEARNING 的组,退出循环
        break;
        }
        }
        // Remove all entries before this RevlogEntry
        // 删除此 RevlogEntry 之前的所有条目
        entries.drain(..index_to_keep);

        and
        // Find the RevlogEntry with review_kind = LEARNING where the preceding RevlogEntry has review_kind of REVIEW or RELEARN, then remove it and all following RevlogEntries
        // 找到 review_kind = LEARNING 且前一个 RevlogEntry 的 review_kind 是 REVIEW 或 RELEARN 的 RevlogEntry,然后删除其及其之后的所有 RevlogEntry
        if let Some(index_to_remove) = entries.windows(2).enumerate().find_map(|(i, window)| {
        if (window[0].review_kind == REVIEW || window[0].review_kind == RELEARNING)
        && window[1].review_kind == LEARNING
        {
        // Return the index of the first RevlogEntry that meets the condition
        // 返回第一个符合条件的 RevlogEntry 的索引
        Some(i + 1)
        } else {
        None
        }
        }) {
        // Truncate from 0 to index_to_remove, removing all subsequent entries
        // 截取从 0 到 index_to_remove 的部分,删除其后的所有条目
        entries.truncate(index_to_remove);
        }

        I don't know the reason for this duplication.
  • Set Due Date: Filter out this MANUAL rating

First review type is not LEARN / Incomplete review history

Dealt differently in reschedule & optimize.

Suspended cards

Dealt differently in reschedule and optimizer

@L-M-Sherlock
Copy link
Member Author

L-M-Sherlock commented Sep 11, 2023

  • Optimizer code (this repo): Couldn't find

Here:

fn filter_out_cram(entries: Vec<RevlogEntry>) -> Vec<RevlogEntry> {
entries
.into_iter()
.filter(|entry| entry.review_kind != Filtered)
.collect()
}

I have refactored them in this branch: https://github.com/open-spaced-repetition/fsrs-rs/blob/Test/deal-with-revlog-from-Anki/src/temporary_test.rs

@L-M-Sherlock L-M-Sherlock linked a pull request Sep 12, 2023 that will close this issue
@dae
Copy link
Collaborator

dae commented Sep 12, 2023

Thanks, that's helpful. I've just pushed a draft of Anki integration you can play with, and I'll try return to this tomorrow.

@dae
Copy link
Collaborator

dae commented Sep 13, 2023

I think we may be ok now?

  • Suspended cards are excluded from review already, so that's not a problem. Users can optionally exclude them from training.
  • "Reviews in filtered deck when reschedule cards based on my answers in this deck is disabled.": ignored by the optimizer, and when reviewing the standard Anki review code will be used, and memory state not updated.
  • "First review type is not LEARN / Incomplete review history": I think this is already handled by the code that looks for the last learning entry.
  • The other cases are already covered.

@L-M-Sherlock
Copy link
Member Author

First review type is not LEARN / Incomplete review history

According to open-spaced-repetition/fsrs4anki-helper#16 (comment), @user1823 proposed that it is better to skip only those cards whose first review kind isn't learning AND which don't have any again rating.

For example:

img

@dae
Copy link
Collaborator

dae commented Sep 13, 2023

Do you mean the starting point should be min(last_learning_idx, first_again_idx)? For both training and review?

@L-M-Sherlock
Copy link
Member Author

I think it only works for review. Maybe @user1823 has more ideas. And I think it's OK to skip them both in training and review, because these cases are very rare.

@dae
Copy link
Collaborator

dae commented Sep 13, 2023

The main reason this might occur is for long-term / heavy Anki users, who have removed the oldest few years of their review history to keep their collection size down & stay inside the AnkiWeb sync limits. Quite rare, but might be best to deal with.

@L-M-Sherlock
Copy link
Member Author

Yeah. For these cases, we should treat the first relearning review as the first learning review.

@user1823
Copy link
Contributor

user1823 commented Sep 13, 2023

With the current version of FSRS (python optimizer and add-on), the approach is to

  • FILTER OUT ALL the reviews of such cards when optimizing
  • KEEP ALL the reviews of such cards if they contain any Again rating when rescheduling

The rationale:

  • The review history is incomplete and it can corrupt the trained weights. So, it is best to skip these cards while optimizing.
  • When rescheduling, only these cards are affected. So, the risk of including them is not very high. But, there are two types of cards to consider:
    • Cards that have no Again rating in the "known" review history. If these cards are rescheduled, it can lead to problems like this:

      The interval decreased from 1.04 years to just 2 days. So, we should not reschedule such cards.

    • Cards that have one or more Again ratings in their "known" review history.

      • If these cards are rescheduled without filtering out any reviews, I think that the memory states obtained are reasonable (based on personal experience). The helper add-on currently uses this approach and we can use it in Anki's implementation too.
      • If these cards are rescheduled after filtering out the reviews before the last rating, I am not sure what the final result would be. But, it might be reasonable as well.

You might wonder: What about the cards that don't have any Again rating in the known review history? Will FSRS never work for them?

The answer is simple. Eventually, these cards will be forgotten. Then, they will get an Again rating and FSRS will start working for them. The purpose of not rescheduling them before they get an Again rating is to prevent weird intervals.

@user1823
Copy link
Contributor

user1823 commented Sep 13, 2023

.filter(|entry| entry.review_kind != Filtered)

To filter out just the CRAM revlogs, doesn't the condition need to be the following?

entry.review_kind != Filtered || entry.ease_factor != 0

@L-M-Sherlock
Copy link
Member Author

entry.review_kind != Filtered || entry.ease_factor != 0

ankitects/anki#2656 (comment)

@user1823
Copy link
Contributor

user1823 commented Sep 24, 2023

The main reason this might occur is for long-term / heavy Anki users, who have removed the oldest few years of their review history to keep their collection size down & stay inside the AnkiWeb sync limits. Quite rare, but might be best to deal with.

@dae, is this case (incomplete review history) dealt with in the latest beta of Anki?

I suggest using this approach to deal with such cards. But obviously, if you have a better idea to deal with them, it would be fine too.

@dae
Copy link
Collaborator

dae commented Sep 25, 2023

I've changed the code in beta 2 so that a lack of learning steps no longer causes the memory state to be as if the card is new. If you have further suggestions, I'm open to them.

@user1823
Copy link
Contributor

user1823 commented Sep 25, 2023

a lack of learning steps no longer causes the memory state to be as if the card is new

If all such cards are rescheduled, it can lead to problem like this (where the interval decreased from 1.04 years to just 2 days).

Edit: I have also verified that this problem exists in Beta 2.
image

But, we also don't want to treat all such cards as new because then FSRS wouldn't work for them.

So, I suggest re-reading this comment of mine, where I have discussed the issue in a greater detail.

@user1823
Copy link
Contributor

Once Anki no longer calculates the memory states for such cards, we will also need to figure out how to deal with such cards when they come up for review.

I didn't face this problem till now because I do my reviews on AnkiDroid where FSRS is not supported and SM-2 schedules the cards instead. But when, FSRS is the only scheduler available, we will need to make a special function for such cards. Possible options:

  • Use SM-2
  • Use the convert_states function of FSRS Scheduler (js)

https://github.com/open-spaced-repetition/fsrs4anki/blob/5a34201f4d5b3a9cc1a5da4511b6bc32b7c6e909/fsrs4anki_scheduler.js#L228-L241
https://github.com/open-spaced-repetition/fsrs4anki/blob/5a34201f4d5b3a9cc1a5da4511b6bc32b7c6e909/fsrs4anki_scheduler.js#L133-L144

For a textual description of this function, refer to https://github.com/open-spaced-repetition/fsrs4anki/wiki/How-does-the-scheduler-work%3F#calculate-memory-states

@dae
Copy link
Collaborator

dae commented Sep 26, 2023

Re https://forums.ankiweb.net/t/anki-23-10-beta/34912/38 and comments above:

During inference, I think we may need to produce a memory state if the card has any reps on it, as it will be surprising to the user otherwise. If the user has cards with a truncated review history, or cards they used 'set due date' on to introduce without going through the new queue, excluding cards without learning steps would mean such cards could never have a memory state assigned, and the initial stability/difficulty would be used at each review based on the current rating, so intervals would never grow.

We could change the criteria from "must have learning" to "must have Again", but that would mean the ratings would not grow above the initial intervals until the user rates such cards Again, which will be confusing and look like a bug in FSRS/Anki. Wouldn't it be better to just use whatever review came first? In the set-due-date case that would correctly reflect what would have happened if the user had gone through the new queue, and in the truncated revlog case, the initial stability may be wrong, but is hopefully shifted back towards actual by the subsequent reviews in the history.

Regarding writing a separate function instead, that would be a bit tricky, as the current code assumes a missing memory state when a card is not new means the memory state needs to be calculated from the revlog. I think it might be simpler to try and handle such cards in the same way other cards are handled (i.e. giving them an approximate memory state). WDYT?

@L-M-Sherlock
Copy link
Member Author

giving them an approximate memory state

Here is a method to convert Anki's interval and ease into FSRS's stability and difficulty: https://github.com/open-spaced-repetition/fsrs4anki/wiki/How-does-the-scheduler-work%3F#interval-to-stability

@dae
Copy link
Collaborator

dae commented Sep 26, 2023

fsrs-js didn't have access to the revlog history, which we do here - do you still feel converting from SM-2 ease is safer than trying to infer the memory state from incomplete logs? No preference on my end, just curious.

@L-M-Sherlock
Copy link
Member Author

There are too many complex cases for incomplete logs. It's hard to deal with those corner cases one by one. So I prefer to convert memory state from SM-2 ease and interval.

@user1823
Copy link
Contributor

user1823 commented Sep 26, 2023

There are too many complex cases for incomplete logs. It's hard to deal with those corner cases one by one. So I prefer to convert memory state from SM-2 ease and interval.

I completely agree. With this, we can also get rid of the check for the presence of an Again rating in the known review history.

To summarise, we are going to deal with such cards in the following way:

  • Check if the first rating is of Learn type or not.
  • If the first rating is of Learn type, no issue. In this case, Anki can calculate the memory states the normal way.
  • If the first rating is not of Learn type, use SM-2 ease factor and interval at the first rating to estimate the memory states. Then, on subsequent ratings of the same card, update the memory states using the normal FSRS formulas.

@L-M-Sherlock, what about using this approach in the add-on too (for Anki versions up to 2.1.66)?

@user1823
Copy link
Contributor

Another question that needs to be answered:

Which interval is used in the following line of code? The interval before the first rating? Or the interval obtained after the first rating?

let stability = interval.max(0.1);

In the first case, we will obtain the memory states before the first rating, and we will have to calculate the memory states after the first rating using the regular FSRS formulas.
In the second case, we will obtain the memory states after the first rating, and we will have to calculate the memory states after the subsequent ratings using the regular FSRS formulas.

In my opinion, the first option is better because the memory states will tend to be more accurate (the earlier we can calculate the memory states, the more accurate the results will be) and it can deal with cards having no review history at the time of review (explained in the next para).

Consider the following card. I reviewed it today and it had no review history before today. Assume that I had started using native FSRS yesterday. Now, if Anki doesn't use the first approach (calculating memory states from interval and ease before the first rating), it won't be able to schedule this card.

However, I think that the interval before the first review will be available only during review (due to technical limitations). So, we might want to use a hybrid approach.

  • In the scheduler, calculate the memory states from interval and ease before the rating if the card doesn't have any review history (but it is not new).
  • In the re-scheduler, calculate the memory states from interval and ease after the first rating.

user1823 added a commit to user1823/fsrs4anki-helper that referenced this issue Sep 26, 2023
Context: open-spaced-repetition/fsrs-rs#63 (comment)

I think that I have added most of the code required to make it work. You will just need to do some final touches so that it can become functional.

I hope that editing this PR requires less effort than writing all the code yourself.
@dae
Copy link
Collaborator

dae commented Sep 27, 2023

The memory state is derived when saving weights or when reviewing a card without state, so it should work as you expect.

@user1823
Copy link
Contributor

user1823 commented Sep 27, 2023

Sorry for asking too many questions, but I want to ensure that the built-in FSRS works perfectly fine when it is released as stable. Also, because I am not comfortable in reading Rust code (especially when it is scattered across several files and repos), I can only rely on basic experimentation and such question/answers to ensure that things work fine.

For the above question, your answer was so brief that I couldn't make out what it exactly means. So, let me make the question more concise so that you can easily answer it in a way that clears my doubt.

In the above comment, I asked about a review card with no review history at the time of review. Can such cards be assigned memory states with the current code? If not, I proposed a solution in the same comment.

For an example, consider the following card. What will happen when it comes for review on 12th October?

image

@dae
Copy link
Collaborator

dae commented Sep 28, 2023

Cards without any review history don't get updated by the deck options, but when it comes time to review, the memory state is first derived from the ease factor and interval, and then that state is changed by the rating the user provided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants