
[Feature Request] Recall Matrix like in SuperMemo algorithms #271

Closed

Expertium opened this issue May 28, 2023 · 88 comments

Labels
enhancement New feature or request

@Expertium
Collaborator

Expertium commented May 28, 2023

Which module is related to your feature request?
Scheduler, Optimizer

Is your feature request related to a problem? Please describe.
It's a way to correct our theoretical function using real data, to obtain more accurate predictions in cases where the theoretical function makes poor predictions.

Describe the solution you'd like
Recall Matrix.docx
It's a pretty long description, but that's because it's a very sophisticated feature. However, it could potentially have a greater impact on accuracy than any other change so far.

Additional context
While I tried to make the description as detailed and clear as possible, the final implementation may still end up different. Still, this is about as good a description as I can make. It's fine if this isn't your top priority right now, but I highly recommend implementing it in the foreseeable future.

@L-M-Sherlock
Member

Let's discuss this issue here.

The recall matrix should be a 3-dimensional matrix, where the dimensions are S, D, and theoretical R. Similar to the B-W heatmap, but with theoretical R as a new dimension.

Why use the theoretical R as an extra dimension? Why not just save the measured R in the matrix entry like this: R_M = RecallMatrix(D_rounded, S_rounded)

@Expertium
Collaborator Author

Because the same S can correspond to different values of R depending on t. If, for two predictions, S is the same but t is different, R will be different. Grouping them together would reduce accuracy.
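A minimal sketch of the point, assuming the exponential forgetting curve R = 0.9^(t/S) purely for illustration (any curve that depends on t behaves the same way):

# Same stability S, different elapsed time t -> different retrievability R.
def retrievability(t: float, s: float) -> float:
    return 0.9 ** (t / s)  # illustrative curve: R = 0.9 when t == S

s = 10.0
print(retrievability(5.0, s))   # ~0.949, reviewed early
print(retrievability(20.0, s))  # ~0.810, reviewed late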

@L-M-Sherlock
Member

OK. I will calculate the R-Matrix first.

@L-M-Sherlock
Member

L-M-Sherlock commented Jun 5, 2023

# Bin the predicted R (bin width 0.025), then compute the measured recall and
# the review count for every (S, D, R) cell.
B_W_Metric_raw['r_bin'] = B_W_Metric_raw['p'].map(lambda x: round(x * 4, 1) / 4)
R_Matrix = B_W_Metric_raw.groupby(by=['s_bin', 'd_bin', 'r_bin']).agg({'y': ['mean', 'count']}).reset_index()
R_Matrix.columns = ['S_rounded', 'D_rounded', 'R_t_rounded', 'R_measured', 'count']
R_Matrix[R_Matrix['count'] > 300]  # keep only cells backed by enough reviews

Paste it after the B-W-Metric cell. Is this what you need?

@L-M-Sherlock
Member

DiffMax = max(abs(R_Matrix['R_measured'] - R_Matrix['R_t_rounded']))  # largest gap between measured and predicted R

@Expertium
Collaborator Author

Expertium commented Jun 5, 2023

It doesn't show anything, other than an indicator that the code has been executed successfully.

@L-M-Sherlock
Member

[screenshot]

@Expertium
Collaborator Author

Yes, looks good.

@Expertium
Collaborator Author

I know it's too early to think about it, but I thought about whether we should

  1. Allow users to have different sets of parameters for different decks and different recall matrices for each set
  2. Allow users to have different sets of parameters for different decks but use the same recall matrix
  3. Allow users to only have 1 set of parameters and 1 recall matrix, and remove the ability to specify different sets of parameters for different decks

I think 2 is the best option. If we assign a different recall matrix to each different set of parameters, those matrices will be very sparse. For example, suppose that the entire collection has 100 000 reviews, but the user decides to add a new set of parameters for a deck that only has 3 000 reviews. The recall matrix for the entire collection will have a lot of information and a lot of entries, whereas the recall matrix for that one deck will have very few entries with low counts, making it less useful. The more data, the better. There is no upside to using a sparse recall matrix based on little info.
At the same time, not allowing users to have different parameters for different decks would reduce the flexibility of FSRS. So yeah, 2 seems like the best option.

@Expertium
Collaborator Author

@L-M-Sherlock seems like it's finally time to start working on this? I believe all other important issues have been resolved.

@L-M-Sherlock
Member

There are still some bugs in the Helper add-on. I need to deal with them first.

@Expertium
Collaborator Author

Expertium commented Jun 19, 2023

Some more thoughts about the R-matrix:

  1. While originally I proposed this weighting system
    [image: original weighting formula]
    Now I think that this will do better
    [image: new weighting formula]
    The problem with the original idea is that DiffMax is always going to be either 1 or close to 1. So instead, let's use w_1 to decide the maximum weight that can be assigned to a matrix entry (see the sketch after this list). We can test both, but I'm willing to bet that the idea of weighting both by the number of entries and by the difference between R_t and R_M will perform worse. It's mostly a gut feeling.
  2. Keep in mind that the R-Matrix that will be stored on the user's PC has to be modified after each review; it's not read-only.
  3. Here's the tricky thing: R-Matrix has to be fully recalculated when the user changes any set of parameters. Example:
    Suppose the user has 2 sets of parameters (for different decks): A and B. He recalculated B and is now using B'. Now the correspondence between theoretical and measured R has changed, and the matrix has to be recalculated because of that. And all entries have to be recalculated since we don't keep track of which review from which deck has contributed to which entry. If an entry has x reviews, we don't know how many of them came from a deck that uses set A and how many came from a deck that uses set B (or B').
  4. During the optimization we will probably have to clear the R-matrix after each epoch to avoid "double counting" reviews. Otherwise each review will be counted n times, where n is the number of epochs.
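The formulas above were posted as images; a plausible reconstruction of the second scheme, based on the surrounding discussion (w1 caps the weight an entry can get, clamped below 0.99; w2 is a hypothetical parameter controlling how fast trust grows with the review count):

def weighted_r(r_theory: float, r_matrix_entry: float, count: int,
               w1: float, w2: float) -> float:
    # The matrix entry earns more trust as its review count grows,
    # but its weight never exceeds w1.
    weight = min(w1, 0.99) * count / (count + w2)
    return weight * r_matrix_entry + (1 - weight) * r_theory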

@L-M-Sherlock
Member

Sorry for the late reply. I'm busy attending an academic conference in a few days, and the feature requests are overflowing. Bug fixing is the higher priority. I will focus on the development of the R-matrix after July 3.

@L-M-Sherlock
Member

I have a question: how to optimize $w_1$ and $w_2$?

@Expertium
Collaborator Author

Expertium commented Jun 29, 2023

Oh yeah, about that. I believe we should completely change our loss function. I've seen this problem multiple times: changing a formula decreased log-loss by only around 1%, but decreased RMSE by 5%/10%/20%.
I think we should minimize the difference between theoretical R and its corresponding R-matrix entry. In other words, the new loss will be WozLoss(Rt, Rm), or -ln(1-abs(Rt-Rm))
Although implementing that in the optimizer will be tricky. You will also have to add weighting by the number of reviews that the matrix entry is based on.

import torch
from torch import nn

class WozLoss(nn.Module):
    # Loss = -ln(1 - |Rt - Rm|), clamped so it stays finite as the gap approaches 1.

    def __init__(self):
        super(WozLoss, self).__init__()

    def forward(self, predictions, labels):
        loss = -torch.log(1 - torch.abs(predictions - labels))
        loss = loss.clamp(0, 100)
        return loss.sum()

# inside the trainer's __init__:
self.loss_fn = WozLoss()

@Expertium
Collaborator Author

Expertium commented Jun 29, 2023

Wait, I just realized: we can't optimize w1 and w2 that way, since they are used in weighted R, and the loss above doesn't use weighted R.
OK, forget what I said above, then. Let's not change the loss function for now. Just make the algorithm output Rw instead of Rt whenever it predicts R.
By the way, once the R-matrix is done, every single calculation involving R (in the optimizer, scheduler and helper) will have to use weighted R. That will involve changing a lot of the code, especially for the Helper add-on.
So to answer your question - w1 and w2 will be optimized the same way as all the other parameters are optimized right now.

@L-M-Sherlock
Member

Before we can use the R-Matrix, we have to train the parameters of FSRS first. How can we optimize w1 and w2 in the same way as the other parameters at the same time?

@Expertium
Collaborator Author

I don't see why we can't optimize them simultaneously. For each epoch, we build the R-matrix, calculate weighted R and use it in the loss function, then clear the R-matrix and repeat for the next epoch.
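A self-contained toy sketch of that loop; the model, data, and binning here are all stand-ins, not the optimizer's real code:

import torch

torch.manual_seed(0)
s = torch.rand(1000) * 10 + 0.1          # toy stabilities
t = torch.rand(1000) * 10                # toy elapsed times
y = torch.bernoulli(0.9 ** (t / s))      # toy recall outcomes

w = torch.tensor([1.0, 0.5, 2.0], requires_grad=True)  # [model param, w1, w2]
optimizer = torch.optim.Adam([w], lr=0.01)

for epoch in range(5):
    # 1. Build the R-matrix from the previous epoch's parameters (no gradients).
    with torch.no_grad():
        r_t = 0.9 ** (t / (s * w[0]))
        bins = (r_t * 10).long().clamp(0, 9)
        counts = torch.zeros(10)
        r_m = torch.zeros(10)
        for b in range(10):
            mask = bins == b
            counts[b] = mask.sum()
            r_m[b] = y[mask].mean() if mask.any() else 0.0
    # 2. Train on weighted R; the matrix is cleared and rebuilt next epoch.
    r_t = 0.9 ** (t / (s * w[0]))
    weight = w[1].clamp(0, 0.99) * counts[bins] / (counts[bins] + w[2].abs())
    r_w = weight * r_m[bins] + (1 - weight) * r_t
    loss = torch.nn.functional.binary_cross_entropy(r_w.clamp(1e-4, 1 - 1e-4), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()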

@L-M-Sherlock
Member

I don't see why we can't optimize them simultaneously. For each epoch, we build the R-matrix, calculate weighted R and use it in the loss function, then clear the R-matrix and repeat for the next epoch.

Which parameters should we use to generate the R-matrix? The parameters at the start of the epoch?

@Expertium
Collaborator Author

That's a good question; I hadn't thought about that. I think using the parameters from the previous epoch (init_w if it's the first epoch) is reasonable.

@user1823
Collaborator

@Expertium, what if the weight comes out to be very close to 1?

I think that the optimizer would try to do this because when weight is equal to 1, the loss becomes very small.

@Expertium
Collaborator Author

Expertium commented Jun 30, 2023

w1 determines the maximum weight, and we will let the optimizer choose it. We'll see if that leads to such issues; for now it's hard to predict how it will behave. And even if it sets w1 to 0.99 (that's how I plan to clamp w1), I don't think that's bad, since the weight will be 0.99 only for entries based on a lot of reviews; in other words, it will be 0.99 for R that is estimated very accurately, which is exactly what we want.
If the optimizer sets both w1 and w2 to their maximum values, yes, that will be a problem. We'll see how it goes; for now we don't know how it will behave.

@user1823
Collaborator

I thought about it more thoroughly and realized that the model would still need to accurately classify cards by their stability and difficulty in order to reap the benefits of the R-matrix. So there shouldn't be any major problem. But let's see how it goes.

@Expertium
Collaborator Author

Expertium commented Jul 4, 2023

  1. I would advise you to add this code to 4.0 alpha, since that's what I'm using as the baseline, and I want a fair comparison (same goes for the hybrid S0 curve-fit/pre-train method in the other issue).
  2. In my .docx document I provided bins for S. I think we should use those. The reason to use pre-determined bins is to avoid a circular dependency between S and the R-matrix, where the value of S depends on the R-matrix entry, which depends on the value of S, which depends on the R-matrix entry, which depends... you get the idea. (See the binning sketch at the end of this comment.)
    Here, I'll provide them in a more convenient form:

bins_S = [0.1, 0.3, 0.9, 1.3, 1.8, 2.5, 3.5, 4.9, 6.9, 9.8, 13.9, 19.7, 27.9, 39.5, 55.9, 79.1, 111.9, 158.3, 223.9,
          316.6, 447.7, 633.1, 895.3, 1266.1, 1790.5, 2532.1, 3580.9, 5064.2, 7161.9, 10128.5]

  3. We might need a few more epochs for convergence, since the R-matrix is always "lagging behind" a little, being calculated with the previous epoch's parameters.
  4. While you have implemented R_w, you haven't used it to its full potential. As described in my .docx document, R_w should be used to calculate a new estimate of the current S, and also in the formula for new_s. In other words, R_w should be calculated before this:

new_s = torch.where(condition, self.stability_after_success(state, new_d, r), self.stability_after_failure(state, new_d, r))
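A minimal sketch of mapping a raw S to those pre-determined bins (snapping to the nearest bin value is my assumption about the intended use):

import numpy as np

bins_S = np.array([0.1, 0.3, 0.9, 1.3, 1.8, 2.5, 3.5, 4.9, 6.9, 9.8, 13.9, 19.7,
                   27.9, 39.5, 55.9, 79.1, 111.9, 158.3, 223.9, 316.6, 447.7,
                   633.1, 895.3, 1266.1, 1790.5, 2532.1, 3580.9, 5064.2, 7161.9, 10128.5])

def round_s(s: float) -> float:
    # Snap a stability value to the nearest pre-determined bin.
    return float(bins_S[np.abs(bins_S - s).argmin()])

print(round_s(15.0))  # 13.9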

@L-M-Sherlock
Member

4. While you have implemented R_w, you haven't used it to its full potential. As described in my .docx document, R_w should be used to calculate a new estimate of the current S, and also in the formula for new_s.

The new estimate of the current S is not differentiable, because we calculate it by indexing the R-Matrix. Gradient descent is blocked by it.
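A minimal standalone demonstration of that problem (not the optimizer's actual code):

import torch

s = torch.tensor([2.5], requires_grad=True)
r_matrix = torch.rand(10)   # stand-in lookup table
idx = s.round().long()      # rounding + integer cast detaches from the autograd graph
r_m = r_matrix[idx]
print(r_m.requires_grad)    # False: no gradient can flow back to s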

@L-M-Sherlock
Member

  1. I would advise you to add this code to 4.0 alpha, since that's what I'm using as the baseline, and I want a fair comparison (same goes for the hybrid S0 curve-fit/pre-train method in the other issue).

The weight given to the R-Matrix has reached 0.99, which means the model structure of FSRS has become unimportant: the output of FSRS is essentially ignored. So it would give the same result no matter which baseline I implement it in.

@L-M-Sherlock
Member

2. In my .docx document I provided bins for S. I think we should use those. The reason to use pre-determined bins is to avoid a circular dependency between S and the R-matrix, where the value of S depends on the R-matrix entry, which depends on the value of S, which depends on the R-matrix entry, which depends... you get the idea.

There is no circular dependency in my current implementation, because the relation between S and the R-matrix is unidirectional.

@Expertium
Collaborator Author

Expertium commented Jul 9, 2023

[screenshot: optimization results]
Here are the results on my collection, using https://colab.research.google.com/github/open-spaced-repetition/fsrs4anki/blob/Expt/new-baseline/candidate/R-Matrix.ipynb (the version I copied and edited myself produced different results for some reason).
The intervals are a little weird, I agree. Although yours are much weirder.
[screenshot: intervals]

@Expertium
Collaborator Author

Expertium commented Jul 9, 2023

So, Sherlock, do you have any ideas how to speed up optimization? Also, I recommend re-opening this issue.

@user1823
Collaborator

user1823 commented Jul 9, 2023

The intervals are a little weird,

Do you think that this is only a "little" weird? The intervals not increasing beyond 1.7 months is a matter of significant concern.

Also, I recommend re-opening this issue.

Reopening the issue is worthwhile only if we can think of a solution for these weird intervals.

I don't want to sound rude. I just wanted to assert that the R-matrix is not performing well, and I think the time has come to junk this idea.

@Expertium
Collaborator Author

Do you think that this is only a "little" weird? The intervals not increasing beyond 1.7 months is a matter of significant concern.

Fair. I'm just more surprised by Sherlock's results: he gets the shortest maximum interval when "Good" is the first rating. That is really puzzling.
[screenshot]

@L-M-Sherlock
Member

So, Sherlock, do you have any ideas how to speed up optimization?

The main bottleneck is querying the R-Matrix when FSRS predicts the memory states from the entire review history. If a sequence has 10 repetitions, it requires 10 queries. I have no idea how to speed it up.

@Expertium
Collaborator Author

@L-M-Sherlock let's give this another try, but in a different way.
Instead of grouping reviews by D, S, and R, we will group them by delta_t (interval length), n (number of reviews), and average grade (a simple arithmetic mean of the grades). The advantage is that such a matrix can be precomputed from the raw data; we don't need any model of memory to compute it. It will be less accurate that way, but hopefully it will have fewer issues. The weighting system can remain the same.
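A minimal pandas sketch of that precomputation; the toy data and the column names (delta_t, n, avg_grade, y) are assumptions:

import pandas as pd

# Toy review log: one row per review, y = 1 if the card was recalled.
df = pd.DataFrame({
    'delta_t': [1, 1, 3, 3, 10, 10],
    'n': [1, 2, 2, 3, 3, 4],
    'avg_grade': [3.0, 2.5, 3.0, 2.0, 3.0, 3.0],
    'y': [1, 0, 1, 1, 0, 1],
})
df['g_bin'] = (df['avg_grade'] * 2).round() / 2  # half-grade buckets, for illustration

r_matrix = (df.groupby(['delta_t', 'n', 'g_bin'])
              .agg(R_measured=('y', 'mean'), count=('y', 'size'))
              .reset_index())
print(r_matrix)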

@L-M-Sherlock
Member

Instead of grouping reviews by D, S, and R, we will group them by delta_t (interval length), n (number of reviews), and average grade (a simple arithmetic mean of the grades).

How do we index this matrix during training? delta_t and n exist for a single review, but where does the average grade come from?

@Expertium
Collaborator Author

Expertium commented Jul 12, 2023

You can take the sum of all grades, pass it along as state[:,2], and then divide the sum by the number of reviews whenever you need to calculate the average.

sum_grades = X[:,1] + state[:,2]  # accumulate the running grade sum in the extra state slot
torch.stack([new_s, new_d, sum_grades], dim=1)

Or you can use the number of lapses, if that makes things easier. If you decide to use the number of lapses, choose some maximum, like 6 or 8, such that any card with more lapses than that gets put into the same final group. Like this:
Group 1: 1 lapse
Group 2: 2 lapses
Group 3: 3-4 lapses
Group 4: 5-6 lapses
Group 5: 7 lapses
Group 6: 8 or more lapses

Grouping by n should be similar. If the number of groups were the same as the number of unique values of n, some of the groups would be very sparse, with a very small number of reviews in them.
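A tiny sketch of that lapse grouping (the boundaries follow the list above; the function name is mine):

def lapse_group(lapses: int) -> int:
    # Map a raw lapse count to a coarse group so no group is too sparse.
    if lapses <= 2:
        return lapses      # 0, 1 and 2 lapses keep their own groups
    if lapses <= 4:
        return 3           # 3-4 lapses
    if lapses <= 6:
        return 4           # 5-6 lapses
    if lapses == 7:
        return 5           # 7 lapses
    return 6               # 8 or more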

@Expertium
Collaborator Author

I have an idea for improving accuracy even further by introducing another matrix, the S-matrix, but we need to test this new R-matrix first, to see if it works. At the very least it should be faster, since it only needs to be filled once, unlike the previous version, which had to be recalculated during training.

L-M-Sherlock reopened this Jul 13, 2023
@L-M-Sherlock
Member

https://github.com/open-spaced-repetition/fsrs4anki/blob/Expt/r-matrix/archive/candidate/r-matrix-t-n-g.ipynb

I implemented a simple version based on 4.0.0. Could you help me test it?

@Expertium
Collaborator Author

[screenshot of the error]
I can't open it in Colab.

@Expertium
Collaborator Author

[screenshot of an error]

@L-M-Sherlock
Member

I'm updating the code. You can replace

df['lapse'] = df['r_history'].str.count('1')

with

df['lapse'] = df['r_history'].astype(str).str.count('1')

to solve it.

@Expertium
Collaborator Author

[screenshot of the error]
Now I'm getting the same error as in the beta version.

@L-M-Sherlock
Member

Maybe you can switch to another deck?

@Expertium
Collaborator Author

I mean, yes, I can, but it's still a problem that has to be addressed. Ok, I will test it on other decks.

@Expertium
Collaborator Author

Expertium commented Jul 13, 2023

[screenshot]
The optimization on my entire collection isn't done yet, but I can already tell that it behaves differently from the previous implementation: the parameter that determines the maximum weight that can be allocated to an R-matrix entry is nowhere near 0.99. It actually seems fairly balanced.

@Expertium
Collaborator Author

@user1823 you should try the new version too: #271 (comment)
I'm curious to see if you will get any kind of strange or unexpected results.

@Expertium
Collaborator Author

Expertium commented Jul 13, 2023

For my collection, RMSE went down by only about 13% compared to 4.0.0 beta, from 0.0589 to 0.0515; nowhere near as much as I was expecting.
EDIT: obviously this grouping isn't as good as grouping by D, S and R, but still.

@L-M-Sherlock
Member

L-M-Sherlock commented Jul 13, 2023

I mean, yes, I can, but it's still a problem that has to be addressed. Ok, I will test it on other decks.

I fixed this problem in the main branch and r-matrix-t-n-g.ipynb.

Just remove this line of code:

df.drop(df[df['create_date'].dt.year < 2006].index, inplace=True)

@Expertium
Collaborator Author

Expertium commented Jul 13, 2023

I think grouping should be improved.

  1. Use the same (or a slightly modified) formula that you used to round S to round delta_t. If one card has delta_t=353 and another has delta_t=352, it doesn't make much sense to put them into different groups. A sketch of what such rounding could look like follows this list.
    [image: rounding formula]

  2. Also group lapses differently. For example, lapses=7 and lapses=8 can safely be put into the same group.
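The rounding formula itself was posted as an image; a plausible stand-in that behaves the way the comment describes (coarser buckets for longer intervals) could look like this:

import math

def round_delta_t(t: int) -> int:
    # Keep roughly two significant figures, so 352 and 353 land in the
    # same bucket while short intervals stay distinct. Illustrative only.
    if t <= 0:
        return t
    scale = 10 ** max(0, int(math.log10(t)) - 1)
    return (t // scale) * scale

print(round_delta_t(353), round_delta_t(352))  # 350 350
print(round_delta_t(7), round_delta_t(8))      # 7 8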

@Expertium
Collaborator Author

I tested this version: it performs better than v4 by around 9-10%, and the difference is statistically significant. I bet it can be improved with better grouping (see my comment above).

I could describe how to implement the S-matrix, my new idea, but considering that Sherlock has already made up his mind about releasing v4 as is, I don't see the point of working on the S-matrix right now. I think we should do some final tests of this version of the R-matrix, with better grouping, and leave it until Sherlock decides to start working on v5.

Of course, if Sherlock decides to postpone the release of v4 to implement matrices, I would have nothing against it; quite the opposite: if I were in his shoes, I wouldn't release v4 before exhaustively trying everything related to matrices. But that would also require very major changes to the code of the Helper add-on. And if it's true that FSRS will soon be supported on mobile, this would increase the amount of headache tenfold, or even make using matrices outright impossible, meaning mobile would need its own "FSRS Mobile" version with no matrices.

@L-M-Sherlock
Member

According to my recent experiments, FSRS has reached the level of SM-15:

https://github.com/open-spaced-repetition/fsrs-vs-sm15/blob/main/evaluate.ipynb

And as we know, SM-15 employs many matrices to fit the user's memory, while FSRS v4 only uses 17 parameters for that. I think the benefit of matrices couldn't compensate for the cost of implementing them.

If you no longer insist on implementing matrices in FSRS, I will close this issue.

@Expertium
Collaborator Author

We still haven't properly implemented them, though. For example, the last attempt had a major flaw in how reviews were grouped.

That being said, if FSRS is already more accurate than the SM algorithms, we can put off the matrices and work on them in the future.

@Expertium
Collaborator Author

While I would like to try a proper implementation of the R-matrix*, even if it performs well, I still don't know how to solve the problem of non-monotonicity of the averaged R (prediction + R-matrix entry), and even if we somehow solve that, the matrix will be difficult to implement in Anki. So I suppose we can close this issue.

*with improved grouping. Last time, cards with delta_t differing by one day, like delta_t=364 and delta_t=365, were being grouped into distinct groups for no good reason.
