[Feature Request] Recall Matrix like in SuperMemo algorithms #271
Let's discuss this issue here.
Why use the theoretical R as an extra dimension? Why not just save the measured R in the matrix entry like this:
Because the same S could correspond to different R because of t. If, for 2 predictions, S is the same but t is different, R will be different. Grouping them together will reduce accuracy.
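To make the point concrete, here is a minimal sketch (using an exponential forgetting curve purely for illustration; FSRS v4 actually uses a power curve) showing that the same S with different t yields different theoretical R:

```python
import math

def theoretical_r(t: float, s: float) -> float:
    # Exponential forgetting curve, illustration only.
    # By construction, R = 0.9 exactly when t == s.
    return math.pow(0.9, t / s)

# Same stability S, different elapsed time t -> different retrievability R,
# so grouping reviews by S alone would mix very different true R values.
r_short = theoretical_r(t=1.0, s=10.0)   # ~0.99
r_long = theoretical_r(t=30.0, s=10.0)   # 0.729
```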
OK. I will calculate the R-matrix first.
`DiffMax = max(abs(R_Matrix['R_measured'] - R_Matrix['R_t_rounded']))`
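For reference, a toy sketch of how such an R-matrix and the `DiffMax` check could be computed with pandas (the column names and binning scheme here are assumptions for illustration, not the notebook's actual schema):

```python
import pandas as pd

# Toy review log: `y` is the observed outcome (1 = recalled, 0 = forgot),
# `r_pred` is the model's predicted retrievability for that review.
df = pd.DataFrame({
    "y":      [1, 1, 0, 1, 0, 1],
    "r_pred": [0.91, 0.93, 0.52, 0.48, 0.47, 0.94],
})

# Bin predictions, then compare measured recall rate with each bin's value.
df["r_bin"] = df["r_pred"].round(1)
R_Matrix = df.groupby("r_bin", as_index=False).agg(
    R_t_rounded=("r_bin", "first"),
    R_measured=("y", "mean"),
    count=("y", "size"),
)
DiffMax = max(abs(R_Matrix["R_measured"] - R_Matrix["R_t_rounded"]))
```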
You should `print()` them. I have updated the mini-batch.ipynb: https://colab.research.google.com/github/open-spaced-repetition/fsrs4anki/blob/Expt/new-baseline/experiment/mini-batch.ipynb
Yes, looks good. |
I know it's too early to think about it, but I thought about whether we should
I think 2 is the best option. If we assign a different recall matrix to each different set of parameters, those matrices will be very sparse. For example, suppose that the entire collection has 100 000 reviews, but the user decides to add a new set of parameters for a deck that only has 3 000 reviews. The recall matrix for the entire collection will have a lot of information and a lot of entries, whereas the recall matrix for that one deck will have very few entries with low counts, making it less useful. The more data, the better. There is no upside to using a sparse recall matrix based on little info.
@L-M-Sherlock seems like it's finally time to start working on this? I believe all other important issues have been resolved. |
There are still some bugs in the Helper add-on. I need to deal with them first.
Sorry for the late reply. I'm busy attending an academic conference in a few days, and the feature requests are overflowing. Bug fixing is a high priority. I will focus on the development of the R-metric after July 3.
I have a question: how to optimize
Oh yeah, about that. I believe we should completely change our loss function. I've seen this problem multiple times: changing formulas decreased log-loss by around 1%, but RMSE decreased by 5%/10%/20%.
Wait, I just realized - we can't optimize w1 and w2 that way, since those are used in weighted R, and that loss above doesn't use weighted R. |
Before we use the R-matrix, we should train the parameters of FSRS first. How can we optimize w1 and w2 in the same way as the other parameters at the same time?
I don't see why we can't optimize them simultaneously. For each epoch we make the R-matrix, calculate weighted R and use it in the loss function, clear the R-matrix and repeat this for the next epoch. |
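That per-epoch loop could look something like this sketch (all function names here are hypothetical placeholders, not actual optimizer code):

```python
# Hypothetical training loop: rebuild the R-matrix each epoch, use it in the
# loss via weighted R, then discard it before the next epoch.
def train(reviews, params, n_epochs, build_r_matrix, update_params):
    for _ in range(n_epochs):
        # The matrix is built from the parameters of the previous epoch
        # (init_w on the very first epoch).
        r_matrix = build_r_matrix(reviews, params)
        # One optimization pass; the loss inside is assumed to blend the
        # model's predictions with r_matrix entries (weighted R).
        params = update_params(reviews, params, r_matrix)
        # r_matrix goes out of scope here and is rebuilt next epoch.
    return params
```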
Which parameters should we use to generate R-matrix? The parameters at the start of the epoch? |
That's a good question, I haven't thought about that. I think using parameters from the previous epoch (init_w if this is the first epoch) is reasonable. |
@Expertium, what if the weight comes out to be very close to 1? I think that the optimizer would try to do this because when weight is equal to 1, the loss becomes very small. |
W1 determines the maximum weight, and we will let the optimizer choose it. We'll see if that leads to such issues; for now it's hard to predict how it will behave. And even if it sets w1 to 0.99 (that's how I plan to clamp w1), I don't think this is bad, since it will be 0.99 only for entries based on a lot of reviews, in other words, it will be 0.99 for R that is estimated very accurately, which is exactly what we want.
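One way to realize "the weight grows with the number of reviews behind an entry, capped by w1" is a sketch like this (the exact saturation formula and the role of w2 are assumptions for illustration, not the actual FSRS formula):

```python
def weighted_r(r_model: float, r_matrix_entry: float, n: int,
               w1: float = 0.99, w2: float = 100.0) -> float:
    # Hypothetical blending: the matrix weight approaches the cap w1 as the
    # entry's review count n grows; w2 controls how fast it saturates.
    w1 = min(w1, 0.99)          # clamp w1, as discussed above
    weight = w1 * n / (n + w2)
    return weight * r_matrix_entry + (1.0 - weight) * r_model
```

With `n = 0` the matrix entry is ignored entirely; for a very large `n` the blend is almost entirely the matrix entry, which is the desired behavior for accurately estimated R.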
I thought about it more thoroughly and realized that the model would still need to accurately classify the cards according to their stability and difficulty in order to reap the benefits of the R-matrix. So, there shouldn't be any major problem. But let's see how it goes.
The new estimate of the current S is not differentiable because we calculate it by indexing into the R-matrix, which blocks gradient descent.
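One standard workaround is to treat the looked-up entry as a constant (e.g. `detach()` it in PyTorch), so the blended prediction remains differentiable in the model's own output. A framework-free sketch with a finite-difference check:

```python
# The table lookup itself has no gradient, but if its value is held constant,
# the blend is still differentiable in r_model:
#   d(blend)/d(r_model) = 1 - weight, regardless of the matrix entry.
def blend(r_model: float, r_entry: float, weight: float) -> float:
    return weight * r_entry + (1.0 - weight) * r_model

w, r_entry, r_model = 0.7, 0.5, 0.9
analytic = 1.0 - w

# Numerical check of the analytic derivative with respect to r_model.
eps = 1e-6
finite_diff = (blend(r_model + eps, r_entry, w)
               - blend(r_model - eps, r_entry, w)) / (2 * eps)
```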
The weight for averaging with the R-matrix has reached 0.99, which means the model structure of FSRS is unimportant: the output of FSRS is ignored. So it would produce the same result no matter which baseline I implement it in.
There is no circular dependency in my current implementation, because the relation between S and the R-matrix is unidirectional.
So, Sherlock, do you have any ideas how to speed up optimization? Also, I recommend re-opening this issue. |
Do you think that this is a "little" weird? The intervals not increasing beyond 1.7 months is a matter of significant concern.
Reopening the issue is worthwhile only if we can think of a solution for these weird intervals. I don't want to sound rude. I just wanted to assert that the R-matrix is not performing well and I think that the time has come to junk this idea.
The main bottleneck is querying the R-matrix when FSRS predicts the memory states from the entire review history. If the sequence has 10 repetitions, it requires 10 queries. I have no idea how to speed it up.
@L-M-Sherlock let's give this another try, but in a different way. |
How to index this matrix during training? |
You can take the sum of all grades and pass them as `state[:,2]`, and then you can divide the sum by the number of reviews when you need to calculate the average.
Or you can use the number of lapses, if that makes things easier. If you decide to use the number of lapses, then choose some maximum number, like 6 or 8, such that any card with more lapses than that gets grouped into the same group. Grouping by n should be similar. That's because if the number of groups is the same as the number of unique values of n, some of them will be very sparse, with a very small number of reviews in them.
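The capping described above is essentially a one-liner; a sketch (the cap value of 6 is illustrative):

```python
def lapse_group(lapses: int, cap: int = 6) -> int:
    # All cards with more than `cap` lapses fall into one shared bucket,
    # so no single group is left with only a handful of reviews.
    return min(lapses, cap)
```

For example, cards with 7, 12, or 30 lapses would all land in group 6, while cards with 0 through 6 lapses keep their own groups.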
I have an idea how to improve accuracy even further by introducing another matrix, S-matrix, but we need to test this new R-matrix first, to see if it works. At the very least it should be faster since it only needs to be filled once, unlike the previous version which had to be recalculated during training. |
I implemented a simple version based on 4.0.0. Could you help me test it?
I'm updating the code. You can replace `df['lapse'] = df['r_history'].str.count('1')` with `df['lapse'] = df['r_history'].astype(str).str.count('1')` to solve it.
Maybe you can switch to another deck? |
I mean, yes, I can, but it's still a problem that has to be addressed. Ok, I will test it on other decks. |
@user1823 you should try the new version too: #271 (comment) |
For my collection, RMSE went down by only about 13% compared to 4.0.0 beta, from 0.0589 to 0.0515; nowhere near as much as I was expecting.
I fixed this problem in the main branch and r-matrix-t-n-g.ipynb. Just remove this line of code: `df.drop(df[df['create_date'].dt.year < 2006].index, inplace=True)`
I tested this version, it performs better than v4 by around 9-10%, and it's statistically significant. I bet it can be improved with better grouping (see my comment above). I could describe how to implement the S-matrix, my new idea, but considering that Sherlock already made up his mind about releasing v4 as is, I don't see the point of working on the S-matrix right now. I think we should do some final tests of this version of R-matrix, with better grouping, and leave it until Sherlock decides to start working on v5.

Of course, if Sherlock decides to postpone the release of v4 to implement matrices, I would have nothing against it; quite the opposite, if I were in his shoes, I wouldn't release v4 before exhaustively trying everything related to matrices. But that would also require very major changes to the code of the helper add-on. And if it's true that FSRS will soon be supported on mobile, then this will increase the amount of headache tenfold or even make using matrices outright impossible, meaning that mobile would need its own "FSRS Mobile" version with no matrices.
According to my recent experiments, FSRS has reached the level of SM-15: https://github.com/open-spaced-repetition/fsrs-vs-sm15/blob/main/evaluate.ipynb And as we know, SM-15 employs many matrices to fit the user's memory. FSRS v4 only uses 17 parameters for that. I think the benefit of matrices doesn't compensate for the cost of implementing them. If you no longer insist on implementing matrices in FSRS, I will close this issue.
We still haven't properly implemented them though. For example, the last attempt had a major flaw with grouping reviews. That being said, if FSRS is already more accurate than SM algorithms, we can put off the matrices and work on them in the future. |
While I would like to try a proper implementation of the R-matrix*, even if it performs well, I still don't know how to solve the problem with non-monotonicity of average R (predicted + R-matrix entry), and even if we somehow solve that, the matrix will be difficult to implement in Anki. So I suppose we can close this issue. *with improved grouping. Last time cards with delta_t different by 1 day, like delta_t=365 and delta_t=366, were being grouped into distinct groups for no good reason.
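For what it's worth, one way improved grouping could avoid that problem is log-scale binning of delta_t, so nearly identical long intervals share a bucket while short intervals stay separate (the base and scale here are arbitrary assumptions):

```python
import math

def delta_t_bin(delta_t: int) -> int:
    # Bucket elapsed days on a rough log scale: nearby long intervals
    # (e.g. 365 and 366 days) collapse into the same group, while short
    # intervals (1, 2, 3 days) still get distinct groups.
    return round(2 * math.log2(max(delta_t, 1)))
```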
Which module is related to your feature request?
Scheduler, Optimizer
Is your feature request related to a problem? Please describe.
It's a way to correct our theoretical function using real data to obtain more accurate predictions in cases where the theoretical function makes poor predictions.
Describe the solution you'd like
Recall Matrix.docx
It's a pretty long description, but that's because it's a very sophisticated feature. However, it could potentially have a greater impact on the accuracy than any other change so far.
Additional context
While I tried to make the description as detailed and clear as possible, it's possible that the final implementation will be different. Still, this is about as good of a description as I can make. It's fine if this isn't your top priority right now, but I highly recommend implementing it in the foreseeable future.