[Beta Testing] FSRS4Anki 4.0.0 Scheduler, Optimizer and Helper. #348
I suggest capping requested R between 0.75 and 0.97 in the scheduler code. This is needed to prevent the user from choosing insanely high or low values. |
I agree. But the optimizer sometimes suggests R = 0.7, and some users with bad memory would not achieve 0.75. Besides, the scheduler would still store the input R. The helper add-on should also cap it when reading the parameters from the custom scheduling code. |
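The cap discussed above could be a one-line clamp. A minimal sketch (the function name is illustrative; 0.75 and 0.97 are the bounds proposed in this thread):

```python
def clamp_requested_retention(r, lo=0.75, hi=0.97):
    """Clamp the user's requested retention R into a sane range,
    preventing insanely high or low values from reaching the scheduler."""
    return max(lo, min(hi, r))
```

The same clamp would need to live in both the scheduler and the helper add-on, since each reads R independently from the custom scheduling code.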
Sherlock, I recommend that you run the new optimizer on all collections submitted to you via the Google form and average the optimal parameters (weighted by the number of reviews). This would achieve 2 things:
|
Also, it would be great if you implemented what I suggested here. |
As @user1823 suggested, the helper add-on should be compatible with both v3 and v4. In other words, it should be backward-compatible, so that the add-on won't break for anyone still using v3. Is this implemented? |
Of course. You can test it with v3 and v4 both. |
Do you install the new helper add-on? |
Could you share the related deck file with me? |
I find that your cards' cids are weird: they are too small. Normally, the cid is the timestamp of the creation date. In 4.0.0, the optimizer filters out all cards whose cid is earlier than 2006 (the year the first Anki version was released). |
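The filter described above can be sketched as a simple timestamp check. This is a hypothetical illustration of the idea, not the optimizer's actual code:

```python
from datetime import datetime, timezone

# Card ids (cid) are normally creation timestamps in milliseconds since
# the Unix epoch; Anki was first released in 2006, so any cid before
# that is implausible.
ANKI_EPOCH_MS = int(datetime(2006, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)

def has_plausible_cid(cid_ms):
    """Return True if the cid looks like a real creation timestamp."""
    return cid_ms >= ANKI_EPOCH_MS
```

As the next comments show, pre-made decks can legitimately carry cids near the Unix epoch, which is why the filter was later removed.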
I don't know, it's a pre-made deck, I didn't make these cards. I guess whoever made them somehow messed it up, and the creation date is now the earliest date in Unix time. |
OK, I will remove the filter in the next patch. |
I have the same question, but all of my cards were made by myself. I am sure that they were all created in 2023. |
Your deck only has 400+ reviews, which is not enough to optimize the model. I will add an informative message about that instead of letting the code raise an error. |
Speaking of which, that's one of the reasons why I suggested this. |
I got confused by that too, it's definitely unnecessary. |
@L-M-Sherlock for some reason when I review new cards, the "Easy" interval is much higher than the corresponding S0 parameter. All decks have this problem. |
I think it's necessary to allow a relatively low R in incremental writing; after all, a substantial dose of forgetting is what allows for more creativity. So there's no need to put a limit on the R entered by the user. |
What's your requested retention? And could you share the full configuration with me? |
Done in #350 |
No, I agree with user1823 here. You can exclude outliers from training, but they are still there, and users will still encounter them in practice. We need to ensure that whatever we are doing makes the results better (on average) for the whole dataset rather than just for the part of the dataset that has no outliers. If some method makes results better when there are no outliers, but makes results worse overall when outliers are present, then it's not a good method. Remember, you can't remove them during the actual execution of FSRS in Anki. |
Btw, Sherlock, I noticed that you included MAE in the stats. I thought you were against using MAE in favor of RMSE? |
If we consider the outliers in evaluation, the graph shown in #348 (comment) will cancel the more rational initial stability. But s=256.18 is ridiculous in that case. Could you explain it?
I am just curious about that metric. |
Perhaps that person is studying outside of Anki more than "inside". In other words, he sees this material very often and as a result he already knows the material really well. I've tested a different sigma for S0 on this person's collection, and the more reasonable values actually increased RMSE, suggesting that what we think is "reasonable" doesn't work for that person. |
I am not very sure what you mean by "cancel". But, even if the RMSE increases by including the outliers in evaluation, it makes sense because the model couldn't predict that those cards would be recalled after 400+ days just after one review. I agree that this is not a fault of the model, but the evaluation should still include all the reviews to be "fair". Also, if the evaluation data is not the same for both the optimizations, the RMSEs are not comparable. |
I mean that the worse RMSE would not support the more rational prediction.
I think it depends. If those outliers are not representative, it is safe to exclude them, and it is meaningless to evaluate performance on them.
IMO, the outlier filter is independent of the optimization. We can apply the filter before optimizing. |
In his data, most outliers have lengthy intervals, likely due to a backlog. I guess the story is: users are usually inclined to assign high ratings while handling a backlog, as it helps clear the backlog faster. This inclination inflates the retention rate for reviews with long intervals. So filtering out these outliers (cheat reviews) is rational, and it is unnecessary to evaluate the model on them. |
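The filter proposed above could be sketched like this. The threshold, the data shape, and the function name are all assumptions for illustration, not values from the actual optimizer:

```python
def drop_cheat_reviews(reviews, max_interval_days=365):
    """Drop reviews whose elapsed interval is implausibly long yet were
    still rated as recalled (rating >= 2): likely backlog 'cheat' passes.
    `reviews` is a list of (elapsed_days, rating) pairs."""
    return [
        (days, rating)
        for days, rating in reviews
        if not (days > max_interval_days and rating >= 2)
    ]
```

Note that failed reviews after long intervals are kept; only the suspiciously successful ones are filtered, since those are the ones that inflate retention.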
In #374, I calculated 30+ sets of parameters from the datasets collected via https://forms.gle/KaojsBbhMCytaA7h8 and used the median of each parameter as the new default global parameters. It's time to release the stable version of FSRS v4. |
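Taking the element-wise median across many users' optimized parameter sets can be sketched in a few lines (a minimal illustration of the aggregation described above; the function name is hypothetical):

```python
from statistics import median

def default_global_parameters(param_sets):
    """Element-wise median across optimized parameter sets, one list of
    weights per collection. The median is robust to the occasional
    collection with extreme optimized values."""
    return [median(column) for column in zip(*param_sets)]
```

Compared with the review-weighted mean suggested earlier in the thread, the median ignores collection size but is far less sensitive to outlier collections.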
It is somewhat concerning that the choice of the initial set of parameters (6559dc3) has a noticeable impact (4.4%) on the RMSE even for @L-M-Sherlock's collection, which has a significant number of reviews. |
@L-M-Sherlock, don't you think that 15 days as the maximum limit of initial stability (when total_count < 1000) is too small? For comparison, the initial stability for Good in my collection is 15.84 days. I think that the maximum limit should be at least 40 days, though 60 days is also fine. |
I haven't seen that change. |
I'm afraid that 60 days would scare new users. |
How about 30? |
OK. I will update it in the next patch. |
Maybe it is due to the sensitivity of RMSE? As I mentioned in #342 (comment), the RMSE for all last ratings decreased, but the total RMSE increased. |
But, my point is that the final parameters (and thus, the RMSE) should be the same regardless of what initial parameters are used if the data is sufficient. |
That would mean there is a unique set of parameters at the global minimum of the loss. But it is more likely that there are groups of parameters forming a flat region of minimum loss. |
Do you mean that there might be multiple sets of parameters that achieve the minimum possible values of the loss function? If so, then it really doesn't seem that way to me. In neural networks, yes, multiple equivalent minima are possible. But in neural networks, all neurons are identical, it's not like each neuron has its own special feature. In FSRS, each formula is different, and one parameter cannot replace the other. For example, if you switched w[9] and w[16] in the code below:
the result would be nonsensical. So from a theoretical point of view, it seems that in FSRS there can be only one global minimum; in other words, there can only be one specific set of parameters that achieves the minimum possible value of the loss function, and there cannot be multiple "equivalent" sets. |
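The code referenced above is not included in this excerpt, but the argument can be illustrated with a simplified sketch. This is NOT the actual FSRS formula; it only shows that two parameters playing structurally different roles cannot be swapped:

```python
import math

def stability_gain(s, r, w_exp, w_bonus):
    """Toy stability-increase formula, loosely shaped like FSRS's.
    w_exp:   an exponent, controlling how the gain shrinks as stability grows
    w_bonus: a plain multiplier scaling the whole gain"""
    return s * (1 + w_bonus * s ** (-w_exp) * (math.exp(1.0 - r) - 1.0))
```

Because one parameter sits in an exponent and the other is a multiplier, exchanging their values produces a completely different (and meaningless) function, unlike swapping two identical neurons in a neural network.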
But FSRS is not a linear model. There are many non-linear components in FSRS, so the optimization may have several minima. If it is a convex optimization problem, I have a possible solution: we can increase the batch size epoch by epoch. When the batch size equals the size of the training set, the gradient of the loss function points toward the global minimum. |
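The growing-batch-size idea above can be sketched as a simple schedule. The starting size, doubling rule, and function name are illustrative assumptions, not values from the optimizer:

```python
def batch_size_schedule(n_train, n_epochs, start=512):
    """Double the batch size every epoch until it covers the whole
    training set, so the final epochs use the exact full-batch gradient
    (which removes minibatch noise; it only guarantees reaching the
    global minimum if the loss is convex)."""
    sizes, bs = [], start
    for _ in range(n_epochs):
        sizes.append(min(bs, n_train))
        bs *= 2
    return sizes
```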
That's interesting, I would like to see it and test. |
So Sherlock, what are you working on right now? |
I'm on a business trip, so I'm not available this week. |
By the way, I asked a schoolmate who majors in Computational Mathematics whether the optimizer could reach the global minimum in FSRS. He said that the optimization of a multivariable function (as in FSRS) is very hard to analyze. For example, a saddle point is a point where the gradient is zero, yet it is not a local or global minimum. No optimization algorithm can guarantee finding the global minimum in FSRS. |
@L-M-Sherlock I think in the current version of the optimizer it's possible that a value for "Good" will be larger than for "Easy" if "Good" has more datapoints. |
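One hypothetical fix for the issue above is to enforce monotonicity over the four initial stabilities after optimization. This is a sketch of the idea, not the optimizer's actual post-processing:

```python
def enforce_monotone_s0(s0):
    """Force the initial stabilities [Again, Hard, Good, Easy] to be
    non-decreasing, so a higher first rating never produces a shorter
    first interval even when 'Good' has far more datapoints than 'Easy'."""
    out = list(s0)
    for i in range(1, len(out)):
        out[i] = max(out[i], out[i - 1])
    return out
```

A running-maximum clamp like this is the simplest repair; a more principled alternative would be to constrain the ordering during optimization itself.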