
[Beta Testing] FSRS4Anki 4.0.0 Scheduler, Optimizer and Helper. #348

Closed
L-M-Sherlock opened this issue Jul 13, 2023 · 90 comments · Fixed by #349, open-spaced-repetition/fsrs4anki-helper#165, #353, #357 or #358
Labels
community Community building

Comments

@L-M-Sherlock
Member

L-M-Sherlock commented Jul 13, 2023

@Expertium
Collaborator

I suggest capping requested R between 0.75 and 0.97 in the scheduler code. This is needed to prevent the user from choosing insanely high or low values.

@L-M-Sherlock
Member Author

This is needed to prevent the user from choosing insanely high or low values.

I agree. But the optimizer sometimes suggests R = 0.7. Some users with bad memory would not achieve 0.75.

Besides, the scheduler would still store the input R. And the helper add-on should also cap it when getting the parameters from the custom scheduling code.
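For illustration, a minimal sketch of such a cap (hypothetical function, not the actual scheduler or add-on code):

```python
def clamp_requested_retention(r: float, low: float = 0.75, high: float = 0.97) -> float:
    """Cap the user-supplied requested retention R to a sane range."""
    return min(max(r, low), high)

assert clamp_requested_retention(0.70) == 0.75  # too low: raised to the floor
assert clamp_requested_retention(0.99) == 0.97  # too high: lowered to the ceiling
assert clamp_requested_retention(0.90) == 0.90  # in range: unchanged
```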

@Expertium
Collaborator

Sherlock, I recommend running the new optimizer on all collections submitted to you via the Google form and averaging the optimal parameters (weighted by the number of reviews). This would achieve two things:

  1. A better set of initial parameters could improve convergence during optimization.
  2. If the user doesn't have enough data, they will just get the average parameters, which should be good enough at first. I suggest introducing a hard limit of 1000 reviews, excluding same-day reviews. If the user has more reviews than that, run the optimizer; otherwise print a warning like "Insufficient data" and print the average parameters (see the sketch below).
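A minimal sketch of this scheme (hypothetical names; the actual optimizer would differ):

```python
import numpy as np

MIN_REVIEWS = 1000  # proposed hard limit, excluding same-day reviews

def average_parameters(per_collection_w, review_counts):
    """Average optimized parameter vectors, weighted by each collection's review count."""
    return np.average(np.asarray(per_collection_w, dtype=float),
                      axis=0,
                      weights=np.asarray(review_counts, dtype=float))

def choose_parameters(review_count, run_optimizer, default_w):
    """Fall back to the averaged defaults when the collection is too small."""
    if review_count < MIN_REVIEWS:
        print("Insufficient data: using average parameters instead of optimizing.")
        return default_w
    return run_optimizer()
```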

@Expertium
Collaborator

Also, it would be great if you implemented what I suggested here.

@Expertium
Collaborator

As @user1823 suggested, the helper add-on should be compatible with both v3 and v4. In other words, it should be backward-compatible, to ensure that if someone is still using v3, their add-on won't break. Is this implemented?

@L-M-Sherlock
Member Author

As @user1823 suggested, the helper add-on should be compatible with both v3 and v4. In other words, it should be backward-compatible, to ensure that if someone is still using v3, their add-on won't break. Is this implemented?

Of course. You can test it with both v3 and v4.

@cjdduarte

cjdduarte commented Jul 13, 2023

When I insert the new code "fsrs4anki_scheduler.js"

[screenshot]

and then run the helper, the message below appears:

[screenshot]

@Expertium
Collaborator

[screenshot]
I'm genuinely curious how I'm getting this error, considering that curve_fit should only run if there are more than 100 reviews for that grade.

@L-M-Sherlock
Member Author

and then run the helper, the message below appears

Did you install the new helper add-on?

@L-M-Sherlock
Member Author

I'm genuinely curious how I'm getting this error, considering that curve_fit should only run if there are more than 100 reviews for that grade.

Could you share the related deck file with me?

@Expertium
Collaborator

@L-M-Sherlock
Member Author

https://drive.google.com/file/d/1WyYRkoZZLllbESz5eCXq7hxFaHAIfrAJ/view?usp=sharing

[screenshot]

I find that your cards' cids are weird. They are too small. Normally, the cid is the timestamp of the card's creation date. In 4.0.0, the optimizer filters out all cards whose cid is earlier than 2006 (the year the first Anki version was released).
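A sketch of that sanity filter (assumed code, not the exact optimizer; Anki card ids are millisecond Unix timestamps):

```python
from datetime import datetime

# Cards "created" before Anki existed (2006) must have corrupted ids.
ANKI_EPOCH_MS = int(datetime(2006, 1, 1).timestamp() * 1000)
df = df[df["cid"] >= ANKI_EPOCH_MS]  # df: the review-log DataFrame (assumed name)
```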

@Expertium
Collaborator

I don't know, it's a pre-made deck, I didn't make these cards. I guess whoever made them somehow messed it up, and the creation date is now the earliest date in Unix time.

@L-M-Sherlock
Member Author

I don't know, it's a pre-made deck, I didn't make these cards. I guess whoever made them somehow messed it up, and the creation date is now the earliest date in Unix time.

OK, I will remove the filter in the next patch.

@perhydrol

I have the same problem, but I made all the cards myself. I'm sure that all of them were created in 2023.
My deck:
LinearAlgebra.zip

ValueError                                Traceback (most recent call last)
c:\Users\telly\Documents\project\fsrs4anki\fsrs4anki\fsrs4anki_optimizer.ipynb Cell 11 line 19
      1 """
      2 w[0]: initial_stability_for_again_answer
      3 w[1]: initial_stability_step_per_rating
   (...)
     16 https://github.com/open-spaced-repetition/fsrs4anki/wiki/Free-Spaced-Repetition-Scheduler
     17 """
     18 optimizer.define_model()
---> 19 optimizer.pretrain()
     20 print(optimizer.init_w)

File c:\Users\telly\Documents\project\fsrs4anki\fsrs4anki\fsrs4anki_optimizer.py:518, in Optimizer.pretrain(self)
    515 def S0_rating_curve(rating, a, b, c):
    516     return np.exp(a + b * rating) + c
--> 518 params, covs = curve_fit(S0_rating_curve, list(rating_stability.keys()), list(rating_stability.values()), sigma=1/np.sqrt(list(rating_count.values())), method='dogbox', bounds=((-15, 0.03, 0), (15, 7, 30)))
    519 print('Weighted fit parameters:', params)
    520 predict_stability = S0_rating_curve(np.array(list(rating_stability.keys())), *params)

File c:\Users\telly\Documents\project\fsrs4anki\.conda\lib\site-packages\scipy\optimize\_minpack_py.py:820, in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, full_output, **kwargs)
    817         xdata = np.asarray(xdata, float)
    819 if ydata.size == 0:
--> 820     raise ValueError("`ydata` must not be empty!")
    822 # Determine type of sigma
    823 if sigma is not None:

ValueError: `ydata` must not be empty!

[screenshot]

https://drive.google.com/file/d/1WyYRkoZZLllbESz5eCXq7hxFaHAIfrAJ/view?usp=sharing

[screenshot]

I find that your cards' cids are weird. They are too small. Normally, the cid is the timestamp of the card's creation date. In 4.0.0, the optimizer filters out all cards whose cid is earlier than 2006 (the year the first Anki version was released).

@L-M-Sherlock
Member Author

I have the same problem, but I made all the cards myself. I'm sure that all of them were created in 2023.

Your deck only has 400+ reviews, which is not enough to optimize the model. I will add an informative message about that instead of the error raised by the code.
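A sketch of the kind of guard that would replace the raw ValueError (assumed shape, not the exact #349 patch):

```python
# Before fitting the S0 curve, check that any rating groups survived the filters.
if not rating_stability:
    raise Exception(
        "Not enough review data to pretrain the model. "
        "Collect more reviews (the proposed threshold is 1000, "
        "excluding same-day reviews) or start from the default parameters."
    )
```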

@L-M-Sherlock L-M-Sherlock linked a pull request Jul 13, 2023 that will close this issue
@Expertium
Collaborator

Your deck only has 400+ reviews, which is not enough to optimize the model. I will add an informative message about that instead of the error raised by the code.

Speaking of which, that's one of the reasons why I suggested this.

@L-M-Sherlock
Member Author

Your deck only has 400+ reviews, which is not enough to optimize the model. I will add an informative message about that instead of the error raised by the code.

Speaking of which, that's one of the reasons why I suggested this.

Yeah. And I have added some useful warnings in #349:

[screenshot]

@Expertium
Collaborator

Expertium commented Jul 13, 2023

[screenshot]
I think these features should be turned on by default, rather than off.
Also, remove "requires Fuzz" from the description

@user1823
Collaborator

user1823 commented Jul 13, 2023

I have some feedback to share about the optimizer. (mostly cosmetic changes)

  1. Hide the results of pre-training by default, but allow them to become visible if the user desires.
  2. Don't print the init_w after pre-train. The user can wrongly assume that it is the optimized set of parameters.
  3. The S0_rating_curve shouldn't execute at all if the initial stability for all four ratings was calculated accurately (i.e., the data was adequate for all four ratings).
  4. Shift the pre-training to section 2.2. It is more logical to place it there.
  5. Remove the following. They are of no use to the user.
    [screenshot]
    [screenshot]
  6. Hide the intermediate values of the parameters during the training process. Again, you can allow them to become visible if the user desires. Do the same thing for the graphs in the same section.

@Expertium
Collaborator

2. Don't print the init_w after pre-train. The user can wrongly assume that it is the optimized set of parameters.

I got confused by that too; it's definitely unnecessary.

@Expertium
Collaborator

@L-M-Sherlock for some reason when I review new cards, the "Easy" interval is much higher than the corresponding S0 parameter. All decks have this problem.
[screenshot]

@Diplproveably

This is needed to prevent the user from choosing insanely high or low values.

I agree. But the optimizer sometimes suggests R = 0.7. Some users with bad memory would not achieve 0.75.

Besides, the scheduler would still store the input R. And the helper add-on should also cap it when getting the parameters from the custom scheduling code.

I think it's necessary to use a relatively low R in incremental writing; after all, it takes a relatively full dose of forgetting to allow for more creativity. So there's no need to put a limit on the R entered by the user.

@L-M-Sherlock
Member Author

@L-M-Sherlock for some reason when I review new cards, the "Easy" interval is much higher than the corresponding S0 parameter. All decks have this problem.

What's your requested retention? And could you share the full configuration with me?

@L-M-Sherlock
Member Author

I have some feedback to share about the optimizer. (mostly cosmetic changes)

Done in #350

@Expertium
Collaborator

I don't think outliers should be included in the calculation of evaluation metrics.

No, I agree with user1823 here. You can exclude outliers from training, but they are still there, and users will still encounter them in practice. We need to ensure that whatever we are doing makes the results better (on average) for the whole dataset, rather than just for the part of the dataset that has no outliers. If some method makes results better when there are no outliers, but makes results worse overall when outliers are present, then it's not a good method. Remember, you can't remove them during the actual execution of FSRS in Anki.

@Expertium
Collaborator

Expertium commented Jul 20, 2023

Btw, Sherlock, I noticed that you included MAE in the stats. I thought you were against using MAE in favor of RMSE?

@L-M-Sherlock
Member Author

No, I agree with user1823 here.

If we consider the outliers in evaluation, the graph shown in #348 (comment) will cancel the more rational initial stability. But s=256.18 is ridiculous in that case. Could you explain it?

Btw, Sherlock, I noticed that you included MAE in the stats. I thought you were against using MAE in favor of RMSE?

I am just curious about that metric.

@Expertium
Collaborator

Could you explain it?

Perhaps that person is studying outside of Anki more than "inside". In other words, he sees this material very often and as a result he already knows the material really well. I've tested a different sigma for S0 on this person's collection, and the more reasonable values actually increased RMSE, suggesting that what we think is "reasonable" doesn't work for that person.
[screenshot]

@user1823
Collaborator

user1823 commented Jul 20, 2023

If we consider the outliers in evaluation, the graph shown in #348 (comment) will cancel the more rational initial stability. But s=256.18 is ridiculous in that case. Could you explain it?

I am not very sure what you mean by "cancel". But, even if the RMSE increases by including the outliers in evaluation, it makes sense because the model couldn't predict that those cards would be recalled after 400+ days just after one review. I agree that this is not a fault of the model, but the evaluation should still include all the reviews to be "fair".

Also, if the evaluation data is not the same for both the optimizations, the RMSEs are not comparable.

@L-M-Sherlock
Member Author

I am not very sure what you mean by "cancel".

I mean that the worse RMSE would not support the more rational prediction.

I agree that this is not a fault of the model, but the evaluation should still include all the reviews to be "fair".

I think it depends. If those outliers are not representative, it is safe to exclude them. And it is meaningless to evaluate the performance on those outliers.

Also, if the evaluation data is not the same for both the optimizations, the RMSEs are not comparable.

IMO, the outlier filter is independent of optimization. We can apply the filter before optimization.

@L-M-Sherlock
Member Author

L-M-Sherlock commented Jul 20, 2023

Perhaps that person is studying outside of Anki more than "inside". In other words, he sees this material very often and as a result he already knows the material really well. I've tested a different sigma for S0 on this person's collection, and the more reasonable values actually increased RMSE, suggesting that what we think is "reasonable" doesn't work for that person.

In his data, most outliers have lengthy intervals, likely due to a backlog. I guess the story is: usually, users are inclined to assign high ratings while handling a backlog, as it aids in clearing the backlog faster. This inclination leads to inflated retention rates for reviews with long intervals. So filtering out these outliers (cheat reviews) is rational, and it is unnecessary to evaluate the model on these outliers.

@L-M-Sherlock L-M-Sherlock removed a link to a pull request Jul 20, 2023
@L-M-Sherlock L-M-Sherlock linked a pull request Jul 20, 2023 that will close this issue
@L-M-Sherlock
Member Author

In #374, I calculated 30+ sets of parameters from the datasets collected in https://forms.gle/KaojsBbhMCytaA7h8. I used the median as the new default global parameters. It's time to release the stable version of FSRS v4.
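A sketch of that computation (hypothetical variable names; #374 has the actual one):

```python
import numpy as np

# all_w: one optimized parameter vector per donated collection,
# with shape (n_collections, n_params)
default_w = np.median(np.asarray(all_w), axis=0)  # element-wise median
```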

@user1823
Collaborator

It is somewhat concerning that the choice of the initial set of parameters (6559dc3) has a noticeable impact (4.4%) on the RMSE even for @L-M-Sherlock's collection, which has a significant number of reviews.

[screenshot]

@user1823
Collaborator

@L-M-Sherlock, don't you think that 15 days as the maximum limit of initial stability (when total_count < 1000) is too small? For comparison, the initial stability for Good in my collection is 15.84 days. I think that the maximum limit should be at least 40 days, though 60 days is also fine.
[screenshot]
[screenshot]

@Expertium
Collaborator

Expertium commented Jul 21, 2023

I haven't seen that change.
I agree with user1823, 15 days is way too small. I suggest 40.

@L-M-Sherlock
Member Author

I'm afraid that 60 days would scare new users.

@Expertium
Collaborator

How about 30?

@L-M-Sherlock
Member Author

OK. I will update it in the next patch.

@L-M-Sherlock
Member Author

It is somewhat concerning that the choice of the initial set of parameters (6559dc3) has a noticeable impact (4.4%) on the RMSE even for @L-M-Sherlock's collection, which has a significant number of reviews.

Maybe it is due to the sensitivity of RMSE? As I mentioned in #342 (comment), the RMSE for all last ratings decreased, but the total RMSE increased.

@user1823
Collaborator

Maybe it is due to the sensitivity of RMSE? As I mentioned in #342 (comment), the RMSE for all last ratings decreased, but the total RMSE increased.

But, my point is that the final parameters (and thus, the RMSE) should be the same regardless of what initial parameters are used if the data is sufficient.

@user1823
Collaborator

user1823 commented Jul 21, 2023

Also, the latest optimizer filters out my data for first rating = 3 and delta_t = 4 (and above). But the initial stability for Good in my collection is more than 10 days (the exact value depends on which version of the optimizer I use). This means that the optimizer is not using my newer data for training (unless the count with high delta_t eventually becomes so large that it fits into the threshold).

You might say that there is not much data for delta_t > 10 in the following image. This is because of two reasons:

  1. My requested retention is 0.94. So, a stability of 10 days means an interval of 5.74 days (see the arithmetic below the image).
  2. This data is from when I was using FSRS v3. There, the initial stability for Good was likely somewhat smaller.
[screenshot]
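For reference, the arithmetic under the FSRS v4 power forgetting curve (assuming $R(t, S) = (1 + t/(9S))^{-1}$, hence $t = 9S(1/R - 1)$):

$$t = 9 \cdot 10 \cdot \left(\frac{1}{0.94} - 1\right) \approx 5.74 \text{ days}$$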

As I said before, the reason that a large number of my cards (3944) have the first interval = 1d (with first rating = 3) is that 1d was my second learning step when I was using the default Anki algorithm. But this should not be a factor in deciding how well the optimizer works for me.

@L-M-Sherlock
Member Author

But, my point is that the final parameters (and thus, the RMSE) should be the same regardless of what initial parameters are used if the data is sufficient.

That would mean there is a unique set of parameters at the global minimum of the loss. But it is more likely that there are groups of parameters lying on a flat region of minimum loss.

@Expertium
Collaborator

Expertium commented Jul 21, 2023

Do you mean that there might be multiple sets of parameters that achieve the minimum possible value of the loss function? If so, then it really doesn't seem that way to me. In neural networks, yes, multiple equivalent minima are possible. But in neural networks, all neurons are identical; it's not like each neuron has its own special feature. In FSRS, each formula is different, and one parameter cannot replace another. For example, if you switched w[9] and w[16] in the code below:

        # w[15]: multiplier applied to stability growth when the answer was "Hard"
        hard_penalty = torch.where(rating == 2, self.w[15], 1)
        # w[16]: multiplier applied to stability growth when the answer was "Easy"
        easy_bonus = torch.where(rating == 4, self.w[16], 1)
        # new stability after a successful review; w[9] controls how the growth
        # slows down as the current stability state[:,0] gets larger
        new_s = state[:,0] * (1 + torch.exp(self.w[8]) *
                        (11 - new_d) *
                        torch.pow(state[:,0], -self.w[9]) *
                        (torch.exp((1 - r) * self.w[10]) - 1) *
                        hard_penalty *
                        easy_bonus)

the result would be nonsensical. So from a theoretical point of view, it seems that in FSRS there can only be one global minimum; in other words, there can only be one specific set of parameters that achieves the minimum possible value of the loss function. There cannot be multiple "equivalent" sets.

@L-M-Sherlock
Member Author

L-M-Sherlock commented Jul 21, 2023

Do you mean that there might be multiple sets of parameters that achieve the minimum possible value of the loss function? If so, then it really doesn't seem that way to me. In neural networks, yes, multiple equivalent minima are possible. But in neural networks, all neurons are identical; it's not like each neuron has its own special feature. In FSRS, each formula is different, and one parameter cannot replace another.

But FSRS is not a linear model. There are many non-linear components in FSRS, so it is possible that the optimization has several minima.

If it is a convex optimization problem, I have figured out a solution: we can increase the batch size epoch by epoch. When the batch size equals the size of the training set, the gradient of the loss function will point toward the global minimum.

Don't Decay the Learning Rate, Increase the Batch Size
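A sketch of that schedule (hypothetical training loop, not the actual optimizer code; train_set, train_step, and n_epochs are assumed to exist):

```python
from torch.utils.data import DataLoader

batch_size = 512
for epoch in range(n_epochs):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    for batch in loader:
        train_step(batch)
    # Grow the batch instead of decaying the learning rate; once it covers
    # the whole training set, each step uses the full-batch gradient.
    batch_size = min(batch_size * 2, len(train_set))
```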

@Expertium
Collaborator

We can increase the batch size epoch by epoch. When the batch size equals the size of the training set, the gradient of the loss function will point toward the global minimum.

That's interesting; I would like to see it and test it.

@Expertium
Collaborator

So Sherlock, what are you working on right now?

@L-M-Sherlock
Member Author

I'm on a business trip, so I'm not available this week.

@L-M-Sherlock
Member Author

By the way, I asked a schoolmate of mine, who majors in Computational Mathematics, whether the optimizer could reach the global minimum in FSRS. He mentioned that optimization of a multivariable function (as in FSRS) is very hard to analyze. For example, a saddle point is a point where the gradient is zero, but it is not a local/global minimum. No optimization algorithm can guarantee finding the global minimum in FSRS.
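A textbook illustration of such a point: for $f(x, y) = x^2 - y^2$, the gradient $\nabla f = (2x, -2y)$ vanishes at the origin, yet $f(0, \epsilon) = -\epsilon^2 < 0 = f(0, 0)$, so the origin is a saddle, not a minimum.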

@Expertium
Collaborator

@L-M-Sherlock I think in the current version of the optimizer it's possible that the value for "Good" will be larger than the one for "Easy" if "Good" has more datapoints.
params, _ = curve_fit(power_forgetting_curve, delta_t, recall, sigma=1/np.sqrt(count), bounds=((0.1), (30 if total_count < 1000 else 365)))
You should probably add some kind of extra cap to ensure that S0 for "Good" cannot be greater than S0 for "Easy", even if total_count is greater than 1000 for "Good" and less than 1000 for "Easy".
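A sketch of such a cap (assuming rating_stability maps ratings 1-4 to the fitted S0 values, as in the pretrain code quoted earlier):

```python
# Enforce S0(Again) <= S0(Hard) <= S0(Good) <= S0(Easy) after fitting.
for lower, higher in ((1, 2), (2, 3), (3, 4)):
    if lower in rating_stability and higher in rating_stability:
        rating_stability[higher] = max(rating_stability[higher],
                                       rating_stability[lower])
```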
