Request for comment: Ebisu v3 API #58
Comments
(Placeholder for edits, to-do lists, etc.)
Is this still being worked on? I'm putting Ebisu in my project, and I'd like to chip in.
@BieniekAlexander thanks for the comment! Yes haha, I am doing a lot of work on Ebisu these days and should have posted an update here sooner. The proposed model above (Gamma on half-life with boost) is good. It produces good predictions for recall probability on real data, and I've written detailed tests and a lot of the README for it (they're in the v3 branch where you can read the README), but I felt it was too complicated in a few places, which made me revisit the literature from Mozer et al. (their 2009 paper on MCM and their 2014 paper on DASH). Their emphasis on power-law decay instead of exponential was inspiring, so I designed a totally different proposal for v3 😅, based on Mozer's team's work, called "leaky integrators".

I've been pretty happy with it; I've mostly completed the README and the code in this branch: https://github.com/fasiha/ebisu/tree/v3-leaky-integrators#ebisu-intelligent-quiz-scheduling I'm going to resist trying to describe it here, but here's a direct link to the data model section of the README that I think is quite concise. It's simpler than the Gamma-on-halflife proposal above, it's faster, on average it seems to produce just as good predictions of quiz results as the Gamma-on-halflife, plus it doesn't suffer from #52. Writing the details of a model's math and code like this (in Knuth's "literate programming" style) is powerful because it makes you confront the rough edges, the infelicities in your design and implementation choices 🫠.

In the last couple of days I've been thinking of how to potentially simplify this leaky-integrator proposal even further. I know the risk of the never-finished project, so I'm weighing the value of continuing to research potential improvements/alternatives versus just publishing what I have.
I'm excited to see a new path to memory with SRS. It's awesome to model the law of memory with math. I'm asking this here because it is a small question related to the roadmap (and vision) of Ebisu v3. I really hope there will be many good scheduler options in Anki. Anki users will also provide effective feedback and even participate in contributing.
Thanks for writing @jyboy! Hmm, for sure I can help port Ebisu to the Anki custom scheduler once v3 is released. I see that Anki is still attached to the concept of a "scheduled review". For the visually impaired, here's the conversation in the screenshot:
Honestly, I agree with this: I really doubt flashcard scheduling is the reason users quit Anki/Memrise or stay with Duolingo; it's really all about the content, presentation, and quiz interactivity. There have been some correlative studies that note students' test scores rise as a result of practicing with a certain SRS (MaiMemo example/paper, DASH example, Duolingo HLR paper, etc.) but I'd love to see third-party replications, especially with something more meaningful than semester exam scores. Until then I'm confident that SRS is just a secondary or tertiary effect on motivation and performance.

But I am confused by comparisons of probabilistic algorithms like Ebisu or half-life regression or DASH to Anki SM-2, because their APIs are so different. Specifically, consider that Anki's scheduler outputs just a single value, the future due date: "This fact needs to be reviewed in 14.1 days." When you inevitably over-review (review a flashcard sooner than its due date) or under-review (skip reviewing on that due date), there are finely-tuned rules to decide what happens, rules that take time to develop and test and refine. Meanwhile, consider that Ebisu's output is a function, a mapping from time to recall probability. In crude terms, rather than a single number ("14.1 days"), Ebisu outputs a very long array of recall probabilities, one for each future moment.

Having a model is why Ebisu is able to seamlessly handle more complex quiz types as inputs:
The only input Anki SM-2 has is fail/hard/easy/pass, but I am sure the Anki/Supermemo communities can and likely have figured out rules to adapt their algorithms to a similarly rich array of quizzes needed by reading apps and correlated cloze deletions—those communities are energetic and brilliant and motivated. However, I don't think it's controversial to say that tuning SM-2 rules for the above types of quizzes takes time and effort. Ebisu and similar probabilistic algorithms, by virtue of leveraging statistical models, can often make it a lot easier to quickly innovate new quiz formats/content structures/learning styles, by offloading the bookkeeping of memory tracking. Now. It's entirely plausible, like the commenter says, that carefully-tuned SM2-like rules will eventually outperform probabilistic models, but hopefully in the interval, libraries like Ebisu allow quiz app authors to explore a huge chunk of the space of all possible quiz apps. As quiz app authors ask, "how can I incorporate sound recordings for pronunciation practice in my SRS", having a statistical model in place and ready to tap the large library of applied math tools will give some of those quiz app authors a head start. Aaaaaall that said… please don't be too impressed with math 😆, math can lead you very astray even when it's not wrong, because it's so often the case that simple ad hoc methods (like SM-2!) wind up being more effective than complicated mathematical algorithms! Conor McBride recently described this quite graphically but accurately on Mastodon:
🤣 gross but true! I have worried that Ebisu will fall into this bucket since day one, and it continues to be a fear, but so far I've been really pleased with the kinds of apps Ebisu is helping folks build 😁. Sorry for the long boring post, hopefully it was more helpful than confusing.
Quick update on Ebisu v3: this is coming along nicely! It's a much simpler model than the Gammas-on-boost-and-halflife described above but just as capable! I will probably close this issue and either open a new request for comment (RFC) issue or, if I can wrap up the README and tests quickly, just publish a release candidate (RC) soon, hopefully in a week or two.
@fasiha I'm only now trying to catch up on algorithms and models being used for recall training - the reason that I moved my project's implementation from sm2_anki to ebisu is because sm2 doesn't seem to have a clean answer to pull-based fact studying. Now that I'm using Ebisu for my project, the outstanding concern that I have is Ebisu v2's propensity to converge to a specific half-life (rather than discounting the influence of earlier study sessions over time) - I'm excited to see if v3 has an answer to this problem!
@BieniekAlexander do you have an example of this? It might help redirect me but here's what I tentatively think.
In this RFC I did propose a … I'll think about how to maybe build a preference for newer data into the algorithm. It's not clear to me how to do this right now, so I'm happy to get advice. I know this is probably not what you wanted to hear but I hope this makes sense at least?
Actually, I may very well be wrong about that assertion that I made. Let me cite some things that led me to believe my claim. I don't have an extremely strong stats background, so feel free to question my points. To restate the claim: I have the impression that as the user studies more, the beta prior will converge on some ideal half-life, which would also mean that a fact's half-life will become harder and harder to update as it is studied more and more. Here are my references:
In summary, when I talk about the model converging to some sort of half-life, I think that's a corollary to your statement here that "Ebisu v2 assumed that your memory of a flashcard was a fixed but unknown probability" (though maybe I'm mincing words). Anyway, you mention a …
Thanks for the detailed reply @BieniekAlexander! The baseball analogy is useful! The classic Bayesian intuition for the Beta distribution is that it's indeed a model of a weighted coin whose underlying probability of coming up heads you don't know. As you flip the coin, α and β represent the number of times it comes up heads or tails respectively; Ebisu v2 treats "heads" as "quiz success", and adds the extra twist of exponential decay. If you initialize an Ebisu v2 model with an initial half-life of one week and do quizzes every week, the (α, β) that Ebisu estimates are exactly this, and will "converge" to the odds that you get the quiz right after a one-week delay. Of course, since your memory is actually strengthening every week, the model will never converge 😅 and Ebisu v2 won't really settle down (and that's without adding the complexity of exponential decay of memories and the fact that you do these quizzes at arbitrary intervals, not just a week).

To your second point, I totally agree. Any kind of temporal deweighting has to be done carefully because:

All that said, I think v3, which tries to capture the fact that your memory of a fact improves each time you see it, will not get locked into a stale half-life. It remains to be seen whether v3's model for memory improvement is good 😆.
Yeah, I am incorrect and it won't converge, because you're hopefully repeatedly telling the model that you're right, so it'll continue pushing forward the halflife. I suppose the thing I'm trying to capture instead, then (and what I had in mind when I used the term "converge"), is that the recall model updates in a manner that weights your studying in the past equally with your studying at the current time. It's been a minute, and I don't recall why I had this intuition, so what I should do is experiment with the model a bit and compare how the halflife actually changes against how I might expect it to change. I'll play around with it soon!
@fasiha I threw together some visualizations, and I think my suspicions were correct. First, the plots: in this first example, the fact is updated after the previous halflife has elapsed. The term is guessed correctly some N times, incorrectly some N times, correctly some N times, etc. In this second example, the fact is updated 24 hours after the previous study session. The guess correctness is identical to that of the previous session. I didn't put too much thought into the generation of study timing and study success, but I don't think they're important. I also don't fully understand how the recall updating works, but I think I'm seeing the following property of the updates:

With this behavior, my concern is: if I struggle to recall a fact at some point in my study history such that halflife updates are minor, future halflife updates of the fact will always be small. I think it makes sense that the model does not update the recall as much when I have been failing to recall the term a lot. However, if for some reason I greatly improve my memory of a fact, my fact's halflife will still only increase very slowly.
I've been following this project for quite a while now and I'm happy to see v3 finally happen. I will see if I can port it to Rust and from there compile it as a wasm function so it can be used from any language implementing a wasm runtime. I guess the first step is generating test data to ensure it has the same behaviour as the original. @fasiha, are there any non-obvious edge cases I should be aware of that aren't checked for in the code? As in, "this function will return nonsensical results if param_a is below param_b".
Thanks for the note @ckoshka! Just to make sure I understand your question: you are hoping to port v3 to Rust, is that right? If so, let me try and wrap up the behavior of the algorithm and generate some input/output test data to measure correctness against. (Version 2 has this, but I think you're more interested in porting v3 rather than v2?) The current v3 algorithm is in the … branch. But I will persevere! Thank you for your kind note and your patience!
v3, that's right. Though I could start with v2 since it might help me understand the changes in context. And ya, I think I know that feeling. Sometimes when I'm just experimenting and iterating on something, I land on a combination of params that seem to just work for whatever reason via trial and error. No biggie, I can sit tight for the time being 😅
This RFC is superseded by #62 😁 see you all there!
Ebisu v3 request for comment (RFC)
Introduction
This issue contains a proposal for a future v3 release of Ebisu, with the goal of inviting feedback to improve the design before it is released.
V2 recap
A quick recap of the current version of Ebisu: v2 models each flashcard's probability of recall as a probability distribution that is valid at a certain time in the future. (Stats nerd note: when you first learn a card, its probability of recall `t` hours in the future is a Beta distribution with parameters `a` and `b`.) At any given time in the future (not just `t`), you can call `ebisu.predictRecall` to get the estimated recall probability for this flashcard. Doing this for each flashcard lets you pick which flashcards have the lowest recall probability, and you present one of those to the learner. Then, you call `ebisu.updateRecall` with the quiz's result (binary pass/fail, binomial passes and fails, noisy pass/fail), which updates the Beta distribution in light of the quiz result.
V3's goal
The main goal of v3 is to address the issue that various folks have raised over the years but that I finally understood thanks to @cyphar in #43 (which has links to the original Reddit thread): Ebisu v2 has a fundamental flaw in how it models your memory, in that it ignores the fact that, in the real world, the act of reviewing a flashcard actually changes the underlying strength of the memory. Instead, Ebisu v2 assumed that your memory of a flashcard was a fixed but unknown probability, and updated its belief about that probability's distribution after each quiz. This made it appear that it was strengthening or weakening the memory model, but when we tried to infer the ideal initial halflife for real-world flashcard histories, Ebisu v2 insisted on absurdly high estimates of initial halflife.
In practice, this means that the actual recall probabilities predicted by Ebisu v2 were extremely pessimistic, and flashcard apps that attempted to schedule reviews based on a minimum acceptable recall probability had to set those thresholds unrealistically low. I didn't realize this was a problem because flashcard apps I wrote on top of Ebisu just used its predicted recall to rank flashcards, and ignored their actual values; and since the predicted recall went up or down as I passed or failed quizzes, I didn't notice that the halflife was growing very slowly. Again, thanks to @cyphar and others who patiently explained this to me repeatedly until I finally saw the problem 🙏!
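To make v2's assumption concrete, here's a from-scratch sketch of its math (my own code, not Ebisu's implementation): v2 treats the recall probability after `t` hours as a `Beta(a, b)` random variable, and under exponential decay the recall probability after `t_now` hours is that variable raised to the power `t_now / t`, whose expectation has a closed form via the Beta function.

```python
from math import lgamma, exp

def beta_predict_recall(a: float, b: float, t: float, t_now: float) -> float:
    """Expected recall probability at time t_now, given a Beta(a, b) prior on
    recall probability at time t (a sketch of Ebisu v2's model, not its code).

    If p ~ Beta(a, b) is recall at time t, recall at t_now is p**(t_now / t),
    and E[p**delta] = B(a + delta, b) / B(a, b) for delta = t_now / t.
    """
    delta = t_now / t
    # log of the Beta function, via log-gamma for numerical stability
    log_beta = lambda x, y: lgamma(x) + lgamma(y) - lgamma(x + y)
    return exp(log_beta(a + delta, b) - log_beta(a, b))

# At the anchor time t_now == t, a Beta(3, 3) prior predicts 50% recall:
print(beta_predict_recall(3.0, 3.0, 24.0, 24.0))  # 0.5
```

Note how nothing in this expression lets a review *strengthen* the memory: every quiz just refines the belief about the one fixed probability, which is exactly the flaw described above.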
Boost
Ebisu v3 is a total rewrite. It introduces the concept of a "boost" to Ebisu, which is very similar to Anki's "ease factor": each time you review a card, Ebisu will apply a Bayesian update based on the quiz's result, but then for each successful quiz it will boost the resulting probability distribution by a flashcard-specific scalar.
For example, suppose the halflife of a flashcard was a day, and you pass a quiz after two days, and suppose Ebisu v2 would simply update the halflife to four days. The Ebisu v3 model for this flashcard would have not only a probability distribution for the halflife (one day) but also for the boost, so if the mean boost value for this flashcard was 1.5, under Ebisu v3, `updateRecall` would yield a model with halflife `4 * 1.5` or six days. (I'm being very hand-wavy here for purposes of illustration.)

An important part of boosting is that it is applied only for successful quizzes after a big enough delay. This should make sense: imagine if you reviewed that flashcard with one-day halflife after just ten minutes, and that Ebisu v2 would update its halflife to, say, 1.01 days. We wouldn't want Ebisu v3 to boost the halflife by 1.5, leaving us with a halflife of `1.01 * 1.5`! We'd want v3 to keep the halflife around 1.01, because it's unlikely that a quiz after ten minutes would significantly alter our neural memory of this fact.

Ebisu v3 therefore has the following nominal pseudocode:
where `lerp([x1, x2], [y1, y2], x)` is linear interpolation of `x` along the line between points `(x1, y1)` and `(x2, y2)`. In words: `LEFT` and `RIGHT` are constants that you pick to control what counts as "short" vs "long" relative to the old halflife. In practice, I've been experimenting with `LEFT=0.3` and `RIGHT=1.0` and these work well.
This RFC
With now two things to keep track of for each flashcard (its halflife and its boost), the `update` step is more complicated in v3 than in v2: it is broken down into two functions that app authors will need to call, because of the computational burden. However, in v3, the `predict` step (`predictRecall`) is a lot faster than in v2 and in fact can be done in SQL. Hopefully the more complex mathematics and API are worth it for more realistic estimates.

In the remainder of this issue, I'll …
My main goal is to get feedback on the second point above from app authors who use Ebisu. Does the proposed API feel good? Are there flaws in it? Are there other nice-to-haves that I haven't thought of? Do you want things renamed?
V3 statistical model
This section can be skipped by folks who don't care about the math. For those that do, this section will only be a sketch because I can't use LaTeX math in a GitHub issue. The details will be spelled out on the main Ebisu website when v3 is released, but I'm of course happy to answer questions about this now during the RFC phase as well.
Before getting to v3's statistical model, recall that v2's statistical model is, for each flashcard, a Beta random variable parameterized by `a` and `b` (on the probability of recall at a fixed time).

In v3, we instead go a level higher and place the prior probability distribution on the halflife directly: `halflife ~ Gamma(a, b)`.

We also place a probability distribution on boost: `boost ~ Gamma(aBoost, bBoost)`.
That is, we have two Gamma-distributed random variables.
Predict
The predicted recall probability at time `t` hours in the future is a very simple arithmetic expression: `2**(-t / (mean halflife))`.

Note that the predict-recall step is really simple. To find the flashcard with the lowest predicted recall, find the flashcard with the lowest `-t / halflife`, which can be done in SQL, cached, etc.
Update step 1: update halflife
When we have a quiz result, after time `t` has elapsed since the last review, we can apply a Bayesian update on halflife:
- prior: `halflife ~ Gamma(a, b)`
- exact posterior: `halflife | quiz after t hours ~ SomeComplicatedDistribution` (we have moments of this, not the exact density)
- moment-matched: `halflife | quiz after t hours ~ Gamma(a2, b2)`
So far this is the exact same architecture as Ebisu v2, except with prior on halflife instead of recall probability (and therefore different integrals).
However, at this stage we apply the boost:
- `mean posterior halflife = E[halflife | quiz after t hours]`,
- `boostValue = clampLerp([LEFT * E[prior halflife], RIGHT * E[prior halflife]], [1, E[boost]], t)`, where `clampLerp` is the linear interpolation clamped so `1 <= boostValue <= E[boost]`, before finally
- `boostValue * posterior ~ Gamma(a2, boostValue * b2)`
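In Python, this clamped-lerp boost rule might look like the following sketch (function names are mine, not Ebisu's; `LEFT=0.3` and `RIGHT=1.0` as suggested earlier, with halflives and elapsed times in days):

```python
def lerp(x1, x2, y1, y2, x):
    """Linear interpolation of x along the line through (x1, y1) and (x2, y2)."""
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def clamp_lerp(x1, x2, y1, y2, x):
    """lerp, clamped so the result stays between y1 and y2."""
    return min(max(lerp(x1, x2, y1, y2, x), min(y1, y2)), max(y1, y2))

def boosted_halflife(old_halflife, updated_halflife, boost, elapsed,
                     left=0.3, right=1.0):
    # boostValue ramps from 1 (short delays) up to the mean boost (long delays)
    boost_value = clamp_lerp(left * old_halflife, right * old_halflife,
                             1.0, boost, elapsed)
    return boost_value * updated_halflife

# Quiz after a long delay (elapsed >= RIGHT * halflife): full boost applied
print(boosted_halflife(1.0, 4.0, 1.5, elapsed=2.0))  # 6.0
# Quiz after ten minutes (elapsed << LEFT * halflife): essentially no boost
print(boosted_halflife(1.0, 1.01, 1.5, elapsed=10 / 60 / 24))  # 1.01
```

The two printed cases mirror the worked example in the "Boost" section above: a pass after two days gets the full 1.5× boost, while a pass after ten minutes leaves the halflife essentially untouched.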
That is, we've updated our probability distribution on the halflife in response to a quiz and applied a boost (a little bit of boost for small `t`, the full boost value for large `t`).

In this way, we can see that we're just updating our probability distribution around the initial halflife: before the first quiz, and after each successive quiz, we update our distribution about that initial halflife. The halflife after several quizzes is a random variable that's fully specified by the initial halflife random variable after it's been scaled by a sequence of these `clampLerp`ed boost values.

However, this leaves open the question: how do we estimate boost? How do we update our probability distribution from `boost ~ Gamma(aBoost, bBoost)`?
Update step 2: update halflife and boost
This section is the primary innovation of Ebisu v3 over v2, mathematically. After we have three or more quizzes (though you could do this after just two quizzes), we can update our probability distributions about both (a) the initial halflife and (b) the boost.
To do this, we use two simple techniques in sequence.
First, curve fit. We evaluate the posterior `initial halflife, boost | several quizzes` for a grid of values in the `initial halflife` × `boost` plane and curve fit this two-dimensional posterior surface to two independent Gamma random variables. This is readily doable because we assume each quiz is independent, so the overall posterior is simply the product of individual likelihoods and priors after each quiz.

The curve fit is just weighted least-squares, i.e., a very fast solution to an overdetermined system of equations. It is essentially solving `A x = b` for a tall and skinny matrix `A`, where the unknowns `x` are the four parameters of the two Gamma random variables (initial halflife and boost).

However, the resulting curve fit often fails to properly capture the behavior of the posterior in its tails, so badly that prediction performance drops if we just use these two Gammas as our updated probability distributions.
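To make the least-squares step concrete, here is a toy sketch (all names and the weighting choice are my assumptions, not Ebisu's actual code). The log-density of two independent Gammas is linear in the unknowns `[a1 - 1, b1, a2 - 1, b2, const]`, so evaluating a posterior on a grid and solving a weighted least-squares problem recovers those parameters; in this toy the "posterior" really is an exact product of two Gammas, so the fit is essentially exact (which real posteriors, per the caveat above, are not):

```python
import numpy as np

# Toy "posterior": secretly an exact product of two independent Gammas
# (shape/rate values below are arbitrary demo choices)
a1, r1, a2, r2 = 3.0, 0.25, 5.0, 4.0

def log_posterior(h, b):
    # unnormalized log density: Gamma(a1, r1) on halflife x Gamma(a2, r2) on boost
    return (a1 - 1) * np.log(h) - r1 * h + (a2 - 1) * np.log(b) - r2 * b

# Evaluate on a grid in the (initial halflife) x (boost) plane
hh, bb = np.meshgrid(np.linspace(1, 40, 30), np.linspace(0.2, 4, 30))
h, b = hh.ravel(), bb.ravel()
z = log_posterior(h, b)

# Weighted least squares: weight rows by the (normalized) density so the fit
# favors the bulk of the posterior over its tails; A is tall and skinny
w = np.exp(z - z.max())
A = np.column_stack([np.log(h), -h, np.log(b), -b, np.ones_like(h)])
x, *_ = np.linalg.lstsq(A * w[:, None], z * w, rcond=None)
print(x[:4])  # recovers [a1 - 1, r1, a2 - 1, r2] = [2.0, 0.25, 4.0, 4.0]
```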
Therefore, second, we use importance sampling, a Monte Carlo method to obtain moments of a probability distribution whose density we can evaluate analytically, given samples from some other distribution (the latter is called the "proposal distribution").
Specifically, we use importance sampling to get accurate estimates of the moments of the bivariate posterior, given samples from the curve-fit bivariate Gamma distribution we estimated. With relatively few samples (on the order of ~1000), we can get good estimates of the posterior's moments and moment-match these to two independent Gammas, on initial halflife and boost respectively.
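Here's a minimal one-dimensional sketch of that importance-sampling-then-moment-match pipeline (a toy target whose true moments are known; the names and parameter choices are mine, not Ebisu's):

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)

# Unnormalized target density: pretend this is a posterior whose moments we
# want. (It's secretly Gamma(shape=5, rate=2), so we can check the answer.)
def log_target_unnormalized(x):
    return 4.0 * np.log(x) - 2.0 * x

# Proposal: a Gamma we *can* sample, standing in for the curve-fit from above
shape_q, rate_q = 4.0, 1.5
x = rng.gamma(shape_q, 1.0 / rate_q, size=100_000)

def log_gamma_pdf(x, shape, rate):
    return shape * np.log(rate) + (shape - 1) * np.log(x) - rate * x - lgamma(shape)

# Importance weights (self-normalized, so the target's normalizer cancels)
logw = log_target_unnormalized(x) - log_gamma_pdf(x, shape_q, rate_q)
w = np.exp(logw - logw.max())
w /= w.sum()

mean = np.sum(w * x)               # true mean is 5/2 = 2.5
var = np.sum(w * (x - mean) ** 2)  # true variance is 5/4 = 1.25

# Moment-match a Gamma(shape, rate) to these estimated moments
shape_fit, rate_fit = mean**2 / var, mean / var
print(shape_fit, rate_fit)  # close to the true (5, 2)
```

The bivariate version in Ebisu v3 does the same thing with samples in the (initial halflife, boost) plane, matching moments along each axis.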
Therefore, this update step allows us to update our estimate of the boost factor in a data-driven way, unlike Anki's ease factor, which is hardcoded. Given a sequence of quiz results, we can answer the questions: "What was the halflife of this fact when I began learning it? What boost factor has been strengthening it each time I review it?"

We turn to the API section next. Again, the above mathematics has been terse because I only have ASCII to describe it and I'm hoping primarily to get feedback on the API, but I hope it is sufficiently detailed to give a flavor of the computational burdens involved.
Tutorial on importance sampling
Here's a super quick script to show you the value of importance sampling, plus how it's used.
Suppose you want to estimate the mean of the square of the unit uniform distribution, i.e., if `u ~ Unif(0, 1)`, what is `E[u**2]`? And suppose you can't do calculus, so you want to do Monte Carlo. Super-easy: you generate a million samples uniformly between 0 and 1, square them, and call `np.mean` to get your estimate of the mean. Then you can call `np.var` to get the variance, i.e., accuracy, of that estimate: the lower the variance, the more accurate your estimate, and the more efficient your sampling is (those million samples are doing a good job).

This is not importance sampling; it's just normal Monte Carlo sampling.
Suppose though that you for some reason cannot sample the uniform distribution. Maybe you only have a library to generate Normally-distributed (Gaussian) random samples, so you have `randn` but no `rand`. Could you estimate `E[u**2]` with just `randn`? Yes. Importance sampling lets you: generate a million samples from the unit Normal distribution, `n ~ Normal(0, 1)`, and again square them, but before calling `np.mean` to get the estimate, weight each sample by its importance. That's what the crucial `uniformrv.pdf(n) / normrv.pdf(n)` factor is doing: it's giving more weight to samples that are likely to have come from our target distribution (`Unif(0, 1)`) and less weight to samples that aren't.

This will actually work: your indirect estimate of `E[u**2]` will be correct but (much) higher variance than when you could sample from the Uniform distribution directly.

But we know the unit Normal distribution is an obviously bad proposal distribution for the unit Uniform distribution: over half the samples will just be thrown away, since negative samples get an importance factor of `0`. We can try one more experiment: let's use `Normal(0.5, σ=0.5)` as our proposal. This should be a good deal more accurate than the unit Normal.

Printing out the estimated mean and the variance (inaccuracy) of each estimate: as expected, all three report the same mean estimate, 1/3. However, whereas the straightforward Monte Carlo estimator had a low variance of 0.09, which is the best you're going to get given you had access to `rand`, the crappy importance sampling estimator using just `randn` (the unit Normal) as its proposal had a much higher variance, almost 7× worse. But by using `Normal(0.5, σ=0.5)` as the proposal, we reduce the inaccuracy of the estimator, whose variance is just ~2.5× the direct estimator's.

So, of course, ideally you'd be able to do the integral to find `E[u**2]` (zero estimator variance 🙌!), and failing that, hopefully you can use `rand` to draw uniform samples and calculate the estimate directly. But failing both, importance sampling gives you a way to convert samples from some rando distribution into moment estimates of the distribution you do care about.

Hopefully the above convinces you that there's nothing magical about using it to improve our posterior fits. The curve-fit posterior is like the "better proposal" above. The "crappy proposal" for importance sampling might be the original priors on halflife and boost (except after a dozen quizzes, the variance of the estimates would be enormous without billions and trillions of samples).
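The tutorial's scripts themselves didn't survive the formatting here, so below is a reconstruction of the three estimators from the surrounding prose (variable names are my guesses, and `scipy.stats` pdfs are replaced with hand-written ones):

```python
import numpy as np

rng = np.random.default_rng(123)
N = 1_000_000

def uniform_pdf(x):
    return np.where((x >= 0) & (x <= 1), 1.0, 0.0)

def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# 1. Plain Monte Carlo: sample Unif(0, 1) directly
u = rng.random(N)
direct = u**2

# 2. Importance sampling with a crappy proposal, the unit Normal
n = rng.standard_normal(N)
crappy = n**2 * uniform_pdf(n) / normal_pdf(n)

# 3. Importance sampling with a better proposal, Normal(0.5, sigma=0.5)
m = 0.5 + 0.5 * rng.standard_normal(N)
better = m**2 * uniform_pdf(m) / normal_pdf(m, 0.5, 0.5)

for name, est in [("direct", direct), ("crappy", crappy), ("better", better)]:
    print(name, est.mean(), est.var())
# all three means are close to 1/3; the variances are roughly 0.09, 0.6, 0.2,
# matching the "almost 7x" and "~2.5x" figures quoted above
```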
V3 API
Model
Ebisu v2 had a very compact data model. Each flashcard was just a 3-tuple, `[float, float, float]`: two numbers representing the Beta random variable's parameters and the elapsed time at which that Beta applied. It didn't keep track of past quizzes because that 3-tuple was a "sufficient statistic": as far as the math was concerned, everything about the past was encoded in that 3-tuple.

Because the Ebisu v3 algorithm described above needs to know about past quiz times and results to re-estimate boost, the Ebisu v3 data model is much bigger, and now includes past quizzes.
See Python `dataclass`es for Ebisu v3 model
While the key-value structure of the data model is probably an implementation detail and not relevant for the majority of readers of this RFC, for completeness I include it here, with the caveat that it may change.
The top-level model has three sections:
One section for a list of quiz results and times:
A second section for the parameters of the probability distributions:
And a final section for computed values that are useful for `predictRecall` (whether that's done in Python or in SQL or whatever).

For `logStrength` and `startStrengths`, see the discussion below on reinforcement strength.
Initialize
You're expected to provide either `initHlPrior` (a 2-tuple `[α, β]` of the Gamma random variable) or both `initHlMean` and `initHlStd`, representing the mean and the standard deviation of the Gamma random variable representing the initial halflife.

Similarly for boost.
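For illustration, converting a mean and standard deviation into Gamma parameters is a two-liner, assuming the shape/rate convention (this is my sketch, not necessarily Ebisu's own helper, and the RFC doesn't pin down which convention its `[α, β]` 2-tuple uses):

```python
def gamma_params_from_mean_std(mean: float, std: float) -> tuple[float, float]:
    # Shape/rate convention assumed: mean = alpha / beta, variance = alpha / beta**2
    beta = mean / std**2
    alpha = mean * beta
    return alpha, beta

# e.g., an initial halflife believed to be 24 hours, give or take 12 hours:
alpha, beta = gamma_params_from_mean_std(24.0, 12.0)
print(alpha, beta)  # 4.0 and 1/6

# The mean halflife alpha / beta is what the fast predictRecall (next section)
# puts through Ebbinghaus' forgetting curve, 2**(-elapsed / meanHalflife):
print(2 ** (-24.0 / (alpha / beta)))  # 0.5 after one mean halflife
```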
Predict
There are actually two functions here:
The official `predictRecall` returns `2**(-t / (mean halflife))`, which is an approximation of the recall probability at time `t` in the future. It's mathematically inaccurate but very fast and convenient to find the mean of the halflife and then put it through Ebbinghaus' exponential forgetting function. I expect this function to be portable to SQL, etc., because it's just a fast algebraic expression (addition, multiplication, division).

For fun, I plan to include `_predictRecallBayesian`, which computes the exact expected recall probability (like Ebisu v2 does). The leading `_` means I don't expect this function to be in ports to other languages. This function computes an expensive Bessel function of the second kind, the analytical solution to the integral that yields the exact recall probability.
Update halflife
I've decided to reuse the `updateRecall` name in Ebisu v3 for the "quick" update step that assumes the boost is fixed and just applies a quiz result.

Note that, like Ebisu v2, this function supports noisy quizzes and binomial/binary quizzes. Also note two new v3-only parameters: `left` and `right`, which control how much boost to apply. Nominally, if `elapsed < left * (current halflife)`, no boost is applied. If `elapsed > right * (current halflife)`, the full boost is applied (whatever it may be; recall it has a probability distribution that you initialized or that was computed from data). For values of `elapsed` in between, the boost is scaled linearly.

This function needs to be called whenever the student completes a quiz. But as alluded to above, it only updates the halflife and takes the boost as fixed.
Update halflife and boost
The "non-quick" update function will update its belief about both the halflife and the boost.
If we had unlimited computing power, we'd run `updateRecall` and then immediately run `updateRecallHistory`, so we always had the most accurate estimates. However, `updateRecallHistory` is an expensive function, much more so than the old v2 `updateRecall`, because it loops over all quizzes: as described in the math section above, it evaluates the posterior over a grid of halflife and boost values, curve-fits that to two Gammas via weighted least-squares, and then runs importance sampling (whose sample count is controlled by a `size` argument).

All this should be totally feasible on mobile/browser, but I do expect apps will opt to run this as a batch function, for example, …
If you initialized the boost prior reasonably, there shouldn't be a pressing need to rerun this function after each quiz.
In the future, some smart person might enlighten me how we can make this estimation step better (or how we can change the entire model to have a better estimation step), but for now, I feel that this is a reasonable balance between model accuracy, prediction speed, and update complexity.
Reset halflife
Ebisu v2 supports manually rescaling the halflife (`rescaleHalflife` in v2) of a model, for those situations where you just know the model has over- or underestimated your memory. We'd like to support this in v3 as well.

In Ebisu v3, since a flashcard's data model has all past quiz results, you may think that it'd be straightforward to just tweak the initial halflife's priors and re-run `updateRecallHistory`, but after you get several quizzes, the initial priors are often relatively powerless, with the posteriors almost entirely driven by the data (the quizzes). This is a well-known outcome in Bayesian statistics.

Therefore, the proposed function is called "reset" because it will literally start a new chapter in your history with this flashcard, with a new prior on halflife.
The model will still contain all past quizzes, but this function sets those aside and reinitializes a new list of elapsed times and quizzes, so all future quizzes get the initial halflife you ask for here.
This way, if after several quizzes, you need to rescale/reset the halflife of a flashcard, this lets you do that. No hard feelings.
That's it. That's the API.
Reinforcement strength
Throughout the discussion above, I've alluded to `startStrength` and `logStrength` and `reinforcement`. This is to support partial reconsolidation or reinforcement, raised in #51 by @jasonsparc. The idea is: Ebbinghaus' exponential decay assumes that, after each review, your memory of some fact jumps to 100%: probability of recall `2**(-t / halflife) = 1.0` when `t=0` (right after you've reviewed this fact, probability 1 = 100% recall probability). @jasonsparc's idea is: what if this isn't true? What if you know that the quiz only partially reconsolidated the memory, i.e., that at `t=0.0001` (a millisecond after your last quiz), the probability of recall is not 0.9999 but something much less, maybe 0.5, maybe 0.1?

Ebisu v3 tentatively supports this by allowing you to specify how much reconsolidation you think has happened every time you call `updateRecall`. This number is taken as deterministic and fixed, just like the quiz result or elapsed time, and it flows through the update process as if it were known. We'll want to experiment with this to make sure it's working. For users who aren't interested in playing with this feature, the API defaults to sane values.
Conclusion
The code and tests are written. You can find them (along with a ton of irrelevant code I wrote while getting to this point) at https://github.com/fasiha/ebisu-likelihood-analysis/, specifically ebisu3.py and test_ebisu3.py.
I've prepared a script that can process `collection.anki2` files (if you export your Anki deck to an `apkg` file and unzip that, this `collection.anki2` file, which is a SQLite database file, will be inside), and I've also prepared a Stan model.

The Stan model will likely mostly be useful for those interested in changing the model or putting hyperpriors on `left` and `right` and other parameters, or verifying Ebisu v3's correctness (there's a difference in the output between these two that I'm working on tracking down).

But mainly I'm hoping that interested parties provide feedback on the API above. And of course any other feedback (on the mathematical model, for example) is also most welcome.
Many thanks for your patience with v3. This is several months late and I am grateful for the support and feedback I've received so far 🙇.