
Why not always rebalance? #31

Closed
ttencate opened this issue Jun 13, 2020 · 6 comments

Comments

@ttencate
Contributor

ttencate commented Jun 13, 2020

Hi, it's me again :) If I understand correctly, the rebalancing step in updateRecall is only done if alpha and beta differ by more than a factor of 2 "to save a few computations". After rebalancing, alpha = beta (approximately).

But why not simply rebalance at the end of every updateRecall? There are several advantages:

  • Beta is now always equal to alpha, so we need to store only alpha. The model goes from 3 parameters to only 2.
  • Both of these now have an intuitive interpretation: t is our best guess of the halflife, alpha is the certainty that we are right.
  • It becomes possible to sort and compare cards by their t value to see how well they have been learned. This could be used e.g. to show a score or rating for each card.
  • The code becomes simpler. No more tback and rebalance arguments.
  • It could actually save computations to do this work up front. If we can assume a = b in predictRecall, then the coefficient gamma(a + b) / (gamma(a) * gamma(b)) simplifies to gamma(2a) / gamma(a)^2, saving one evaluation of the gamma function (see the sketch after this list). And saving time in predictRecall is far more important than in updateRecall, because updateRecall is only called once per quiz, whereas predictRecall is typically called once per quiz for each card. (This mostly applies to the Java implementation; the Python implementation calls betaln from scipy directly, so it can't do this optimization. And scipy's betaln appears to be implemented in Fortran (!), so it's probably faster than trying to code around it.)
  • rescaleHalflife becomes so trivial that you might not even need to add it to the API.
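
As a concrete sketch of that saving (just an illustration assuming a balanced model, not code from either implementation):

```python
from scipy.special import gammaln

def log_beta_normalizer(a: float, b: float) -> float:
    # General case: log of Gamma(a + b) / (Gamma(a) * Gamma(b)), three gammaln calls.
    return gammaln(a + b) - gammaln(a) - gammaln(b)

def log_beta_normalizer_balanced(a: float) -> float:
    # Balanced case (b == a): log of Gamma(2a) / Gamma(a)^2, only two gammaln calls.
    return gammaln(2 * a) - 2 * gammaln(a)
```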

(And while we're on the topic of "saving computations": I'm concerned about the betaln cache (Python) and gammaln cache (Java) which are keyed on floating point numbers. Due to the non-discrete nature of all these variables, these caches are probably not very effective at saving time, and they can grow without bounds causing a memory leak. If you think this is a valid concern, I can open a separate issue.)

@fasiha
Owner

fasiha commented Jun 13, 2020

Great questions! I'm impressed, you've carefully studied the math and the various implementations—I'm very grateful for your time and attention!

the rebalancing step in updateRecall is only done if alpha and beta differ by more than a factor of 2 "to save a few computations".

I'll make this clearer, but no: we rebalance because when alpha is much bigger or smaller than beta, we're more likely to run into numerical-precision issues in gammaln when predicting and updating. I realize now, though, that I don't have any hard evidence for this, just a memory that it was an issue in the Ebisu 0.x versions.

But why not simply rebalance at the end of every updateRecall? There are several advantages:

You're absolutely right, there are a lot of benefits to alpha=beta—Robert Kern suggested this in #5 too. We could totally do this—find the tback (equivalently, et or ε in the math) such that the posterior's mean is 0.5. The main disadvantage is that there's no closed-form expression for this. In the binary quiz case (the simplest case that Ebisu will support), we have to solve

Beta(a+x, b) / Beta(a, b) = 0.5

or equivalently

Gamma(a + x) / Gamma(a + b + x) = c

for positive real a, b, c, and x. As far as I know, there's no easy way to solve this, so we need a one-dimensional optimization like root_scalar in Scipy or Scijs' golden-section minimizer. This in turn requires multiple (potentially tens of) calls to the two gammalns above (for the case of binary quizzes) as we search for the halflife. There will be more complex expressions for the noisy-binary and binomial cases.
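
To make the shape of this concrete, here's a minimal sketch of that one-dimensional solve for the binary-quiz case (just an illustration of the idea, not the library's code):

```python
import numpy as np
from scipy.special import betaln
from scipy.optimize import root_scalar

def halflife_scale(a: float, b: float) -> float:
    """Find dt > 0 such that Beta(a + dt, b) / Beta(a, b) = 0.5, i.e. the
    elapsed time (as a multiple of t) at which the expected recall drops to 50%."""
    f = lambda dt: betaln(a + dt, b) - betaln(a, b) - np.log(0.5)
    # Expected recall decays monotonically from 1 at dt=0, so bracket the root by doubling.
    hi = 1.0
    while f(hi) > 0:
        hi *= 2.0
    return root_scalar(f, bracket=(0.0, hi), method='brentq').root

# Sanity check: for a balanced model (a == b), the mean recall at dt=1 is
# a / (a + b) = 0.5, so the halflife scale is 1, i.e. the halflife is exactly t.
# halflife_scale(2.0, 2.0)  # -> ~1.0
```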

As a proxy for the cost (only a proxy, since finding the halflife from the true posterior directly is different from finding it from the posterior's Beta fit), I print out the output of root_scalar in modelToPercentileDecay and also call that in updateRecall. Here's the histogram of the number of function calls it took to find the halflife when I run Ebisu's test suite:

      1  function_calls: 17
      5  function_calls: 16
      7  function_calls: 7
     14  function_calls: 15
     68  function_calls: 14
     71  function_calls: 8
    213  function_calls: 9
    354  function_calls: 13
    493  function_calls: 10
   2006  function_calls: 12
   2795  function_calls: 11

where the first number is how many times the test suite encountered that number of function calls (this is the output of the uniq -c shell command). 7 to 16 evaluations of the betaln (or the pair of gammalns) is actually not that bad.

(Another concern, which is very minor because it has a straightforward workaround, is that you might not want to require all models to be balanced; perhaps there's some regime where you want to specifically prescribe the Beta prior? But this is readily addressed by using a dictionary for the model instead of a list of numbers: e.g., if the model were dict(a=a, t=t), then we could assume the missing b is equal to a. I knew we'd have to move from a list to a dictionary some day, it might as well be now 😆.)
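
For instance, a hypothetical sketch of that dict-based model (names are illustrative, nothing is decided), where a missing b just means the prior is balanced:

```python
def unpack_model(model: dict):
    """Hypothetical helper: read (a, b, t) from a dict model,
    treating a missing 'b' as a balanced prior (b == a)."""
    a = model['a']
    b = model.get('b', a)
    t = model['t']
    return a, b, t

unpack_model(dict(a=3.0, t=7.0))          # -> (3.0, 3.0, 7.0), balanced
unpack_model(dict(a=3.0, b=5.0, t=7.0))   # -> (3.0, 5.0, 7.0), explicitly prescribed prior
```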

It becomes possible to sort and compare cards by their t value to see how well they have been learned. This could be used e.g. to show a score or rating for each card.

I hadn't thought of this! Specifically, that if you didn't want to run predictRecall but you had halflives for all your cards, you'd have a very easy alternative.

It could actually save computations to do this work up front

We could make predictRecall just two gammalns by precomputing gammaln(alpha + beta) - gammaln(alpha) and storing it in the model, especially if we make the model a dict instead of a list. We could do this now, even without switching to alpha=beta always-rebalance 🙃.

I'm concerned about the betaln cache (Python) and gammaln cache (Java) which are keyed on floating point numbers

So I try to use the cached betaln only when calculating the time-independent part of predictRecall (the same quantity we talked about above, gammaln(alpha + beta) - gammaln(alpha)). The idea is that you might call predictRecall on a hundred models for your first quiz, and then, when you pick your second quiz, 99 of those models will hit exactly the same _cachedBetaln(alpha, beta). Precomputing it and explicitly storing it in the model is a much more elegant solution, though.
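
To spell out that more elegant option, here's a rough sketch (an illustration only, not today's API) of precomputing the time-independent term when the model is built, so each prediction needs just two gammaln calls:

```python
from scipy.special import gammaln

def make_model(a: float, b: float, t: float) -> dict:
    # Precompute the time-independent part of log predictRecall once, at model-creation time.
    return dict(a=a, b=b, t=t, logConst=gammaln(a + b) - gammaln(a))

def predict_recall_ln(model: dict, tnow: float) -> float:
    # log E[recall at tnow] = gammaln(a + dt) - gammaln(a + b + dt) + gammaln(a + b) - gammaln(a)
    dt = tnow / model['t']
    a, b = model['a'], model['b']
    return gammaln(a + dt) - gammaln(a + b + dt) + model['logConst']
```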

Sooo, in summary: I'm looking at the table of function calls that the optimization needs to find the halflife via modelToPercentileDecay. Let me see whether optimizing against the true posterior's mean is much more time-consuming than optimizing against its Beta fit, so we have all the facts we need to decide whether the cost of always rebalancing is worth the benefits.

Again, many thanks for your thoughtful questions and attention to detail, this will make the project better!

@ttencate
Contributor Author

Great questions! I'm impressed, you've carefully studied the math and the various implementations—I'm very grateful for your time and attention!

Well... I didn't study the math very carefully; statistics was never my strong suit. I get the gist of it, but the details are still quite hazy. I did study the Java implementation so that I could port it to Dart, as my app is done with Flutter. (I started out basing it on the Python version, but porting the methods from scipy turned out to be tricky.) I had overlooked the excellent discussion on #5, thanks for that link!

I knew we'd have to some day move from a list to a dictionary, it might as well be now 😆.

If you're going to make an API-breaking change anyway, I would strongly suggest not moving from a tuple to a dictionary, but starting to use proper encapsulation instead. ebisu.Model could be an actual class, with predictRecall, updateRecall, and modelToPercentileDecay as methods. Any caches can then be implemented as private variables. updateRecall could either modify the current instance or return a new one (keeping the class immutable, though then the method should be named differently).
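
Roughly, I'm picturing something like this sketch (hypothetical names, and the actual update math omitted):

```python
import math
from scipy.special import gammaln

class Model:
    """A sketch of the kind of encapsulated API I have in mind; names are hypothetical."""

    def __init__(self, alpha: float, beta: float, t: float):
        self.alpha, self.beta, self.t = alpha, beta, t
        self._logConst = gammaln(alpha + beta) - gammaln(alpha)  # private cache

    def predictRecall(self, tnow: float) -> float:
        dt = tnow / self.t
        return math.exp(gammaln(self.alpha + dt)
                        - gammaln(self.alpha + self.beta + dt)
                        + self._logConst)

    def updatedRecall(self, success: bool, tnow: float) -> "Model":
        """Would return a *new* Model, keeping this instance immutable;
        the Bayesian update itself is omitted from this sketch."""
        raise NotImplementedError
```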

Many thanks for developing this algorithm and making it so accessible! I filed this particular issue because I was trying a simpler model: exponential decay where right answers multiply the halflife by a hardcoded factor >1, like 1.5, and wrong answers by a different factor <1, like 0.5. But I was unable to find a combination of factors that (subjectively) did a better job than Ebisu, so I decided that the bit of extra complexity is worth it even if we can't make Ebisu itself simpler 😄
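
For reference, the simpler baseline I'd been trying was roughly this (a sketch with the example factors above, not the exact code from my app):

```python
import math

def predict_recall(halflife: float, elapsed: float) -> float:
    # Plain exponential decay: recall probability is 2^(-elapsed / halflife).
    return math.exp(-elapsed / halflife * math.log(2))

def update_halflife(halflife: float, correct: bool) -> float:
    # Hardcoded multipliers: grow the halflife on success, shrink it on failure.
    return halflife * (1.5 if correct else 0.5)
```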

@fasiha
Owner

fasiha commented Jun 15, 2020

I am doing more digging to answer the questions above, but also relevant to speeding up predictRecall is #4, which shows how to use Gautschi's inequality to bound Gamma(a) / Gamma(a + b).

@fasiha
Owner

fasiha commented Jun 18, 2020

I would strongly suggest not to move from a tuple to a dictionary, but to start using proper encapsulation instead

The reason I hesitate to do this is that I wanted the Python implementation to be as transparent as possible, both to invite commentary and to make it easy to port to other, potentially non-OOP, languages. I recognize I muddied the waters there by adding the rebalancing step 😅. But I will definitely consider this input as I decide what to do here!

@fasiha
Owner

fasiha commented Apr 11, 2021

@ttencate if you're still interested in Ebisu and specifically the question of rebalancing, the findings in #18 (comment) have certainly given me a lot to think about!

@fasiha fasiha closed this as completed in df1a526 Apr 16, 2021
@fasiha
Owner

fasiha commented Apr 16, 2021

Thanks for kicking off this discussion! Pushed to PyPI 😄!
