
Ebisu 1.0 new method usage #2

Closed
dbof10 opened this issue Oct 31, 2019 · 53 comments


dbof10 commented Oct 31, 2019

Based on the discussion in fasiha/ebisu#11, the API methods predictRecall and updateRecall are the ones mostly used. Since you updated to 1.0, there are some new methods like modelToPercentileDecay: how do we actually use these inside a quiz app?


fasiha commented Oct 31, 2019

fasiha/ebisu#11 was actually opened after 1.0 was published 😇, so all discussion therein applies to this Java port. In fact, 1.0 didn't change how you use Ebisu from a quiz app—the two methods you care about are still predictRecall and updateRecall, and the new public method modelToPercentileDecay is only really useful to appease your curiosity (my apps don't use modelToPercentileDecay since I don't care what a flashcard's halflife is).

The howto IPython Notebook at https://github.com/fasiha/ebisu/blob/gh-pages/EbisuHowto.ipynb should, I hope, give you a good start on using Ebisu in your quiz app.

In Java: when you learn a flashcard, you create a new EbisuModel(0.25) and store that object, and a timestamp, in some database. "0.25" means you think this flashcard's memory has a halflife of 0.25 hours (fifteen minutes). The units are entirely up to you, you just have to be consistent.

(I just pushed ec76b5b adding this one-argument, and two-argument, constructor to match JS/Python's defaultModel functionality.)

Then at any point in time, you can call Ebisu.predictRecall on an EbisuModel object to get the current probability of recall. These probabilities will decay over time!

Your app can either

  1. quiz users when the probability of recall for any card dips below some factor like 0.5 (or 0.05 or 0.9, etc.), or
  2. have two separate modes, one for learning and one for quizzing, and when the user is in "quiz" mode, you rerun predictRecall on all EbisuModels to find the one with the smallest probability. (I like this number 2 better whereas number 1 is more Anki-like.)
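A minimal Java sketch of option 2, just to make the loop concrete. This is not the library's API: `recallProbability` here is a stand-in for `Ebisu.predictRecall`, using plain exponential decay from a stored halflife, and the `Card` record is hypothetical app-level bookkeeping.

```java
import java.util.Comparator;
import java.util.List;

public class QuizPicker {
    // Hypothetical card record: a real app would hold an EbisuModel plus
    // the timestamp of the last review instead of these two doubles.
    record Card(String factId, double halflifeHours, double hoursSinceReview) {}

    // Stand-in for Ebisu.predictRecall: plain exponential decay from a
    // halflife. The real Ebisu computation uses the full Beta-distribution
    // model, but the shape (probabilities decaying over time) is the same.
    static double recallProbability(Card c) {
        return Math.pow(2.0, -c.hoursSinceReview() / c.halflifeHours());
    }

    // "Quiz mode": rerun the recall prediction on every card and pick the
    // one the user is most likely to have forgotten.
    static Card nextCardToQuiz(List<Card> cards) {
        return cards.stream()
                .min(Comparator.comparingDouble(QuizPicker::recallProbability))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<Card> cards = List.of(
                new Card("学校", 24.0, 30.0),   // reviewed 30 h ago, halflife 24 h
                new Card("学生", 24.0, 2.0),    // reviewed 2 h ago
                new Card("先生", 168.0, 30.0)); // week-long halflife
        System.out.println(nextCardToQuiz(cards).factId()); // → 学校
    }
}
```

Option 1 is the same loop with a `filter(c -> recallProbability(c) < threshold)` instead of `min`.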

If you hate doing all this computation (calling predictRecall on each object every time you want to find out which card to review), then there are some caching strategies you can use with modelToPercentileDecay. But to make your Minimum Viable Product quiz app, this should be enough to get you started.

What can I explain further?


dbof10 commented Oct 31, 2019

What I'm doing is:

When the user clicks to learn a fact, I construct a default model, then insert it into the db with a structure like
dict(factID=1, model=defaultModel, lastTest=timeStamp)
After a certain amount of time, I query all the facts whose predictRecall is below a threshold. I convert those facts into quizzes. On each quiz, depending on whether the user answers correctly or fails, I call updateRecall. Does that sound like the second approach you mentioned above?


fasiha commented Oct 31, 2019

That sounds like a very nice approach to option number 1 in my earlier comment, because you're using a threshold, and facts with recall probability > threshold won't be quizzed. That's a perfectly fine way to do it, Ebisu will help you with that!

Option number 2 has no threshold: you can over-review as much as you want.

dbof10 closed this as completed Oct 31, 2019

fasiha commented Nov 1, 2019

Hey, I pushed v1.1.0, which changes the order of the 3-argument constructor of EbisuModel:

  • OLD public EbisuModel(double alpha, double beta, double time)
  • NEW public EbisuModel(double time, double alpha, double beta) (note time moved from last to first argument).

I have good reasons for this, but it is a breaking change that might break your app if you upgrade without changing your code 😭!


dbof10 commented Nov 1, 2019

May I know the next step for this repo?


fasiha commented Nov 1, 2019

I believe this repo now has all the features that the other two implementations have, so it should be ready to import into any JVM language, right? So I think it's done for now. I may create an ebisu-kotlin repo to port to Kotlin, possibly deprecating ebisu-java in the process, but I'm not sure.

Does that help answer the question?


dbof10 commented Nov 1, 2019

Yes, your answer helps.
Currently, I'm working on a fork porting to Kotlin and removing Java 8 features like IntStream, as well as the Apache Math dependency.

Any plans to improve the algorithm? Or might you abandon these projects in the future?


fasiha commented Nov 1, 2019

Awesome, I look forward to seeing your port! The two things I use Apache Math for are logGamma and BisectionSolver. logGamma is a bit tricky to implement but Ebisu.js uses gamma.js that should be straightforward to port to Kotlin. BisectionSolver should be straightforward to rewrite too (search for where a function evaluates to zero). It will be nice to get rid of Apache Math because that library seems like it's undergoing a very slow but very major reorganization.
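For reference, the logGamma in gamma.js is a compact Lanczos approximation, so porting it is mostly copying coefficients. A self-contained Java sketch of that algorithm (using the standard g=7, n=9 Lanczos coefficient table, which I believe is what gamma.js uses) might look like:

```java
public final class LogGamma {
    // Standard Lanczos coefficients (g = 7, n = 9).
    private static final double[] C = {
        0.99999999999980993, 676.5203681218851, -1259.1392167224028,
        771.32342877765313, -176.61502916214059, 12.507343278686905,
        -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7
    };
    private static final double G = 7.0;

    public static double logGamma(double z) {
        if (z < 0.5) {
            // Reflection formula: Γ(z) Γ(1−z) = π / sin(πz)
            return Math.log(Math.PI / Math.sin(Math.PI * z)) - logGamma(1.0 - z);
        }
        z -= 1.0;
        double x = C[0];
        for (int i = 1; i < C.length; i++) {
            x += C[i] / (z + i);
        }
        double t = z + G + 0.5;
        return 0.5 * Math.log(2.0 * Math.PI) + (z + 0.5) * Math.log(t) - t + Math.log(x);
    }

    public static void main(String[] args) {
        System.out.println(logGamma(5.0)); // ≈ ln(24)  ≈ 3.1780538
        System.out.println(logGamma(0.5)); // ≈ ln(√π) ≈ 0.5723649
    }
}
```

Accuracy is around machine precision for positive arguments, which is plenty for predictRecall.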

It's conceivable that we find improvements to the algorithm. The 1.0 algorithm was a big improvement over the 0.x version. The only thing I'm not super-happy about is the need to constantly rerun predictRecall for all cards, or to cache its results in some kind of table for fast lookup ("for the next ten minutes, these are the hundred cards we should review, in this order; in ten minutes, recompute"). It's possible we find some simplification here… but whether we do or not, the existing 1.0 algorithm will remain quite usable.

I very much doubt I'll abandon these projects, but that's probably because they're pretty much complete: there's not much to add, they will likely enter maintenance mode soon, where I just keep them updated to run on newest language versions, etc. We'll of course build libraries/apps on top of these projects, and port to new languages, but I doubt the code in the existing repos will have to change a lot.


dbof10 commented Nov 1, 2019

For the Kotlin port, if I remove Apache Math, the tests pass.

  1. Should the port be similar to the original one?
  2. Since you use ConcurrentHashMap, Ebisu can be run on multiple threads at the same time. Do we really need the cache for computation?
  3. If it reaches a stable phase, we can put it in maintenance mode.


fasiha commented Nov 1, 2019

One long-term enhancement might be, imagine you have two memory models, but instead of them being independent, there's some correlation between them. Example: memory of 学校 versus 学生 is probably not independent because shared "学" (with the same pronunciation, in Japanese at least).

We might be able to detect correlations like this given a long history of practices, or somehow quantify what that correlation is based on things like linguistic similarity. Then updating one model after a quiz might also modify the other model. Maybe passing the quiz for one will delay the quiz for the other (which will be nice, less quizzing, more time for life). Or maybe the opposite, maybe it'll accelerate the quiz for the other, in case there's higher chance of confusing the two…


fasiha commented Nov 1, 2019

For the Kotlin port, if I remove Apache Math, the tests pass.

Wow, you've already got this done?

1. Should the port be similar to the original one?

I don't have any expectation that the ports be similar. As long as the tests pass and the implementation is idiomatic in the target language, that's all you can ask for, right? :)

2. Since you use `ConcurrentHashMap`, Ebisu can be run on multiple threads at the same time. Do we really need the cache for computation?

I'm a bit worried that the easiest way to schedule the next card is to loop through all cards and call predictRecall, so I wanted to make that as fast as possible, even when you have thousands of cards. Did I pick a bad data structure with ConcurrentHashMap? Recall I'm a huge newbie at Java; it's entirely possible a different caching/memoizing structure should be used. I admit I haven't benchmarked whether it's faster with the cache than without for a large list of EbisuModels.

But the short answer is no, you of course don't need to cache the calculation. You'll notice that predictRecall caches only two calls to logGamma, for which the arguments are independent of time.

Hmm.

This is Java and OOP. You know what we should do—instead of a cache, we should just store logGamma(this.alpha) and logGamma(this.beta) as member variables! I should have an improved design within eighteen hours.

3. If it reaches a stable phase, we can put it in maintenance mode.

Nice!


dbof10 commented Nov 1, 2019

ConcurrentHashMap is meant for concurrent environments: read on one thread, write on another.
Are logGamma(this.alpha) and logGamma(this.beta) expensive computations? If so, since logGamma(this.alpha) is idempotent, we don't really need ConcurrentHashMap; we can just use HashMap. Otherwise, we don't need a cache at all.


fasiha commented Nov 1, 2019

Gotcha, thanks for explaining concurrent versus non-concurrent HashMap, I wanted to allow the objects to work in multi-threaded apps but I’m going to remove the cache and store the two numbers in the object itself.

logGamma is expensive compared to a multiply-add, but not too much: looking at the source for gamma.js, it's fifteen divide-adds and three Math.logs, so it's a fixed cost. But it is the dominant cost of predictRecall, which means it could be the dominant cost of your entire app if you run a thousand predictRecalls before each quiz. Caching the calculation in the object will cut the number of logGamma calls in half.

Question: if I add two member variables in EbisuModel, how will that impact your storing them in a database? Are you using an ORM? Or are you extracting the three parameters in them now and storing them in database? (I ask because I’m unfamiliar with Java. In JavaScript I’d store the object (JS objects are just dictionaries) and have the method fill in the new field if it’s missing in the data store. But Java is more rigid about these things right?)


dbof10 commented Nov 1, 2019

Java is type-safe. I store an EbisuModel as 3 columns. However, an ORM helps in this case.


dbof10 commented Nov 1, 2019

Here is the Kotlin port:
https://github.com/dbof10/ebisu
Currently, I'm working on Kotlin Native, which can run on iOS as well.
The thing is, it's hard for me to keep track of your latest algorithm.


dbof10 commented Nov 1, 2019

Hi, I thought my port had some bugs, but when I ran your tests, they all failed.


dbof10 commented Nov 1, 2019

Take a look at https://blog.codefx.org/libraries/junit-5-setup/#Eclipse

@SelectPackages({ "org.codefx.demo.junit5" })
public class TestWithJUnit5 { }

You need to add @RunWith(JUnitPlatform.class) like in the sample above; otherwise they all pass.


fasiha commented Nov 1, 2019

Here is the port of Kotlin
https://github.com/dbof10/ebisu
Currently, I'm working on Kotlin Native which can run on iOS as well

Cool! It looks very similar to Java and yet native—amazing!

The thing is it's hard for me to keep track of your latest algo.

How can I help with this?

You need to add @RunWith(JUnitPlatform.class) like in the sample above; otherwise they all pass.

I don't understand this at all. You need to do this in your Kotlin Native project, otherwise all tests incorrectly pass? Do I need to do anything in this repo?


fasiha commented Nov 2, 2019

I ported substack's gamma.js to Java: https://github.com/fasiha/gamma-java


dbof10 commented Nov 2, 2019

Thanks for the Java gamma port.

  1. I think I will check the repo more frequently to stay current with it.

  2. I ran all the tests in your repo. They all failed. Since you are currently using JUnit 5, you need to add @RunWith(JUnitPlatform.class) on top of your test class. Try assertEquals(0, 1): it still passes. If you don't add the @RunWith(JUnitPlatform.class) annotation, all tests just pass.


fasiha commented Nov 2, 2019

2. I ran all the tests in your repo. They all failed. Since you are currently using JUnit 5, you need to add `@RunWith(JUnitPlatform.class)` on top of your test class. Try `assertEquals(0,1)`: it still passes. If you don't add the `@RunWith(JUnitPlatform.class)` annotation, all tests just pass.

Hmm, that's not the case for me. If I add assertEquals(0,1) to testHalflife in EbisuTest.java and run mvn test, I see

[ERROR] Failures:
[ERROR]   EbisuTests.testHalflife:78 expected: <0> but was: <1>
[INFO]
[ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0

It took a fair amount of effort to get all tests to pass, I've seen a lot of test failures, a lot of red text, last few days!

Is there some configuration that Maven does that makes all this work without the RunWith annotation?


dbof10 commented Nov 2, 2019

Nowadays people use Gradle, not Maven. Let me try again.


fasiha commented Nov 2, 2019

FYI, I just pushed (but haven't tagged—I want to sleep on this) a commit that gets rid of the explicit cache and moves it into the EbisuModel objects themselves. If you have comments on whether this design (of making a default method in the interface, and overriding it in the class implementing the interface) is good or bad, I would appreciate them!:

9e9b5e2


dbof10 commented Nov 2, 2019

Actually, I still prefer the cache to this design: you put the logGamma call in the constructor, so constructing an EbisuModel becomes very expensive.


fasiha commented Nov 2, 2019

Actually, I still prefer the cache to this design: you put the logGamma call in the constructor, so constructing an EbisuModel becomes very expensive.

Ok, great point, what about this commit, where those two gammas are computed only the first time they're needed, not at construction: bf5789e

(I'm sure this whole design is awfully overengineered, with an interface and a class for storing the data, and then a non-instantiable class for the calculations 🙄…)
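The lazy approach in that commit can be sketched generically. This is only an illustration of the memoization pattern, not the actual EbisuModel code: `expensiveComputation` is a hypothetical stand-in for logGamma, and note that a plain mutable field like this is not thread-safe without extra care (which loops back to the ConcurrentHashMap discussion).

```java
public class LazyModel {
    public final double alpha;
    // Cached result; NaN marks "not yet computed". Construction stays cheap
    // because nothing expensive happens in the constructor.
    private double cachedLogGammaAlpha = Double.NaN;
    static int evaluations = 0; // instrumentation to show memoization works

    public LazyModel(double alpha) {
        this.alpha = alpha;
    }

    // Hypothetical stand-in for an expensive function like logGamma.
    private static double expensiveComputation(double x) {
        evaluations++;
        return Math.log(x) * x; // placeholder math, not the real logGamma
    }

    // Compute on first use, then return the cached value forever after.
    public double logGammaAlpha() {
        if (Double.isNaN(cachedLogGammaAlpha)) {
            cachedLogGammaAlpha = expensiveComputation(alpha);
        }
        return cachedLogGammaAlpha;
    }
}
```

Calling `logGammaAlpha()` a thousand times during scheduling then costs one expensive evaluation, not a thousand.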


fasiha commented Nov 3, 2019

Also, I think I saw an email about implementing BisectionSolver. I use this JavaScript library in Ebisu.js: https://github.com/scijs/minimize-golden-section-1d and was thinking about porting it to Java. It's pretty fancy in that it can handle the case where you don't provide one or both endpoints, but this is the simple function that handles the case where you give both edges: https://github.com/scijs/minimize-golden-section-1d/blob/master/src/golden-section-minimize.js (though note this is a function minimizer, not a root-finder, so we'll have to add a Math.abs to the function being optimized in Ebisu). The algorithm is really, really simple (Wikipedia pseudocode).

The bisection solver is used only when updating (and even then only sometimes), or when explicitly calling modelToPercentileDecay, so it doesn't need a very fancy algorithm. That said, it might be nice to use a faster algorithm like Brent's. I see this https://www.cs.jhu.edu/~blake/javadocs/edu/jhu/bme/smile/commons/optimize/BrentMethod1D.html; I'll just make a note of it for future reference.
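To make the root-finding concrete, here is a hedged sketch of what a bisection-based modelToPercentileDecay does: solve for the time at which predicted recall crosses a target percentile. The recall curve below is a hypothetical exponential stand-in, not Ebisu's actual model; only the bisection loop itself is the textbook algorithm.

```java
import java.util.function.DoubleUnaryOperator;

public class Bisect {
    // Classic bisection: find x in [lo, hi] where f(x) = 0, assuming
    // f(lo) and f(hi) have opposite signs. Halves the bracket each step.
    static double bisect(DoubleUnaryOperator f, double lo, double hi, double tol) {
        double flo = f.applyAsDouble(lo);
        for (int i = 0; i < 200 && (hi - lo) > tol; i++) {
            double mid = 0.5 * (lo + hi);
            double fmid = f.applyAsDouble(mid);
            if (flo * fmid <= 0) {
                hi = mid;           // root is in the lower half
            } else {
                lo = mid;           // root is in the upper half
                flo = fmid;
            }
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        // Stand-in recall curve with a 24-hour halflife (not the Ebisu model):
        DoubleUnaryOperator recall = t -> Math.pow(2.0, -t / 24.0);
        // "modelToPercentileDecay" for the 50th percentile: solve recall(t) = 0.5.
        double halflife = bisect(t -> recall.applyAsDouble(t) - 0.5, 0.0, 1000.0, 1e-9);
        System.out.println(halflife); // ≈ 24.0
    }
}
```

Golden-section minimization of `Math.abs(recall(t) - 0.5)` would find the same point; bisection just needs a sign change between the endpoints.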


dbof10 commented Nov 3, 2019

Since you understand Ebisu best, can you help pick the library that works best for bisection? Is there any chance Ebisu provides endpoints that Apache Math doesn't handle?


dbof10 commented Nov 3, 2019

I think bf5789e is still not good. I still prefer the caching design: the Ebisu model should be a plain object, it shouldn't do any logic, and since you have an Optional, it's hard to store in the db as well.


fasiha commented Nov 3, 2019

I think bf5789e is still not good. I still prefer the caching design: the Ebisu model should be a plain object, it shouldn't do any logic, and since you have an Optional, it's hard to store in the db as well.

I believe you should be able to still store just alpha, beta, and time in the db, since those are the only arguments to the constructor. Am I missing something? Should I make EbisuModel implement java.io.Serializable interface to make it easy to serialize/deserialize to your database?

I'm happy to roll back to the caching design if it allows idiomatic Java, but I was thinking of simplifying the library's design by making EbisuInterface contain the three public methods in the Ebisu class (predictRecall, updateRecall, and modelToPercentileDecay), and putting the implementations of those three methods in EbisuModel, thereby getting rid of the entire non-instantiable Ebisu class. I can also make these implement Serializable, or some other interface, to help with putting them into a database.

Or should I abandon all this fanciness and just go back to the version with the cache hashmap 😅?

Since you understand Ebisu best, can you help pick the library that works best for bisection? Is there any chance Ebisu provides endpoints that Apache Math doesn't handle?

No worries, I'll port bisection. No, Ebisu's modelToPercentileDecay only picks "good" endpoints, so BisectionSolver will always find a solution. Unless percentile is something really weird like 2e-16 or something…


dbof10 commented Nov 5, 2019

I think going back to the original design is the best option for now. Just change ConcurrentHashMap to HashMap, or don't cache anything; it should be fine.

When will you have Bisection Java? 😆


fasiha commented Nov 5, 2019

I think going back to the original design is the best option for now. Just change ConcurrentHashMap to HashMap, or don't cache anything; it should be fine.

Ooookay, I’ll revert my last push and move to another branch 🤪

When will you have Bisection Java? 😆

Hopefully in twelve hours 😅!, thanks for prodding me!


fasiha commented Nov 6, 2019

When will you have Bisection Java? 😆

OK! Pushed https://github.com/fasiha/minimize-golden-section-java

Integrated it into this repo: version 1.1.1 https://github.com/fasiha/ebisu-java/releases!


fasiha commented Nov 6, 2019

What database are you storing Ebisu models in?


dbof10 commented Nov 6, 2019

I'm using SQLite


dbof10 commented Nov 6, 2019

Now we can use bisection java and remove apache math. Ebisu is now very lightweight to use 🎉


fasiha commented Nov 6, 2019

Now we can use bisection java and remove apache math. Ebisu is now very lightweight to use 🎉

Yes, the latest version I tagged doesn’t have Apache Commons Math anywhere in it 😁!


dbof10 commented Nov 6, 2019

Now, is there any way to pick a good selection of alpha, beta, t? How can we optimize this, either with math or machine learning? How do you think Ebisu compares to machine learning? I really want to push Ebisu beyond math and use ML.


fasiha commented Nov 6, 2019

Now, is there any way to pick a good selection of alpha, beta, t? How can we optimize this, either with math or machine learning? How do you think Ebisu compares to machine learning? I really want to push Ebisu beyond math and use ML.

Just pick initial alpha=beta=3, and for t pick whatever you think the flashcard’s half-life is, that is, after what time interval you think your memory of it drops to 50%. I like using t=0.25 hours but I also let users adjust that since some cards are easier or harder—if you know a flash card, but still want it in SRS, make t=168 hours (one week). See https://fasiha.github.io/ebisu/#choice-of-initial-model-parameters and this project’s readme for more info, and let me know how I can improve this documentation.

About machine learning. Usually people use machine learning because they can’t find a reasonable mathematical model that runs in reasonable time for whatever they’re trying to model. So they use a bunch of data to try to infer the relationships they care about, and typically the more data you have, the simpler algorithms you can use to find the model.

The main Ebisu doc has some background on Mozer and Duolingo’s ML-style approach and how expensive they are to update, and how they may make sense only after you have a ton of data, from multiple people reviewing the same flash card. When you have that, you can start finding clusters in the data, like, you can find out that for this flash card, there’s some students that know it well already because of some prior background but most don’t, so maybe you can automatically infer the starting t factor, etc.

When you have a ton of users for the same course, you can also start trying to figure out how cards are semantically related to each other, like which cards are mutually-reinforcing and which cards are interfering. My hope is we can do this with linguistic data (for language learning courses), and not a ton of review data though.

I’m happy to be proven wrong but

  • I don’t think ML with a lot of data will significantly improve quiz scheduling decisions beyond Ebisu or SM2 or even Leitner scheduler. I expect the improvement will be incremental, not revolutionary.
  • It would also take a lot of work in feature engineering—you might need to know a lot of personal details about the students, and take into account when they study and practice etc.
  • The resulting trained ML classifier/regressor might be sensitive to the course and not easily generalizable.
  • You likely won’t be able to do an ML update on a mobile device, you’ll need a big computer if not a GPU to retrain the algorithm on the newest quiz results.

I’m also happy to advise anyone who wants to do an academic study using machine learning to improve on existing algorithms. There’s a good amount of literature (two papers which I mention at https://fasiha.github.io/ebisu/#how-it-works) and likely a good Masters or PhD topic.

The core Ebisu algorithm, with analytical predict and update, though will likely still have a place in more advanced algorithms.


dbof10 commented Nov 6, 2019

That sounds good. When I have lots of data, I may get back to you with an ML approach.

Can you take a look at https://github.com/Networks-Learning/memorize


dbof10 commented Nov 6, 2019

Is the t in the Ebisu model updated after a quiz? A student may perform well on quizzes of a fact; is there a way to predict that they have internalized it? I don't want to create quizzes for that fact anymore.


fasiha commented Nov 6, 2019

Is the t in the Ebisu model updated after a quiz? A student may perform well on quizzes of a fact; is there a way to predict that they have internalized it? I don't want to create quizzes for that fact anymore.

Yes, t factor is (like alpha and beta) updated by updateRecall after quiz.

If you start with some finite t, and after a while you don't want to review that fact any more, you'll have to either delete the model, or in your app keep a list of facts you don't want to review.

Or what you can do at that time is reset the model to a new one, like new EbisuModel(3, 3, 24*365) which sets the halflife to a year, so it comes up much less frequently for review.

The promise of SRS is that you'll never forget something you've learned: even if you've internalized a fact now, in ten years you might have forgotten it, so to avoid that, you do increasingly-distant reviews. If you start with t=0.25 hours, after a few successful quizzes the halflife increases to a day, and it keeps increasing as long as you get them right. (If you get them wrong, the halflife drops, depending on how long ago you last studied.) There are charts in the main doc (the second chart after https://fasiha.github.io/ebisu/#how-it-works).


fasiha commented Nov 6, 2019

Can you take a look at https://github.com/Networks-Learning/memorize

I encountered this when someone sent me a link to https://www.reddit.com/r/Anki/comments/awr9ql/memorize_an_optimal_algorithm_for_spaced/ (there was some discussion in that thread about Ebisu), and I'll try to read the paper and see if I can understand it enough to comment on it.


fasiha commented Nov 7, 2019

Can you take a look at https://github.com/Networks-Learning/memorize

I took a look at this. Academic writing is 🤦‍♀️🤦‍♂️, and their code repo simply contains scripts, rather than a Python library that we could install and import and start using (like ahem some other libraries).

But! It's actually cool! It outsources the details of the memory model to some other external library (in their paper, to Duolingo's halflife regression; in our case, to Ebisu 💪), and is a potentially smarter way to schedule quizzes far into the future:

Memorize can run over each flashcard and assign it a review due date. When you review that flashcard on/after that due date, Memorize can be rerun on that flashcard to schedule it again. That's all it handles—when you do your review, Ebisu takes care of updating the model; and it relies on Ebisu to produce the probability of recall at each step of its scheduling algorithm.

You can also do something cool: you can say "this flashcard is more important than other flashcards so feel free to schedule it sooner than you otherwise would", and you can numerically quantify that. This might be useful if a student is studying for an exam that covers certain topics, e.g.

The one thing that's weird and I don't know how much I like it—it's a stochastic algorithm, meaning if you rerun it on the exact same inputs, it produces different due dates. You can read the Python implementation here to see what I mean: this sampler function returns the time in the future to schedule this quiz's due date, and it draws random numbers. Over an infinite number of schedules, the algorithm is guaranteed to be optimal (balancing forgetting versus over-quizzing) buuuuut any one actual realized schedule will be nothing special.

Still, cool, it'll be good to make a repo to implement this. Ideally it should abstract over the actual model: a user should be able to pass in any function that produces recall probabilities, and the Memorize library should return the due date. The sole weird math it needs, drawing from an Exponential distribution, is very simple to implement, so it can be a stand-alone library in Java/Python/JavaScript.
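A sketch of that sampling step, as I read the approach (not their code): thinning an inhomogeneous Poisson process whose intensity is proportional to the forgetting probability 1 − recall(t), built from nothing but Exponential draws. All names here are hypothetical, and the recall curve is an exponential stand-in, not Ebisu's model.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

public class MemorizeSampler {
    /**
     * Sample the next review time via Poisson thinning. The target intensity
     * u(t) = maxRate * (1 - recall(t)) is bounded above by maxRate, since
     * recall(t) lies in [0, 1], which is what makes thinning valid.
     */
    static double sampleReviewTime(DoubleUnaryOperator recall, double maxRate, Random rng) {
        double t = 0.0;
        while (true) {
            // Candidate inter-arrival time from the bounding homogeneous
            // process: Exponential(maxRate), via inverse-transform sampling.
            t += -Math.log(1.0 - rng.nextDouble()) / maxRate;
            // Accept with probability u(t)/maxRate = 1 - recall(t): nearly
            // forgotten cards get accepted (scheduled) sooner.
            if (rng.nextDouble() < 1.0 - recall.applyAsDouble(t)) {
                return t;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in recall curve with a 24-hour halflife (not the Ebisu model):
        DoubleUnaryOperator recall = t -> Math.pow(2.0, -t / 24.0);
        double due = sampleReviewTime(recall, 1.0, new Random(42));
        System.out.println("Next review in " + due + " hours");
    }
}
```

Rerunning with a different seed yields a different due date, which is exactly the stochastic behavior described above.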

Do you want to implement it 😇?


dbof10 commented Nov 7, 2019

Memorize can run over each flashcard and assign it a review due date.

Can I say that Memorize will help Ebisu generate an Ebisu model with better alpha, beta, t?

this flashcard is more important than other flashcards so feel free to schedule it sooner than you otherwise would

Sounds awesome, but how do we know if it's important?

the Memorize library should return the due date

It's similar to
double halflife = Ebisu.modelToPercentileDecay(updatedModel); // 50% recall
With the current model you can predict a halflife and schedule dueDate = currentTime + halflife returned by modelToPercentileDecay.

Do you want to implement it 😇?

👍 Yesssss, let's do it!


fasiha commented Nov 7, 2019

Memorize can run over each flashcard and assign it a review due date. Can I say that Memorize will help Ebisu generate an Ebisu model with better alpha, beta, t?

No, I don't think so, the memory model is I think entirely outsourced, all Memorize needs is a function to calculate the probability of recall at different times. Memorize is just telling you when to schedule a quiz. Not how you computed the probability of recall, or update after the quiz.

this flashcard is more important than other flashcards so feel free to schedule it sooner than you otherwise would. Sounds awesome, but how do we know if it's important?

It'd probably be an app-level decision. "I'm studying chapter 4, assign more weight to these flashcards."

the Memorize library should return the due date. It's similar to
double halflife = Ebisu.modelToPercentileDecay(updatedModel); // 50% recall
With the current model you can predict a halflife and schedule dueDate = currentTime + halflife returned by modelToPercentileDecay.

Not quite, if you read their website, they say this quite clearly:

“this is different from scheduling a review after the probability of recall hits a certain threshold. That is a deterministic heuristic policy which attempts to prevent the probability of recall from falling below a threshold. Our policy, on the other hand, is stochastic in nature and is provably optimal for our loss function.”

What you're suggesting is a deterministic due date, something like "schedule a review when recall probability drops to 50%". What they're saying is that Memorize will give you a better balance between forgetting and too many quizzes. There's no magic number like 50% in Memorize.

Do you want to implement it 😇? 👍 Yesssss, let's do it!

I'm playing with the algorithm now. I can't figure out its behavior in some edge cases… some stupid bug somewhere.


dbof10 commented Nov 7, 2019

Anyway, I feel like Memorize is a missing piece for Ebisu 😍 😍 😍


fasiha commented Nov 7, 2019

https://github.com/fasiha/ebisu/blob/memorize/memorize.py has my edits to the original memorize.py, cleaning up the API, adding docstrings, and showing a basic example using Ebisu. I'm not sure what the license is so I may have to rewrite the Poisson-thinning process (their sample function).


dbof10 commented Nov 7, 2019

Sounds good. Can you add a doc on how to combine it with Ebisu as well?


fasiha commented Nov 7, 2019

I'll probably make this a separate repo, called ebisu-memorize.


dbof10 commented Nov 8, 2019

  1. Do we have docs now?
  2. Yesterday we talked about the importance of a flashcard. Should it be part of Ebisu, Memorize, or Ebisu-Memorize?


fasiha commented Nov 8, 2019

1. Do we have docs now?

Not yet.

  2. Yesterday we talked about the importance of a flashcard. Should it be part of Ebisu, Memorize, or Ebisu-Memorize?

None of the above, I think: the parent quiz app would keep track of which flashcards the user wanted to emphasize and then pass that as a variable into ebisu-memorize. I have a package prepared; I just want the Memorize people to comment on some issues I opened in their repo.


fasiha commented Nov 10, 2019

I published https://github.com/fasiha/memorize-py

It's weird. The algorithm has a q parameter that trades off between too much reviewing and too much forgetting. The soonest quiz it will schedule is sqrt(q), so if you have quizzes that you've almost forgotten, there's a floor on how soon it'll schedule a quiz (i.e., if q=1.0, it will schedule quizzes in >=1 time unit; I usually use hours). You might make this small, like q=0.01, so the smallest review time is sqrt(0.01)=0.1 hours, but then the quizzes with high probability of recall also get scheduled sooner.

I feel the algorithm needs a tweak for the low-probability quizzes. Let me think about how to do it. The author of the repo hasn't answered my questions but maybe I'll try reaching out to them to see if there's a nice mathematical way of getting quiz times scheduled sooner than sqrt(q) for low-recall-probability flashcards.

See correction below


fasiha commented Nov 10, 2019

I published https://github.com/fasiha/memorize-py

Ugh, I'm an idiot, I confused "mean" for "min"… the algorithm is fine; it is capable of scheduling low-recall-probability quizzes soon.

I will try to write ebisu-memorize tomorrow or next week. Until then feel free to schedule using your technique of finding the time when predictRecall drops below X percent!
