Ebisu 1.0 new method usage #2
fasiha/ebisu#11 was actually opened after 1.0 was published 😇, so all the discussion therein applies to this Java port. In fact, 1.0 didn't change how you use Ebisu from a quiz app—the two methods you care about are still `predictRecall` and `updateRecallModel`. The howto IPython Notebook at https://github.com/fasiha/ebisu/blob/gh-pages/EbisuHowto.ipynb should, I hope, give you a good start on using Ebisu in your quiz app.

In Java: when you learn a flashcard, you create a model (I just pushed ec76b5b adding this one-argument, and two-argument, constructor to match JS/Python's `defaultModel`). Then at any point in time, you can call `predictRecall` to get a fact's probability of recall. Your app can either (1) quiz only the facts whose recall probability has dropped below some threshold, or (2) quiz whichever fact has the lowest recall probability.

If you hate doing all this computation—calling `predictRecall` on every fact every time—`modelToPercentileDecay` can instead convert a model into a due date, the time at which recall is predicted to decay to a given percentile. What can I explain further?
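The two scheduling policies discussed here (quiz below a threshold, or quiz the lowest-probability card) can be sketched in plain Java. This is a self-contained illustration of Ebisu 1.0's expected-recall formula with a Lanczos logGamma; the class shape, method names, and card data are mine for illustration, not necessarily ebisu-java's exact API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecallSketch {
    // Lanczos approximation of log(Gamma(x)) for x > 0 (same approach as gamma.js).
    static double logGamma(double x) {
        double[] c = {676.5203681218851, -1259.1392167224028, 771.32342877765313,
                      -176.61502916214059, 12.507343278686905, -0.13857109526572012,
                      9.9843695780195716e-6, 1.5056327351493116e-7};
        x -= 1;
        double a = 0.99999999999980993, t = x + 7.5;
        for (int i = 0; i < c.length; i++) a += c[i] / (x + i + 1);
        return 0.5 * Math.log(2 * Math.PI) + (x + 0.5) * Math.log(t) - t + Math.log(a);
    }

    // Ebisu 1.0's expected recall probability for a (alpha, beta, t) model,
    // `elapsed` time units after the last review.
    static double predictRecall(double alpha, double beta, double t, double elapsed) {
        double d = elapsed / t;
        return Math.exp(logGamma(alpha + d) + logGamma(alpha + beta)
                      - logGamma(alpha + beta + d) - logGamma(alpha));
    }

    public static void main(String[] args) {
        // card name -> {alpha, beta, t (hours), hours since last review}
        Map<String, double[]> cards = new LinkedHashMap<>();
        cards.put("学校", new double[]{3, 3, 24, 100});
        cards.put("学生", new double[]{3, 3, 24, 10});

        // Option 1: quiz everything whose recall probability fell below a threshold.
        for (Map.Entry<String, double[]> e : cards.entrySet()) {
            double[] c = e.getValue();
            if (predictRecall(c[0], c[1], c[2], c[3]) < 0.5)
                System.out.println("quiz (threshold): " + e.getKey());
        }

        // Option 2: no threshold; always quiz the card most at risk of being forgotten.
        String worst = null;
        double worstP = 2;
        for (Map.Entry<String, double[]> e : cards.entrySet()) {
            double[] c = e.getValue();
            double p = predictRecall(c[0], c[1], c[2], c[3]);
            if (p < worstP) { worstP = p; worst = e.getKey(); }
        }
        System.out.println("quiz (most at risk): " + worst);
    }
}
```

A handy sanity check on the formula: with alpha == beta, the expected recall exactly `elapsed == t` after a review is exactly 0.5, i.e. `t` is the model's halflife.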
What I'm doing is: when the user clicks to learn a fact, I construct a default model, then insert it into the db with a structure of …
That sounds like a very nice version of option number 1 in my earlier comment, because you're using a threshold: facts with recall probability > threshold won't be quizzed. That's a perfectly fine way to do it, and Ebisu will help you with that! Option number 2 has no threshold: you can over-review as much as you want.
Hey, I pushed v1.1.0, which changes the order of the 3-argument constructor of `EbisuModel`. I have good reasons for this, but it is a breaking change that might break your app if you upgrade without changing your code 😭!
May I know the next step for this repo?
I believe this repo now has all the features that the other two implementations have, so it should now be ready to import into any JVM language, right? So I think it's done for now. I may create an ebisu-kotlin repo to port to Kotlin, possibly deprecating ebisu-java in the process, but I'm not sure. Does that help answer the question?
Yes, your answer helps. Any plans to improve the algorithm? Or might you abandon these projects in the future?
Awesome, I look forward to seeing your port! The two things I use Apache Math for are logGamma and BisectionSolver. logGamma is a bit tricky to implement, but Ebisu.js uses gamma.js, which should be straightforward to port to Kotlin. BisectionSolver should be straightforward to rewrite too (search for where a function evaluates to zero). It will be nice to get rid of Apache Math because that library seems to be undergoing a very slow but very major reorganization.

It's conceivable that we find improvements to the algorithm. The 1.0 algorithm was a big improvement over the 0.x version. The only thing I'm not super-happy about is the need to constantly rerun `predictRecall` over every flashcard.

I very much doubt I'll abandon these projects, but that's probably because they're pretty much complete: there's not much to add, and they will likely enter maintenance mode soon, where I just keep them updated to run on the newest language versions, etc. We'll of course build libraries/apps on top of these projects, and port to new languages, but I doubt the code in the existing repos will have to change a lot.
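Since BisectionSolver is just "search for where a function evaluates to zero", a replacement is only a few lines. A minimal sketch of the idea—the names here are illustrative, not Apache Math's API:

```java
import java.util.function.DoubleUnaryOperator;

public class Bisect {
    // Find a root of f on [lo, hi], assuming f(lo) and f(hi) have opposite
    // signs: repeatedly halve the interval, keeping the half with the sign change.
    static double solve(DoubleUnaryOperator f, double lo, double hi, double tol) {
        double flo = f.applyAsDouble(lo);
        if (flo * f.applyAsDouble(hi) > 0)
            throw new IllegalArgumentException("f(lo) and f(hi) must differ in sign");
        while (hi - lo > tol) {
            double mid = 0.5 * (lo + hi);
            double fmid = f.applyAsDouble(mid);
            if (flo * fmid <= 0) hi = mid;           // root is in [lo, mid]
            else { lo = mid; flo = fmid; }           // root is in [mid, hi]
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        // Find where x^2 - 2 crosses zero on [0, 2], i.e. sqrt(2).
        System.out.println(solve(x -> x * x - 2, 0, 2, 1e-12));
    }
}
```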
For the Kotlin port, if I remove Apache Math, the tests pass.
One long-term enhancement might be: imagine you have two memory models, but instead of them being independent, there's some correlation between them. Example: memory of 学校 versus 学生 is probably not independent because of the shared "学" (with the same pronunciation, in Japanese at least). We might be able to detect correlations like this given a long history of practice, or somehow quantify the correlation based on things like linguistic similarity. Then updating one model after a quiz might also modify the other model. Maybe passing the quiz for one will delay the quiz for the other (which would be nice: less quizzing, more time for life). Or maybe the opposite—maybe it'll accelerate the quiz for the other, in case there's a higher chance of confusing the two…
Wow, you've already got this done?
I don't have any expectation that the ports be similar. As long as the tests pass and the implementation is idiomatic in the target language, that's all you can ask for, right? :)
I'm a bit worried that the easiest way to schedule the next card is to loop through all cards and call `predictRecall` on each. But the short answer is no, you of course don't need to cache the calculation. You'll notice that the expensive logGamma terms depend only on the model's parameters, not on the elapsed time. Hmm. This is Java and OOP. You know what we should do—instead of a cache, we should just store those numbers in the model object itself.
Nice!
Gotcha, thanks for explaining concurrent versus non-concurrent HashMap. I wanted to allow the objects to work in multi-threaded apps, but I'm going to remove the cache and store the two numbers in the object itself. logGamma is expensive compared to a multiply-add, but not too much: looking at the source for gamma.js, it's fifteen divide-adds and three Math.logs, so it's a fixed cost. But it is the dominant cost of `predictRecall`.

Question: if I add two member variables to EbisuModel, how will that impact your storing them in a database? Are you using an ORM? Or are you extracting the three parameters and storing them in the database? (I ask because I'm unfamiliar with Java. In JavaScript I'd store the object (JS objects are just dictionaries) and have the method fill in the new field if it's missing in the data store. But Java is more rigid about these things, right?)
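The "store the two numbers in the object itself" idea can be sketched like this. I'm reading "the two numbers" as the two logGamma terms of the recall formula that depend only on (alpha, beta)—that's my interpretation; the class and field names here are illustrative, not ebisu-java's real ones:

```java
public class MemoizedModel {
    final double alpha, beta, t;
    // Lazily-computed terms that depend only on the model, never on elapsed time.
    private double lgAlphaBeta = Double.NaN; // logGamma(alpha + beta)
    private double lgAlpha = Double.NaN;     // logGamma(alpha)

    MemoizedModel(double alpha, double beta, double t) {
        this.alpha = alpha; this.beta = beta; this.t = t;
    }

    // Lanczos approximation of log(Gamma(x)) for x > 0, as in gamma.js.
    static double logGamma(double x) {
        double[] c = {676.5203681218851, -1259.1392167224028, 771.32342877765313,
                      -176.61502916214059, 12.507343278686905, -0.13857109526572012,
                      9.9843695780195716e-6, 1.5056327351493116e-7};
        x -= 1;
        double a = 0.99999999999980993, t = x + 7.5;
        for (int i = 0; i < c.length; i++) a += c[i] / (x + i + 1);
        return 0.5 * Math.log(2 * Math.PI) + (x + 0.5) * Math.log(t) - t + Math.log(a);
    }

    double predictRecall(double elapsed) {
        if (Double.isNaN(lgAlphaBeta)) {     // fixed cost paid once per model, not per call
            lgAlphaBeta = logGamma(alpha + beta);
            lgAlpha = logGamma(alpha);
        }
        double d = elapsed / t;
        return Math.exp(logGamma(alpha + d) - logGamma(alpha + beta + d)
                        + lgAlphaBeta - lgAlpha);
    }

    public static void main(String[] args) {
        MemoizedModel m = new MemoizedModel(3, 3, 24);
        System.out.println(m.predictRecall(24)); // alpha == beta, so ~0.5 at elapsed == t
    }
}
```

The two delta-dependent logGamma calls still run on every `predictRecall`, but the per-model half of the work is paid once, which matters when you loop over thousands of cards.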
Java is type safe. I store an …
Here is the Kotlin port.
Hi, I thought my port had some bugs, but when I ran your tests, they all failed.
Take a look at https://blog.codefx.org/libraries/junit-5-setup/#Eclipse
You need to add …
Cool! It looks very similar to Java and yet native—amazing!
How can I help with this?
I don't understand this at all—you need to do this in your Kotlin Native project, otherwise all tests incorrectly pass? Do I need to do anything in this repo?
I ported substack's gamma.js to Java: https://github.com/fasiha/gamma-java
Thanks for the Java gamma port.
Hmm, that's not the case for me. If I add …
It took a fair amount of effort to get all tests to pass—I've seen a lot of test failures, a lot of red text, these last few days! Is there some configuration that Maven does that makes all this work without the …
Nowadays, people use …
FYI, I just pushed (but haven't tagged—I want to sleep on this) a commit that gets rid of the explicit cache and moves it into the EbisuModel objects themselves. If you have comments on whether this design (making a default method in the interface and overriding it in the implementing class) is good or bad, I would appreciate them!
Actually, I still prefer the cache to this design: you put the logGamma function in the constructor, …
Ok, great point. What about this commit, where those two gammas are computed only the first time they're needed and not at construction: bf5789e (I'm sure this whole design is awfully overengineered, with an interface and a class for storing the data, and then a non-instantiable class for the calculations 🙄…)
Also, I think I saw an email about implementing BisectionSolver. I use this JavaScript library in Ebisu.js: https://github.com/scijs/minimize-golden-section-1d and was thinking about porting it to Java. It's pretty fancy, in that it can handle the case where you don't provide one or both endpoints, but this is the simple function that just handles the case where you give both edges: https://github.com/scijs/minimize-golden-section-1d/blob/master/src/golden-section-minimize.js (though note this is a function minimizer, not a function root-finder, so we'll have to add a Math.abs to the function being optimized in Ebisu). The algorithm is really, really simple (Wikipedia pseudocode). The bisection solver is used only when updating (and that too only sometimes), or when explicitly calling `modelToPercentileDecay`.
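The both-endpoints case of golden-section minimization really is tiny. Here's a sketch following the standard Wikipedia pseudocode, with the Math.abs trick for using a minimizer as a root-finder; the names are mine, not scijs's or Apache Math's:

```java
import java.util.function.DoubleUnaryOperator;

public class GoldenSection {
    static final double GR = (Math.sqrt(5) - 1) / 2; // golden ratio conjugate, ~0.618

    // Minimize a unimodal f on [a, b]: shrink the bracket by the golden ratio,
    // reusing one interior evaluation per iteration.
    static double minimize(DoubleUnaryOperator f, double a, double b, double tol) {
        double c = b - GR * (b - a), d = a + GR * (b - a);
        double fc = f.applyAsDouble(c), fd = f.applyAsDouble(d);
        while (b - a > tol) {
            if (fc < fd) { b = d; d = c; fd = fc; c = b - GR * (b - a); fc = f.applyAsDouble(c); }
            else         { a = c; c = d; fc = fd; d = a + GR * (b - a); fd = f.applyAsDouble(d); }
        }
        return 0.5 * (a + b);
    }

    public static void main(String[] args) {
        // Root-finding via minimization: |x^3 - 8| on [0, 4] is smallest at x = 2.
        System.out.println(minimize(x -> Math.abs(x * x * x - 8), 0, 4, 1e-10));
    }
}
```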
Since you understand Ebisu best, can you help pick the library that works best for bisection? Is there any chance Ebisu provides the same endpoints and Apache Math doesn't handle those cases?
I think …
I believe you should be able to still store just … I'm happy to roll back to the caching design if it allows idiomatic Java, but I was thinking of simplifying the library's design by making EbisuInterface contain the three public methods in … Or should I abandon all this fanciness and just go back to the version with the cache hashmap 😅?
No worries, I'll port bisection. No, Ebisu's …
I think going back to the original design is the best option for now. Just change ConcurrentHashMap to HashMap, or don't cache anything at all—it should be fine. When will you have bisection in Java? 😆
Ooookay, I’ll revert my last push and move to another branch 🤪
Hopefully in twelve hours 😅! Thanks for prodding me!
Ok! Pushed https://github.com/fasiha/minimize-golden-section-java and integrated it into this repo: version 1.1.1, https://github.com/fasiha/ebisu-java/releases!
What database are you storing Ebisu models in?
I'm using SQLite.
Now we can use bisection in Java and remove Apache Math. Ebisu is now very lightweight to use 🎉
Yes, the latest version I tagged doesn't have Apache Commons Math anywhere in it 😁!
Now, is there any way to pick a good selection of alpha, beta, t? How can we optimize this, either by math or machine learning? What do you think of Ebisu compared to machine learning? I really want to push Ebisu beyond math and use ML.
Just pick initial alpha = beta = 3, and for t pick whatever you think the flashcard's half-life is—that is, the time interval after which you think your memory of it drops to 50%. I like using t = 0.25 hours, but I also let users adjust that, since some cards are easier or harder: if you already know a flashcard but still want it in SRS, make t = 168 hours (one week). See https://fasiha.github.io/ebisu/#choice-of-initial-model-parameters and this project's readme for more info, and let me know how I can improve this documentation.

About machine learning: usually people use machine learning because they can't find a reasonable mathematical model that runs in reasonable time for whatever they're trying to model. So they use a bunch of data to try to infer the relationships they care about, and typically the more data you have, the simpler the algorithms you can use to find the model. The main Ebisu doc has some background on Mozer's and Duolingo's ML-style approaches, how expensive they are to update, and how they may make sense only after you have a ton of data from multiple people reviewing the same flashcard. When you have that, you can start finding clusters in the data—like, you can find out that for this flashcard there are some students who already know it well because of some prior background but most don't—so maybe you can automatically infer the starting t factor, etc. When you have a ton of users for the same course, you can also start trying to figure out how cards are semantically related to each other: which cards are mutually reinforcing and which cards are interfering. My hope is we can do this with linguistic data (for language-learning courses), and not a ton of review data. I'm happy to be proven wrong, but …

I'm also happy to advise anyone who wants to do an academic study using machine learning to improve on existing algorithms. There's a good amount of literature (two papers which I mention at https://fasiha.github.io/ebisu/#how-it-works) and likely a good Masters or PhD topic here. The core Ebisu algorithm, with analytical predict and update, though, will likely still have a place in more advanced algorithms.
That sounds good. When I have lots of data, I may get back to you with an ML approach. Can you take a look at https://github.com/Networks-Learning/memorize
Is the t in the ebisu model updated after a quiz? A student may perform well on quizzes of a fact—is there a way to predict that they have internalized it? I don't want to create quizzes for that fact anymore.
Yes—if you start with some finite … Or what you can do at that time is reset the model to a new one, like … The promise of SRS is that you'll never forget something you've learned: even if you've internalized a fact now, in ten years you might have forgotten it, so to avoid that, you do increasingly-distant reviews. If you start with …
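The "is this fact internalized?" question above is what `modelToPercentileDecay` is for: it asks how long until predicted recall decays to some percentile, so you can push the next quiz (or stop quizzing) when that horizon gets very long. A self-contained sketch of that computation, combining Ebisu 1.0's recall formula with a bisection search—the method names and bracketing loop are mine, not the library's:

```java
import java.util.function.DoubleUnaryOperator;

public class HalflifeSketch {
    // Lanczos approximation of log(Gamma(x)) for x > 0.
    static double logGamma(double x) {
        double[] c = {676.5203681218851, -1259.1392167224028, 771.32342877765313,
                      -176.61502916214059, 12.507343278686905, -0.13857109526572012,
                      9.9843695780195716e-6, 1.5056327351493116e-7};
        x -= 1;
        double a = 0.99999999999980993, t = x + 7.5;
        for (int i = 0; i < c.length; i++) a += c[i] / (x + i + 1);
        return 0.5 * Math.log(2 * Math.PI) + (x + 0.5) * Math.log(t) - t + Math.log(a);
    }

    // Ebisu 1.0's expected recall probability after `elapsed` time units.
    static double predictRecall(double alpha, double beta, double t, double elapsed) {
        double d = elapsed / t;
        return Math.exp(logGamma(alpha + d) + logGamma(alpha + beta)
                      - logGamma(alpha + beta + d) - logGamma(alpha));
    }

    // Time for predicted recall to decay to `percentile` (0 < percentile < 1).
    // Recall is 1 at elapsed = 0 and decays monotonically, so bisect for the crossing.
    static double percentileDecay(double alpha, double beta, double t, double percentile) {
        DoubleUnaryOperator f = e -> predictRecall(alpha, beta, t, e) - percentile;
        double hi = t;
        while (f.applyAsDouble(hi) > 0) hi *= 2; // grow until the crossing is bracketed
        double lo = 0;
        while (hi - lo > 1e-9 * hi) {
            double mid = 0.5 * (lo + hi);
            if (f.applyAsDouble(mid) > 0) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        // With alpha == beta the halflife is exactly t: 24 hours here.
        System.out.println(percentileDecay(3, 3, 24, 0.5));
        // A model strengthened by successful reviews (larger alpha) decays later.
        System.out.println(percentileDecay(6, 3, 24, 0.5));
    }
}
```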
I encountered this when someone sent me a link to https://www.reddit.com/r/Anki/comments/awr9ql/memorize_an_optimal_algorithm_for_spaced/ (there was some discussion in that thread about Ebisu), and I'll try to read the paper and see if I can understand it enough to comment on it.
I took a look at this. Academic writing is 🤦♀️🤦♂️, and their code repo simply contains scripts rather than a Python library that we could install and import and start using (like, ahem, some other libraries). But! It's actually cool! It outsources the details of the memory model to some other external library (in their paper, to Duolingo's halflife regression; in our case, to Ebisu 💪), and is a potentially smarter way to schedule quizzes far into the future: Memorize can run over each flashcard and assign it a review due date. When you review that flashcard on/after that due date, Memorize can be rerun on that flashcard to schedule it again. That's all it handles—when you do your review, Ebisu takes care of updating the model; and it relies on Ebisu to produce the probability of recall at each step of its scheduling algorithm.

You can also do something cool: you can say "this flashcard is more important than other flashcards, so feel free to schedule it sooner than you otherwise would", and you can numerically quantify that. This might be useful if a student is studying for an exam that covers certain topics, e.g.

The one thing that's weird, and I don't know how much I like it: it's a stochastic algorithm, meaning if you rerun it on the exact same inputs, it produces different due dates. You can read the Python implementation here to see what I mean: …

Still, cool—it'll be good to make a repo implementing this. Ideally it should abstract over the actual model: a user should be able to pass in any function that produces recall probabilities, and the Memorize library should return the due date. The sole weird math it needs, drawing from an Exponential distribution, is very simple to implement, so it can be a stand-alone library in Java/Python/JavaScript. Do you want to implement it 😇?
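As I read the Memorize paper and its memorize.py, the review intensity is proportional to (1 − recall probability), scaled by a rate parameter q, and the next review time is drawn by "thinning" a Poisson process—which is where the Exponential draws come in. A sketch under those assumptions, with a toy exponential-forgetting curve standing in for Ebisu's `predictRecall`; all names here are illustrative:

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

public class MemorizeSketch {
    // Sample the next review time for one flashcard via thinning:
    // propose candidate times at the maximum intensity q^(-1/2), then accept
    // a candidate with probability intensity(t) / maxIntensity.
    static double nextReviewTime(DoubleUnaryOperator recallProb, double q, Random rng) {
        double maxIntensity = 1.0 / Math.sqrt(q); // since (1 - recall) <= 1
        double t = 0;
        while (true) {
            // candidate inter-event gap: an Exponential(maxIntensity) draw
            t += -Math.log(1 - rng.nextDouble()) / maxIntensity;
            double intensity = (1 - recallProb.applyAsDouble(t)) / Math.sqrt(q);
            if (rng.nextDouble() < intensity / maxIntensity) return t; // accept
        }
    }

    public static void main(String[] args) {
        // Toy exponential-forgetting model with a 24-hour halflife; with Ebisu,
        // this lambda would call predictRecall on the card's model instead.
        DoubleUnaryOperator recall = t -> Math.exp(-t * Math.log(2) / 24);
        Random rng = new Random(42);
        for (int i = 0; i < 3; i++)
            System.out.println(nextReviewTime(recall, 1.0, rng)); // hours until review
    }
}
```

This also shows the stochastic nature noted above: rerunning with a different seed gives a different due date for the same model.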
No, I don't think so; the memory model is, I think, entirely outsourced. All Memorize needs is a function to calculate the probability of recall at different times. Memorize just tells you when to schedule a quiz—not how you compute the probability of recall, or how you update after the quiz.
It'd probably be an app-level decision. "I'm studying chapter 4, assign more weight to these flashcards."
Not quite, if you read their website, they say this quite clearly: “this is different from scheduling a review after the probability of recall hits a certain threshold. That is a deterministic heuristic policy which attempts to prevent the probability of recall from falling below a threshold. Our policy, on the other hand, is stochastic in nature and is provably optimal for our loss function.” What you're suggesting is a deterministic due date, something like "schedule a review when recall probability drops to 50%". What they're saying is that Memorize will give you a better balance between forgetting and too many quizzes. There's no magic number like 50% in Memorize.
I'm playing with the algorithm now. I can't figure out its behavior in some edge cases… some stupid bug somewhere.
Anyway, I feel like Memorize is a missing piece for Ebisu 😍😍😍
https://github.com/fasiha/ebisu/blob/memorize/memorize.py has my edits to the original memorize.py.
Sounds good. Can you add a doc on how to combine it with Ebisu as well?
I'll probably make these separate repos, called …
1. Do we have docs now?
Not yet.
None of the above—I think the parent quiz app would keep track of which flashcards the user wanted to emphasize and then pass that as a variable into ebisu-memorize. I have a package prepared; I just want the Memorize people to comment on some issues I opened in their repo.
I published https://github.com/fasiha/memorize-py
See correction below.
Ugh, I'm an idiot—I confused 'mean' for 'min'… the algorithm is fine; it is capable of scheduling quizzes with low recall probabilities. I will try to write ebisu-memorize tomorrow or next week. Until then, feel free to schedule using your technique of finding the time when …
Based on the discussion in fasiha/ebisu#11: the API methods `predictRecall` and `updateRecallModel` are the ones mostly used. Since you updated to 1.0, there are some new methods like `modelToPercentileDecay`—how do we really use them inside a quiz app?