Because it does work. Sort of. Cramming is a trade-off: you trade a strong memory now for weak memory later. (Very weak[^Stahl1].) And tests are usually of all the new material, with occasional old questions, so this strategy pays off! That's the damnable thing about it - its memory longevity & quality are, in sum, less than that of spaced repetition, but cramming delivers its goods *now*[^vacha]. So cramming is a rational, if short-sighted, response, and even SRS software recognizes its utility & supports it to some degree[^cramming]. (But as one might expect, if the testing is continuous and incremental, then the learning tends to also be long-lived[^conway]; I do not know if this is because that kind of testing is a disguised accidental spaced repetition system, or whether the students/subjects simply study & act differently in response to small-stakes exams.) In addition to this short-term advantage, there's a subjective *illusion* that the gains persist[^Stahl2][^Kornell2010]; from [Kornell 2009](http://sites.williams.edu/nk2/files/2011/08/Kornell.2009b.pdf)'s study of GRE vocab (emphasis added):

> "Across experiments, *spacing* was more effective than massing for 90% of the participants, yet after the first study session, 72% of the participants believed that *massing* had been more effective than spacing....When they do consider spacing, they often exhibit the illusion that massed study is more effective than spaced study, even when the reverse is true ([Dunlosky & Nelson, 1994](http://dept.kent.edu/psychology/MClab/files/pubbies/D&N%201994.pdf); [Kornell & Bjork, 2008a](http://faculty.kutztown.edu/rryan/classes/theories/objectiv/applying_cog_principles_to_ed_resources/Interleaving-Shuffling/Kornell%20&%20Bjork%202008.pdf); [Simon & Bjork, 2001](http://bjorklab.psych.ucla.edu/pubs/Simon_RBjork_2001.pdf); [Zechmeister & Shaughnessy, 1980](http://www.willatworklearning.com/2005/11/research_review.html))."
[^Stahl1]: ["Play it Again: The Master Psychopharmacology Program as an Example of Interval Learning in Bite-Sized Portions"](http://www.cnsspectrums.com/aspx/articledetail.aspx?articleid=2783), Stahl et al 2010:

    > "Since Ebbinghaus' time, a voluminous amount of research has confirmed this simple but important fact: the retention of new information degrades rapidly unless it is reviewed in some manner. A modern example of this loss of knowledge without repetition is a study of cardiopulmonary resuscitation (CPR) skills that demonstrated rapid decay in the year following training. By 3 years post-training only 2.4% were able to perform CPR successfully.^[6](http://onlinelibrary.wiley.com/doi/10.1111/j.2044-8325.1985.tb00186.x/abstract)^ Another recent study of physicians taking a tutorial they rated as very good or excellent showed mean knowledge scores increasing from 50% before the tutorial to 76% immediately afterward.^[7](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2517967/)^ However, score gains were only half as great 3-8 days later and, incredibly, there was no significant knowledge retention measurable at all at 55 days.^7^ Similar results have been reported by us in follow-up studies of knowledge retention from continuing medical education programs.^1^ [Stahl SM, Davis RL. _Best Practices for Medical Educators_. Carlsbad, CA: NEI Press; 2009]
    >
    > ...This may be due to the fact that lectures with assigned reading are the easiest for teachers. Also, medical learning is rarely measured immediately after a lecture or after reading new material for the first time and then measured again a few days or weeks later, so that the low retention rates of this approach may not be widely appreciated.^1,[4](http://www.psy-world.com/0409CNS_Stahl.pdf)^ No wonder formal medical education conferences without enabling or practice-reinforcing strategies appear to have relatively little impact on practice and healthcare outcomes.^[8](http://www.ncbi.nlm.nih.gov/pubmed/10478694),[9](http://www.ncbi.nlm.nih.gov/pubmed/9617647),[10](http://blog.evidenceinmotion.com/evidence/files/EducationalCMEDavisJAMA1995.pdf)^"

[^Stahl2]: Stahl 2010:

    > "For example, simple restudying allows the learner to reexperience all of the material but actually produces poor long-term retention.^[25](http://tigger.uic.edu/~bstorm/sbs_2010.pdf),[26](http://www.pashler.com/Articles/Pashler.Rohrer.Cepeda.Carpenter_2007.pdf),[35](http://commonsenseatheism.com/wp-content/uploads/2011/01/Roediger-Test-Enhanced-Learning.pdf)^ Why do students keep studying the original materials? Certainly if this is their only choice, then restudying is a necessary tactic. Another answer may be that repeated studying falsely inflates students' confidence in their ability to remember in the future because they sense that they understand it now, and they and their instructors may be unaware of the many studies that show poor retention on delayed testing after this form of repetition.^25,26,35^"

[^Kornell2010]: From Kornell et al 2010:

    > "Contrary to the massing-aids-induction hypothesis, final test performance was consistently and considerably superior in the spaced condition. A large majority of participants, however, judged massing to be more effective than spacing, despite making the judgment *after* taking the test.
    > ...Metacognitive judgments—that is, judgments about one's own memory and cognition—are often based on feelings of fluency (e.g., see [Benjamin, Bjork, & Schwartz, 1998](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118.1497&rep=rep1&type=pdf); [Rhodes & Castel, 2008](http://www.ncbi.nlm.nih.gov/pubmed/18999356)). Because massing naturally leads to feelings of fluency and increases short-term task performance during learning, learners frequently rate spacing as less effective than massing, even when their performance shows the opposite pattern ([Baddeley & Longman, 1978](http://www.tandfonline.com/doi/abs/10.1080/00140137808931764); Kornell & Bjork, 2008; Simon & Bjork, 2001; Zechmeister & Shaughnessy, 1980). Averaged across Kornell and Bjork's (2008) experiments, for example, more than 80% of participants rated massing as equally or more effective than spacing, whereas only 15% of participants actually performed better in the massed condition than in the spaced condition.
    >
    > ...Such an illusion was apparent in the induction condition. Contrary to previous research, however, participants gave higher ratings for spacing than massing during repetition learning (see, e.g., Simon & Bjork, 2001; Zechmeister & Shaughnessy, 1980). This outcome may have occurred because of a process of habituation: Six presentations and a total of 30 s spent studying a single painting may have come to seem inefficient and pointless. Thus, there appears to be a turning point in metacognitive ratings based on fluency: As fluency increases, metacognitive ratings increase up to a point, but as fluency continues to increase and encoding or retrieval becomes too easy, metacognitive ratings may begin to decrease.
    >
    > ...In advance of their research, Kornell and Bjork (2008) were convinced that such inductive learning would benefit from massing, yet their results showed the opposite. Undaunted, we remained convinced that spacing would be more beneficial for repetition learning than for inductive learning—especially for older adults, given their overall declines in episodic memory. The current results disconfirmed our expectations once again. If our intuitions are erroneous, despite our years spent proving and praising the spacing effect—including roughly 40 years' worth contributed by Robert A. Bjork—those of the average student are surely mistaken as well (as the inaccuracy of the participants' metacognitive ratings suggests). We have, perhaps, fallen victim to the illusion that making learning easy makes learning effective, rather than recognizing that spacing is a desirable difficulty ([Bjork, 1994](http://bjorklab.psych.ucla.edu/pubs/RBjork_1994a.pdf)) that enhances inductive learning as well as repetition learning well into old age."

[^cramming]: Anki has its [Cram Mode](http://ankisrs.net/docs/CramMode.html) and Mnemosyne 2.0 has a cramming plugin. When an SRS doesn't have explicit support, it's always possible to 'game' the algorithm by setting one's scores artificially low, so the SR algorithm thinks you are very stupid and need to do a lot of repetitions.

[^vacha]: One study looking at cramming is the 1993 ["Cramming: A barrier to student success, a way to beat the system or an effective learning strategy?"](http://psycnet.apa.org/?&fa=main.doiLanding&uid=1993-39276-001); abstract:

    > "Tested the hypothesis that cramming is an ineffective study strategy by examining the weekly study diaries of 166 undergraduates. All Ss also completed an end-of-semester questionnaire measuring study habits. Ss were classified in the following study patterns: ideal, confident, zealous, or crammer. Contrary to the hypothesis, results suggest that cramming is an effective approach, most widespread in courses using take-home essay examinations and major research papers.
    > Crammers' grades were as good as or better than those of Ss using other strategies; the longer Ss were in college, the more likely it was that they crammed. Crammers studied more hours than most students and were as interested in their courses as other students."

    Note that there is no measure of long-term retention, suggesting that people who only care about grades are rationally choosing to cram.

[^conway]: ["Examining the examiners: Why are we so bad at assessing students?"](http://www.commonsenseatheism.com/wp-content/uploads/2011/02/Newstead-Examining-the-examiners.pdf), Stephen Newstead:

    > "[Conway, Cohen and Stanhope (1992)](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.6901&rep=rep1&type=pdf) looked at long-term memory for the information presented on a psychology course. They found that some types of information, especially that relating to research methods, were remembered better than others. But in a follow-up analysis, they found that the type of assessment used had an effect on memory. In essence, material assessed by continuous assessment was more likely to be remembered than information assessed by exams."

As one would expect if the testing and pacing effects are real things, students who naturally test themselves and study well in advance of exams tend to have higher GPAs.[^Hartwig]

[^Hartwig]: ["Study strategies of college students: Are self-testing and scheduling related to achievement?"](/docs/2012-hartwig.pdf), Hartwig & Dunlosky 2012:

    > "Previous studies, such as those by Kornell and Bjork (_Psychonomic Bulletin & Review_, 14:219–224, 2007) and Karpicke, Butler, and Roediger (_Memory_, 17:471–479, 2009), have surveyed college students' use of various study strategies, including self-testing and rereading. These studies have documented that some students do use self-testing (but largely for monitoring memory) and rereading, but the researchers did not assess whether individual differences in strategy use were related to student achievement. Thus, we surveyed 324 undergraduates about their study habits as well as their college grade point average (GPA). Importantly, the survey included questions about self-testing, scheduling one's study, and a checklist of strategies commonly used by students or recommended by cognitive research. Use of self-testing and rereading were both positively associated with GPA. Scheduling of study time was also an important factor: Low performers were more likely to engage in late-night studying than were high performers; massing (vs. spacing) of study was associated with the use of fewer study strategies overall; and all students—but especially low performers—were driven by impending deadlines. Thus, self-testing, rereading, and scheduling of study play important roles in real-world student achievement."

    Note the self-testing correlation excludes flashcards, a result that both the authors and I found surprising. The sleep connection is interesting, given the [hypothesized link](#when-to-review) between stronger memory formation & studying before a good night's sleep: you can hardly get a good night's sleep if you are cramming late into the night (correlated with lower grades), but you can if you study at a reasonable time in the evening (in time to get a solid night).

This short-term perspective is not a good thing in the long term, of course. Knowledge builds on knowledge; one is not learning independent bits of trivia. [Richard Hamming](!Wikipedia) recalls in ["You and Your Research"](http://www.cs.virginia.edu/~robins/YouAndYourResearch.html):

> "You observe that most great scientists have tremendous drive. I worked for ten years with [John Tukey](!Wikipedia) at [Bell Labs](!Wikipedia). He had tremendous drive.
> One day about three or four years after I joined, I discovered that John Tukey was slightly younger than I was. John was a genius and I clearly was not. Well I went storming into [Bode](!Wikipedia "Hendrik Wade Bode")'s office and said, "How can anybody my age know as much as John Tukey does?" He leaned back in his chair, put his hands behind his head, grinned slightly, and said, "You would be surprised Hamming, how much you would know if you worked as hard as he did that many years." I simply slunk out of the office!
>
> What Bode was saying was this: "Knowledge and productivity are like [compound interest](!Wikipedia)." Given two people of approximately the same ability and one person who works ten percent more than the other, the latter will more than twice outproduce the former. The more you know, the more you learn; the more you learn, the more you can do; the more you can do, the more the opportunity - it is very much like compound interest. I don't want to give you a rate, but it is a very high rate. Given two people with exactly the same ability, the one person who manages day in and day out to get in one more hour of thinking will be tremendously more productive over a lifetime. I took Bode's remark to heart; I spent a good deal more of my time for some years trying to work a bit harder and I found, in fact, I could get more work done."

Knowledge needs to accumulate, and flashcards with spaced repetition can aid in just that accumulation, fostering steady review even as the number of cards and intellectual prerequisites mounts into the [thousands](#the-workload). This long-term focus may explain why explicit spaced repetition is an uncommon studying technique: the pay-off is distant and unobvious, the cost of self-control near and vivid. (See [hyperbolic discounting](!Wikipedia).)
It doesn't help that it's pretty difficult to figure out *when* one should review - the optimal point is when you're just about to forget the material, but that's the kicker: if you're just about to forget it, how are you supposed to remember to review it? You only remember to review what you remember, and what you already remember isn't what you need to review![^wired1]

[^wired1]: "SuperMemo is based on the insight that there is an ideal moment to practice what you've learned. Practice too soon and you waste your time. Practice too late and you've forgotten the material and have to relearn it. The right time to practice is just at the moment you're about to forget. Unfortunately, this moment is different for every person and each bit of information. Imagine a pile of thousands of flash cards. Somewhere in this pile are the ones you should be practicing right now. Which are they?" Gary Wolf, ["Want to Remember Everything You'll Ever Learn? Surrender to This Algorithm"](http://www.wired.com/medtech/health/magazine/16-05/ff_wozniak), _[Wired Magazine](!Wikipedia)_

The paradox is resolved by letting a computer handle all the calculations. We can thank [Ebbinghaus](http://psy.ed.asu.edu/~classics/Ebbinghaus/index.htm) for demonstrating in such tedious detail that we can, in fact, program a computer to calculate both the forgetting curve and the optimal set of reviews^["Make no mistake about it: Computers process numbers - not symbols. We measure our understanding (and control) by the extent to which we can arithmetize an activity." Perlis, _ibid._]. This is the insight behind [spaced repetition](!Wikipedia) software: ask the same question over and over, but over increasing spans of time. You start by asking it once every few days, and soon the human remembers it reasonably well. Then you expand the intervals out to weeks, then months, and then years.
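The expanding-interval idea is simple enough to sketch in a few lines of Python. This toy scheduler is a hypothetical simplification - not the actual SM-2 algorithm that SuperMemo and its descendants use - which merely doubles a card's interval after every successful recall and resets it after a failure:

```python
from dataclasses import dataclass

@dataclass
class Card:
    """One flashcard with an expanding review interval (in days)."""
    question: str
    interval: int = 1  # days until the next review
    due: int = 0       # day number of the next review

def review(card: Card, day: int, recalled: bool) -> None:
    """Double the interval on a successful recall; reset to 1 day on a
    failure. (A crude stand-in for SM-2-style scheduling.)"""
    card.interval = card.interval * 2 if recalled else 1
    card.due = day + card.interval

# A card that is always recalled successfully is reviewed only 8 times in
# its first year, at exponentially growing intervals:
card = Card("la pomme -> the apple")
days_reviewed = []
day = 0
while day <= 365:
    days_reviewed.append(day)
    review(card, day, recalled=True)
    day = card.due

print(days_reviewed)  # [0, 2, 6, 14, 30, 62, 126, 254]
```

Because each successful review pushes the next one exponentially further out, the daily workload stays bounded even as new cards keep arriving - which is why a collection of thousands of items remains reviewable at all.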
Once the memory is formed and dispatched to long-term memory, it needs but occasional exercise to remain hale and hearty^[This exponential expansion is how an SR program can handle continual input of cards: obviously if cards were scheduled at fixed intervals, like every other day, review would soon become quite impossible - I have >8000 items in Mnemosyne, but I don't have time to review 4500 questions a day!] - I remember well the large dinosaurs made of cardboard for my 4^th^ or 5^th^ birthday, or the tunnel made out of boxes, even though I recollect them once or twice a year at most.

## Literature review

But don't take my word for it - _Nullius in verba_! We can look at the science. Of course, if you do take my word for it, you probably just want to read about how to use it and all the nifty things you can do, so I suggest you [skip all the way down](#using-it) to that section. Everyone else, we start at the beginning:

### Background: testing works!

> "If you read a piece of text through twenty times, you will not learn it by heart so easily as if you read it ten times while attempting to recite from time to time and consulting the text when your memory fails."^[_[The New Organon](!Wikipedia)_, [Francis Bacon](!Wikipedia)]

The [testing effect](!Wikipedia) is the well-established psychological effect that the mere act of testing someone's memory will strengthen the memory, regardless of whether there is feedback. Since [spaced repetition](!Wikipedia) is just testing on particular days, we ought to establish that testing works better than regular review or study, and that it works outside of memorizing random dates in history. To cover a few studies:

1. Allen, G.A., Mahler, W.A., & Estes, W.K. (1969). ["Effects of recall tests on long-term retention of paired associates"](http://www.mendeley.com/research/effects-of-recall-tests-on-longterm-retention-of-paired-associate/).
   _Journal of Verbal Learning and Verbal Behavior_, 8, 463-470. 1 test results in memories as strong a day later as studying 5 times; intervals improve retention compared to massed presentation.
2. Karpicke & Roediger (2003). ["The Critical Importance of Retrieval for Learning"](http://commonsenseatheism.com/wp-content/uploads/2011/01/Karpicke-The-critical-importance-of-retrieval-for-learning.pdf). In learning Swahili vocabulary, students were given varying routines of testing or studying or testing and studying; this resulted in similar scores during the learning phase. Students were asked to predict what percentage they'd remember (average: 50% over all groups). One week later, the students who tested remembered ~80% of the vocabulary versus ~35% for non-testing students. Some students were tested or studied more than others; diminishing returns set in very quickly once the memory had formed the first day. Students reported rarely testing themselves and not testing already-learned items. Lesson: again, testing improves memory compared to studying. Also, no student knows this.
3. Roediger & Karpicke (2006a). ["Test-Enhanced Learning: Taking Memory Tests Improves Long-Term Retention"](http://commonsenseatheism.com/wp-content/uploads/2011/01/Roediger-Test-Enhanced-Learning.pdf). Students were tested (with no feedback) on reading comprehension of a passage over 5 minutes, 2 days, and 1 week. Studying beat testing over 5 minutes, but nowhere else; students believed studying superior to testing over all intervals. At 1 week, testing scores were ~60% versus ~40%. Lesson: testing improves memory compared to studying. Everyone (teachers & students) 'knows' the opposite.
4. Karpicke & Roediger (2006a). ["Expanding retrieval promotes short-term retention, but equal interval retrieval enhances long-term retention"](http://learninglab.psych.purdue.edu/downloads/2007_Karpicke_Roediger_JEPLMC.pdf). General scientific prose comprehension; from Roediger & Karpicke 2006b: "After 2 days, initial testing produced better retention than restudying (68% vs. 54%), and an advantage of testing over restudying was also observed after 1 week (56% vs. 42%)."
5. Roediger & Karpicke (2006b). ["The Power of Testing Memory: Basic Research and Implications for Educational Practice"](http://commonsenseatheism.com/wp-content/uploads/2011/01/Roediger-The-power-of-testing-memory.pdf). Literature review; 7 studies before 1941 demonstrating testing improves retention, and 6 afterwards.
6. Agarwal et al. (2008). ["Examining the Testing Effect with Open- and Closed-Book Tests"](http://commonsenseatheism.com/wp-content/uploads/2011/01/Agarwal-Examining-the-Testing-Effect.pdf). As with #2, the purer forms of testing (in this case, open-book versus closed-book testing) did better over the long run, and students were deluded about what worked best.
7. Bangert-Drowns, Kulik, and Kulik (1991). ["Effects of frequent classroom testing"](http://www.jstor.org/pss/27540459). Meta-analysis of 35 studies (1929-1989) varying tests during school semesters. 29 found benefits; 5 found negatives; 1 null result. The meta-analysis found large benefits to testing even once, then diminishing returns.
8. Cook 2006, ["Impact of self-assessment questions and learning styles in Web-based learning: a randomized, controlled, crossover trial"](http://www.ncbi.nlm.nih.gov/pubmed/16501263); final scores were higher when the doctors (residents) learned with questions.
9. Johnson & Kiviniemi 2009, ["The Effect of Online Chapter Quizzes on Exam Performance in an Undergraduate Social Psychology Course"](http://www.ncbi.nlm.nih.gov/pubmed/20046908) ("This study examined the effectiveness of compulsory, mastery-based, weekly reading quizzes as a means of improving exam and course performance. Completion of reading quizzes was related to both better exam and course performance.")

(One might be tempted to object that testing works only for *some* [learning styles](!Wikipedia), perhaps verbal styles. This is an unsupported assertion inasmuch as the experimental literature on learning styles is poor and the existing evidence that there are such things as learning styles at all is mixed.[^style])

[^style]: See the 2008 meta-analysis, ["Learning Styles: Concepts and Evidence"](http://www.psychologicalscience.org/journals/pspi/PSPI_9_3.pdf) ([APS press release](http://www.psychologicalscience.org/index.php/news/releases/learning-styles-debunked-there-is-no-evidence-supporting-auditory-and-visual-learning-psychologists-say.html)); from the abstract:

    > "...in order to demonstrate that optimal learning requires that students receive instruction tailored to their putative learning style, the experiment must reveal a specific type of interaction between learning style and instructional method: Students with one learning style achieve the best educational outcome when given an instructional method that differs from the instructional method producing the best outcome for students with a different learning style. In other words, the instructional method that proves most effective for students with one learning style is not the most effective method for students with a different learning style.
    >
    > Our review of the literature disclosed ample evidence that children and adults will, if asked, express preferences about how they prefer information to be presented to them.
    > There is also plentiful evidence arguing that people differ in the degree to which they have some fairly specific aptitudes for different kinds of thinking and for processing different types of information. However, we found virtually no evidence for the interaction pattern mentioned above, which was judged to be a precondition for validating the educational applications of learning styles. Although the literature on learning styles is enormous, very few studies have even used an experimental methodology capable of testing the validity of learning styles applied to education. Moreover, of those that did use an appropriate method, several found results that flatly contradict the popular meshing hypothesis.
    >
    > We conclude therefore, that at present, there is no adequate evidence base to justify incorporating learning-styles assessments into general educational practice. Thus, limited education resources would better be devoted to adopting other educational practices that have a strong evidence base, of which there are an increasing number. However, given the lack of methodologically sound studies of learning styles, it would be an error to conclude that all possible versions of learning styles have been tested and found wanting; many have simply not been tested at all."

#### Subjects

The above studies often used pairs of words or words themselves. How well does the testing effect generalize? Materials which benefited from testing:

- foreign vocabulary (eg. Karpicke & Roediger 2003, [Cepeda et al 2009](http://www.cs.colorado.edu/~mozer/Research/Selected%20Publications/reprints/Cepedaetal2009.pdf), Fritz et al 2007^[Fritz, C. O., Morris, P. E., Acton, M., Etkind, R., & Voelkel, A. R (2007). "Comparing and combining expanding retrieval practice and the keyword mnemonic for foreign vocabulary learning". _Applied Cognitive Psychology_, 21, 499-526.])
- [GRE](!Wikipedia) materials (like vocab, [Kornell 2009](http://sites.williams.edu/nk2/files/2011/08/Kornell.2009b.pdf)); prose passages on general scientific topics (Karpicke & Roediger, 2006a; Pashler et al., 2003)
- trivia ([McDaniel & Fisher](http://www.mendeley.com/research/tests-and-test-feedback-as-learning-sources/), 1991)
- elementary & middle school lessons with subjects such as biographical material and science ([Gates 1917](http://www.archive.org/details/recitationasafa00gategoog); [Spitzer 1939](/docs/1939-spitzer.pdf)[^Spitzer] and Vlach & Sandhofer 2012[^Vlach], respectively)
- Agarwal et al (2006): short-answer tests superior on textbook passages
- history textbooks: retention better with an initial short-answer test rather than multiple choice ([Nungester & Duchastel 1982](/docs/1982-nungester.pdf))
- [LaPorte & Voss (1975)](/docs/1975-laporte.pdf) also found better retention compared to multiple-choice or recognition problems
- [Duchastel & Nungester, 1981](/docs/1981-duchastel): 6 months after testing, testing beat studying in retention of a history passage
- [Duchastel (1981)](/docs/1981-duchastel.pdf): free recall decisively beat short-answer & multiple choice for reading comprehension of a history passage
- [Glover (1989)](/docs/1989-glover.pdf): free recall self-test beat recognition or [Cloze deletions](!Wikipedia); subject matter was the labels for parts of flowers
- [Kang, McDermott, and Roediger (2007)](http://memory.wustl.edu/Pubs/2007_Kang.pdf): prose passages; initial short-answer testing produced superior results 3 days later on both multiple choice and short answer tests
- [Leeming (2002)](http://www.eric.ed.gov/ERICWebPortal/recordDetail?accno=EJ761434): tests in 2 psychology courses, introductory & memory/learning; "80% vs. 74% for the introductory psychology course and 89% vs.
  80% for the learning and memory course"^[See also Balch 2006, who compared spaced & massed study in an introductory psychology course as well.]

[^Spitzer]: From Balota, describing Spitzer, H. F. (1939); "Studies in retention"; _Journal of Educational Psychology_, 30, 641-657:

    > "Spitzer (1939) incorporated a form of expanded retrieval in a study designed to assess the ability of sixth graders to learn science facts. Impressively, Spitzer tested over 3600 students in Iowa - the entire sixth-grade population of 91 elementary schools at the time. The students read two articles, one on peanuts and the other on bamboo, and were given a 25-item multiple choice test to assess their knowledge (such as 'To which family of plants does bamboo belong?'). Spitzer tested a total of nine groups, manipulating both the timing of the test (administered immediately or after various delays) and the number of identical tests students received (one to three). Spitzer did not incorporate massed or equal interval retrieval conditions, but he had at least two groups that were tested on an expanding schedule of retrieval, in which the intervals between tests were separated by the passage of time (in days) rather than by intervening to-be-learned information. For example, in one of the groups, the first test was given immediately, the second test was given seven days after the first test, and the third test was given 63 days after the second test. Thus, in essence, this group was tested on a 0-7-63 day expanding retrieval schedule. Spitzer compared performance of the expanded retrieval group to a group given a single test 63 days after reading the original article. On the first (immediate) test, the expanded retrieval group correctly answered 53% of the questions. After 63 days and two previous tests, their score was still an impressive 43%. The single test group correctly answered only 25% of the original items after 63 days, giving the expanded retrieval group an 18% retention advantage. This is quite impressive, given that this large benefit remained after a 63-day retention interval. Similar beneficial effects were found in a group tested on a 0-1-21 day expanded retrieval schedule compared to a group given a single test after 21 days. Of course, this study does not decouple the effects of testing from spacing or expansion, but the results do clearly indicate considerable learning and retention using the expanded repeated testing procedure. Spitzer concluded that '...examinations are learning devices and should not be considered only as tools for measuring achievement of pupils' (p. 656, italics added)"

[^Vlach]: ["Distributing Learning Over Time: The Spacing Effect in Children's Acquisition and Generalization of Science Concepts"](http://www.gse.uci.edu/docs/VlachSandhoferChildDevelopment.pdf):

    > "The spacing effect describes the robust finding that long-term learning is promoted when learning events are spaced out in time, rather than presented in immediate succession. Studies of the spacing effect have focused on memory processes rather than on other types of learning, such as the acquisition and generalization of new concepts. In this study, early elementary school children (5-7 year-olds; _N_ = 36) were presented with science lessons on one of three schedules: massed, clumped, and spaced. The results revealed that spacing lessons out in time resulted in higher generalization performance for both simple and complex concepts. Spaced learning schedules promote several types of learning, strengthening the implications of the spacing effect for educational practices and curriculum."

This covers a pretty broad range of what one might call 'declarative' knowledge.
Extending testing to other fields is more difficult and may reduce to 'write many frequent analyses, not large ones' or 'do lots of small exercises', whatever those might mean in those fields:

> "A third issue, which relates to the second, is whether our proposal of testing is really appropriate for courses with complex subject matters, such as the philosophy of Spinoza, Shakespeare's comedies, or creative writing. Certainly, we agree that most forms of objective testing would be difficult in these sorts of courses, but we do believe the general philosophy of testing (broadly speaking) would hold - students should be continually engaged and challenged by the subject matter, and there should not be merely a midterm and final exam (even if they are essay exams). Students in a course on Spinoza might be assigned specific readings and thought-provoking essay questions to complete every week. This would be a transfer-appropriate form of weekly 'testing' (albeit with take-home exams). Continuous testing requires students to continuously engage themselves in a course; they cannot coast until near a midterm exam and a final exam and begin studying only then."^[Roediger & Karpicke 2006b again.]

#### Downsides

Testing does have some known flaws:

1. Interference in recall: the ability to remember tested items drives out the ability to remember similar untested items. Most/all studies were in laboratory settings and found relatively small effects:

    > "In sum, although various types of recall interference are quite real (and quite interesting) phenomena, we do not believe that they compromise the notion of test-enhanced learning. At worst, interference of this sort might dampen positive testing effects somewhat. However, the positive effects of testing are often so large that in most circumstances they will overwhelm the relatively modest interference effects."
2.
multiple choice tests can accidentally lead to 'negative suggestion effects' where having previously seen a falsehood as an item on the test makes one more likely to believe it. This is mitigated or eliminated when there's quick feedback about the right answer (see Butler & Roediger 2008 ["Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing"](http://commonsenseatheism.com/wp-content/uploads/2011/01/Butler-Feedback-enhances-the-positive-effects.pdf)). Solution: don't use multiple choice; inferior in testing ability to free recall or short answers, anyway. Neither problem seems major. ### Distributed A lot depends *when* you do all your testing. Above we saw some benefits to testing a lot the moment you learn something, but the same number of tests could be spread out over time, to give us the *spacing effect* or *spaced repetition*. There are hundreds of studies involving the spacing effect^[For example, [Cepeda 2006](http://uweb.cas.usf.edu/~drohrer/pdfs/Cepeda_et_al_2006PsychBull.pdf) is a review of 184 articles with 317 experiments.], and almost unanimously they find spacing out tests is superior to massed testing when the final test/measurement is conducted days or years later[^superiority], although the mechanism isn't clear[^mechanism]. Besides all the previously mentioned studies, we can throw in: - Peterson, L. R., Wampler, R., Kirkpatrick, M., & Saltzman, D. (1963). ["Effect of spacing presentations on retention of a paired associate over short intervals"](/docs/1963-peterson.pdf). _Journal of Experimental Psychology_, 66(2), 206-209 - Glenberg, A. M. (1977). ["Influences of retrieval processes on the spacing effect in free recall"](/docs/1977-glenberg.pdf). _Journal of Experimental Psychology: Human Learning and Memory_, 3(3), 282-294 - Balota, D. A., Duchek, J. M., & Paullin, R. (1989). 
["Age-related differences in the impact of spacing, lag and retention interval"](http://www.artsci.wustl.edu/~dbalota/Age%20related%20differences%20in%20the%20impact%20of%20spacing%20Balota%20Duchek%20Paullin.pdf). _Psychology and Aging_, 4, 3-9 [^mechanism]: The Balota review offers a synthesis of current theories on how massed and spaced differ, based on [memory encoding](!Wikipedia "Encoding (memory)"): > "According to encoding variability theory, performance on a memory test is dependent upon the overlap between the contextual information available at the time of test and the contextual information available during encoding. During massed study, there is relatively little time for contextual elements to fluctuate between presentations and so this condition produces the highest performance in an immediate memory test, when the test context strongly overlaps with the same contextual information encoded during both of the massed presentations. In contrast, when there is spacing between the items, there is time for fluctuation to take place between the presentations during study, and hence there is an increased likelihood of having multiple unique contexts encoded. Because a delayed test will also allow fluctuation of context, it is better to have multiple unique contexts encoded, as in the spaced presentation format, as opposed to a single encoded context, as in the massed presentation format." The research literature focuses *extensively* on the question of *what kind* of spacing is best and what this implies about memory: a spacing that has static fixed intervals or a spacing which expands? This is very important for understanding memory and building models of it. But for practical purposes, this is very uninteresting; to sum it up, there are many studies pointing each way, and whatever difference in efficiency exists, is minimal. 
Most existing software follows SuperMemo in using an expanding spacing algorithm, so it's not worth worrying about; as Mnemosyne developer Peter Bienstman says, it's not clear the more complex algorithms really help[^bienstman], and the Anki developers [were concerned about](http://ankisrs.net/docs/FrequentlyAskedQuestions.html#_what_spaced_repetition_algorithm_does_anki_use) the larger errors SM3+ risks in attempting to be more optimal. So too here.

[^superiority]: Balota review:

> "No feedback or correction was given to subjects if they made errors or omitted answers. Landauer and Bjork found that the expanding-interval schedule produced better recall than equal-interval testing on a final test at the end of the session, and equal-interval testing, in turn, produced better recall than did initial massed testing. Thus, despite the fact that massed testing produced nearly errorless performance during the acquisition phase, the other two schedules produced better retention on the final test given at the end of the session. However, the difference favoring the expanding retrieval schedule over the equal-interval schedule was fairly small, at around 10%. In research following up Landauer and Bjork's (1978) original experiments, practically all studies have found that spaced schedules of retrieval (whether equal-interval or expanding schedules) produce better retention on a final test given later than do massed retrieval tests given immediately after presentation (e.g., Cull, 2000; Cull, Shaughnessy, & Zechmeister, 1996), although exceptions do exist. For example, in Experiments 3 and 4 of Cull et al. (1996), massed testing produced performance as good as equal-interval testing on a 5-5-5 schedule, but most other experiments have found that any spaced schedule of testing (either equal-interval or expanding) is better than a massed schedule for performance on a delayed test.
However, whether expanding schedules are better than equal-interval schedules for long-term retention - the other part of Landauer and Bjork's interesting findings - remains an open question. Balota, Duchek, and Logan (in press) have provided a thorough consideration of the relevant evidence and have shown that it is mixed at best, and that most researchers have found no difference between the two schedules of testing. That is, performance on a final test at the end of a session often shows no difference in performance between equal-interval and expanding retrieval schedules."

Cull, for those curious (Cull, W. L. (2000). ["Untangling the benefits of multiple study opportunities and repeated testing for cued recall"](http://www.mendeley.com/research/untangling-benefits-multiple-study-opportunities-repeated-testing-cued-recall/). _Applied Cognitive Psychology_, 14, 215-235):

> "Cull (2000) compared expanded retrieval to equal interval spaced retrieval in a series of four experiments designed to mimic typical teaching or study strategies encountered by students. He examined the role of testing versus simply restudying the material, feedback, and various retention intervals on final test performance. Paired associates (an uncommon word paired with a common word, such as bairn-print) were presented in a manner similar to the flashcard techniques students often use to learn vocabulary words. The intervals between retrieval attempts of to-be-learned information ranged from minutes in some experiments to days in others. Interestingly, across four experiments, Cull did not find any evidence of an advantage of an expanded condition over a uniform spaced condition (i.e., no significant expanded retrieval effect), although both conditions consistently produced large advantages over massed presentations. He concluded that distributed testing of any kind, expanded or equal interval, can be an effective learning aid for teachers to provide for their students."
[^bienstman]: From Mnemosyne's [Principles](http://www.mnemosyne-proj.org/principles.php) page:

> "The Mnemosyne algorithm is very similar to [SM2](http://www.supermemo.com/english/ol/sm2.htm) used in one of the early versions of SuperMemo. There are some modifications that deal with early and late repetitions, and also to add a small, healthy dose of randomness to the intervals.
>
> Supermemo now uses SM11. However, we are a bit skeptical that the huge complexity of the newer SM algorithms provides for a statistically relevant benefit. But, that is one of the facts we hope to find out with our data collection.
>
> We will only make modifications to our algorithms based on common sense or if the data tells us that there is a statistically relevant reason to do so."

For those interested, 3 of the studies that found fixed spacings better than expanding:

1. Carpenter, S. K., & DeLosh, E. L. (2005). ["Application of the testing and spacing effects to name learning"](http://www.psychology.iastate.edu/~shacarp/Carpenter_DeLosh_2005.pdf). _Applied Cognitive Psychology_, 19, 619-636[^carpenter]
2. Logan, J. M. (2004). _Spaced and expanded retrieval effects in younger and older adults_. Unpublished doctoral dissertation, Washington University, St. Louis, MO

    This thesis is interesting inasmuch as Logan found that young adults did considerably worse with an expanding spacing after a day.
3. Karpicke & Roediger, 2006a

The fixed vs expanding issue aside, a list of additional generic studies finding benefits to spaced vs massed:

- [Cepeda et al. 2006](http://commonsenseatheism.com/wp-content/uploads/2011/01/Cepeda-Distributed-Practice-in-Verbal-Recall-Trasks.pdf) (large review used elsewhere in this page)
- Karpicke & Roediger 2006a
- Rohrer & Taylor 2006. ["The effects of over-learning and distributed practice on the retention of mathematics knowledge"](http://commonsenseatheism.com/wp-content/uploads/2011/01/Rohrer-The-Effect-of-Overlearning-on-Long-Term-Retention.pdf).
_Applied Cognitive Psychology_, 20: 1209-1224
- Seabrook et al 2005. ["Distributed and Massed Practice: From Laboratory to Classroom"](http://commonsenseatheism.com/wp-content/uploads/2011/01/Saebrook-Distributed-and-massed-practice-From-laboratory-to-classroom.pdf) ([abstract](http://onlinelibrary.wiley.com/doi/10.1002/acp.1066/abstract))
- Keppel, Geoffrey. ["A Reconsideration of the Extinction-Recovery Theory"](/docs/1967-keppel.pdf). _Journal of Verbal Learning & Verbal Behavior_. 6(4) 1967, 476-486

    A week later, the massed reviewers went from 5.9 correct ~> 2.1; the spaced reviewers went from 5.5 ~> 5.0. (Note the usual observation: massed was initially better, and later much worse, less than half as good.)
- Bloom, Kristine C; Schuell, Thomas J. ["Effects of massed and distributed practice on the learning and retention of second-language vocabulary"](http://www.jstor.org/pss/27539823). _Journal of Educational Research_. Vol 74(4) Mar-Apr 1981, 245-248.

    Four days after the 2 high school groups memorized 16 French words, the spaced group remembered 15 and the massed 11.
- Rea, Cornelius P; Modigliani, Vito. ["The effect of expanded versus massed practice on the retention of multiplication facts and spelling lists"](http://psycnet.apa.org/?&fa=main.doiLanding&uid=1986-07610-001). _Human Learning: Journal of Practical Research & Applications_. Vol 4(1) Jan-Mar 1985, 11-18[^multiplication]

    > "A test immediately following the training showed superior performance for the distributed group (70 percent correct) compared to the massed group (53 percent correct). These results seem to show that the spacing effect applies to school-age children and to at least some types of materials that are typically taught in school."^[[Balota et al](http://www.psych.wustl.edu/coglab/publications/Balota+et+al+roddy+chapter.pdf).]
- Donovan, John J; Radosevich, David J.
["A meta-analytic review of the distribution of practice effect: Now you see it, now you don't"](http://web.archive.org/web/20030529134732/http://www.unlv.edu/Colleges/Health_Sciences/Kinesiology/classes/KIN750/meta.pdf). _Journal of Applied Psychology_. Vol 84(5) October 1999, 795-805 > "According to Donovan and Radosevich's meta-analysis of spacing studies, the [effect size](!Wikipedia) for the spacing effect is d = .42. This means that the average person getting distributed training remembers better than about 67 percent of the people getting massed training. This effect size is nothing to sneeze at-in education research, effect sizes as low as d = .25 are considered practically significant, while effect sizes above d = 1 are rare."^[Balota et al] > "In one meta-analysis by Donovan and Radosevich (1999), for instance, the size of the spacing effect declined sharply as conceptual difficulty of the task increased from low (e.g. rotary pursuit) to average (e.g. word list recall) to high (e.g. puzzle). By this finding, the benefits of spaced practise may be muted for many mathematics tasks."^[Rohrer & Taylor 2006] The Donovan meta-analysis notes that the effect size is smaller in studies with better methodology, but still significant. - Bahrick, Harry P; Phelphs, Elizabeth. ["Retention of Spanish vocabulary over 8 years"](/docs/1987-bahrick.pdf). _Journal of Experimental Psychology: Learning, Memory, & Cognition_. Vol 13(2) April 1987, 344-349; the extremely long delay after the initial training period makes this particularly interesting: > Harry Bahrick and Elizabeth Phelps (1987) examined the retention of 50 Spanish vocabulary words after an eight-year delay. Subjects were divided into three groups. Each practiced for seven or eight sessions, separated by a few minutes, a day, or 30 days. 
In each session, subjects practiced until they could produce the list perfectly one time....Eight years later, people in the no-delay group could recall 6 percent of the words, people in the one-day delay group could remember 8 percent, and those in the 30-day group averaged 15 percent. Everyone also took a multiple choice test, and again, the spacing effect was observed. The no-delay group scored 71 percent, the one-day group scored 80 percent, and the 30-day group scored 83 percent.

> ...Bahrick and his colleagues varied both the spacing of practice and the amount of practice. Practice sessions were spaced 14, 28, or 56 days apart, and totaled 13 or 26 sessions. They tested subjects' memory one, two, three, and five years after training. Once again, it took a bit longer to reach the criterion within each session when practice sessions were spaced farther apart, but again, this small investment paid dividends years later. It didn't matter whether testing occurred at one, two, three, or five years after practice - the 56-day group always remembered the most, the 28-day group was next, and the 14-day group remembered the least. Further, the effect was quite large. If words were practiced every 14 days, you needed twice as much practice to reach the same level of performance as when words were practiced every 56 days!

- Pashler et al., 2003; ["Is Temporal Spacing of Tests Helpful Even When It Inflates Error Rates?"](http://www.pashler.com/Articles/Pashler_Zarow_Triplett03.pdf)

    Long intervals between tests necessarily mean you will often err; errors were thought to intrinsically reduce learning. While the extra errors do damage accuracy in the short run, the long intervals are powerful enough that they still win.
- works in ill subpopulations:
    - works on short-term review conducted with Alzheimer's patients; spacing used on the scale of seconds and minutes, with modest success in teaching object locations or daily tasks to do[^calendar]:
        - Camp, C. J. (1989).
"Facilitation of new learning in Alzheimer's disease". In G. C. Gilmore, P. J. Whitehouse, & M. L. Wykle (Eds.), _Memory, aging, and dementia_ (pp. 212-225) - Camp, C. J., & McKitrick, L. A. (1992). "Memory interventions in Alzheimer's-type dementia populations: Methodological and theoretical issues". In R. L. West & J. D. Sinnott (Eds.), _Everyday memory and aging: Current research and methodology_ (pp. 152-172) - works with traumatic brain injury; Goverover et al 2009, ["Application of the spacing effect to improve learning and memory for functional tasks in traumatic brain injury: a pilot study."](http://www.thefreelibrary.com/Application+of+the+spacing+effect+to+improve+learning+and+memory+for...-a0212106703) - and multiple sclerosis; Goverover et al 2009, ["A functional application of the spacing effect to improve learning and memory in persons with multiple sclerosis"](http://www.ncbi.nlm.nih.gov/pubmed/18720184) - math[^rohrerwarning]: - multiplication (Ria & Modigliani 1985) - [permuting](!Wikipedia) a sequence ([Rohrer & Taylor 2006](http://uweb.cas.usff.edu/~drohrer/pdfs/Rohrer%26Taylor2006ACP.pdf)) - calculating the volume of [polyhedrons](!Wikipedia) ([Rohrer & Taylor 2007](http://uweb.rc.usf.edu/~drohrer/pdfs/Rohrer&Taylor2007IS.pdf)) - statistics ([Smith & Rothkopf 1984](http://people.tamu.edu/~stevesmith/SmithMemory/SmithRothkopf1984.pdf)) - pre-calculus ([Revak 1997](http://www.shs-wasc.info/Homework-FloridaJournalofEducational.pdf)[^pre-calculus] but there's a related [null 'calculus I' result](http://digitalcommons.uconn.edu/dissertations/AAI3464319/) as well) and [algebra](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1284369/pdf/12102132.pdf) - medicine ([Kerfoot & Brotschi 2009](http://www.ncbi.nlm.nih.gov/pubmed/18614145); [Kerfoot 2009](http://www.ncbi.nlm.nih.gov/pubmed/19375095), a 2 years later followup to [Kerfoot et al 2007](http://www.ncbi.nlm.nih.gov/pubmed/17209889); Kerfoot has a number of [other relevant 
studies](http://app.qstream.com/pricekerfoot)) and surgery (Moulton et al 2006, ["Teaching Surgical Skills: What Kind of Practice Makes Perfect? A Randomized, Controlled Trial"](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1856544/), distributed practice of microvascular suturing) - introductory psychology (Balch 2006, ["Encouraging Distributed Study: A Classroom Experiment on the Spacing Effect"](http://teachpsych.org/resources/e-books/tips2011/III-01-02Balch2006.pdf)[^balch]. _Teaching of Psychology_, 33, 249-252) - eight-grade American history ([Carpenter, Pashler, and Cepeda 2009](http://laplab.ucsd.edu/articles/CarpenterPashlerCepeda2009.pdf)) - learning to read with phonics (Seabrook et al 2005) - music ([Stambaugh 2009](http://www.legacyweb.rcm.ac.uk/cache/fl0020113.pdf)) [^rohrerwarning]: Rohrer & Taylor 2006 warns us, though, about many of the other math studies: > "In one meta-analysis by Donovan and Radosevich (1999), for instance, the size of the spacing effect declined sharply as conceptual difficulty of the task increased from low (e.g. rotary pursuit) to average (e.g. word list recall) to high (e.g. puzzle). By this finding, the benefits of spaced practise may be muted for many mathematics tasks." [^pre-calculus]: What is especially nice about this study was that not only did it use high-quality (intelligent & motivated) college students ([United States Air Force Academy](!Wikipedia)), the conditions were relatively controlled - both groups had the *same* homework (so equal testing effect), but like Rohrer & Taylor 2006/2007, the *distribution* was what varied: > "The course topics, textbook, handouts, reading assignments, and graded assignments (with the exception of quiz, homework, and participation points) were identical for the treatment and control groups. The listing of homework assignments in the syllabus differed between groups. The control group was assigned daily homework related to the topic(s) presented that day in class. 
Peterson (1971) calls this the vertical model for assigning mathematics homework. The treatment group was assigned homework in accordance with a distributed organizational pattern that combines practice on current topics and reinforcement of previously covered topics. Under the distributed model, approximately 40% of the problems on a given topic were assigned the day the topic was first introduced, with an additional 20% assigned on the next lesson and the remaining 40% of problems on the topic assigned on subsequent lessons (Hirsch et al., 1983). In Hirsch's research and in this study, after the initial homework assignment, problem(s) representing a given topic resurfaced on the 2nd, 4th, 7th, 12th, and 21st lesson. Consequently, treatment group homework for lesson one consisted of only one topic; homework for lessons two and three consisted of two topics; and homework for lessons four through six consisted of three topics. This pattern continued as new topics were added and was applied to all non-exam, non-laboratory lessons. As shown by Tables 1 and 2, the same homework problems were assigned to both groups with only the pattern of assignment differing. Because of the nature of the distributed practice model, homework for the treatment group contained fewer problems (relative to the control group) early in the semester with the number of problems increasing as the semester progressed. Later in the semester, homework for the treatment group contained more problems (relative to the control group)....The USAFA routinely collects study time data. After each exam, a large sample of cadets (at least 60% of the course population) anonymously reported the amount of time (in minutes) spent studying for the exam. Time spent studying was approximately equal for both groups (see Table 5). Descriptive data reveals that, for both the treatment and control group, study time for the third exam was at least 16% greater than study time for any other exam.
Study time for the final exam was at least 68% greater than study time for any of the hourly exams (see Table 5)

> ...The treatment produced an effect size (f^2^) of 0.013 on the first exam, 0.029 on the second exam, 0.035 on the fourth exam, and 0.040 on the final course percentage grade. Although the effect sizes appear to be small, the treatment group outscored the control group in every case. A mean difference of 5.13 percentage points on the first, second, and fourth exam translates to an advantage of about a third of a letter grade for students in the treatment group. In addition, higher minimum scores earned by the treatment group may indicate that the distributed practice treatment served to eliminate the extremely low scores (refer to Table 3)....Oddly, the distributed practice treatment did not produce a significant effect on final exam scores. One possible cause for the disparity was the USAFA policy exempting the top performers from the final exam. Of the 16 exempted students, 11 were from the treatment group with only 5 from the control group."

[^balch]: Balch 2006 abstract:

> "Two introductory psychology classes (_N_ = 145) participated in a counterbalanced classroom experiment that demonstrated the spacing effect and, by analogy, the benefits of distributed study. After hearing words presented twice in either a massed or distributed manner, participants recalled the words and scored their recall protocols, reliably remembering more distributed than massed words. Posttest scores on a multiple-choice quiz covering points illustrated by the experiment averaged about twice the comparable pretest scores, indicating the effectiveness of the exercise in conveying content. Students' subjective ratings suggested that the experiment helped convince them of the benefits of distributed study."

[^carpenter]: Balota:

> Carpenter and DeLosh (2005, Exp.
2) have recently investigated face-name learning under massed, expanded (1-3-5), and equal interval (3-3-3) conditions. This study also involved study-only and study-and-test procedures during the acquisition phase. Carpenter and DeLosh found a large effect of spacing, but no evidence of a benefit of expanded over equal interval practice. In fact, Carpenter and DeLosh reported a reliable benefit of the equal interval condition over the expanded retrieval condition.

[^multiplication]: Balota again:

> "Rea and Modigliani (1985) tested the effectiveness of expanded retrieval in a third-grade classroom setting. In separate conditions, students were given new multiplication problems or spelling words to learn. The problem or word was presented audiovisually once and then tested on either a massed retrieval schedule of 0-0-0-0 or an expanding schedule of 0-1-2-4, in which the intervals involved being tested on old items or learning new items. After each test trial for a given item, the item was re-presented in its entirety so students received feedback on what they were learning. Performance during the learning phase was at 100% for both spelling words and multiplication facts. On an immediate final retention test, Rea and Modigliani found a performance advantage for all items - math and spelling - practiced on an expanding schedule compared to the massed retrieval schedule. They suggested, as have others, that spacing combined with the high success rate inherent in the expanded retrieval schedule produced better retention than massed retrieval practice. However, as in Spitzer's study, Rea and Modigliani did not test an appropriate equal interval spacing condition. Hence, their finding that expanded retrieval is superior to massed retrieval in third graders could simply reflect the superiority of spaced versus massed rehearsal - in other words, the spacing effect."
[^calendar]: Balota:

> "...long-term retention of information has been demonstrated over several days in some cases (e.g., Camp et al., 1996). For example, in the latter study, Camp et al. employed an expanding retrieval strategy to train 23 individuals with mild to moderate AD to refer to a daily calendar as a cue to remember to perform various personal activities (e.g., take medication). Following a baseline phase to determine whether subjects would spontaneously use the calendar, spaced retrieval training was implemented by repeatedly asking the subject the question, 'How are you going to remember what to do each day?' at expanding time intervals. The results indicated that 20/23 subjects did learn the strategy (i.e., to look at the calendar) and retained it over a 1-week period."

#### Generality of spacing effect

We have already seen that spaced repetition is effective across a variety of academic fields and mediums. Beyond that, spacing effects can be found in:

- various "domains (e.g., learning perceptual motor tasks[^motor] or learning lists of words)"^[See [Cepeda 2006](http://uweb.cas.usf.edu/~drohrer/pdfs/Cepeda_et_al_2006PsychBull.pdf)] such as spatial memory^[Commins, S., Cunningham, L., Harvey, D., and Walsh, D. (2003). ["Massed but not spaced training impairs spatial memory"](http://www.ncbi.nlm.nih.gov/pubmed/12642190). _Behavioural Brain Research_ 139, 215-223]
- "across species (e.g., [rats](http://www.ncbi.nlm.nih.gov/pubmed/11264314), pigeons, and humans [or [flies](http://dubnaulab.cshl.edu/pdf/margulies2005.pdf) and sea slugs, [Carew et al 1972](http://www.sciencemag.org/content/175/4020/451.short) & [Sutton et al 2002](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC155928/)])"
- "across age groups [infancy[^Gallucio], childhood[^Toppino], adulthood[^Glenberg], the elderly^[See Kornell et al 2010]] and individuals with different memory impairments"
- "and across retention intervals of seconds^[Mammarella, N., Russo, R., & Avons, S. E. (2002).
["Spacing effects in cued-memory tasks for unfamiliar faces and nonwords"](http://www.brandimontelab.it/pubpdf/nmam/M&C_Spacing2002.pdf). _Memory & Cognition_, 30, 1238-1251] [to days^[Childers, J. B., & Tomasello, M. (2002). ["Two-year-olds learn novel nouns, verbs, and conventional actions from massed or distributed exposures"](http://www.ncbi.nlm.nih.gov/pubmed/12428708). _Developmental Psychology_, 38, 967-978]] to months" (we have already seen studies using years) [^motor]: It should be noted that reviews directly conflict on motor skills; Lee and Genovese 1988 find benefits, while Adams 1987 and earlier do not. The difference may be that simple motor tasks benefit from spacing as suggested by [Shea & Morgan 1979](http://www.eric.ed.gov/ERICWebPortal/search/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=EJ215260&ERICExtSearch_SearchType_0=no&accno=EJ215260) (benefits to a randomized/spaced schedule), while *complex* ones where the subject is already operating at his limits do not benefit, suggested by [Wulf & Shea 2002](http://www.castonline.ilstu.edu/smith/405/readings_pdf/attn_focus_rdngs/wulf%20shea%20simple%20complex.pdf). Stambaugh 2009 mentions some of the diverging studies: > "The contextual interference hypothesis (Shea and Morgan 1979, Battig 1966 ["Facilitation and interference" in _Acquisition of skill_]) predicted the blocked condition would exhibit superior performance immediately following practice (acquisition) but the random condition would perform better at delayed retention testing. This hypothesis is generally consistent in laboratory motor learning studies (e.g. [Lee and Magill 1983](/docs/1983-lee.pdf), [Brady 2004](http://www.ncbi.nlm.nih.gov/pubmed/15446636)), but less consistent in applied studies of sports skills (e.g. [Landin and Hebert 1997](http://www.ncbi.nlm.nih.gov/pubmed/9421848), [Hall et al. 1994](http://www.ncbi.nlm.nih.gov/pubmed/8084699)) and fine-motor skills ([Ollis et al. 
2005](http://www.ncbi.nlm.nih.gov/pubmed/15698822), [Ste-Marie et al. 2004](http://info.wincol.ac.il/home/home.exe/208/3813?load=T.pdf))." [^Gallucio]: Galluccio, L. & Rovee-Collier, C. (2006). ["Nonuniform effects of reinstatement within the time window"](/docs/2006-galluccio.pdf). _Learning and Motivation_, 37, 1-17. [^Toppino]: See the previous sections for many using children; one previously uncited is Toppino, T. C. (1993), ["The spacing effect in preschool children's free recall of pictures and words"](http://www.ncbi.nlm.nih.gov/pubmed/2017039); but [Toppino et al 2009](http://www.ncbi.nlm.nih.gov/pubmed/19246346) adds some interesting qualifiers to spaced repetition in the young: > "Preschoolers, elementary school children, and college students exhibited a spacing effect in the free recall of pictures when learning was intentional. When learning was incidental and a shallow processing task requiring little semantic processing was used during list presentation, young adults still exhibited a spacing effect, but children consistently failed to do so. Children, however, did manifest a spacing effect in incidental learning when an elaborate semantic processing task was used." [^Glenberg]: Another previously uncited study: Glenberg, A. M. (1979), ["Component-levels theory of the effects of spacing of repetitions on recall and recognition"](http://www.springerlink.com/content/5v15k124807g8363/). _Memory & Cognition_, 7, 95-112. The domains are limited, however. Cepeda 2006: > "[[Moss 1996](http://psycnet.apa.org/psycinfo/1996-95005-375), reviewing 120 articles] concluded that longer ISIs facilitate learning of verbal information (e.g., spelling^[eg. [Fishman et al](http://www.ncbi.nlm.nih.gov/pubmed/5672245) 1968]) and motor skills (e.g., mirror tracing); in each case, over 80% of studies showed a distributed practice benefit. 
In contrast, only one third of intellectual skill (e.g., math computation) studies showed a benefit from distributed practice, and half showed no effect from distributed practice.

> ...[Donovan and Radosevich (1999)] The largest effect sizes were seen in low rigor studies with low complexity tasks (e.g., rotary pursuit, typing, and peg reversal), and retention interval failed to influence effect size. The only interaction Donovan and Radosevich examined was the interaction of ISI and task domain. It is important to note that task domain moderated the distributed practice effect; depending on task domain and lag, an increase in ISI either increased or decreased effect size. Overall, Donovan and Radosevich found that increasingly distributed practice resulted in larger effect sizes for verbal tasks like free recall, foreign language, and verbal discrimination, but these tasks also showed an inverse-U function, such that very long lags produced smaller effect sizes. In contrast, increased lags produced smaller effect sizes for skill tasks like typing, gymnastics, and music performance."

Skills like gymnastics and music performance raise an important point about the testing effect and spaced repetition: they maintain memories or skills, but do not increase them beyond what was already learned. If one is a gifted amateur when one starts reviewing, one remains a gifted amateur. Ericsson covers what is necessary to *improve* and attain new expertise: [deliberate practice](!Wikipedia)^[The famous '10,000 hours of practice' figure may not be as true or important as Ericsson and publicizers like Malcolm Gladwell imply, given the high [variance](!Wikipedia) of expertise against time, and results from sports showing [significantly smaller](http://www.sportsscientists.com/2011/08/talent-training-and-performance-secrets.html) time investments, but the insight of 'deliberate practice' seems real.
One may be able to get away with 3,000 hours rather than 10,000, but one isn't going to do that with mindless repetition or no repetitions.]. From ["The Role of Deliberate Practice"](/docs/1993-ericsson-deliberatepractice.pdf): > "The view that merely engaging in a sufficient amount of practice - regardless of the structure of that practice - leads to maximal performance, has a long and contested history. In their classic studies of Morse Code operators, Bryan and Harter (1897, 1899) identified plateaus in skill acquisition, when for long periods subjects seemed unable to attain further improvements. However, with extended efforts, subjects could restructure their skill to overcome plateaus....Even very experienced Morse Code operators could be encouraged to dramatically increase their performance through deliberate efforts when further improvements were required...More generally, Thorndike (1921) observed that adults perform at a level far from their maximal level even for tasks they frequently carry out. For instance, adults tend to write more slowly and illegibly than they are capable of doing....The most cited condition [for optimal learning and improvement of performance] concerns the subjects' motivation to attend to the task and exert effort to improve their performance....The subjects should receive immediate informative feedback and knowledge of results of their performance....In the absence of adequate feedback, efficient learning is impossible and improvement only minimal even for highly motivated subjects. Hence mere repetition of an activity will not automatically lead to improvement in, especially, accuracy of performance...In contrast to play, deliberate practice is a highly structured activity, the explicit goal of which is to improve performance. Specific tasks are invented to overcome weaknesses, and performance is carefully monitored to provide cues for ways to improve it further. 
We claim that deliberate practice requires effort and is not inherently enjoyable."

##### Abstraction

Another potential objection is to argue^[Gentner, D., Loewenstein, J., & Thompson, L. (2003). ["Learning and transfer: A general role for analogical encoding"](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.58.2647&rep=rep1&type=pdf). _Journal of Educational Psychology_, 95, 393-40] that spaced repetition inherently hinders any kind of abstract learning and thought because related materials are not being shown together - allowing for comparison and inference - but days or months apart. Ernst A. Rothkopf: "Spacing is the friend of recall, but the enemy of induction" (Kornell & Bjork 2008, p. 585). This is plausible based on some of the early studies[^abstraction], but the 4 recent studies I know of that directly examine the issue all found that spaced repetition helped abstraction as well as general recall:

[^abstraction]: From Kornell et al 2010: > "The benefits of spacing seem to diminish or disappear when to-be-learned items are not repeated exactly ([Appleton-Knapp, Bjork, & Wickens, 2005](http://ideas.repec.org/a/ucp/jconrs/v32y2005i2p266-276.html))...a number of studies have shown that massing, rather than spacing, promotes inductive learning. These studies have generally employed relatively simple perceptual stimuli that facilitate experimental control ([Gagné, 1950](http://psycnet.apa.org/psycinfo/1950-05095-001); [Goldstone, 1996](http://cognitrn.psych.indiana.edu/rgoldsto/interrelated/interrelated.html); [Kurtz & Hovland, 1956](/docs/1956-kurtz.pdf); [Whitman J. R., & Garner, W. R. (1963). "Concept learning as a function of the form of internal structure". _Journal of Verbal Learning & Verbal Behavior_, 2, 195–202])."

1. Kornell, N., & Bjork, R. A. (2008).
["Learning concepts and categories: Is spacing the enemy of induction?"](http://faculty.kutztown.edu/rryan/classes/theories/objectiv/applying_cog_principles_to_ed_resources/Interleaving-Shuffling/Kornell%20&%20Bjork%202008.pdf) _Psychological Science_, 19, 585-592 2. Vlach, H. A., Sandhofer, C. M., & Kornell, N. (2008). ["The spacing effect in children's memory and category induction"](http://babytalk.psych.ucla.edu/documents/Vlach_Sandhofer_Kornell_2008.pdf). _Cognition_, 109, 163-167 3. Kornell, N., Castel, A. D., Eich, T. S., & Bjork, R. A. (2010). ["Spacing as the friend of both memory and induction in younger and older adults"](http://bjorklab.psych.ucla.edu/pubs/Kornell_Castel_Eich_Bjork_2010). _Psychology and Aging_, 25, 498-503 4. Vlach & Sandhofer 2012, ["Distributing Learning Over Time: The Spacing Effect in Children's Acquisition and Generalization of Science Concepts"](http://www.gse.uci.edu/docs/VlachSandhoferChildDevelopment.pdf), _Child Development_ ### Review summary To bring it all together with the gist: - testing is very effective and comes with minimal [negative factors](#downsides) - expanding spacing is roughly as good as or better than (wide) fixed intervals, but expanding is more convenient and the default - testing (and hence spacing) is best on intellectual, highly factual, verbal domains, but may still work in many low-level domains - the research favors questions which force the user to use their memory as much as possible; in descending order of preference: 1. free recall 2. short answers 3. multiple-choice 4. Cloze deletion 5. recognition - the research literature is comprehensive and most questions have been answered - somewhere. - the most common mistakes with spaced repetition are 1. formulating poor questions and answers 2. 
assuming it will help you learn, as opposed to maintaining and preserving what one has already learned^[High error rates - indicating one didn't actually learn the card contents in the first place - seem to be connected to failures of the spacing effect; there's [some evidence](http://www.columbia.edu/cu/psychology/metcalfe/PDFs/Son2010.pdf) that people naturally choose to mass study when they don't yet know the material.]. (It's hard to learn *from* cards, but if you have learned something, it's much easier to then devise a set of flashcards that will test your weak points.)

## Using it

One doesn't need to use SuperMemo, of course; there are plenty of free alternatives. I like [Mnemosyne](!Wikipedia "Mnemosyne (software)") ([homepage](http://www.mnemosyne-proj.org/)) myself - [Free](!Wikipedia "Free software"), packaged for [Ubuntu Linux](!Wikipedia), and quite easy to use. OK, but what does one do with it? It's a surprisingly difficult question, actually. It's akin to "the tyranny of the blank page" (or blank wiki); now that I have all this power - a mechanical golem that will never forget and never let me forget whatever I choose to - what do I choose to remember?

### Problems

One common experience of new users of spaced repetition is to add too much stuff - trivialities and things they don't really care about. But they soon learn the curse of [Borges's](!Wikipedia "Jorge Luis Borges") [Funes the Memorious](!Wikipedia). If they don't actually want to learn the material they put in, they will soon stop doing the daily reviews - which will cause reviews to pile up, which will be further discouraging, and so they stop. At least with physical fitness there isn't a dismayingly precise number indicating how far behind you are! But if you have too little at the beginning, you'll have few repetitions per day, and you'll see no obvious benefit from the technique itself - it looks just like boring flash card review.
### What to add

The most difficult task, beyond that of just persisting until the benefits do become obvious, is deciding what's valuable enough to add in. In a 3 year period, one can expect to spend "30-40 seconds"^[["SuperMemo as a new tool increasing the productivity of a programmer. A case study: programming in Object Windows"](http://www.supermemo.com/articles/programming.htm)] on any given item. The long run [theoretical predictions](http://www.supermemo.com/articles/theory.htm) are a little hairier. Given a single item, the formula for daily time spent on it is $\text{time} = \frac{1}{500} \times \text{nthYear}^{-1.5} + \frac{1}{30000}$. During our 20th year, we would spend $t = \frac{1}{500} \times 20^{-1.5} + \frac{1}{30000}$, or ~5.569e-5 minutes a day. This is the average daily time, so to recover the annual time spent, we simply multiply by 365.25. Suppose we were interested in how much time a flashcard would cost us over 20 years. The average daily time changes every year (the graph looks like an exponential decay, remember), so we have to run the formula for each year and sum them all; in Haskell:

~~~{.haskell}
> sum $ map (\year -> ((1/500 * year**(-(1.5))) + 1/30000) * 365.25) [1..20]
1.8291
~~~

Which evaluates to 1.8 minutes. (This may seem too small, but one doesn't spend much time in the first year and the time drops off quickly^[The 20 years look like this (note the [scientific notation](!Wikipedia)): [0.742675, 0.27044575182838654, 0.15275979054767388, 0.10348750000000001, 7.751290630254386e-2, 6.187922936397532e-2, 5.161829250474865e-2, 4.445884397854832e-2, 3.923055555555555e-2, 3.5275438307530015e-2, 3.219809429218694e-2, 2.9748098818459235e-2, 2.7759942051635768e-2, 2.6120309801216147e-2, 2.474928593068675e-2, 2.35890625e-2, 2.2596898475825956e-2, 2.1740583401051353e-2, 2.0995431241707652e-2, 2.0342238287817983e-2]].)
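To make the arithmetic above easy to check, the formula can be transcribed directly into named definitions (a minimal sketch using the same constants as the formula; the names `dailyTime`, `annualTime`, and `total20` are mine):

~~~{.haskell}
-- Wozniak's fit: average daily minutes spent on one item during its nth year.
dailyTime :: Double -> Double
dailyTime year = (1/500) * year ** (-1.5) + 1/30000

-- Annual minutes for one item in its nth year.
annualTime :: Double -> Double
annualTime year = dailyTime year * 365.25

-- Total minutes over an item's first 20 years.
total20 :: Double
total20 = sum (map annualTime [1..20])
~~~

Loaded into ghci, `annualTime 1` reproduces the 0.742675 figure that heads the footnoted per-year list, and `total20` the ~1.83-minute total.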
But maybe [Piotr Woźniak](!Wikipedia "Piotr Wozniak (researcher)") was being optimistic or we're bad at [writing flashcards](http://www.supermemo.com/articles/20rules.htm), so we'll double it to 5 minutes. That's our key rule of thumb that lets us decide what to learn and what to forget: if, over your lifetime, you will spend more than 5 minutes looking something up or will lose more than 5 minutes as a result of not knowing something, then it's worthwhile to memorize it with spaced repetition. 5 minutes is the line that divides trivia from useful data.[^memorizing] (There might seem to be thousands of flashcards that meet the 5 minute rule. That's fine. Spaced repetition can accommodate dozens of thousands of cards. See the [next section](#the-workload).) [^memorizing]: modulo things where knowing it is useful even if you don't need it very often - it can be a brick in a pyramid of knowledge; cf. [page 3](http://www.wired.com/medtech/health/magazine/16-05/ff_wozniak?currentPage=3) of Wolf: > "The problem of forgetting might not torment us so much if we could only convince ourselves that remembering isn't important. Perhaps the things we learn - words, dates, formulas, historical and biographical details - don't really matter. Facts can be looked up. That's what the Internet is for. When it comes to learning, what really matters is how things fit together. We master the stories, the schemas, the frameworks, the paradigms; we rehearse the lingo; we swim in the episteme. > > The disadvantage of this comforting notion is that it's false. "The people who criticize memorization - how happy would they be to spell out every letter of every word they read?" asks Robert Bjork, chair of UCLA's psychology department and one of the most eminent memory researchers. After all, Bjork notes, children learn to read whole words through intense practice, and every time we enter a new field we become children again. "You can't escape memorization," he says. 
"There is an initial process of learning the names of things. That's a stage we all go through. It's all the more important to go through it rapidly." The human brain is a marvel of associative processing, but in order to make associations, data must be loaded into memory." To a lesser extent, one might wonder: when one is in a hurry, should one learn something with spaced repetition or with massed repetition? How far away should the tests or deadlines be before abandoning spaced repetition? It's hard to compare, since one would need specific regimens to compare for the crossover point, but for massed repetition, the average time after memorization at which one has a 50% chance of remembering the memorized item seems to be 3-5 days.^[See Stephen R. Schmidt's webpage ["Theories of Forgetting"](http://frank.itlab.us/forgetting/mtsu_forgetting/#II.%20Decay%20Theory), which cites 'Woodworth & Schlosbeg (1961)' when presenting a [log graph](http://frank.itlab.us/forgetting/mtsu_forgetting/woodworth.jpg) of various studies' forgetting curves.] Since there would be 2 or 3 repetitions in that period, presumably one would do better than 50% in recalling an item. 5 minutes and 5 days seems like a memorable enough rule of thumb: 'don't use spaced repetition if you need it sooner than 5 days or it's worth less than 5 minutes'. I find one of the best uses for Mnemosyne is, besides adding questions relating to class material, to add in words from [A Word A Day](!Wikipedia)^[which neatly addresses the issue of such mailing lists being useless ('who learns a word after just one exposure?').]
and [Wiktionary](!Wikipedia), memorable quotes I see^[Mnemosyne in this case constitutes both a way to learn the quotes so I can use them, and a [waste book](!Wikipedia "Notebook (style)"); just the other day I had 3 or 4 apposite quotes for an essay because I had entered them into Mnemosyne months or years ago.], personal information such as birthdays^[I could never remember my license plate number until I entered 3 or 4 questions [anent](http://www.google.com/search?q=define%3Aanent) it into Mnemosyne.], and so on. Quotidian uses, but valuable to me. With a diversity of flashcards, I find my daily review interesting. I get all sorts of questions - now I'm trying to see whether a Haskell fragment is syntactically correct, now I'm pronouncing Korean [hangul](!Wikipedia) and listening to the answer, now I'm trying to find Ukraine on a map, now I'm enjoying some [A.E. Housman](!Wikipedia) poetry, followed by a few quotes from [LessWrong](http://www.lesswrong.com) [quote threads](http://lesswrong.com/tag/quotes/), and so on.

### The workload

On average, when I'm studying a new topic, I'll add 3-20 questions a day. Combined with my particular memory, I usually review about 90 or 100 items a day (out of the total >18,300). This takes under 20 minutes, which is not too bad. (I expect the time is expanded a bit by the fact that early on, my formatting guidelines were still being developed, and I hadn't the full panoply of categories I do now - so every so often I must stop and edit categories.) If I haven't been studying something recently, the exponential back-off of reviews slowly drops the daily review count. For example, in March 2011, I wasn't studying very many things, so for 24-26 March 2011, my scheduled daily reviews are 73, 83, and 74; after that, it'll probably drop down into the 60s, and then after another week or two, into the 50s and so on until it hits the minimum plateau which will very slowly shrink over years.
(I haven't gone long enough without dumping cards in to know what that might be.) By February 2012, the daily reviews are in the 40s or sometimes 50s for similar reasons, but the gradual shrinkage will continue. We can see this vividly, and we can even see a sort of analogue of the original forgetting curve, if we ask Mnemosyne 2.0 to graph the number of cards to review per day for the next year up to February 2013 (assuming no additions or missed reviews etc.):

![A wildly varying but clearly decreasing graph of predicted cards per day](/images/spaced-repetition-scheduled-cards.png)

If Mnemosyne weren't using spaced repetition, it would be very hard to keep up with 18,300+ flashcards. But because it is using spaced repetition, keeping up is very easy. Nor is 18.3k extraordinary. Many users have decks in the 6-7k range, Mnemosyne developer [Peter Bienstman](http://groups.google.com/group/mnemosyne-proj-users/browse_frm/thread/433872b155ad7451/31c1e4c556680a0c) has >8.5k & Patrick Kenny >27k, [Hugh Chen](http://groups.google.com/group/mnemosyne-proj-users/browse_frm/thread/eff44f5fdb1d738b/7a7b654ca87e63be) has a 73k+ deck, and in [#anki](irc://irc.freenode.net#anki), they tell me of one user who triggered bugs with his >200k deck. 200,000 may be a bit much, but for regular humans, some amount smaller seems possible - it's interesting to compare SRS decks to the Muslim title of ['hafiz'](!Wikipedia "Hafiz (Qur'an)"), one who has memorized the ~80,000 words of the Koran, or the stricter 'hafid', one who has memorized the Koran *and* 100,000 [hadiths](!Wikipedia) as well. Other forms of memory are still more powerful.[^visualization]

[^visualization]: Andrew Drucker,
in ["Multiplying 10-digit numbers using Flickr: The power of recognition memory"](http://people.csail.mit.edu/andyd/rec_method.pdf), employs visual memory to calculate $9883603368 \times 4288997768 = 42390752785149282624$; he cites as precedent Lionel Standing's "Learning 10000 pictures" (_Quarterly Journal of Experimental Psychology_, 25:207-222, May 1973): > "In one of the most widely-cited studies on recognition memory, Standing showed participants an epic 10,000 photographs over the course of 5 days, with 5 seconds' exposure per image. He then tested their familiarity, essentially as described above. The participants showed an 83% success rate, suggesting that they had become familiar with about 6,600 images during their ordeal. Other volunteers, trained on a smaller collection of 1,000 images selected for vividness, had a 94% success rate."

### When to review

When should one review? In the morning? In the evening? Any old time? The studies demonstrating the spacing effect do not control or vary the time of day, so in one sense, the answer is: it doesn't matter - if it did matter, there would be considerable variance in how effective the effect is based on when a particular study had its subjects do their reviews. So one reviews at whatever time is convenient. Convenience makes one more likely to stick with it, and sticking with it overpowers any temporary improvement. If one is not satisfied with that answer, then on general considerations, one ought to review before bedtime & sleep. [Memory consolidation](!Wikipedia "Memory consolidation#Spacing Effect") seems to be related, and [sleep](!Wikipedia "Sleep and memory") is known to powerfully influence what memories enter long-term memory; interrupting sleep without affecting total sleep time or quality still [damages memory formation in mice](http://www.pnas.org/content/early/2011/07/20/1015633108)[^polyphasic]. So reviewing before bedtime would be best.
(Other mental exercises show improvement when trained before bedtime; for example, [dual n-back](DNB FAQ#sleep).) One possible mechanism is that it may be that the [*expectancy*](http://www.jneurosci.org/content/31/5/1563.short) of future reviews/tests is enough to encourage memory consolidation during sleep; so if one reviews and goes to bed, presumably the expectancy is stronger than if one reviewed at breakfast and had an eventful day and forgot entirely about the reviewed flashcards. (See also the correlation between time of studying & GPA in Hartwig & Dunlosky 2012.) Neural growth may be related; from Stahl 2010: > "...Recent advances in our understanding of the neurobiology underlying normal human memory formation have revealed that learning is not an event, but rather a process that unfolds over time.^[16](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1876761/),[17](http://www.ncbi.nlm.nih.gov/pubmed/19640595),[18](http://mrcanu.pharm.ox.ac.uk/pdfs/oneill2010tins.pdf),[Squire\ 2003\ _Fundamental\ Neuroscience_],[20](http://sirricenter.com/pdfarticles/42_Making_Memories_Stick.pdf)^ Thus, it is not surprising that learning strategies that repeat materials over time enhance their retention.^20,[21](http://www.ncbi.nlm.nih.gov/pubmed/7219173),[22](http://www.ncbi.nlm.nih.gov/pubmed/2010724),[23](http://www.citeulike.org/user/klbclp/article/9091109),[24](http://commonsenseatheism.com/wp-content/uploads/2011/01/Karpicke-The-critical-importance-of-retrieval-for-learning.pdf),[25](http://tigger.uic.edu/~bstorm/sbs_2010.pdf),[26](http://www.pashler.com/Articles/Pashler.Rohrer.Cepeda.Carpenter_2007.pdf)^ > > ...Thousands of new cells are generated in this region every day, although many of these cells die within weeks of their creation.^[31](http://www.ncbi.nlm.nih.gov/pubmed/11406822)^ The survival of dentate gyrus neurons has been shown to be enhanced in animals when they are placed into learning situations.^16-20^ Animals that learn well retain more dentate gyrus 
neurons than do animals that do not learn well. Furthermore, 2 weeks after testing, animals trained in discrete spaced intervals over a period of time, rather than in a single presentation or a 'massed trial' of the same information, remember better.^16-20^ The precise mechanism that links neuronal survival with learning has not yet been identified. One theory is that the hippocampal neurons that preferentially survive are the ones that are somehow activated during the learning process.^16-20^ The distribution of learning over a period of time may be more effective in encouraging neuronal survival by allowing more time for changes in gene expression and protein synthesis that extend the life of neurons that are engaged in the learning process. > > ...Transferring memory from the encoding stage, which occurs during alert wakefulness, into consolidation must thus occur at a time when interference from ongoing new memory formation is reduced.^17,18^ One such time for this transfer is during sleep, especially during non-rapid eye movement sleep, when the hippocampus can communicate with other brain areas without interference from new experiences.^[32](http://www.ib.cnea.gov.ar/~redneu/clasicos/stickgold_science_2001.pdf),[33](http://inside.bard.edu/~luka/documents/sleepconsolidation.pdf),[34](http://www.nhhs.net/ourpages/auto/2007/10/29/1193690123726/sleep%20paper.pdf)^ Maybe that is why some decisions are better made after a good night's rest and also why pulling an all-nighter, studying with sleep deprivation, may allow you to pass an exam an hour later but not remember the material a day later." [^polyphasic]: In this vein, I am reminded of what a former [polyphasic sleeper](!Wikipedia "Polyphasic sleep") [told](http://lesswrong.com/lw/5n0/optimizing_sleep/44wz) [me](http://lesswrong.com/lw/5n0/optimizing_sleep/4509): > "I've been polyphasic for about a year. (Not anymore; kills my memory.)...Anki reps, mostly. 
I found that I could do proper review sessions for about 2-3 days and would hit an impenetrable wall. I couldn't learn a single new card and had total brain fog until I got 3 hours more sleep. That, however, would reset my adaptation. The whole effect is a bit less pronounced on Everyman, but not much. It is however easier to add sleep when you already have a core. I didn't notice any other major mental impairment after the initial sleep deprivation." #### Prospects: extended flashcards Let's step back for a moment. What are all our flashcards, small and large, doing for us? Why do I have a pair of flashcards for the word 'anent' among many others? I can just look it up. But look ups take time compared to already knowing something. (Let's ignore the previously discussed 5 minute rule.) If we think about this abstractly in a computer science context, we might recognize it as an old concept in algorithms & optimization discussions - the [space-time tradeoff](!Wikipedia). We trade off lookup time against limited skull space. The most obvious example is the sort of factual data already given as examples - we might one day need to know the average annual rainfall in Honolulu or Austin, but it would require too much space to memorize such data for all capitals. There are millions of English words, but in practice any more than 100,000 is excessive. Less obvious is a sort of procedural knowledge. An extreme form of space-time tradeoffs in computers is when a computation is replaced by pre-calculated constants. We could take a math [function](!Wikipedia "Function (mathematics)") and calculate its output for each possible input. Usually such a [lookup table](!Wikipedia) of input to output is really large. Think about how many entries would be in such a table for all possible integer multiplications between 1 and 1 billion. 
But sometimes the table is really small (like binary Boolean functions) or small (like trigonometric tables) or very large but still useful ([rainbow table](!Wikipedia)s usually start in the gigabytes and easily reach terabytes). Given an infinitely large lookup table, we could replace *completely* the skill of, say, addition or multiplication by the lookup table. No computation. The space-time tradeoff taken to the extreme of the space side of the continuum. (We could go the other way and define multiplication or addition as the very slow computation which doesn't know any specifics like the [multiplication table](!Wikipedia) - as if every time you wanted to add $2+2$ you had to count on 4 fingers.) So suppose we were children who wanted to learn multiplication. SRS and Mnemosyne can't help because multiplication is not a specific factoid? The space-time tradeoff shows us that we can de-proceduralize multiplication and turn it partly into factoids. It wouldn't be hard for us to write a quick script or macro to generate, say, 500 random cards which ask us to multiply AB by XY, and import them to Mnemosyne.^[Presumably one would immediately give them all some high grade like 5 to avoid suddenly having a daily load of 500 cards for a while.] After all, which is your mind going to do - get good at multiplying 2 numbers (generate on-demand), or memorize 500 different multiplication problems ([memoize](!Wikipedia))? From my experience with multiple subtle variants on a card, the mind gives up after just a few and falls back on a problem-solving approach - which is exactly what one wants to exercise, in this case. Congratulations; you have done the impossible. From a software engineering point of view, we might want to modify or improve the cards, and 500 snippets of text would be a tad hard to update. So the coolest option would be a 'dynamic card'.
Add a markup type (say, a tag taking a `src` argument), and then Mnemosyne feeds the `src` argument straight into the Python interpreter, which returns a [tuple](!Wikipedia) of the question text and the answer text. The question text is displayed to the user as usual, the user thinks, requests the answer, and grades himself. So for multiplication, the dynamic card would get 2 random integers, print a question like 'x * y = ?' and then print x*y as the answer. Every so often you would get a new multiplication question, and as you get better at multiplication, you see it less often - exactly as you should. Still in a [math vein](http://www.reddit.com/r/math/comments/hvqzd/printable_math_flashcards_in_pdf_and_latex_source/c1yror5), you could generate variants on formulas or programs where one version is the correct one and the others are subtly wrong; I do this by hand with my programming flashcards (especially if I make an error doing exercises, that signals a finer point to make several flashcards on), but it can be done automatically. [kpreid describes](http://lesswrong.com/lw/64k/memory_spaced_repetition_and_life/4e5n) one tool of his: > "I have written [a program](https://github.com/kpreid/mathquiz/) (in the form of [a web page](http://kpreid.github.com/mathquiz/mathquiz.html)) which does a specialized form of this [generating 'damaged formulas']. It has a set of generators of formulas and damaged formulas, and presents you with a list containing several formulas of the same type (e.g. ∫ 2x dx = x^2 + C) but with one damaged (e.g. ∫ 2x dx = 2x^2 + C)." This approach generalizes to anything you can generate random problems of or have large databases of examples of. For example, maybe you are studying Go and are interested in learning [life-and-death positions](!Wikipedia "Life and death"). Those are things that can be generated by computer Go programs, or fetched from places like [GoProblems.com](http://www.goproblems.com).
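Lacking such a dynamic-card plugin, the earlier multiplication example can still be handled by a one-off generator script. A minimal sketch (the name `multiplicationCards` is mine, and a tiny hand-rolled generator stands in for a real random-number library, to keep it dependency-free and reproducible):

~~~{.haskell}
-- A linear congruential generator standing in for a real random source.
lcg :: Int -> Int
lcg s = (1103515245 * s + 12345) `mod` 2147483648

-- An endless supply of pseudo-random two-digit factors from a seed.
factors :: Int -> [Int]
factors seed = map (\s -> 10 + s `mod` 90) (iterate lcg (lcg seed))

-- n multiplication cards as tab-separated 'question<TAB>answer' lines,
-- the sort of plain-text batch a flashcard importer can ingest.
multiplicationCards :: Int -> Int -> [String]
multiplicationCards seed n =
    [ show x ++ " * " ++ show y ++ " = ?\t" ++ show (x * y)
    | (x, y) <- take n (pairs (factors seed)) ]
  where
    pairs (a:b:rest) = (a, b) : pairs rest
    pairs _          = []
~~~

`mapM_ putStrLn (multiplicationCards 42 500)` prints 500 such cards; the exact import format expected by one's SRS program may differ, so treat the tab-separated layout as an assumption.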
For even more examples, Go is rotationally invariant - the best move remains the same regardless of which way the board is oriented and since there is no canonical direction for the board (like in chess) a good player ought to be able to play the same no matter how the board looks - so each specific example can be mirrored in 3 other ways. Or one could test one's ability to 'read' a board by writing a dynamic card which takes each example board/problem and adds some random pieces as long as some go-playing program like [GNU Go](!Wikipedia) says the best move hasn't changed because of the added noise. One could learn an awful lot of things this way. Programming languages could be learned this way - someone learning [Haskell](!Wikipedia "Haskell (programming language)") could take all the functions listed in the Prelude or his Haskell textbook, and ask [QuickCheck](!Wikipedia) to generate random arguments for the functions and ask the [GHC](!Wikipedia "Glasgow Haskell Compiler") interpreter ghci what the function and its arguments evaluate to. Games other than go, like chess. A fair bit of mathematics. If the dynamic card has Internet access, it can pull down fresh questions from an [RSS feed](!Wikipedia) or just a website; this functionality could be quite useful in a foreign language learning context with every day bringing a fresh sentence to translate or another exercise. Even though these things seem like 'skills' and not 'data'! 
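As a toy version of the Haskell-learning idea - without QuickCheck or ghci in the loop - one can enumerate a few argument values and let the language itself compute the answers. A sketch (the name `preludeCards` is mine, and only the Prelude's `take` is being drilled; a real version would draw its arguments from QuickCheck generators):

~~~{.haskell}
-- Drill cards for a Prelude function: each card shows a call and asks for
-- its value, with the actual evaluation supplied as the answer.
preludeCards :: [(String, String)]
preludeCards =
    [ ("take " ++ show n ++ " " ++ show xs ++ " = ?", show (take n xs))
    | n <- [0 .. 3], xs <- [[1, 2, 3 :: Int], [4, 5]] ]
~~~

Each pair is a ready-made question and answer - e.g. one card asks what `take 2 [1,2,3]` evaluates to, with `[1,2]` as the graded answer.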
# Popularity

-----------------------------------------------------------------------------------------------------
Metric[^date]      Mnemosyne            [Mnemododo][] [Anki][]          iSRS     [AnyMemo][]
------------------ -------------------- ------------- ----------------- -------- --------------------
Homepage Alexa     [383k][]             [27.5m][]     [112k][]                   [1,766k][][^alexa]
ML/forum members   [461][]                            [4129][]/[215][]  [129][]  [119][]
Ubuntu installs    [7k][]                             [9k][]
Debian installs    [164][]                            [364][]
Arch votes         [85][]                             [96][]
iPhone ratings     Unreleased[^imnemo]                [193][]           [69][]
Android ratings                         [20][]        [703][]                    [836][]
Android installs                        [100-500][]   [10-50k][]                 [50-100k][FANTASTIC]
-----------------------------------------------------------------------------------------------------

[Mnemododo]: http://www.tbrk.org/software/mnemododo.html
[Anki]: !Wikipedia
[AnyMemo]: http://anymemo.org/
[383k]: http://www.alexa.com/siteinfo/mnemosyne-proj.org
[112k]: http://www.alexa.com/siteinfo/ankisrs.net#
[27.5m]: http://www.alexa.com/siteinfo/tbrk.org#
[1,766k]: http://www.alexa.com/siteinfo/anymemo.org#
[461]: http://groups.google.com/group/mnemosyne-proj-users
[4129]: http://groups.google.com/group/ankisrs/
[215]: https://groups.google.com/group/ankisrs-users/about
[129]: http://groups.google.com/group/isrs-support
[119]: http://anymemo.org/forum/
[7k]: http://popcon.ubuntu.com/universe/by_inst
[9k]: http://popcon.ubuntu.com/unknown/by_inst
[164]: http://qa.debian.org/popcon.php?package=mnemosyne
[364]: http://qa.debian.org/popcon.php?package=anki
[85]: http://aur.archlinux.org/packages.php?ID=13628
[96]: http://aur.archlinux.org/packages.php?ID=14403
[193]: http://itunes.apple.com/us/app/ankisrs/id373493387
[69]: http://itunes.apple.com/app/isrs-free/id332350042
[20]: https://market.android.com/details?id=org.tbrk.mnemododo
[703]: https://market.android.com/details?id=com.ichi2.anki
[836]: https://market.android.com/details?id=org.liberty.android.fantastischmemo
[100-500]:
https://market.android.com/details?id=org.tbrk.mnemododo
[10-50k]: https://market.android.com/details?id=com.ichi2.anki
[FANTASTIC]: https://market.android.com/details?id=org.liberty.android.fantastischmemo

[^date]: All numbers from 2 May 2011.

SuperMemo doesn't fall under the same ratings, but it has sold in the hundreds of thousands over its 2 decades:

> "Biedalak is CEO of SuperMemo World, which sells and licenses Wozniak's invention. Today, SuperMemo World employs just 25 people. The venture capital never came through, and the company never moved to California. About 50,000 copies of SuperMemo were sold in 2006, most for less than $30. Many more are thought to have been pirated."^[[_Wired_](http://www.wired.com/medtech/health/magazine/16-05/ff_wozniak?currentPage=5)]

It seems safe to estimate the combined market-share of Anki, Mnemosyne, iSRS and other SRS apps at somewhere under 50,000 users (making due allowance for users who install multiple times, those who install and abandon it, etc.). Relatively few users seem to have migrated from SuperMemo to those newer programs, so it seems fair to simply add that 50k to the other 50k and conclude that the worldwide population is somewhere around (but probably under) 100,000.

[^alexa]: Smaller is better.

[^imnemo]: ["For Mnemosyne 2.x, Ullrich is working on an official Mnemosyne iPhone client which will have very easy syncing."](http://groups.google.com/group/mnemosyne-proj-users/browse_frm/thread/5bbe0fceaef5dab5/83b6f215c918771f)

# Where was I going with this?

Nowhere, really. Mnemosyne - and SR software in general - is just one of my favorite tools: it's based on a famous effect[^proudest] discovered by science, and it exploits it very elegantly[^me] and usefully. It's a testament to the Enlightenment ideal of improving humanity through reason and overcoming our human flaws; the idea of SR is seductive in its mathematical rigor[^splendor].
In an age when the ideals of 'self-improvement' and progress are so often decried, and gloom is espoused even by ordinary people, it's nice to have a small example like this in one's daily life, an example not yet so prosaic and boring as the lightbulb.

[^proudest]: See [Page 4](http://www.wired.com/medtech/health/magazine/16-05/ff_wozniak?currentPage=4), Wolf 2008:

    > "The spacing effect was one of the proudest lab-derived discoveries, and it was interesting precisely because it was not obvious, even to professional teachers. The same year that Neisser revolted, Robert Bjork, working with Thomas Landauer of Bell Labs, published the results of two experiments involving nearly 700 undergraduate students. Landauer and Bjork were looking for the optimal moment to rehearse something so that it would later be remembered. Their results were impressive: The best time to study something is at the moment you are about to forget it. And yet - as Neisser might have predicted - that insight was useless in the real world."

[^me]: When I first read of SuperMemo, I had already taken a class in [cognitive psychology](!Wikipedia) and was reasonably familiar with Ebbinghaus's forgetting curve - so my reaction to its methodology was Huxley's: "How extremely stupid not to have thought of that!"

[^splendor]: See [page 7](http://www.wired.com/medtech/health/magazine/16-05/ff_wozniak?currentPage=7), Wolf 2008:

    > "And yet now, as I grin broadly and wave to the gawkers, it occurs to me that the cold rationality of his approach may be only a surface feature and that, when linked to genuine rewards, even the chilliest of systems can have a certain visceral appeal. By projecting the achievement of extreme memory back along the forgetting curve, by provably linking the distant future - when we will know so much - to the few minutes we devote to studying today, Wozniak has found a way to condition his temperament along with his memory. He is making the future noticeable. He is trying not just to learn many things but to warm the process of learning itself with a draft of utopian ecstasy."

# See also

In the course of using Mnemosyne, I've written a number of scripts to generate repetitively varying cards.

- [mnemo.hs](haskell/mnemo.hs) will take any newline-delimited chunk of text, like a poem, and generate every possible [Cloze deletion](!Wikipedia); that is, an ABC poem will become 3 questions: \_BC/ABC, A\_C/ABC, AB\_/ABC
- [mnemo2.hs](haskell/mnemo2.hs) works as above, but is more limited and is intended for long chunks of text where mnemo.hs would cause a combinatorial explosion of generated questions; it generates a subset: for ABCD, one gets \_\_CD/ABCD, A\_\_D/ABCD, and AB\_\_/ABCD (it removes 2 lines at a time, and iterates through the list).
- [mnemo3.hs](haskell/mnemo3.hs) is intended for date- or name-based questions. It'll take input like 'Barack Obama is %47%.' and spit out some questions based on it: 'Barack Obama is \_7./47', 'Barack Obama is 4\_./47', etc.
- [mnemo4.hs](haskell/mnemo4.hs) is intended for long lists of items. If one wants to memorize the list of US Presidents, the natural questions for flashcards go something like 'Who was the 3rd president?/Thomas Jefferson', 'Thomas Jefferson was the \_rd president./3', 'Who was president after John Adams?/Thomas Jefferson', 'Who was president before James Madison?/Thomas Jefferson'. You'll note the repetition if you do this for each president: one asks the ordinal position of the item both ways (item -> position, position -> item), what precedes it, and what succeeds it. mnemo4.hs automates this, given a list. In order to be general, the wording is a bit odd, but it's better than writing it all out by hand! (Example output is in the [comments](!Wikipedia "Comment (computer programming)") to the source code.)

The reader might well be curious by this point what *my* Mnemosyne database looks like.
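The core trick behind mnemo.hs can be sketched in a few lines of Haskell. This is a hypothetical minimal reimplementation, not the actual script: it blanks out each line of the input in turn and pairs each blanked version with the full text as the answer, so an n-line text yields n question/answer cards.

```haskell
-- Minimal sketch of the cloze-generation idea in mnemo.hs (hypothetical
-- reimplementation; names are illustrative, not from the original script).
module Cloze (clozes) where

import Data.List (intercalate)

-- | Replace the nth line of the text with a blank, keeping the rest intact.
blankLine :: [String] -> Int -> String
blankLine ls n = intercalate "\n" [ if i == n then "___" else l
                                  | (i, l) <- zip [0..] ls ]

-- | One (question, answer) pair per line: the question has that line
-- blanked out, and the answer is the full original text.
clozes :: String -> [(String, String)]
clozes text = [ (blankLine ls n, text) | n <- [0 .. length ls - 1] ]
  where ls = lines text
```

Loaded into GHCi, `clozes "A\nB\nC"` yields three pairs whose questions are `"___\nB\nC"`, `"A\n___\nC"`, and `"A\nB\n___"`, each paired with the full text - the \_BC/ABC pattern above.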
I use Mnemosyne quite a bit: as of 23 March 2011, I have 18,327 cards in my deck. Said curious reader may find my database exported as XML at [docs/gwern.xml.gz]().

# External links

- ["Teaching linear algebra"](http://bentilly.blogspot.com/2009/09/teaching-linear-algebra.html) (with spaced repetition), by Ben Tilly
- [AJATT table of contents](http://www.alljapaneseallthetime.com/blog/all-japanese-all-the-time-ajatt-how-to-learn-japanese-on-your-own-having-fun-and-to-fluency) (applying SRS to learning Japanese)
- ["Janki Method: Using spaced repetition systems to learn and retain technical knowledge"](http://www.jackkinsella.ie/2011/12/05/janki-method.html) (SRS for programming; [Reddit](http://www.reddit.com/r/programming/comments/n30hl/janki_method_learning_programming_with_6000/))
- [Bash scripts](http://groups.google.com/group/mnemosyne-proj-users/browse_thread/thread/fd10b9e601fb0eb6) for generating vocabulary flashcards (processing multiple online dictionaries; good for having multiple examples, images, and audio)
- vocabulary selection:

    1. ["Programmed Vocabulary Learning as a Travelling Salesman Problem"](http://jtauber.com/blog/2004/11/26/programmed_vocabulary_learning_as_a_travelling_salesman_problem/)
    2. ["Teaching New Testament Greek"](http://jtauber.com/blog/2006/05/05/teaching_new_testament_greek/)
    3. ["A New Kind of Graded Reader"](http://jtauber.com/blog/2008/02/10/a_new_kind_of_graded_reader/) (video talk)
    4. [Mailing list](http://groups.google.com/group/graded-reader)
    5. [Programs](https://code.google.com/p/graded-reader/) (I took [a stab](http://community.haskell.org/~gwern/hcorpus/) at writing Haskell versions once)

## Flashcard sources

- the [Mnemosyne deck collection](http://mnemosyne-proj.org/taxonomy/term/10%208%209%206%207%2011%2012%2013%2028%2014%2015%2016%2017%2018%2019%2020%2021%2022%2023%2024%2025%2026%2027%2029%2030%2031%2032%2040%2039%2034%2035%2036%2037%2041%2042%2046%2047%2050%2049)
- the Anki deck collection, viewable only through the Anki clients; >3000 decks
- [FlashCardExchange.com](http://www.flashcardexchange.com)
- [StudyStack.com](http://www.studystack.com/)
- [AnyMemo](http://anymemo.org/index.php?page=databases) (partially redundant with Mnemosyne & Anki)
- [Flashcarddb](http://flashcarddb.com/cardsets)