Is your text easy to understand? Or is it full of technical terms? This simple tool highlights difficult words in your text - in English and in German.
how it works
The data folder contains word lists. In each list, the first column contains a word and the second column the natural logarithm of its relative frequency. This value lies between 0 and -infinity.
0 means that there is only one word repeating in the text: »Doh! Doh! Doh! Doh! Doh!«
-1 means a probability of e^-1 (about 37%), so that word would make up roughly every third word of the text. »Like, it is like, like, you know, like, things, like stuff, and so...«
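To make these values easier to read, here is a minimal sketch that converts a stored value back into a probability. It assumes the natural logarithm (as the e^-1 example suggests); the function name `describeLogFrequency` is made up for illustration and is not part of this repository.

```javascript
// Convert a stored log value (natural logarithm of the relative frequency)
// back into a probability and an "about every n-th word" figure.
function describeLogFrequency(logFreq) {
  const probability = Math.exp(logFreq);   // e^logFreq, between 0 and 1
  const everyNthWord = 1 / probability;    // average distance between occurrences
  return { probability, everyNthWord };
}

console.log(describeLogFrequency(0));   // probability 1: the only word in the text
console.log(describeLogFrequency(-1));  // probability of about 0.37
console.log(describeLogFrequency(-10)); // a rare word
```

The closer a value is to 0, the more common the word; very negative values mark rare words.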
The bin folder contains some Node.js helper scripts.
1_scan_html.js scans an HTML file, extracts all text between <p> and </p>, removes all tags and saves the text.
2_scan_text.js calculates the word frequency in a text.
3_scan_words.js scans and combines multiple word lists and prepares the result for use in the web app.
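The first two steps can be sketched roughly as follows. This is not the code from the bin folder, just an illustration of the idea under simple assumptions: paragraphs are found with a regular expression, words are runs of letters, and the helper names `extractParagraphText` and `logRelativeFrequencies` are hypothetical.

```javascript
// Hypothetical helpers; the real scripts in bin may work differently.

// Step 1: collect the content of every <p>...</p> and strip nested tags.
function extractParagraphText(html) {
  const paragraphs = [];
  const pattern = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    paragraphs.push(match[1].replace(/<[^>]+>/g, '').trim());
  }
  return paragraphs.join('\n');
}

// Step 2: count words and turn the counts into log relative frequencies.
function logRelativeFrequencies(text) {
  const words = text.toLowerCase().match(/\p{L}+/gu) || [];
  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  const result = new Map();
  for (const [word, count] of counts) {
    result.set(word, Math.log(count / words.length));
  }
  return result;
}

const html = '<h1>Title</h1><p>Doh! <b>Doh!</b></p><p>Doh! Doh! Doh!</p>';
const text = extractParagraphText(html);
const frequencies = logRelativeFrequencies(text);
console.log(text);
console.log(frequencies.get('doh')); // the only word in the text
```

A regular expression is good enough for a sketch like this; for messy real-world HTML a proper parser would be the safer choice.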
The web folder contains a hacky frontend built with jQuery, Bootstrap and quill.js as the text editor.
Q: Why do you use stemming?
A: Since stemming reduces multiple words to the same stem, the word list becomes shorter ... or more words fit into the same amount of memory. Also: In German you need a stemmer, especially for verbs! A verb like packen (to pack sth) can take different forms depending on tense, person, grammatical mood, etc. like packen, packe, packst, packt, pack, packen, packest, packet, packten, packte, packtest, packtet, ...
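To illustrate the effect, here is a toy suffix stripper. It is deliberately naive and is not the stemmer used in this project (nor a real German stemming algorithm); the suffix list and the name `toyStem` are assumptions chosen only to cover the packen example above.

```javascript
// Toy suffix stripper for illustration only; NOT a real German stemmer.
// Longer suffixes come first so that "packtest" loses "test", not just "t".
const SUFFIXES = ['test', 'tet', 'ten', 'est', 'te', 'st', 'et', 'en', 'e', 't'];

function toyStem(word) {
  const lower = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    // Keep at least four letters so short words are left alone.
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 4) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

const forms = ['packen', 'packe', 'packst', 'packt', 'pack', 'packest',
  'packet', 'packten', 'packte', 'packtest', 'packtet'];
const stems = new Set(forms.map(toyStem));
console.log(forms.length, 'forms ->', stems.size, 'stem:', [...stems]);
```

All eleven forms collapse into a single word list entry. A real stemmer has to handle far more cases (umlauts, irregular verbs, nouns), which is why a tested library is the better choice in practice.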
Q: Why do you combine wordlists from multiple sources and use different weights?
A: Every text source has its advantages and disadvantages. For example, Wikipedia often uses words like references and reception, as these are common sub-categories in articles. Newspapers have a preference for political terms, but don't use rumpelstiltskin. And old fairy tales don't talk about iPhones and YouTube.
In the end, it is almost impossible to find a single text source that is perfectly suited to children. Combining text sources with different weights seems to be the best way to cope with our despair.
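One possible way to combine weighted word lists is sketched below. The mixing formula (a weighted average of the probabilities, converted back to a logarithm), the weights and all numbers are assumptions for illustration; 3_scan_words.js may do this differently.

```javascript
// Combine several word lists into one. Each list maps word -> log relative
// frequency. The weights and the mixing formula are assumptions.
function combineWordLists(sources) {
  const totalWeight = sources.reduce((sum, source) => sum + source.weight, 0);
  const mixed = new Map();
  for (const { list, weight } of sources) {
    for (const [word, logFreq] of Object.entries(list)) {
      // Weighted share of this source's probability for the word.
      const share = (weight / totalWeight) * Math.exp(logFreq);
      mixed.set(word, (mixed.get(word) || 0) + share);
    }
  }
  const combined = {};
  for (const [word, probability] of mixed) {
    combined[word] = Math.log(probability);
  }
  return combined;
}

// Made-up example data: an encyclopedia-like and a fairy-tale-like source.
const combined = combineWordLists([
  { weight: 1, list: { reception: Math.log(0.02), the: Math.log(0.5) } },
  { weight: 3, list: { rumpelstiltskin: Math.log(0.004), the: Math.log(0.5) } },
]);
console.log(combined);
```

A word that is missing from one source simply contributes nothing from that source, so words that only appear in a low-weight list end up rarer in the combined list.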
Q: How did you do the realtime highlighting in the web editor?
A: First I tried to put a <textarea> over a <div> element. That worked great, till I tested it on iOS and noticed an unfixable padding generated by a shitty shadow DOM element.
Then I tried to switch to a <div>, which fixed one problem and created several new ones.
In the end I found quill.js. It is small, fast, easy and flexible! My thanks to Jason Chen!
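The highlighting itself boils down to finding the positions of rare words in the editor text. The following is a minimal sketch of that step, not the code from the web folder; the function name `findDifficultRanges`, the example frequencies and the threshold are all made up.

```javascript
// Find the position of every word that is rarer than a threshold.
// `logFrequencies` maps word -> log relative frequency.
// Words missing from the list are treated as difficult.
function findDifficultRanges(text, logFrequencies, threshold) {
  const ranges = [];
  const pattern = /\p{L}+/gu;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    const logFreq = logFrequencies.get(match[0].toLowerCase());
    if (logFreq === undefined || logFreq < threshold) {
      ranges.push({ index: match.index, length: match[0].length });
    }
  }
  return ranges;
}

// Made-up frequencies and threshold, for illustration only.
const knownWords = new Map([['the', -3], ['cat', -9], ['sat', -10]]);
const ranges = findDifficultRanges('The cat sat serendipitously', knownWords, -12);
console.log(ranges); // one range covering "serendipitously"
```

In a Quill editor, each range could then be passed to something like `quill.formatText(index, length, 'background', color)` whenever the text changes. How the actual frontend wires this up is not shown here.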