This is now deployed via Netlify.
Is your text easy to understand? Or is it full of technical terms? This simple tool highlights difficult words in your text - in English and in German.
The data folder contains word lists. The first column always contains a word, the second column the natural logarithm of its relative frequency. This value lies between -infinity and 0.
0 means that there is only one word repeating in the text: »Doh! Doh! Doh! Doh! Doh!«
-1 means a probability of e^-1 (37%), so that word would be about every third word: »Like, it is like, like, you know, like, things, like stuff, and so...«
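To make the two examples above concrete, here is a minimal sketch that reads one word-list line and converts the stored value back into a relative frequency. The tab separator and the helper names `parseLine` and `toProbability` are assumptions for illustration; the actual file layout may differ.

```javascript
// Parse one line of a word list.
// Assumption: the two columns are separated by a tab character.
function parseLine(line) {
  const [word, value] = line.split('\t');
  return { word, logFreq: Number(value) };
}

// Convert the stored natural-log value back to a relative frequency.
function toProbability(logFreq) {
  return Math.exp(logFreq);
}

const entry = parseLine('like\t-1');
const p = toProbability(entry.logFreq);
console.log(entry.word, p.toFixed(2));     // like 0.37
console.log('one in', Math.round(1 / p));  // one in 3
```

A value of 0 maps back to a probability of exactly 1, which is the »Doh!« case.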
The bin folder contains some Node.js helper scripts.
1_scan_html.js scans an HTML file, extracts all text between <p></p>, removes all tags and saves the text.
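The extraction step could look roughly like the sketch below. It is not the code from 1_scan_html.js; `extractParagraphs` is a hypothetical helper, and it uses a regular expression rather than a real HTML parser, so it assumes reasonably clean markup and does not decode entities such as `&amp;`.

```javascript
// Hypothetical helper: collect the text of every <p> element in an HTML string.
function extractParagraphs(html) {
  const paragraphs = [];
  // [\s\S] also matches newlines; the lazy quantifier stops at the first </p>.
  const pattern = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  for (const match of html.matchAll(pattern)) {
    const text = match[1]
      .replace(/<[^>]+>/g, '')  // drop nested tags such as <b> or <a>
      .replace(/\s+/g, ' ')     // collapse the whitespace left behind
      .trim();
    if (text) paragraphs.push(text);
  }
  return paragraphs;
}

const html = '<div><p>Hello <b>world</b>!</p><p class="x">Second\nparagraph.</p></div>';
console.log(extractParagraphs(html));
// [ 'Hello world!', 'Second paragraph.' ]
```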
2_scan_text.js calculates the word frequency in a text.
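A minimal sketch of such a frequency count, producing the same kind of log values the word lists store. The function name `logFrequencies` and the tokenization rule (a word is any run of Unicode letters, case ignored) are assumptions, not necessarily what the script does.

```javascript
// Count words and return a Map of word -> ln(count / total number of words).
function logFrequencies(text) {
  // Assumption: a word is any run of Unicode letters; case is ignored.
  const words = text.toLowerCase().match(/\p{L}+/gu) || [];
  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  const result = new Map();
  for (const [word, count] of counts) {
    result.set(word, Math.log(count / words.length));
  }
  return result;
}

console.log(logFrequencies('Doh! Doh! Doh! Doh! Doh!').get('doh')); // 0
```

With a text that consists of a single repeated word, the relative frequency is 1 and its logarithm is 0, which matches the upper bound of the value range.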
3_scan_words.js scans and combines multiple word lists and prepares the result for use in the web app.
The web folder contains a hacky frontend built with jQuery and Bootstrap, with quill.js as the text editor.
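How the frontend decides that a word is difficult is not spelled out here. One plausible approach, sketched under that assumption, is to flag every word whose log frequency falls below a threshold or that is missing from the word list. The function name, the threshold, and the sample values are all made up for illustration.

```javascript
// Hypothetical: return the words that are rarer than the threshold,
// or that do not appear in the word list at all.
function findDifficultWords(text, wordList, threshold) {
  const words = text.match(/\p{L}+/gu) || [];
  return words.filter((word) => {
    const logFreq = wordList.get(word.toLowerCase());
    return logFreq === undefined || logFreq < threshold;
  });
}

// Made-up log frequencies, not taken from the real data folder.
const wordList = new Map([['the', -3], ['cat', -8], ['sat', -9]]);
console.log(findDifficultWords('The cat sat serendipitously', wordList, -8.5));
// [ 'sat', 'serendipitously' ]
```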
A: Since stemming reduces multiple words to the same stem, the word list becomes shorter ... or it can cover more words with the same amount of memory. Also: In German you need a stemmer, especially for verbs! A verb like packen (to pack sth.) can take different forms depending on tense, person, grammatical mood, etc., such as packen, packe, packst, packt, pack, packen, packest, packet, packten, packte, packtest, packtet, ...
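To illustrate the effect on the packen example, here is a deliberately naive suffix stripper. It is not the stemmer this project uses and not a real German stemming algorithm; the suffix list and the name `naiveStem` are assumptions chosen only so that the forms listed above collapse to one stem.

```javascript
// Hypothetical suffix list, ordered so that longer suffixes are tried first.
const SUFFIXES = ['test', 'tet', 'ten', 'est', 'te', 'st', 'en', 'et', 'e', 't'];

function naiveStem(word) {
  const lower = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    // Keep at least three letters so short words are not stripped away.
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

const forms = ['packen', 'packe', 'packst', 'packt', 'pack', 'packest',
               'packet', 'packten', 'packte', 'packtest', 'packtet'];
console.log(new Set(forms.map(naiveStem))); // Set(1) { 'pack' }
```

Eleven entries shrink to one, which is the memory saving described above. A sketch like this over-stems many other words (Bett becomes bet, for example), so a proper algorithm such as the Snowball German stemmer is the safer choice in practice.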
A: Every text source has its advantages and disadvantages. For example, Wikipedia often uses words like references and reception, as these are common sub-categories in articles. Newspapers have a preference for political terms, but don't use Rumpelstiltskin. And old fairy tales don't talk about iPhones and YouTube. In the end, it is almost impossible to find the perfect text source suitable for children. Combining text sources with different weights seems to be the best way out of this despair.
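One way to combine weighted sources, sketched here as an assumption rather than as the logic of 3_scan_words.js, is to average the probabilities (not the logarithms) and take the logarithm again at the end. The function name `combineWordLists`, the weights, and the sample frequencies are invented for illustration.

```javascript
// Mix several word lists (Map of word -> ln relative frequency) using weights.
function combineWordLists(sources) {
  const totalWeight = sources.reduce((sum, s) => sum + s.weight, 0);
  const mixed = new Map();
  for (const { list, weight } of sources) {
    for (const [word, logFreq] of list) {
      // Average in probability space; a word missing from a source contributes 0.
      const share = (weight / totalWeight) * Math.exp(logFreq);
      mixed.set(word, (mixed.get(word) || 0) + share);
    }
  }
  // Convert the mixed probabilities back to log values.
  for (const [word, p] of mixed) {
    mixed.set(word, Math.log(p));
  }
  return mixed;
}

// Made-up numbers: fairy tales get three times the weight of Wikipedia.
const wikipedia = new Map([['reception', Math.log(0.01)], ['wolf', Math.log(0.0001)]]);
const fairyTales = new Map([['wolf', Math.log(0.01)]]);
const combined = combineWordLists([
  { list: wikipedia, weight: 1 },
  { list: fairyTales, weight: 3 },
]);
console.log(Math.exp(combined.get('wolf')).toFixed(6)); // 0.007525
```

Here wolf ends up far more common than reception, because the fairy-tale source dominates the mix.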
A: First I tried to put a <textarea> over a <div> element. That worked great until I tested it on iOS and noticed an unfixable padding generated by a shitty shadow DOM element.
Then I tried to switch to a contenteditable <div>; that fixed one problem and created several new ones.
In the end I found quill.js. It is small, fast, easy and flexible! My thanks to Jason Chen!