simpleplease

Is your text easy to understand? Or is it full of technical terms? This simple tool highlights difficult words in your text - in English and in German.

how it works

The data folder contains word lists. In each list, the first column contains a word and the second column the natural logarithm of its relative frequency. This value lies between -infinity and 0.
0 means that the text consists of a single word repeated over and over: »Doh! Doh! Doh! Doh! Doh!«
-1 means a probability of e^-1 (about 37%), so roughly every third word would be that word. »Like, it is like, like, you know, like, things, like stuff, and so...«
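
To make the numbers above concrete, here is a minimal sketch of how such values could be computed from a piece of text. The function name `logRelativeFrequencies` and the tokenization rule (lowercase, split on non-letters) are assumptions for illustration, not necessarily what the scripts in this repository do.

```javascript
// Count words and convert the counts to natural-log relative frequencies.
function logRelativeFrequencies(text) {
  // Assumption: lowercase everything and split on any non-letter character.
  const words = text.toLowerCase().split(/[^\p{L}]+/u).filter(Boolean);

  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }

  // log(count / total) is 0 for a word that makes up the whole text
  // and becomes more negative the rarer the word is.
  const result = new Map();
  for (const [word, count] of counts) {
    result.set(word, Math.log(count / words.length));
  }
  return result;
}

const doh = logRelativeFrequencies('Doh! Doh! Doh! Doh! Doh!');
console.log(doh.get('doh')); // 0, because log(5/5) = log(1)

const mixed = logRelativeFrequencies('like a like b like c');
console.log(mixed.get('like')); // log(3/6) = log(0.5), about -0.693
```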

The bin folder contains some node.js helper scripts.
1_scan_html.js scans an HTML file, extracts all text between <p></p>, removes all tags and saves the text.
2_scan_text.js calculates the word frequencies in a text.
3_scan_words.js scans and combines multiple word lists and prepares the result for use in the web app.
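
The first step, pulling paragraph text out of HTML, could look roughly like the sketch below. It uses regular expressions for brevity, which is only good enough for simple, well-formed pages; whether 1_scan_html.js works this way or uses a real HTML parser is not stated here, and the function name `extractParagraphText` is made up for this example.

```javascript
// Extract the text of every <p>...</p> element and strip nested tags.
function extractParagraphText(html) {
  const paragraphs = [];
  // \b keeps <pre> and similar tags from matching; [\s\S] spans line breaks.
  const pattern = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  for (const match of html.matchAll(pattern)) {
    const text = match[1]
      .replace(/<[^>]+>/g, '') // remove tags inside the paragraph
      .replace(/\s+/g, ' ')    // collapse whitespace and line breaks
      .trim();
    if (text) paragraphs.push(text);
  }
  return paragraphs;
}

const sample =
  '<h1>Title</h1><p>Hello <b>world</b>!</p><div>skip</div><p class="x">Second\n line</p>';
console.log(extractParagraphText(sample));
// [ 'Hello world!', 'Second line' ]
```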

The web folder contains a hacky frontend, with jQuery, Bootstrap and quill.js as the text editor.

FAQ

Q: Why do you use stemming?

A: Since stemming reduces multiple words to the same stem, the word list becomes shorter ... or more words fit into the same amount of memory. Also: In German you need a stemmer, especially for verbs! A verb like packen (to pack sth) can take different forms depending on tense, person, grammatical mood, etc. like packen, packe, packst, packt, pack, packen, packest, packet, packten, packte, packtest, packtet, ...
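
As a toy illustration of the effect, the sketch below strips a handful of German verb endings so that all the forms of packen listed above collapse into a single entry. The suffix list and the `naiveStem` function are invented for this example and are far cruder than a real German stemmer; which stemmer the project actually uses is not stated here.

```javascript
// Hypothetical suffix list, ordered longest first so that "test" wins over "st".
const SUFFIXES = ['test', 'tet', 'est', 'ten', 'en', 'et', 'te', 'st', 'e', 't'];

// Strip the first matching suffix, but keep a stem of at least 3 letters.
function naiveStem(word) {
  const lower = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

const forms = [
  'packen', 'packe', 'packst', 'packt', 'pack', 'packest',
  'packet', 'packten', 'packte', 'packtest', 'packtet',
];
const stems = new Set(forms.map(naiveStem));
console.log(stems.size);  // 1
console.log([...stems]);  // [ 'pack' ]
```

Eleven word list entries shrink to one, which is where the memory saving comes from.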

Q: Why do you combine wordlists from multiple sources and use different weights?

A: Every text source has its advantages and disadvantages. For example, Wikipedia often uses words like references and reception, as these are common sub-categories in articles. Newspapers have a preference for political terms, but don't use rumpelstiltskin. And old fairy tales don't talk about iPhones and YouTube.
In the end, it is almost impossible to find a single text source that is perfectly suitable for children. Combining text sources with different weights seems to be the best way to cope with our despair.
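
One plausible way to combine weighted lists is to convert each log frequency back to a probability, take the weighted average across sources, and convert the result back to a logarithm. This is a sketch of that idea only: the function name `combineWordLists`, the weights and the frequency values are all invented, and 3_scan_words.js may well do it differently.

```javascript
// sources: array of { list: Map(word -> log frequency), weight: number }
function combineWordLists(sources) {
  const totalWeight = sources.reduce((sum, source) => sum + source.weight, 0);

  // Weighted average of probabilities; a word missing from a source counts as 0 there.
  const probabilities = new Map();
  for (const { list, weight } of sources) {
    for (const [word, logFreq] of list) {
      const share = (weight / totalWeight) * Math.exp(logFreq);
      probabilities.set(word, (probabilities.get(word) || 0) + share);
    }
  }

  // Back to log space, matching the format of the input lists.
  const combined = new Map();
  for (const [word, probability] of probabilities) {
    combined.set(word, Math.log(probability));
  }
  return combined;
}

// Invented example values, not taken from the real word lists.
const wikipedia = new Map([['reception', Math.log(0.01)], ['iphone', Math.log(0.002)]]);
const fairyTales = new Map([['rumpelstiltskin', Math.log(0.004)]]);

const combined = combineWordLists([
  { list: wikipedia, weight: 1 },
  { list: fairyTales, weight: 1 },
]);
console.log(Math.exp(combined.get('reception')).toFixed(4)); // 0.0050
```

With equal weights, a word that appears in only one of two sources ends up with half its original probability, which dampens the quirks of each individual source.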

Q: How did you do the realtime highlighting in the web editor?

A: First I tried to put a <textarea> over a <div> element. That worked great, till I tested it on iOS and noticed an unfixable padding generated by a shitty shadow DOM element.
Then I tried to switch to a contenteditable <div>, which fixed one problem and created several new ones.
In the end I found quill.js. It is small, fast, easy and flexible! My thanks to Jason Chen!
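
The highlighting itself boils down to finding the character ranges of rare words. Below is a rough sketch of that part only, written as a plain function so it runs without a browser. The function name `findDifficultRanges`, the threshold and the sample frequencies are assumptions, and stemming is left out to keep it short.

```javascript
// Return { index, length } for every word whose log frequency is below the threshold.
// Words missing from the list are treated as maximally rare (-Infinity).
function findDifficultRanges(text, logFrequencies, threshold) {
  const ranges = [];
  for (const match of text.matchAll(/\p{L}+/gu)) {
    const word = match[0].toLowerCase();
    const logFreq = logFrequencies.has(word) ? logFrequencies.get(word) : -Infinity;
    if (logFreq < threshold) {
      ranges.push({ index: match.index, length: match[0].length });
    }
  }
  return ranges;
}

// Invented sample values, not taken from the real word lists.
const knownWords = new Map([['the', -3], ['on', -4], ['cat', -8], ['sat', -9]]);
const ranges = findDifficultRanges('The cat sat on the ottoman', knownWords, -10);
console.log(ranges); // [ { index: 19, length: 7 } ]

// In the browser, assuming an existing Quill instance named `quill`, each range
// could then be marked with something like:
//   quill.formatText(range.index, range.length, 'background', '#fdd');
// and recomputed inside a quill.on('text-change', ...) handler.
```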