infographicsgroup/simpleplease

simpleplease

This is now deployed via Netlify.

Is your text easy to understand? Or is it full of technical terms? This simple tool highlights difficult words in your text - in English and in German.

how it works

The data folder contains word lists. In each list, the first column contains a word and the second column the natural logarithm of its relative frequency. This value lies between 0 and -infinity. 0 means that the text consists of a single word repeated over and over: »Doh! Doh! Doh! Doh! Doh!« -1 means a relative frequency of e^-1 (about 37%), so roughly every third word would be that word: »Like, it is like, like, you know, like, things, like stuff, and so...«
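To make the numbers concrete, here is a minimal sketch of how such a value could be computed from a list of words. The function name `logRelativeFrequencies` is made up for this example and is not taken from the scripts in this repo.

```javascript
// Hypothetical helper (not from this repo): maps each word to the
// natural logarithm of its relative frequency in the given word array.
function logRelativeFrequencies(words) {
  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  const result = new Map();
  for (const [word, count] of counts) {
    // count / words.length is the relative frequency, between 0 and 1,
    // so its logarithm is between -infinity and 0.
    result.set(word, Math.log(count / words.length));
  }
  return result;
}

// A text that repeats a single word has relative frequency 1, so log = 0.
const doh = logRelativeFrequencies(['doh', 'doh', 'doh', 'doh', 'doh']);
console.log(doh.get('doh'));

// Going back from a log value to a share of the text:
console.log(Math.exp(-1)); // roughly 0.37
```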

The bin folder contains some node.js helper scripts. 1_scan_html.js scans an HTML file, extracts all text between <p></p>, removes all tags and saves the text. 2_scan_text.js calculates the word frequencies in a text. 3_scan_words.js scans and combines multiple word lists and prepares the result for use in the web app.
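The first step can be sketched roughly like this. It is an illustration of the idea, not the code from 1_scan_html.js: the function name `extractParagraphText` and the regex-based approach are assumptions, and a real HTML parser would be more robust.

```javascript
// Hypothetical sketch (not the actual 1_scan_html.js): collects the text
// of every <p>...</p> element and strips any tags inside it.
function extractParagraphText(html) {
  const paragraphs = [];
  // \b keeps <pre> and similar tags from matching; [\s\S] spans line breaks.
  const pattern = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const text = match[1]
      .replace(/<[^>]+>/g, '') // remove nested tags like <b> or <a href="...">
      .replace(/\s+/g, ' ')    // collapse whitespace
      .trim();
    if (text) {
      paragraphs.push(text);
    }
  }
  return paragraphs;
}

const html = '<div><p>Hello <b>world</b></p><h1>Title</h1><p class="a">Second</p></div>';
console.log(extractParagraphText(html));
```

The resulting paragraphs could then be split into words and counted to get the word frequencies.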

The web folder contains a hacky frontend built with jquery and bootstrap, with quill.js as the text editor.

FAQ

Q: Why do you use stemming?

A: Since stemming reduces multiple words to the same stem, the word list becomes shorter ... or the number of words can increase with the same amount of memory. Also: In German you need a stemmer, especially for verbs! A verb like packen (to pack sth) can take different forms depending on tense, person, grammatical mood, etc. like packen, packe, packst, packt, pack, packen, packest, packet, packten, packte, packtest, packtet, ...
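To illustrate the effect, here is a toy suffix stripper. It is not the stemmer used in this project and far too crude for real German text (a proper stemmer handles many more cases); the function name `toyStem` and the suffix list are made up for this example.

```javascript
// Toy example only: strips one common German verb ending.
// Suffixes are ordered longest first so that "test" wins over "st" or "t".
const SUFFIXES = ['test', 'tet', 'ten', 'est', 'te', 'st', 'et', 'en', 'e', 't'];

function toyStem(word) {
  const lower = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    // Keep at least three letters so short words are left alone.
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

const forms = ['packen', 'packe', 'packst', 'packt', 'pack', 'packest',
  'packet', 'packten', 'packte', 'packtest', 'packtet'];
const stems = new Set(forms.map(toyStem));
console.log([...stems]); // all eleven forms collapse to one entry
```

With stemming, the word list needs one entry for these forms instead of eleven.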

Q: Why do you combine wordlists from multiple sources and use different weights?

A: Every text source has its advantages and disadvantages. For example, Wikipedia often uses words like references and reception, as these are common sub-categories in articles. Newspapers have a preference for political terms, but don't use rumpelstiltskin. And old fairy tales don't talk about iPhones and YouTube. In the end, it is almost impossible to find the perfect text source suitable for children. Combining text sources with different weights seems to be the best approach to cope with our despair.
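One way such a combination could work is a weighted average of the relative frequencies. This is only a sketch under that assumption; the function name `combineWordLists`, the data shape and the weights are invented for the example and may differ from what 3_scan_words.js actually does.

```javascript
// Hypothetical sketch: each source has a weight and a list of
// [word, logRelativeFrequency] entries. The combined frequency is the
// weighted average of the per-source frequencies (a missing word counts as 0).
function combineWordLists(sources) {
  const totalWeight = sources.reduce((sum, source) => sum + source.weight, 0);
  const combined = new Map();
  for (const { weight, entries } of sources) {
    for (const [word, logFreq] of entries) {
      // Convert the log value back to a frequency before averaging.
      const share = Math.exp(logFreq) * (weight / totalWeight);
      combined.set(word, (combined.get(word) || 0) + share);
    }
  }
  // Store the result as log values again, like the input lists.
  return new Map([...combined].map(([word, freq]) => [word, Math.log(freq)]));
}

// Made-up numbers, for illustration only.
const combined = combineWordLists([
  { weight: 3, entries: [['iphone', Math.log(0.2)]] },
  { weight: 1, entries: [['iphone', Math.log(0.4)], ['rumpelstiltskin', Math.log(0.1)]] },
]);
console.log(Math.exp(combined.get('iphone')));
console.log(Math.exp(combined.get('rumpelstiltskin')));
```

A word that appears in only one source still ends up in the combined list, just with a lower frequency.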

Q: How did you do the realtime highlighting in the web editor?

A: First I tried to put a <textarea> over a <div> element. That worked great until I tested it on iOS and noticed an unfixable padding generated by a shitty shadow DOM element. Then I tried to switch to a contenteditable <div>, which fixed one problem and created several new ones. In the end I found quill.js. It is small, fast, easy and flexible! My thanks to Jason Chen!
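The editor-independent part of highlighting is finding which character ranges to mark. The following is a rough sketch of that step, not the code from the web folder; the function name `findDifficultRanges`, the threshold value and the treatment of unknown words are assumptions.

```javascript
// Hypothetical sketch: returns { index, length } for every word whose log
// relative frequency is below the threshold. Words missing from the list
// are treated as maximally rare (-Infinity), which is an assumption.
function findDifficultRanges(text, logFrequencies, threshold) {
  const ranges = [];
  const wordPattern = /\p{L}+/gu; // runs of letters, including umlauts
  let match;
  while ((match = wordPattern.exec(text)) !== null) {
    const word = match[0].toLowerCase();
    const logFreq = logFrequencies.has(word) ? logFrequencies.get(word) : -Infinity;
    if (logFreq < threshold) {
      ranges.push({ index: match.index, length: match[0].length });
    }
  }
  return ranges;
}

// Made-up word list and threshold, for illustration only.
const wordList = new Map([['the', -3], ['cat', -8], ['has', -5]]);
const ranges = findDifficultRanges('The cat has photosynthesis', wordList, -10);
console.log(ranges);
// Each range could then be handed to the editor, for example via Quill's
// formatText(index, length, 'background', someColor).
```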
