Repository for Frequency Word List Generator and processed files
In the early days, I hosted the generated files on OneDrive and linked to them from my blog at https://invokeit.wordpress.com/frequency-word-lists/. Going forward, both the code and the generated outputs are hosted on GitHub.
OpenSubtitles tokenized source
The data used to generate these lists can be found at http://opus.lingfil.uu.se/OpenSubtitles2016.php
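The core idea of building a frequency list from a tokenized corpus can be sketched in a few lines of Python. This is a minimal illustration, not the generator in this repository: the function names count_words and format_frequency_list are hypothetical, and it assumes the input is already tokenized so that splitting on whitespace yields words, and that output is sorted by descending count.

```python
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated tokens across all input lines (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        # Assumption: the source text is already tokenized, so whitespace splitting yields words.
        counts.update(line.split())
    return counts


def format_frequency_list(counts: Counter) -> str:
    """Render one 'word count' pair per line, most frequent first (hypothetical helper)."""
    # Assumption: descending count order; most_common() keeps first-seen order for ties.
    return "\n".join(f"{word} {n}" for word, n in counts.most_common())


if __name__ == "__main__":
    sample = ["you know what i mean", "i know"]
    print(format_frequency_list(count_words(sample)))
```

To process real files, pass open file objects (opened with encoding="utf-8") to count_words instead of the in-memory sample; a Counter can be updated file by file, so all files contribute to one set of totals.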
Format of the frequency lists:
word1 number1 (number1 is the number of times word1 occurs across all files)
word2 number2 (number2 is the number of times word2 occurs across all files)
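A list in this format can be loaded with a short parser. The sketch below is illustrative only: parse_frequency_list is a hypothetical name, and it assumes each non-blank line holds a word followed by whitespace and an integer count, splitting at the last whitespace run so the count is always the final field.

```python
from typing import Dict, Iterable


def parse_frequency_list(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'word count' lines into a dict mapping each word to its count (hypothetical helper)."""
    freq: Dict[str, int] = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        # Split once at the last whitespace run: left part is the word, right part the count.
        parts = line.rsplit(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'word count', got {line!r}")
        word, count = parts
        freq[word] = int(count)  # raises ValueError if the count is not an integer
    return freq


if __name__ == "__main__":
    print(parse_frequency_list(["know 2", "i 2", "", "you 1"]))
```

For a downloaded list, pass an open file object, for example parse_frequency_list(open(path, encoding="utf-8")), where path is whatever list file you are reading.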
If you would like to contribute to my project, you can donate using the PayPal button.