This is a very simple micromaterial created for the Oxford Summer of Hacks Language Hack Day.
The aim is to give learners practice in a basic NLP task: finding the most frequent words in a text (its frequency distribution) and calculating its type/token ratio (the number of unique words divided by the total number of words).
The activity covers the following steps:
- understanding what a type is and what a token is
- counting the total words (tokens) in a text
- converting a text into its unique words
- counting the unique words (types) in a text
- calculating the type/token ratio of a text
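To make these ideas concrete, here is a minimal sketch on a made-up sentence. It is not the activity's skeleton function. It assumes the simplest possible tokenization (lowercasing and splitting on whitespace), which a real text with punctuation would need to improve on.

```python
from collections import Counter

# Made-up sample text, chosen only for illustration.
text = "the cat sat on the mat and the dog sat too"

# Tokens: every word occurrence. Assumption: words are separated by whitespace.
tokens = text.lower().split()

# Types: the distinct words among the tokens.
types = set(tokens)

# Frequency distribution: how many times each type occurs.
freq = Counter(tokens)

# Type/token ratio: unique words divided by total words.
ttr = len(types) / len(tokens)

print(len(tokens))          # 11 tokens
print(len(types))           # 8 types
print(freq.most_common(2))  # [('the', 3), ('sat', 2)]
print(round(ttr, 3))        # 0.727
```

A ratio close to 1 means few words are repeated, while a lower ratio means more repetition.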
One big skeleton function has already been written, along with a test for it. To complete the activity, fill in the missing code and run the test. If the test passes, you did it! If not, revise your code until the test passes.
To run the test:
python -m unittest
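
For readers new to `unittest`, the sketch below shows roughly what a test of this kind can look like. The function name `type_token_ratio`, its whitespace tokenization, and the test cases are all assumptions made for illustration; the activity's actual skeleton and test may be named and structured differently.

```python
import unittest


def type_token_ratio(text):
    """Hypothetical stand-in for the skeleton function (name is an assumption)."""
    tokens = text.lower().split()  # assumption: whitespace-separated words
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty text
    return len(set(tokens)) / len(tokens)


class TestTypeTokenRatio(unittest.TestCase):
    def test_repeated_word(self):
        # "a a b c" has 3 types and 4 tokens, so the ratio is 3/4.
        self.assertAlmostEqual(type_token_ratio("a a b c"), 0.75)

    def test_empty_text(self):
        self.assertEqual(type_token_ratio(""), 0.0)
```

Saved in a file whose name starts with `test`, a class like this is picked up by the command above through unittest's default test discovery.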