lpmi-13/simple-NLP-stats

Simple-NLP

This is a very simple micromaterial created for the Oxford Summer of Hacks Language Hack Day.

The aim is to give learners practice with two very simple NLP tasks: finding the most frequent words in a text (a frequency distribution) and calculating the type/token ratio (the number of unique words divided by the total number of words).
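As a rough illustration of the frequency distribution idea, here is a minimal sketch using `collections.Counter` from the Python standard library. The function names `tokenize` and `most_frequent` are made up for this example and are not part of the activity's skeleton code. The tokenizer is a deliberate simplification: it lowercases the text and keeps only runs of letters and apostrophes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens as a list.

    Assumption: a "word" is any run of the letters a-z or apostrophes.
    Real tokenizers handle far more cases than this.
    """
    return re.findall(r"[a-z']+", text.lower())


def most_frequent(text, n=3):
    """Return the n most common words as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(n)


sample = "the cat sat on the mat and the dog sat too"
print(most_frequent(sample, 2))  # [('the', 3), ('sat', 2)]
```

`Counter.most_common(n)` sorts by count, highest first, so the first pair is always the most frequent word.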

Learning objectives

  • understand what a type is and what a token is
  • count the total words (tokens) in a text
  • convert a text into a collection of unique words
  • count the unique words (types) in a text
  • calculate the type/token ratio of a text

The activity

One big skeleton function has already been written, along with a test for it. To complete the activity, fill in the function and run the test. If the test passes, you did it! If not, fix the function and run the test again until it passes.

To run the test: python -m unittest
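Run with no arguments, `python -m unittest` performs test discovery from the current directory and picks up files whose names match `test*.py`. For readers who have not seen a `unittest` test before, the sketch below shows the general shape of one. It is not the test shipped with this repository: `count_tokens` and `TestCountTokens` are hypothetical names chosen for illustration, and the real skeleton function may look different.

```python
import unittest


# Hypothetical stand-in for a function under test. The repository's real
# function name and signature may differ.
def count_tokens(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


class TestCountTokens(unittest.TestCase):
    def test_counts_every_word(self):
        # Repeated words still count as tokens: four in total.
        self.assertEqual(count_tokens("the cat the dog"), 4)

    def test_empty_text(self):
        # An empty text has no tokens.
        self.assertEqual(count_tokens(""), 0)
```

Saved in a file such as `test_example.py` (again, a made-up name), these two tests would be found and run by the command above.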

Possible steps:

1) Find out about types and tokens.

2) Find out about turning text into a list of words.

3) Find out about turning a list of words into a list of unique words.

4) Keep track of the total words and unique words, then calculate the ratio.
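If you get stuck, the steps above can be sketched roughly as follows. This is one possible approach, not the expected solution: `type_token_stats` is a hypothetical name, and the sketch assumes that words are separated by whitespace and that case does not matter. Punctuation is left attached to words, which a fuller solution would probably want to handle.

```python
def type_token_stats(text):
    """Return (token count, type count, type/token ratio) for text."""
    # Step 2: turn the text into a list of words (tokens).
    tokens = text.lower().split()

    # Step 3: a set keeps only one copy of each word (types).
    types = set(tokens)

    # Step 4: divide unique words by total words, guarding against
    # an empty text to avoid dividing by zero.
    ratio = len(types) / len(tokens) if tokens else 0.0
    return len(tokens), len(types), ratio


total, unique, ratio = type_token_stats("the cat sat on the mat")
print(total, unique, round(ratio, 2))  # 6 5 0.83
```

Here "the" appears twice, so the text has six tokens but only five types.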
