lpmi-13/simple-NLP-stats

Simple-NLP

This is a very simple micromaterial created for the Oxford Summer of Hacks Language Hack Day.

The aim is to give learners practice with two very simple NLP tasks: finding the most frequent words in a text (a frequency distribution), and finding the type/token ratio (the number of unique words divided by the total number of words).
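A frequency distribution can be sketched with Python's collections.Counter. This is a minimal sketch, not the repository's skeleton function. The name frequency_distribution and the naive tokenizer (lowercase, then split on whitespace) are assumptions made for illustration.

```python
from collections import Counter


def frequency_distribution(text, n=3):
    """Return the n most frequent words in text as (word, count) pairs.

    Hypothetical helper. Assumes a naive tokenizer: lowercase the text,
    then split on whitespace (punctuation stays attached to words).
    """
    tokens = text.lower().split()
    # most_common orders by count; ties keep first-seen order
    return Counter(tokens).most_common(n)


print(frequency_distribution("The cat sat on the mat with the hat"))
# [('the', 3), ('cat', 1), ('sat', 1)]
```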

Learning objectives

  • understand what a type is and what a token is
  • count the total words (tokens) in a text
  • convert a text into a list of unique words
  • count the unique words (types) in a text
  • calculate the type/token ratio of a text
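As a worked example (the sentence is illustrative, not taken from the activity), lowercase the text, split it on spaces, and count. Here N is the number of tokens and V is the number of types:

```latex
\text{TTR} = \frac{V}{N}
% "the cat sat on the mat" -> tokens: the, cat, sat, on, the, mat
N = 6, \quad V = \left|\{\text{the}, \text{cat}, \text{sat}, \text{on}, \text{mat}\}\right| = 5, \quad \text{TTR} = \frac{5}{6} \approx 0.83
```

The word "the" appears twice, so it adds 2 to the token count but only 1 to the type count.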

The activity

One large skeleton function has already been written, along with its test. To complete the activity, fill in the missing code and run the tests. If the test passes, you did it! If not, keep adjusting the function until the test passes.

To run the tests: python -m unittest
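When run with no arguments, python -m unittest performs test discovery: it looks in the current directory for modules whose names match test*.py and runs the unittest.TestCase classes inside them. The sketch below only shows the general shape of such a test. The file name test_stats.py and the function count_tokens are hypothetical and are not taken from this repository.

```python
# test_stats.py (hypothetical file name; discovery matches test*.py)
import unittest


def count_tokens(text):
    """Hypothetical stand-in for a function you would fill in.

    Counts whitespace-separated words (tokens).
    """
    return len(text.split())


class TestCountTokens(unittest.TestCase):
    def test_counts_whitespace_separated_words(self):
        # "the" appears twice, so it is counted twice as a token
        self.assertEqual(count_tokens("the cat sat on the mat"), 6)

    def test_empty_text_has_no_tokens(self):
        self.assertEqual(count_tokens(""), 0)
```

Saving a file like this in the project directory and running python -m unittest would report each test method as passed or failed.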

Possible steps:

1) Find out about types and tokens.

2) Find out how to turn a text into a list of words.

3) Find out how to turn a list of words into a list of unique words.

4) Keep track of the total words and unique words, then calculate the ratio.
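The steps above might be combined roughly as follows. This is a sketch under stated assumptions: the function names tokenize and type_token_ratio are hypothetical (the repository's skeleton may use different names and signatures), and the regular expression that defines a "word" is one choice among many.

```python
import re


def tokenize(text):
    """Step 2: turn a text into a list of lowercase words (tokens).

    Assumption: a word is a run of letters, digits, underscores or
    apostrophes, so surrounding punctuation is dropped.
    """
    return re.findall(r"[\w']+", text.lower())


def type_token_ratio(text):
    """Steps 3 and 4: count types and tokens, then divide."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    types = set(tokens)  # a set keeps one copy of each unique word
    return len(types) / len(tokens)


print(tokenize("A rose is a rose is a rose."))
# ['a', 'rose', 'is', 'a', 'rose', 'is', 'a', 'rose']
print(type_token_ratio("A rose is a rose is a rose."))
# 0.375  (3 types / 8 tokens)
```

A set is the usual way to get the unique words because it discards duplicates automatically. With a plain str.split() tokenizer instead, "rose." and "rose" would be counted as two different types.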

About

a simple micromaterial to help learners practice doing simple NLP tasks
