Handy frequency lists from corpora and a few related utilities.

franfranz/Word_Frequency_Toytools

Word Frequency Toytools

Code to generate, annotate, and handle frequency lists from corpora.

Normalize Word Frequency v0.1.5

R code to normalize raw frequency counts into popular word frequency measures, including frequency per million words (fpmw), frequency per billion words (fpbw), Zipf, and Zipf per billion. To use Normalize Word Frequency:

Prepare your input file:

  • make sure your txt or csv input file has a header row; the column with the raw frequencies you want to normalize must be called "Frequency"
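As a rough illustration of the input format, the sketch below reads a small comma-separated table and checks that the required "Frequency" column is present. The "Word" column, the example words and counts, and the comma separator are assumptions made for this example; they are not taken from the script.

```r
# Hypothetical input, inlined as text so the example is self-contained.
# Only the "Frequency" column name is required by the README.
input_text <- "Word,Frequency
casa,500000
gatto,20000
zenzero,100"

# Read the table; header = TRUE takes the column names from the first row.
freq <- read.csv(text = input_text, header = TRUE, sep = ",")

# Stop early if the column to be normalized is missing.
stopifnot("Frequency" %in% names(freq))

print(freq)
```

For a file on disk, the same `read.csv` call would take the file path as its first argument instead of `text =`.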

Set input specifications in the code:

  • set the paths for input and output files (lines 28-30)
  • set the file extension (line 36)
  • set the file separator (line 48)
  • set the size of the corpus (line 70); this version is preset to the size of itWaC.

Set output specifications in the code:

  • choose which transformations to apply by commenting or uncommenting the corresponding lines (56-65)
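The transformations can be sketched as follows, using common definitions of the measures: counts scaled to a million or a billion tokens, and the Zipf scale taken as the base-10 logarithm of the frequency per billion words. This is a minimal sketch under those assumptions, not the script's own code; its formulas may differ, for example by applying smoothing. The corpus size is a round placeholder, and the data frame and the `fpmw`, `fpbw`, and `zipf` column names are hypothetical.

```r
# Hypothetical raw counts with the required "Frequency" column.
freq <- data.frame(
  Word = c("casa", "gatto", "zenzero"),
  Frequency = c(500000, 20000, 100)
)

# Placeholder corpus size in tokens (NOT the real size of any corpus).
corpus_size <- 1e9

# Frequency per million words and per billion words.
freq$fpmw <- freq$Frequency / corpus_size * 1e6
freq$fpbw <- freq$Frequency / corpus_size * 1e9

# Zipf scale, assumed here to be log10 of frequency per billion words
# (equivalently log10(fpmw) + 3); no smoothing is applied.
freq$zipf <- log10(freq$fpbw)

print(freq)
```

Leaving out a measure amounts to commenting out the line that computes it.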

Normalize Word Frequency v0.1.4

This version has been deprecated.
