karex/ngramizer

nGramIzer

This program calculates the forward and backward conditional probabilities of Norwegian Bokmål words in the sentences of a text file. The probability calculation is based on the formulae in Onnis et al. (2022) and uses data from the n-gram database of the National Library of Norway.
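As a rough illustration of what these two quantities mean, the sketch below uses the common bigram definitions: forward probability as count(previous word, word) / count(previous word), and backward probability as count(previous word, word) / count(word). This is an assumption about the general technique, not a copy of the formulae in Onnis et al. (2022) or of the code in ngram.py. The function name and the toy counts are hypothetical.

```python
from collections import Counter


def conditional_probabilities(tokens, unigram_counts, bigram_counts):
    """Return (word, forward, backward) for each token of one sentence.

    Assumed definitions (illustrative only):
      forward  = count(prev, word) / count(prev)
      backward = count(prev, word) / count(word)
    The first word has no predecessor, so both values are None.
    """
    rows = []
    for i, word in enumerate(tokens):
        if i == 0:
            rows.append((word, None, None))
            continue
        prev = tokens[i - 1]
        pair = bigram_counts.get((prev, word), 0)
        prev_count = unigram_counts.get(prev, 0)
        word_count = unigram_counts.get(word, 0)
        # Avoid dividing by zero for words missing from the counts.
        forward = pair / prev_count if prev_count else None
        backward = pair / word_count if word_count else None
        rows.append((word, forward, backward))
    return rows


# Toy counts invented for this example; they do not come from the real database.
unigrams = Counter({"jeg": 100, "liker": 20, "kaffe": 10})
bigrams = Counter({("jeg", "liker"): 10, ("liker", "kaffe"): 5})

for row in conditional_probabilities(["jeg", "liker", "kaffe"], unigrams, bigrams):
    print(row)
```

With these toy counts, "liker" gets a forward probability of 10/100 and a backward probability of 10/20.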

Prerequisites

Database of N-grams in Norwegian Bokmål

  1. Download the Norwegian Bokmål n-gram database from the National Library of Norway to the project directory.
  2. Decompress the archive, e.g. by opening a command line and issuing the command: tar xf ngram_nob.tar.gz

After a few minutes, you should have a bokm folder in the project directory.
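If the tar command is not available, the archive can also be unpacked with Python's standard tarfile module. This is a minimal sketch; the helper name extract_archive is hypothetical and not part of nGramIzer.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, destination="."):
    """Unpack a (possibly compressed) tar archive into destination."""
    # Mode "r" (the default) detects gzip compression automatically.
    with tarfile.open(archive_path) as archive:
        archive.extractall(destination)
    return Path(destination)


# Example call, run from the project directory:
# extract_archive("ngram_nob.tar.gz")
```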

Python

Download and install Python 3.x.

Run

Run nGramIzer on one or more text files with the command:

py -3 ngram.py input_file [input_file_2]...

Output files are generated with a _result.csv suffix.

Note: the program first builds its dictionaries, which takes some time, before the actual analysis starts.

Output format

The generated CSV will contain the following columns:

  • Sentence Number
  • Word Number
  • Word
  • Forward Probability
  • Backward Probability
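The columns above can be read back with Python's csv module. The sketch below assumes that the file is comma-separated, that its header row uses exactly the column names listed, and that a missing probability is written as an empty cell; none of this is confirmed by the source, and the sample rows are invented. The helper names read_results and words_below are hypothetical.

```python
import csv
import io


def read_results(handle):
    """Read a result file (an open text file) into a list of dicts."""
    return list(csv.DictReader(handle))


def words_below(rows, column, threshold):
    """Return the words whose value in `column` is below `threshold`."""
    selected = []
    for row in rows:
        value = row[column].strip()
        if not value:
            # Assumed: an empty cell means no probability was computed.
            continue
        if float(value) < threshold:
            selected.append(row["Word"])
    return selected


# Invented sample data in the assumed layout; not real program output.
SAMPLE = (
    "Sentence Number,Word Number,Word,Forward Probability,Backward Probability\n"
    "1,1,jeg,,\n"
    "1,2,liker,0.1,0.5\n"
    "1,3,kaffe,0.25,0.5\n"
)

rows = read_results(io.StringIO(SAMPLE))
print(words_below(rows, "Forward Probability", 0.2))
```

For a real file, pass a handle from open(path, newline="", encoding="utf-8") instead of the in-memory sample.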
