nGramIzer

This program calculates the forward and backward conditional probabilities of Norwegian Bokmål words in the sentences of a text file. The calculation is based on the formulae in Onnis et al. (2022) and uses the n-gram database from the National Library of Norway.
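To make the two measures concrete, here is a minimal sketch. It is not the code in ngram.py, and it assumes the common bigram definitions: forward probability as count(previous word, word) / count(previous word), and backward probability as count(word, next word) / count(next word). The exact formulae in Onnis et al. (2022) may differ in detail. The function name and the toy counts are invented for illustration and do not come from the National Library database.

```python
from collections import Counter


def conditional_probabilities(tokens, unigrams, bigrams):
    """Return (word, forward, backward) for each token in a sentence.

    forward  = count(prev, word) / count(prev)   (assumed definition)
    backward = count(word, next) / count(next)   (assumed definition)
    A value is None when there is no neighbour or the neighbour is unseen.
    """
    rows = []
    for i, word in enumerate(tokens):
        forward = backward = None
        if i > 0:
            prev = tokens[i - 1]
            if unigrams[prev]:
                forward = bigrams[(prev, word)] / unigrams[prev]
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if unigrams[nxt]:
                backward = bigrams[(word, nxt)] / unigrams[nxt]
        rows.append((word, forward, backward))
    return rows


# Toy counts, invented for illustration only.
unigrams = Counter({"jeg": 100, "liker": 20, "kaffe": 10})
bigrams = Counter({("jeg", "liker"): 10, ("liker", "kaffe"): 5})

rows = conditional_probabilities(["jeg", "liker", "kaffe"], unigrams, bigrams)
for word, forward, backward in rows:
    print(word, forward, backward)
```

With these toy counts, "liker" gets a forward probability of 10/100 and a backward probability of 5/10. The first word has no forward value and the last word has no backward value.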

Prerequisites

Database of N-grams in Norwegian Bokmål

Download the Norwegian Bokmål n-gram database from the National Library of Norway to the project directory.
Decompress the archive, e.g. by opening a command line and issuing the command: tar xf ngram_nob.tar.gz

Extraction takes a few minutes. Afterwards, the project directory should contain a bokm folder.

Python

Download and install Python 3.x.

Run

Run the nGramIzer on any number of text files with the command below. py -3 is the Python launcher on Windows; on Linux or macOS, use python3 instead.

py -3 ngram.py input_file [input_file_2]...

For each input file, an output file with the suffix _result.csv is generated.

Note: building the dictionaries takes some time before the actual analysis runs.

Output format

The generated CSV will contain the following columns:

Sentence Number
Word Number
Word
Forward Probability
Backward Probability
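As a sketch of how the results might be consumed, the snippet below reads such a file with Python's csv module. It assumes the file is comma-delimited, starts with a header row using exactly the column names listed above, and leaves a cell empty when a probability is undefined; none of this is confirmed by the README, so adjust it to the actual output. The sample rows and the load_results helper are invented for illustration.

```python
import csv
import io

# Invented sample standing in for a *_result.csv file (assumed layout).
sample = io.StringIO(
    "Sentence Number,Word Number,Word,Forward Probability,Backward Probability\n"
    "1,1,jeg,,0.5\n"
    "1,2,liker,0.1,0.5\n"
)


def to_float(cell):
    """Convert a CSV cell to float, treating an empty cell as missing."""
    return float(cell) if cell.strip() else None


def load_results(handle):
    """Read result rows from an open file handle into a list of dicts."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "sentence": int(record["Sentence Number"]),
            "position": int(record["Word Number"]),
            "word": record["Word"],
            "forward": to_float(record["Forward Probability"]),
            "backward": to_float(record["Backward Probability"]),
        })
    return rows


results = load_results(sample)
for row in results:
    print(row)
```

To read a real file, pass a handle opened with open(path, newline="", encoding="utf-8") in place of the sample.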
