Text Predictor for Python

This project contains Python classes for making word- and character-level predictions based on an N-gram language model. The word prediction class predicts words based on the current prefix of a word and an optional left context. The character prediction class predicts the most probable next characters based on an optional left context.

System requirements

pip
Python 2.7
KenLM

Installation

Language model queries are performed using the KenLM library, which you can install with the pip package manager. We maintain a fork of the original KenLM repository. The only change is that several scripts were modified to compile KenLM with support for language models of up to 12-gram order. This is required by the example character language model provided here.

pip install https://github.com/kdv123/kenlm/archive/master.zip

Examples

The examples directory under the root repository has the following example scripts:

Usage

There are three Python scripts, each containing one class used by the predictor. The predictor.py script contains the WordPredictor class and the character_predictor.py script contains the CharacterPredictor class. The vocabtrie.py script contains the VocabTrie class, which the WordPredictor class uses to build a trie data structure.

To use the WordPredictor class, first import it:

from predictor import WordPredictor

Then specify the path to a language model file and a vocabulary file. Some example language models and vocabulary files are in the resources sub-directory.

lm_filename = 'resources/lm_word_medium.kenlm'
vocab_filename = 'resources/vocab_100k'
word_predictor = WordPredictor(lm_filename, vocab_filename)

There are three methods to predict the most probable word or a list of probable words:

The first method takes a prefix, a vocab_id, a maximum number of predictions, and a minimum log probability as arguments and returns a list of probable words without considering any context:

def get_words(prefix, vocab_id, num_predictions, min_log_prob)

When a WordPredictor object is instantiated, it creates a trie data structure under the default vocab_id = ''. A list of the characters in the vocabulary is also created on instantiation, and the method returns a list of probable words that start with the prefix extended by each character in that list. The default value of num_predictions is 0, in which case the method returns all predictions, ordered from most to least probable. The default value of min_log_prob is -float('inf').
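As a rough illustration of the idea behind prefix lookup in a trie, here is a minimal, self-contained sketch. ToyVocabTrie, its methods, and the sample words and scores are hypothetical and are not the VocabTrie class from vocabtrie.py. The sketch also assumes that each word carries a fixed log probability, whereas the real predictor obtains its scores from KenLM. It only mirrors the documented meaning of num_predictions (0 returns everything) and min_log_prob (a lower bound on the score).

```python
# Toy illustration only: ToyVocabTrie is a hypothetical stand-in,
# not the VocabTrie class shipped in vocabtrie.py.
class ToyVocabTrie(object):
    def __init__(self):
        self.root = {}

    def add(self, word, log_prob):
        """Insert a word, storing its score under the None key at its end node."""
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = log_prob

    def words_with_prefix(self, prefix, num_predictions=0,
                          min_log_prob=-float('inf')):
        """Return (word, log_prob) pairs starting with prefix, best first."""
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]

        # Collect every complete word below that node.
        found = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            for key, child in current.items():
                if key is None:
                    if child >= min_log_prob:
                        found.append((text, child))
                else:
                    stack.append((child, text + key))

        # Highest log probability first; ties broken alphabetically.
        found.sort(key=lambda pair: (-pair[1], pair[0]))
        if num_predictions > 0:
            found = found[:num_predictions]
        return found


trie = ToyVocabTrie()
for word, log_prob in [("the", -1.0), ("then", -2.5),
                       ("there", -2.0), ("cat", -3.0)]:
    trie.add(word, log_prob)

all_matches = trie.words_with_prefix("the")
top_two = trie.words_with_prefix("the", num_predictions=2)
above_threshold = trie.words_with_prefix("the", min_log_prob=-2.2)
```

In this sketch, num_predictions=0 leaves the list untruncated, matching the default described above, and min_log_prob drops any candidate whose score falls below the threshold.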

The second method is similar to the first, but it also takes a context into account when predicting the list of probable words:

def get_words(prefix, context, vocab_id, num_predictions, min_log_prob)

The third method takes arguments similar to those of the first and second methods, but it returns only the most probable word for a given context and prefix:

def get_most_probable_word(prefix, context, vocab_id, num_predictions, min_log_prob)
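To make the relationship between the context-aware list and the single most probable word concrete, here is a small sketch that does not use KenLM or this project's classes. It assumes a hand-written bigram table in place of the language model. The function names, the vocabulary, the scores, and the flat penalty for unseen word pairs are all hypothetical.

```python
# Hypothetical toy data: log probabilities of a word given the previous word.
# A real setup would query a KenLM model instead of this table.
TOY_BIGRAM_LOG_PROBS = {
    ("the", "cat"): -0.5,
    ("the", "car"): -0.7,
    ("the", "cab"): -1.5,
    ("a", "car"): -0.4,
    ("a", "cat"): -0.9,
}
TOY_VOCAB = ["cab", "car", "cat", "dog"]
UNSEEN_LOG_PROB = -5.0  # assumed flat score for pairs missing from the table


def toy_rank_words(prefix, context, num_predictions=0,
                   min_log_prob=-float('inf')):
    """Rank vocabulary words that start with prefix, given the left context."""
    tokens = context.split()
    previous = tokens[-1] if tokens else ""
    scored = []
    for word in TOY_VOCAB:
        if not word.startswith(prefix):
            continue
        log_prob = TOY_BIGRAM_LOG_PROBS.get((previous, word), UNSEEN_LOG_PROB)
        if log_prob >= min_log_prob:
            scored.append((word, log_prob))
    # Highest log probability first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    if num_predictions > 0:
        scored = scored[:num_predictions]
    return scored


def toy_most_probable_word(prefix, context):
    """Return the single best (word, log_prob) pair, or None if nothing matches."""
    ranked = toy_rank_words(prefix, context, num_predictions=1)
    return ranked[0] if ranked else None


after_the = toy_rank_words("ca", "I saw the")
best_after_a = toy_most_probable_word("ca", "I saw a")
```

The same prefix "ca" produces a different ranking after "the" than after "a", which is the effect the context argument is meant to have. The most probable word is simply the first entry of the ranked list.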

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1750193. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

