GitHub - kumquatexpress/nGramBabbler: A simple probabilistic guesser for word corpora

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
Babbler.py		Babbler.py
Babbler.pyc		Babbler.pyc
README		README
dbConnector.py		dbConnector.py
nGram.py		nGram.py
nGramBabble.py		nGramBabble.py
nGramReader.py		nGramReader.py
nGramWriter.py		nGramWriter.py

Repository files navigation

Babbler
A simple n-gram babbler that spits out probabilitic word choices when given a corpus to prime itself on and an integer denoting the number of words to guess on.

To use, create an ngram Babbler by feeding it a file and an integer:

  b = Babbler("test.txt", 2)

Then successively call generate_next_word([list of words]) with the 
words that the Babbler is generating:

  word_list = []
  while [condition]:
       word_list.append(b.generate_next_word(word_list))
  return " ".join(word_list)


nGram and nGramReader
nGram is an object that holds information about a given number of words preceding a word
mapped to the word and its count in the corpus.  nGramReader reads from a corpus file
and stores relevant word progression information in a dictionary, which it then stores
as many nGram objects in a database transaction.

To use, create an nGramReader:
 
  n = nGramReader("test.txt", 2)

where the integer is the number of preceding words to read for each nGram.

Then call n.make_db_transactions() to create nGrams from the dictionary and load them
into a database (currently sqlite3).