A small Markov chain library for generating logorrhea-looking text (that is, nonsense)

skurmedel/wordsalad

This library is currently being cleaned up for public consumption; at the moment it is not very usable.

wordsalad

A small Python module for generating nonsense texts from a source text.

After tokenising the input, it builds a Markov chain, then picks a path through the chain at random and concatenates the visited words.

General use case

  • A corpus is tokenised, either with the provided tools or with custom ones. The Markov chain abstraction itself is generally type agnostic, so groups of words can be entered as well (this often makes the text more plausible).
  • Using a WordSaladMatrixBuilder and count_follower, we record which word follows which.
  • When all words have been entered, we get a WordSaladMatrix using build_matrix.
  • One or more sentences are generated by picking a "start word", choosing a random number, and then picking one of that word's followers according to the followers' weights.
  • The previous step is repeated until some stop condition occurs (stopping on "." usually works well).
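
The steps above can be sketched in plain Python. This is a minimal illustration of the technique, not the library's implementation: `tokenise`, `count_followers` and `generate_sentence` are hypothetical helpers written for this example, and a dictionary of counters stands in for WordSaladMatrixBuilder and WordSaladMatrix.

```python
import random
import re
from collections import Counter, defaultdict


def tokenise(text):
    """Split text into lowercase words, keeping each "." as its own token."""
    return re.findall(r"[A-Za-z']+|\.", text.lower())


def count_followers(tokens):
    """Record how often each word is directly followed by each other word."""
    followers = defaultdict(Counter)
    for word, follower in zip(tokens, tokens[1:]):
        followers[word][follower] += 1
    return followers


def generate_sentence(followers, start_word, rng, stop=".", max_words=50):
    """Walk the chain from start_word, choosing each follower by its weight."""
    words = [start_word]
    while words[-1] != stop and len(words) < max_words:
        options = followers.get(words[-1])
        if not options:  # dead end: this word never had a follower
            break
        candidates = list(options)
        weights = [options[c] for c in candidates]
        words.append(rng.choices(candidates, weights=weights, k=1)[0])
    return " ".join(words)


corpus = "the cat sat on the mat. the dog sat on the cat."
tokens = tokenise(corpus)
followers = count_followers(tokens)
sentence = generate_sentence(followers, "the", random.Random(0))
print(sentence)
```

The walk stops on ".", on a word with no recorded followers, or after `max_words` words, whichever comes first. The length cap is an extra safeguard added in this sketch so that a corpus without a reachable "." cannot loop forever.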

Terminology

The terminology used in the library follows:

Term     | Explanation
---------|------------
corpus   | The text material we "train" the Markov chains on.
word     | A unit found in the input corpus. It can be a single character, a group of characters, or anything else.
follower | A word (see above) that follows another word.
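
To make the loose definition of "word" concrete, the sketch below cuts the same corpus into three kinds of units and counts followers for each. It is an illustration only: `follower_counts` is a hypothetical helper, and overlapping word pairs are just one assumed way of forming groups of words.

```python
from collections import Counter

corpus = "the cat sat on the mat"

# Three ways to cut the same corpus into "words".
as_chars = list(corpus)                       # single characters
as_words = corpus.split()                     # whitespace-separated words
as_pairs = list(zip(as_words, as_words[1:]))  # overlapping groups of two words


def follower_counts(units):
    """Count (unit, follower) pairs; works for any hashable unit type."""
    return Counter(zip(units, units[1:]))


print(follower_counts(as_chars)[("t", "h")])
print(follower_counts(as_words)[("the", "cat")])
print(follower_counts(as_pairs)[(("the", "cat"), ("cat", "sat"))])
```

The counting code never inspects the units themselves, which is why strings, characters and tuples can all be used unchanged.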

Internal details

The WordSaladMatrix class uses a sparse numpy matrix to encode the Markov chains.
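
As a rough picture of what a matrix encoding of the chain looks like, the sketch below stores follower counts in a NumPy array and samples a follower from one row. It is an assumption-laden illustration rather than the WordSaladMatrix code: it uses a dense array for brevity, and `index`, `counts` and `pick_follower` are names invented for this example.

```python
import numpy as np

tokens = ["the", "cat", "sat", "on", "the", "mat", "."]

# Map each distinct word to a row/column index, in order of first appearance.
index = {}
for word in tokens:
    index.setdefault(word, len(index))
words = list(index)

# counts[i, j] = number of times words[j] directly followed words[i].
counts = np.zeros((len(words), len(words)), dtype=np.int64)
for word, follower in zip(tokens, tokens[1:]):
    counts[index[word], index[follower]] += 1


def pick_follower(word, rng):
    """Pick a follower of `word` with probability proportional to its count."""
    row = counts[index[word]]
    total = row.sum()
    if total == 0:
        return None  # the word has no recorded followers
    return words[rng.choice(len(words), p=row / total)]


rng = np.random.default_rng(0)
print(pick_follower("the", rng))
```

Most entries in such a matrix are zero, since any given word is followed by only a small fraction of the vocabulary. That is the motivation for a sparse representation.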

Dependencies

  • numpy (used for the nice sparse matrices it provides)
  • flask (for a planned standalone web interface)

Standalone?

TBD
