Skip to content

πŸ§™β€β™‚οΈβ›“πŸ’ Text generator using a Markov Chain, model trained on the Lord of The Rings Trilogy. Python 3.6

Notifications You must be signed in to change notification settings

nwilliams770/markov-of-middle-earth

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

25 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Lord of the Markov

Markov of Middle-Earth

A Markov chain implementation, trained on J. R. R. Tolkien's Lord of the Rings trilogy.

Usage

(python markov.py) sentence <int>  | sentence of <int> length
(python markov.py) sentences <int> | <int> # of sentences

Features & Technologies

  • Trains a reusable model on any .txt corpus
  • Ability to produce an n-length sentence or n sentences
  • Python 3.6

Implementation

  • Build Phase:
    • In order to generate a model with a more structured sense of syntax, the corpus is broken into sentence arrays and sandwiched with "START" and "END" markers. Including these markers in our training data ultimately limits what words and characters will be used to start and terminate sentences when we want to generate sentences.

    • Our model is a simple dict() and as sentences are parsed word by word, each unique word is assigned as a key in the model where the value is a probability distribution represented as a FrequencyGram

    • For a given word, the FrequencyGram counts the frequencies of all words that directly follow it. From this, we can derive the current state and possible outcomes for any given word

    • Once we've iterated over the entire corpus, our model will be a dict where each key is a unique word, marker, or punctuation mark and each value is it's corresponding FrequencyGram

      FrequencyGram for "twilight"
      "twilight": {
          "the": 1,
          ",": 2,
          "in": 1,
          "came": 1,
          "as": 1,
          "under": 1,
          ".": 1 
      }
      
    • FrequencyGram has some added methods that generate a probability distribution and returns a weighted random word based on it. This is how we'll traverse from state to state.

  • Generative Phase:
    • Generating a sentence:
      • We'll first generate a starter word by requesting a random word from our "START" FrequencyGram: sentence_starter = markov_model["START"].return_weighted_rand_word()
      • From here, we just follow the chain. We can track our current word and update it accordingly as we request a random word from the corresponding FrequencyGram
      • We'll know that our sentence is complete once our current word is the "END" marker.

Examples

  • Below are some of the best examples generated so far. The model tends to struggle with generating clear dialogue blocks but some of the smaller sentences were impressive! Lord of the Markov Examples

About

πŸ§™β€β™‚οΈβ›“πŸ’ Text generator using a Markov Chain, model trained on the Lord of The Rings Trilogy. Python 3.6

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages