# Stupid Art Project 2017 - Random Holy Text
To fit the stupid art theme "Religion" and my current focus on machine learning, I'm going to generate some random holy text.

The code below was picked up from this site:
http://agiliq.com/blog/2009/06/generating-pseudo-random-text-with-markov-chains-u/
It'll be my first attempt at random holy text generation.

In [5]:
import random


class Markov(object):
    def __init__(self, open_file):
        self.cache = {}
        self.open_file = open_file
        self.words = self.file_to_words()
        self.word_size = len(self.words)
        self.database()

    def file_to_words(self):
        self.open_file.seek(0)
        data = self.open_file.read()
        words = data.split()
        return words

    def triples(self):
        """ Generates triples from the given data string. So if our string were
                "What a lovely day", we'd generate (What, a, lovely) and then
                (a, lovely, day).
        """

        if len(self.words) < 3:
            return

        for i in range(len(self.words) - 2):
            yield (self.words[i], self.words[i + 1], self.words[i + 2])

    def database(self):
        for w1, w2, w3 in self.triples():
            key = (w1, w2)
            if key in self.cache:
                self.cache[key].append(w3)
            else:
                self.cache[key] = [w3]

    def generate_markov_text(self, size=25):
        seed = random.randint(0, self.word_size - 3)
        seed_word, next_word = self.words[seed], self.words[seed + 1]
        w1, w2 = seed_word, next_word
        gen_words = []
        for i in range(size):
            gen_words.append(w1)
            w1, w2 = w2, random.choice(self.cache[(w1, w2)])
        gen_words.append(w2)

        return ' '.join(gen_words)
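
To see what the `database` step builds, here is a minimal sketch on a made-up seven-word text. The `build_cache` helper and the toy sentence are my own illustration, not part of the code from the site above; the helper just mirrors the triples idea in a standalone function.

```python
from collections import defaultdict


def build_cache(words):
    """Map each (w1, w2) pair to the list of words seen right after it."""
    cache = defaultdict(list)
    # zip over three offset views of the list yields every consecutive triple
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        cache[(w1, w2)].append(w3)
    return dict(cache)


# Made-up toy text, chosen so that one pair has two possible followers
toy_words = "the cat sat on the cat mat".split()
toy_cache = build_cache(toy_words)

for key, followers in toy_cache.items():
    print(key, '->', followers)
```

The pair `('the', 'cat')` ends up with two followers, which is the only place where the generator has a real choice to make. On a long text, most of the randomness comes from common pairs like this.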

I'll use the Old Testament first. I got the text from Project Gutenberg and removed the Gutenberg preamble. I suspect the text will be too big to run efficiently in the code above, as it is just straight Python. We shall see.

In [6]:
with open('sources/old testament.txt') as bible:
    bible_gen = Markov(bible)

In [10]:
bible_gen.generate_markov_text()

"cast into prison. 3:25 Then there arose to judgment, to save it, for mine holy name's sake, said, Let them not depart from them. 5:7 thus"

In [9]:
%%timeit
bible_gen.generate_markov_text()

10000 loops, best of 3: 39.6 µs per loop


It works, and it is fast. But the punctuation is rough and passage numbers show up in the middle of the text. I'll try post-processing the responses.

Let's first treat the passage numbers appearing mid-passage, things like '7:12'.

In [11]:
import re

'7:12' appearing at the beginning of a passage is actually desirable, so I'd like to keep those, and possibly force them later. It is the passage numbers appearing in the middle or at the end of a passage that I want to avoid. Those are guaranteed to have a space in front of them, i.e. ' 7:12', so I can use a regex with a leading space to find and remove them.

In [26]:
passage_num_pattern = re.compile(r' \d+:\d+')

In [29]:
for _ in range(3):
    passage = bible_gen.generate_markov_text()
    print(passage)
    passage = passage_num_pattern.sub('', passage)
    print(passage)
    

this burden. 14:29 Rejoice not against the LORD came unto the house of the men which are done unto all words that ye do this upon
this burden. Rejoice not against the LORD came unto the house of the men which are done unto all words that ye do this upon
he also had the cities for defence in Judah. Rehoboam was one silver bowl of seventy shekels, after the manner of fat, of ox, or dead
he also had the cities for defence in Judah. Rehoboam was one silver bowl of seventy shekels, after the manner of fat, of ox, or dead
And Mizraim begat Ludim, and Anamim, and Lehabim, and Naphtuhim, 10:14 And the word of the milk of the guard gave him to the hold his
And Mizraim begat Ludim, and Anamim, and Lehabim, and Naphtuhim, And the word of the milk of the guard gave him to the hold his
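
None of the three samples above happens to start with a passage number, so here is a small check of the leading-space trick on a made-up string (the sample text is my own, not generator output). It only shows that a number at the very start survives while one in the middle is removed.

```python
import re

# Same pattern as above: a space, then digits, a colon, and more digits
passage_num_pattern = re.compile(r' \d+:\d+')

# Made-up sample with one leading and one mid-passage number
sample = "5:7 thus saith the LORD. 14:29 Rejoice not"
cleaned = passage_num_pattern.sub('', sample)

print(cleaned)
```

The leading `5:7` has no space in front of it, so the pattern cannot match it, while ` 14:29` is removed together with its leading space.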


But I want to create random passages, so they need to start with a passage number and contain a few "complete" sentences, i.e. they end in a period. I think I need to alter the generating method, and I think the alterations will be particular to the Old Testament. So I'll extend the Markov class with a generating method specially for this text of the Old Testament.

In [65]:
class OldTestaPassagesMarkov(Markov):
    # Matches a passage number such as '7:12' inside a single word.
    passage_num_pattern = re.compile(r'\d+:\d+')

    def generate_markov_text(self, seed_word='', min_words=25):

        # Process a user given seed_word. The last two positions are skipped
        # so the seed pair always has a recorded follower in the cache.
        seed_word_locations = [idx for idx, word in enumerate(self.words[:-2])
                               if word.lower() == seed_word.lower()]

        if seed_word_locations:
            seed = random.choice(seed_word_locations)
        else:
            if seed_word:
                print(seed_word + ' is not in Old Testament')
            seed = random.randint(0, self.word_size - 3)

        w1, w2 = self.words[seed], self.words[seed + 1]
        gen_words = [w1, w2]
        # go until we have enough words and end in a period
        while w2[-1] != '.' or len(gen_words) < min_words:
            w1, w2 = w2, random.choice(self.cache[(w1, w2)])
            # Use the class-level pattern, not a global of the same name.
            if self.passage_num_pattern.search(w2):
                # Avoid adding passage numbers to the middle of the passage.
                # Also end a sentence when a passage number would have gone in.
                new_w1 = w1.replace(':', '.').replace(';', '.')
                gen_words[-1] = new_w1
            else:
                gen_words.append(w2)

        return ' '.join(gen_words)

In [66]:
with open('sources/old testament.txt') as bible:
    bible_gen = OldTestaPassagesMarkov(bible)

In [70]:
bible_gen.generate_markov_text('12:13')

'12:13 With him is plenteous redemption. And he slew all the days that are upon her: and happy is that to the multitudes, he went and gathered the multitude crying aloud began to publish it much, and bring them in, and laid before their eyes.'
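
To force every passage to start with a passage number without typing one in, one option is to pick a random passage-number token from the word list and use it as the seed. This is only a sketch: `passage_number_tokens` and `random_passage_seed` are hypothetical helpers of mine, and the demo word list is made up.

```python
import random
import re

# Digits, colon, digits; fullmatch below requires the whole token to fit
PASSAGE_NUM = re.compile(r'\d+:\d+')


def passage_number_tokens(words):
    """Return every word that is exactly a passage number like '7:12'."""
    return [w for w in words if PASSAGE_NUM.fullmatch(w)]


def random_passage_seed(words, rng=random):
    """Pick a random passage number, or '' if the text has none."""
    candidates = passage_number_tokens(words)
    return rng.choice(candidates) if candidates else ''


# Made-up word list standing in for the real text
demo_words = "1:1 In the beginning 1:2 And the earth 10:14 And Pathrusim".split()
demo_tokens = passage_number_tokens(demo_words)
demo_seed = random_passage_seed(demo_words)

print(demo_seed)
```

With the generator above, that would presumably be called as `bible_gen.generate_markov_text(random_passage_seed(bible_gen.words))`. The empty-string fallback matches the default `seed_word=''`, so a text without passage numbers would fall back to a random seed position.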

In [None]:
bible