# Generating tweets with a Markov chain

Using:

- spaCy for tokenization
- markovify for the Markov chain

Characteristics:

- A Markov chain only emits words it has seen in the corpus,
  and I _believe_ every 2-gram (consecutive pair of words) in the output
  also occurs somewhere in the corpus.
- This also means variety is severely limited by the corpus.
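
The n-gram property above can be checked on a toy example. The following is a minimal sketch in plain Python, not markovify's implementation: `build_chain`, `generate`, `ngrams`, and the toy corpus are all hypothetical names and data introduced here for illustration. It builds a chain of a given state size and confirms that every window of `state_size + 1` generated tokens was seen in the corpus.

```python
import random
from collections import defaultdict


def build_chain(tokens, state_size):
    """Map each run of `state_size` tokens to the tokens that follow it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - state_size):
        state = tuple(tokens[i:i + state_size])
        chain[state].append(tokens[i + state_size])
    return chain


def generate(chain, start, length, rng):
    """Walk the chain from `start`, stopping early at a dead end."""
    out = list(start)
    state_size = len(start)
    while len(out) < length:
        followers = chain.get(tuple(out[-state_size:]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return out


def ngrams(tokens, n):
    """Return the set of all runs of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


# Toy corpus, made up for this sketch.
corpus = "the cat sat on the mat and the cat ate the rat on the mat".split()
state_size = 2
chain = build_chain(corpus, state_size)
generated = generate(chain, corpus[:state_size], 12, random.Random(0))

# Every window of state_size + 1 generated tokens was seen in the corpus.
assert ngrams(generated, state_size + 1) <= ngrams(corpus, state_size + 1)
print(" ".join(generated))
```

The same check also shows why variety is limited: the walk can only follow transitions that the corpus already contains.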

In [17]:
import re
from itertools import chain

import markovify
import spacy

with open("tweet-corpus.txt") as f:
    text = f.read()

nlp = spacy.load("en_core_web_sm")

In [20]:
START = "HORKRIMS "
END = "ENDHORKRIMS\n\n"
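
The markers suggest that each entry in the corpus file is wrapped in `START` and `END`. Here is a small sketch of how splitting on them behaves, assuming that layout. The sample text and the `split_tweets` helper are hypothetical, since the real `tweet-corpus.txt` is not shown.

```python
START = "HORKRIMS "
END = "ENDHORKRIMS\n\n"

# Hypothetical sample; the real tweet-corpus.txt is not shown here.
sample = (
    "HORKRIMS first tweet. It has two sentences.ENDHORKRIMS\n\n"
    "HORKRIMS second tweet\nspans two linesENDHORKRIMS\n\n"
)


def split_tweets(text):
    """Split on END, drop START, and skip blank pieces."""
    pieces = (piece.replace(START, "") for piece in text.split(END))
    return [piece for piece in pieces if piece.strip()]


tweets = split_tweets(sample)
print(tweets)
```

Splitting on explicit markers rather than on punctuation keeps multi-sentence and multi-line entries together as a single unit. The blank-piece filter is an addition in this sketch, needed because `str.split` leaves an empty string after the final `END`.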

In [23]:
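class TweetText(markovify.Text):
    """markovify model that tokenizes with spaCy and splits on START/END."""

    def word_split(self, sentence):
        # Interleave each token's text with its trailing whitespace
        # so that word_join can rebuild the original spacing exactly.
        return list(
            chain.from_iterable(
                (word.orth_, word.whitespace_) for word in nlp(sentence)
            )
        )

    def word_join(self, words):
        # Whitespace is already part of the token stream.
        return "".join(words)

    def sentence_split(self, text):
        # Each corpus entry is wrapped in START ... END markers;
        # treat each entry as one "sentence".
        return [s.replace(START, "") for s in text.split(END)]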


models = {
    size: TweetText(text, state_size=size)
    for size in range(2, 6)
}
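
To illustrate why `word_split` interleaves token text with trailing whitespace, here is a sketch that swaps spaCy for a crude regex tokenizer so it runs without a language model. The `tokenize` helper is a hypothetical stand-in and splits text differently from spaCy (for example around apostrophes), but the round-trip idea is the same.

```python
import re
from itertools import chain


def tokenize(sentence):
    """Stand-in for spaCy: yield (text, trailing_whitespace) pairs."""
    for match in re.finditer(r"(\w+|[^\w\s])(\s*)", sentence):
        yield match.group(1), match.group(2)


def word_split(sentence):
    # Flatten the pairs into text, whitespace, text, whitespace, ...
    return list(chain.from_iterable(tokenize(sentence)))


def word_join(words):
    # Whitespace is already in the list, so plain concatenation suffices.
    return "".join(words)


sentence = "Lo and behold, it's fine"
tokens = word_split(sentence)
print(tokens)

# Joining the flattened tokens reproduces the input exactly.
assert word_join(tokens) == sentence
```

One consequence worth noting: because the whitespace entries are tokens too, a `state_size` of 2 covers roughly one word plus its trailing whitespace rather than two words.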

In [24]:
for _ in range(5):
    for size, model in models.items():
        print(size, model.make_short_sentence(280))
    print()

2 you not good reason - at Thanksgiving, parts of this. It's sad.
3 the whole body image — that? The old motherfuckers on a sexually transmitted infection
4 You start to hear a sort of inverse pet continuum
5 I will do as you've found something that looks like a tentacle

2 like people at best thought of begins to these motherfuckers on him the sort of the back on this whole belly all* gift shop.
3 I should have business that you don't know what I had a smile this gaze
4 Lo and behold, it's just *ehhhh* over the whole affair
5 your words of direction and comfort to this *dangerous* mother

2 It's wearing a natural phenomenon.
3 Oh, where you think that her densely knotted vine substructure is a person's like doctor shopping, which must just starts out of begins to be.
4 It was like Creatine. We've had pouches — it's a combination of a dragon scale cape — imagine a *big sheet*
5 At maximum volume, you want to protecc!*

2 we're revealing skin is free to do come off? It may be eight or s