# Generating tweets with a Markov chain

Using:

- spaCy for tokenization
- markovify for the Markov chain

Characteristics:

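- A Markov chain only emits words that appear in the corpus. With a state
  size of n, every run of n + 1 consecutive words in the output also occurs
  in the text, so even the state-size-1 model only produces 2-grams
  (consecutive pairs of words) found in the corpus.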
- This also means variety is severely limited by the corpus.
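
The n-gram claim above can be checked on a toy example. The sketch below is a hand-rolled chain in plain Python, not markovify's implementation; the three-sentence corpus, the sentinels, and the names `build_chain`, `walk`, and `ngrams` are all made up for illustration. Each generated token is drawn from the tokens that followed the current state somewhere in the corpus, so every run of `state_size + 1` output tokens should also be a run in the corpus.

```python
import random
from collections import defaultdict

BEGIN, END = "<s>", "</s>"  # hypothetical sentinels for this sketch


def build_chain(sentences, state_size):
    """Map each state (tuple of tokens) to the tokens seen right after it."""
    chain = defaultdict(list)
    for tokens in sentences:
        padded = [BEGIN] * state_size + tokens + [END]
        for i in range(len(padded) - state_size):
            state = tuple(padded[i : i + state_size])
            chain[state].append(padded[i + state_size])
    return chain


def walk(chain, state_size, rng, max_tokens=50):
    """Follow the chain from the BEGIN state until END (or max_tokens)."""
    state = (BEGIN,) * state_size
    out = []
    while len(out) < max_tokens:
        nxt = rng.choice(chain[state])
        if nxt == END:
            break
        out.append(nxt)
        state = state[1:] + (nxt,)
    return out


def ngrams(tokens, n):
    """Return the set of all runs of n consecutive tokens."""
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


corpus = [
    "it looks like a tentacle".split(),
    "it is like a bat but smaller".split(),
    "a bat looks like a lizard".split(),
]
state_size = 2
chain = build_chain(corpus, state_size)
corpus_grams = set().union(*(ngrams(s, state_size + 1) for s in corpus))

rng = random.Random(0)
for _ in range(5):
    generated = walk(chain, state_size, rng)
    # Every (state_size + 1)-gram of the output was seen in the corpus.
    assert ngrams(generated, state_size + 1) <= corpus_grams
    print(" ".join(generated))
```

The same check with a larger `state_size` passes for longer runs, which is also why larger models tend to reproduce longer stretches of the corpus verbatim.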

In [64]:
import re
from itertools import chain

import markovify
import spacy

with open("corpus.txt") as f:
    text = f.read()

nlp = spacy.load("en_core_web_sm")

In [65]:
START = "HORKRIMS "
END = "ENDHORKRIMS\n\n"

In [103]:
class TweetText(markovify.Text):
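    def word_split(self, sentence):
        # Tokenize with spaCy; chain.from_iterable flattens the per-token tuples
        # so the commented-out (token, whitespace) variant is a one-line swap.
        return list(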
            chain.from_iterable(
                (word.orth_,)
                for word in nlp(sentence)
                # (word.orth_, word.whitespace_) for word in nlp(sentence)
            )
        )

    def word_join(self, words):
        chunks = []
        for i, word in enumerate(words):
            # Put a space before each token unless it is the first token,
            # contains an apostrophe ('s, n't), is the "na" of "gon" + "na",
            # or is a single punctuation mark.
            if not (
                i == 0
                or (word != "'" and "'" in word)
                or word == "na"
                or (len(word) == 1 and word in ".,:;!?")
            ):
                chunks.append(" ")
            chunks.append(word)
        return "".join(chunks)

    def sentence_split(self, text):
        # Each tweet is wrapped in START ... END markers: split on END, then strip START.
        return [s.replace(START, "") for s in text.split(END)]


models = {size: TweetText(text, state_size=size) for size in range(1, 5)}
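
To see what the splitting and joining rules do without loading spaCy or the corpus, here is a standalone sketch of the same logic. The layout of `corpus.txt` is not shown, so the `sample` string is a guess at the format; the token lists are written by hand to mimic tokenizer output; and the function names `split_tweets` and `join_tokens` are hypothetical. Unlike `sentence_split` above, this version also drops the empty piece that `split` leaves after the final END marker.

```python
START = "HORKRIMS "
END = "ENDHORKRIMS\n\n"


def split_tweets(text):
    """Split a marker-delimited corpus into tweets, dropping empty pieces."""
    pieces = (piece.replace(START, "") for piece in text.split(END))
    return [piece for piece in pieces if piece.strip()]


def join_tokens(words):
    """Rejoin tokens, skipping the space before punctuation and clitics."""
    chunks = []
    for i, word in enumerate(words):
        attach = (
            i == 0
            or (word != "'" and "'" in word)  # "'s", "n't", "'re"
            or word == "na"  # second half of "gon" + "na"
            or (len(word) == 1 and word in ".,:;!?")
        )
        if not attach:
            chunks.append(" ")
        chunks.append(word)
    return "".join(chunks)


# Assumed corpus layout: START, tweet text, END, repeated.
sample = (
    "HORKRIMS It's like a bat!ENDHORKRIMS\n\n"
    "HORKRIMS we're gonna start?ENDHORKRIMS\n\n"
)
print(split_tweets(sample))

print(join_tokens(["It", "'s", "like", "a", "bat", ",", "but", "it", "'s", "not", "bad", "!"]))
print(join_tokens(["we", "'re", "gon", "na", "start", "?"]))
print(join_tokens(["*", "hell", "*"]))
```

Tokens outside the punctuation set, such as `*` or `-`, still get a space on both sides, which matches the `* hell *` spacing in the generated output.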

In [105]:
for i in range(5):
    for size, model in models.items():
        print(size, model.make_short_sentence(280, max_overlap_ratio=0.7))
    print("")

1 it's like a nectarine when one that * introduction to feet.
2 it is all the grablets that it expressed to touch it all day and nothing will happen!
3 it looks like a tentacle
4 Maybe this is the only thing that makes it weird

1 Get all house. We exist in places his bulk is minimized
2 this bizarre artifact you have invented, you can live inside
3 Give us something that isn't a lizard in dough, it's a combination of a centipede and * hell *
4 None

1 it could save more than Lolth is very little effort of yourself extruded
2 all the snakes come out of them are now a tiny amethyst flower bloom. But it is * magnificent *
3 Imagine if you will, a * dark * job orientation video.
4 we do have a decorative hanging cat — it's like an * even lazier * cat

1 You hear the two *, gaze.
2 imagine if you will — obviously there's a piece of demon fruit
3 It's back, like a grub — like a monomolecular cheese slicer
4 It's like a bat, but it's not as bad as you expect!

1 Whatever that you're talking 

In [38]:
model.make_sentence_with_start??

Signature: model.make_sentence_with_start(beginning, strict=True, **kwargs)
Source:   
    def make_sentence_with_start(self, beginning, strict=True, **kwargs):
        """
        Tries making a sentence that begins with `beginning` string,
        which should be a string of one to `self.state` words known
        to exist in the corpus.

        If strict == True, then markovify will draw its initial inspiration
        only from sentences that start with the specified word/phrase.

        If strict == False, then markovify will draw its initial i

In [92]:
m = models[3]
m.make_sentence?

Signature: m.make_sentence(init_state=None, **kwargs)
Docstring:
Attempts `tries` (default: 10) times to generate a valid sentence,
based on the model and `test_sentence_output`. Passes `max_overlap_ratio`
and `max_overlap_total` to `test_sentence_output`.

If successful, returns the sentence as a string. If not, returns None.

If `init_state` (a tuple of `self.chain.state_size` words) is not specified,
this method chooses a sentence-start at random, in accordance with
the model.

If `test_output` is set as False then the `test_sentence_output` check
will be skipped.

If `max_words` or `min_words` are specified, the word count for the
sentence will be evaluated against the provided limit(s).
File:      ~/conda/lib/python3.9/site-packages/markovify/text.py
Type:      method


In [99]:
models[3].make_sentence(("Have", "you", "ever"), test_output=False)

' Have you ever seen a ... like a butterfly that has emerged from its cocoon?'
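
Passing `test_output=False` skips the check that the docstring above calls `test_sentence_output`. As a rough illustration of what an overlap check of that kind does, here is a simplified stand-in written for this notebook, not markovify's actual code. It rejects a candidate when a long enough run of its words appears verbatim in the corpus. The function name `too_similar`, the window-size rule, the 15-word cap, and the sample sentences are assumptions; the 0.7 ratio is the value passed to `make_short_sentence` earlier.

```python
def too_similar(words, corpus_text, max_overlap_ratio=0.7, max_overlap_total=15):
    """Return True if a long run of `words` appears verbatim in `corpus_text`."""
    # Longest verbatim run we tolerate, in words (assumed rule for this sketch).
    allowed = min(max_overlap_total, round(max_overlap_ratio * len(words)))
    window = allowed + 1
    # Slide a window of `window` words across the candidate sentence.
    for i in range(max(len(words) - window + 1, 1)):
        if " ".join(words[i : i + window]) in corpus_text:
            return True
    return False


corpus_text = "it looks like a tentacle today\nit is like a bat but smaller"

copied = "it looks like a tentacle today".split()  # lifted straight from the corpus
mixed = "it looks like a bat but smaller".split()  # stitched from both sentences

print(too_similar(copied, corpus_text))
print(too_similar(mixed, corpus_text))
```

A check like this is one plausible reason the larger state sizes return `None` or fail to find a sentence more often: the longer the state, the longer the runs the chain copies from the corpus, and the more candidates get rejected.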

In [81]:
for i in range(5):
    for size, model in models.items():
        start = " ".join("Have you ever".split()[:size])
        # if size < 2:
        #     continue
        try:
            print(size, start)
            print(size, model.make_sentence_with_start(start))
        except Exception as e:
            print(e)
    print("")

2 Have you
2  Have you ever seen Turbo Teen? It defines everything!
3 Have you ever
`make_sentence_with_start` can't find sentence beginning with Have you ever
4 Have you ever
`make_sentence_with_start` can't find sentence beginning with Have you ever
5 Have you ever
`make_sentence_with_start` can't find sentence beginning with Have you ever

2 Have you
2  Have you ever had a cowcatcher - type attachment for this occurrence. I mean, * we're gonna start? She's very confused.
3 Have you ever
3  Have you ever, let's see ... I think it's called ... Crenellations? The Castle Hat.
4 Have you ever
`make_sentence_with_start` can't find sentence beginning with Have you ever
5 Have you ever
`make_sentence_with_start` can't find sentence beginning with Have you ever

2 Have you
2  Have you ever seen ... crabs?   # dnd # cteam # acqinc # dnd # cteam # acqinc # dnd # cteam
3 Have you ever
3  Have you ever played with a gyroscope? And, like, a gear — and gears don't have.
4 Have you ever
`make_sente

In [79]:
models[3].chain.model[('Have', 'you', 'ever')]

{',': 1,
 'had': 6,
 'been': 4,
 'seen': 11,
 'done': 1,
 'cracked': 1,
 'played': 2,
 'smelled': 1,
 'tried': 1}
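
The dictionary maps each token that followed `Have you ever` in the corpus to the number of times it did so. A chain picks the next token in proportion to those counts. The sketch below turns the counts into probabilities and makes one weighted draw using only the standard library; it illustrates the idea and is not markovify's own sampling code.

```python
import random

# Follower counts for the state ("Have", "you", "ever"), copied from the output above.
followers = {
    ",": 1,
    "had": 6,
    "been": 4,
    "seen": 11,
    "done": 1,
    "cracked": 1,
    "played": 2,
    "smelled": 1,
    "tried": 1,
}

total = sum(followers.values())
probabilities = {word: count / total for word, count in followers.items()}

# Most likely continuations first.
for word, p in sorted(probabilities.items(), key=lambda kv: -kv[1]):
    print(f"{word!r:>10} {p:.3f}")

# Weighted draw of the next token, proportional to the counts.
rng = random.Random(0)
next_word = rng.choices(list(followers), weights=list(followers.values()), k=1)[0]
print(next_word)
```

With 28 observations in total, `seen` accounts for 11 of them, so roughly two in five sentences started this way continue with `seen`.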