where is the math? #140

randomgambit · 2020-07-26T20:57:20Z

Hi there, and thanks for this beautiful package!! I was curious to understand the inner workings of the package a bit better. Is there a PDF that explains exactly how you use Markov chains to create a model (with some equations)?

Thanks!

jsvine · 2020-08-02T18:19:17Z

Thanks for the kind words, @randomgambit! I'm not a trained mathematician, so I am not the most authoritative person to answer this, but I believe that this chapter of Grinstead and Snell provides a helpful overview: https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf

Does that answer your question?

randomgambit · 2020-08-02T18:51:15Z

Thank you! Yes, I know how Markov chains work, but I am not sure how you implement them in the package: what is the transition matrix, how many states are there, etc.? Can you please share more details?

Thanks!

jsvine · 2020-08-02T18:54:06Z

Hi @randomgambit. The state size defaults to 2 but is configurable. The transition matrix is calculated in markovify.chain.Chain.

randomgambit · 2020-08-02T18:59:03Z

Got it, but it is a bit hard to understand what is going on just by looking at the code. I think it would help immensely if you could write a small example of how the algorithm works (using, say, a corpus of just two simple sentences).

jsvine · 2020-08-05T03:25:57Z

Would something like this satisfy your curiosity?

For text corpora, Markovify begins the process of building its Markov models by splitting each corpus into a series of sentences, and each sentence into a series of "tokens" (i.e., words, plus placeholder tokens for the beginning and end of a sentence).

Then, walking sentence by sentence, it identifies every sequence of state_size (default: 2) consecutive tokens and counts how many times each token comes immediately afterward. This dictionary of frequencies, of the form corpus[(token_a, token_b, ...)][next_token] = count, makes up the "transition matrix" of the resulting Markov model.

Thus, "Janice walked to the park. After that, Janice walked to the zoo." becomes:

{
 ('___BEGIN__', '___BEGIN__'): {'After': 1, 'Janice': 1},
 ('___BEGIN__', 'After'): {'that,': 1},
 ('___BEGIN__', 'Janice'): {'walked': 1},
 ('After', 'that,'): {'Janice': 1},
 ('Janice', 'walked'): {'to': 2},
 ('that,', 'Janice'): {'walked': 1},
 ('the', 'park.'): {'___END__': 1},
 ('the', 'zoo.'): {'___END__': 1},
 ('to', 'the'): {'park.': 1, 'zoo.': 1},
 ('walked', 'to'): {'the': 2}}
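For anyone who wants to poke at this, here is a minimal, standard-library-only sketch of the counting step described above. It is not Markovify's actual code (that lives in markovify.chain.Chain); `split_sentences` and `build_counts` are hypothetical helpers, and the regex sentence splitter is a deliberately naive stand-in for Markovify's own. With the default state_size of 2 it should reproduce the dictionary above.

```python
import re
from collections import defaultdict

# Sentinel tokens, spelled as in the example above.
BEGIN = "___BEGIN__"
END = "___END__"


def split_sentences(text):
    # Hypothetical, naive splitter: break after ".", "!" or "?" plus whitespace.
    return re.split(r"(?<=[.!?])\s+", text.strip())


def build_counts(text, state_size=2):
    """Map each state (a tuple of state_size tokens) to {next_token: count}."""
    model = defaultdict(lambda: defaultdict(int))
    for sentence in split_sentences(text):
        words = sentence.split()  # whitespace split keeps "that," and "park." whole
        # Pad with BEGIN tokens so the first word has a full-size state, then add END.
        items = [BEGIN] * state_size + words + [END]
        for i in range(len(words) + 1):
            state = tuple(items[i:i + state_size])
            model[state][items[i + state_size]] += 1
    # Convert nested defaultdicts to plain dicts for readable output.
    return {state: dict(nexts) for state, nexts in model.items()}


corpus = "Janice walked to the park. After that, Janice walked to the zoo."
for state, nexts in build_counts(corpus).items():
    print(state, nexts)
```

Passing state_size=1 or state_size=3 changes the keys to 1- or 3-token tuples, which is all that "configurable state size" amounts to in this sketch.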

When Markovify attempts to generate a new sentence, it starts from the ('___BEGIN__', '___BEGIN__') state and randomly chooses each successive token using these frequency-based probabilities, stopping when it draws ___END__.
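In symbols, if $c(s, w)$ is the count stored for state $s$ and next token $w$, each step draws $w$ with probability

$$P(w \mid s) = \frac{c(s, w)}{\sum_{w'} c(s, w')}$$

so from ('to', 'the') the walk picks 'park.' or 'zoo.' with probability 1/2 each. Below is a minimal sketch of that walk that hard-codes the example dictionary. `generate` is a hypothetical helper, and the weighted draw uses Python's `random.choices`, which is not necessarily how Markovify samples internally.

```python
import random

BEGIN = "___BEGIN__"
END = "___END__"

# Transition counts copied from the two-sentence example above (state_size = 2).
MODEL = {
    (BEGIN, BEGIN): {"After": 1, "Janice": 1},
    (BEGIN, "After"): {"that,": 1},
    (BEGIN, "Janice"): {"walked": 1},
    ("After", "that,"): {"Janice": 1},
    ("Janice", "walked"): {"to": 2},
    ("that,", "Janice"): {"walked": 1},
    ("the", "park."): {END: 1},
    ("the", "zoo."): {END: 1},
    ("to", "the"): {"park.": 1, "zoo.": 1},
    ("walked", "to"): {"the": 2},
}


def generate(model, state_size=2, rng=random):
    """Random walk from the all-BEGIN state until END is drawn."""
    state = (BEGIN,) * state_size
    words = []
    while True:
        nexts = model[state]
        tokens = list(nexts)
        # Weighted draw: each token's probability is its count / the row total.
        word = rng.choices(tokens, weights=[nexts[t] for t in tokens], k=1)[0]
        if word == END:
            return " ".join(words)
        words.append(word)
        # Slide the window: drop the oldest token, append the new one.
        state = state[1:] + (word,)


rng = random.Random(140)  # seeded only so repeated runs are reproducible
for _ in range(3):
    print(generate(MODEL, rng=rng))
```

This tiny model can only ever produce four sentences: the two originals plus "Janice walked to the zoo." and "After that, Janice walked to the park."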

randomgambit · 2020-08-05T03:36:50Z

Super interesting!!! Thanks! I really think you should include this example on the main page.

I am just trying to figure out the exact size of the transition matrix in the example above. Can we write it down, or are there too many rows/columns?

Thanks!
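One way to put a number on the size question, assuming rows are the observed states (token pairs) and columns are the observed next tokens (the way the dictionary above is keyed), is sketched below; `to_dense` is a hypothetical helper. Under that view the example is a 10 × 9 matrix with only 12 of its 90 cells nonzero, which is why a nested dictionary is a natural way to store it. A textbook square transition matrix, as in Grinstead and Snell, would instead have token pairs on both axes.

```python
BEGIN = "___BEGIN__"
END = "___END__"

# Transition counts from the example above (state_size = 2).
MODEL = {
    (BEGIN, BEGIN): {"After": 1, "Janice": 1},
    (BEGIN, "After"): {"that,": 1},
    (BEGIN, "Janice"): {"walked": 1},
    ("After", "that,"): {"Janice": 1},
    ("Janice", "walked"): {"to": 2},
    ("that,", "Janice"): {"walked": 1},
    ("the", "park."): {END: 1},
    ("the", "zoo."): {END: 1},
    ("to", "the"): {"park.": 1, "zoo.": 1},
    ("walked", "to"): {"the": 2},
}


def to_dense(model):
    """Lay counts out as rows = observed states, columns = observed next tokens."""
    rows = sorted(model)
    cols = sorted({tok for nexts in model.values() for tok in nexts})
    matrix = []
    for state in rows:
        total = sum(model[state].values())
        # Normalize each row so it holds probabilities rather than raw counts.
        matrix.append([model[state].get(tok, 0) / total for tok in cols])
    return rows, cols, matrix


rows, cols, matrix = to_dense(MODEL)
nonzero = sum(1 for row in matrix for p in row if p > 0)
print(f"{len(rows)} x {len(cols)} = {len(rows) * len(cols)} cells, {nonzero} nonzero")
# → 10 x 9 = 90 cells, 12 nonzero
```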
