# Replicating Markov’s Analysis of Dependent Data and the Law of Large Numbers
This notebook replicates Andrey Markov’s classic analysis demonstrating that dependent random variables can still satisfy the Law of Large Numbers. By modeling vowel–consonant transitions in natural language as a first-order Markov chain, we empirically examine whether long-run letter frequencies remain stable despite dependence.

## Data collection
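The following script processes Fitzgerald’s *The Great Gatsby*, counting vowels and consonants and recording transition frequencies between adjacent letters (vowel → vowel, vowel → consonant, consonant → vowel, and consonant → consonant). Only alphabetic characters are considered, so spaces and punctuation are skipped and a transition can span a word boundary. The letters a, e, i, o, and u count as vowels; every other letter, including y, counts as a consonant.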

In [1]:
with open("input.txt", "r", encoding="utf-8") as f: #Use a text file as input
    book = f.read()
# For a quick test, a short excerpt can be assigned to book instead, for example:
# book = "In my younger and more vulnerable years my father gave me some advice that I’ve been turning over in my mind ever since."
# The first character only seeds prev; it is not itself added to the letter totals.
prev = book[0]

def isVowel(char):
    """Return True for a, e, i, o, u (case-insensitive); y counts as a consonant."""
    return char.lower() in "aeiou"

def counter(char1, char2):
    """Increment the transition counter for the pair (char1, char2)."""
    global vv, vc, cv, cc
    if isVowel(char1):
        if isVowel(char2):
            vv += 1
        else:
            vc += 1
    else:
        if isVowel(char2):
            cv += 1
        else:
            cc += 1

# Transition counts: first letter of the name is the source class, second is the target.
vv = 0
vc = 0
cv = 0
cc = 0
# Letter totals.
vowels = 0
consonants = 0

for char in book[1:]:
    # Skip spaces, digits, and punctuation; only letters form the chain.
    if char.isalpha():
        if isVowel(char):
            vowels += 1
        else:
            consonants += 1
        counter(prev, char)
        prev = char

print(f"There are {vowels + consonants} letters.")
print(f"There are {vowels} vowels.")
print(f"There are {consonants} consonants.")
print(f"Vowel to vowel: {vv}")
print(f"Vowel to consonant: {vc}")
print(f"Consonant to vowel: {cv}")
print(f"Consonant to consonant: {cc}")
    

There are 206190 letters.
There are 78120 vowels.
There are 128070 consonants.
Vowel to vowel: 11207
Vowel to consonant: 66914
Consonant to vowel: 66913
Consonant to consonant: 61156
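
The counting step above can also be written without global counters. The following is a minimal sketch, not the notebook's own code: the function name `transition_counts` is hypothetical, and it keeps the same conventions (only a, e, i, o, u are vowels, non-letters are skipped). Unlike the cell above, it counts the first letter as well, so its totals can differ by one.

```python
from collections import Counter

VOWELS = set("aeiou")

def transition_counts(text):
    """Return (letter class counts, class-to-class transition counts) for text.

    Hypothetical helper for illustration: "v" marks a vowel, "c" a consonant.
    """
    # Map every letter to its class, dropping spaces, digits, and punctuation.
    classes = ["v" if ch.lower() in VOWELS else "c" for ch in text if ch.isalpha()]
    letters = Counter(classes)
    # Pair each letter class with the class of the letter that follows it.
    transitions = Counter(zip(classes, classes[1:]))
    return letters, transitions

letters, transitions = transition_counts("In my younger and more vulnerable years")
print(letters)
print(transitions)
```

Pairing the class list with itself shifted by one position yields exactly one transition per adjacent letter pair, so the number of transitions is always one less than the number of letters.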


## Probability Estimation
Using the observed transition counts, we compute empirical probabilities for each vowel–consonant transition.

In [2]:
# Number of transitions that start from a vowel and from a consonant.
vtotal = vv + vc
ctotal = cv + cc

# Empirical conditional probabilities: each row (same source class) sums to 1.
vv_P = vv / vtotal
vc_P = vc / vtotal
cv_P = cv / ctotal
cc_P = cc / ctotal

print(f"Vowel to vowel probability: {vv_P}")
print(f"Vowel to consonant probability: {vc_P}")
print(f"Consonant to vowel probability: {cv_P}")
print(f"Consonant to consonant probability: {cc_P}")

Vowel to vowel probability: 0.14345694499558378
Vowel to consonant probability: 0.8565430550044162
Consonant to vowel probability: 0.5224761651922011
Consonant to consonant probability: 0.4775238348077989
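
These probabilities also show that successive letters are not independent. If they were, the chance of a vowel would be the same regardless of the preceding letter. A minimal sketch of that comparison, with the counts from the data collection output hard-coded:

```python
# Counts copied from the data collection output above.
vowels, consonants = 78120, 128070
vv, vc = 11207, 66914

# Marginal probability that a letter is a vowel.
p_vowel = vowels / (vowels + consonants)
# Conditional probability of a vowel given that the previous letter was a vowel.
p_vowel_after_vowel = vv / (vv + vc)

print(f"P(vowel)                  = {p_vowel:.4f}")
print(f"P(vowel | previous vowel) = {p_vowel_after_vowel:.4f}")
```

The marginal probability is about 0.379, while the probability after a vowel is about 0.143, so a vowel is much less likely to follow another vowel than the overall frequency would suggest.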


## Markov Chain Simulation
Using these transition probabilities, we construct a first-order Markov chain and generate synthetic letter sequences. We then compare the resulting vowel and consonant frequencies to those observed in the original text.

In [3]:
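import random

# Start the chain in the vowel state. This initial letter is counted,
# so the chain contains one more letter than the text.
curr = "v"

markovVowels = 1
markovConsonants = 0

# Generate one synthetic letter per letter of the text. No seed is set,
# so the counts vary slightly from run to run.
for i in range(vowels + consonants):
    # Probability that the next letter is a vowel, given the current state.
    p_vowel = vv_P if curr == "v" else cv_P
    if random.random() < p_vowel:
        markovVowels += 1
        curr = "v"
    else:
        markovConsonants += 1
        curr = "c"

print("There are " + str(vowels) + " vowels and the markov chain predicted " + str(markovVowels))
print("There are " + str(consonants) + " consonants and the markov chain predicted " + str(markovConsonants))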

There are 78120 vowels and the markov chain predicted 78239
There are 128070 consonants and the markov chain predicted 127952
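
For a two-state chain, the long-run vowel share can also be computed directly instead of simulated. The sketch below uses the standard stationary-distribution formula for a two-state chain; it is an addition for comparison, not part of the original analysis, and the transition probabilities are hard-coded, rounded to six places, from the estimation step.

```python
# Transition probabilities from the estimation step, rounded to six places.
vc_P = 0.856543  # P(consonant | previous vowel)
cv_P = 0.522476  # P(vowel | previous consonant)

# In the long run the flow from vowels to consonants balances the reverse flow:
#   pi_v * vc_P = pi_c * cv_P,  with  pi_v + pi_c = 1.
pi_v = cv_P / (vc_P + cv_P)
pi_c = 1 - pi_v

# Vowel share observed in the text, from the data collection output.
observed_v = 78120 / 206190

print(f"Stationary vowel share: {pi_v:.4f}")
print(f"Observed vowel share:   {observed_v:.4f}")
```

The two values agree closely, which is expected here because the vowel → consonant and consonant → vowel counts are almost equal.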


## Conclusion
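In this run the simulated chain produced 78239 vowels against 78120 in the text, a difference of about 0.15%, even though successive letters are strongly dependent: a vowel follows a vowel only about 14% of the time, but follows a consonant about 52% of the time. This supports Markov’s result by showing that dependent letter sequences generated by a Markov chain reproduce stable long-run frequency behavior.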
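
The stabilizing of the frequency can be watched as the chain grows longer. The following is a minimal sketch under stated assumptions: the helper `simulate_vowel_share` is hypothetical, the transition probabilities are hard-coded (rounded) from the estimation step, and a fixed seed is used so that repeated runs give the same numbers.

```python
import random

def simulate_vowel_share(n, vv_P, cv_P, seed=0):
    """Simulate n steps of the two-state chain and return the share of vowels.

    Hypothetical helper for illustration. The chain starts in the vowel
    state, and that starting letter is not counted.
    """
    rng = random.Random(seed)
    state_is_vowel = True
    vowel_count = 0
    for _ in range(n):
        # Probability that the next letter is a vowel, given the current state.
        p_vowel = vv_P if state_is_vowel else cv_P
        state_is_vowel = rng.random() < p_vowel
        vowel_count += state_is_vowel  # True adds 1, False adds 0
    return vowel_count / n

# Rounded transition probabilities from the estimation step.
vv_P, cv_P = 0.143457, 0.522476
for n in (100, 10_000, 100_000):
    print(n, round(simulate_vowel_share(n, vv_P, cv_P), 4))
```

As `n` increases, the printed share should settle near the vowel share observed in the text (about 0.379), which is the behavior the Law of Large Numbers describes.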