In [2]:
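from typing import Iterator, TextIO
import random


def read_chunk(file: TextIO, chunk_size: int = 1024) -> Iterator[str]:
    """Yield successive chunks of at most chunk_size characters until EOF."""
    while True:
        str_chunk = file.read(chunk_size)
        if not str_chunk:
            break
        yield str_chunk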

In [3]:
from typing import Iterator, TextIO

# Now develop some conditional memory.
# A small class will help us with bookkeeping.
class MarkovChain:
    EOD = "\u0000"  # end-of-document marker

    def __init__(self, window_size: int = 1) -> None:
        assert window_size > 0
        self._window_size = window_size
        # _memory[k] maps each (k+1)-character string seen in training to its count
        self._memory = {window: {} for window in range(window_size + 1)}
        # bootstrap with null chars
        self._current_buffer = [self.EOD for _ in range(window_size + 1)]
        self._generate_buffer = ""

    def update(self, character: str) -> None:
        assert len(character) == 1
        self._current_buffer.append(character)
        self._current_buffer.pop(0)
        # count every suffix of the buffer, from 1 up to window_size+1 characters
        for i in range(1, self._window_size + 2):
            self.update_item("".join(self._current_buffer[-i:]))

    def update_item(self, string: str) -> None:
        window_index = len(string) - 1
        counts = self._memory[window_index]
        counts[string] = counts.get(string, 0) + 1

    def finish_training(self) -> None:
        """Signal the end of a document."""
        self.update(self.EOD)

    def _next_char(self) -> str:
        # how often the current context itself was seen
        base_count = self._memory[len(self._generate_buffer) - 1].get(self._generate_buffer)
        if not base_count:
            raise ValueError(f"I've never seen {self._generate_buffer!r} before :(")

        population = []
        counts = []
        # every stored string that extends the context by one character
        for string, count in self._memory[len(self._generate_buffer)].items():
            if string.startswith(self._generate_buffer):
                population.append(string)
                counts.append(count)

        # sanity check: the extensions must account for every sighting of the context
        if sum(counts) != base_count:
            error_msg = f""" COUNTS DON'T MATCH
            buffer={self._generate_buffer}
            count={base_count}
            count_array={counts}, sum={sum(counts)}
            pop={population}
            """
            raise RuntimeError(error_msg)
        return random.choices(population=population, weights=counts, k=1)[0][-1]

    def generate_string(self, init: str, n: int = 500) -> Iterator[str]:
        """Yield up to n characters continuing init, stopping early at EOD."""
        if not init:
            raise ValueError("init must contain at least one character")
        self._generate_buffer = init[-self._window_size:]

        for _ in range(n):
            next_char = self._next_char()
            if next_char == self.EOD:
                break
            self._generate_buffer = (self._generate_buffer + next_char)[-self._window_size:]
            yield next_char

    def train(self, file_pointer: TextIO) -> None:
        for str_chunk in read_chunk(file_pointer):
            for char in str_chunk:
                self.update(char)
        self.finish_training()
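
The bookkeeping in `update` can be hard to picture from the class alone. The sketch below is a stand-alone approximation, not the class itself: `ngram_tables` is a hypothetical helper that builds the same kind of per-length count tables for a tiny string, assuming the buffer starts out filled with the end-of-document marker. It then checks the invariant that `_next_char` relies on: a context was seen exactly as many times as all of its one-character extensions combined.

```python
from collections import Counter

EOD = "\u0000"  # end-of-document marker, mirroring MarkovChain.EOD


def ngram_tables(text: str, window_size: int) -> list:
    """Hypothetical helper: tables[k] counts the (k+1)-character strings
    that end at each character of text and at the final EOD."""
    # Pad the front the way a null-filled buffer would, and append the EOD signal.
    padded = EOD * window_size + text + EOD
    tables = [Counter() for _ in range(window_size + 1)]
    for end in range(window_size + 1, len(padded) + 1):
        for length in range(1, window_size + 2):
            tables[length - 1][padded[end - length:end]] += 1
    return tables


tables = ngram_tables("abab", window_size=1)
for context in "ab":
    # All two-character strings that extend this one-character context.
    extensions = {s: c for s, c in tables[1].items() if s.startswith(context)}
    assert tables[0][context] == sum(extensions.values())
    print(repr(context), tables[0][context], extensions)
```

The invariant holds for every context that is followed by something, which is why appending the end-of-document marker after the last character matters: without it, the final character would be counted once more as a context than as a prefix.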

In [6]:
mc=MarkovChain(window_size=1)
with open("data/shakespeare.txt", mode="r", encoding="utf-8") as file_obj:
    mc.train(file_obj)

#"inference"
init_string = "B"
print(init_string + "".join(mc.generate_string(init_string, 1000)))

BELounsh mey Byemed,


The
Jur’

A.
Bout wil e fus lill:
Fed ireme heran iry  wse,
AN.
A s t ofu t sit ke,
ARCave!
VENUENERVare ayowe tht: s.
HALI A. he’d alepacye hagits t fowin, co? he
MLLOulllo JOuely bu ban an on, d I r PONEnch.
I he.
Han my; be y witheru aind capt dobur avends outo scerd wers, fiverambeepakee mse ginre choucateru stoithain y ithon hin tweriofer anve in st y adand rtode! aing  d avest d d ma-sapalisor id lurigoliry heshity pr, stlf fe l nth Don meber, is ontou  theilbeathe mas sey t d thond meer ad icear, g ayouthetedo LUSTESTaveau ild mit.

NUDER.
 kndd s tuth, akyrerod cof ace wis HExaks chin mooyoucishavedife wot MAnghe. blseer us t be tr’lerthe an, hof Sesouent, etht
Whe l inghy’d ee FIE.
OOLOI wirdid blo Wirs d inke.

Whe therowit Go tho HENoe st t yopis,
ANGAThin?
 he
Fr y HAFant cred anoul;

_Efoucond merde;
Hamioth n I upacteo k D.
Cadis ashef omag’ ’s;
Wh on-aslans baver
[_

Wet.

MBerr y uthee, er: he topetheph.

MAt e t fre to tay ELe,
Wandyotearlfay sus
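
Each character in the output above is drawn by `random.choices`, with the stored counts used directly as weights. As a rough illustration of what that means, the sketch below counts which characters follow a context in a toy string and turns the counts into probabilities. `next_char_weights` is a hypothetical helper written for this example and the toy string is made up; the seed is there only to make the sketch repeatable.

```python
import random
from collections import Counter


def next_char_weights(text: str, context: str, eod: str = "\u0000") -> Counter:
    """Hypothetical helper: count the character that follows each occurrence of context."""
    if not context:
        raise ValueError("context must contain at least one character")
    padded = text + eod  # the last character of the text is followed by the EOD marker
    followers = Counter()
    start = padded.find(context)
    while start != -1:
        followers[padded[start + len(context)]] += 1
        start = padded.find(context, start + 1)  # allow overlapping matches
    return followers


weights = next_char_weights("abracadabra", "a")
total = sum(weights.values())
probabilities = {char: count / total for char, count in weights.items()}
print(probabilities)

rng = random.Random(0)  # seeded only so the sketch is repeatable
sample = rng.choices(list(weights), weights=list(weights.values()), k=10)
print(sample)
```

Passing raw counts as `weights` is enough because `random.choices` normalizes them internally, so the explicit `probabilities` dictionary is only there for inspection.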

In [7]:
mc=MarkovChain(window_size=2)
with open("data/shakespeare.txt", mode="r", encoding="utf-8") as file_obj:
    mc.train(file_obj)

#"inference"
init_string = "B"
print(init_string + "".join(mc.generate_string(init_string, 1000)))

Beiveris a me in. I lay,
No, thall owents efold what to by. Hugh I whom brou theire the gry
Isand sive, theight tiet wherieft the alood pervat thanny.

Butwer Quess

Whow, your he in the we swel prospe,
And I, Cing have, spir; herve, shop, do, shour lis’t fer jad so the sinks coney beat region ofter ings awn werem.
The a rues and my theed I wed him. Wor th’ ause anumpled may now, and houryoust swillou aring ong, frow, hatir. Wou de dring fonor on wer whim is you ch noby they not,
Theeralove nake’s kill th frove earruch I deake and_.] Whou of ithe led, East firld vipe.
I lood a frolls, wer’d my dicess, sat age in yount, wity con well no wor he of to for he Scer in. Albou sh knother comed, ing Proseup air mus niancues an he slef, gre mist, a same me lain fichat ing inset sto frow wall ads, agess blostake he ar wited,
Trut ther ther hall ing a on
Ford.

  And the lathencay, th cour me mall, fild bed.
I here Kin may re.

[Exity Clam th th trusinge a shat thours, Gody arid morragenter, noth

In [8]:
mc=MarkovChain(window_size=3)
with open("data/shakespeare.txt", mode="r", encoding="utf-8") as file_obj:
    mc.train(file_obj)

#"inference"
init_string = "B"
print(init_string + "".join(mc.generate_string(init_string, 1000)))

BUS.
Go be a Some II. Thusks fool all there crown?

DES.
’Twer, eit, andings with to dividen death wards,
wheets his thin.

QUICK.
Tell, a gone the dearn of sugard
By thing agless.

GUILDEN.
Peach not.

VALENANDARD.
He in me at not do appy the and wool wide._]

IAGO.
Which and printo thundinions,
Thank you have dult,
Or get thinks fath you wit a lies
Fame,
And so it must wis devilinster Kather?

CHIRD SERVANS


 [_Exeunury sirrow
Thy come the shion stay stor, dolph.
Which you keep to you, and you: wer’s honought.

HELINE.
That supposes my he watch,
Then
his greed, all hou we heartious just was quet my unselence cour comeo more true.

PANDARUS.
On all hears,
And life thank your sould no of therefore, sistren, and enes chilst I know, and be made._]

Enter moisons
Stand it quitfullight I left play erry emple bridest.
Were in withat speakingbrows on your fromissue thing’s do he go?

QUEEN ELIA.
Thy coung fore him ther’s grief heath dothe Good maid
ends duty in the partethem, offect;
Forc’d

In [9]:
mc=MarkovChain(window_size=4)
with open("data/shakespeare.txt", mode="r", encoding="utf-8") as file_obj:
    mc.train(file_obj)

#"inference"
init_string = "B"
print(init_string + "".join(mc.generate_string(init_string, 1000)))

Be-mong break of later times meeting; I reasy nervant, as is shamed with thou evenge to these speak; for the back; and he shall dishop of cushioned with fair. Prince my saying; but a stol’n
The stood or and servant of
her innovators,
Whethere bound they well
But not wilt waves Bolings he wish humiliard, and let thou were army own ground, and meland.

[_Reads well him to me, my lord?

ROMEO.
What wenty cannot bloody month’s are love on here might. [_Sets of he will can I will, what likely sight
I’ll take which I did shed steeming? Such him!

VOLUMNIA.
Than alread of true ass,
Mean to him up they will
And husbandys.

HORTENSIO.
Sir Toby at how myself.
Two see.

FALSTAFF.
Poor wise.

RICHARD.
The swears
Came by his eye;
But not and knows,
Torrectly.
CHATILLIAM PAGE.
And wake: ’tis that new news end,
  Legg’d to Flore.
But I’ll that I would spirit
Think thy day; but thing that and both where afearl of a dividual fine eyes do throughter,
Will tears. Coming for serve Octavia._] I am too. Tha
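
The four runs differ only in `window_size`. One way to see why longer windows read more like English is to measure how many distinct characters can follow a context: the fewer there are, the more the chain is constrained by what it has already seen. The sketch below computes this for a short made-up sample string rather than for `data/shakespeare.txt`, so the numbers only illustrate the trend. `successor_stats` is a hypothetical helper, not part of the class above.

```python
from collections import defaultdict


def successor_stats(text: str, window_size: int) -> tuple:
    """Hypothetical helper: (number of distinct contexts, mean distinct successors).

    Assumes text is longer than window_size, so at least one context exists.
    """
    successors = defaultdict(set)
    for i in range(len(text) - window_size):
        context = text[i:i + window_size]
        successors[context].add(text[i + window_size])
    mean = sum(len(s) for s in successors.values()) / len(successors)
    return len(successors), mean


# Made-up sample text, standing in for data/shakespeare.txt.
sample = "the quick brown fox jumps over the lazy dog and the quick cat"
for k in range(1, 5):
    contexts, mean = successor_stats(sample, k)
    print(f"window_size={k}: {contexts} contexts, {mean:.2f} successors on average")
```

On a corpus this small the mean falls toward 1 very quickly, meaning a long window mostly replays the training text. A larger corpus presumably delays that effect, which would fit the `window_size=4` output still mixing fragments from different plays.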