## Probability and Coding

#### 1. Conditional Probability and Independence

1. **Probability** 

    $\displaystyle \LARGE \Pr(A)\quad \textrm{or} \quad\Pr(X=x)$<br><br>
    
2. **Conditional Probability** 

    $\displaystyle \Huge \Pr(\;A\,|\,B\;)\quad$ or $\quad\Pr(\; Y=y\,|\,X=x\;)$<br>
    
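    A simple chatbot can be specified by the conditional probability of the next word given the words (and possibly other context) that precede it, for example:

    1. **Markov**: $\Pr(\; W_{i+1}=w_{i+1}\,|\,W_i=w_i\;)$, conditioning on the current word only  
    2. **Bigram**: $\Pr(\; W_{i+2}=w_{i+2}\,|\, W_{i+1}=w_{i+1}, W_i=w_i\;)$, conditioning on the previous two words  
    3. **Trigram**: $\Pr(\; W_{i+3}=w_{i+3} \,|\, W_{i+2}=w_{i+2}, W_{i+1}=w_{i+1}, W_i=w_i\;)$, conditioning on the previous three words  
    4. **Context**: $\Pr(\; W_{i+2}=w_{i+2} \,|\, W_{i+1}=w_{i+1}, W_i=w_i, C=c\;)$, conditioning on the previous two words and extra context $C$, such as the speaking character

    The Bigram and Trigram names count only the conditioning words; standard $n$-gram terminology also counts the predicted word, so the Markov, Bigram, and Trigram models above are usually called bigram, trigram, and 4-gram models.<br><br>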

3. **Independence** 

    $\displaystyle \Huge \Pr(A)=\Pr(\;A\,|\,B\;)\quad$ or $\quad\Pr(Y=y) = \Pr(\; Y=y\,|\,X=x\;)$
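
As a worked example of these definitions, conditional probability can be computed by counting equally likely outcomes, $\Pr(A\,|\,B)=\Pr(A\cap B)/\Pr(B)$. The sketch below uses a hypothetical two-dice example (not from the original notes; `pr` and the event functions are illustrative names) to show one independent pair of events and one dependent pair.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def pr(event, given=lambda o: True):
    """Pr(event | given): share of the outcomes satisfying `given` that also satisfy `event`."""
    conditioning = [o for o in outcomes if given(o)]
    return Fraction(sum(event(o) for o in conditioning), len(conditioning))

def first_is_six(o):
    return o[0] == 6

def second_is_six(o):
    return o[1] == 6

def total_at_least_10(o):
    return o[0] + o[1] >= 10

# Independent: knowing the first die is a six does not change Pr(second is six)
print(pr(second_is_six), pr(second_is_six, given=first_is_six))          # 1/6 1/6
# Dependent: knowing the first die is a six raises Pr(total >= 10)
print(pr(total_at_least_10), pr(total_at_least_10, given=first_is_six))  # 1/6 1/2
```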

#### 2. Multinomial distributions

1. `from scipy import stats`
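2. `stats.multinomial(n=trials, p=probabilities).rvs(size=samples)`: `n` is the number of trials per sample, `len(p)` is the number of categories, and `size` is the number of samples drawn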
3. `import numpy as np` and `np.array()`
4. `np.random.seed(initialization)` and `np.random.choice(options, size=draws, replace=True, p=None)`
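
A short sketch of the calls listed above, with made-up probabilities. It relies on `scipy.stats` distributions drawing from NumPy's global random state when no `random_state` is passed, so a single `np.random.seed` call makes both draws repeatable.

```python
import numpy as np
from scipy import stats

np.random.seed(0)  # seed NumPy's global generator

p = [0.9, 0.05, 0.05]  # one probability per category, so 3 categories
draws = stats.multinomial(n=2, p=p).rvs(size=5)  # 5 samples of n=2 trials each
print(draws.shape)        # (5, 3): one row per sample, one count per category
print(draws.sum(axis=1))  # each row's counts add up to n=2

# Sample 10 labels with replacement, using the same probabilities
options = np.array(['a', 'b', 'c'])
picks = np.random.choice(options, size=10, replace=True, p=p)
print(picks)
```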

#### 3. Python string manipulation for a Markovian ChatBot

- `avatar.dtypes` and `df.col.str.upper()`
    - `.replace` and regular expressions (`import re`, "regexp") are demonstrated but will not be tested
- **Operator overloading** `+` and `.sum().split(' ')`
- `for i in range(n)` and `for x in lst` and `for i,x in enumerate(lst)`
- `list()` and `dict()`
- `if`/`else`
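
The string tools above combine into the one-liner used later to turn the script into a list of words. A toy sketch on two invented rows (column names match the avatar data; `toy` and `toy_words` are illustrative names, chosen so they do not overwrite the notebook's `words`). It also shows the difference between `Series.replace`, which matches whole values, and `Series.str.replace`, which substitutes inside each string.

```python
import pandas as pd

# Made-up rows with the same column names as the avatar data
toy = pd.DataFrame({'character': ['Katara', 'Scene Description'],
                    'full_text': ['Sokka, look!', 'Katara turns.']})

# + is overloaded: on Series of strings it concatenates element-wise
lines = "\n" + toy.character.str.upper().str.replace(' ', '.', regex=False) + ": " + toy.full_text + " "
corpus = lines.sum()            # .sum() concatenates all rows into one string
toy_words = corpus.split(' ')   # split on single spaces
print(toy_words)  # ['\nKATARA:', 'Sokka,', 'look!', '\nSCENE.DESCRIPTION:', 'Katara', 'turns.', '']

# Series.replace matches whole values; Series.str.replace works inside each string
print(toy.character.replace(' ', '.').tolist())                   # ['Katara', 'Scene Description']
print(toy.character.str.replace(' ', '.', regex=False).tolist())  # ['Katara', 'Scene.Description']

for i, w in enumerate(toy_words[:3]):
    print(i, repr(w))
```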


In [2]:
import pandas as pd
url = "https://raw.githubusercontent.com/KeithGalli/pandas/master/pokemon_data.csv"
# fails: https://github.com/KeithGalli/pandas/blob/master/pokemon_data.csv is an HTML page, not the raw CSV file
pokeaman = pd.read_csv(url)
pokeaman

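| | # | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 795 | 719 | Diancie | Rock | Fairy | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
| 796 | 719 | DiancieMega Diancie | Rock | Fairy | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
| 797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
| 798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |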


In [3]:
pokeaman.sort_values('Attack', ascending=False)

| | # | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 163 | 150 | MewtwoMega Mewtwo X | Psychic | Fighting | 106 | 190 | 100 | 154 | 100 | 130 | 1 | True |
| 232 | 214 | HeracrossMega Heracross | Bug | Fighting | 80 | 185 | 115 | 40 | 105 | 75 | 2 | False |
| 424 | 383 | GroudonPrimal Groudon | Ground | Fire | 100 | 180 | 160 | 150 | 90 | 90 | 3 | True |
| 426 | 384 | RayquazaMega Rayquaza | Dragon | Flying | 105 | 180 | 100 | 180 | 100 | 115 | 3 | True |
| 429 | 386 | DeoxysAttack Forme | Psychic | | 50 | 180 | 20 | 180 | 20 | 150 | 3 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 139 | 129 | Magikarp | Water | | 20 | 10 | 55 | 15 | 20 | 80 | 1 | False |
| 261 | 242 | Blissey | Normal | | 255 | 10 | 10 | 75 | 135 | 55 | 2 | False |
| 230 | 213 | Shuckle | Bug | Rock | 20 | 10 | 230 | 10 | 230 | 5 | 2 | False |
| 121 | 113 | Chansey | Normal | | 250 | 5 | 5 | 35 | 105 | 50 | 1 | False |


In [4]:
pokeaman[pokeaman['Legendary']]

| | # | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 156 | 144 | Articuno | Ice | Flying | 90 | 85 | 100 | 95 | 125 | 85 | 1 | True |
| 157 | 145 | Zapdos | Electric | Flying | 90 | 90 | 85 | 125 | 90 | 100 | 1 | True |
| 158 | 146 | Moltres | Fire | Flying | 90 | 100 | 90 | 125 | 85 | 90 | 1 | True |
| 162 | 150 | Mewtwo | Psychic | | 106 | 110 | 90 | 154 | 90 | 130 | 1 | True |
| 163 | 150 | MewtwoMega Mewtwo X | Psychic | Fighting | 106 | 190 | 100 | 154 | 100 | 130 | 1 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 795 | 719 | Diancie | Rock | Fairy | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
| 796 | 719 | DiancieMega Diancie | Rock | Fairy | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
| 797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
| 798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |


In [5]:
pokeaman.dtypes

#              int64
Name          object
Type 1        object
Type 2        object
HP             int64
Attack         int64
Defense        int64
Sp. Atk        int64
Sp. Def        int64
Speed          int64
Generation     int64
Legendary       bool
dtype: object

In [6]:
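# Only the last expression in a cell is displayed; the first two are kept for comparison.
# & and | bind more tightly than ==, so each comparison needs its own parentheses.
pokeaman[pokeaman['Type 2']=="Ghost"]                                    # Type 2 is Ghost
pokeaman[(pokeaman['Type 1']=="Ghost") & (pokeaman['Type 2']=="Ghost")]  # both types are Ghost (& = and)
pokeaman[(pokeaman['Type 1']=="Ghost") | (pokeaman['Type 2']=="Ghost")]  # either type is Ghost (| = or)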


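| | # | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 99 | 92 | Gastly | Ghost | Poison | 30 | 35 | 30 | 100 | 35 | 80 | 1 | False |
| 100 | 93 | Haunter | Ghost | Poison | 45 | 50 | 45 | 115 | 55 | 95 | 1 | False |
| 101 | 94 | Gengar | Ghost | Poison | 60 | 65 | 60 | 130 | 75 | 110 | 1 | False |
| 102 | 94 | GengarMega Gengar | Ghost | Poison | 60 | 65 | 80 | 170 | 95 | 130 | 1 | False |
| 215 | 200 | Misdreavus | Ghost | | 60 | 60 | 60 | 85 | 85 | 85 | 2 | False |
| 316 | 292 | Shedinja | Bug | Ghost | 1 | 90 | 45 | 30 | 30 | 40 | 3 | False |
| 326 | 302 | Sableye | Dark | Ghost | 50 | 75 | 75 | 65 | 65 | 50 | 3 | False |
| 327 | 302 | SableyeMega Sableye | Dark | Ghost | 50 | 85 | 125 | 85 | 115 | 20 | 3 | False |
| 385 | 353 | Shuppet | Ghost | | 44 | 75 | 35 | 63 | 33 | 45 | 3 | False |
| 386 | 354 | Banette | Ghost | | 64 | 115 | 65 | 83 | 63 | 65 | 3 | False |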


In [7]:
pokeaman[(pokeaman['Attack']<70) & (pokeaman['Defense']>120)]

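| | # | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 103 | 95 | Onix | Rock | Ground | 35 | 45 | 160 | 30 | 45 | 70 | 1 | False |
| 150 | 139 | Omastar | Rock | Water | 70 | 60 | 125 | 115 | 70 | 55 | 1 | False |
| 230 | 213 | Shuckle | Bug | Rock | 20 | 10 | 230 | 10 | 230 | 5 | 2 | False |
| 323 | 299 | Nosepass | Rock | | 30 | 45 | 135 | 45 | 90 | 30 | 3 | False |
| 456 | 411 | Bastiodon | Rock | Steel | 60 | 52 | 168 | 47 | 138 | 30 | 4 | False |
| 528 | 476 | Probopass | Rock | Steel | 60 | 55 | 145 | 75 | 150 | 40 | 4 | False |
| 591 | 531 | AudinoMega Audino | Normal | Fairy | 103 | 60 | 126 | 80 | 126 | 50 | 5 | False |
| 624 | 563 | Cofagrigus | Ghost | | 58 | 50 | 145 | 95 | 105 | 30 | 5 | False |
| 751 | 681 | AegislashShield Forme | Steel | Ghost | 60 | 50 | 150 | 50 | 150 | 60 | 6 | False |
| 773 | 703 | Carbink | Rock | Fairy | 50 | 50 | 150 | 50 | 150 | 50 | 6 | False |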


In [8]:
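logical_conditional = (pokeaman['Attack']<70) & (pokeaman['Defense']>120)
# Common selection patterns (only the last expression in the cell is displayed)
pokeaman[logical_conditional]                           # rows where the mask is True
pokeaman[['Name', 'Type 1']]                            # columns by name
pokeaman[4:10]                                          # rows by position 4..9 (end excluded)
pokeaman.iloc[4:10, 3:5]                                # rows 4..9, column positions 3 and 4 ('Type 2', 'HP')
pokeaman.loc[4:10, ['Name', 'Type 1']]                  # rows by index label 4..10 (end included), named columns
pokeaman[4:10][['Name', 'Type 1']]                      # chained: row slice, then columns
pokeaman.loc[logical_conditional, ['Name', 'Type 1']]   # masked rows, named columns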
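
The `loc`/`iloc` difference in the slices above is easy to miss: positional slices exclude the end, label slices include it. A small check on a made-up DataFrame (column names borrowed from the Pokémon data, values invented; `df` and `mask` are illustrative names):

```python
import pandas as pd

# Made-up data with the default RangeIndex 0..11
df = pd.DataFrame({'Name': list('abcdefghijkl'), 'Attack': range(12)})

print(len(df[4:10]))       # 6: positional slice, rows 4..9
print(len(df.iloc[4:10]))  # 6: positional slice, rows 4..9
print(len(df.loc[4:10]))   # 7: label slice, rows 4..10 inclusive

mask = (df['Attack'] > 8) & (df['Attack'] < 11)  # parentheses needed: & binds tighter than > and <
print(df.loc[mask, ['Name']])                     # rows 9 and 10, only the Name column
```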

In [3]:
import numpy as np
from scipy import stats
stats.multinomial(n=2, p=[.9, 0.05, 0.05]).rvs(size=1)

array([[2, 0, 0]])

In [10]:
import numpy as np
# IPython/Jupyter help syntax: shows the signature and docstring of np.random.choice
np.random.choice?

In [7]:
import pandas as pd
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-08-11/avatar.csv"
avatar = pd.read_csv(url)
# Missing values: avatar.isnull().sum() counts them per column; avatar[avatar.isnull().sum(axis=1)>0] lists rows with any
avatar[:10]

| | id | book | book_num | chapter | chapter_num | character | full_text | character_words | writer | director | imdb_rating |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Water | 1 | The Boy in the Iceberg | 1 | Katara | Water. Earth. Fire. Air. My grandmother used t... | Water. Earth. Fire. Air. My grandmother used t... | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 1 | 2 | Water | 1 | The Boy in the Iceberg | 1 | Scene Description | As the title card fades, the scene opens onto ... | | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 2 | 3 | Water | 1 | The Boy in the Iceberg | 1 | Sokka | It's not getting away from me this time. [Clos... | It's not getting away from me this time. Watc... | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 3 | 4 | Water | 1 | The Boy in the Iceberg | 1 | Scene Description | The shot pans quickly from the boy to Katara, ... | | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 4 | 5 | Water | 1 | The Boy in the Iceberg | 1 | Katara | [Happily surprised.] Sokka, look! | Sokka, look! | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 5 | 6 | Water | 1 | The Boy in the Iceberg | 1 | Sokka | [Close-up of Sokka; whispering.] Sshh! Katara,... | Sshh! Katara, you're going to scare it away. ... | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 6 | 7 | Water | 1 | The Boy in the Iceberg | 1 | Scene Description | Behind Sokka, Katara is still making circular ... | | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 7 | 8 | Water | 1 | The Boy in the Iceberg | 1 | Katara | [Struggling with the water that passes right i... | But, Sokka! I caught one! | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 8 | 9 | Water | 1 | The Boy in the Iceberg | 1 | Scene Description | The bubble containing her fish slowly drifts a... | | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 9 | 10 | Water | 1 | The Boy in the Iceberg | 1 | Katara | [Exclaims indignantly.] Hey! | Hey! | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |


In [9]:
print(((avatar['character'].str.upper() + ": " + avatar['full_text']+"\n\n").values[:4]).sum())

KATARA: Water. Earth. Fire. Air. My grandmother used to tell me stories about the old days: a time of peace when the Avatar kept balance between the Water Tribes, Earth Kingdom, Fire Nation and Air Nomads. But that all changed when the Fire Nation attacked. Only the Avatar mastered all four elements; only he could stop the ruthless firebenders. But when the world needed him most, he vanished. A hundred years have passed, and the Fire Nation is nearing victory in the war. Two years ago, my father and the men of my tribe journeyed to the Earth Kingdom to help fight against the Fire Nation, leaving me and my brother to look after our tribe. Some people believe that the Avatar was never reborn into the Air Nomads and that the cycle is broken, but I haven't lost hope. I still believe that, somehow, the Avatar will return to save the world.

SCENE DESCRIPTION: As the title card fades, the scene opens onto a shot of an icy sea before panning slowly to the left, revealing more towering iceberg

In [10]:
avatar.dtypes

id                   int64
book                object
book_num             int64
chapter             object
chapter_num          int64
character           object
full_text           object
character_words     object
writer              object
director            object
imdb_rating        float64
dtype: object

In [11]:
avatar.character.value_counts()#[:10]

character
Scene Description    3393
Aang                 1796
Sokka                1639
Katara               1437
Zuko                  776
                     ... 
The Hippo               1
Audience                1
Young Mai               1
Old woman               1
Katara and Sokka        1
Name: count, Length: 374, dtype: int64
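
`value_counts` turns the speaker column into counts; with `normalize=True` it returns relative frequencies, which can serve as estimates of $\Pr(C=c)$, the probability that a randomly chosen line belongs to character $c$. A toy sketch with invented rows (`speakers` stands in for `avatar.character`):

```python
import pandas as pd

# Toy speaker column (made-up rows); in the notebook this would be avatar.character
speakers = pd.Series(['Aang', 'Sokka', 'Aang', 'Katara', 'Aang', 'Sokka'])

counts = speakers.value_counts()                    # number of lines per speaker
pr_speaker = speakers.value_counts(normalize=True)  # relative frequency, an estimate of Pr(C=c)
print(counts)
print(pr_speaker)
```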

In [None]:
avatar.chapter.value_counts()#[:10]

In [14]:
# Alternative using only rows with spoken words: ("\n"+avatar.dropna().character.str.upper()+": "+avatar.dropna().character_words+" ").sum().split(' ')
# Note: Series.replace(' ','.') only replaces values that are exactly ' ', so "SCENE DESCRIPTION" stays two
# tokens here (as in the generated text below); a later cell uses .str.replace to substitute inside each string.
words = ("\n"+avatar.character.str.upper().replace(' ','.')+": "+avatar.full_text+" ").sum().split(' ')

In [15]:
words[:10]

['\nKATARA:',
 'Water.',
 'Earth.',
 'Fire.',
 'Air.',
 'My',
 'grandmother',
 'used',
 'to',
 'tell']

In [16]:
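# Count how often each word appears (word_used) and which words follow it, and how often (next_word).
# collections.defaultdict(int) and defaultdict(lambda: defaultdict(int)) would replace the if/else checks.
word_used = dict()
next_word = dict()
for i,word in enumerate(words[:-1]):   # the last word has no successor
    if word in word_used:
        word_used[word] += 1
    else:
        word_used[word] = 1
        next_word[word] = {}           # first time seen: start its table of successors

    if words[i+1] in next_word[word]:
        next_word[word][words[i+1]] += 1
    else:
        next_word[word][words[i+1]] = 1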
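
The same two tables can be built more compactly with `collections.Counter` over consecutive pairs. A sketch on a made-up sentence (`toy_words`, `toy_used`, `toy_next` are illustrative names, so the notebook's tables are not overwritten). It also checks that each word's successor counts add up to its `word_used` count, which is why dividing by `word_used` gives probabilities that sum to 1.

```python
from collections import Counter, defaultdict

toy_words = "the cat sat on the mat the end".split(' ')

# Each word's count, excluding the final word (it has no successor), as in the loop above
toy_used = Counter(toy_words[:-1])
# Count each (current word, next word) pair
pair_counts = Counter(zip(toy_words[:-1], toy_words[1:]))

toy_next = defaultdict(dict)
for (current, following), count in pair_counts.items():
    toy_next[current][following] = count

print(toy_next['the'])  # {'cat': 1, 'mat': 1, 'end': 1}
print(toy_used['the'])  # 3

# Successor counts sum to the word's count, so count / toy_used[word] is a probability distribution
print(all(sum(toy_next[w].values()) == toy_used[w] for w in toy_used))  # True
```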

In [20]:
word_used

{'\nKATARA:': 1437,
 'Water.': 3,
 'Earth.': 5,
 'Fire.': 6,
 'Air.': 3,
 'My': 143,
 'grandmother': 5,
 'used': 63,
 'to': 12764,
 'tell': 129,
 'me': 469,
 'stories': 13,
 'about': 439,
 'the': 18112,
 'old': 120,
 'days:': 1,
 'a': 7911,
 'time': 194,
 'of': 7711,
 'peace': 9,
 'when': 325,
 'Avatar': 410,
 'kept': 5,
 'balance': 25,
 'between': 111,
 'Water': 139,
 'Tribes,': 1,
 'Earth': 193,
 'Kingdom,': 7,
 'Fire': 753,
 'Nation': 368,
 'and': 8400,
 'Air': 70,
 'Nomads.': 4,
 'But': 334,
 'that': 1168,
 'all': 592,
 'changed': 18,
 'attacked.': 4,
 'Only': 26,
 'mastered': 11,
 'four': 94,
 'elements;': 1,
 'only': 244,
 'he': 1572,
 'could': 226,
 'stop': 140,
 'ruthless': 3,
 'firebenders.': 10,
 'world': 63,
 'needed': 15,
 'him': 976,
 'most,': 2,
 'vanished.': 3,
 'A': 399,
 'hundred': 60,
 'years': 78,
 'have': 802,
 'passed,': 4,
 'is': 2821,
 'nearing': 4,
 'victory': 6,
 'in': 3892,
 'war.': 20,
 'Two': 49,
 'ago,': 14,
 'my': 737,
 'father': 61,
 'men': 53,
 'tribe': 

In [18]:
next_word['used']

{'to': 40,
 'it': 5,
 'as': 2,
 'that': 1,
 'bending': 1,
 'in': 1,
 'for': 3,
 'this': 1,
 'the': 1,
 'by': 1,
 'its': 1,
 'earthbending': 1,
 'my': 1,
 'up': 1,
 'to.': 1,
 'firebending': 1,
 'fear': 1}

In [19]:
import numpy as np
from scipy import stats

In [22]:
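current_word = "\nKatara:".upper()   # start from Katara's speaker tag, "\nKATARA:"
print(current_word, end=' ')
for i in range(100):
    # Pr(next word | current word) = count(current, next) / count(current)
    probability_of_next_word = np.array(list(next_word[current_word].values()))/word_used[current_word]
    # a single multinomial draw with n=1 is a one-hot vector marking the chosen next word
    randomly_chosen_next_word = stats.multinomial(p=probability_of_next_word, n=1).rvs(size=1)[0,:]
    current_word = np.array(list(next_word[current_word].keys()))[1==randomly_chosen_next_word][0]
    print(current_word, end=' ')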


KATARA: [Turns back.] My name him, forming a bush where everything's great here. [Katara leaves his arms.] I can see the ring, addressing Jee are driven into the vines, as do some fur to a full-blown traitor when a podium with Zuko are you help you were? 
AZULA: Actually, I'm such I am. 
HAKODA: Aren't you then? 
SCENE DESCRIPTION: Cut to lick the desk at it with them as he uses airbending skills that tornado of Sokka, startling a gentle spirit, [Frontal shot of Aang as Aang and Aang walks away with his chest, while Actress Aang is out of the 
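
The cell above samples with `stats.multinomial(..., n=1)` and a boolean mask; `np.random.choice` from section 2 can draw the next word directly. A minimal sketch, assuming the counts are stored like `next_word`/`word_used` above. The function `generate` and the toy tables are hypothetical names, and unlike the cell above the function stops at a word with no recorded successor instead of raising a `KeyError`.

```python
import numpy as np

def generate(next_word, word_used, start, n_words):
    """Sample up to n_words successors of `start` from first-order transition counts."""
    current, output = start, [start]
    for _ in range(n_words):
        followers = next_word.get(current)
        if not followers:  # dead end: this word was never followed by anything
            break
        options = list(followers.keys())
        p = np.array(list(followers.values())) / word_used[current]
        current = str(np.random.choice(options, p=p))  # one draw, weighted by p
        output.append(current)
    return ' '.join(output)

# Hypothetical toy counts with the same shape as word_used / next_word
toy_next = {'a': {'b': 2, 'c': 1}, 'b': {'a': 1}, 'c': {'a': 1}}
toy_used = {'a': 3, 'b': 1, 'c': 1}

np.random.seed(1)
print(generate(toy_next, toy_used, 'a', 5))
```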

In [28]:
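import re
# Inside [stage directions], turn each ' ' into '_ ' so those words become separate tokens ending in '_'
avatar.full_text = avatar.full_text.apply(lambda string: re.sub(r'\[.*?\]', lambda match: match.group(0).replace(' ', '_ '), string))
# In scene descriptions, turn each ' ' into '- ' so those words become separate tokens ending in '-'
avatar.loc[avatar.character=='Scene Description','full_text'] = avatar.full_text[avatar.character=='Scene Description'].str.replace(' ', '- ')
# .str.replace substitutes inside each string, so the tag becomes the single token "\nSCENE.DESCRIPTION:"
words = ("\n"+avatar.character.str.upper().str.replace(' ','.')+": "+avatar.full_text+" ").sum().split(' ')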
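
The markers mean the same word is counted separately in dialogue, stage directions, and scene descriptions. A small sketch of the substitution on the opening of one line shown in the table above, plus a made-up string showing why the pattern uses the lazy `.*?`:

```python
import re

# Opening of one line shown in the avatar table above
line = "[Close-up of Sokka; whispering.] Sshh! Katara, you're going to scare it away."

# Same substitution as the cell above: inside [...], each ' ' becomes '_ '
marked = re.sub(r'\[.*?\]', lambda match: match.group(0).replace(' ', '_ '), line)
tokens = marked.split(' ')
print(tokens[:5])  # ['[Close-up_', 'of_', 'Sokka;_', 'whispering.]', 'Sshh!']

# The lazy .*? stops at the first ']'; a greedy .* would also mark the dialogue between two brackets
two = "[a b] x y [c d]"
print(re.sub(r'\[.*?\]', lambda m: m.group(0).replace(' ', '_ '), two))  # [a_ b] x y [c_ d]
print(re.sub(r'\[.*\]', lambda m: m.group(0).replace(' ', '_ '), two))   # [a_ b]_ x_ y_ [c_ d]

# Display cleanup used when printing generated text: strips the markers, and also real hyphens
print(' '.join(t.replace('_', '').replace('-', '') for t in tokens[:4]))  # [Closeup of Sokka; whispering.]
```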

In [24]:
avatar[:10]

| | id | book | book_num | chapter | chapter_num | character | full_text | character_words | writer | director | imdb_rating |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Water | 1 | The Boy in the Iceberg | 1 | Katara | Water. Earth. Fire. Air. My grandmother used t... | Water. Earth. Fire. Air. My grandmother used t... | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 1 | 2 | Water | 1 | The Boy in the Iceberg | 1 | Scene Description | As- the- title- card- fades,- the- scene- open... | | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 2 | 3 | Water | 1 | The Boy in the Iceberg | 1 | Sokka | It's not getting away from me this time. [Clos... | It's not getting away from me this time. Watc... | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 3 | 4 | Water | 1 | The Boy in the Iceberg | 1 | Scene Description | The- shot- pans- quickly- from- the- boy- to- ... | | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 4 | 5 | Water | 1 | The Boy in the Iceberg | 1 | Katara | [Happily_ surprised.] Sokka, look! | Sokka, look! | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 5 | 6 | Water | 1 | The Boy in the Iceberg | 1 | Sokka | [Close-up_ of_ Sokka;_ whispering.] Sshh! Kata... | Sshh! Katara, you're going to scare it away. ... | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 6 | 7 | Water | 1 | The Boy in the Iceberg | 1 | Scene Description | Behind- Sokka,- Katara- is- still- making- cir... | | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 7 | 8 | Water | 1 | The Boy in the Iceberg | 1 | Katara | [Struggling_ with_ the_ water_ that_ passes_ r... | But, Sokka! I caught one! | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 8 | 9 | Water | 1 | The Boy in the Iceberg | 1 | Scene Description | The- bubble- containing- her- fish- slowly- dr... | | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |
| 9 | 10 | Water | 1 | The Boy in the Iceberg | 1 | Katara | [Exclaims_ indignantly.] Hey! | Hey! | Michael Dante DiMartino, Bryan Konietzko, Aar... | Dave Filoni | 8.1 |


In [25]:
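from collections import defaultdict
# Context = the previous two words, stored as one space-joined key such as 'used to'
word_used2 = defaultdict(int)                        # count of each two-word context
next_word2 = defaultdict(lambda: defaultdict(int))   # counts of each word that follows a context
for i,word in enumerate(words[:-2]):
    word_used2[word+' '+words[i+1]] += 1
    next_word2[word+' '+words[i+1]][words[i+2]] += 1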
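
The two-word cell above and the three-word cell further down differ only in the context length. A sketch of one helper that builds the same space-joined keys for any context length `k`; the name `context_counts` and the toy data are hypothetical, not from the original notes.

```python
from collections import defaultdict

def context_counts(words, k):
    """Count each k-word context and the words that follow it.

    Keys are the k words joined by spaces, like the keys of word_used2 (k=2) and word_used3 (k=3).
    """
    used = defaultdict(int)
    following = defaultdict(lambda: defaultdict(int))
    for i in range(len(words) - k):  # same positions as enumerate(words[:-k])
        context = ' '.join(words[i:i + k])
        used[context] += 1
        following[context][words[i + k]] += 1
    return used, following

toy_words = "a b a b c".split(' ')
used, following = context_counts(toy_words, 2)
print(used['a b'], dict(following['a b']))  # 2 {'a': 1, 'c': 1}
```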

In [26]:
next_word2

defaultdict(<function __main__.<lambda>()>,
            {'\nKATARA: Water.': defaultdict(int, {'Earth.': 2}),
             'Water. Earth.': defaultdict(int, {'Fire.': 2}),
             'Earth. Fire.': defaultdict(int, {'Air.': 2}),
             'Fire. Air.': defaultdict(int, {'My': 1, 'Long': 1}),
             'Air. My': defaultdict(int, {'grandmother': 1}),
             'My grandmother': defaultdict(int, {'used': 1, 'gave': 1}),
             'grandmother used': defaultdict(int, {'to': 1}),
             'used to': defaultdict(int,
                         {'tell': 2,
                          'be': 7,
                          'always': 2,
                          'kind': 1,
                          'hang': 1,
                          'have': 1,
                          'come': 4,
                          'calling': 1,
                          'show': 1,
                          'be.': 2,
                          'each': 1,
                          'say': 2,
                  

In [29]:
current_word_1 = "\nKatara:".upper()
current_word_2 = "Water."
print(current_word_1, end=' ')
print(current_word_2, end=' ')
for i in range(100):
    context = current_word_1+' '+current_word_2   # the last two words generated
    probability_of_next_word = np.array(list(next_word2[context].values()))/word_used2[context]
    randomly_chosen_next_word = stats.multinomial(p=probability_of_next_word, n=1).rvs(size=1)[0,:]
    # slide the window: drop the oldest word, append the sampled one
    current_word_1,current_word_2 = current_word_2,np.array(list(next_word2[context].keys()))[1==randomly_chosen_next_word][0]
    # strip the '_' and '-' markers for display (this also drops real hyphens, e.g. "back-to-back")
    print(current_word_2.replace('_', '').replace('-', ''), end=' ')


KATARA: Water. Earth. Fire. Air. Long ago, the four elements talk is sounding like Avatar stuff. 
IROH: It is an ancient city down there. But it's deep. 
SCENE.DESCRIPTION: She shoots some of Jet's people are sleeping on Appa's back. 
KATARA: You have no idea how to deal with things like canyon crawlers. 
SOKKA: [Taking out his boomerang, turning backtoback with his and laying his other hand supportively behind her back.] Yeah, okay Gran. [He smiles weakly at Katara.] I think about what's really important. I realized– 
AANG: [Interrupting Sokka.] Hey Katara, check this out. 
SCENE.DESCRIPTION: The episode opens to a view of 

In [30]:
# Same counts as above, with a three-word context
word_used3 = defaultdict(int)
next_word3 = defaultdict(lambda: defaultdict(int))
for i,word in enumerate(words[:-3]):
    word_used3[word+' '+words[i+1]+' '+words[i+2]] += 1
    next_word3[word+' '+words[i+1]+' '+words[i+2]][words[i+3]] += 1

In [31]:
current_word_1 = "\nKatara:".upper()
current_word_2 = "Water."
current_word_3 = "Earth."
print(current_word_1, end=' ')
print(current_word_2, end=' ')
print(current_word_3, end=' ')
for i in range(100):
    context = current_word_1+' '+current_word_2+' '+current_word_3   # the last three words generated
    probability_of_next_word = np.array(list(next_word3[context].values()))/word_used3[context]
    randomly_chosen_next_word = stats.multinomial(p=probability_of_next_word, n=1).rvs(size=1)[0,:]
    current_word_1,current_word_2,current_word_3 = current_word_2,current_word_3,np.array(list(next_word3[context].keys()))[1==randomly_chosen_next_word][0]
    print(current_word_3.replace('_', '').replace('-', ''), end=' ')


KATARA: Water. Earth. Fire. Air. My grandmother used to tell me what to do! 
MOMO: Oh, you don't like Aang, but we owe him and  
SOKKA: [Cutting her off.] Katara! [Slightly annoyed.] Are you gonna take that? 
MAI: [Aerial view of Sokka over the shoulder of Piandao.] You showed something beyond that. [Unsheathes the sword, showing its black blade. Cut to sideview as Momo runs off and the camera flashes to Aang sitting comfortably on the animal's head.] 
AANG: [To Appa.] We're home, buddy! We're home. [His eyes squint a bit in happiness.] 
SCENE.DESCRIPTION: The scene changes to nightfall at Fong's base. 

In [32]:
from collections import Counter, defaultdict
# Speaker tags as they appear in `words`, e.g. '\nKATARA:' and '\nSCENE.DESCRIPTION:'
characters = Counter("\n"+avatar.character.str.upper().str.replace(' ','.')+":")

# Per-character counts: word_used2C[character][pair] and next_word2C[character][pair][following]
word_used2C = defaultdict(lambda: defaultdict(int))
next_word2C = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

for i,word in enumerate(words[:-2]):
    if word in characters:   # a speaker tag starts a new line; attribute what follows to that speaker
        character = word
    pair = word+' '+words[i+1]
    word_used2C[character][pair] += 1
    next_word2C[character][pair][words[i+2]] += 1
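
Written as a formula, the counts in the cell above estimate a speaker-conditioned probability. Here $c$ is the speaker tag at or before $w_i$, and $N_c(\cdot)$ counts a word sequence among the words attributed to $c$:

$$
\Pr(\;W_{i+2}=w_{i+2}\,|\,W_{i+1}=w_{i+1}, W_i=w_i, C=c\;) \approx \frac{N_c(w_i\,w_{i+1}\,w_{i+2})}{N_c(w_i\,w_{i+1})}
$$

In code, the numerator is `next_word2C[c][pair][w]` and the denominator is `word_used2C[c][pair]`, where `pair` is $w_i$ and $w_{i+1}$ joined by a space and `w` is $w_{i+2}$.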
        
        

In [33]:
current_word_1 = "\nKatara:".upper()
current_word_2 = "Water."
print(current_word_1, end=' ')
print(current_word_2, end=' ')
for i in range(100):
    if current_word_1 in characters:   # a new speaker tag switches which character's counts are used
        character = current_word_1

    context = current_word_1+' '+current_word_2
    probability_of_next_word = np.array(list(next_word2C[character][context].values()))/word_used2C[character][context]
    randomly_chosen_next_word = stats.multinomial(p=probability_of_next_word, n=1).rvs(size=1)[0,:]
    current_word_1,current_word_2 = current_word_2,np.array(list(next_word2C[character][context].keys()))[1==randomly_chosen_next_word][0]
    print(current_word_2.replace('_', '').replace('-', ''), end=' ')


KATARA: Water. Earth. Fire. Air. Long ago, the four nations lived together in harmony. Then, everything changed when the world needs you now. You give people hope. 
SCENE.DESCRIPTION: Scene fades to another table, which is the second; outside the trench as villagers continue chanting while the rest of the top of the broken window at the ship. Iroh enters Zuko's room. 
URSA: Your father would never do what to you? What is wrong with that child? 
SCENE.DESCRIPTION: The spirit releases a burst of energy at him. Aang gets on its branches. 
SUKI: [Peeks her head around the door to look at the 