In [7]:
# Requires Python 3.9 or later: the code uses built-in generic annotations such as list[str].
# This cell only defines `corpus`, an excerpt of the text used to train the model; there is no logic to read here.

corpus = """At the first God made the heaven and the earth.
And the earth was waste and without form; and it was dark on the face of the deep: and the Spirit of God was moving on the face of the waters.
And God said, Let there be light: and there was light.
And God, looking on the light, saw that it was good: and God made a division between the light and the dark,
Naming the light, Day, and the dark, Night. And there was evening and there was morning, the first day.
And God said, Let there be a solid arch stretching over the waters, parting the waters from the waters.
And God made the arch for a division between the waters which were under the arch and those which were over it: and it was so.
And God gave the arch the name of Heaven. And there was evening and there was morning, the second day.
And God said, Let the waters under the heaven come together in one place, and let the dry land be seen: and it was so.
And God gave the dry land the name of Earth; and the waters together in their place were named Seas: and God saw that it was good.
And God said, Let grass come up on the earth, and plants producing seed, and fruit-trees giving fruit, in which is their seed, after their sort: and it was so.
And grass came up on the earth, and every plant producing seed of its sort, and every tree producing fruit, in which is its seed, of its sort: and God saw that it was good.
And there was evening and there was morning, the third day.
And God said, Let there be lights in the arch of heaven, for a division between the day and the night, and let them be for signs, and for marking the changes of the year, and for days and for years:
And let them be for lights in the arch of heaven to give light on the earth: and it was so.
And God made the two great lights: the greater light to be the ruler of the day, and the smaller light to be the ruler of the night: and he made the stars.
And God put them in the arch of heaven, to give light on the earth;
To have rule over the day and the night, and for a division between the light and the dark: and God saw that it was good.
And there was evening and there was morning, the fourth day.
And God said, Let the waters be full of living things, and let birds be in flight over the earth under the arch of heaven.
And God made great sea-beasts, and every sort of living and moving thing with which the waters were full, and every sort of winged bird: and God saw that it was good.
And God gave them his blessing, saying, Be fertile and have increase, making all the waters of the seas full, and let the birds be increased in the earth.
And there was evening and there was morning, the fifth day.
And God said, Let the earth give birth to all sorts of living things, cattle and all things moving on the earth, and beasts of the earth after their sort: and it was so.
And God made the beast of the earth after its sort, and the cattle after their sort, and everything moving on the face of the earth after its sort: and God saw that it was good.
And God said, Let us make man in our image, like us: and let him have rule over the fish of the sea and over the birds of the air and over the cattle and over all the earth and over every living thing which goes flat on the earth.
And God made man in his image, in the image of God he made him: male and female he made them.
And God gave them his blessing and said to them, Be fertile and have increase, and make the earth full and be masters of it; be rulers over the fish of the sea and over the birds of the air and over every living thing moving on the earth.
And God said, See, I have given you every plant producing seed, on the face of all the earth, and every tree which has fruit producing seed: they will be for your food:"""

In [8]:
import random
from typing import Iterator
from collections import defaultdict, Counter


def bigrams(l: list[str]) -> Iterator[list[str]]:
    """
    Yield each pair of adjacent tokens in `l` as a two-element list.
    """
    if not isinstance(l, list):
        raise TypeError(f"expected list, found {type(l).__name__}")
    # [["i", "am"], ["am", "a"]...]
    for i in range(len(l) - 1):
        yield list(l[i : i + 2])
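
To make the shape of the generator's output concrete, here is a minimal, self-contained sketch on a four-word toy sentence. The helper name `pairs` is hypothetical; it builds the same adjacent pairs with `zip` rather than slicing, purely for illustration.

```python
def pairs(words: list[str]) -> list[list[str]]:
    """Hypothetical helper: adjacent word pairs, built with zip instead of slicing."""
    return [[a, b] for a, b in zip(words, words[1:])]


toy = "i am a student".split()
toy_bigrams = pairs(toy)
print(toy_bigrams)  # [['i', 'am'], ['am', 'a'], ['a', 'student']]
```

A list of n tokens yields n - 1 pairs, so a single-token input produces no bigrams at all.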

In [9]:
class ConditionalProbDist:
    def __init__(self, bigrams: Iterator[list[str]]):
        self.bigrams = list(bigrams)
        
        # conditions
        self.k = set()
        # values
        self.v = set()

        # conditional counts, normalized into probabilities below
        # d -> dict[str, Counter[str]], e.g. d["am"]["a"] ends up as p(a | am)
        self.d = defaultdict(Counter)
        # c[x]: number of bigrams whose first word is x (the normalizer)
        c = defaultdict(int)

        # example token stream: i am a student am player am student am a
        # first bigram starting with "am": k = am, v = a
        for k, v in self.bigrams:
            # {"i": {"am": 1}, "am": {"a": 2, "student": 1, "player": 1}, ...}
            self.d[k].update([v])
            # i: 1, am: 4
            c[k] += 1

            # record every condition (x) and value (y) seen,
            # so conditions() and values() can report them
            self.v.add(v)
            self.k.add(k)

        # p(y | x)
        # p(a | am) = 2 / 4 = 0.5
        for i in self.d:
            for j in self.d[i]:
                self.d[i][j] /= c[i]
                
    def conditions(self) -> set[str]:
        """return all x in p(y | x)"""
        return self.k

    def values(self) -> set[str]:
        """return all y in p(y | x)"""
        return self.v

    def prob(self) -> defaultdict[str, Counter[str]]:
        """return the full table of conditional probabilities p(y | x)"""
        return self.d

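    def predict(self, n: int, start: str = "and") -> list[str]:
        """Greedily generate up to n words, always taking the most
        probable successor of the current word."""
        c = start
        w = []
        for _ in range(n):
            # top = [('the', 0.4)], or [] when c has no known successor
            top = self.d.get(c, Counter()).most_common(1)
            if not top:
                # dead end: no bigram starts with c, so stop early
                break
            print(top)
            w.append(top[0][0])
            c = w[-1]

        return w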
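
The toy sentence from the comments in `__init__` can be worked through by hand. The sketch below is self-contained and does not use the class; the names `counts` and `probs` are hypothetical. It counts successors and then divides each row by its total, which is the same estimate of p(y | x).

```python
from collections import Counter, defaultdict

toy = "i am a student am player am student am a".split()

# Count how often each word y follows each condition word x.
counts = defaultdict(Counter)
for x, y in zip(toy, toy[1:]):
    counts[x][y] += 1

# Normalize each row by its total to get p(y | x).
probs = {
    x: {y: n / sum(followers.values()) for y, n in followers.items()}
    for x, followers in counts.items()
}

print(probs["am"])  # {'a': 0.5, 'player': 0.25, 'student': 0.25}
```

Four bigrams start with "am" and two of them continue with "a", hence p(a | am) = 2 / 4 = 0.5. Each row of `probs` sums to 1.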

In [10]:
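def main() -> None:
    # corpus.lower().split() gives tokens such as ["at", "the", "first", ...];
    # punctuation stays attached to its word, e.g. "said,"
    x = ConditionalProbDist(bigrams(corpus.lower().split()))
    # print(x.prob())
    print(x.predict(5))
    # print(x.conditions())
    # print(x.values())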

In [11]:
main()

[('god', 0.27472527472527475)]
[('said,', 0.32142857142857145)]
[('let', 0.8888888888888888)]
[('the', 0.35714285714285715)]
[('earth', 0.09090909090909091)]
['god', 'said,', 'let', 'the', 'earth']
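
`predict` always takes the single most probable successor, so the sequence above is the same on every run. One possible variation, sketched here as an assumption rather than as part of the notebook's model, is to sample each next word in proportion to its conditional probability. The function `sample_words` and the hand-written `table` are hypothetical; `table` has the same shape as the dictionary returned by `prob()`.

```python
import random
from typing import Optional


def sample_words(table: dict, start: str, n: int, seed: Optional[int] = None) -> list[str]:
    """Hypothetical sampler: draw up to n words, each weighted by p(next | current)."""
    rng = random.Random(seed)
    current = start
    out = []
    for _ in range(n):
        followers = table.get(current)
        if not followers:
            break  # no bigram starts with this word
        words = list(followers)
        weights = [followers[w] for w in words]
        current = rng.choices(words, weights=weights, k=1)[0]
        out.append(current)
    return out


# Toy probability table (assumed values, not taken from the corpus).
table = {
    "am": {"a": 0.5, "player": 0.25, "student": 0.25},
    "a": {"student": 1.0},
    "student": {"am": 1.0},
    "player": {"am": 1.0},
}
sample = sample_words(table, "am", 5, seed=0)
print(sample)
```

Passing a seed makes a run repeatable; leaving it as `None` gives a different sequence each time, at the cost of sometimes choosing less likely words.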
