
In [None]:
import random
from collections import defaultdict
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [None]:
# Sample text corpus
text = "I love deep learning. I love machine learning. Machine learning is amazing."

# Tokenize the text into words
tokens = nltk.word_tokenize(text.lower())

tokens

['i',
 'love',
 'deep',
 'learning',
 '.',
 'i',
 'love',
 'machine',
 'learning',
 '.',
 'machine',
 'learning',
 'is',
 'amazing',
 '.']

In [None]:
bigram_counts = defaultdict(lambda: defaultdict(int))
bigram_counts

defaultdict(<function __main__.<lambda>()>, {})
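
The nested defaultdict creates an inner counter the first time a word is seen, so counts can be incremented without checking whether either key exists. A minimal sketch of that behavior follows; the name `demo_counts` is hypothetical and used only for illustration.

```python
from collections import defaultdict

# Hypothetical demo table, built the same way as bigram_counts.
demo_counts = defaultdict(lambda: defaultdict(int))

# Incrementing an unseen pair works: both levels are created on demand.
demo_counts["machine"]["learning"] += 1
demo_counts["machine"]["learning"] += 1

# Caveat: merely reading a missing key also inserts it.
unseen = demo_counts["deep"]["learning"]  # 0, but "deep" is now a key

print(dict(demo_counts["machine"]))  # {'learning': 2}
print(unseen)                        # 0
print(sorted(demo_counts))           # ['deep', 'machine']
```

Because lookups insert keys, membership tests such as `"deep" in demo_counts` are only meaningful before any speculative reads.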

In [None]:
for w1, w2 in zip(tokens[:-1], tokens[1:]):
    bigram_counts[w1][w2] += 1

# Let's check what we built
bigram_counts

defaultdict(<function __main__.<lambda>()>,
            {'i': defaultdict(int, {'love': 2}),
             'love': defaultdict(int, {'deep': 1, 'machine': 1}),
             'deep': defaultdict(int, {'learning': 1}),
             'learning': defaultdict(int, {'.': 2, 'is': 1}),
             '.': defaultdict(int, {'i': 1, 'machine': 1}),
             'machine': defaultdict(int, {'learning': 2}),
             'is': defaultdict(int, {'amazing': 1}),
             'amazing': defaultdict(int, {'.': 1})})
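
As a cross-check on the counting loop, the same tallies can be kept in a flat table keyed by (w1, w2) pairs. This is a sketch using only the standard library; the name `pair_counts` is an assumption, and the token list is written out by hand from the three-sentence sample so the cell runs on its own.

```python
from collections import Counter

# Tokens of the sample corpus, copied from the tokenizer output above.
tokens = ['i', 'love', 'deep', 'learning', '.',
          'i', 'love', 'machine', 'learning', '.',
          'machine', 'learning', 'is', 'amazing', '.']

# zip pairs each token with its successor; Counter tallies every pair.
pair_counts = Counter(zip(tokens, tokens[1:]))

print(pair_counts[('i', 'love')])            # 2
print(pair_counts[('machine', 'learning')])  # 2
print(pair_counts[('learning', '.')])        # 2
print(sum(pair_counts.values()))             # 14 bigrams from 15 tokens
```

The nested layout used in this notebook is more convenient for generation, since all successors of a word sit in one inner dictionary.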

In [None]:
# Convert counts to probabilities (maximum likelihood estimate):
# P(w2 | w1) = count(w1, w2) / total count of words following w1
bigram_probs = {w1: {w2: count / sum(next_words.values())
                      for w2, count in next_words.items()}
                for w1, next_words in bigram_counts.items()}

bigram_probs

{'i': {'love': 1.0},
 'love': {'deep': 0.5, 'machine': 0.5},
 'deep': {'learning': 1.0},
 'learning': {'.': 0.6666666666666666, 'is': 0.3333333333333333},
 '.': {'i': 0.5, 'machine': 0.5},
 'machine': {'learning': 1.0},
 'is': {'amazing': 1.0},
 'amazing': {'.': 1.0}}
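
Each inner dictionary should be a proper probability distribution, that is, its values should sum to 1. The sketch below rebuilds the table for the three-sentence sample and checks this. The names `counts` and `probs` are hypothetical stand-ins for `bigram_counts` and `bigram_probs`, and the denominator is computed once per word rather than once per pair.

```python
import math
from collections import defaultdict

# Tokens of the sample corpus, written out so the sketch is self-contained.
tokens = ['i', 'love', 'deep', 'learning', '.',
          'i', 'love', 'machine', 'learning', '.',
          'machine', 'learning', 'is', 'amazing', '.']

counts = defaultdict(lambda: defaultdict(int))
for w1, w2 in zip(tokens, tokens[1:]):
    counts[w1][w2] += 1

probs = {}
for w1, next_words in counts.items():
    total = sum(next_words.values())  # denominator: all words seen after w1
    probs[w1] = {w2: c / total for w2, c in next_words.items()}

# Every row must sum to 1 (up to floating-point rounding).
for w1, dist in probs.items():
    assert math.isclose(sum(dist.values()), 1.0), w1

print(probs['learning'])  # {'.': 0.6666666666666666, 'is': 0.3333333333333333}
```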

In [None]:
# Helper function to generate text using the bigram model
def generate_text(start_word, num_words):
    """Sample up to num_words tokens, starting from start_word."""
    sentence = [start_word]
    for _ in range(num_words - 1):
        next_word_options = bigram_probs.get(sentence[-1])
        if not next_word_options:  # No known next word: stop generation
            break
        # random.choices returns a list, so take its single element
        next_word = random.choices(list(next_word_options.keys()),
                                   weights=list(next_word_options.values()))[0]
        sentence.append(next_word)
    return " ".join(sentence)

In [None]:
# Example: Generate a sentence starting with "love"
print(generate_text("love", 12))

love machine learning . machine learning . python . machine learning .
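
Because generation samples at random, each run prints a different sentence. One way to make runs repeatable is to draw from a private, seeded generator. This is a sketch only: `generate_seeded` and its `seed` parameter are hypothetical additions, and `probs` is the probability table for the three-sentence sample, written out by hand.

```python
import random

# Bigram probabilities for the three-sentence sample corpus.
probs = {
    'i': {'love': 1.0},
    'love': {'deep': 0.5, 'machine': 0.5},
    'deep': {'learning': 1.0},
    'learning': {'.': 2 / 3, 'is': 1 / 3},
    '.': {'i': 0.5, 'machine': 0.5},
    'machine': {'learning': 1.0},
    'is': {'amazing': 1.0},
    'amazing': {'.': 1.0},
}

def generate_seeded(start_word, num_words, seed):
    """Hypothetical variant of generate_text that takes an explicit seed."""
    rng = random.Random(seed)  # private generator; global random state untouched
    words = [start_word]
    for _ in range(num_words - 1):
        options = probs.get(words[-1])
        if not options:  # no known successor: stop early
            break
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        words.append(nxt)
    return " ".join(words)

first = generate_seeded("love", 12, seed=0)
second = generate_seeded("love", 12, seed=0)
print(first == second)  # True
```

Every consecutive pair in the generated text is a bigram that occurred in the corpus, since only observed successors have nonzero weight.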


Training on a news article

In [None]:
text= '''CAPE CANAVERAL, Fla. (AP) — Stuck in space no more, NASA astronauts Butch Wilmore and Suni Williams returned to Earth on Tuesday, hitching a different ride home to close out a saga that began with a bungled test flight more than nine months ago.

Their SpaceX capsule parachuted into the Gulf of Mexico in the early evening, just hours after departing the International Space Station. Splashdown occurred off the coast of Tallahassee in the Florida Panhandle, bringing their unplanned odyssey to an end.

Within an hour, the astronauts were out of their capsule, waving and smiling at the cameras while being hustled away in reclining stretchers for routine medical checks.

It all started with a flawed Boeing test flight last spring.

The two expected to be gone just a week or so after launching on Boeing’s new Starliner crew capsule on June 5. So many problems cropped up on the way to the space station that NASA eventually sent Starliner back empty and transferred the test pilots to SpaceX, pushing their homecoming into February. Then SpaceX capsule issues added another month’s delay.

Sunday’s arrival of their relief crew meant Wilmore and Williams could finally leave. NASA cut them loose a little early, given the iffy weather forecast later this week. They checked out with NASA’s Nick Hague and Russia’s Alexander Gorbunov, who arrived in their own SpaceX capsule last fall with two empty seats reserved for the Starliner duo.

Wilmore and Williams ended up spending 286 days in space — 278 days longer than anticipated when they launched. They circled Earth 4,576 times and traveled 121 million miles (195 million kilometers) by the time of splashdown.

“On behalf of SpaceX, welcome home,” radioed SpaceX Mission Control in California.

“What a ride,” replied Hague, the capsule’s commander. “I see a capsule full of grins ear to ear.”

Dolphins circled the capsule as divers readied it for hoisting onto the recovery ship. Once safely on board, the side hatch was opened and the astronauts were helped out, one by one. Williams was next-to-last out, followed by Wilmore who gave two gloved thumbs-up.

Wilmore and Williams’ plight captured the world’s attention, giving new meaning to the phrase “stuck at work” and turning “Butch and Suni” into household names. While other astronauts had logged longer spaceflights over the decades, none had to deal with so much uncertainty or see the length of their mission expand by so much.

Wilmore and Williams quickly transitioned from guests to full-fledged station crew members, conducting experiments, fixing equipment and even spacewalking together. With 62 hours over nine spacewalks, Williams set a record: the most time spent spacewalking over a career among female astronauts.

Both had lived on the orbiting lab before and knew the ropes, and brushed up on their station training before rocketing away. Williams became the station’s commander three months into their stay and held the post until earlier this month.

Their mission took an unexpected twist in late January when President Donald Trump asked SpaceX founder Elon Musk to accelerate the astronauts’ return and blamed the delay on the Biden administration. The replacement crew’s brand new SpaceX capsule still wasn’t ready to fly, so SpaceX subbed it with a used one, hurrying things along by at least a few weeks.

After splashdown, Musk offered his congratulations via X. NASA’s Joel Montalbano said the space agency was already looking at various options when Trump made his call to hurry the astronauts home.

Even in the middle of the political storm, Wilmore and Williams continued to maintain an even keel at public appearances from orbit, casting no blame and insisting they supported NASA’s decisions from the start.

NASA hired SpaceX and Boeing after the shuttle program ended, in order to have two competing U.S. companies for transporting astronauts to and from the space station until it’s abandoned in 2030 and steered to a fiery reentry. By then, it will have been up there more than three decades; the plan is to replace it with privately run stations so NASA can focus on moon and Mars expeditions.

“This has been nine months in the making, and I couldn’t be prouder of our team’s versatility, our team’s ability to adapt and really build for the future of human spaceflight,” NASA’s commercial crew program manager Steve Stich said.

With Starliner still under engineering investigation, SpaceX will launch the next crew for NASA as soon as July. Stich said NASA will have until summer to decide whether the crew after that one will be flown by SpaceX or Boeing — or whether Boeing will have to prove itself by flying cargo before people again.

Both retired Navy captains, Wilmore and Williams stressed they didn’t mind spending more time in space — a prolonged deployment reminiscent of their military days. But they acknowledged it was tough on their families.

Wilmore, 62, missed most of his younger daughter’s senior year of high school; his older daughter is in college. Williams, 59, had to settle for internet calls from space to her husband, mother and other relatives.

“We have not been worried about her because she has been in good spirits,” said Falguni Pandya, who is married to Williams’ cousin. “She was definitely ready to come home.”

Prayers for Williams and Wilmore were offered up at 21 Hindu temples in the U.S. in the months leading up to their return, said organizer Tejal Shah, president of World Hindu Council of America. Williams has spoken frequently about her Indian and Slovenian heritage. Prayers for their safe return also came from Wilmore’s Baptist church in Houston, where he serves as an elder.

Crowds in Jhulasan, the ancestral home of Williams’ father, danced and celebrated in a temple and performed rituals during the homecoming.

After returning in the gulf — Trump in January signed an executive order renaming the body of water Gulf of America — Wilmore and Williams will have to wait until they’re off the SpaceX recovery ship and flown to Houston before reuniting with their loved ones. The three NASA astronauts will be checked out by flight surgeons as they adjust to gravity, officials said, and should be allowed to go home after a day or two.


'''

tokens = nltk.word_tokenize(text.lower())

In [None]:
bigram_counts = defaultdict(lambda: defaultdict(int))
bigram_counts

defaultdict(<function __main__.<lambda>()>, {})

In [None]:
# Count bigrams in the article
for w1, w2 in zip(tokens[:-1], tokens[1:]):
    bigram_counts[w1][w2] += 1

# Convert counts to probabilities (maximum likelihood estimate)
bigram_probs = {w1: {w2: count / sum(next_words.values())
                     for w2, count in next_words.items()}
                for w1, next_words in bigram_counts.items()}

def generate_text(start_word, num_of_words):
    sentence = [start_word]
    for _ in range(num_of_words - 1):
        next_word_options = bigram_probs.get(sentence[-1], None)
        if not next_word_options:  # No known next word: stop generation
            break
        next_word = random.choices(list(next_word_options.keys()),
                                   weights=list(next_word_options.values()))[0]
        sentence.append(next_word)
    return " ".join(sentence)

In [None]:
print(generate_text("nasa", 30))

nasa ’ s ability to maintain an executive order renaming the ancestral home , wilmore and williams was already looking at least a ride home . wilmore and williams became
