In [None]:
# Always run this code.
%config InteractiveShell.ast_node_interactivity="none"
import sys
if 'google.colab' in sys.modules:
  !pip install --force-reinstall git+https://github.com/jamcoders/jamcoders-public-2025.git --quiet
from jamcoders.base_utils import *
from jamcoders.week4.labw4d2b import *
import random

# Week 4, Day 2B: Bigram Graph Traversals

In today's lab, we will use a graph that represents a bigram model, where each node is a word, and each edge shows the probability of going from one word to the next.

In an unweighted, directed graph, an adjacency list might look like this, where `adj_list[i]` lists the nodes that node `i` has an edge to:

```python
adj_list = [
    [2, 3],    # Node 0 connects to nodes 2 and 3
    [3, 4],    # Node 1 connects to nodes 3 and 4
    [4],       # Node 2 connects to node 4
    [0, 4],    # Node 3 connects to nodes 0 and 4
    []         # Node 4 connects to no one (dead end)
]
```
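To see how an adjacency list is used, here is a small sketch on a version of the list above in which every neighbor is a valid node index (node 3 points to nodes 0 and 4). The helper names `count_edges` and `has_edge` are our own and are not part of `jamcoders`. Each number inside an inner list is one directed edge, so counting edges means adding up the lengths of the inner lists.

```python
# adj_list[i] holds the nodes that node i has a directed edge to.
adj_list = [
    [2, 3],  # node 0 -> 2, 3
    [3, 4],  # node 1 -> 3, 4
    [4],     # node 2 -> 4
    [0, 4],  # node 3 -> 0, 4
    [],      # node 4 is a dead end
]

def count_edges(adj_list):
    """Each entry in each inner list is one directed edge."""
    return sum(len(neighbors) for neighbors in adj_list)

def has_edge(adj_list, u, v):
    """True if there is a directed edge u -> v."""
    return v in adj_list[u]

print(count_edges(adj_list))     # 7
print(has_edge(adj_list, 0, 2))  # True
print(has_edge(adj_list, 2, 0))  # False: edges only go one way
print(adj_list[4])               # []: no outgoing edges
```

Because the graph is directed, an edge from 0 to 2 does not imply an edge from 2 to 0.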

In a bigram model, knowing where you can go is not enough: we also need the probability of moving from one word to the next. So instead of storing a list of neighbors, we store a dictionary that maps each current word to another dictionary of `{next_word: probability}` pairs:

```python
weighted_adj_list = {
    "<START>": {"I": 1},  # The sentence always starts with "I"
    "I": {"am": 0.5, "like": 0.5},  # After "I", "am" and "like" are equally likely
    "am": {"happy": 0.7, "sad": 0.2, "bananas": 0.1},  # After "am", "happy" is most likely, then "sad", then "bananas"
    "like": {"bananas": 0.6, "math": 0.4}  # After "like", "bananas" is more likely than "math"
}
```

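This gives us a _weighted directed graph_: each word is a node, and each edge points from a word to a possible next word, labeled with the probability of that transition.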
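Reading a probability out of this structure takes two dictionary lookups: first the current word, then the next word. As a quick sanity check, the probabilities leaving each word should add up to 1. The sketch below uses `math.isclose` rather than `==`, because floating-point sums such as `0.7 + 0.2 + 0.1` may not come out to exactly `1.0`. The helper `outgoing_sums_to_one` is a hypothetical name used only for illustration.

```python
import math

weighted_adj_list = {
    "<START>": {"I": 1.0},
    "I": {"am": 0.5, "like": 0.5},
    "am": {"happy": 0.7, "sad": 0.2, "bananas": 0.1},
    "like": {"bananas": 0.6, "math": 0.4},
}

# First lookup: the current word. Second lookup: the next word.
p = weighted_adj_list["am"]["happy"]
print(p)  # 0.7

# Words that never appear as keys (like "happy") have no outgoing edges.
print("happy" in weighted_adj_list)  # False

def outgoing_sums_to_one(weighted_adj_list):
    """Check that each word's next-word probabilities add up to (about) 1."""
    return all(
        math.isclose(sum(next_words.values()), 1.0)
        for next_words in weighted_adj_list.values()
    )

print(outgoing_sums_to_one(weighted_adj_list))  # True
```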

In [None]:
weighted_adj_list = {
    "<START>": {"I": 1.0},
    "I": {"am": 0.5, "like": 0.5},
    "am": {"happy": 0.7, "sad": 0.2, "bananas": 0.1},
    "like": {"bananas": 0.6, "math": 0.4}
}

G = generate_graph(weighted_adj_list)
plot_graph(G)

**1.1**

To start, consider the following questions about the above graph.

Assign `num_nodes` to the number of nodes in the graph.

In [None]:
num_nodes = ...

Assign `num_edges` to the number of edges in the graph.

In [None]:
num_edges = ...

Assign `longest_path_length` to the number of edges encountered on the longest path in the graph.

In [None]:
longest_path_length = ...

Is there a cycle in this graph? If so, assign `cycle_exists` to `True`. Otherwise, assign it to `False`.

In [None]:
cycle_exists = ...

In [None]:
check_answer_1_1([num_nodes, num_edges, longest_path_length, cycle_exists])

**1.2**

Write a function `random_next_word` that takes `weighted_adj_list` and `current_word` (one of the keys of `weighted_adj_list`) as input and randomly chooses the next word. The probabilities of the possible next words are stored in the values of `weighted_adj_list`.

**HINT:** the line of code `sample_from_dict({"bananas" : 0.6, "math" : 0.4})` randomly returns either `"bananas"` or `"math"`, choosing `"bananas"` 60% of the time and `"math"` 40% of the time.
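We don't know exactly how `sample_from_dict` is implemented inside `jamcoders`, so treat this as a hypothetical sketch of weighted sampling in plain Python using `random.choices`, which accepts a `weights` argument. The name `sample_from_dict_sketch` is made up for this example. For the lab itself, use the provided `sample_from_dict`, since the seeded tests below expect its exact sequence of random choices.

```python
import random
from collections import Counter

def sample_from_dict_sketch(probs):
    """Hypothetical stand-in for sample_from_dict: pick one key,
    weighting each key by its value."""
    words = list(probs.keys())
    weights = list(probs.values())
    # random.choices returns a list of k picks; we ask for one and unwrap it.
    return random.choices(words, weights=weights, k=1)[0]

random.seed(0)
num_trials = 10_000
counts = Counter(
    sample_from_dict_sketch({"bananas": 0.6, "math": 0.4})
    for _ in range(num_trials)
)
print(counts["bananas"] / num_trials)  # roughly 0.6
print(counts["math"] / num_trials)     # roughly 0.4
```

Sampling many times and counting is a handy way to check that a weighted sampler matches the probabilities you gave it.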

In [None]:
def random_next_word(weighted_adj_list, current_word):
    """
    Randomly selects the next word from a weighted adjacency list.

    Args:
        weighted_adj_list (dict): A dictionary where each key is a word,
            and each value is another dictionary mapping possible next words
            to their probabilities.
        current_word (str): The current word to look up in the adjacency list.

    Returns:
        str: A randomly selected next word based on the given probabilities.
    """
    # YOUR CODE HERE

In [None]:
random.seed(21)
assert_equal(got=random_next_word(weighted_adj_list, "<START>"), want="I")
assert_equal(got=random_next_word(weighted_adj_list, "I"), want="like")
assert_equal(got=random_next_word(weighted_adj_list, "I"), want="like")
assert_equal(got=random_next_word(weighted_adj_list, "am"), want="happy")
assert_equal(got=random_next_word(weighted_adj_list, "am"), want="happy")
assert_equal(got=random_next_word(weighted_adj_list, "am"), want="sad")
assert_equal(got=random_next_word(weighted_adj_list, "like"), want="math")
assert_equal(got=random_next_word(weighted_adj_list, "like"), want="bananas")

**1.3**

Write a function called `get_random_sentence`. Starting at `<START>`, it should use `random_next_word` to randomly generate a 3-word sentence and return that sentence as a list of words (not including `<START>`). Your answer does not need to include punctuation.
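Generating a sentence is a walk through the graph: start at a node, repeatedly follow an outgoing edge, and record where you land. As a sketch of that loop shape only, here is a hypothetical `random_walk` over the unweighted adjacency list from the start of the lab (with node 3 pointing to nodes 0 and 4), where every neighbor is equally likely. In the bigram version you would pick each next word with `random_next_word` instead of `random.choice`.

```python
import random

adj_list = [
    [2, 3],
    [3, 4],
    [4],
    [0, 4],
    [],
]

def random_walk(adj_list, start, num_steps):
    """Follow up to num_steps random edges from start.
    Returns the nodes visited after start; stops early at a dead end."""
    path = []
    current = start
    for _ in range(num_steps):
        neighbors = adj_list[current]
        if not neighbors:  # dead end: nowhere left to go
            break
        current = random.choice(neighbors)  # every neighbor equally likely
        path.append(current)
    return path

random.seed(1)
print(random_walk(adj_list, 0, 3))
```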

In [None]:
def get_random_sentence(weighted_adj_list):
    """Generate a 3-word sentence starting from <START> using weighted random choices.
    
    Returns:
        A list of 3 words, e.g. ["I", "am", "happy"].
    """
    # YOUR CODE HERE

In [None]:
random.seed(21)
assert_equal(got=get_random_sentence(weighted_adj_list), want=["I", "like", "math"])
assert_equal(got=get_random_sentence(weighted_adj_list), want=["I", "am", "sad"])
assert_equal(got=get_random_sentence(weighted_adj_list), want=["I", "like", "bananas"])

In [None]:
# Here, we re-plot G to minimize scrolling
plot_graph(G)

**1.4**

In a bigram model, we assume that the probability of a word occurring depends only on the previous word. To calculate the probability of a specific sentence occurring, we multiply the probabilities (edge weights) of each word transition as we follow the words in the sentence.

The probability of observing the sentence `"I am happy"` is equal to `1 * 0.5 * 0.7`, because the probability of moving from `"<START>"` to `"I"` is `1`, the probability of moving from `"I"` to `"am"` is `0.5`, and the probability of moving from `"am"` to `"happy"` is `0.7`.
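Written as a formula: if the sentence is $w_1, \dots, w_n$ and we let $w_0$ be `<START>`, the bigram model multiplies one transition probability per word:

$$P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$

For `"I am happy"` this works out to:

$$P(\text{I am happy}) = P(\text{I} \mid \langle\text{START}\rangle) \cdot P(\text{am} \mid \text{I}) \cdot P(\text{happy} \mid \text{am}) = 1 \times 0.5 \times 0.7 = 0.35$$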

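Using this idea, write a function `get_probability` that takes in the `weighted_adj_list` and a list of words called `sentence` and returns the probability of that sentence being generated given the bigram probabilities. Note that `sentence` does not include `<START>`, but your function should still include the probability of the first transition from `<START>`.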
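The provided tests only use sentences the model can actually generate. If you also want to handle impossible sentences such as `["I", "math"]`, note that `weighted_adj_list["I"]["math"]` raises a `KeyError`. A sentence that uses a missing edge can never be generated, so one reasonable choice is to treat that transition's probability as 0. Here is a sketch of such a lookup using a hypothetical helper `transition_probability` (our own name, not part of the lab):

```python
weighted_adj_list = {
    "<START>": {"I": 1.0},
    "I": {"am": 0.5, "like": 0.5},
    "am": {"happy": 0.7, "sad": 0.2, "bananas": 0.1},
    "like": {"bananas": 0.6, "math": 0.4},
}

def transition_probability(weighted_adj_list, current_word, next_word):
    """Hypothetical helper: probability of the edge current_word -> next_word,
    or 0.0 if the graph has no such edge."""
    # .get(current_word, {}) handles dead-end words that are not keys at all;
    # .get(next_word, 0.0) handles a next word not listed for current_word.
    return weighted_adj_list.get(current_word, {}).get(next_word, 0.0)

print(transition_probability(weighted_adj_list, "am", "happy"))  # 0.7
print(transition_probability(weighted_adj_list, "I", "math"))    # 0.0 (no edge I -> math)
print(transition_probability(weighted_adj_list, "happy", "I"))   # 0.0 ("happy" is a dead end)
```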

In [None]:
def get_probability(weighted_adj_list, sentence):
    """Return the probability of generating a given sentence based on bigram probabilities.

    Args:
        weighted_adj_list (dict): A dictionary mapping words to dictionaries of next-word probabilities.
        sentence (list): A list of words representing a sentence (e.g., ["I", "am", "happy"]).

    Returns:
        float: The product of bigram probabilities from <START> through the sentence.
    """
    # YOUR CODE HERE

In [None]:
assert_equal(got=get_probability(weighted_adj_list, ["I", "am", "happy"]), want=1 * 0.5 * 0.7)
assert_equal(got=get_probability(weighted_adj_list, ["I", "like", "bananas"]), want=1 * 0.5 * 0.6)
assert_equal(got=get_probability(weighted_adj_list, ["I", "am", "bananas"]), want=1 * 0.5 * 0.1)
assert_equal(got=get_probability(weighted_adj_list, ["I", "like", "math"]), want=1 * 0.5 * 0.4)