# Assignment 2
Instructor: [Ziyu Yao at George Mason University](https://ziyuyao.org/)
Class: CS478 Fall 2024

**Total points: 100 points**

## Overview
This assignment includes three parts:
- Part 1: Constructing a bigram language model (50 points)
- Part 2: Evaluating the bigram language model (30 points)
- Part 3: Sampling sentences from the bigram language model (20 points)

You can run this notebook locally on your laptop or computer. Otherwise, running it on Google Colab (https://colab.research.google.com/) is also feasible -- if you go with this option, select "Upload" and then this notebook to upload it to Colab. **Note that no matter how you will run this notebook, do not modify existing code/markdown cells in the initial template; you should only need to fill out the blanks as instructed.**

This assignment also comes with a LaTex code file. To collect points from this assignment, you must revise the LaTex code file and fill out the answer blanks as instructed.

## Submission Guideline
When you complete this notebook, save all the cell outputs and complete the required fields in the LaTex code file. Compile the PDF from the LaTex code file. Submit the PDF to Gradescope.
- You do NOT need to submit the LaTex source code.
- You do NOT need to submit this notebook either; however, be sure to attach a shared One Drive or Google Drive link to your notebook in the PDF.

## Google Drive Setup/Tentative Data Loading
Our assignment will be using a small sample of the Gutenberg corpus (https://www.gutenberg.org/) provided by the NLTK Python library. To make it easy, we have included the txt data ("nltk_gutenberg_austen_emma.txt") along with this assignment.

**If you are using Google Colab**, there are a few more steps for setting up the data access. **If you are running this notebook locally**, skip commands in this part and directly start from "Data Loading".

For Colab users, there are two ways how you can let your notebook access the provided txt file; you only need to choose one of them.

**(1) Mounting your Google Drive.** If you take this option, execute the following code and follow the pop-up instructions to give Colab the access. Please make sure to tick the option of allowing Colab to read/write files from/to your Google Drive.

In [122]:
# from google.colab import drive
# drive.mount('/content/drive')

Your Google Drive is now mounted. Go to your Drive folder through https://drive.google.com and create a folder for CS478. Then upload "nltk_gutenberg_austen_emma.txt" to this folder. Now your Colab should be able to access this txt file from "/content/drive/My Drive/CS478/nltk_gutenberg_austen_emma.txt".

**(2) Uploading tentative data copy to Colab.** Alternatively, especially if you have concerns about giving Colab access to your Drive, run the following command to allow for tentative data upload. You will see a "Choose File" button. Click the button and select "nltk_gutenberg_austen_emma.txt" from your local folder.

In [123]:
# from google.colab import files
# files.upload()

## Data Loading

With the data access set up well, we now start to load in and process the txt file. The following code will read in the corpus and split it into the training and the test set.

In [124]:
# select the proper data path
DATA_PATH = None
# DATA_PATH = "/content/drive/My Drive/CS478/nltk_gutenberg_austen_emma.txt" # UNCOMMENT if mounting Google Drive
DATA_PATH = "nltk_gutenberg_austen_emma.txt" # UNCOMMENT if running the notebook locally or doing tentative data upload to cloud

# load data
sents_all = []
with open("nltk_gutenberg_austen_emma.txt", "r") as f:
    for line in f.readlines():
        sents_all.append(line.strip().split())

total_num = len(sents_all)

sents_train = sents_all[:int(total_num*0.8)]
sents_test = sents_all[len(sents_train):]
print("Number of sentences in the training and the test corpus: %d and %d, respectively" % (
    len(sents_train), len(sents_test)))

Number of sentences in the training and the test corpus: 6173 and 1544, respectively


In this assignment, you will be instructed to build a bigram language model (LM) based on the Gutenberg training set, and then evaluate it on the test set.

**Before You Start:** A hint on debugging your LM is to use toy data which you can manually calculate the LM probabilities and verify the result. Example toy `sents_train` and `sents_test` corpora are shown in the following code block. Uncomment the code block when you want to debug your model.

In [125]:
# alternatively, load the toy data
# sents_train = [
#    ["A", "B", "B", "C"],
#    ["A", "D", "C"],
#    ["E", "A"]
# ]
# sents_test = sents_train

In [126]:
from typing import List, Any
from math import isclose

In [127]:
print("The first 2 sentences from `sents_train`: ")
for idx in range(2):
    print(sents_train[idx]) # each sentence is a list of words

The first 2 sentences from `sents_train`: 
['A', 'crowd', 'in', 'a', 'little', 'room', '--', 'Miss', 'Woodhouse', ',', 'you', 'have', 'the', 'art', 'of', 'giving', 'pictures', 'in', 'a', 'few', 'words', '.']
['Mr', '.', 'Knightley', "'", 's', 'eyes', 'had', 'preceded', 'Miss', 'Bates', "'", 's', 'in', 'a', 'glance', 'at', 'Jane', '.']


### **NOTE: In this assignment, each sentence has been tokenized, and NO MORE preprocessing is required.**

## Part 1: Language Model Construction (50 points)

**Overview:** In this part, you will construct a bigram language model (LM) based on the `sents_train` corpus. A bigram LM models the probability of the next word given the current word, i.e., $p(w_t | w_{t-1})$. The construction of a bigram LM could be achieved by counting up the word or word-pair frequency:
$$p(w_t|w_{t-1}) = \frac{count(w_{t-1}, w_t)}{count(w_{t-1})}.$$

To deal with zero counts, *add-one smoothing* is commonly used:
$$p(w_t|w_{t-1}) = \frac{count(w_{t-1}, w_t) + 1}{count(w_{t-1}) + |V|},$$
where $|V|$ is the size of the vocabulary.
**Your implementation of the LM should be based on the smoothing version.**

---
**Step 1 -- Construct a vocabulary (20 points):** To get started, you will first need to construct a vocabulary based on the `sents_train` corpus. Also, don't forget the special start-of-sentence (`<s>`) and end-of-sentence (`</s>`) tokens -- your LM should eventually be able to model `p(w|<s>)` (i.e., how to start a sentence) and `p(</s>|w)` (i.e., when to stop a sentence).

First, create a counter for word types in the `sents_train` corpus **(5 points)**.

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [128]:
# TODO: create a counter of word type based on sents_train

def add_word_count_to_count_dict(word, count_dict, count=None):
    if count_dict is None or word is None:
        return
    if count is not None:
        count_dict[word] = count
    elif word not in count_dict:
        count_dict[word] = 1
    else:
        count_dict[word] += 1

def count_word_types(sents):
    if sents is None:
        return None
    
    #build all counts
    counts = {}
    for sent in sents:
        add_word_count_to_count_dict("<s>", counts)
        for word in sent:
            add_word_count_to_count_dict(word, counts)
        add_word_count_to_count_dict("</s>", counts)
    
    #replace single counts with UNK
    finalCounts = {}
    for word in counts:
        if counts[word] == 1:
            add_word_count_to_count_dict("UNK", finalCounts)
        else:
            finalCounts[word] = counts[word]
    return finalCounts

count_word_types(sents_train)




{'<s>': 6173,
 'A': 92,
 'crowd': 7,
 'in': 1680,
 'a': 2392,
 'little': 284,
 'room': 96,
 '--': 1149,
 'Miss': 483,
 'Woodhouse': 250,
 ',': 9129,
 'you': 1337,
 'have': 1037,
 'the': 3805,
 'art': 2,
 'of': 3386,
 'giving': 37,
 'UNK': 2931,
 'few': 85,
 'words': 41,
 '.': 5513,
 '</s>': 6173,
 'Mr': 921,
 'Knightley': 304,
 "'": 827,
 's': 764,
 'eyes': 40,
 'had': 1307,
 'Bates': 114,
 'glance': 9,
 'at': 793,
 'Jane': 236,
 '"': 1616,
 'Well': 68,
 'if': 298,
 'please': 31,
 ',"': 325,
 'said': 375,
 'Mrs': 538,
 'Weston': 340,
 'rather': 115,
 'think': 297,
 'she': 1417,
 'will': 431,
 'be': 1576,
 'any': 509,
 'use': 29,
 '."': 910,
 'When': 38,
 'I': 2518,
 'talked': 51,
 'your': 284,
 'being': 269,
 'altered': 3,
 'by': 458,
 'time': 213,
 'progress': 9,
 'years': 44,
 'John': 72,
 'meant': 35,
 'to': 4115,
 'imply': 2,
 'change': 46,
 'situation': 52,
 'which': 432,
 'usually': 4,
 'brings': 2,
 'He': 355,
 'was': 1873,
 'four': 22,
 '-': 460,
 'and': 3748,
 'twenty': 24,
 '

Then, create a vocabulary based on your word type counter **(5 points)**.

As we explained in class, it would be difficult for a count-based n-gram LM to learn good distributions for infrequent words. Therefore, we typically only keep frequent words in the vocabulary and replace others (i.e., infrequent words) with a special token `UNK`.

Now, when creating the vocabulary. Let's keep only words appearing at least twice in `sents_train`. An additional `UNK` token needs to be added to handle those dropped word types.

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [129]:
# TODO: create a vocabulary with UNK
vocabulary = list(count_word_types(sents_train).keys()) # replace it with your implementation
print(vocabulary)

['<s>', 'A', 'crowd', 'in', 'a', 'little', 'room', '--', 'Miss', 'Woodhouse', ',', 'you', 'have', 'the', 'art', 'of', 'giving', 'UNK', 'few', 'words', '.', '</s>', 'Mr', 'Knightley', "'", 's', 'eyes', 'had', 'Bates', 'glance', 'at', 'Jane', '"', 'Well', 'if', 'please', ',"', 'said', 'Mrs', 'Weston', 'rather', 'think', 'she', 'will', 'be', 'any', 'use', '."', 'When', 'I', 'talked', 'your', 'being', 'altered', 'by', 'time', 'progress', 'years', 'John', 'meant', 'to', 'imply', 'change', 'situation', 'which', 'usually', 'brings', 'He', 'was', 'four', '-', 'and', 'twenty', 'last', 'June', 'my', 'is', 'just', 'fortnight', 'day', 'difference', 'very', 'odd', 'man', 'must', 'much', 'love', 'indeed', 'describe', 'her', 'so', 'Emma', 'could', 'not', 'forgive', 'Yes', 'see', 'what', 'means', '(', 'turning', ',)', 'try', 'hold', 'tongue', 'wishing', 'get', 'better', 'his', 'attachment', 'herself', 'recovering', 'from', 'for', 'Elton', 'thank', ';', 'but', 'assure', 'are', 'quite', 'mistaken', 'tak

Now, show the size of your vocabulary (i.e., how many distinct words in your vocab, including UNK) **(5 points)**:

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [130]:
# TODO: show the size of your vocabulary
vocab_size =  len(vocabulary) # replace it with your implementation

print("Vocabulary size (including UNK):", vocab_size)

Vocabulary size (including UNK): 4183


If we use this vocabulary to index sentences in `sents_train`, what are the most frequent words (including UNK)? Show the top 10 word types with their counts. **(5 points)**

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [131]:
# TODO: Show the top 10 word types and their counts, one in a row, i.e.,
# word1, count1
# word2, count2
# ...
# word10, count10

vocab_count = count_word_types(sents_train)
sorted_words = sorted(vocab_count, key=vocab_count.get, reverse=True)
print("The most frequent 10 word types with counts:")
for r in sorted_words[:10]:
    print(r, vocab_count[r])

The most frequent 10 word types with counts:
, 9129
<s> 6173
</s> 6173
. 5513
to 4115
the 3805
and 3748
of 3386
UNK 2931
I 2518


---
**Step 2 -- Build the bigram LM (30 points):** Next, implement the bigram LM $p(w_t|w_{t-1})$ (with add-one smoothing) based on the `sents_train` corpus.

First, accumulate the bigrams in `sents_train` and create a bigram counter **(10 points)**:

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [132]:
def count_bigrams(sents):
    bigrams = {}

    # this will sort out UNKs
    vocab = count_word_types(sents)

    #count all existing bigrams
    for sent in sents:
        previous = "<s>"
        for current in [word if word in vocab else "UNK" for word in sent]:
            add_word_count_to_count_dict((previous, current), bigrams)
            previous = current
        current = "</s>"
        add_word_count_to_count_dict((previous, current), bigrams)

    #now count all possible non-existing bigrams
    for idx, token in enumerate(vocab.keys()):
        
        for jdx, tokenj in enumerate(list(vocab.keys())[idx:]):
            key = (token, tokenj)
            if key not in bigrams:
                add_word_count_to_count_dict(key, bigrams, 0)
            key = (tokenj, token) #also account for reverse relationship
            if key not in bigrams:
                add_word_count_to_count_dict(key, bigrams, 0)

    return bigrams



sents_train_bigram_counts = count_bigrams(sents_train)

For how many times in `sents_train` does a sentence start with the word "I"?

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [133]:
# TODO: Complete the code below to show the number of occurrences of "I" being the first word in sent_train
def get_count_in_bigrams(previous, current, bigram_counts):
    if bigram_counts is None or previous is None or current is None:
        return 0
    if (previous, current) not in bigram_counts:
        return 0
    return bigram_counts[(previous, current)]


i_start_times = get_count_in_bigrams("<s>", "I", sents_train_bigram_counts)
print("Sentences in sent_train starts with the word `I` for times:", i_start_times)

Sentences in sent_train starts with the word `I` for times: 502


Next, create the bigram language model (with add-one smoothing). **(15 points)**

**Hint:** The last few lines of code (now commented) are checking $\sum_{w_t \in \text{Vocab}} P(w_t | w_{t-1}) = 1$, i.e., whether the probability mass of `P(w_t|w_{t-1})` sums to 1 when enumerating all possible next word `w_t`. However, the code lines only work when the LM is defined as a dictionary `model_p` which stores `(w_{t-1}, w_t)` as key and its probability as value. You can write your own check if you are not following this data structure.

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [134]:
import time

start_time = time.time()

# TODO: create your bigram LM
model_p = {}
bigram_counts = count_bigrams(sents_train) # a dict of {(w_t-1, w_t): count}
# complete the implementation for model_p
vocab_count = count_word_types(sents_train)
vocab_size =  len(vocab_count)
for bigram in bigram_counts.keys():
    curr_bigram_count = bigram_counts[bigram]
    smooth_count = curr_bigram_count + 1
    previous_count = vocab_count[bigram[0]]
    probability = smooth_count / (previous_count + vocab_size)
    model_p[bigram] = probability

# Do not modify code after this line
end_time = time.time()
print("Spent %s for the bigram LM construction." % time.strftime("%Hh%Mm%Ss", time.gmtime(end_time-start_time)))

# The following lines verify the validity of the probability distribution -- it takes a while, please wait.
# If you receive an assertion error, then your implementation could be problematic.
for w_tm1 in vocabulary:
   if w_tm1 == "</s>": # not needed
       continue
   pr_mass = 0 # sum over different w's of p(w|w_tm1)
   for w_t in vocabulary:
       pr_mass += model_p[(w_tm1, w_t)]
   assert isclose(pr_mass, 1.0), "Probability mass of %s should sum to 1" % w_tm1 # sum should equals to 1

Spent 00h02m18s for the bigram LM construction.


Now, can you show the most frequent 10 starting words (i.e., words following `<s>`) with their probabilities to three decimal places (e.g., 0.123)? **(5 points)**

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [135]:
# TODO: Show the most frequent 10 starting words and their probabilities, one in a row, e.g.,
# word1, prob1
# word2, prob2
# ...
# word10, prob10

sorted_probs = sorted(model_p, key=model_p.get, reverse=True)
print("The most frequent 10 starting words and their probabilities:")
for r in sorted_probs[:10]:
    print(r[0], model_p[r])


The most frequent 10 starting words and their probabilities:
. 0.4214108910891089
Mr 0.18064263322884014
." 0.17887296289024152
' 0.15249500998003993
<s> 0.12321359598300502
; 0.11745871250421301
, 0.11440805288461539
Mrs 0.11417072654098707
! 0.09732412602503238
UNK 0.062271577171773965


## Part 2: Evaluation of Language Model (30 points)

**Overview:** In this part, you will implement the perplexity metric ($ppl$) and evaluate your bigram LM on the `sents_test` corpus ($D_{test}$). The math formulation of the perplexity is:
\begin{align}
    H(D_{test}) &= \frac{1}{\sum_{s \in D_{test}} |s|} \sum_{s \in D_{test}} -\log_2 P(s),\\
    ppl(D_{test}) &= 2^{H(D_{test})},
\end{align}
where $s \in D_{test}$ denotes a sentence in the test corpus, $|s|$ is its size (i.e., number of word tokens in the sentence `s`, including the extra `</s>`), $P(s)$ calculates the probability of a sentence $s$, and $H(D_{test})$ is the per-word cross entropy of the LM on the test corpus.

Since this part involves log calculation, you could use Python libraries such as `numpy` and `math`. **If you are running the notebook on Google Colab,** the libraries have been installed; **If you are running the notebook locally**, you may need to uncomment the following command and install the two libraries.

In [136]:
!pip3 install numpy

Defaulting to user installation because normal site-packages is not writeable


In [137]:
import numpy as np
import math

---
**Step 1 -- Calculate the log2 probability of a test sentence (20 points):** You will first implement the calculation of $\log_2 P(s)$ using the LM you have constructed in Part 1:
$$ \log_2 P(s) = \log_2 p(w_1| \text{<s>}) + \log_2 p(\text{</s>}|w_T) + \sum_{t=2}^T \log_2 p(w_t|w_{t-1}).$$

**Hint:** Don't forget to handle UNK tokens!

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [138]:
# TODO: The log2_P function should return a floating number indicating log2 P(s)


corpus_vocab = count_word_types(sents_train)
def convert_if_unk(token):
    return token if token in corpus_vocab else "UNK"

def log2_P(sent: List[str]) -> float:
    start_s_log = math.log2(model_p[("<s>", convert_if_unk(sent[0]))])
    end_s_log = math.log2(model_p[(convert_if_unk(sent[-1]), "</s>")])
    final_log = start_s_log + end_s_log
    for idx, token in enumerate(sent):
        if idx == 0:
            continue
        final_log += math.log2(model_p[(convert_if_unk(sent[idx - 1]),convert_if_unk(sent[idx]))])
    return final_log

Here's an example of how your $\log_2 P$ function will be used. The example shows the function output for the first sentence in the test set `sents_test`.

In [139]:
s = sents_test[0]
print("sentence:", s)
log2_pr = log2_P(s)
print("log2_P returns:", log2_pr)

sentence: ['The', 'extent', 'of', 'your', 'admiration', 'may', 'take', 'you', 'by', 'surprize', 'some', 'day', 'or', 'other', '."']
log2_P returns: -155.82667136701377


---
**Step 2 -- Calculate the per-word cross entropy (10 point):** You will then implement $H(D_{test})$ by reusing the $log2\_P$ function defined in Step 1.

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [140]:
# TODO: The H function should return a floating number indicating H(sents)
def H(sents: List[List[str]]) -> float:

    sent_magnitudes = sum([(len(sent) + 2) for sent in sents]) # +2 to account for EoS and SoS

    log2_ps = sum([log2_P(sent) * -1 for sent in sents])

    return (1/sent_magnitudes) * log2_ps

Here's an example how your $H$ function will be used:

In [141]:
sents_sample = [sents_test[0]]
print("sents:", sents_sample)
h_value = H(sents_sample)
print("H returns:", h_value)

sents: [['The', 'extent', 'of', 'your', 'admiration', 'may', 'take', 'you', 'by', 'surprize', 'some', 'day', 'or', 'other', '."']]
H returns: 9.166274786294927


---
**Step 3 -- Implement the perplexity metric:** After defining $H(D_{test})$, you can define the perplexity function $ppl$.

In [142]:
# The ppl function should return a floating number indicating the perplexity of the LM on `sents`
def ppl(sents: List[List[str]]) -> float:
    h_value = H(sents)
    return 2**h_value

**Finally,** let's evaluate the bigram LM you have constructed in Part 1 on the `sents_test` test corpus:

In [143]:
ppl_value = ppl(sents_test)
print("Perplexity:", ppl_value)

Perplexity: 334.163233816831


## Part 3: Sampling from Language Model (20 points)

**Overview:** The last part is about sampling different sentences from the learned language model. This means to repeatedly sample the next word based on the current word, until producing an end-of-sentence (`</s>`) token or the sentence length reaches a pre-defined limit (let's set `max_len`=50). Remember that each sentence should start from the special start-of-sentence (`<s>`) token.

In [144]:
max_len = 50

---
**Step 1 -- Greedy decoding (10 point):** Please sample one sentence from the bigram language model using greedy decoding, i.e., always choosing the most probable word as the next word, until producing an end-of-sentence token (`</s>`) or the sentence length reaches a pre-defined limit `max_len`.

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [145]:
# greedy decoding
sent = [] # used to store the generated words in your sentence

sample = "<s>"
sent.append(sample)
length = 1
while True:
    if length >= max_len:
        sent.append("</s>")
        break

    sample_probs = {k:v for k, v in model_p.items() if sample == k[0]}
    top_prob = sorted(sample_probs, key=sample_probs.get, reverse=True)[0]
    sample = top_prob[1]

    sent.append(sample)
    length += 1
    if sample == "</s>":
        break

# print the decoded sentence
print(sent)
print(" ".join(sent))

['<s>', '"', 'I', 'am', 'sure', ',', 'and', 'the', 'UNK', ',', 'and', 'the', 'UNK', ',', 'and', 'the', 'UNK', ',', 'and', 'the', 'UNK', ',', 'and', 'the', 'UNK', ',', 'and', 'the', 'UNK', ',', 'and', 'the', 'UNK', ',', 'and', 'the', 'UNK', ',', 'and', 'the', 'UNK', ',', 'and', 'the', 'UNK', ',', 'and', 'the', 'UNK', ',', '</s>']
<s> " I am sure , and the UNK , and the UNK , and the UNK , and the UNK , and the UNK , and the UNK , and the UNK , and the UNK , and the UNK , and the UNK , and the UNK , </s>


---
**Step 2 -- Sampling by distribution (10 point):** To encourage diversity while maintaining a certain degree of "naturalness", we could sample the next word following the LM's next-word probability distribution, i.e., $w \sim p(w|w_{t-1})$. Please implement this sampling strategy. Same as before, the generation should end when producing an end-of-sentence token (`</s>`) or the sentence length reaches a pre-defined limit `max_len`.

**Hint:** You could use `numpy.random.choice` to implement the sampling.

<font color='blue'>PLEASE INPUT YOUR ANSWER BELOW</font>

In [146]:
np.random.seed(1234) # for grading purpose, do not change the random seed

sent = [] # used to store your sentence

sample = "<s>"
sent.append(sample)
length = 1
while True:
    if length >= max_len:
        sent.append("</s>")
        break

    sample_distribution = {k:v for k, v in model_p.items() if sample == k[0]}
    choices = [key[1] for key in sample_distribution.keys()]
    #print(choices)
    probabilities = list(sample_distribution.values())
    #print(probabilities)
    #print(sample_distribution)
    random_prob = np.random.choice(choices, 1, p=probabilities)
    #print(random_prob)
    sample = random_prob[0]
    
    sent.append(sample)
    length += 1
    if sample == "</s>":
        break

# print the decoded sentence
print(sent)
print(" ".join(sent))

['<s>', 'She', 'deem', 'defer', 'successful', 'degradation', 'fond', 'Crown', 'colour', 'XVIII', 'inn', 'presented', 'Perhaps', 'lovers', 'reach', 'sex', 'pencil', 'produced', 'rather', 'asleep', 'succeeding', 'influence', 'dreadfully', 'glaring', 'alloy', 'agreeably', 'resources', 'itself', 'flourishes', 'fixed', 'multiplied', 'apologise', 'Pass', 'marries', 'feature', 'precious', 'Go', 'judged', 'refusing', 'learned', 'tear', 'reason', 'privilege', 'hurt', 'delicate', 'stranger', 'tacitly', 'ushered', ",'", 'recollection', '</s>']
<s> She deem defer successful degradation fond Crown colour XVIII inn presented Perhaps lovers reach sex pencil produced rather asleep succeeding influence dreadfully glaring alloy agreeably resources itself flourishes fixed multiplied apologise Pass marries feature precious Go judged refusing learned tear reason privilege hurt delicate stranger tacitly ushered ,' recollection </s>


## Congrats! You have completed your assignment. Don't forget to save your notebook (including the cell outputs). Follow the Submission Guideline to submit the completed PDF to Gradescope.