Homework 4: Neural Language Models  (& 🎃 SpOoKy 👻 authors 🧟 data) - Task 4, Option B
----

### Names
----
Names: __Katherine Aristizabal, Jose Meza Llamosas__ (Write these in every notebook you submit.)

Task 4: Compare your generated sentences (25 points)
----

In this task, you'll analyze one of the models that you produced in Task 3. You'll need to compare against the corresponding file that was generated from the vanilla n-gram language model.

Choose *__one__* of option A or B (this notebook).

Option B: Evaluate the generated sentences of *word*-based models
----

Your job for this option is to measure the quality of your generated sentences for word-based models. For this option you *must* survey at least 3 people who are __not__ in this course. They need to speak and read the language that you are evaluating, but they need not be native speakers.

You will evaluate the quality of the generated sentences in the following way:
- Generate 20 sentences from your best word-based neural model. (The hyperparameter values and the value of n are up to you.)
- Using the same n-gram order, pair these sentences with the provided sentences from the vanilla n-gram model. If you want to evaluate a model with N != 3, 4, or 5, then you'll need to train your vanilla n-gram model and generate your own comparison sentences. For an even comparison, ignore sentences that contain \<UNK\>; you'll need to over-generate to get 20.
    - Pair them (roughly) based on sentence length, so that the sentences in each pair have a roughly similar number of tokens.
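
The pairing step above can be sketched as follows. This is a minimal illustration, not the required method: the function name `pair_by_length` and the toy sentences are hypothetical, and it assumes each model's sentences are already loaded as a list of strings with whitespace-separated tokens.

```python
def pair_by_length(neural_sentences, vanilla_sentences, n_pairs=20):
    """Pair sentences from two models by rank of token count."""
    def prepare(sentences):
        # Drop blank lines and anything containing <UNK>.
        kept = [s.strip() for s in sentences if s.strip() and "<UNK>" not in s]
        # Keep the first n_pairs, sorted by number of tokens (not characters).
        return sorted(kept[:n_pairs], key=lambda s: len(s.split()))

    return list(zip(prepare(neural_sentences), prepare(vanilla_sentences)))


# Toy data, for illustration only.
neural = ["the cat sat down", "a <UNK> thing", "it was dark"]
vanilla = ["why ?", "nor did this seem extravagant .", "<UNK>"]
pairs = pair_by_length(neural, vanilla, n_pairs=2)
for neural_sentence, vanilla_sentence in pairs:
    print(neural_sentence, "|", vanilla_sentence)
```

Sorting both lists by token count and zipping them pairs the shortest with the shortest, and so on. This only gives a rough match when the two models produce very different length distributions.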

Next, build a survey. For each pair of (neural LM sentence, vanilla n-gram LM sentence), you'll ask the survey taker three binary selection questions:
1. Which sentence is more grammatical?
2. Which sentence makes more sense, semantically (in meaning)?
3. Overall, which sentence do you prefer?


Finally, you'll evaluate your survey results __programmatically__ (export them as a csv). Calculate the following:
1. What percentage of neural vs. vanilla n-gram LM sentences were preferred, separated along each of the three dimensions?
2. What is [Krippendorff's alpha](https://en.wikipedia.org/wiki/Krippendorff%27s_alpha) for your survey data? 
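
As a rough sketch of the first calculation, the snippet below tallies answers per question using only the standard library. It assumes a survey export whose first column is a timestamp, followed by repeating (grammatical, meaning, overall) answer columns for each pair. The function name `preference_percentages` and the demo rows are made up for illustration.

```python
import csv
import io
from collections import Counter

QUESTIONS = ("grammatical", "meaning", "overall")


def preference_percentages(csv_file):
    """Return {question: {answer: percentage}} for a survey export."""
    counts = {question: Counter() for question in QUESTIONS}
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for row in reader:
        answers = row[1:]  # drop the timestamp column (assumed layout)
        for i, answer in enumerate(answers):
            counts[QUESTIONS[i % 3]][answer] += 1
    return {
        question: {
            answer: 100 * k / sum(counter.values())
            for answer, k in counter.items()
        }
        for question, counter in counts.items()
    }


# Two hypothetical respondents answering about two sentence pairs.
demo = io.StringIO(
    "Timestamp,g1,m1,o1,g2,m2,o2\n"
    "t1,Sentence 2,Sentence 2,Sentence 1,Sentence 2,Sentence 1,Sentence 2\n"
    "t2,Sentence 2,Sentence 1,Sentence 1,Sentence 2,Sentence 1,Sentence 2\n"
)
result = preference_percentages(demo)
print(result)
```

To run it on a real export, pass an open file handle, for example `open("survey_results.csv", newline="")`, in place of `demo`.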

You are welcome to use a pre-built Python implementation of the Krippendorff's alpha calculation, such as [this one](https://pypi.org/project/krippendorff/). Krippendorff's alpha is one way to measure interannotator agreement: the extent to which your survey respondents agree with one another.
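
To make the calculation concrete, here is a minimal pure-Python sketch of Krippendorff's alpha for nominal labels, computed as 1 minus the ratio of observed to expected disagreement. The function name `krippendorff_alpha_nominal` and the input layout (one list of labels per survey question, one label per respondent) are illustrative assumptions, not part of the assignment or of the linked package.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the labels that all
    raters gave to one item (use None for a missing answer).
    """
    coincidences = Counter()
    for labels in units:
        labels = [label for label in labels if label is not None]
        m = len(labels)
        if m < 2:
            continue  # items with fewer than two ratings carry no information
        # Every ordered pair of ratings from different raters, weighted by 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # undefined when only one label was ever used
    return 1 - (n - 1) * observed / expected


# Two hypothetical raters, four items.
units = [["S1", "S1"], ["S1", "S2"], ["S2", "S2"], ["S2", "S2"]]
alpha = krippendorff_alpha_nominal(units)
print(round(alpha, 3))  # → 0.533
```

When every respondent picks the same answer for every question, expected disagreement is zero and alpha is undefined; this sketch returns NaN in that case. With the linked package, the corresponding call should be `krippendorff.alpha(reliability_data=..., level_of_measurement="nominal")`, with one row per respondent and one column per question.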

You will submit your survey data (as a csv called `survey_results.csv`) __and__ your paired sentences (`paired.txt`, formatted in a way that is easy to understand) alongside this notebook.

In [1]:
#!pip install krippendorff

In [4]:
# your imports here

import krippendorff
 
from typing import List
import numpy as np

# if you want fancy progress bars
from tqdm.autonotebook import tqdm

# Remember to restart your kernel if you change the contents of this file!
import task4_utils as nutils

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import torch.optim as optim

# This function gives us nice print-outs of our models.
from torchinfo import summary

[nltk_data] Downloading package punkt to /Users/0wner/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /Users/0wner/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


In [3]:
# your code here
EMBEDDINGS_SIZE = 50
NGRAM = 3
NUM_SEQUENCES_PER_BATCH = 128

TRAIN_FILE = 'spooky_author_train.csv' # The file to train your language model on
OUTPUT_WORDS = 'generated_wordbased.txt' # The file to save your generated sentences for word-based LM
OUTPUT_CHARS = 'generated_charbased.txt' # The file to save your generated sentences for char-based LM

# you can update these file names if you want to depending on how you are exploring 
# hyperparameters
EMBEDDING_SAVE_FILE_WORD = f"spooky_embedding_word_{EMBEDDINGS_SIZE}.model" # The file to save your word embeddings to
EMBEDDING_SAVE_FILE_CHAR = f"spooky_embedding_char_{EMBEDDINGS_SIZE}.model" # The file to save your char embeddings to
MODEL_FILE_WORD = f'spooky_author_model_word_{NGRAM}.pt' # The file to save your trained word-based neural LM to
MODEL_FILE_CHAR = f'spooky_author_model_char_{NGRAM}.pt' # The file to save your trained char-based neural LM to

FINAL_OUTPUT_WORDS = "final_word_generated.txt"


import zipfile
import pickle


# Load the previously generated sentences.
# FINAL_OUTPUT_WORDS is read as a zip archive (despite its .txt extension)
# that contains the pickled text in data.pkl.
with zipfile.ZipFile(FINAL_OUTPUT_WORDS, "r") as zip_ref:
    zip_ref.extractall("extracted_folder")

with open("extracted_folder/final_word_generated/data.pkl", "rb") as f:
    contents = pickle.load(f)

# Print the first 20 generated sentences, one per line.
lines = contents.split('\n')
for line in lines[:20]:
    print(line)



I would like to know the know ride know complete know my the be.
I would like to be the the me a know know know know the.
I would like to look a make know the take his know know know.
I would like to the the the know the know know the the know.
I would like to the the know be know know a be complete have.
I would like to be complete know know you know complete other know know.
I would like to know open my speak return know the know set them.
I would like to be be turn know the know know state the them.
I would like to know a them the the be the it have hold.
I would like to make the the do know his a know recall the.
I would like to think say know tell be be the make set the.
I would like to know a speak know know the know know complete know.
I would like to so the such be know the know complete your a.
I would like to look know them know the make the the you the.
I would like to know know know it know know me know them the.
I would like to be be lead no the complete old the know have.

In [4]:
VANILLA_FILE = 'spooky_vanilla_5_word.txt' # The evaluation file of words

# Sort the generated sentences by length (len counts characters here, not tokens)
sorted_words = sorted(lines[:20], key=len)

#divide generated sentences into ngrams
ngram_sentences = []
for line in sorted_words:
    ngram_sentences.append(nutils.create_ngrams(line, n=NGRAM))
#print(ngram_sentences)

with open(VANILLA_FILE, "r") as f:
    contents_v =  f.read()

lines_v = contents_v.split('\n')

# Sort the vanilla sentences by length (again in characters, not tokens)
sorted_words_v = sorted(lines_v[:20], key=len)
#print("sorted" , nutils.format_sentence(sorted_words_v))

#divide vanilla text into ngrams
vanilla_ngrams = []

for line in sorted_words_v:
    #print(line)
    vanilla_ngrams.append(nutils.create_ngrams(line, n=NGRAM))
#print(vanilla_ngrams)

for i in range(len(sorted_words)):
    print("pair", i)
    print(sorted_words[i])
    print(sorted_words_v[i])



pair 0
I would like to be the the me a know know know know the.
why the third degree ?
pair 1
I would like to know a them the the be the it have hold.
nor did this seem extravagant .
pair 2
I would like to be an the look do know the be speak his.
i began to murmur , to hesitate , to resist .
pair 3
I would like to make the the do know his a know recall the.
never a competent navigator , i could only wait .
pair 4
I would like to think say know tell be be the make set the.
alas we had fallen upon the most evil of all our evil days .
pair 5
I would like to dream make the know a know all be know the.
answer me , i conjure you , with confidence and sincerity . ''
pair 6
I would like to the the the know the know know the the know.
so on the night of july , , and remained with us until late in the night .
pair 7
I would like to the the a speak it make open know know know.
one of very remarkable character , and i had selected his features as beautiful .
pair 8
I would like to you look be the 

In [36]:
#load survey answers
import pandas as pd
df = pd.read_csv('hw4_survey_responses.csv')


In [60]:
# Column 0 is the timestamp; the answers follow in repeating groups of three.

# All the columns asking about grammaticality
every_1st_column = df.iloc[:, 1::3]
print(every_1st_column.columns)

# All the columns asking about meaning
every_2nd_column = df.iloc[:, 2::3]
print(every_2nd_column.columns)

# All the columns asking about overall preference
# (start at 3, not 0, so the timestamp column is not included)
every_3rd_column = df.iloc[:, 3::3]
print(every_3rd_column.columns)

Index(['Given the following two sentences:\nSentence 1: I would like to be the the me a know know know know the.\nSentence 2: why the third degree ? [Which is most grammatical?]',
       'Given the following two sentences:\nSentence 1: I would like to know a them the the be the it have hold.\nSentence 2: nor did this seem extravagant . [Which is most grammatical?]',
       'Given the following two sentences:\nSentence 1: I would like to be an the look do know the be speak his.\nSentence 2:  i began to murmur , to hesitate , to resist . [Which is most grammatical?]',
       'Given the following two sentences:\nSentence 1: I would like to make the the do know his a know recall the.\nSentence 2: never a competent navigator , i could only wait . [Which is most grammatical?]',
       'Given the following two sentences:\nSentence 1: I would like to think say know tell be be the make set the.\nSentence 2: alas we had fallen upon the most evil of all our evil days. [Which is most grammatical?]

In [78]:
# Percentage of answers choosing each sentence, per question.
# Sentence 1 is the neural LM sentence; Sentence 2 is the vanilla n-gram LM sentence.

def valid_answers(columns):
    # Flatten the columns and keep only real answers, so that a non-answer
    # column (such as the timestamp) cannot inflate the denominator.
    answers = columns.to_numpy().flatten()
    return answers[(answers == "Sentence 1") | (answers == "Sentence 2")]


total_1st_answers = valid_answers(every_1st_column)
sentence2_1st_answers = (total_1st_answers == "Sentence 2").sum()
print("Percentage of Sentence 2 for Grammatical question:", sentence2_1st_answers / len(total_1st_answers) * 100)
sentence1_1st_answers = (total_1st_answers == "Sentence 1").sum()
print("Percentage of Sentence 1 for Grammatical question:", sentence1_1st_answers / len(total_1st_answers) * 100)


total_2nd_answers = valid_answers(every_2nd_column)
sentence2_2nd_answers = (total_2nd_answers == "Sentence 2").sum()
print("Percentage of Sentence 2 for Making sense question:", sentence2_2nd_answers / len(total_2nd_answers) * 100)
sentence1_2nd_answers = (total_2nd_answers == "Sentence 1").sum()
print("Percentage of Sentence 1 for Making sense question:", sentence1_2nd_answers / len(total_2nd_answers) * 100)


total_3rd_answers = valid_answers(every_3rd_column)
sentence2_3rd_answers = (total_3rd_answers == "Sentence 2").sum()
print("Percentage of Sentence 2 for Overall preference question:", sentence2_3rd_answers / len(total_3rd_answers) * 100)
sentence1_3rd_answers = (total_3rd_answers == "Sentence 1").sum()
print("Percentage of Sentence 1 for Overall preference question:", sentence1_3rd_answers / len(total_3rd_answers) * 100)


Percentage of Sentence 2 for Grammatical question: 100.0
Percentage of Sentence 1 for Grammatical question: 0.0
Percentage of Sentence 2 for Making sense question: 100.0
Percentage of Sentence 1 for Making sense question: 0.0
Percentage of Sentence 2 for Overall preference question: 100.0
Percentage of Sentence 1 for Overall preference question: 0.0


In [37]:
# df.values iterates over rows: one row per survey respondent
for row in df.values:
    print(row)

['3/13/2025 11:19:55' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2']
['3/13/2025 13:40:20' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sentence 2'
 'Sentence 2' 'Sentence 2' 'Sentence 2' 'Sent

Make sure that your reported results are nicely formatted!