# NYT Connections Notebook

**PLEASE READ:** Python notebooks are a pain in the ass to merge on GitHub. If you edit this file after someone else has already changed it, completing a git merge will be much harder than it would be for a normal Python file. An ipynb is *technically* a JSON file that stores outputs and execution counts alongside the code, so there is a lot going on under the hood that makes conflicts much more likely (incidentally, this is also why several people working on the same file in Google Colab get "unable to save local changes" messages and conflicts). As a result, **please do not modify this file.** Instead, **create a copy of this file and make your changes there** (e.g. `connections-notebook-[your-name].ipynb`).
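
To make the JSON point concrete, here is a minimal sketch that builds one simplified code cell twice, with identical code and output but a different execution count, and compares the serialized lines. `make_cell` is a hypothetical helper and the cell is a trimmed-down approximation of what a real notebook stores, so treat it as an illustration rather than the exact file layout.

```python
import json

def make_cell(source, execution_count, output_text):
    """Build a simplified notebook code cell (hypothetical helper)."""
    return {
        "cell_type": "code",
        "execution_count": execution_count,
        "metadata": {},
        "outputs": [
            {"name": "stdout", "output_type": "stream", "text": [output_text]}
        ],
        "source": [source],
    }

def serialized_lines(cell):
    """Serialize a cell the way it would appear line by line in a text diff."""
    return json.dumps(cell, indent=1, sort_keys=True).splitlines()

# Same code and same output; only the run counter differs
before = serialized_lines(make_cell("print(len(ds))", 2, "628\n"))
after = serialized_lines(make_cell("print(len(ds))", 54, "628\n"))

# Lines that git would report as changed even though no code was edited
changed = [(old, new) for old, new in zip(before, after) if old != new]
print(changed)
```

Simply re-running a cell is enough to change the file on disk, which is why two people who only *ran* the notebook can still end up with a conflict.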

In [54]:
import numpy as np
import pandas as pd

import gzip
import json
import random
import re
import io
import os
from dotenv import load_dotenv
from collections import Counter

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import normalize
from datasets import load_dataset
from transformers import BertTokenizer, BertModel
import torch
import gensim.downloader as api
from gensim.models.word2vec import Word2Vec
from gensim.models import KeyedVectors
from itertools import combinations
from openai import OpenAI

## Load Games & Models

In [2]:
# Read in games from HuggingFace dataset
df_ = pd.read_csv("hf://datasets/eric27n/NYT-Connections/Connections_Data.csv")
# read_csv parses the literal string "NA" as a missing value by default, so restore such words
df_['Word'] = df_['Word'].fillna("NA")
df_['Word'] = df_['Word'].str.lower()
df_['Group Name'] = df_['Group Name'].str.lower()
grouped = df_.groupby('Game ID')
result = []

for game_id, group in grouped:
  words = group['Word'].tolist()
  group_by_name = group.groupby('Group Name')
  solution = []
  
  for group_name, sub_group in group_by_name:
    group_words = sub_group['Word'].tolist()
    reason = sub_group['Group Name'].iloc[0]
    solution.append({'words': group_words, 'reason': reason})

  result.append({'words': words, 'solution': {'groups': solution}})

ds = result
ds_len = len(ds)
print(len(ds), ds[0])

628 {'words': ['snow', 'level', 'shift', 'kayak', 'heat', 'tab', 'bucks', 'return', 'jazz', 'hail', 'option', 'rain', 'sleet', 'racecar', 'mom', 'nets'], 'solution': {'groups': [{'words': ['shift', 'tab', 'return', 'option'], 'reason': 'keyboard keys'}, {'words': ['heat', 'bucks', 'jazz', 'nets'], 'reason': 'nba teams'}, {'words': ['level', 'kayak', 'racecar', 'mom'], 'reason': 'palindromes'}, {'words': ['snow', 'hail', 'rain', 'sleet'], 'reason': 'wet weather'}]}}


In [6]:
# Import different models
model_google = api.load('word2vec-google-news-300')
model_glove = api.load('glove-wiki-gigaword-300')
model_wiki = api.load('fasttext-wiki-news-subwords-300')

print(f"GOOGLE NEWS: {model_google.most_similar('seattle')}")
print(f"GLOVE: {model_glove.most_similar('seattle')}")
print(f"WIKI: {model_wiki.most_similar('seattle')}")

# Additional fourth model
# From my tests, this model did the best, although it requires a large download beforehand
# NEVER UPLOAD THE ZIPPED OR UNZIPPED TEXT FILE TO GITHUB
#     IF YOU DO, YOU WILL GET AN ERROR AND TRYING TO UNDO THESE CHANGES WILL BE A PAIN IN THE ASS
# https://github.com/commonsense/conceptnet-numberbatch
gzipped_file_path = 'numberbatch-en-19.08.txt.gz'
with gzip.open(gzipped_file_path, 'rt', encoding='utf-8') as f_in:
    decompressed_data = f_in.read()
decompressed_file = io.BytesIO(decompressed_data.encode('utf-8'))
model_numberbatch = KeyedVectors.load_word2vec_format(decompressed_file, binary=False)
print(f"NUMBERBATCH: {model_numberbatch.most_similar('seattle')}")

GOOGLE NEWS: [('denver', 0.6403177976608276), ('chicago', 0.6305170059204102), ('houston', 0.6260292530059814), ('boston', 0.6210216283798218), ('nyc', 0.6082404851913452), ('atlanta', 0.6007115244865417), ('cleveland', 0.5984835624694824), ('philadelphia', 0.5938323736190796), ('oakland', 0.592968225479126), ('orlando', 0.592677891254425)]
GLOVE: [('oakland', 0.6306731104850769), ('portland', 0.6112086772918701), ('mariners', 0.5879033207893372), ('francisco', 0.5715591907501221), ('denver', 0.5678620934486389), ('chicago', 0.5588996410369873), ('cleveland', 0.5522618889808655), ('angeles', 0.5507204532623291), ('milwaukee', 0.548105001449585), ('tampa', 0.5452930331230164)]
WIKI: [('minneapolis', 0.728505551815033), ('portland', 0.7126332521438599), ('vancouver', 0.6863006949424744), ('calgary', 0.6720302104949951), ('philadelphia', 0.6713477373123169), ('baltimore', 0.6664227247238159), ('houston', 0.6611942052841187), ('denver', 0.6545454859733582), ('melbourne', 0.6510372757911682

## Evaluate on one round

In [7]:
# Preprocess multi-word expressions (e.g. 'New York', 'push-up')
def preprocess_word(word, model):
  """
  Preprocess multi-word expressions (MWE) for accommodation by word2vec models.

  Args:
      word (str): The word to preprocess.
      model (gensim.models.KeyedVectors): The word-vector model to check for MWE.

  Returns:
      str: The preprocessed word.
  """
  mwe = re.sub(r'[-\s]', '_', word.lower())
  
  if mwe not in model:
      mwe = re.sub(r'_', '', mwe)
  
  return mwe
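
As a quick illustration of the fallback above, the sketch below repeats `preprocess_word` and runs it against a plain Python set. `toy_vocab` is a made-up stand-in for a gensim model (the function only needs `in` to work), so its entries are assumptions and not what any of the real models contain.

```python
import re

def preprocess_word(word, model):
    # Same logic as the cell above: join with underscores, then fall back
    # to the concatenated form if the underscored form is unknown
    mwe = re.sub(r'[-\s]', '_', word.lower())
    if mwe not in model:
        mwe = re.sub(r'_', '', mwe)
    return mwe

# Hypothetical vocabulary standing in for a real model
toy_vocab = {"new_york", "pushup", "rain"}

print(preprocess_word("New York", toy_vocab))   # underscored form is in the vocabulary
print(preprocess_word("push-up", toy_vocab))    # falls back to the concatenated form
print(preprocess_word("ice cream", toy_vocab))  # neither form is in the vocabulary
```

Note that the returned string is not guaranteed to be in the model, which is why the later cells still check `word in model` before asking for a similarity.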

In [8]:
# Extract words from ds[i]['words']
def guess(model, words):
  """
  Guess the best 4 words to form a group based on word similarity.
  
  Args:
      model (gensim.models.KeyedVectors): The word-vector model to use.
      words (list): A list of words to process.
  
  Returns:
      list: A list of the best 4 words to form a group.
  """
  
  # Preprocess words for the model, create similarity matrix to find similarities among words
  words = [preprocess_word(word, model) for word in words]
  similarity_matrix = np.zeros((len(words), len(words)))
  for i, word1 in enumerate(words):
      for j, word2 in enumerate(words):
          if word1 in model and word2 in model:
              similarity_matrix[i, j] = model.similarity(word1, word2)
          else:
              similarity_matrix[i, j] = 0

  # Convert the similarity matrix to a DataFrame for easier manipulation
  similarity_df = pd.DataFrame(similarity_matrix, index=words, columns=words)
  _max = 0
  argmax = 0
  argword = ""
  
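  # Find the most similar pair of words: after sorting, position 0 is normally
  # the word itself, so position 1 holds its nearest neighbor
  for idx, word in enumerate(words):
    # Duplicate words make this lookup return a DataFrame; print it for debugging
    if type(similarity_df[word]) is pd.DataFrame:
      print(similarity_df[word])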
    similar_words = similarity_df[word].sort_values(ascending=False)
    if similar_words.iloc[1] > _max:
      _max = similar_words.iloc[1]
      argmax = idx
      argword = similar_words.index[1]

  # Initialize the build list with the most similar pair of words
  build_list = [words[argmax], argword]

  # Create a copy of the original words list to avoid modifying it
  words_copy = words.copy()
  
  # Finding the third most similar word to the build list
  # Remove the most similar pair from the original words list
  for test_word in build_list:
    if test_word not in words_copy:
      return None
    words_copy.remove(test_word)

  # Calculate average similarity of remaining words to the build list
  sim_list = []
  for test_word in words_copy:
    similarities = []
    for train_word in build_list:
        if train_word in model and test_word in model:
            similarity = model.similarity(train_word, test_word)
            similarities.append(similarity)
        else:
            similarities.append(0)  # Handle words not in the model
    average_similarity = sum(similarities) / len(similarities)
    sim_list.append(average_similarity)

  # Find the word with the highest average similarity to the build list
  index_of_highest_value = sim_list.index(max(sim_list))
  build_list.append(words_copy[index_of_highest_value])

  # Finding the fourth most similar word to the build list
  # Pretty much same code as the third most similar word
  words_copy = words.copy()
  for test_word in build_list:
    if test_word not in words_copy:
      return None
    words_copy.remove(test_word)

  sim_list = []
  for test_word in words_copy:
    similarities = []
    for train_word in build_list:
        if train_word in model and test_word in model:
            similarity = model.similarity(train_word, test_word)
            similarities.append(similarity)
        else:
            similarities.append(0)  # Handle words not in the model
    average_similarity = sum(similarities) / len(similarities)
    sim_list.append(average_similarity)

  index_of_highest_value = sim_list.index(max(sim_list))
  build_list.append(words_copy[index_of_highest_value])

  # Return the final list of four words
  return build_list
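
The greedy idea behind `guess` (seed with the most similar pair, then repeatedly add the word with the highest average similarity to the group so far) can be sketched on toy data without loading any model. Everything here is an assumption for illustration: `TOY_VECTORS` are hand-picked 2-D vectors, and `greedy_group` is a compact restatement of the approach, not the function above.

```python
from itertools import combinations
from math import sqrt

# Hand-picked 2-D vectors: weather words point one way, keyboard keys another
TOY_VECTORS = {
    "rain": (1.0, 0.1),
    "snow": (0.9, 0.2),
    "hail": (0.8, 0.3),
    "sleet": (0.95, 0.15),
    "tab": (0.1, 1.0),
    "shift": (0.2, 0.9),
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def greedy_group(words, vectors, size=4):
    """Seed with the closest pair, then grow by highest average similarity."""
    def sim(w1, w2):
        return cosine(vectors[w1], vectors[w2])

    # Step 1: the single most similar pair becomes the seed
    group = list(max(combinations(words, 2), key=lambda pair: sim(*pair)))

    # Step 2: add one word at a time until the group is full
    while len(group) < size:
        remaining = [w for w in words if w not in group]
        best = max(
            remaining,
            key=lambda w: sum(sim(w, g) for g in group) / len(group),
        )
        group.append(best)
    return group

group = greedy_group(list(TOY_VECTORS), TOY_VECTORS)
print(group)
```

Because every later choice depends on the seed pair, a misleading seed (for example two words from a red herring category) cannot be undone, which is one limitation of this style of guess.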

In [9]:
def eval_round(guess_list, solution):
  """
  Evaluate the guess list against the solution.

  Args:
      guess_list (list): The list of guessed words. Should contain 4 entries.
      solution (dict): The solution dictionary containing the correct groups.

  Returns:
      int: The maximum number of correct guesses in any group, or None if the guess does not contain exactly 4 words.
  """
  # right_count evaluates the number of correct guesses in each group
  right_count = [0, 0, 0, 0]
  
  # Check if the guess list is valid
  if len(guess_list) != 4:
    return None
  
  # Check if the guess list aligns with a solution
  for final_word in guess_list:
    for idx, group in enumerate(solution['groups']):
      if final_word in group['words']:
        right_count[idx] += 1
  
  # Return the maximum number of correct guesses in any group
  # If the guess was all right, then the max will be 4
  return max(right_count)
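
For a feel of what the returned number means, here is a small sketch scored against the first game printed earlier in this notebook. `best_overlap` is a hypothetical set-based shorthand for the same idea (largest overlap between the guess and any one group), not a replacement for `eval_round`, and the guesses are made up.

```python
# Solution groups of the first game, copied from the dataset printout above
first_game_groups = [
    {"words": ["shift", "tab", "return", "option"], "reason": "keyboard keys"},
    {"words": ["heat", "bucks", "jazz", "nets"], "reason": "nba teams"},
    {"words": ["level", "kayak", "racecar", "mom"], "reason": "palindromes"},
    {"words": ["snow", "hail", "rain", "sleet"], "reason": "wet weather"},
]

def best_overlap(guess_words, groups):
    """Largest number of guessed words that fall into a single group."""
    return max(len(set(guess_words) & set(group["words"])) for group in groups)

print(best_overlap(["level", "kayak", "racecar", "mom"], first_game_groups))  # full group
print(best_overlap(["snow", "hail", "rain", "jazz"], first_game_groups))      # "one away"
print(best_overlap(["shift", "heat", "level", "snow"], first_game_groups))    # one word per group
```

A score of 4 is a solved group, 3 corresponds to the game's "one away" message, and 1 is the worst possible result for four distinct words.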

In [10]:
models = [model_google, model_glove, model_wiki, model_numberbatch]
model_names = ["Google News", "Glove", "Wikipedia", "Numberbatch"]
correct_idx = []
for idx, model in enumerate(models):
  print(f"======== {model_names[idx]} ========")
  right_list = []
  one_away_when = []
  for i in range(ds_len):
    guess_list = guess(model, ds[i]['words'])
    if guess_list is not None:
      score = eval_round(guess_list, ds[i]['solution'])
      right_list.append(score)
      if score == 4 and i not in correct_idx:
        correct_idx.append(i)

  print(f"AVERAGE SCORE: {sum(right_list) / len(right_list)}")
  for i in range(1, 5):
    print(f"{i}: {right_list.count(i)}")
  print()
print(f"Number of Games with At Least One Good First Guess: {len(correct_idx)} / {ds_len}")

======== Google News ========
AVERAGE SCORE: 2.9106858054226477
1: 19
2: 179
3: 268
4: 161

======== Glove ========
AVERAGE SCORE: 2.8086124401913874
1: 22
2: 225
3: 231
4: 149

======== Wikipedia ========
AVERAGE SCORE: 2.9649122807017543
1: 18
2: 177
3: 241
4: 191

======== Numberbatch ========
AVERAGE SCORE: 3.0637958532695375
1: 28
2: 127
3: 249
4: 223

Number of Games with At Least One Good First Guess: 335 / 628


## Evaluate games

In [11]:
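def compute_similarity_matrix(model, words):
    """
    Compute pairwise similarities for all in-vocabulary words.

    Returns:
        dict: Maps each (word1, word2) pair, in list order, to its similarity.
    """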
    words = [preprocess_word(word, model) for word in words]
    words = [word for word in words if word in model]
    
    similarity_matrix = {}
    for i, word1 in enumerate(words):
        for j, word2 in enumerate(words):
            if i < j:  # Avoid redundant computations
                similarity_matrix[(word1, word2)] = model.similarity(word1, word2)
    return similarity_matrix

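def guess_best_combination(model, words, similarity_matrix=None, lives=4):
    """
    Rank every 4-word combination by average pairwise similarity.

    Args:
        model (gensim.models.KeyedVectors): The word-vector model to use.
        words (list): The words still in play.
        similarity_matrix (dict, optional): Precomputed pairwise similarities
            keyed by (word1, word2); computed from `words` when omitted.
        lives (int): The number of fallback guesses to return.

    Returns:
        list: Up to `lives` guesses (lists of 4 words), best first, or None
            if fewer than 4 usable words or no lives remain.
    """
    if len(words) == 4:
        # Only one group is possible, so repeat it once per remaining life
        return [list(words)] * lives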
    words = [preprocess_word(word, model) for word in words]
    words = [word for word in words if word in model]

    if len(words) < 4 or lives < 1:
        return None

    if similarity_matrix is None:
        similarity_matrix = compute_similarity_matrix(model, words)

    all_combinations = list(combinations(words, 4))
    scored_combinations = []

    for combination in all_combinations:
        similarities = []
        for i, word1 in enumerate(combination):
            for j, word2 in enumerate(combination):
                if i < j:
                    similarities.append(similarity_matrix.get((word1, word2), similarity_matrix.get((word2, word1), 0)))

        average_similarity = np.mean(similarities)
        scored_combinations.append((combination, average_similarity))

    # Sort combinations by average similarity in descending order
    scored_combinations.sort(key=lambda x: x[1], reverse=True)

    # Return up to `lives` attempts in descending order of similarity
    top_guesses = [list(comb[0]) for comb in scored_combinations[:lives]]
    return top_guesses

In [12]:
print(guess_best_combination(model_google, ['host', 'light', 'win', 'yang', 'score', 'masculine', 'flock', 'land', 'expansive', 'sea', 'earn', 'crowd']))

[['win', 'score', 'earn', 'crowd'], ['host', 'win', 'score', 'earn'], ['win', 'score', 'flock', 'earn'], ['win', 'score', 'land', 'earn']]
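
Since `guess_best_combination` scores every 4-word combination by averaging its 6 pairwise similarities, the amount of work per call is easy to estimate. The sketch below uses only `math.comb`; `scoring_cost` is a hypothetical helper, and the counts assume every word is in the model's vocabulary (out-of-vocabulary words are dropped first, which lowers them).

```python
from math import comb

def scoring_cost(num_words, group_size=4):
    """Return (combinations to score, pairwise lookups) for one call."""
    num_combinations = comb(num_words, group_size)
    lookups_per_combination = comb(group_size, 2)  # 6 pairs in a group of 4
    return num_combinations, num_combinations * lookups_per_combination

# Board sizes seen during a game: full board, then after each solved group
for num_words in (16, 12, 8):
    combos, lookups = scoring_cost(num_words)
    print(f"{num_words} words: {combos} combinations, {lookups} lookups")
```

Exhaustive scoring is cheap at this size, so the quality of the similarity measure matters far more than search speed.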


In [13]:
def calculate_score(num_correct, strikes):
    """
    Calculate the score based on the number of correct guesses and strikes.
    
    Args:
        num_correct (int): The number of correct guesses (0-4).
        strikes (int): The number of strikes (0-4).
    
    Returns:
        float: The calculated score.
    """
    # Define multipliers and penalties
    multipliers = [1, 2, 3, 3]
    penalties = [1.0, 0.9, 0.75, 0.5, 0.25]

    # Ensure the number of correct groups is within the valid range
    if num_correct > 4:
        num_correct = 4

    # Calculate the total score
    total_score = 0
    for i in range(num_correct):
        total_score += 1 * multipliers[i] * penalties[strikes]

    return np.round(total_score, 2)

# Example usage
num_correct_1 = 4
num_correct_2 = 4
num_correct_3 = 2

strikes_1 = 0
strikes_2 = 1
strikes_3 = 2

print("All Correct with 0 strikes:", calculate_score(num_correct_1, strikes_1))  # Output: 9.0
print("All Correct with 1 strike:", calculate_score(num_correct_2, strikes_2))   # Output: 8.1
print("2 Correct Groups - 2 strikes:", calculate_score(num_correct_3, strikes_3)) # Output: 2.25

All Correct with 0 strikes: 9.0
All Correct with 1 strike: 8.1
2 Correct Groups - 2 strikes: 2.25
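
The same scoring rule can be laid out as a full table, which makes the effect of each strike easier to see. This is a plain-Python restatement that reuses the multipliers and penalties from `calculate_score`; `score` and `table` are hypothetical names introduced only for this sketch.

```python
MULTIPLIERS = [1, 2, 3, 3]                 # points for the 1st..4th solved group
PENALTIES = [1.0, 0.9, 0.75, 0.5, 0.25]    # scale factor for 0..4 strikes

def score(num_correct, strikes):
    """Sum the multipliers of the solved groups, then scale by the strike penalty."""
    base = sum(MULTIPLIERS[:min(num_correct, 4)])
    return round(base * PENALTIES[strikes], 2)

# Rows: groups solved (0-4); columns: strikes (0-4)
table = [[score(c, s) for s in range(5)] for c in range(5)]
for num_correct, row in enumerate(table):
    print(num_correct, row)
```

The base values are cumulative (1, 3, 6, 9), so the third and fourth groups are worth the most, and a perfect game with no strikes tops out at 9.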


In [74]:
models = [model_google, model_glove, model_wiki, model_numberbatch]
model_names = ["Google News", "Glove", "Wikipedia", "Numberbatch"]
correct_idx = []
multiplier = {4: 1.0, 3: 0.9, 2: 0.75, 1: 0.5, 0: 0.25}

# Iterate through each model and evaluate the guesses
for idx, model in enumerate(models):
  print(f"======== {model_names[idx]} ========")
  right_list = []
  correct_guesses = []
  total_scores = []
  one_away_when = []
  for i in range(ds_len):
    #print("I:", i)
    lives = 4
    correct_count = 0
    total_score = 0
    options = ds[i]['words']
    while lives > 0 and len(options) > 0:
      #print("LEN:", len(options))
      guess_list = guess_best_combination(model, options, lives=lives)
      #print("GUESS:", guess_list)
      if guess_list is None:
        lives -= 1
        continue
      # Named `attempt` so the loop does not overwrite the guess() function defined earlier
      for attempt in guess_list:
        score = eval_round(attempt, ds[i]['solution'])
        if score == 4:
          correct_count += 1
          right_list.append(score)
          options = [item for item in options if item not in attempt]
          if len(options) == 4:
            # The last four words form the final group automatically
            correct_count += 1
            options = []
          break
        lives -= 1
        if attempt == guess_list[-1] or lives == 0:
          right_list.append(score)
          break
    correct_guesses.append(correct_count)
    if model is model_numberbatch and i > 600:
      print(f"GAME {i}: {correct_count} correct guesses, {lives} lives left")
    total_scores.append(calculate_score(correct_count, 4 - lives))
    if correct_count == 4 and i not in correct_idx:
      correct_idx.append(i)

  print(f"AVERAGE SCORE: {sum(correct_guesses) / len(correct_guesses)}")
  for i in range(0, 5):
    print(f"{i}: {correct_guesses.count(i)}")
  print(f"Average Total Score: {sum(total_scores) / len(total_scores)} (Total: {sum(total_scores)})")
  print()
print(f"Number of Games with At Least One Complete Solve: {len(correct_idx)} / {ds_len}")

======== Google News ========
AVERAGE SCORE: 0.893312101910828
0: 323
1: 169
2: 76
3: 0
4: 60
Average Total Score: 0.8366242038216564 (Total: 525.4000000000002)

======== Glove ========
AVERAGE SCORE: 0.8136942675159236
0: 335
1: 167
2: 80
3: 0
4: 46
Average Total Score: 0.6980095541401274 (Total: 438.35)

======== Wikipedia ========
AVERAGE SCORE: 1.1767515923566878
0: 264
1: 175
2: 96
3: 0
4: 93
Average Total Score: 1.240525477707007 (Total: 779.0500000000003)

======== Numberbatch ========
GAME 601: 1 correct guesses, 0 lives left
GAME 602: 4 correct guesses, 4 lives left
GAME 603: 1 correct guesses, 0 lives left
GAME 604: 4 correct guesses, 2 lives left
GAME 605: 1 correct guesses, 0 lives left
GAME 606: 1 correct guesses, 0 lives left
GAME 607: 0 correct guesses, 0 lives left
GAME 608: 1 correct guesses, 0 lives left
GAME 609: 2 correct guesses, 0 lives left
GAME 610: 2 correct guesses, 0 lives left
GAME 611: 2 correct guesses, 0 lives left
GAME 612: 2 correct guesses, 0 lives left
GAME 613: 0 correct guesses, 0 lives left
GAME 614: 0 correct guesses, 0 lives left
GAME 615: 4 correct guesse

## ChatGPT

In [146]:
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

In [None]:
client = OpenAI(api_key=api_key)
description = "You are an assistant configured to solve the New York Times Connections Word game."
prompt = """
You are an assistant configured to solve the New York Times Connections Word game.
Out of the given words, please return a group of 4 words that you are most confident are related to each other.
Please output your response in a JSON format with the following structure:
{
  "words": ["word1", "word2", "word3", "word4"],
  "reason": "Your reasoning here."
}

You may assume the following:
1. The provided list of words will always be a multiple of four, and a group of four words will always exist.
2. Every word in the provided list is part of a group of four words, but you only need to make one guess.
3. There will never be a "miscellaneous" group, and no word will be part of more than one group.
4. A red herring category may be present, where some words appear to be related but are not part of the correct group.

Please give your answer in a JSON format and do not provide any other text. Your words to choose from are as follows:
"""

In [153]:
words = ", ".join(ds[0]['words'])
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": description},
    {"role": "user", "content": prompt + words}
  ],
  max_completion_tokens=500,
  response_format={ "type": "json_object" }
)
response = response.choices[0].message.content
response = json.loads(response)
print(response)

{'words': ['level', 'racecar', 'kayak', 'mom'], 'reason': 'They are all palindromic words.'}
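
Model replies are not guaranteed to follow the requested structure, so it may be worth checking a reply before scoring it. The sketch below is one possible check, assuming the JSON shape requested in the prompt; `parse_guess` is a hypothetical helper and is not part of the OpenAI client.

```python
import json

def parse_guess(raw, available_words, group_size=4):
    """Return the guessed words if the reply is usable, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("words"), list):
        return None

    # Normalize case and whitespace before comparing against the board
    guessed = [str(word).strip().lower() for word in data["words"]]
    allowed = {word.lower() for word in available_words}

    # Require exactly `group_size` distinct words, all taken from the board
    if len(set(guessed)) != group_size or not set(guessed) <= allowed:
        return None
    return guessed

board = ['snow', 'level', 'shift', 'kayak', 'heat', 'tab', 'bucks', 'return',
         'jazz', 'hail', 'option', 'rain', 'sleet', 'racecar', 'mom', 'nets']
good_reply = '{"words": ["level", "racecar", "kayak", "mom"], "reason": "Palindromes."}'
bad_reply = '{"words": ["level", "racecar", "kayak", "noon"], "reason": "Palindromes."}'

print(parse_guess(good_reply, board))
print(parse_guess(bad_reply, board))  # "noon" is not on the board
```

Returning None for an unusable reply lets the caller decide whether to retry the request or count it as a strike.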


## TODO

Due: 6 March 2025

Group 1:
1. Write some code to try and better save attributes of results!
    * Ex: What are the guesses that are being made? What does each game look like?
    * I recommend saving the results as a JSON for organization, but feel free to decide how you'd like to save your results.
2. Generate a graph of some kind that can give some insights!
    * E.g. What kinds of groups are most commonly solved? What do you see with groups that are solved? What kinds of group difficulties are solved most often?
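
As a starting point for the first task, here is a minimal sketch of saving per-game results as JSON and reading them back. The record layout (`model`, `game_index`, `guesses`, `groups_solved`, `lives_left`) is only a suggestion, the example record is illustrative rather than a real result, and `save_results` / `load_results` are hypothetical helpers.

```python
import json
import os
import tempfile

def save_results(records, path):
    """Write a list of per-game result dicts to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)

def load_results(path):
    """Read the per-game result dicts back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Illustrative record: one entry per (model, game), with every guess kept in order
example_records = [
    {
        "model": "Numberbatch",
        "game_index": 0,
        "guesses": [
            {"words": ["level", "kayak", "racecar", "mom"], "max_overlap": 4},
        ],
        "groups_solved": 1,
        "lives_left": 4,
    }
]

# Round trip through a temporary directory so nothing is left in the repo
with tempfile.TemporaryDirectory() as tmp_dir:
    results_path = os.path.join(tmp_dir, "results.json")
    save_results(example_records, results_path)
    reloaded = load_results(results_path)

print(reloaded == example_records)
```

Keeping each guess alongside its overlap score makes it straightforward to answer the graphing questions later, for example by counting which group names are solved most often.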

Group 2:
* Devise a better guessing algorithm, and implement it.