# Connections

We are going to solve the NYTimes Connections word game: https://www.nytimes.com/games/connections

In [1]:
# !pip install weave openai

In [2]:
import weave

TEAM="llm-finetuning-course" # or your W&B username instead

weave.init(f"{TEAM}/connections")

Logged in as Weights & Biases user: capecape.
View Weave data at https://wandb.ai/llm-finetuning-course/connections/weave




## Load Data

We have created a dataset with all previous Connections puzzles.

In [3]:
# !wget https://raw.githubusercontent.com/wandb/connections/main/connections_prompts.jsonl

In [4]:
import json
import weave


def load_jsonl(file_path: str) -> list:
    # parse one JSON object per non-empty line; the context manager closes the file
    with open(file_path, "r") as f:
        return [json.loads(line) for line in f if line.strip()]

# ds = weave.ref('connections_prompts').get()
ds = load_jsonl("connections_prompts.jsonl")

In [5]:
print(ds[0]["solution"])

{'groups': [{'words': ['bucks', 'heat', 'jazz', 'nets'], 'reason': 'nba teams'}, {'words': ['hail', 'rain', 'sleet', 'snow'], 'reason': 'wet weather'}, {'words': ['option', 'return', 'shift', 'tab'], 'reason': 'keyboard keys'}, {'words': ['kayak', 'level', 'mom', 'racecar'], 'reason': 'palindromes'}]}


In [6]:
print(ds[0]["words"])

['nets', 'return', 'heat', 'jazz', 'mom', 'shift', 'kayak', 'option', 'rain', 'sleet', 'level', 'racecar', 'bucks', 'tab', 'hail', 'snow']
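To make the record layout concrete, here is a minimal sketch that flattens one puzzle into a word-to-reason lookup. It copies the first record shown above and assumes every record in the dataset follows the same layout; the name `word_to_reason` is only illustrative.

```python
# First puzzle, copied from the outputs above
record = {
    "words": ['nets', 'return', 'heat', 'jazz', 'mom', 'shift', 'kayak', 'option',
              'rain', 'sleet', 'level', 'racecar', 'bucks', 'tab', 'hail', 'snow'],
    "solution": {"groups": [
        {"words": ['bucks', 'heat', 'jazz', 'nets'], "reason": 'nba teams'},
        {"words": ['hail', 'rain', 'sleet', 'snow'], "reason": 'wet weather'},
        {"words": ['option', 'return', 'shift', 'tab'], "reason": 'keyboard keys'},
        {"words": ['kayak', 'level', 'mom', 'racecar'], "reason": 'palindromes'},
    ]},
}

# Map every word to the reason of the group it belongs to
word_to_reason = {
    word: group["reason"]
    for group in record["solution"]["groups"]
    for word in group["words"]
}

# The solution should cover exactly the 16 shuffled words, each one once
covers_all_words = sorted(word_to_reason) == sorted(record["words"])
print(covers_all_words, word_to_reason["kayak"])
```

The `words` field is the shuffled board the model sees, while `solution` holds the four reference groups.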


## Naive approach

In [7]:
import os
import openai


OPENAI_API_KEY = "sk-..."  # put your key here, the one you got from the credits 😎
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", OPENAI_API_KEY)

client = openai.AsyncClient(api_key=OPENAI_API_KEY)

We are using the `json_object` response format to get a structured answer. We could use instructor here if we wanted more tightly controlled structured output.

In [8]:
@weave.op()
async def call_openai(messages, model="gpt-4o", max_tokens=256, temperature=0.7):
    response = await client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
        temperature=temperature,
        response_format={ "type": "json_object" }  # <- quick win to get a structured answer
        )
    extracted = response.choices[0].message.content
    if extracted is None:
        raise ValueError("No response from model")
    return extracted


Let's parse the output and get a structured answer using `json`.

In [9]:
import json

@weave.op()
async def generate_solution(messages, **kwargs):

    res = await call_openai(messages, **kwargs)
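    try:
        generation = json.loads(res)
    except json.JSONDecodeError:
        # fall back to an empty answer when the model returns invalid or truncated JSON
        generation = {}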
    return generation
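Valid JSON is not necessarily a valid answer: the model can drop a word, repeat one, or return the wrong number of groups. As a minimal sketch (the helper `is_valid_generation` is hypothetical and not part of the notebook), a structural check on the parsed output could look like this. It compares words exactly, so a model that changes capitalization would fail it.

```python
def is_valid_generation(generation, words, n_groups=4, group_size=4):
    """Return True if `generation` holds n_groups groups that use every puzzle word exactly once."""
    groups = generation.get("groups") if isinstance(generation, dict) else None
    if not isinstance(groups, list) or len(groups) != n_groups:
        return False
    used = []
    for group in groups:
        group_words = group.get("words") if isinstance(group, dict) else None
        if not isinstance(group_words, list) or len(group_words) != group_size:
            return False
        if not all(isinstance(word, str) for word in group_words):
            return False
        used.extend(group_words)
    # Every word must appear exactly once across all groups
    return sorted(used) == sorted(words)


puzzle_words = ['nets', 'return', 'heat', 'jazz', 'mom', 'shift', 'kayak', 'option',
                'rain', 'sleet', 'level', 'racecar', 'bucks', 'tab', 'hail', 'snow']
good = {"groups": [
    {"reason": "NBA teams", "words": ['nets', 'heat', 'jazz', 'bucks']},
    {"reason": "palindromes", "words": ['mom', 'kayak', 'level', 'racecar']},
    {"reason": "types of precipitation", "words": ['rain', 'sleet', 'hail', 'snow']},
    {"reason": "computer-related terms", "words": ['return', 'shift', 'option', 'tab']},
]}

print(is_valid_generation(good, puzzle_words))  # well-formed answer
print(is_valid_generation({}, puzzle_words))    # the empty fallback from a failed parse
```

A check like this could be used to retry the call before scoring, at the cost of extra requests.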

## Using the weave.Model class

Let's organize our first model in a class so that everything stays versioned and organized. [weave.Model](https://wandb.github.io/weave/guides/core-types/models) is a subclass of Pydantic's `BaseModel` with some extras, like the `predict` function.

In [10]:
class OneShotModel(weave.Model):
    system_prompt: str
    user_prompt: str
    temperature: float = 0.7
    max_tokens: int = 256
    
    @weave.op()
    async def predict(self, words):
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": self.user_prompt + str(list(words))}
        ]
        return await generate_solution(messages, temperature=self.temperature, max_tokens=self.max_tokens)

Let's define some starting prompts to use with our model.

In [11]:
# OpenAI has a system prompt that steers the conversation
system_prompt = (
    "You are an expert puzzle solver. You understand literature and you are well versed in wordplay. "
    "I want you to solve a daily word puzzle that finds commonalities between words.\n"
    )

# a naive prompt to solve the puzzle at once
user_prompt = (
    "Here is the puzzle:\n"
    "- There are 16 words, which form 4 groups of 4 words. Each group has some common theme that links the words.\n"
    "- You must use each of the 16 words, and use each word only once.\n"
    "- Each group of 4 words are linked together in some way. \n"
    "The connection between words can be simple.\n"
    """- An example of a simple connection would be {"reason":'types of fish', "words":["Bass", "Flounder", "Salmon", "Trout"]}. \n"""
    """- Categories can also be more complex, and require abstract or lateral thinking. An example of this type of connection would be {"reason": 'things that start with FIRE', "words": ['Ant', 'Drill', 'Island', 'Opal']}\n"""
    # adjacent string literals are joined with no separator, so each line needs its own "\n"
    """The results should be in JSON format as follows: {"groups": [{"reason":"reason why words are grouped", "words":["word1", "word2", "word3", "word4"]}, ...]}\n"""
    "Provide a full solution to the puzzle, it should be 4 groups of 4 words.\n"
    "Here are the words for today’s puzzle:\n")

In [12]:
model = OneShotModel(system_prompt=system_prompt, user_prompt=user_prompt)

In [13]:
words = list(ds[0]["words"])
words

['nets',
 'return',
 'heat',
 'jazz',
 'mom',
 'shift',
 'kayak',
 'option',
 'rain',
 'sleet',
 'level',
 'racecar',
 'bucks',
 'tab',
 'hail',
 'snow']

In [14]:
output = await model.predict(words=words)
output

🍩 https://wandb.ai/llm-finetuning-course/connections/r/call/37fcd894-394f-48eb-8da8-3c6999fa2ff2


{'groups': [{'reason': 'NBA teams',
   'words': ['nets', 'heat', 'jazz', 'bucks']},
  {'reason': 'palindromes', 'words': ['mom', 'kayak', 'level', 'racecar']},
  {'reason': 'types of precipitation',
   'words': ['rain', 'sleet', 'hail', 'snow']},
  {'reason': 'computer-related terms',
   'words': ['return', 'shift', 'option', 'tab']}]}

In [15]:
ds[0]["solution"]

{'groups': [{'words': ['bucks', 'heat', 'jazz', 'nets'],
   'reason': 'nba teams'},
  {'words': ['hail', 'rain', 'sleet', 'snow'], 'reason': 'wet weather'},
  {'words': ['option', 'return', 'shift', 'tab'], 'reason': 'keyboard keys'},
  {'words': ['kayak', 'level', 'mom', 'racecar'], 'reason': 'palindromes'}]}

This looks right: the groups match the solution even though the reasons are worded differently. Let's create a function to compare both results.

In [16]:
@weave.op()
def check_solution(solution, model_output):
    "Check that all groups of words match the solution"
    solution_set = {frozenset(group["words"]) for group in solution["groups"]}
    # .get() keeps the scorer from raising when the model output is the empty {} fallback
    model_output_set = {frozenset(group["words"]) for group in model_output.get("groups", [])}
    
    accuracy = len(solution_set.intersection(model_output_set))
    
    return {"match": accuracy == 4, "accuracy": accuracy}

In [17]:
check_solution(ds[0]["solution"], output)

🍩 https://wandb.ai/llm-finetuning-course/connections/r/call/6a41dd2b-4c0a-48bf-9a2d-20a41d936efe


{'match': True, 'accuracy': 4}
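To see how the comparison behaves on a partially correct answer, here is a small sketch of the same set logic without the `weave.op` decorator. The function name `count_matching_groups` and the `partial` output are made up for illustration; in `partial`, the words `tab` and `mom` have been swapped between two groups.

```python
def count_matching_groups(solution, model_output):
    # frozenset ignores word order inside a group; the outer set ignores group order
    solution_set = {frozenset(g["words"]) for g in solution["groups"]}
    output_set = {frozenset(g["words"]) for g in model_output["groups"]}
    return len(solution_set & output_set)


solution = {"groups": [
    {"words": ['bucks', 'heat', 'jazz', 'nets'], "reason": 'nba teams'},
    {"words": ['hail', 'rain', 'sleet', 'snow'], "reason": 'wet weather'},
    {"words": ['option', 'return', 'shift', 'tab'], "reason": 'keyboard keys'},
    {"words": ['kayak', 'level', 'mom', 'racecar'], "reason": 'palindromes'},
]}

# Hypothetical model output: two groups correct (in a different order), two groups wrong
partial = {"groups": [
    {"reason": "precipitation", "words": ['rain', 'sleet', 'hail', 'snow']},
    {"reason": "NBA teams", "words": ['nets', 'heat', 'jazz', 'bucks']},
    {"reason": "keys", "words": ['return', 'shift', 'option', 'mom']},
    {"reason": "palindromes", "words": ['tab', 'kayak', 'level', 'racecar']},
]}

matches = count_matching_groups(solution, partial)
print({"match": matches == 4, "accuracy": matches})
```

Only exact groups count, so a single misplaced word costs two groups. The reasons are never compared.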

## Running an Evaluation

We can automate the process of testing our model by running it on all puzzles and checking the accuracy of the solutions.

In [18]:
NUM_TEST_SAMPLES = 20 # the last 20 puzzles

In [19]:
weave_eval = weave.Evaluation(dataset=ds[-NUM_TEST_SAMPLES:], scorers=[check_solution])
await weave_eval.evaluate(model)

🍩 https://wandb.ai/llm-finetuning-course/connections/r/call/ea508a9c-220e-4ae6-96e7-eb52fdd4611e


{'check_solution': {'match': {'true_count': 4, 'true_fraction': 0.2},
  'accuracy': {'mean': 1.65}},
 'model_latency': {'mean': 3.1481900095939634}}
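The summary above can be unpacked with a little arithmetic. The sketch below uses only the numbers reported in that output and relies on `accuracy` being the count of matching groups per puzzle (as defined in `check_solution`), with 4 groups per puzzle. The variable names are illustrative.

```python
num_puzzles = 20            # NUM_TEST_SAMPLES
groups_per_puzzle = 4
true_count = 4              # puzzles where all 4 groups matched
mean_correct_groups = 1.65  # reported mean of the "accuracy" field

solved_fraction = true_count / num_puzzles
total_correct_groups = round(mean_correct_groups * num_puzzles)
group_level_accuracy = mean_correct_groups / groups_per_puzzle

print(f"fully solved puzzles: {solved_fraction:.0%}")
print(f"correct groups: {total_correct_groups} of {num_puzzles * groups_per_puzzle}")
print(f"group-level accuracy: {group_level_accuracy:.2%}")
```

In other words, the one-shot model fully solves one puzzle in five, but it gets roughly two groups in five right, which leaves clear room for improvement.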

## Now it's your turn to improve this solution!