<div align="right"><i>Peter Norvig<br>Sept 25, 2024</i></div>

# LLMs, Theory of Mind, and Cheryl's Birthday

There has been [much](https://spectrum.ieee.org/theory-of-mind-ai) [debate](https://aclanthology.org/2023.conll-1.25/) [on](https://www.gsb.stanford.edu/faculty-research/working-papers/theory-mind-may-have-spontaneously-emerged-large-language-models) [the](https://arxiv.org/abs/2302.02083) [degree](https://www.nature.com/articles/s41562-024-01882-z) to which Large Language Models (LLMs) have a theory of mind: a way of understanding what other people know and don't know. In this notebook I explore one small part of the issue by asking six LLMs to solve the [Cheryl's Birthday Problem](https://en.wikipedia.org/wiki/Cheryl%27s_Birthday), a well-known logic puzzle in which different characters have different states of knowledge. 

**TLDR**: The six LLMs were all familiar with the problem, so I didn't have to describe it in the prompt, just name it. They were able to correctly state that the answer to the problem is July 16, but none of them was able to write a program that finds the solution. They all failed to distinguish the different knowledge states of the different characters; for this problem they had no theory of mind.

Below I show the response for each LLM. I elided some of the explanatory output when it was very similar to previous output, but I show the programs verbatim. My comments are in *[bracketed italics]* and the LLM output is in regular upright text. The queries were made on Sept 25, 2024; subsequent updates of the models may perform differently.

For each LLM the prompt was:

***What is the answer to the "Cheryl's Birthday" problem? Write a Python program to solve it. Make sure that the program will still work if the list of possible dates is changed.***

# Human

An actual human, me, was able to write a [**program**](https://github.com/norvig/pytudes/blob/main/ipynb/Cheryl-and-Eve.ipynb) that correctly solves the original problem and also handles new sets of dates.
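
For readers who want the idea without following the link, here is a minimal sketch of the kind of program the prompt asks for. It is not the code from the linked notebook: the names "told", "knows", "albert_1", "bernard_1", "albert_2", and "cheryls_birthday" are made up for this illustration, and dates are assumed to be strings of the form "Month Day". The point it illustrates is that each statement is evaluated against what its speaker could know at that moment, rather than against one shared list.

```python
DATES = ["May 15", "May 16", "May 19",
         "June 17", "June 18",
         "July 14", "July 16",
         "August 14", "August 15", "August 17"]

def month(date): return date.split()[0]
def day(date):   return date.split()[1]

def told(part, dates):
    """Dates that remain possible for someone told `part` (a month or a day)."""
    return [d for d in dates if part in d.split()]

def knows(possible):
    """A person knows the birthday when exactly one date remains possible."""
    return len(possible) == 1

def albert_1(date, dates):
    """Albert: I don't know, and I know that Bernard doesn't know either."""
    albert_view = told(month(date), dates)
    return (not knows(albert_view)
            and all(not knows(told(day(d), dates)) for d in albert_view))

def bernard_1(date, dates):
    """Bernard: at first I didn't know, but now (after Albert spoke) I do."""
    at_first = told(day(date), dates)
    now = [d for d in at_first if albert_1(d, dates)]
    return not knows(at_first) and knows(now)

def albert_2(date, dates):
    """Albert: now (after Bernard spoke) I know too."""
    now = [d for d in told(month(date), dates)
           if albert_1(d, dates) and bernard_1(d, dates)]
    return knows(now)

def cheryls_birthday(dates):
    """All dates consistent with the three statements."""
    return [d for d in dates
            if albert_1(d, dates) and bernard_1(d, dates) and albert_2(d, dates)]

print(cheryls_birthday(DATES))  # ['July 16']
```

The sketch returns a list rather than a single date so that, when the list of possible dates is changed, a puzzle with no consistent date or with several is visible in the result.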

# ChatGPT 4o

*[ChatGPT 4o gives the following response, which starts with an excellent summary of the problem:]*

The "Cheryl's Birthday" problem is a well-known logic puzzle that can be solved using logical deduction. Let's first break down the problem before writing a Python program.

### Problem Statement
Cheryl gives Albert and Bernard a list of 10 possible birthdates, and tells them different information:
- Albert knows the month.
- Bernard knows the day.

Cheryl provides the following 10 possible dates:
- May 15, May 16, May 19
- June 17, June 18
- July 14, July 16
- August 14, August 15, August 17

Albert then says: "I don't know when Cheryl's birthday is, but I know that Bernard doesn't know either."

Bernard replies: "At first I didn’t know when Cheryl's birthday is, but now I know."

Albert then says: "Now I know when Cheryl's birthday is."

### Steps for Deduction
1. From Albert's statement, we can deduce that Cheryl's birthday is not in May or June, because if it were, Bernard could possibly know the exact birthday if the day was unique (May 19 or June 18). Since Albert is sure Bernard doesn't know, we can eliminate May and June.
2. After eliminating May and June, the remaining possible dates are:
    - July 14, July 16
    - August 14, August 15, August 17
3. From Bernard’s statement, now knowing the birthday, we can deduce that Cheryl's birthday isn't on July 14 or August 14, because if it were, Bernard would still be uncertain (since both dates share the day 14).
4. The remaining possible dates are:
    - July 16
    - August 15
    - August 17
5. Finally, from Albert’s statement, we deduce that Cheryl's birthday must be in July, because if it were in August, Albert wouldn’t be able to know the exact date. Thus, the birthday is July 16.

### Python Program to Solve the Problem
We can now write a Python program to generalize this solution and ensure it works for any list of possible dates.

### Explanation:
- Step 1: The function groups the dates by month and by day.
- Step 2: Based on Albert's statement, we eliminate the months of May and June.
- Step 3: Based on Bernard's statement, we eliminate dates where the day appears more than once (i.e., both July 14 and August 14 are eliminated).
- Step 4: Albert now knows Cheryl's birthday, so it must be in the month where only one date is left (i.e., July).

The program is designed to work with any given list of possible dates, not just the ones in the original problem.

In [1]:
def filter_dates(possible_dates):
    # Step 1: Group dates by month and day
    month_map = {}
    day_map = {}
    
    for date in possible_dates:
        month, day = date.split()
        day = int(day)
        
        # Group by month
        if month not in month_map:
            month_map[month] = []
        month_map[month].append(day)
        
        # Group by day
        if day not in day_map:
            day_map[day] = []
        day_map[day].append(month)
    
    # Step 2: Albert says: "I don't know when Cheryl's birthday is, but I know Bernard doesn't know either."
    # This implies Cheryl's birthday cannot be in May or June because these months have unique days (19th and 18th).
    possible_dates = [date for date in possible_dates if date.split()[0] not in ('May', 'June')]
    
    # Step 3: Bernard now knows the birthday, so it cannot be a day that appears more than once.
    possible_dates = [date for date in possible_dates if len(day_map[int(date.split()[1])]) == 1]
    
    # Step 4: Albert now knows the birthday, so it must be in a unique month remaining after Bernard's elimination.
    remaining_months = set(date.split()[0] for date in possible_dates)
    for date in possible_dates:
        month = date.split()[0]
        if len(month_map[month]) == 1:
            return date
    
    return "No solution found"

# Test the program with the given list of possible dates
possible_dates = [
    "May 15", "May 16", "May 19",
    "June 17", "June 18",
    "July 14", "July 16",
    "August 14", "August 15", "August 17"
]

cheryls_birthday = filter_dates(possible_dates)
print("Cheryl's birthday is:", cheryls_birthday)

Cheryl's birthday is: No solution found


*[We see that the program fails to find a solution. It fails because it has a single variable for possible dates, when it should be keeping track of separate possible dates for Albert and Bernard.]* 
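
*[One concrete symptom, shown as a small sketch (the helper "bernard_view" is hypothetical, not part of ChatGPT's program): "day_map" is built once from the full list, so in step 3 every remaining day still appears to occur twice and every date is eliminated. Bernard's options need to be recomputed against the dates that survive Albert's statement:]*

```python
dates = ["May 15", "May 16", "May 19",
         "June 17", "June 18",
         "July 14", "July 16",
         "August 14", "August 15", "August 17"]

def bernard_view(day, candidates):
    """Dates Bernard considers possible, given his day and a candidate list."""
    return [d for d in candidates if d.split()[1] == day]

# Mirror the hard-coded elimination from step 2 of ChatGPT's program.
after_albert = [d for d in dates if d.split()[0] not in ("May", "June")]

print(bernard_view("16", dates))         # ['May 16', 'July 16']
print(bernard_view("16", after_albert))  # ['July 16']
```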

# Gemini Advanced

*[Gemini's response:]*

The answer to Cheryl's Birthday problem is **July 16th**.

Here's the Python program, modified to handle changes in the list of possible dates:

In [2]:
def cheryl_birthday(possible_dates):
    """Solves the Cheryl's Birthday logic problem, allowing for a variable list of possible dates."""

    # Possible months
    months = ["May", "June", "July", "August"]

    # Albert's statement eliminates some possibilities
    # He knows the month, so there must be only one possible date in that month
    for month in months:
        if len([date for date in possible_dates if (month, date) not in [("May", 19), ("June", 18)]]) == 1:
            months = [month]
            break

    # Bernard's statement further narrows down the possibilities
    # He knows the date, so there must be only one possible month for that date
    for date in possible_dates:
        if len([month for month in months if (month, date) not in [("May", 19), ("June", 18)]]) == 1:
            possible_dates = [date]
            break

    # After Bernard's statement, Albert also knows the answer
    # This means there's only one possible date left in the remaining month
    for month in months:
        if len([date for date in possible_dates if (month, date) not in [("May", 19), ("June", 18)]]) == 1:
            possible_dates = [date]
            break

    # Now there should be only one possibility left
    if len(months) == 1 and len(possible_dates) == 1:
        return months[0], possible_dates[0]
    else:
        return None

# Original problem setup
original_dates = [14, 15, 16, 17, 18, 19]
solution = cheryl_birthday(original_dates)

if solution:
    print(f"Cheryl's birthday is on {solution[0]} {solution[1]}")
else:
    print("Unable to determine Cheryl's birthday")

# Example with a different set of possible dates
new_dates = [13, 14, 15, 16, 17]
new_solution = cheryl_birthday(new_dates)

if new_solution:
    print(f"With the new set of dates, Cheryl's birthday is on {new_solution[0]} {new_solution[1]}")
else:
    print("With the new set of dates, unable to determine Cheryl's birthday")

Unable to determine Cheryl's birthday
With the new set of dates, unable to determine Cheryl's birthday


*[Gemini correctly states the answer, but then produces a program that fails to find the correct answer. Again, the failure is due to having one list of possible dates. The program accepts a list of "possible_dates", but confusingly, those are actually just days of the month; you aren't allowed to pass in different month/day combinations. And even if you do pass in different days, the program hard-codes certain days and months.]*

# Llama 405B

*[Llama gives a fine introduction to the problem, which I omit. Here is the program it produces:]*

In [3]:
# Possible dates for Cheryl's birthday
dates = [
    (5, 13), (5, 15), (5, 16), 
    (6, 13), (6, 15), (6, 16), 
    (7, 13), (7, 15), (7, 16), 
    (8, 13), (8, 15), (8, 16)
]

# Function to check if a date is possible based on Albert's knowledge
def is_possible_albert(month):
    return all(m == month for m, _ in dates)

# Function to check if a date is possible based on Bernard's knowledge
def is_possible_bernard(day):
    return any((m, d) == (month, day) for m, d in dates)

# Function to check if Albert can deduce the birthday
def albert_deduce(month):
    return len([d for m, d in dates if m == month]) == 1

# Function to check if Bernard can deduce the birthday
def bernard_deduce(day):
    return len([m for m, d in dates if d == day]) == 1

# Main program
for month, day in dates:
    if not is_possible_albert(month):
        continue
    if not is_possible_bernard(day):
        continue
    if albert_deduce(month):
        print(f"Cheryl's birthday is {month}/{day}")
        break

*[There is no output from this program, which means it failed to find the date. This program is more flawed than the others. I do like that it defines functions for the various pieces, but note that the function "bernard_deduce" is defined but never called.]*
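
*[A quick check, reusing Llama's own "dates" and "is_possible_albert", shows why nothing is printed. The predicate asks whether every date in the list falls in the given month, which is false as soon as the list spans more than one month, so the main loop skips every candidate:]*

```python
# Llama's list of dates and its first predicate, copied from the program above.
dates = [
    (5, 13), (5, 15), (5, 16),
    (6, 13), (6, 15), (6, 16),
    (7, 13), (7, 15), (7, 16),
    (8, 13), (8, 15), (8, 16)
]

def is_possible_albert(month):
    return all(m == month for m, _ in dates)

print([is_possible_albert(m) for m in (5, 6, 7, 8)])
# [False, False, False, False]
```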

# Claude 3.5 Sonnet

*[Again, Claude provides a nice explanation of the problem, which I omit; here is the program:]*

In [4]:
from itertools import product

def solve_cheryls_birthday(dates):
    # Extract unique months and days
    months = sorted(set(date.split()[0] for date in dates))
    days = sorted(set(int(date.split()[1]) for date in dates))

    # Step 1: Albert knows the month, Bernard knows the day
    possible_months = [m for m in months if len([d for d in dates if d.startswith(m)]) > 1]
    possible_days = [d for d in days if len([date for date in dates if date.endswith(str(d))]) > 1]

    # Step 2: Albert knows Bernard doesn't know
    possible_dates = [date for date in dates if int(date.split()[1]) in possible_days]

    # Step 3: Bernard now knows
    possible_dates = [date for date in possible_dates if possible_dates.count(date.split()[1]) == 1]

    # Step 4: Albert now knows
    result = [date for date in possible_dates if possible_dates.count(date.split()[0]) == 1]

    return result[0] if result else "No unique solution found."

# Example usage
dates = [
    "May 15", "May 16", "May 19",
    "June 17", "June 18",
    "July 14", "July 16",
    "August 14", "August 15", "August 17"
]

solution = solve_cheryls_birthday(dates)
print(f"Cheryl's birthday is on: {solution}")

# Test with a different set of dates
new_dates = [
    "April 15", "April 16", "April 25",
    "May 15", "May 16",
    "June 17", "June 18",
    "July 14", "July 16", "July 17"
]

new_solution = solve_cheryls_birthday(new_dates)
print(f"With the new set of dates, Cheryl's birthday is on: {new_solution}")

Cheryl's birthday is on: No unique solution found.
With the new set of dates, Cheryl's birthday is on: No unique solution found.


*[Again, the program fails. It has the same issue with a single possible_dates variable.]*
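
*[There is also a smaller bug in steps 3 and 4. The list method "count" counts elements equal to its argument, so calling it with a bare day or month on a list of full date strings returns 0, and the test "== 1" rejects every date. A minimal illustration on a made-up three-date list:]*

```python
# Hypothetical sample list, in the same "Month Day" format Claude's program uses.
possible_dates = ["July 14", "July 16", "August 14"]

# What the program does: look for the bare day among full date strings.
print(possible_dates.count("14"))  # 0

# What was presumably intended: count how many remaining dates share that day.
days = [date.split()[1] for date in possible_dates]
print(days.count("14"))  # 2
print(days.count("16"))  # 1
```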

# Perplexity

*[Here is Perplexity's code, with its explanation omitted:]*

In [5]:
from collections import defaultdict

def solve_cheryls_birthday(possible_dates):
    months = defaultdict(list)
    days = defaultdict(list)
    
    for date in possible_dates:
        month, day = date.split()
        months[month].append(date)
        days[day].append(date)
    
    # Albert knows the month, Bernard knows the day
    possible = [date for date in possible_dates if len(months[date.split()[0]]) > 1 and len(days[date.split()[1]]) > 1]
    
    # Bernard now knows the date
    possible = [date for date in possible if len([d for d in possible if d.split()[1] == date.split()[1]]) == 1]
    
    # Albert now knows the date
    possible = [date for date in possible if len([d for d in possible if d.split()[0] == date.split()[0]]) == 1]
    
    return possible[0] if len(possible) == 1 else None

# Example usage
possible_dates = [
    "May 15", "May 16", "May 19",
    "June 17", "June 18",
    "July 14", "July 16",
    "August 14", "August 15", "August 17"
]

result = solve_cheryls_birthday(possible_dates)
print(f"Cheryl's birthday is: {result}")

Cheryl's birthday is: None


*[Once again, this program has the same mistake.]*
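
*[To make the mistake concrete, here is a sketch (the variable names are mine, not Perplexity's) comparing the day part of the program's first filter with what Albert's statement implies. The program drops only the two dates whose day is unique; Albert's claim that Bernard cannot know rules out every date in a month that contains such a day:]*

```python
from collections import Counter

dates = ["May 15", "May 16", "May 19",
         "June 17", "June 18",
         "July 14", "July 16",
         "August 14", "August 15", "August 17"]

day_counts = Counter(d.split()[1] for d in dates)

# What the program keeps: every date whose day is not unique.
kept_by_program = [d for d in dates if day_counts[d.split()[1]] > 1]

# What Albert's statement implies: drop each month that contains a unique day.
# (This sketch skips the "I don't know" half, which removes nothing here.)
risky_months = {d.split()[0] for d in dates if day_counts[d.split()[1]] == 1}
kept_by_albert = [d for d in dates if d.split()[0] not in risky_months]

print(kept_by_program)
# ['May 15', 'May 16', 'June 17', 'July 14', 'July 16', 'August 14', 'August 15', 'August 17']
print(kept_by_albert)
# ['July 14', 'July 16', 'August 14', 'August 15', 'August 17']
```

*[In the program's list every remaining day occurs twice, so its next filter, which keeps only unique days, leaves nothing.]*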

# Cohere Chat Playground

*[Cohere gives a nice explanation, and then this program:]*

In [6]:
def solve_cheryls_birthday(possible_dates):
    # Create a dictionary to map months to their respective days
    month_days = {
        "May": [15, 16, 19],
        "June": [17, 18],
        "July": [14, 16],
        "August": [14, 15, 17],
    }

    # Albert knows the month, so we can filter out the possible dates based on that
    albert_knows = [date for date in possible_dates if date[1] in month_days]

    # Bernard knows the day, so we can further filter based on that
    bernard_knows = [date for date in albert_knows if date[2] in month_days[date[1]]]

    # If Bernard knows the birthday, Albert should also know it
    if len(bernard_knows) == 1:
        return bernard_knows[0]
    else:
        return "The solution could not be determined."

# Example usage:
possible_dates = [
    ("May", 15, "Friday"),
    ("May", 16, "Saturday"),
    ("May", 19, "Tuesday"),
    ("June", 17, "Thursday"),
    ("June", 18, "Friday"),
    ("July", 14, "Monday"),
    ("July", 16, "Wednesday"),
    ("August", 14, "Wednesday"),
    ("August", 15, "Thursday"),
    ("August", 17, "Saturday"),
]

solution = solve_cheryls_birthday(possible_dates)
print(f"Cheryl's birthday is: {solution}")

Cheryl's birthday is: The solution could not be determined.


*[Again, the program fails to find a solution because it doesn't distinguish who knows what (note that the "bernard_knows" variable is defined in terms of "albert_knows", but Bernard does not know what Albert knows). The program introduces days of the week, which is extraneous, but okay. A bigger issue is that it accepts "possible_dates" as a parameter, but then ignores them and defines "month_days" based on the original dates.]*
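
*[There is also an indexing slip: in each tuple the month is at index 0 and the day at index 1, but the program tests "date[1]" against the month names, so "albert_knows" comes out empty. The check below uses Cohere's own "month_days" and "possible_dates"; the variable "kept" is mine. Even with the indices corrected, both filters keep all ten dates, so the program still could not determine a solution:]*

```python
month_days = {
    "May": [15, 16, 19],
    "June": [17, 18],
    "July": [14, 16],
    "August": [14, 15, 17],
}
possible_dates = [
    ("May", 15, "Friday"), ("May", 16, "Saturday"), ("May", 19, "Tuesday"),
    ("June", 17, "Thursday"), ("June", 18, "Friday"),
    ("July", 14, "Monday"), ("July", 16, "Wednesday"),
    ("August", 14, "Wednesday"), ("August", 15, "Thursday"), ("August", 17, "Saturday"),
]

# As written: date[1] is the day number, which is never a key of month_days.
print([date for date in possible_dates if date[1] in month_days])  # []

# With corrected indices, the two filters remove nothing at all.
kept = [date for date in possible_dates
        if date[0] in month_days and date[1] in month_days[date[0]]]
print(len(kept))  # 10
```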