# Create toy problems of length 3

We create a toy problem in which balls hit each other in sequence until one of them falls in the hole, e.g. "The red ball hit the blue ball. The blue ball hit the green ball. The green ball fell in the hole."
We check whether the LLM can reconstruct the correct sequence by asking questions such as "Which ball started the chain?" or "Which ball was second in the chain?".
To increase complexity, we permute the order in which the sentences are presented, e.g. "The blue ball hit the green ball. The red ball hit the blue ball. The green ball fell in the hole."
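The reordering idea can be sketched for any number of collision sentences. This is a minimal sketch, not the notebook's own code: `build_prompts` and the `example_*` names are hypothetical, and it assumes the outro sentence always stays last while only the collision sentences are permuted.

```python
import itertools

def build_prompts(events, outro):
    """Return (prompt, is_switched) for every ordering of the event sentences.

    Assumption: the outro sentence is always placed last.
    """
    causal_order = tuple(range(len(events)))
    prompts = []
    for order in itertools.permutations(causal_order):
        body = " ".join(events[i] for i in order)
        # an ordering counts as "switched" when it differs from the causal order
        prompts.append((f"{body} {outro}", order != causal_order))
    return prompts

example_events = [
    "The red ball hit the blue ball.",
    "The blue ball hit the green ball.",
]
example_outro = "The green ball fell in the hole."
example_prompts = build_prompts(example_events, example_outro)

for prompt, is_switched in example_prompts:
    print(is_switched, prompt)
```

With two collision sentences there are only two orderings, which is why a single `switched` flag is enough for the length-3 problems.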

## Three ball problems

In [17]:
import itertools
import pandas as pd
 
def findsubsets(s, n):
    """Return all n-element combinations of s, keeping the input order."""
    return list(itertools.combinations(s, n))
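A small illustration of what `findsubsets` returns may help here. `itertools.combinations` keeps the elements of each tuple in input order, so an item listed earlier never appears after a later one in a chain. If that positional bias is unwanted, `itertools.permutations` is one possible alternative. That is an assumption about a possible variant, not what this notebook does. The `demo` and `ordered` names are hypothetical.

```python
import itertools

def findsubsets(s, n):
    return list(itertools.combinations(s, n))

demo_colors = ["blue", "red", "green", "brown"]

# combinations: each tuple keeps the order of demo_colors
demo = findsubsets(demo_colors, 3)
print(demo)
# [('blue', 'red', 'green'), ('blue', 'red', 'brown'),
#  ('blue', 'green', 'brown'), ('red', 'green', 'brown')]

# permutations (alternative, not used in the notebook): every ordering of 3 of the 4 colors
ordered = list(itertools.permutations(demo_colors, 3))
print(len(demo), len(ordered))  # 4 24
```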

In [18]:
events = [
    "The {} ball hit the {} ball.",
    "The {} ball hit the {} ball."
]
outro = "The {} ball fell in the hole."
colors = [
    "blue",
    "red",
    "green",
    "brown",
    "purple",
    "black",
    "white"
]

In [19]:
# create sequences 
sequences = []
first_color = []
second_color = []
final_color = []
switched = []

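# all 3-color combinations, in the order the colors are listed
color_triplets = findsubsets(colors, 3)

for c1, c2, c3 in color_triplets:
    s1 = events[0].format(c1, c2)  # first collision: c1 hits c2
    s2 = events[1].format(c2, c3)  # second collision: c2 hits c3
    o = outro.format(c3)           # the last ball falls in the hole
    # create the prompts: causal order and with the two collisions swapped
    prompt_in_order = " ".join([s1, s2, o])
    prompt_switched = " ".join([s2, s1, o])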
    sequences.append(prompt_in_order)
    switched.append(False)
    sequences.append(prompt_switched)
    switched.append(True)
    # always append twice to account for switched order
    first_color.extend([c1, c1])
    second_color.extend([c2, c2])
    final_color.extend([c3, c3])

In [20]:
# save all in a pandas dataframe
import os

df_toy_problem_3c = pd.DataFrame({
    "sequence": sequences,
    "switched": switched,
    "first_color": first_color,
    "second_color": second_color,
    "final_color": final_color
})
# make sure the output directory exists before writing
os.makedirs("data/toy_problem_3", exist_ok=True)
df_toy_problem_3c.to_csv("data/toy_problem_3/toy_problem_3c.csv", index=False)
df_toy_problem_3c.head()

|   | sequence | switched | first_color | second_color | final_color |
|---|----------|----------|-------------|--------------|-------------|
| 0 | The blue ball hit the red ball. The red ball h... | False | blue | red | green |
| 1 | The red ball hit the green ball. The blue ball... | True | blue | red | green |
| 2 | The blue ball hit the red ball. The red ball h... | False | blue | red | brown |
| 3 | The red ball hit the brown ball. The blue ball... | True | blue | red | brown |
| 4 | The blue ball hit the red ball. The red ball h... | False | blue | red | purple |
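The answer columns can be paired with questions to form evaluation prompts. The sketch below is one possible way to do that and is not part of the notebook: `QUESTIONS`, `make_qa_pairs` and `is_correct` are hypothetical names, the third question is an assumed addition, and the substring check is a deliberately naive scoring rule.

```python
# Hypothetical question templates, keyed by the answer column they test.
QUESTIONS = {
    "first_color": "Which ball started the chain?",
    "second_color": "Which ball was second in the chain?",
    "final_color": "Which ball fell in the hole?",  # assumed extra question
}

def make_qa_pairs(records):
    """Turn rows (dicts with the dataframe's columns) into prompt/answer pairs."""
    pairs = []
    for record in records:
        for column, question in QUESTIONS.items():
            pairs.append({
                "prompt": f"{record['sequence']} {question}",
                "answer": record[column],
                "switched": record["switched"],
            })
    return pairs

def is_correct(model_output, answer):
    # Naive check (assumption): the expected color appears somewhere in the reply.
    # A reply that lists several colors would also pass this check.
    return answer.lower() in model_output.lower()

demo_records = [{
    "sequence": "The red ball hit the green ball. The blue ball hit the red ball. "
                "The green ball fell in the hole.",
    "switched": True,
    "first_color": "blue",
    "second_color": "red",
    "final_color": "green",
}]
qa_pairs = make_qa_pairs(demo_records)
for pair in qa_pairs:
    print(pair["answer"], "<-", pair["prompt"])
```

For the full dataset, `df_toy_problem_3c.to_dict("records")` yields rows in the shape `make_qa_pairs` expects.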


## Three nonsense words problems

In [21]:
events = [
    "The {} hit the {}.",
    "The {} hit the {}."
]
outro = "The {} fell in the hole."
words = [
    "baz",
    "fuu",
    "schleep",
    "blubb",
    "bla",
    "plomp",
    "dinglebob"
]

In [22]:
# create sequences 
sequences = []
first_word = []
second_word = []
final_word = []
switched = []

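# all 3-word combinations, in the order the words are listed
word_triplets = findsubsets(words, 3)

for w1, w2, w3 in word_triplets:
    event1 = events[0].format(w1, w2)  # first collision: w1 hits w2
    event2 = events[1].format(w2, w3)  # second collision: w2 hits w3
    o = outro.format(w3)               # the last object falls in the hole
    # create the prompts: causal order and with the two collisions swapped
    prompt_in_order = " ".join([event1, event2, o])
    prompt_switched = " ".join([event2, event1, o])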
    sequences.append(prompt_in_order)
    switched.append(False)
    sequences.append(prompt_switched)
    switched.append(True)
    # always append twice to account for switched order
    first_word.extend([w1, w1])
    second_word.extend([w2, w2])
    final_word.extend([w3, w3])

In [23]:
# save all in a pandas dataframe
import os

df_toy_problem_3nonsense = pd.DataFrame({
    "sequence": sequences,
    "switched": switched,
    "first_word": first_word,
    "second_word": second_word,
    "final_word": final_word
})
# make sure the output directory exists before writing
os.makedirs("data/toy_problem_3", exist_ok=True)
df_toy_problem_3nonsense.to_csv("data/toy_problem_3/toy_problem_3nonsense.csv", index=False)
df_toy_problem_3nonsense.head()

|   | sequence | switched | first_word | second_word | final_word |
|---|----------|----------|------------|-------------|------------|
| 0 | The baz hit the fuu. The fuu hit the schleep. ... | False | baz | fuu | schleep |
| 1 | The fuu hit the schleep. The baz hit the fuu. ... | True | baz | fuu | schleep |
| 2 | The baz hit the fuu. The fuu hit the blubb. Th... | False | baz | fuu | blubb |
| 3 | The fuu hit the blubb. The baz hit the fuu. Th... | True | baz | fuu | blubb |
| 4 | The baz hit the fuu. The fuu hit the bla. The ... | False | baz | fuu | bla |
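A quick sanity check on the dataset size: 7 items give C(7, 3) = 35 triplets, and each triplet yields an in-order and a switched prompt, so 70 rows are expected. The sketch below rebuilds the prompts with a hypothetical helper, `build_rows`, which is not part of the notebook, and verifies that count along with uniqueness and the balance of the `switched` flag.

```python
import itertools
import math

def build_rows(items, event_template, outro_template):
    """Rebuild in-order and switched prompts for every 3-item combination."""
    rows = []
    for a, b, c in itertools.combinations(items, 3):
        e1 = event_template.format(a, b)
        e2 = event_template.format(b, c)
        o = outro_template.format(c)
        rows.append({"sequence": " ".join([e1, e2, o]), "switched": False})
        rows.append({"sequence": " ".join([e2, e1, o]), "switched": True})
    return rows

words = ["baz", "fuu", "schleep", "blubb", "bla", "plomp", "dinglebob"]
rows = build_rows(words, "The {} hit the {}.", "The {} fell in the hole.")

expected_rows = 2 * math.comb(len(words), 3)  # 2 * 35 = 70
assert len(rows) == expected_rows
# every prompt should be unique
assert len({r["sequence"] for r in rows}) == len(rows)
# exactly half of the prompts are in switched order
assert sum(r["switched"] for r in rows) == len(rows) // 2
print(len(rows))  # 70
```

The same check applies to the ball version by passing `colors` with the "The {} ball hit the {} ball." and "The {} ball fell in the hole." templates.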
