## Model Analysis for Indirect Object Identification (IOI) in GPT-Neo
Black-box analysis of GPT-neo-20b outputs, aimed at better understanding its IOI behavior.

In [87]:
import openai
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from tabulate import tabulate
from math import exp
import plotly.express as px

openai.api_key = "api"

openai.api_base = "https://api.goose.ai/v1"

In [95]:
# completion generation
def get_completion(prompt, engine, logprobs=20):
    completion = openai.Completion.create(
        engine=engine,
        prompt=prompt,
        max_tokens=30,
        n=1,
        stream=False,
        temperature=0,
        logprobs=logprobs)

    return engine + '\n' + "[" + completion['choices'][0]['text'] + "]" + '\n'

# get the probabilities (exp of the returned log-probabilities) of the top next-token candidates
def get_logprobs(prompt, engine, logprobs=20):
    completion = openai.Completion.create(
        engine=engine,
        prompt=prompt,
        max_tokens=1,
        n=1,
        stream=False,
        logprobs=logprobs)

    return {k: exp(v) for k, v in completion['choices'][0]['logprobs']['top_logprobs'][0].items()}

    
# scatter-plot the next-token probabilities of one model for two different prompts
def plot_compare_prompt_logprobs(prompt1, prompt2,xaxis, yaxis, model="gpt-neo-20b"):
    lp1 = get_logprobs(prompt1, model)
    lp2 = get_logprobs(prompt2, model)
    data = {k: (lp1.get(k, 0), lp2.get(k,0)) for k in lp1.keys() | lp2.keys()}
    fig = go.Figure(
        data=go.Scatter(
            x=[v[0] for v in data.values()],
            y=[v[1] for v in data.values()],
            mode='markers+text',
            text=list(data.keys()),
            textposition="bottom center"
                       )
                    )
    fig.update_layout(
    title={
        'text': model,
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title={
            'text': xaxis
        },
    yaxis_title={
            'text': yaxis
        }
    
    )

    fig.show()
    
# plot two prompt-comparison scatter plots side by side
def plot_two(prompt1, prompt2, prompt3, prompt4, xaxis1, yaxis1, xaxis2, yaxis2, model="gpt-neo-20b"):
    fig = make_subplots(rows=1, cols=2)
    lp1 = get_logprobs(prompt1, model)
    lp2 = get_logprobs(prompt2, model)
    data = {k: (lp1.get(k, 0), lp2.get(k,0)) for k in lp1.keys() | lp2.keys()}
    fig.add_trace( 
        go.Scatter(
            x=[v[0] for v in data.values()],
            y=[v[1] for v in data.values()],
            mode='markers+text',
            text=list(data.keys()),
            textposition="bottom center"
           ),
        row=1, col=1
                 )
    lp1 = get_logprobs(prompt3, model)
    lp2 = get_logprobs(prompt4, model)
    data = {k: (lp1.get(k, 0), lp2.get(k,0)) for k in lp1.keys() | lp2.keys()}
    fig.add_trace( 
        go.Scatter(
            x=[v[0] for v in data.values()],
            y=[v[1] for v in data.values()],
            mode='markers+text',
            text=list(data.keys()),
            textposition="bottom center"
           ),
        row=1, col=2
                 )
    fig.update_layout(
    title={
        'text': model,
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    )
    
    # update axis
    fig.update_xaxes(title_text=xaxis1, row=1, col=1)
    fig.update_xaxes(title_text=xaxis2, row=1, col=2)
    fig.update_yaxes(title_text=yaxis1, row=1, col=1)
    fig.update_yaxes(title_text=yaxis2, row=1, col=2)

    fig.show()
    

## First steps
First, we reproduce the behavior as stated in the document and check that it is not caused by obvious, simple pattern recognition.

In [96]:
prompt = "When Michael and Kim got a ring at the grocery store, Kim decided to give it to"
print(prompt)
print(get_completion(prompt, "gpt-neo-125m"))
print(get_completion(prompt, "gpt-neo-1-3b"))
print(get_completion(prompt, "gpt-neo-20b"))


When Michael and Kim got a ring at the grocery store, Kim decided to give it to
gpt-neo-125m
[ Michael. Michael was a big fan of Kim's and Kim's wedding, and Kim was a big fan of Michael's. Michael was a big fan]

gpt-neo-1-3b
[ Michael.

“I was like, ‘I’m going to give this to you,’” Kim said.]

gpt-neo-20b
[ him.

"I was like, 'I'm going to give you this ring, and I want you to know that I love you and]



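Prompts are labelled AB-A or AB-B, where A and B are the two subjects: the first two letters give the order in which the names are introduced, and the last letter is the subject who does the giving. For example, MK-K stands for "When Michael and Kim ..., Kim decided to give it to".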
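
The labelling scheme above can also be generated programmatically. The following is a minimal sketch, not part of the original notebook: `TEMPLATE` and `build_ioi_prompts` are hypothetical names, and labels are built from the first letter of each name, so two names sharing an initial would collide.

```python
from itertools import permutations

# Hypothetical template mirroring the prompts used in this notebook.
TEMPLATE = "When {a} and {b} got a ring at the grocery store, {giver} decided to give it to"

def build_ioi_prompts(name1, name2, template=TEMPLATE):
    """Return {label: prompt} for every order/giver combination of two names."""
    prompts = {}
    for a, b in permutations((name1, name2)):
        for giver in (a, b):
            # Label = initials in order of introduction, then the giver's initial.
            label = f"{a[0]}{b[0]}-{giver[0]}"
            prompts[label] = template.format(a=a, b=b, giver=giver)
    return prompts

prompts = build_ioi_prompts("Michael", "Kim")
for label, text in prompts.items():
    print(label, "|", text)
```

The resulting dictionary has one entry per label (MK-M, MK-K, KM-K, KM-M), which can then be passed to the plotting helpers defined earlier.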

In [97]:
prompt1 = "When Michael and Kim got a ring at the grocery store, Kim decided to give it to"
prompt2 = "When Kim and Michael got a ring at the grocery store, Kim decided to give it to"
prompt3 = "When Michael and Kim got a ring at the grocery store, Michael decided to give it to"
prompt4 = "When Kim and Michael got a ring at the grocery store, Michael decided to give it to"

In [98]:
plot_two(prompt1, prompt2, prompt3, prompt4, "MK-K", "KM-K", "MK-M", "KM-M")
plot_two(prompt1, prompt3, prompt2, prompt4, "MK-K", "MK-M", "KM-K", "KM-M")


### observation
'give it to...' seems most likely to be followed by [her] or by a token read as female. However, this prior does not seem strong enough to override the IOI behavior itself.
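
One way to put a number on such a bias is to sum the next-token probabilities per group of tokens. The sketch below is only an illustration: `mass_by_group` is a hypothetical helper, the group membership sets are assumptions, the input mimics the token-to-probability dict that `get_logprobs` returns (assuming tokens may carry a leading space), and the example numbers are made up rather than taken from the model.

```python
from collections import defaultdict

# Assumed groupings, chosen by hand for this illustration.
FEMALE = {"her", "Kim", "Martha", "Jess"}
MALE = {"him", "Michael", "John", "George"}

def mass_by_group(token_probs, groups):
    """Sum next-token probabilities per named group; unmatched tokens go to 'other'."""
    totals = defaultdict(float)
    for token, prob in token_probs.items():
        word = token.strip()  # drop the leading space some tokenizers attach
        group = next((name for name, members in groups.items() if word in members), "other")
        totals[group] += prob
    return dict(totals)

# Made-up numbers for illustration only, NOT model output.
example = {" her": 0.5, " him": 0.25, " Kim": 0.125, " the": 0.125}
result = mass_by_group(example, {"female": FEMALE, "male": MALE})
print(result)
```

Only the top tokens returned by the API are visible, so these sums are lower bounds on the true mass of each group.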

In [92]:
prompt1 = "When Kim got a ring at the grocery store, John decided to give it to"
prompt2 = "When John got a ring at the grocery store, Kim decided to give it to"

In [44]:
print(prompt1)
print(get_completion(prompt1, "gpt-neo-20b"))

print(prompt2)
print(get_completion(prompt2, "gpt-neo-20b"))

When Kim got a ring at the grocery store, John decided to give it to
gpt-neo-20b
[ her.

"I was like, 'I'm going to give you this ring, and I want you to be my wife,'" he said]

When John got a ring at the grocery store, Kim decided to give it to
gpt-neo-20b
[ him.

"I was so excited," she said. "I was like, 'Oh my gosh, I'm going to be a]



In [93]:
prompt1 = "When Eric and George got a ring at the grocery store, George decided to give it to"
prompt2 = "When Kim and Martha got a ring at the grocery store, Martha decided to give it to"

In [94]:
plot_compare_prompt_logprobs(prompt1, prompt2, "All Male", "All Female")

In [47]:
prompt1 = "When Kim and Mike got a ball at the grocery store, Mike decided to give it to"
prompt2 = "When Kim and Mike got a dress at the grocery store, Mike decided to give it to"

In [48]:
plot_compare_prompt_logprobs(prompt1, prompt2, "Ball", "Dress")

## Hypothesis
The model tries to mention all entities an 'equal' number of times. For example, if Michael is mentioned twice and Kim only once, it treats Kim as the more probable next subject.

## Idea: 3 Subjects
If there are 3 subjects and only one of them is mentioned fewer times than the other 2, the model should predict that subject with confidence. <br>
However, if two of them are mentioned fewer times than the third, the prediction should be ambiguous to the model.
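
The counting heuristic described above can be written down as a toy baseline to compare against the model's predictions. This is a sketch under simple assumptions: `least_mentioned` is a hypothetical helper, it counts whole-word name matches only, and it ignores pronouns such as "he" or "they".

```python
import re

def least_mentioned(prompt, entities):
    """Return (entities with the fewest whole-word mentions, all counts)."""
    counts = {
        name: len(re.findall(rf"\b{re.escape(name)}\b", prompt))
        for name in entities
    }
    fewest = min(counts.values())
    candidates = sorted(name for name, n in counts.items() if n == fewest)
    return candidates, counts

names = ["Michael", "George", "John"]
p_clear = ("Michael, George, and John got a ring at the grocery store. "
           "When George and Michael left, they gave it to")
p_ambiguous = ("Michael, George, and John got a ring at the grocery store. "
               "When George left, he gave it to")

print(least_mentioned(p_clear, names))      # one candidate: John
print(least_mentioned(p_ambiguous, names))  # two candidates: John and Michael
```

A single candidate corresponds to the confident case, and more than one candidate corresponds to the case the hypothesis expects to be ambiguous.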

In [49]:
prompt1 = "Michael, George, and John got a ring at the grocery store. When George and Michael left, they gave it to"
prompt2 = "Michael, George, and Jess got a ring at the grocery store. When George and Michael left, they gave it to"

In [50]:
print(prompt1)
print(get_completion(prompt1, "gpt-neo-20b"))

print(prompt2)
print(get_completion(prompt2, "gpt-neo-20b"))


Michael, George, and John got a ring at the grocery store. When George and Michael left, they gave it to
gpt-neo-20b
[ John.

John, George, and Michael went to the movies.

When they got home, they found a note from their mother.]

Michael, George, and Jess got a ring at the grocery store. When George and Michael left, they gave it to
gpt-neo-20b
[ Jess.

Jess: "I'm going to give this to my girlfriend, but I'm going to put it in my pocket first."
]



In [100]:
plot_compare_prompt_logprobs(prompt1, prompt2, "John", "Jess")

In [101]:
prompt1 = "Michael, George, and John got a ring at the grocery store. When George left, he gave it to"
prompt2 = "Michael, George, and Jess got a ring at the grocery store. When George left, he gave it to"

In [102]:
plot_compare_prompt_logprobs(prompt1, prompt2, "John", "Jess")

The findings support the hypothesis. I will try different methods to determine whether the model is indeed using this simple counting heuristic.

## Sub-Hypothesis
If the model counts the number of times an entity appears, what happens:
1. in the case of anaphors and cataphors? <br>
2. if we try to play around with this counting behavior? <br>


In [103]:
prompt1 = "Michael Smith and Michael Scott got a ring at the grocery store. Smith decided to give it to"
prompt2 = "Michael Smith and Jess Scott got a ring at the grocery store. Jess decided to give it to"

In [104]:
plot_compare_prompt_logprobs(prompt1, prompt2, "Smith to Scott", "Jess to Michael Smith")

### Emergent Behavior
For some reason, when surnames come into play, the model starts performing sub-optimally. <br>
If the model processed 'Name Surname' tokens as one single entity (which has been demonstrated not to be the case), this would not be a problem. <br>
Indeed, adding surnames creates strange behavior, as demonstrated below.

In [56]:
prompt1 = "George Scott and Michael Smith got a ring at the grocery store. Michael Smith decided to give it to"
prompt2 = "George Scott and Jess Smith got a ring at the grocery store. Jess Smith decided to give it to"

In [57]:
plot_compare_prompt_logprobs(prompt1, prompt2, "Michael to George", "Jess to George")

Now we check the behavior by selectively removing surnames to find where the issue lies.

In [58]:
prompt1 = "George Scott and Michael got a ring at the grocery store. Michael decided to give it to"
prompt2 = "George and Michael Smith got a ring at the grocery store. Michael Smith decided to give it to"

In [110]:
plot_compare_prompt_logprobs(prompt1, prompt2, "MM-Receiver with surname", "MM-Giver with surname")

In [111]:
prompt1 = "Jess Scott and Michael got a ring at the grocery store. Michael decided to give it to"
prompt2 = "Jess and Michael Smith got a ring at the grocery store. Michael Smith decided to give it to"
prompt3 = "Jess and Michael got a ring at the grocery store. Smith decided to give it to"

In [112]:
plot_two(prompt1, prompt2, prompt2, prompt3, "M-FReceiver with surname", "M-FGiver with surname", "Michael Smith gave it to", "Smith gave it to")

### Findings
The correct answers are present but with little-to-no confidence. This suggests that surnames disrupt the model's tracking of entities and that it does not deeply understand what a surname is. <br>
A 'Name Surname' mention does not seem to work for coreference or anaphor resolution.
However, whenever the two entities are of different gender, the probability of the correct answer increases. Comparing the two graphs, we can also infer that the pattern breaks when the surname is attached to the giver, e.g. 'Michael Smith decided to give it to...'. <br>
For some reason the model predicts even better when a 3rd agent appears (prompt 3 in the previous cell). This may be due to memorization and to the fact that 'give it to' is usually followed by a female token (as seen previously).
***
### backtracking
Check if a pronoun anaphor allows the model to understand the agents at play.

In [113]:
prompt1 = "Jess and Michael got a ring at the grocery store. She decided to give it to"
prompt2 = "Jess and Michael got a ring at the grocery store. He decided to give it to"

In [114]:
plot_compare_prompt_logprobs(prompt1, prompt2, "She gave it to", "He gave it to")

Pronoun anaphors seem to work. What about cases that the training set is unlikely to have covered, so that the model cannot rely on pattern recognition?
We will check this behavior with the following prompt template.


In [115]:
prompt1 = "Michael the singer and Jess the scientist got a ring at the grocery store. Jess decided to give it to the"
prompt2 = "Michael the singer and Jess the scientist got a ring at the grocery store. The scientist decided to give it to the"
prompt3 = "Michael the singer and Jess the scientist got a ring at the grocery store. Jess decided to give it to"
prompt4 = "Michael the singer and Jess the scientist got a ring at the grocery store. The scientist decided to give it to"

In [116]:
plot_two(prompt1,prompt2,prompt3,prompt4, "Jess decided", "The scientist decided", "Jess decided", "The scientist decided")

The model does not seem to understand anaphors in this case. The 'the scientist' and 'Jess' cases are both mapped to 'give it to the singer'. However, for some reason, when the trailing 'the' is removed from the prompt (to make it harder), the model struggles.
***
### implications
Anaphors will not be useful for these findings and can be set aside.
We can now look at the mention count of each entity.


In [66]:
prompt1 = "Jacob, Matt and I were chilling. Jacob was gone so I gave the coat to"
prompt2 = "Jacob, Matt and I were chilling. Jacob was cold so I gave the coat to"
prompt3 = "Jacob, Jess and I were chilling. Jacob was gone so I gave the coat to"
prompt4 = "Jacob, Jess and I were chilling. Jacob was cold so I gave the coat to"

In [117]:
plot_two(prompt1, prompt2,prompt3,prompt4, "Jacob gone | Matt", "Jacob cold | Matt", "Jacob gone | Jess", "Jacob cold | Jess")


The model seems to predict the correct token well in each case.

In [118]:
prompt1 = "Jacob, Matt and I were chilling. I was gone so Matt gave the coat to"
prompt2 = "Jacob, Matt and I were chilling. I was cold so Matt gave the coat to"
prompt3 = "Jacob, Jess and I were chilling. I was gone so Jess gave the coat to"
prompt4 = "Jacob, Jess and I were chilling. I was cold so Jess gave the coat to"

In [119]:
plot_two(prompt1, prompt2,prompt3,prompt4, "MMI gone", "MMI cold", "MFI gone", "MFI cold")

## findings
The model is extremely confident. However, it gets the 'Jacob, Matt and I ... I was cold' case wrong: it predicts 'Matt gave the coat to Jacob' instead of 'Matt gave the coat to me'. <br>
For some reason, when one of the agents is changed to a female name, the probability of 'me' becomes greater than the probability of 'Jacob'. <br>
Further testing is needed to understand this behavior.

In [120]:
prompt1 = "When Michael and I got a ring at the grocery store, Michael decided to give it to"
prompt2 = "When Jess and I got a ring at the grocery store, Jess decided to give it to"
prompt3 = "When Michael and I got a ring at the grocery store, I decided to give it to"
prompt4 = "When you and Michael got a ring at the grocery store, you decided to give it to"

In [121]:
plot_two(prompt1,prompt2,prompt3,prompt4, "Michael I | Me", "Jess I | Me", "Michael I | Michael", "You Michael | Michael")


## furthermore
Now we will test the entity-counting hypothesis in greater depth.

In [122]:
prompt1 = "When Jess, Josh, and I got it at the grocery store, I apologized to Josh since I decided to give it to"
prompt2 = "When Mike, Jess, and I got it at the grocery store, I apologized to Jess since I decided to give it to"
prompt3 = "When Jess, Josh, and I got it at the grocery store, Josh apologized to Jess since he decided to give it to"
prompt4 = "When Mike, Jess, and I got it at the grocery store, Jess apologized to Mike since she decided to give it to"

In [123]:
plot_two(prompt1,prompt2,prompt3,prompt4, "Jess Josh I | to Jess", "Mike Jess I | to Mike", "Josh gave to Me", "Jess gave to me")


In [124]:
prompt1 = "When Jess, Josh, and Sophia got it at the grocery store, Sophia apologized to Josh since she decided to give it to"
prompt2 = "When Mike, Jess, and Sophia got it at the grocery store, Sophia apologized to Jess since she decided to give it to"
prompt3 = "When Jess, Josh, and Sophia got it at the grocery store, Josh apologized to Jess since he decided to give it to"
prompt4 = "When Mike, Jess, and Sophia got it at the grocery store, Jess apologized to Mike since she decided to give it to"

In [125]:
plot_two(prompt1,prompt2,prompt3,prompt4, "Soph to Jess", "Soph to Mike", "Josh to Soph", "Jess to Soph")

We can see that, by local context, when two subjects interact in a phrase such as "A ... B ... A ...", the model chooses B as the most likely next subject.
However, looking at the overall pattern in prompt 1 of the first-person cell (In [122]), the mention counts are:
- Jess: 1
- Josh: 2
- I: 3 <br>
***
By the counting rule, Jess should therefore be the most likely, but because of proximity this rule is overridden by the closer "A ... B ... A" pattern. The next most likely token, however, is the one predicted by the counting pattern.
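
The local "A ... B ... A" rule described above can be sketched as a toy function, to make explicit what it predicts on a given prompt. This is an illustrative assumption about the pattern, not a description of what the model computes internally; `local_aba_prediction` is a hypothetical helper that only looks at the last three whole-word entity mentions.

```python
import re

def local_aba_prediction(prompt, entities):
    """If the last three entity mentions read A, B, A, return B; otherwise None."""
    pattern = r"\b(" + "|".join(re.escape(e) for e in entities) + r")\b"
    mentions = re.findall(pattern, prompt)  # mentions in order of appearance
    if len(mentions) >= 3:
        first, middle, last = mentions[-3:]
        if first == last and first != middle:
            return middle
    return None

prompt = ("When Jess, Josh, and I got it at the grocery store, "
          "I apologized to Josh since I decided to give it to")
print(local_aba_prediction(prompt, ["Jess", "Josh", "I"]))  # Josh
```

On this prompt the last three mentions are I, Josh, I, so the local rule points at Josh, while a global mention count would point at Jess.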
## hypothesis
If we force the entity counts, we can force a prediction.

In [126]:
prompt1 = "When Jess, Jess, and Mike got Jess at the Jess store, Mike decided to give it to"
prompt2 = "When Jess, Josh, and Mike got it at the grocery store, Mike decided to give it to"


In [127]:
plot_compare_prompt_logprobs(prompt1, prompt2, "Force Mike", "Normal")

In [128]:
prompt1 = "When Simon, Josh, Luke, and Mike were at the store, Mike was with Simon, and Josh was with"
prompt2 = "When Simon, Josh, Luke, and Mike were at the store, Mike was with Simon, and Josh followed"


In [129]:
plot_compare_prompt_logprobs(prompt1, prompt2, "and Josh was with...", "and Josh followed")

While trying to force a prediction, the model seems to show a template behavior: in certain contexts it tries to group the closest mentioned entities with an AND, and in others it checks for a fair count distribution. <br>
This criterion seems to be especially noticeable with conjunction groups such as "A was doing X, and B was doing Y", where the model seems to impose a penalty if X == Y.
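
To read such comparisons without a plot, the per-token change in probability between two prompt variants can be ranked directly. The helper below is a hypothetical sketch: it takes two token-to-probability dicts shaped like the output of `get_logprobs` and, like the plotting helpers, treats a token missing from one dict as probability 0, although it may simply fall outside the returned top tokens. The numbers and token names are made up for illustration.

```python
def prob_shift(probs_a, probs_b):
    """Per-token probability change from prompt A to prompt B, largest increase first."""
    tokens = probs_a.keys() | probs_b.keys()
    shifts = {t: probs_b.get(t, 0.0) - probs_a.get(t, 0.0) for t in tokens}
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)

# Made-up numbers for illustration only, NOT model output.
baseline = {" A": 0.5, " B": 0.25}
variant = {" A": 0.25, " B": 0.5, " C": 0.125}

ranking = prob_shift(baseline, variant)
for token, delta in ranking:
    print(f"{token!r}: {delta:+.3f}")
```

Tokens at the top of the ranking gained the most probability in the second prompt, and tokens at the bottom lost the most.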

In [130]:
prompt1 = "When Simon, Josh, Luke, and Mike were at the store, Mike was with Simon, and Josh was with"
prompt2 = "When Simon, Josh, Luke, and Mike were at the store, Mike was with Luke and Simon, and Josh was with"
prompt3 = "When Simon, Josh, Luke, and Mike were at the store, Mike was with Luke and Simon, Simon, and Josh was with"


In [131]:
plot_two(prompt1,prompt2,prompt1,prompt3, "Normal", "Force Mike, 2 counts each", "Normal", "Force Mike, extra simon")
plot_compare_prompt_logprobs(prompt2, prompt3, "Force Mike, 2 counts each.", "Force Mike, extra simon")

In [132]:
prompt1 = "When Simon, Josh, Luke, and Mike were at the store, Mike was with Luke and Simon, and Josh was with"
prompt2 = "When Simon, Josh, Luke, and Mike were at the store, Mike was with Luke and Simon. Luke was with Mike and Simon, Mike, and Josh was with"


In [133]:
plot_compare_prompt_logprobs(prompt1, prompt2, "Normal", "Force Josh Self")

## Summary
The hypothesis is supported so far: the model appears to count the number of times each entity appears and tries to create an equal distribution of appearances across all entities in context.
### however
This also depends on the context near the prediction site: it has to be one that allows an equal distribution, not one that favors the closest subject token.