# PIZZA OpenAI Examples

Various examples working with the `OpenAIAttributor`. 

Before using this notebook, make sure to set your `OPENAI_API_KEY` environment variable in a `.env` file.

## Setup

Set up the attributor and the input arguments shared by all examples.

In [1]:
from attribution.api_attribution import OpenAIAttributor
from attribution.experiment_logger import ExperimentLogger
from attribution.token_perturbation import (
    FixedPerturbationStrategy,
    NthNearestPerturbationStrategy,
    calculate_chunk_size,
    get_units_from_prompt,
)

# Re-import modified modules without restarting the server
%load_ext autoreload
%autoreload 2

# Load environment variables (OpenAI API key)
%load_ext dotenv
%dotenv

attributor = OpenAIAttributor(max_concurrent_requests=10)
perturbation_strategy = FixedPerturbationStrategy("!")

kwargs = {
    "attribution_strategies": ["cosine"],
    "unit_definition": "word",
    "ignore_output_token_location": True,
}



## Short prompts
Comparing full perturbation and hierarchical perturbation methods. We create a new logger for every prompt so that the two methods can be compared directly.

Using hierarchical perturbation we should see a reduction in `num_llm_calls`. This is not always the case for short prompts, where perturbing every word takes few API calls anyway (in the runs below, the hierarchical method needs at least as many calls as full perturbation). As prompt length increases, however, perturbing every word becomes much less efficient.
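The trade-off described above can be illustrated with a toy call counter. This is only a sketch and not PIZZA's implementation: the halving rule, the `is_salient` callback, and both function names are hypothetical, and the library's real chunking, overlap and thresholding differ. It assumes one call for the unperturbed prompt plus one call per perturbed chunk.

```python
from typing import Callable, List, Tuple


def toy_hierarchical_calls(
    n_units: int,
    init_chunk_size: int,
    is_salient: Callable[[int, int], bool],
) -> int:
    """Count mock LLM calls for a toy hierarchical search over unit indices.

    Each chunk [start, end) costs one call. Chunks flagged by `is_salient`
    are split in half and re-examined until they hold a single unit.
    """
    chunks: List[Tuple[int, int]] = [
        (start, min(start + init_chunk_size, n_units))
        for start in range(0, n_units, init_chunk_size)
    ]
    calls = 1  # one call for the unperturbed prompt
    while chunks:
        next_chunks: List[Tuple[int, int]] = []
        for start, end in chunks:
            calls += 1  # one perturbed prompt per chunk
            if end - start > 1 and is_salient(start, end):
                mid = (start + end) // 2
                next_chunks += [(start, mid), (mid, end)]
        chunks = next_chunks
    return calls


def toy_full_calls(n_units: int) -> int:
    """One call for the original prompt plus one per perturbed unit."""
    return n_units + 1


# Long prompt, a single important unit: hierarchical search is far cheaper.
long_hier = toy_hierarchical_calls(64, 8, lambda s, e: s <= 40 < e)
print(long_hier, toy_full_calls(64))

# Short prompt where every chunk looks important: hierarchical costs more.
short_hier = toy_hierarchical_calls(8, 4, lambda s, e: True)
print(short_hier, toy_full_calls(8))
```

In this toy setting the saving comes entirely from chunks that are discarded early, which is why short or uniformly salient prompts gain little.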

### Aggregation

Since hierarchical perturbation produces multiple scores per token, we need some kind of aggregation method. By default this is "sum" which produces a saliency map, but here we use "last" which only retains the final score for that token (i.e. at the lowest depth), which is most similar to the full perturbation method.
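The difference between the two aggregation modes can be sketched on made-up scores. The helper `aggregate_scores` and the per-stage dictionaries are hypothetical illustrations, not the logger's internals; only the mode names "sum" and "last" come from the text above.

```python
from typing import Dict, List


def aggregate_scores(
    scores_per_stage: List[Dict[str, float]], score_agg: str
) -> Dict[str, float]:
    """Combine per-stage unit scores (hypothetical data layout).

    "sum" accumulates every score a unit received across stages;
    "last" keeps only the score from the deepest stage that touched it.
    """
    totals: Dict[str, float] = {}
    for stage in scores_per_stage:
        for unit, score in stage.items():
            if score_agg == "sum":
                totals[unit] = totals.get(unit, 0.0) + score
            elif score_agg == "last":
                totals[unit] = score
            else:
                raise ValueError(f"Unknown score_agg: {score_agg!r}")
    return totals


# Made-up scores: stage 0 scores a coarse chunk, stage 1 refines two units.
stages = [
    {"capital": 0.25, "of": 0.25, "Siam?": 0.5},
    {"capital": 0.5, "Siam?": 0.75},
]
print(aggregate_scores(stages, "sum"))
print(aggregate_scores(stages, "last"))
```

Units that are never refined (here "of") keep their coarse score under both modes, while refined units differ between the accumulated and the final value.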

In [2]:
short_prompts = [
    "What is the capital of the country that was formerly known as Siam? Answer in 1 word.",
    "What is the chemical element with the symbol 'Au'? Answer in 1 word.",
    "Which character in 'Pride and Prejudice' said 'It is a truth universally acknowledged'? Answer in 1 word.",
    "Who holds the record for the most goals in a calendar year in football (soccer)? Answer in 1 word.",
    "In Greek mythology, who is the goddess of wisdom and warfare? Answer in 1 word.",
    "Who directed the film 'Inception'? Answer in 1 word.",
    "Who is known as the 'King of Pop'? Answer in 1 word.",
    "Who was the first female Prime Minister of the United Kingdom? Answer in 1 word.",
    "Who is the co-founder of Microsoft? Answer in 1 word.",
]

for input_str in short_prompts:
    response = await attributor.get_chat_completion(input_str)

    logger = ExperimentLogger()

    await attributor.iterative_perturbation(
        input_str, perturbation_strategy=perturbation_strategy, logger=logger, **kwargs
    )

    await attributor.hierarchical_perturbation(
        input_str,
        init_chunk_size=4,
        perturbation_strategy=perturbation_strategy,
        static_threshold=0.5,
        logger=logger,
        verbosity=True,
        **kwargs,
    )

    logger.print_total_attribution(score_agg="last")
    display(logger.df_experiments)

Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:00<00:00,  2.06it/s]


Stage 0: making 10 perturbations
Stage 1: making 8 perturbations
Stage 2: making 4 perturbations


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17
0,1,cosine,fixed,word,What 0.00,is 0.00,the 0.00,capital 0.59,of 0.00,the 0.00,country 0.00,that 0.00,was 0.00,formerly 0.00,known 0.00,as 0.00,Siam? 0.72,Answer 0.00,in 0.00,1 0.00,word. 0.00
1,2,cosine,fixed,word,What 0.00,is 0.00,the 0.00,capital 0.59,of 0.00,the 0.00,country 0.00,that 0.00,was 0.00,formerly 0.00,known 0.00,as 0.00,Siam? 0.68,Answer 0.00,in 0.00,1 0.00,word. 0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,What is the capital of the country that was fo...,Bangkok,fixed,word,1.068023,18
1,2,What is the capital of the country that was fo...,Bangkok,fixed,word,3.308814,23


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:00<00:00,  2.44it/s]


Stage 0: making 8 perturbations
Stage 1: making 3 perturbations
Stage 2: making 2 perturbations


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13
0,1,cosine,fixed,word,What 0.00,is 0.00,the 0.00,chemical 0.00,element 0.00,with 0.00,the 0.00,symbol 0.00,'Au'? 0.67,Answer 0.00,in 0.00,1 0.00,word. 0.00
1,2,cosine,fixed,word,What 0.00,is 0.00,the 0.00,chemical 0.00,element 0.00,with 0.00,the 0.00,symbol 0.00,'Au'? 0.67,Answer 0.00,in 0.00,1 0.00,word. 0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,What is the chemical element with the symbol '...,Gold,fixed,word,0.870346,14
1,2,What is the chemical element with the symbol '...,Gold,fixed,word,4.168339,14


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:00<00:00,  2.04it/s]


Stage 0: making 10 perturbations
Stage 1: making 9 perturbations
Stage 2: making 2 perturbations


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17
0,1,cosine,fixed,word,Which 0.00,character 0.59,in 0.00,'Pride 0.00,and 0.00,Prejudice' 0.00,said 0.00,'It 0.00,is 0.00,a 0.00,truth 0.49,universally 0.00,acknowledged'? 0.00,Answer 0.00,in 0.00,1 0.00,word. 0.00
1,2,cosine,fixed,word,Which 0.00,character 0.59,in 0.00,'Pride 0.00,and 0.00,Prejudice' 0.00,said 0.00,'It 0.00,is 0.00,a 0.00,truth 0.00,universally 0.00,acknowledged'? 0.00,Answer 0.00,in 0.00,1 0.00,word. 0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,Which character in 'Pride and Prejudice' said ...,Elizabeth,fixed,word,1.040721,18
1,2,Which character in 'Pride and Prejudice' said ...,Elizabeth,fixed,word,6.20062,22


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.91it/s]


Stage 0: making 11 perturbations


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.23it/s]


Stage 1: making 7 perturbations
Stage 2: making 4 perturbations


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17,token_18,token_19
0,1,cosine,fixed,word,Who -0.00,holds -0.00,the -0.00,record -0.00,for -0.00,the -0.00,most -0.00,goals -0.00,in -0.00,a -0.00,calendar 0.68,year -0.00,in -0.00,football -0.00,(soccer)? -0.00,Answer -0.00,in -0.00,1 -0.00,word. 0.56
1,2,cosine,fixed,word,Who -0.00,holds -0.00,the -0.00,record -0.00,for -0.00,the -0.00,most -0.00,goals -0.00,in -0.00,a -0.00,calendar 0.68,year -0.00,in -0.00,football -0.00,(soccer)? -0.00,Answer -0.00,in -0.00,1 -0.00,word. 0.56


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,Who holds the record for the most goals in a c...,Messi,fixed,word,1.122611,20
1,2,Who holds the record for the most goals in a c...,Messi,fixed,word,4.529157,23


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.54it/s]


Stage 0: making 9 perturbations
Stage 1: making 6 perturbations
Stage 2: making 3 perturbations


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15
0,1,cosine,fixed,word,In 0.00,Greek 0.00,"mythology, 0.00",who 0.00,is 0.00,the 0.00,goddess 0.00,of 0.00,wisdom 0.00,and 0.00,warfare? 0.00,Answer 0.00,in 0.00,1 0.00,word. 0.00
1,2,cosine,fixed,word,In 0.00,Greek 0.00,"mythology, 0.00",who 0.00,is 0.00,the 0.00,goddess 0.00,of 0.00,wisdom 0.00,and 0.00,warfare? 0.00,Answer 0.00,in 0.00,1 0.00,word. 0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,"In Greek mythology, who is the goddess of wisd...",Athena,fixed,word,1.37112,16
1,2,"In Greek mythology, who is the goddess of wisd...",Athena,fixed,word,3.021101,19


Stage 0: making 6 perturbations
Stage 1: making 8 perturbations
Stage 2: making 6 perturbations


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9
0,1,cosine,fixed,word,Who 0.00,directed 0.74,the 0.00,film 0.00,'Inception'? 0.69,Answer 0.00,in 0.00,1 0.00,word. 0.66
1,2,cosine,fixed,word,Who 0.00,directed 0.74,the 0.00,film 0.00,'Inception'? 0.69,Answer 0.00,in 0.00,1 0.00,word. 0.66


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,Who directed the film 'Inception'? Answer in 1...,Nolan,fixed,word,1.025173,10
1,2,Who directed the film 'Inception'? Answer in 1...,Nolan,fixed,word,2.723139,21


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:00<00:00,  2.53it/s]


Stage 0: making 7 perturbations
Stage 1: making 4 perturbations
Stage 2: making 2 perturbations


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12
0,1,cosine,fixed,word,Who 0.00,is 0.00,known 0.00,as 0.00,the 0.00,'King 0.00,of 0.00,Pop'? 0.60,Answer 0.00,in 0.00,1 0.00,word. 0.00
1,2,cosine,fixed,word,Who 0.00,is 0.00,known 0.00,as 0.00,the 0.00,'King 0.00,of 0.00,Pop'? 0.60,Answer 0.00,in 0.00,1 0.00,word. 0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,Who is known as the 'King of Pop'? Answer in 1...,Michael,fixed,word,0.839297,13
1,2,Who is known as the 'King of Pop'? Answer in 1...,Michael,fixed,word,3.60702,14


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:00<00:00,  2.02it/s]


Stage 0: making 9 perturbations
Stage 1: making 8 perturbations
Stage 2: making 8 perturbations


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15
0,1,cosine,fixed,word,Who -0.00,was -0.00,the -0.00,first 0.69,female 0.68,Prime -0.00,Minister -0.00,of -0.00,the -0.00,United -0.00,Kingdom? -0.00,Answer 0.69,in -0.00,1 -0.00,word. -0.00
1,2,cosine,fixed,word,Who -0.00,was -0.00,the -0.00,first 0.69,female 0.68,Prime -0.00,Minister -0.00,of -0.00,the -0.00,United -0.00,Kingdom? -0.00,Answer 0.69,in -0.00,1 -0.00,word. -0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,Who was the first female Prime Minister of the...,Margaret,fixed,word,1.058025,16
1,2,Who was the first female Prime Minister of the...,Margaret,fixed,word,2.43625,26


Stage 0: making 6 perturbations
Stage 1: making 4 perturbations
Stage 2: making 2 perturbations


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10
0,1,cosine,fixed,word,Who 0.00,is 0.00,the 0.00,co-founder 0.64,of 0.00,Microsoft? 0.50,Answer 0.00,in 0.00,1 0.00,word. 0.00
1,2,cosine,fixed,word,Who 0.00,is 0.00,the 0.00,co-founder 0.00,of 0.00,Microsoft? 0.50,Answer 0.00,in 0.00,1 0.00,word. 0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,Who is the co-founder of Microsoft? Answer in ...,Bill,fixed,word,0.752888,11
1,2,Who is the co-founder of Microsoft? Answer in ...,Bill,fixed,word,1.898483,13


## Longer prompts
Here we use hierarchical perturbation only, and we can see from the resulting `num_llm_calls` that it is much more efficient than perturbing every unit in the prompt.

In [3]:
longer_prompts = [
    "In J.K. Rowling's Harry Potter series, the spell used to conjure a Patronus is considered highly advanced and can only be performed by skilled witches and wizards. The form that a Patronus takes can vary widely and is often influenced by the caster's personality and experiences. What form does Snape's Patronus take? Answer in 1 word.",
    "In the field of astronomy, there is a phenomenon where the light from a star is bent and magnified by the gravitational field of another object, such as a galaxy or black hole, that lies between the star and the observer. This effect was first predicted by Einstein's theory of general relativity. What is this phenomenon called? Answer in 1 word.",
    "Located in South America, there is a vast river that flows through Brazil, Peru, and several other countries. It is the largest river by discharge volume of water in the world and is often associated with the rainforest of the same name. What is the name of this river? Answer in 1 word.",
    "In Norse mythology, there is a hammer wielded by the god Thor, which is renowned for its immense power and is said to be capable of leveling mountains. This hammer is also a symbol of protection and blessing. What is the name of Thor's hammer? Answer in 1 word.",
    "In the movie 'The Matrix,' the protagonist is a computer hacker who learns about the true nature of his reality and his role in the war against its controllers. He is given a choice between two pills: a red pill that reveals the truth, and a blue pill that returns him to his normal life. What is the name of the protagonist? Answer in 1 word.",
    "There is a famous painting by Vincent van Gogh that depicts a night sky filled with swirling clouds, stars, and a bright crescent moon. This painting is one of his most well-known works and was created while he was in a mental asylum in Saint-Rémy-de-Provence. What is the title of this painting? Answer in 1 word.",
    "In classical music, there is a composer who is renowned for his symphonies, concertos, and sonatas. Born in Salzburg in 1756, he began composing music at a very young age and created over 600 works during his lifetime. What is the last name of this composer? Answer in 1 word.",
    "In the realm of computer programming, there is a widely used language that was developed by Guido van Rossum and first released in 1991. It emphasizes code readability and its syntax allows programmers to express concepts in fewer lines of code. What is the name of this programming language? Answer in 1 word.",
    "In the study of genetics, there is a molecule that carries the genetic instructions used in the growth, development, functioning, and reproduction of all known living organisms and many viruses. This molecule is structured as a double helix and was first described by Watson and Crick in 1953. What is the abbreviation for this molecule? Answer in 1 word.",
]

logger = ExperimentLogger()
for input_str in longer_prompts:
    units, _ = get_units_from_prompt(
        input_str, attributor.tokenizer, unit_definition=kwargs["unit_definition"]
    )
    print("Running prompt:", input_str, f"Total units: {len(units)}", sep="\n")
    await attributor.hierarchical_perturbation(
        input_str,
        init_chunk_size=8,
        perturbation_strategy=perturbation_strategy,
        static_threshold=0.5,
        logger=logger,
        **kwargs,
    )

logger.print_total_attribution()
display(logger.df_experiments)

Running prompt:
In J.K. Rowling's Harry Potter series, the spell used to conjure a Patronus is considered highly advanced and can only be performed by skilled witches and wizards. The form that a Patronus takes can vary widely and is often influenced by the caster's personality and experiences. What form does Snape's Patronus take? Answer in 1 word.
Total units: 56


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:03<00:00,  1.57s/it]


Running prompt:
In the field of astronomy, there is a phenomenon where the light from a star is bent and magnified by the gravitational field of another object, such as a galaxy or black hole, that lies between the star and the observer. This effect was first predicted by Einstein's theory of general relativity. What is this phenomenon called? Answer in 1 word.
Total units: 61


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:03<00:00,  1.52s/it]


Running prompt:
Located in South America, there is a vast river that flows through Brazil, Peru, and several other countries. It is the largest river by discharge volume of water in the world and is often associated with the rainforest of the same name. What is the name of this river? Answer in 1 word.
Total units: 53


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:00<00:00,  2.08it/s]


Running prompt:
In Norse mythology, there is a hammer wielded by the god Thor, which is renowned for its immense power and is said to be capable of leveling mountains. This hammer is also a symbol of protection and blessing. What is the name of Thor's hammer? Answer in 1 word.
Total units: 49


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.71it/s]


Running prompt:
In the movie 'The Matrix,' the protagonist is a computer hacker who learns about the true nature of his reality and his role in the war against its controllers. He is given a choice between two pills: a red pill that reveals the truth, and a blue pill that returns him to his normal life. What is the name of the protagonist? Answer in 1 word.
Total units: 66


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.08it/s]


Running prompt:
There is a famous painting by Vincent van Gogh that depicts a night sky filled with swirling clouds, stars, and a bright crescent moon. This painting is one of his most well-known works and was created while he was in a mental asylum in Saint-Rémy-de-Provence. What is the title of this painting? Answer in 1 word.
Total units: 56


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.28it/s]


Running prompt:
In classical music, there is a composer who is renowned for his symphonies, concertos, and sonatas. Born in Salzburg in 1756, he began composing music at a very young age and created over 600 works during his lifetime. What is the last name of this composer? Answer in 1 word.
Total units: 50


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.63it/s]


Running prompt:
In the realm of computer programming, there is a widely used language that was developed by Guido van Rossum and first released in 1991. It emphasizes code readability and its syntax allows programmers to express concepts in fewer lines of code. What is the name of this programming language? Answer in 1 word.
Total units: 53


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:00<00:00,  2.49it/s]


Running prompt:
In the study of genetics, there is a molecule that carries the genetic instructions used in the growth, development, functioning, and reproduction of all known living organisms and many viruses. This molecule is structured as a double helix and was first described by Watson and Crick in 1953. What is the abbreviation for this molecule? Answer in 1 word.
Total units: 59


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.51it/s]


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17,token_18,token_19,token_20,token_21,token_22,token_23,token_24,token_25,token_26,token_27,token_28,token_29,token_30,token_31,token_32,token_33,token_34,token_35,token_36,token_37,token_38,token_39,token_40,token_41,token_42,token_43,token_44,token_45,token_46,token_47,token_48,token_49,token_50,token_51,token_52,token_53,token_54,token_55,token_56,token_57,token_58,token_59,token_60,token_61,token_62,token_63,token_64,token_65,token_66
0,1,cosine,fixed,word,In -0.00,J.K. -0.00,Rowling's -0.00,Harry -0.00,Potter -0.00,"series, -0.00",the -0.00,spell -0.00,used -0.00,to -0.00,conjure -0.00,a -0.00,Patronus -0.00,is -0.00,considered -0.00,highly -0.00,advanced -0.00,and -0.00,can -0.00,only -0.00,be -0.00,performed -0.00,by -0.00,skilled -0.00,witches -0.00,and -0.00,wizards. -0.00,The -0.00,form -0.00,that -0.00,a -0.00,Patronus -0.00,takes -0.00,can -0.00,vary -0.00,widely -0.00,and -0.00,is -0.00,often -0.00,influenced -0.00,by -0.00,the -0.00,caster's -0.00,personality -0.00,and 0.03,experiences. 0.03,What 0.03,form 0.03,does 0.12,Snape's 0.26,Patronus 0.07,take? 0.07,Answer 0.06,in 0.06,1 0.15,word. 0.34,,,,,,,,,,
1,2,cosine,fixed,word,In -0.00,the -0.00,field -0.00,of -0.00,"astronomy, -0.00",there -0.00,is -0.00,a -0.00,phenomenon -0.00,where -0.00,the -0.00,light -0.00,from -0.00,a -0.00,star -0.00,is -0.00,bent -0.00,and -0.00,magnified -0.00,by -0.00,the -0.00,gravitational -0.00,field -0.00,of -0.00,another -0.00,"object, -0.00",such -0.00,as -0.00,a -0.00,galaxy -0.00,or -0.00,black -0.00,"hole, -0.00",that -0.00,lies -0.00,between -0.00,the -0.00,star -0.00,and -0.00,the -0.00,observer. -0.00,This -0.00,effect -0.00,was -0.00,first -0.00,predicted -0.00,by -0.00,Einstein's -0.00,theory -0.00,of -0.00,general -0.00,relativity. -0.00,What -0.00,is -0.00,this -0.00,phenomenon -0.00,called? 0.03,Answer 0.03,in 0.03,1 0.04,word. 0.04,,,,,
2,3,cosine,fixed,word,Located 0.00,in 0.00,South 0.00,"America, 0.00",there 0.00,is 0.00,a 0.00,vast 0.00,river 0.00,that 0.00,flows 0.00,through 0.00,"Brazil, 0.00","Peru, 0.00",and 0.00,several 0.00,other 0.00,countries. 0.00,It 0.00,is 0.00,the 0.00,largest 0.00,river 0.00,by 0.00,discharge 0.00,volume 0.00,of 0.00,water 0.00,in 0.00,the 0.00,world 0.00,and 0.00,is 0.00,often 0.00,associated 0.00,with 0.00,the 0.00,rainforest 0.00,of 0.00,the 0.00,same 0.00,name. 0.00,What 0.00,is 0.00,the 0.00,name 0.00,of 0.00,this 0.00,river? 0.01,Answer 0.01,in 0.02,1 0.04,word. 0.04,,,,,,,,,,,,,
3,4,cosine,fixed,word,In -0.00,Norse -0.00,"mythology, -0.00",there -0.00,is -0.00,a -0.00,hammer -0.00,wielded -0.00,by -0.00,the -0.00,god -0.00,"Thor, -0.00",which -0.00,is -0.00,renowned -0.00,for -0.00,its -0.00,immense -0.00,power -0.00,and -0.00,is -0.00,said -0.00,to -0.00,be -0.00,capable -0.00,of -0.00,leveling -0.00,mountains. -0.00,This -0.00,hammer -0.00,is -0.00,also -0.00,a -0.00,symbol -0.00,of -0.00,protection -0.00,and -0.00,blessing. -0.00,What -0.00,is -0.00,the -0.00,name -0.00,of -0.00,Thor's -0.00,hammer? 0.01,Answer 0.01,in 0.01,1 0.01,word. 0.01,,,,,,,,,,,,,,,,,
4,5,cosine,fixed,word,In 0.00,the 0.00,movie 0.00,'The 0.00,"Matrix,' 0.00",the 0.00,protagonist 0.00,is 0.00,a 0.00,computer 0.00,hacker 0.00,who 0.00,learns 0.00,about 0.00,the 0.00,true 0.00,nature 0.00,of 0.00,his 0.00,reality 0.00,and 0.00,his 0.00,role 0.00,in 0.00,the 0.00,war 0.00,against 0.00,its 0.00,controllers. 0.00,He 0.00,is 0.00,given 0.00,a 0.00,choice 0.00,between 0.00,two 0.00,pills: 0.00,a 0.00,red 0.00,pill 0.00,that 0.00,reveals 0.00,the 0.00,"truth, 0.00",and 0.00,a 0.00,blue 0.00,pill 0.00,that 0.00,returns 0.00,him 0.00,to 0.00,his 0.00,normal 0.00,life. 0.00,What 0.00,is 0.03,the 0.03,name 0.03,of 0.03,the 0.05,protagonist? 0.21,Answer 0.10,in 0.08,1 0.06,word. 0.06
5,6,cosine,fixed,word,There -0.00,is -0.00,a -0.00,famous -0.00,painting -0.00,by -0.00,Vincent -0.00,van -0.00,Gogh -0.00,that -0.00,depicts -0.00,a -0.00,night -0.00,sky -0.00,filled -0.00,with -0.00,swirling 0.01,"clouds, 0.01","stars, 0.01",and 0.01,a 0.01,bright 0.01,crescent 0.01,moon. 0.01,This -0.00,painting -0.00,is -0.00,one -0.00,of -0.00,his -0.00,most -0.00,well-known -0.00,works -0.00,and -0.00,was -0.00,created -0.00,while -0.00,he -0.00,was -0.00,in -0.00,a -0.00,mental -0.00,asylum -0.00,in -0.00,Saint-Rémy-de-Provence. -0.00,What -0.00,is -0.00,the -0.00,title 0.02,of 0.02,this 0.02,painting? 0.02,Answer 0.04,in 0.04,1 0.04,word. 0.15,,,,,,,,,,
6,7,cosine,fixed,word,In 0.00,classical 0.00,"music, 0.00",there 0.00,is 0.00,a 0.00,composer 0.00,who 0.00,is 0.00,renowned 0.00,for 0.00,his 0.00,"symphonies, 0.00","concertos, 0.00",and 0.00,sonatas. 0.00,Born 0.00,in 0.00,Salzburg 0.00,in 0.00,"1756, 0.00",he 0.00,began 0.00,composing 0.00,music 0.00,at 0.00,a 0.00,very 0.00,young 0.00,age 0.00,and 0.00,created 0.00,over 0.00,600 0.00,works 0.00,during 0.00,his 0.00,lifetime. 0.00,What 0.00,is 0.00,the 0.00,last 0.00,name 0.00,of 0.00,this 0.00,composer? 0.00,Answer 0.00,in 0.00,1 0.00,word. 0.00,,,,,,,,,,,,,,,,
7,8,cosine,fixed,word,In -0.00,the -0.00,realm -0.00,of -0.00,computer -0.00,"programming, -0.00",there -0.00,is -0.00,a -0.00,widely -0.00,used -0.00,language -0.00,that -0.00,was -0.00,developed -0.00,by -0.00,Guido -0.00,van -0.00,Rossum -0.00,and -0.00,first -0.00,released -0.00,in -0.00,1991. -0.00,It -0.00,emphasizes -0.00,code -0.00,readability -0.00,and -0.00,its -0.00,syntax -0.00,allows -0.00,programmers -0.00,to -0.00,express -0.00,concepts -0.00,in -0.00,fewer -0.00,lines -0.00,of -0.00,code. -0.00,What -0.00,is -0.00,the -0.00,name -0.00,of -0.00,this -0.00,programming -0.00,language? 0.02,Answer 0.02,in 0.03,1 0.03,word. 0.03,,,,,,,,,,,,,
8,9,cosine,fixed,word,In 0.00,the 0.00,study 0.00,of 0.00,"genetics, 0.00",there 0.00,is 0.00,a 0.00,molecule 0.00,that 0.00,carries 0.00,the 0.00,genetic 0.00,instructions 0.00,used 0.00,in 0.00,the 0.00,"growth, 0.00","development, 0.00","functioning, 0.00",and 0.00,reproduction 0.00,of 0.00,all 0.00,known 0.00,living 0.00,organisms 0.00,and 0.00,many 0.00,viruses. 0.00,This 0.00,molecule 0.00,is 0.00,structured 0.00,as 0.00,a 0.00,double 0.00,helix 0.00,and 0.00,was 0.00,first 0.00,described 0.00,by 0.00,Watson 0.00,and 0.00,Crick 0.00,in 0.00,1953. 0.00,What 0.00,is 0.00,the 0.00,abbreviation 0.00,for 0.01,this 0.01,molecule? 0.01,Answer 0.02,in 0.04,1 0.03,word. 0.03,,,,,,,


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,"In J.K. Rowling's Harry Potter series, the spe...",Doe,fixed,word,7.640085,33
1,2,"In the field of astronomy, there is a phenomen...",Gravitational lensing,fixed,word,6.231128,29
2,3,"Located in South America, there is a vast rive...",Amazon,fixed,word,3.192382,22
3,4,"In Norse mythology, there is a hammer wielded ...",Mjölnir,fixed,word,1.522724,15
4,5,"In the movie 'The Matrix,' the protagonist is ...",Neo,fixed,word,5.632369,35
5,6,There is a famous painting by Vincent van Gogh...,Starry Night,fixed,word,4.62134,28
6,7,"In classical music, there is a composer who is...",Mozart,fixed,word,1.498494,15
7,8,"In the realm of computer programming, there is...",Python,fixed,word,3.396451,31
8,9,"In the study of genetics, there is a molecule ...",DNA,fixed,word,3.656755,29
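The efficiency claim can be checked against the logged numbers. The sketch below copies the `Total units` and `num_llm_calls` values from the output above and assumes that full perturbation would cost one call per unit plus one for the original prompt, which is what the short-prompt runs suggest (for example 17 units and 18 calls). The helper `call_savings` is hypothetical.

```python
from typing import Tuple

# Values copied from the notebook output above.
total_units = [56, 61, 53, 49, 66, 56, 50, 53, 59]
hierarchical_calls = [33, 29, 22, 15, 35, 28, 15, 31, 29]


def call_savings(n_units: int, n_calls: int) -> Tuple[int, float]:
    """Return (calls saved, fraction of full-perturbation calls used).

    Assumes full perturbation needs n_units + 1 calls.
    """
    full_calls = n_units + 1
    return full_calls - n_calls, n_calls / full_calls


for units, calls in zip(total_units, hierarchical_calls):
    saved, ratio = call_savings(units, calls)
    print(f"{units:>3} units: {calls:>2} calls, {saved:>2} saved ({ratio:.0%} of full)")

full_total = sum(u + 1 for u in total_units)
hier_total = sum(hierarchical_calls)
print(f"Total: {hier_total} hierarchical vs {full_total} full calls")
```

Under that assumption, the nine prompts together use a bit under half the calls that perturbing every word would need.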


### Attribution matrices

We can also inspect the attribution matrices for different experiments.

In [17]:
for exp_id in (6, 7):
    logger.print_attribution_matrix(exp_id=exp_id)

## Calculating chunksize
Instead of specifying an initial chunksize directly, it can be computed from a fraction of the prompt length or from a desired number of windows.
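As a rough illustration of the two options, the function below is a hypothetical stand-in for `calculate_chunk_size`; the library's actual rounding rule is not shown in this notebook, and the `num_windows` parameter name is an assumption. Rounding down for `fraction` is at least consistent with the chunksize of 2 printed for the nine-unit prompt with `fraction=0.25`.

```python
import math
from typing import Optional


def sketch_chunk_size(
    n_units: int,
    fraction: Optional[float] = None,
    num_windows: Optional[int] = None,
) -> int:
    """Hypothetical chunk-size helper (not the library's implementation).

    Exactly one of `fraction` (share of the prompt per chunk) or
    `num_windows` (desired number of chunks) must be given.
    """
    if (fraction is None) == (num_windows is None):
        raise ValueError("Specify exactly one of fraction or num_windows")
    if fraction is not None:
        size = math.floor(n_units * fraction)  # assumed: round down
    else:
        size = math.ceil(n_units / num_windows)  # assumed: cover all units
    return max(1, size)  # never return an empty chunk


print(sketch_chunk_size(9, fraction=0.25))
print(sketch_chunk_size(9, num_windows=4))
```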

In [16]:
mixed_prompts = [
    "Who painted the 'Mona Lisa'? Answer in 1 word.",
    "During the American Civil War, there was a significant battle fought from July 1 to July 3, 1863, which is often considered the turning point of the war. This battle took place in Pennsylvania and ended with a decisive victory for the Union forces. What is the name of this battle? Answer in 1 word.",
]

logger = ExperimentLogger()
for i, input_str in enumerate(mixed_prompts):
    response = await attributor.get_chat_completion(input_str)

    units, _ = get_units_from_prompt(input_str, attributor.tokenizer, unit_definition="word")
    chunksize = calculate_chunk_size(len(units), fraction=0.25)
    print(f"Using chunksize: {chunksize}")

    await attributor.hierarchical_perturbation(
        input_str,
        init_chunk_size=chunksize,
        perturbation_strategy=perturbation_strategy,
        static_threshold=0.5,
        logger=logger,
        **kwargs,
    )
    logger.print_attribution_matrix(exp_id=i + 1)

logger.print_total_attribution()
display(logger.df_experiments)

Using chunksize: 2


Unnamed: 0,Da (0),Vin (1),ci (2)
Who (0),0.110551,0.510821,0.127954
painted (1),0.221101,0.222025,0.255908
the (2),0.221101,0.222025,0.255908
'Mona (3),0.22879,0.239885,0.252986
Lisa'? (4),0.333925,0.35689,0.37406
Answer (5),-0.0,0.149928,0.0
in (6),0.110551,0.410869,0.127954
1 (7),0.182322,0.111012,0.127954
word. (8),0.32297,-0.0,0.0


Using chunksize: 13


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:05<00:00,  2.73s/it]


ValueError: Index contains duplicate entries, cannot reshape

## Answer only in the question prompts
Here the answer lies in the prompt itself rather than in the model's general knowledge. These prompts are particularly sensitive to perturbations, so we try several perturbation strategies to see how each one affects the model.
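To give a feel for how such strategies can differ, here is a toy sketch. It is not the library's code: the tiny embedding table, both function names, and the reading of `n=0` as "nearest neighbour" and `n=-1` as "farthest neighbour" (by analogy with Python indexing) are assumptions made for illustration.

```python
import math
from typing import Dict, List

# Made-up 2-D embeddings, for illustration only.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "minutes": [1.0, 0.0],
    "hours": [0.9, 0.1],
    "seconds": [0.8, 0.3],
    "banana": [-1.0, 0.2],
}


def fixed_replacement(unit: str, replacement: str = "!") -> str:
    """Replace any unit with the same fixed string."""
    return replacement


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nth_nearest_replacement(unit: str, n: int) -> str:
    """Replace a unit with its n-th most similar neighbour in the toy table."""
    others = [word for word in TOY_EMBEDDINGS if word != unit]
    ranked = sorted(
        others,
        key=lambda w: cosine_similarity(TOY_EMBEDDINGS[unit], TOY_EMBEDDINGS[w]),
        reverse=True,  # most similar first
    )
    return ranked[n]


print(fixed_replacement("minutes"))
print(nth_nearest_replacement("minutes", n=0))
print(nth_nearest_replacement("minutes", n=-1))
```

A near-synonym replacement changes the prompt subtly, while a fixed or distant replacement removes the unit's meaning altogether, so the resulting attribution scores need not agree across strategies.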

In [None]:
perturbation_strategies = [
    FixedPerturbationStrategy(),
    NthNearestPerturbationStrategy(n=0),
    NthNearestPerturbationStrategy(n=-1),
]

In [None]:
self_fulfilling_prompts = [
    "The clock shows 9:47 PM. How many minutes to 10PM?",
    "Maria is 37 years old today. How many years till she's 50?",
    "John has 83 books on his shelf. If he buys 17 more books, how many books will he have in total?",
    "The building is 132 meters tall. How many centimeters tall is the building? No explanation",
    "The package weighs 8.6 kilograms. How many grams does the package weigh?",
    "Jack has 12 teaspoons of sugar. How many tablespoons of sugar does he have?",
    "Alex saved $363 from his birthday gifts. If he spends $45 on a new game, how much money will he have left? No explanation.",
    "The building is 132 meters tall. How many centimeters tall is the building? No explanation.",
    "The thermometer reads 23 degrees Celsius. What is the temperature in Fahrenheit? No explanation.",
    "There are 12 eggs in a dozen. If you use 5 eggs, how many eggs are left?",
]

logger = ExperimentLogger()

for input_str in self_fulfilling_prompts:
    print(f"Running prompt: '{input_str}'")

    for perturb_strat in perturbation_strategies:
        print(f"Using perturbation strategy: {perturb_strat}")
        await attributor.hierarchical_perturbation(
            input_str,
            init_chunk_size=4,
            perturbation_strategy=perturb_strat,
            static_threshold=0.5,
            logger=logger,
            **kwargs,
        )

display(logger.df_experiments)
logger.print_total_attribution()

Running prompt: 'The clock shows 9:47 PM. How many minutes to 10PM?'
Using perturbation strategy: fixed
Stage 0: making 3 perturbations
Stage 1: making 2 perturbations
Using perturbation strategy: nth_nearest (n=0)
Stage 0: making 3 perturbations
Stage 1: making 4 perturbations
Stage 2: making 4 perturbations
Using perturbation strategy: nth_nearest (n=-1)
Stage 0: making 3 perturbations
Stage 1: making 2 perturbations
Stage 2: making 4 perturbations
Running prompt: 'Maria is 37 years old today. How many years till she's 50?'
Using perturbation strategy: fixed
Stage 0: making 4 perturbations
Stage 1: making 2 perturbations
Using perturbation strategy: nth_nearest (n=0)
Stage 0: making 4 perturbations
Stage 1: making 4 perturbations
Stage 2: making 2 perturbations
Using perturbation strategy: nth_nearest (n=-1)
Stage 0: making 4 perturbations
Stage 1: making 4 perturbations
Running prompt: 'John has 83 books on his shelf. If he buys 17 more books, how many books will he have in total?'


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,The clock shows 9:47 PM. How many minutes to 1...,There are 13 minutes left until 10:00 PM.,fixed,word,2.595192,6
1,2,The clock shows 9:47 PM. How many minutes to 1...,13 minutes.,nth_nearest (n=0),word,9.242892,12
2,3,The clock shows 9:47 PM. How many minutes to 1...,There are 13 minutes left until 10:00 PM.,nth_nearest (n=-1),word,8.281744,10
3,4,Maria is 37 years old today. How many years ti...,Maria is 13 years away from turning 50.,fixed,word,2.829744,7
4,5,Maria is 37 years old today. How many years ti...,Maria is 13 years away from turning 50.,nth_nearest (n=0),word,8.46067,11
5,6,Maria is 37 years old today. How many years ti...,Maria is 13 years away from turning 50.,nth_nearest (n=-1),word,7.752234,9
6,7,John has 83 books on his shelf. If he buys 17 ...,John will have 100 books in total. \n\n83 + 17...,fixed,word,9.266652,17
7,8,John has 83 books on his shelf. If he buys 17 ...,John will have 100 books in total. \n\n83 + 17...,nth_nearest (n=0),word,7.658005,9
8,9,John has 83 books on his shelf. If he buys 17 ...,John will have 100 books in total. \n\n83 + 17...,nth_nearest (n=-1),word,12.859274,17
9,10,The building is 132 meters tall. How many cent...,"13,200 centimeters",fixed,word,4.005603,16


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17,token_18,token_19,token_20,token_21,token_22,token_23,token_24
0,1,cosine,fixed,word,The 0.73,clock 0.85,shows 0.04,9:47 0.04,PM. 0.04,How 0.04,many 0.04,minutes 0.04,to 0.04,10PM? 0.04,,,,,,,,,,,,,,
1,2,cosine,nth_nearest (n=0),word,The 0.15,clock 0.15,shows 0.00,9:47 0.00,PM. 0.00,How 0.00,many 0.40,minutes 0.17,to 0.20,10PM? 0.11,,,,,,,,,,,,,,
2,3,cosine,nth_nearest (n=-1),word,The 0.06,clock 0.06,shows 0.02,9:47 0.02,PM. 0.02,How 0.02,many 0.23,minutes 0.79,to 0.98,10PM? 0.86,,,,,,,,,,,,,,
3,4,cosine,fixed,word,Maria 0.68,is 0.34,37 0.06,years 0.06,old 0.06,today. 0.06,How 0.08,many 0.08,years 0.08,till 0.08,she's 0.13,50? 0.13,,,,,,,,,,,,
4,5,cosine,nth_nearest (n=0),word,Maria 0.00,is 0.00,37 0.03,years 0.03,old 0.03,today. 0.03,How 0.05,many 0.05,years 0.15,till 0.15,she's 0.00,50? 0.00,,,,,,,,,,,,
5,6,cosine,nth_nearest (n=-1),word,Maria 0.86,is 0.40,37 0.05,years 0.05,old 0.05,today. 0.05,How 0.08,many 0.08,years 0.08,till 0.08,she's 0.43,50? 0.52,,,,,,,,,,,,
6,7,cosine,fixed,word,John 0.24,has 0.08,83 0.42,books 0.24,on 0.09,his 0.09,shelf. 0.00,If 0.00,he 0.00,buys 0.00,17 0.40,more 0.17,"books, 0.11",how 0.11,many 0.02,books 0.02,will 0.02,he 0.02,have 0.02,in 0.02,total? 0.02,,,
7,8,cosine,nth_nearest (n=0),word,John 0.14,has 0.04,83 0.02,books 0.02,on 0.02,his 0.02,shelf. 0.02,If 0.02,he 0.02,buys 0.02,17 0.02,more 0.02,"books, 0.02",how 0.02,many 0.02,books 0.02,will 0.02,he 0.02,have 0.00,in 0.00,total? 0.00,,,
8,9,cosine,nth_nearest (n=-1),word,John 0.23,has 0.07,83 0.04,books 0.04,on 0.04,his 0.04,shelf. 0.02,If 0.02,he 0.02,buys 0.02,17 0.37,more 0.18,"books, 0.11",how 0.11,many 0.00,books 0.00,will 0.00,he 0.00,have 0.11,in 0.27,total? 0.33,,,
9,10,cosine,fixed,word,The -0.00,building -0.00,is 0.25,132 0.59,meters 0.48,tall. 0.21,How 0.13,many 0.13,centimeters 0.93,tall 0.40,is 0.02,the 0.02,building? 0.02,No 0.02,explanation 0.07,,,,,,,,,
