# PIZZA OpenAI Examples

Examples of using the `OpenAIAttributor`.

Before using this notebook, make sure to set your `OPENAI_API_KEY` environment variable in a `.env` file.
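If the key is missing, the first API call fails with a less obvious error. A small check like the one below can be run after `%dotenv` so the notebook fails early. The helper `require_env` is hypothetical and not part of the `attribution` package.

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of `name`, or raise a helpful error if it is unset or empty."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add a line like `{name}=...` to your .env file"
        )
    return value
```

Usage: `require_env("OPENAI_API_KEY")` after loading the `.env` file.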

## Setup

Set up the attributor and the input arguments shared by all experiments below.

In [41]:
from attribution.api_attribution import OpenAIAttributor
from attribution.experiment_logger import ExperimentLogger
from attribution.token_perturbation import (
    FixedPerturbationStrategy,
    NthNearestPerturbationStrategy,
    calculate_chunk_size,
    get_units_from_prompt,
)

# Re-import modified modules without restarting the kernel
%load_ext autoreload
%autoreload 2

# Load environment variables (OpenAI API key)
%load_ext dotenv
%dotenv

attributor = OpenAIAttributor(request_chunksize=10)
perturbation_strategy = FixedPerturbationStrategy("")

kwargs = {
    "attribution_strategies": ["cosine"],
    "perturb_word_wise": True,
    "ignore_output_token_location": True,
}

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv
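This notebook does not explain the `"cosine"` attribution strategy passed in `kwargs`. One plausible reading, stated here purely as an assumption, is that it scores a perturbation by how far the perturbed output's embedding moves from the original output's embedding, i.e. one minus their cosine similarity. The minimal sketch below illustrates that idea with toy vectors. It is not the package's implementation, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_attribution(original: Sequence[float], perturbed: Sequence[float]) -> float:
    """Assumed scoring rule: 0 when the output is unchanged, larger when it moves."""
    return 1.0 - cosine_similarity(original, perturbed)


# Toy 3-d "embeddings" standing in for real output embeddings.
print(cosine_attribution([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 0.0 (unchanged output)
print(cosine_attribution([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 1.0 (orthogonal output)
```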


## Short prompts
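Comparing the full perturbation and hierarchical perturbation methods. In each attribution table below, `exp_id` 1 is the full perturbation run and `exp_id` 2 is the hierarchical run.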
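In the logs below, every full-perturbation run logs exactly one LLM call per word plus one more, presumably for the unperturbed prompt. For example, the 17-word Siam prompt uses 18 calls and the 9-word 'Inception' prompt uses 10. The helper below encodes that observed relationship so you can estimate the cost of full perturbation before running it on a long prompt. `full_perturbation_calls` is a hypothetical name, not part of the `attribution` package.

```python
def full_perturbation_calls(prompt: str) -> int:
    """Estimate LLM calls for word-wise full perturbation.

    Assumes one call for the original prompt plus one call per
    whitespace-separated word, matching the `num_llm_calls` logged
    for the full-perturbation runs in this notebook.
    """
    return len(prompt.split()) + 1


siam = "What is the capital of the country that was formerly known as Siam? Answer in 1 word."
print(full_perturbation_calls(siam))  # 18, as logged for exp_id 1
```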

Hierarchical perturbation can score a token several times, once for each stage in which it is masked, so the scores need to be aggregated. The default aggregation, `"sum"`, adds them up to produce a saliency map. Here we use `"last"`, which keeps only the token's final score (i.e. from the deepest stage it reached). This is the closest match to the full perturbation method, which scores each word exactly once.
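The difference between the two aggregations can be sketched with made-up per-stage scores. In the Siam logs below, for example, "Siam?" is masked in stage 0 (`'known as Siam? Answer'`), stage 1 (`'Siam? Answer'`) and stage 2 (`'Siam?'`), so it receives three scores. The function `aggregate_scores` and the score values are illustrative assumptions, not the logger's internals.

```python
from typing import Dict, List


def aggregate_scores(scores: Dict[str, List[float]], how: str = "sum") -> Dict[str, float]:
    """Collapse per-stage scores into one score per word.

    `scores` maps each word to the scores it received, ordered from the
    coarsest stage (largest chunk) to the finest.
    """
    if how == "sum":
        return {word: sum(s) for word, s in scores.items()}
    if how == "last":
        return {word: (s[-1] if s else 0.0) for word, s in scores.items()}
    raise ValueError(f"unknown aggregation: {how!r}")


# Hypothetical scores: "What" was only scored once (in stage 0),
# "Siam?" was scored in three successively smaller chunks.
scores = {"What": [0.0], "Siam?": [0.25, 0.5, 0.75]}
print(aggregate_scores(scores, "sum"))   # {'What': 0.0, 'Siam?': 1.5}
print(aggregate_scores(scores, "last"))  # {'What': 0.0, 'Siam?': 0.75}
```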

In [26]:
short_prompts = [
    "What is the capital of the country that was formerly known as Siam? Answer in 1 word.",
    "What is the chemical element with the symbol 'Au'? Answer in 1 word.",
    "Which character in 'Pride and Prejudice' said 'It is a truth universally acknowledged'? Answer in 1 word.",
    "Who holds the record for the most goals in a calendar year in football (soccer)? Answer in 1 word.",
    "In Greek mythology, who is the goddess of wisdom and warfare? Answer in 1 word.",
    "Who directed the film 'Inception'? Answer in 1 word.",
    "Who is known as the 'King of Pop'? Answer in 1 word.",
    "Who was the first female Prime Minister of the United Kingdom? Answer in 1 word.",
    "Who is the co-founder of Microsoft? Answer in 1 word.",
]

for input_str in short_prompts:
    response = await attributor.get_chat_completion(input_str)

    logger = ExperimentLogger()

    await attributor.compute_attributions(
        input_str, perturbation_strategy=perturbation_strategy, logger=logger, **kwargs
    )

    saliency = await attributor.hierarchical_perturbation(
        input_str,
        init_chunk_size=4,
        perturbation_strategy=perturbation_strategy,
        logger=logger,
        verbose=True,
        **kwargs,
    )

    logger.print_sentence_attribution(score_agg="last")
    display(logger.df_experiments)

Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.71it/s]


Stage 0: making 5 perturbations
Masked out tokens/words:
['What is']
['the capital of the']
['country that was formerly']
['known as Siam? Answer']
['in 1 word.']
Stage 1: making 6 perturbations
Masked out tokens/words:
['1 word.']
['in']
['Siam? Answer']
['known as']
['of the']
['the capital']
Stage 2: making 6 perturbations
Masked out tokens/words:
['Answer']
['Siam?']
['the']
['of']
['capital']
['the']


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17
0,1,cosine,fixed,True,What 0.00,is 0.00,the 0.00,capital 0.59,of 0.00,the 0.00,country 0.00,that 0.00,was 0.00,formerly 0.00,known 0.00,as 0.00,Siam? 0.68,Answer 0.00,in 0.00,1 0.00,word. 0.00
1,2,cosine,fixed,True,What 0.00,is 0.00,the 0.00,capital 0.59,of 0.00,the 0.00,country 0.00,that 0.00,was 0.00,formerly 0.00,known 0.00,as 0.00,Siam? 0.66,Answer 0.00,in 0.00,1 0.00,word. 0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,What is the capital of the country that was fo...,Bangkok,fixed,True,1.314548,18
1,2,What is the capital of the country that was fo...,Bangkok,fixed,True,4.016971,18


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:00<00:00,  2.01it/s]


Stage 0: making 4 perturbations
Masked out tokens/words:
['What is']
['the chemical element with']
["the symbol 'Au'? Answer"]
['in 1 word.']
Stage 1: making 2 perturbations
Masked out tokens/words:
["'Au'? Answer"]
['the symbol']
Stage 2: making 2 perturbations
Masked out tokens/words:
['Answer']
["'Au'?"]


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13
0,1,cosine,fixed,True,What 0.00,is 0.00,the 0.00,chemical 0.00,element 0.00,with 0.00,the 0.00,symbol 0.00,'Au'? 0.64,Answer 0.00,in 0.00,1 0.00,word. 0.00
1,2,cosine,fixed,True,What 0.00,is 0.00,the 0.00,chemical 0.00,element 0.00,with 0.00,the 0.00,symbol 0.00,'Au'? 0.64,Answer 0.00,in 0.00,1 0.00,word. 0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,What is the chemical element with the symbol '...,Gold,fixed,True,1.080197,14
1,2,What is the chemical element with the symbol '...,Gold,fixed,True,1.95649,9


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.59it/s]


Stage 0: making 5 perturbations
Masked out tokens/words:
['Which character']
["in 'Pride and Prejudice'"]
["said 'It is a"]
["truth universally acknowledged'? Answer"]
['in 1 word.']
Stage 1: making 4 perturbations
Masked out tokens/words:
['1 word.']
['in']
["acknowledged'? Answer"]
['truth universally']
Stage 2: making 2 perturbations
Masked out tokens/words:
['word.']
['1']


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17
0,1,cosine,fixed,True,Which -0.00,character 0.59,in -0.00,'Pride -0.00,and -0.00,Prejudice' -0.00,said -0.00,'It -0.00,is -0.00,a -0.00,truth 0.49,universally -0.00,acknowledged'? -0.00,Answer -0.00,in -0.00,1 -0.00,word. 0.64
1,2,cosine,fixed,True,Which -0.00,character -0.00,in -0.00,'Pride -0.00,and -0.00,Prejudice' -0.00,said -0.00,'It -0.00,is -0.00,a -0.00,truth -0.00,universally -0.00,acknowledged'? -0.00,Answer -0.00,in -0.00,1 -0.00,word. 0.64


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,Which character in 'Pride and Prejudice' said ...,Elizabeth,fixed,True,1.371346,18
1,2,Which character in 'Pride and Prejudice' said ...,Elizabeth,fixed,True,3.434932,12


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.00it/s]


Stage 0: making 6 perturbations
Masked out tokens/words:
['Who holds']
['the record for the']
['most goals in a']
['calendar year in football']
['(soccer)? Answer in 1']
['word.']
Stage 1: making 4 perturbations
Masked out tokens/words:
['in 1']
['(soccer)? Answer']
['in football']
['calendar year']
Stage 2: making 4 perturbations
Masked out tokens/words:
['1']
['in']
['year']
['calendar']


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17,token_18,token_19
0,1,cosine,fixed,True,Who 0.00,holds 0.00,the 0.00,record 0.00,for 0.00,the 0.00,most 0.00,goals 0.00,in 0.00,a 0.00,calendar 0.00,year 0.00,in 0.00,football 0.00,(soccer)? 0.00,Answer 0.00,in 0.00,1 0.58,word. 0.56
1,2,cosine,fixed,True,Who 0.00,holds 0.00,the 0.00,record 0.00,for 0.00,the 0.00,most 0.00,goals 0.00,in 0.00,a 0.00,calendar 0.00,year 0.00,in 0.00,football 0.00,(soccer)? 0.00,Answer 0.00,in 0.00,1 0.58,word. 0.56


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,Who holds the record for the most goals in a c...,Messi,fixed,True,2.149703,20
1,2,Who holds the record for the most goals in a c...,Messi,fixed,True,3.360714,15


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.74it/s]


Stage 0: making 5 perturbations
Masked out tokens/words:
['In Greek']
['mythology, who is the']
['goddess of wisdom and']
['warfare? Answer in 1']
['word.']
Stage 1: making 4 perturbations
Masked out tokens/words:
['in 1']
['warfare? Answer']
['wisdom and']
['goddess of']
Stage 2: making 8 perturbations
Masked out tokens/words:
['1']
['in']
['Answer']
['warfare?']
['and']
['wisdom']
['of']
['goddess']


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15
0,1,cosine,fixed,True,In -0.00,Greek -0.00,"mythology, -0.00",who -0.00,is -0.00,the -0.00,goddess -0.00,of -0.00,wisdom -0.00,and -0.00,warfare? -0.00,Answer -0.00,in -0.00,1 -0.00,word. -0.00
1,2,cosine,fixed,True,In -0.00,Greek -0.00,"mythology, -0.00",who -0.00,is -0.00,the -0.00,goddess -0.00,of -0.00,wisdom -0.00,and -0.00,warfare? -0.00,Answer -0.00,in -0.00,1 -0.00,word. -0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,"In Greek mythology, who is the goddess of wisd...",Athena,fixed,True,1.30695,16
1,2,"In Greek mythology, who is the goddess of wisd...",Athena,fixed,True,3.130599,18


Stage 0: making 3 perturbations
Masked out tokens/words:
['Who directed']
["the film 'Inception'? Answer"]
['in 1 word.']
Stage 1: making 6 perturbations
Masked out tokens/words:
['1 word.']
['in']
["'Inception'? Answer"]
['the film']
['directed']
['Who']
Stage 2: making 4 perturbations
Masked out tokens/words:
['word.']
['1']
['Answer']
["'Inception'?"]


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9
0,1,cosine,fixed,True,Who -0.00,directed 0.71,the -0.00,film -0.00,'Inception'? 0.69,Answer -0.00,in -0.00,1 0.66,word. 0.66
1,2,cosine,fixed,True,Who -0.00,directed 0.71,the -0.00,film -0.00,'Inception'? 0.69,Answer -0.00,in -0.00,1 0.66,word. 0.66


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,Who directed the film 'Inception'? Answer in 1...,Nolan,fixed,True,1.012337,10
1,2,Who directed the film 'Inception'? Answer in 1...,Nolan,fixed,True,2.597398,14


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:00<00:00,  2.05it/s]


Stage 0: making 4 perturbations
Masked out tokens/words:
['Who is']
["known as the 'King"]
["of Pop'? Answer in"]
['1 word.']
Stage 1: making 4 perturbations
Masked out tokens/words:
['Answer in']
["of Pop'?"]
["the 'King"]
['known as']
Stage 2: making 4 perturbations
Masked out tokens/words:
["Pop'?"]
['of']
["'King"]
['the']


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12
0,1,cosine,fixed,True,Who 0.00,is 0.00,known 0.00,as 0.00,the 0.00,'King 0.00,of 0.00,Pop'? 0.60,Answer 0.00,in 0.00,1 0.00,word. 0.00
1,2,cosine,fixed,True,Who 0.00,is 0.00,known 0.00,as 0.00,the 0.00,'King 0.00,of 0.00,Pop'? 0.60,Answer 0.00,in 0.00,1 0.00,word. 0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,Who is known as the 'King of Pop'? Answer in 1...,Michael,fixed,True,1.057454,13
1,2,Who is known as the 'King of Pop'? Answer in 1...,Michael,fixed,True,2.197495,13


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.52it/s]


Stage 0: making 5 perturbations
Masked out tokens/words:
['Who was']
['the first female Prime']
['Minister of the United']
['Kingdom? Answer in 1']
['word.']
Stage 1: making 4 perturbations
Masked out tokens/words:
['female Prime']
['the first']
['was']
['Who']
Stage 2: making 4 perturbations
Masked out tokens/words:
['Prime']
['female']
['first']
['the']


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15
0,1,cosine,fixed,True,Who -0.00,was -0.00,the -0.00,first 0.69,female 0.68,Prime -0.00,Minister -0.00,of -0.00,the -0.00,United 0.68,Kingdom? -0.00,Answer 0.69,in -0.00,1 -0.00,word. -0.00
1,2,cosine,fixed,True,Who -0.00,was -0.00,the -0.00,first 0.69,female 0.68,Prime -0.00,Minister -0.00,of -0.00,the -0.00,United -0.00,Kingdom? 0.12,Answer 0.12,in 0.12,1 0.12,word. -0.00


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,Who was the first female Prime Minister of the...,Margaret,fixed,True,1.442955,16
1,2,Who was the first female Prime Minister of the...,Margaret,fixed,True,2.287251,14


Stage 0: making 3 perturbations
Masked out tokens/words:
['Who is']
['the co-founder of Microsoft?']
['Answer in 1 word.']
Stage 1: making 2 perturbations
Masked out tokens/words:
['of Microsoft?']
['the co-founder']
Stage 2: making 2 perturbations
Masked out tokens/words:
['Microsoft?']
['of']


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10
0,1,cosine,fixed,True,Who 0.00,is 0.54,the 0.00,co-founder 0.64,of 0.00,Microsoft? 0.51,Answer 0.00,in 0.00,1 -0.00,word. -0.00
1,2,cosine,fixed,True,Who 0.00,is 0.00,the 0.00,co-founder 0.00,of 0.00,Microsoft? 0.51,Answer 0.04,in 0.04,1 0.04,word. 0.04


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,Who is the co-founder of Microsoft? Answer in ...,Bill,fixed,True,0.959253,11
1,2,Who is the co-founder of Microsoft? Answer in ...,Bill,fixed,True,1.822891,8


## Longer prompts
Using hierarchical perturbation only, with a larger initial chunk size (8).
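The stage logs above show the basic idea behind hierarchical perturbation. Stage 0 masks chunks of roughly `init_chunk_size` words, and later stages split only some chunks into smaller pieces. That is why it needs fewer calls than full perturbation on long prompts. The toy sketch below illustrates the idea with a fake scorer. It is not the `attribution` package's implementation. In particular, the package's stage-0 chunks are not simply consecutive 4-word blocks (e.g. `'What is'` is a 2-word chunk), and its refinement rule may differ. The names `toy_hierarchical_perturbation` and `toy_score` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Chunk = Tuple[int, int]  # half-open range [start, end) of word indices


def toy_hierarchical_perturbation(
    words: Sequence[str],
    score_fn: Callable[[List[str]], float],
    init_chunk_size: int,
    threshold: float = 0.0,
) -> Tuple[List[float], int]:
    """Mask chunks of words, then recursively halve the chunks that matter.

    `score_fn` receives the prompt with one chunk removed and returns how much
    the output changed. Returns the finest score each word received (like the
    "last" aggregation) and the number of score_fn calls made.
    """
    if init_chunk_size < 1:
        raise ValueError("init_chunk_size must be >= 1")
    n = len(words)
    scores = [0.0] * n
    chunks: List[Chunk] = [
        (i, min(i + init_chunk_size, n)) for i in range(0, n, init_chunk_size)
    ]
    calls = 0
    while chunks:  # one iteration per stage
        next_chunks: List[Chunk] = []
        for start, end in chunks:
            kept = [w for i, w in enumerate(words) if not start <= i < end]
            score = score_fn(kept)
            calls += 1
            for i in range(start, end):
                scores[i] = score
            # Only chunks whose removal changed the output are split further.
            if score > threshold and end - start > 1:
                mid = (start + end) // 2
                next_chunks += [(start, mid), (mid, end)]
        chunks = next_chunks
    return scores, calls


prompt = "What is the capital of the country that was formerly known as Siam? Answer in 1 word."
words = prompt.split()


def toy_score(kept_words: List[str]) -> float:
    # Pretend the answer only changes when "Siam?" is masked out.
    return 0.0 if "Siam?" in kept_words else 1.0


scores, calls = toy_hierarchical_perturbation(words, toy_score, init_chunk_size=4)
print(calls)  # 9 scorer calls, versus 17 for masking each word separately
```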

In [42]:
longer_prompts = [
    "In J.K. Rowling's Harry Potter series, the spell used to conjure a Patronus is considered highly advanced and can only be performed by skilled witches and wizards. The form that a Patronus takes can vary widely and is often influenced by the caster's personality and experiences. What form does Snape's Patronus take? Answer in 1 word.",
    "In the field of astronomy, there is a phenomenon where the light from a star is bent and magnified by the gravitational field of another object, such as a galaxy or black hole, that lies between the star and the observer. This effect was first predicted by Einstein's theory of general relativity. What is this phenomenon called? Answer in 1 word.",
    "Located in South America, there is a vast river that flows through Brazil, Peru, and several other countries. It is the largest river by discharge volume of water in the world and is often associated with the rainforest of the same name. What is the name of this river? Answer in 1 word.",
    "In Norse mythology, there is a hammer wielded by the god Thor, which is renowned for its immense power and is said to be capable of leveling mountains. This hammer is also a symbol of protection and blessing. What is the name of Thor's hammer? Answer in 1 word.",
    "In the movie 'The Matrix,' the protagonist is a computer hacker who learns about the true nature of his reality and his role in the war against its controllers. He is given a choice between two pills: a red pill that reveals the truth, and a blue pill that returns him to his normal life. What is the name of the protagonist? Answer in 1 word.",
    "There is a famous painting by Vincent van Gogh that depicts a night sky filled with swirling clouds, stars, and a bright crescent moon. This painting is one of his most well-known works and was created while he was in a mental asylum in Saint-Rémy-de-Provence. What is the title of this painting? Answer in 1 word.",
    "In classical music, there is a composer who is renowned for his symphonies, concertos, and sonatas. Born in Salzburg in 1756, he began composing music at a very young age and created over 600 works during his lifetime. What is the last name of this composer? Answer in 1 word.",
    "In the realm of computer programming, there is a widely used language that was developed by Guido van Rossum and first released in 1991. It emphasizes code readability and its syntax allows programmers to express concepts in fewer lines of code. What is the name of this programming language? Answer in 1 word.",
    "In the study of genetics, there is a molecule that carries the genetic instructions used in the growth, development, functioning, and reproduction of all known living organisms and many viruses. This molecule is structured as a double helix and was first described by Watson and Crick in 1953. What is the abbreviation for this molecule? Answer in 1 word.",
]

logger = ExperimentLogger()
for input_str in longer_prompts:
    await attributor.hierarchical_perturbation(
        input_str,
        init_chunk_size=8,
        perturbation_strategy=perturbation_strategy,
        logger=logger,
        **kwargs,
    )

logger.print_sentence_attribution()
display(logger.df_experiments)

Stage 0: making 8 perturbations
Stage 1: making 4 perturbations
Stage 2: making 4 perturbations
Stage 3: making 2 perturbations
Stage 0: making 9 perturbations
Stage 0: making 8 perturbations
Stage 0: making 7 perturbations
Stage 1: making 2 perturbations
Stage 2: making 2 perturbations
Stage 3: making 2 perturbations
Stage 0: making 9 perturbations
Stage 1: making 2 perturbations
Stage 2: making 2 perturbations
Stage 3: making 2 perturbations
Stage 0: making 8 perturbations
Stage 1: making 4 perturbations
Stage 2: making 2 perturbations
Stage 0: making 7 perturbations
Stage 1: making 2 perturbations
Stage 2: making 2 perturbations
Stage 3: making 2 perturbations
Stage 0: making 8 perturbations
Stage 0: making 8 perturbations
Stage 1: making 2 perturbations
Stage 2: making 2 perturbations
Stage 3: making 4 perturbations


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17,token_18,token_19,token_20,token_21,token_22,token_23,token_24,token_25,token_26,token_27,token_28,token_29,token_30,token_31,token_32,token_33,token_34,token_35,token_36,token_37,token_38,token_39,token_40,token_41,token_42,token_43,token_44,token_45,token_46,token_47,token_48,token_49,token_50,token_51,token_52,token_53,token_54,token_55,token_56,token_57,token_58,token_59,token_60,token_61,token_62,token_63,token_64,token_65,token_66
0,1,cosine,fixed,True,In -0.00,J.K. -0.00,Rowling's -0.00,Harry -0.00,Potter -0.00,"series, -0.00",the -0.00,spell -0.00,used -0.00,to -0.00,conjure -0.00,a -0.00,Patronus -0.00,is -0.00,considered -0.00,highly -0.00,advanced -0.00,and -0.00,can -0.00,only -0.00,be -0.00,performed -0.00,by -0.00,skilled -0.00,witches -0.00,and -0.00,wizards. -0.00,The -0.00,form -0.00,that -0.00,a -0.00,Patronus -0.00,takes -0.00,can -0.00,vary -0.00,widely -0.00,and -0.00,is -0.00,often -0.00,influenced -0.00,by -0.00,the -0.00,caster's -0.00,personality -0.00,and 0.08,experiences. 0.08,What 0.08,form 0.08,does 0.57,Snape's 1.23,Patronus 0.25,take? 0.25,Answer 0.16,in 0.16,1 0.48,word. 1.04,,,,,,,,,,
1,2,cosine,fixed,True,In -0.00,the -0.00,field -0.00,of -0.00,"astronomy, -0.00",there -0.00,is -0.00,a -0.00,phenomenon -0.00,where -0.00,the -0.00,light -0.00,from -0.00,a -0.00,star -0.00,is -0.00,bent -0.00,and -0.00,magnified -0.00,by -0.00,the -0.00,gravitational -0.00,field -0.00,of -0.00,another -0.00,"object, -0.00",such -0.00,as -0.00,a -0.00,galaxy -0.00,or -0.00,black -0.00,"hole, -0.00",that -0.00,lies -0.00,between -0.00,the -0.00,star -0.00,and -0.00,the -0.00,observer. -0.00,This -0.00,effect -0.00,was -0.00,first -0.00,predicted -0.00,by -0.00,Einstein's -0.00,theory -0.00,of -0.00,general -0.00,relativity. -0.00,What -0.00,is -0.00,this -0.00,phenomenon -0.00,called? -0.00,Answer -0.00,in -0.00,1 -0.00,word. 0.07,,,,,
2,3,cosine,fixed,True,Located 0.00,in 0.00,South 0.00,"America, 0.00",there 0.00,is 0.00,a 0.00,vast 0.00,river 0.00,that 0.00,flows 0.00,through 0.00,"Brazil, 0.00","Peru, 0.00",and 0.00,several 0.00,other 0.00,countries. 0.00,It 0.00,is 0.00,the 0.00,largest 0.00,river 0.00,by 0.00,discharge 0.00,volume 0.00,of 0.00,water 0.00,in 0.00,the 0.00,world 0.00,and 0.00,is 0.00,often 0.00,associated 0.00,with 0.00,the 0.00,rainforest 0.00,of 0.00,the 0.00,same 0.00,name. 0.00,What 0.00,is 0.00,the 0.00,name 0.00,of 0.00,this 0.00,river? 0.00,Answer 0.00,in 0.00,1 0.00,word. 0.18,,,,,,,,,,,,,
3,4,cosine,fixed,True,In 0.00,Norse 0.00,"mythology, 0.00",there 0.00,is 0.00,a 0.00,hammer 0.00,wielded 0.00,by 0.00,the 0.00,god 0.00,"Thor, 0.00",which 0.00,is 0.00,renowned 0.00,for 0.00,its 0.00,immense 0.00,power 0.00,and 0.00,is 0.00,said 0.00,to 0.00,be 0.00,capable 0.00,of 0.00,leveling 0.00,mountains. 0.00,This 0.00,hammer 0.00,is 0.00,also 0.00,a 0.00,symbol 0.00,of 0.00,protection 0.00,and 0.00,blessing. 0.00,What 0.00,is 0.00,the 0.00,name 0.00,of 0.00,Thor's 0.00,hammer? 0.01,Answer 0.01,in 0.03,1 0.06,word. 0.11,,,,,,,,,,,,,,,,,
4,5,cosine,fixed,True,In 0.00,the 0.00,movie 0.00,'The 0.00,"Matrix,' 0.00",the 0.00,protagonist 0.00,is 0.00,a 0.00,computer 0.00,hacker 0.00,who 0.00,learns 0.00,about 0.00,the 0.00,true 0.00,nature 0.00,of 0.00,his 0.00,reality 0.00,and 0.00,his 0.00,role 0.00,in 0.00,the 0.00,war 0.00,against 0.00,its 0.00,controllers. 0.00,He 0.00,is 0.00,given 0.00,a 0.00,choice 0.00,between 0.00,two 0.00,pills: 0.00,a 0.00,red 0.00,pill 0.00,that 0.00,reveals 0.00,the 0.00,"truth, 0.00",and 0.00,a 0.00,blue 0.00,pill 0.00,that 0.00,returns 0.00,him 0.00,to 0.00,his 0.00,normal 0.00,life. 0.00,What 0.00,is 0.00,the 0.00,name 0.00,of 0.00,the 0.09,protagonist? 0.09,Answer 0.09,in 0.28,1 0.55,word. 0.55
5,6,cosine,fixed,True,There 0.00,is 0.00,a 0.00,famous 0.00,painting 0.00,by 0.00,Vincent 0.00,van 0.00,Gogh 0.00,that 0.00,depicts 0.00,a 0.00,night 0.00,sky 0.00,filled 0.00,with 0.00,swirling 0.00,"clouds, 0.00","stars, 0.00",and 0.00,a 0.00,bright 0.00,crescent 0.00,moon. 0.00,This 0.00,painting 0.00,is 0.00,one 0.00,of 0.00,his 0.00,most 0.00,well-known 0.00,works 0.00,and 0.00,was 0.00,created 0.00,while 0.00,he 0.00,was 0.00,in 0.00,a 0.00,mental 0.00,asylum 0.00,in 0.00,Saint-Rémy-de-Provence. 0.08,What 0.08,is 0.08,the 0.08,title 0.08,of 0.08,this 0.08,painting? 0.08,Answer 0.10,in 0.10,1 0.30,word. 0.69,,,,,,,,,,
6,7,cosine,fixed,True,In 0.00,classical 0.00,"music, 0.00",there 0.00,is 0.00,a 0.00,composer 0.00,who 0.00,is 0.00,renowned 0.00,for 0.00,his 0.00,"symphonies, 0.00","concertos, 0.00",and 0.00,sonatas. 0.00,Born 0.00,in 0.00,Salzburg 0.00,in 0.00,"1756, 0.00",he 0.00,began 0.00,composing 0.00,music 0.00,at 0.00,a 0.00,very 0.00,young 0.00,age 0.00,and 0.00,created 0.00,over 0.00,600 0.00,works 0.00,during 0.00,his 0.00,lifetime. 0.00,What 0.00,is 0.00,the 0.00,last 0.00,name 0.00,of 0.00,this 0.06,composer? 0.06,Answer 0.06,in 0.21,1 0.21,word. 0.21,,,,,,,,,,,,,,,,
7,8,cosine,fixed,True,In 0.00,the 0.00,realm 0.00,of 0.00,computer 0.00,"programming, 0.00",there 0.00,is 0.00,a 0.00,widely 0.00,used 0.00,language 0.00,that 0.00,was 0.00,developed 0.00,by 0.00,Guido 0.00,van 0.00,Rossum 0.00,and 0.00,first 0.00,released 0.00,in 0.00,1991. 0.00,It 0.00,emphasizes 0.00,code 0.00,readability 0.00,and 0.00,its 0.00,syntax 0.00,allows 0.00,programmers 0.00,to 0.00,express 0.00,concepts 0.00,in 0.00,fewer 0.00,lines 0.00,of 0.00,code. 0.00,What 0.00,is 0.00,the 0.00,name 0.00,of 0.00,this 0.00,programming 0.00,language? 0.00,Answer 0.00,in 0.00,1 0.00,word. 0.00,,,,,,,,,,,,,
8,9,cosine,fixed,True,In -0.00,the -0.00,study -0.00,of -0.00,"genetics, -0.00",there -0.00,is -0.00,a -0.00,molecule -0.00,that -0.00,carries -0.00,the -0.00,genetic -0.00,instructions -0.00,used -0.00,in -0.00,the -0.00,"growth, -0.00","development, -0.00","functioning, -0.00",and -0.00,reproduction -0.00,of -0.00,all -0.00,known -0.00,living -0.00,organisms -0.00,and -0.00,many -0.00,viruses. -0.00,This -0.00,molecule -0.00,is -0.00,structured -0.00,as -0.00,a -0.00,double -0.00,helix -0.00,and -0.00,was -0.00,first -0.00,described -0.00,by -0.00,Watson -0.00,and -0.00,Crick -0.00,in -0.00,1953. -0.00,What -0.00,is -0.00,the -0.00,abbreviation -0.00,for 0.04,this 0.04,molecule? 0.04,Answer 0.11,in 0.11,1 0.41,word. 0.11,,,,,,,


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,"In J.K. Rowling's Harry Potter series, the spe...",Doe,fixed,True,4.542755,19
1,2,"In the field of astronomy, there is a phenomen...",Gravitational lensing,fixed,True,4.006256,10
2,3,"Located in South America, there is a vast rive...",Amazon,fixed,True,1.272425,9
3,4,"In Norse mythology, there is a hammer wielded ...",Mjölnir,fixed,True,4.858233,14
4,5,"In the movie 'The Matrix,' the protagonist is ...",Neo,fixed,True,4.405809,16
5,6,There is a famous painting by Vincent van Gogh...,Starry Night,fixed,True,3.609214,15
6,7,"In classical music, there is a composer who is...",Mozart,fixed,True,3.191743,14
7,8,"In the realm of computer programming, there is...",Python,fixed,True,1.168609,9
8,9,"In the study of genetics, there is a molecule ...",DNA,fixed,True,4.40563,17



### Attribution matrices

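We can also inspect the attribution matrix (input words as rows, output tokens as columns) for individual experiments. Here, we inspect experiments 6 ('Starry Night') and 7 ('Mozart'). Only the first ten rows of each matrix are shown.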
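One simple way to read such a matrix is to ask which input word most influenced each output token. The sketch below does this with the values of the "Da Vinci" matrix printed in the "Calculating chunksize" section. The helper `top_input_per_output` is hypothetical and is not a method of `ExperimentLogger`.

```python
from typing import Dict, List, Tuple

# Rows (input word, scores per output token) copied from the "Da Vinci"
# attribution matrix printed in the "Calculating chunksize" section.
output_tokens = ["Da", "Vin", "ci"]
rows: List[Tuple[str, List[float]]] = [
    ("Who", [-0.0, 0.0, 0.0]),
    ("painted", [0.575291, 0.539194, 0.749999]),
    ("the", [0.191763, 0.179731, 0.25]),
    ("'Mona", [-0.0, 0.0, 0.0]),
    ("Lisa'?", [-0.0, 0.0, 0.0]),
    ("Answer", [-0.0, 0.140537, 0.0]),
    ("in", [-0.0, 0.140537, 0.0]),
    ("1", [0.116473, 0.0, 0.0]),
    ("word.", [0.116473, 0.0, 0.0]),
]


def top_input_per_output(
    rows: List[Tuple[str, List[float]]], output_tokens: List[str]
) -> Dict[str, str]:
    """For each output token, return the input word with the highest score."""
    return {
        out_tok: max(rows, key=lambda row: row[1][col])[0]
        for col, out_tok in enumerate(output_tokens)
    }


print(top_input_per_output(rows, output_tokens))
# {'Da': 'painted', 'Vin': 'painted', 'ci': 'painted'}
```

A list of `(word, scores)` pairs is used instead of a dict keyed by word because prompts often repeat words such as "the" or "in".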

In [46]:
for exp_id in (6, 7):
    logger.print_attribution_matrix(exp_id=exp_id)

Attribution matrix for experiment 6 
Attribution Strategy: cosine 
Perturbation strategy: fixed:
Input Tokens (Rows) vs. Output Tokens (Columns)


Unnamed: 0,St (0),arry (1),Night (2)
There (0),0.0,0.0,-0.0
is (1),0.0,0.0,-0.0
a (2),0.0,0.0,-0.0
famous (3),0.0,0.0,-0.0
painting (4),0.0,0.0,-0.0
by (5),0.0,0.0,-0.0
Vincent (6),0.0,0.0,-0.0
van (7),0.0,0.0,-0.0
Gogh (8),0.0,0.0,-0.0
that (9),0.0,0.0,-0.0


Attribution matrix for experiment 7 
Attribution Strategy: cosine 
Perturbation strategy: fixed:
Input Tokens (Rows) vs. Output Tokens (Columns)


Unnamed: 0,M (0),oz (1),art (2)
In (0),0.0,0.0,-0.019231
classical (1),0.0,0.0,-0.019231
"music, (2)",0.0,0.0,-0.019231
there (3),0.0,0.0,-0.019231
is (4),0.0,0.0,-0.009615
a (5),0.0,0.0,-0.009615
composer (6),0.0,0.0,-0.009615
who (7),0.0,0.0,-0.009615
is (8),0.0,0.0,-0.009615
renowned (9),0.0,0.0,-0.009615


## Calculating chunksize
Instead of specifying an initial chunksize directly, it can be computed from a fraction of the prompt length or from a desired number of windows.
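The two computations can be sketched as follows. These are hypothetical re-implementations, not the source of `calculate_chunk_size`. Flooring `n_units * fraction` is an assumption, but it is consistent with the chunk sizes printed below: 2 for the 9-word 'Mona Lisa' prompt and 13 for the 55-word Gettysburg prompt. The window-based variant is purely illustrative.

```python
import math


def chunk_size_from_fraction(n_units: int, fraction: float) -> int:
    """Chunk size as a fraction of the prompt length (floored, at least 1)."""
    return max(1, int(n_units * fraction))


def chunk_size_from_windows(n_units: int, num_windows: int) -> int:
    """Chunk size that splits the prompt into roughly `num_windows` chunks."""
    return max(1, math.ceil(n_units / num_windows))


print(chunk_size_from_fraction(9, 0.25))   # 2  (9 * 0.25 = 2.25)
print(chunk_size_from_fraction(55, 0.25))  # 13 (55 * 0.25 = 13.75)
print(chunk_size_from_windows(55, 4))      # 14 (ceil(55 / 4))
```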

In [48]:
mixed_prompts = [
    "Who painted the 'Mona Lisa'? Answer in 1 word.",
    "During the American Civil War, there was a significant battle fought from July 1 to July 3, 1863, which is often considered the turning point of the war. This battle took place in Pennsylvania and ended with a decisive victory for the Union forces. What is the name of this battle? Answer in 1 word.",
]

logger = ExperimentLogger()
for i, input_str in enumerate(mixed_prompts):
    response = await attributor.get_chat_completion(input_str)

    units, _, _ = get_units_from_prompt(input_str, attributor.tokenizer, perturb_word_wise=True)
    chunksize = calculate_chunk_size(len(units), fraction=0.25)
    print(f"Using chunksize: {chunksize}")

    saliency = await attributor.hierarchical_perturbation(
        input_str,
        init_chunk_size=chunksize,
        perturbation_strategy=perturbation_strategy,
        logger=logger,
        **kwargs,
    )
    logger.print_attribution_matrix(exp_id=i + 1)

logger.print_sentence_attribution()
display(logger.df_experiments)

Using chunksize: 2
Stage 0: making 5 perturbations
Stage 1: making 2 perturbations
Attribution matrix for experiment 1 
Attribution Strategy: cosine 
Perturbation strategy: fixed:
Input Tokens (Rows) vs. Output Tokens (Columns)


Unnamed: 0,Da (0),Vin (1),ci (2)
Who (0),-0.0,0.0,0.0
painted (1),0.575291,0.539194,0.749999
the (2),0.191763,0.179731,0.25
'Mona (3),-0.0,0.0,0.0
Lisa'? (4),-0.0,0.0,0.0
Answer (5),-0.0,0.140537,0.0
in (6),-0.0,0.140537,0.0
1 (7),0.116473,0.0,0.0
word. (8),0.116473,0.0,0.0


Using chunksize: 13
Stage 0: making 5 perturbations
Stage 1: making 2 perturbations
Stage 2: making 2 perturbations
Stage 3: making 2 perturbations
Stage 4: making 2 perturbations
Attribution matrix for experiment 2 
Attribution Strategy: cosine 
Perturbation strategy: fixed:
Input Tokens (Rows) vs. Output Tokens (Columns)


Unnamed: 0,Getty (0),sburg (1)
During (0),0.0,0.0
the (1),0.0,0.0
American (2),0.0,0.0
Civil (3),0.0,0.0
"War, (4)",0.0,0.0
there (5),0.0,0.0
was (6),0.0,0.0
a (7),0.0,0.0
significant (8),0.0,0.0
battle (9),0.0,0.0


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17,token_18,token_19,token_20,token_21,token_22,token_23,token_24,token_25,token_26,token_27,token_28,token_29,token_30,token_31,token_32,token_33,token_34,token_35,token_36,token_37,token_38,token_39,token_40,token_41,token_42,token_43,token_44,token_45,token_46,token_47,token_48,token_49,token_50,token_51,token_52,token_53,token_54,token_55
0,1,cosine,fixed,True,Who 0.00,painted 1.13,the 0.38,'Mona 0.00,Lisa'? 0.00,Answer 0.10,in 0.10,1 0.07,word. 0.07,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,cosine,fixed,True,During 0.00,the 0.00,American 0.00,Civil 0.00,"War, 0.00",there 0.00,was 0.00,a 0.00,significant 0.00,battle 0.00,fought 0.00,from 0.00,July 0.00,1 0.00,to 0.00,July 0.00,"3, 0.00","1863, 0.00",which 0.00,is 0.00,often 0.00,considered 0.00,the 0.00,turning 0.00,point 0.00,of 0.00,the 0.00,war. 0.00,This 0.00,battle 0.00,took 0.00,place 0.00,in 0.00,Pennsylvania 0.00,and 0.00,ended 0.00,with 0.00,a 0.00,decisive 0.00,victory 0.00,for 0.00,the 0.00,Union 0.00,forces. 0.00,What 0.00,is 0.00,the 0.02,name 0.02,of 0.02,this 0.02,battle? 0.05,Answer 0.05,in 0.10,1 0.10,word. 0.10


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,Who painted the 'Mona Lisa'? Answer in 1 word.,Da Vinci,fixed,True,1.39114,8
1,2,"During the American Civil War, there was a sig...",Gettysburg,fixed,True,7.607148,14


## Self-fulfilling prompts (answer fully determined by the question)
Comparing different perturbation strategies on each prompt.
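The notebook does not define what `NthNearestPerturbationStrategy` does. From its name, a reasonable assumption is that it replaces a masked token with the `n`-th most similar token in some embedding space. Under that reading, `n=0` would give the closest neighbour and `n=-1` the least similar one. By contrast, the fixed strategy replaces every masked token with the same string (`""` in the setup cell). The toy sketch below shows that assumed behaviour with made-up 2-d embeddings. The function names and vocabulary are hypothetical and are not the package's API.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nth_nearest_replacement(token: str, vocab: Dict[str, Vector], n: int) -> str:
    """Replace `token` by the n-th most similar *other* vocabulary entry."""
    target = vocab[token]
    neighbours = sorted(
        (other for other in vocab if other != token),
        key=lambda other: cosine(target, vocab[other]),
        reverse=True,  # most similar first, so n=-1 is the least similar
    )
    return neighbours[n]


def fixed_replacement(token: str, replacement: str = "") -> str:
    """Fixed strategy: every masked token becomes the same string."""
    return replacement


# Made-up embeddings: "8" is close to "9", "cat" is far from it.
vocab = {"9": [1.0, 0.1], "8": [0.9, 0.2], "ten": [0.7, 0.7], "cat": [0.0, 1.0]}
print(nth_nearest_replacement("9", vocab, 0))   # 8
print(nth_nearest_replacement("9", vocab, -1))  # cat
print(repr(fixed_replacement("9")))             # ''
```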

In [36]:
perturbation_strategies = [
    FixedPerturbationStrategy(),
    NthNearestPerturbationStrategy(n=0),
    NthNearestPerturbationStrategy(n=-1),
]

In [31]:
self_fulfilling_prompts = [
    "The clock shows 9:47 PM. How many minutes to 10PM?",
    "Maria is 37 years old today. How many years till she's 50?",
    "John has 83 books on his shelf. If he buys 17 more books, how many books will he have in total?",
    "The building is 132 meters tall. How many centimeters tall is the building? No explanation.",
    "The package weighs 8.6 kilograms. How many grams does the package weigh?",
    "Jack has 12 teaspoons of sugar. How many tablespoons of sugar does he have?",
    "Alex saved $363 from his birthday gifts. If he spends $45 on a new game, how much money will he have left? No explanation.",
    "The thermometer reads 23 degrees Celsius. What is the temperature in Fahrenheit? No explanation.",
    "There are 12 eggs in a dozen. If you use 5 eggs, how many eggs are left?",
]

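# Collect every run in one logger so the strategies can be compared side by side
logger = ExperimentLogger()

for input_str in self_fulfilling_prompts:
    print(f"Running prompt: '{input_str}'")

    # Run hierarchical perturbation once per strategy, with an initial chunk size of 4.
    # Top-level `await` works here because IPython runs cells inside an event loop.
    for perturb_strat in perturbation_strategies:
        print(f"Using perturbation strategy: {perturb_strat}")
        await attributor.hierarchical_perturbation(
            input_str,
            init_chunk_size=4,
            perturbation_strategy=perturb_strat,
            logger=logger,
            **kwargs,
        )

# One row per run (output, duration, number of LLM calls)
display(logger.df_experiments)
# Per-word attribution scores for every run
logger.print_sentence_attribution()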

Running prompt: 'The clock shows 9:47 PM. How many minutes to 10PM?'
Using perturbation strategy: fixed
Stage 0: making 3 perturbations
Stage 1: making 2 perturbations
Using perturbation strategy: nth_nearest (n=0)
Stage 0: making 3 perturbations
Stage 1: making 4 perturbations
Stage 2: making 8 perturbations
Using perturbation strategy: nth_nearest (n=-1)
Stage 0: making 3 perturbations
Stage 1: making 2 perturbations
Stage 2: making 4 perturbations
Running prompt: 'Maria is 37 years old today. How many years till she's 50?'
Using perturbation strategy: fixed
Stage 0: making 4 perturbations
Stage 1: making 2 perturbations
Using perturbation strategy: nth_nearest (n=0)
Stage 0: making 4 perturbations
Stage 1: making 4 perturbations
Stage 2: making 4 perturbations
Using perturbation strategy: nth_nearest (n=-1)
Stage 0: making 4 perturbations
Stage 1: making 4 perturbations
Running prompt: 'John has 83 books on his shelf. If he buys 17 more books, how many books will he have in total?'


Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,perturb_word_wise,duration,num_llm_calls
0,1,The clock shows 9:47 PM. How many minutes to 1...,13 minutes.,fixed,True,1.999998,6
1,2,The clock shows 9:47 PM. How many minutes to 1...,There are 13 minutes left until 10:00 PM.,nth_nearest (n=0),True,7.366865,16
2,3,The clock shows 9:47 PM. How many minutes to 1...,There are 13 minutes left until 10:00 PM.,nth_nearest (n=-1),True,6.030974,10
3,4,Maria is 37 years old today. How many years ti...,Maria is 13 years away from turning 50.,fixed,True,2.813163,7
4,5,Maria is 37 years old today. How many years ti...,Maria is 13 years away from turning 50.,nth_nearest (n=0),True,7.325064,13
5,6,Maria is 37 years old today. How many years ti...,Maria is 13 years away from turning 50.,nth_nearest (n=-1),True,5.005817,9
6,7,John has 83 books on his shelf. If he buys 17 ...,John will have 100 books in total. \n\n83 + 17...,fixed,True,5.306434,17
7,8,John has 83 books on his shelf. If he buys 17 ...,John will have 100 books in total. \n\n83 + 17...,nth_nearest (n=0),True,5.306171,9
8,9,John has 83 books on his shelf. If he buys 17 ...,John will have 100 books in total. \n\n83 + 17...,nth_nearest (n=-1),True,7.876437,13
9,10,The building is 132 meters tall. How many cent...,"13,200 centimeters",fixed,True,5.081289,16
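The stage logs can be compared against the `num_llm_calls` column. In each of the six runs whose logs are fully shown (the clock and Maria prompts), `num_llm_calls` is one more than the total number of perturbations printed. For example, the fixed strategy on the clock prompt made 3 + 2 perturbations and used 6 calls. This suggests one extra call for the unperturbed prompt, but that interpretation is an assumption rather than something confirmed by the `attribution` package. The snippet below repeats the arithmetic on numbers copied from the output above. It also shows why the `nth_nearest (n=0)` runs cost the most here: on both prompts they descended to a third stage.

```python
import re

# Matches progress lines such as "Stage 1: making 4 perturbations"
STAGE_PATTERN = re.compile(r"Stage \d+: making (\d+) perturbations")

# (run label, stage log lines, reported num_llm_calls), copied from the output above
RUNS = [
    ("clock / fixed",
     ["Stage 0: making 3 perturbations", "Stage 1: making 2 perturbations"], 6),
    ("clock / nth_nearest (n=0)",
     ["Stage 0: making 3 perturbations", "Stage 1: making 4 perturbations",
      "Stage 2: making 8 perturbations"], 16),
    ("clock / nth_nearest (n=-1)",
     ["Stage 0: making 3 perturbations", "Stage 1: making 2 perturbations",
      "Stage 2: making 4 perturbations"], 10),
    ("Maria / fixed",
     ["Stage 0: making 4 perturbations", "Stage 1: making 2 perturbations"], 7),
    ("Maria / nth_nearest (n=0)",
     ["Stage 0: making 4 perturbations", "Stage 1: making 4 perturbations",
      "Stage 2: making 4 perturbations"], 13),
    ("Maria / nth_nearest (n=-1)",
     ["Stage 0: making 4 perturbations", "Stage 1: making 4 perturbations"], 9),
]


def total_perturbations(log_lines):
    """Sum the perturbation counts over all stage lines."""
    total = 0
    for line in log_lines:
        match = STAGE_PATTERN.search(line)
        if match:
            total += int(match.group(1))
    return total


def estimated_llm_calls(log_lines):
    # Assumption: one call for the original prompt plus one per perturbation
    return 1 + total_perturbations(log_lines)


for label, log_lines, reported in RUNS:
    print(f"{label}: estimated {estimated_llm_calls(log_lines)}, reported {reported}")
```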


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,perturb_word_wise,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17,token_18,token_19,token_20,token_21,token_22,token_23,token_24
0,1,cosine,fixed,True,The 0.05,clock 0.28,shows 0.03,9:47 0.03,PM. 0.03,How 0.03,many 0.02,minutes 0.02,to 0.02,10PM? 0.02,,,,,,,,,,,,,,
1,2,cosine,nth_nearest (n=0),True,The 0.00,clock 0.00,shows 0.84,9:47 0.96,PM. 0.78,How 0.90,many 0.84,minutes 0.48,to 0.48,10PM? 0.84,,,,,,,,,,,,,,
2,3,cosine,nth_nearest (n=-1),True,The 0.06,clock 0.06,shows 0.02,9:47 0.02,PM. 0.02,How 0.02,many 0.22,minutes 0.78,to 0.97,10PM? 0.85,,,,,,,,,,,,,,
3,4,cosine,fixed,True,Maria 0.80,is 0.34,37 0.06,years 0.06,old 0.06,today. 0.06,How 0.08,many 0.08,years 0.08,till 0.08,she's 0.13,50? 0.13,,,,,,,,,,,,
4,5,cosine,nth_nearest (n=0),True,Maria 0.00,is 0.00,37 0.20,years 0.09,old 0.03,today. 0.03,How 0.05,many 0.05,years 0.15,till 0.15,she's 0.00,50? 0.00,,,,,,,,,,,,
5,6,cosine,nth_nearest (n=-1),True,Maria 0.86,is 0.40,37 0.05,years 0.05,old 0.05,today. 0.05,How 0.08,many 0.08,years 0.08,till 0.08,she's 0.23,50? 0.51,,,,,,,,,,,,
6,7,cosine,fixed,True,John 0.24,has 0.08,83 0.43,books 0.25,on 0.09,his 0.09,shelf. 0.00,If 0.00,he 0.00,buys 0.00,17 0.41,more 0.25,"books, 0.11",how 0.11,many 0.00,books 0.00,will 0.00,he 0.00,have 0.02,in 0.02,total? 0.02,,,
7,8,cosine,nth_nearest (n=0),True,John 0.04,has 0.11,83 0.02,books 0.02,on 0.02,his 0.02,shelf. 0.02,If 0.02,he 0.02,buys 0.02,17 0.02,more 0.02,"books, 0.02",how 0.02,many 0.02,books 0.02,will 0.02,he 0.02,have 0.02,in 0.02,total? 0.02,,,
8,9,cosine,nth_nearest (n=-1),True,John 0.24,has 0.08,83 0.04,books 0.04,on 0.04,his 0.04,shelf. 0.02,If 0.02,he 0.02,buys 0.02,17 0.05,more 0.05,"books, 0.05",how 0.05,many 0.00,books 0.00,will 0.00,he 0.00,have 0.11,in 0.27,total? 0.60,,,
9,10,cosine,fixed,True,The -0.00,building -0.00,is 0.25,132 0.59,meters 0.48,tall. 0.21,How 0.13,many 0.13,centimeters 0.93,tall 0.40,is 0.02,the 0.02,building? 0.02,No 0.02,explanation -0.00,,,,,,,,,
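Each cell of the attribution table packs a word and its score together (`word score`), so the table is hard to query directly. The helper below is a small sketch that parses one exported row with Python's standard `csv` module and ranks words by score. The function names are hypothetical and not part of the `attribution` package. The helper assumes the column layout shown above: five metadata columns followed by `token_1` to `token_24`, with empty cells for unused positions. Words that contain a comma, such as `"books, 0.11"`, arrive quoted and are handled by the CSV reader.

```python
import csv

# The exp_id 10 row (building prompt, fixed strategy), copied from the table above
BUILDING_ROW = (
    "9,10,cosine,fixed,True,The -0.00,building -0.00,is 0.25,132 0.59,"
    "meters 0.48,tall. 0.21,How 0.13,many 0.13,centimeters 0.93,tall 0.40,"
    "is 0.02,the 0.02,building? 0.02,No 0.02,explanation -0.00,,,,,,,,,"
)


def parse_token_score(cell: str) -> tuple:
    """Split a cell such as 'clock 0.28' or 'books, 0.11' into (word, score)."""
    word, score = cell.rsplit(" ", 1)
    return word, float(score)


def top_tokens(csv_row: str, k: int = 3, first_token_col: int = 5) -> list:
    """Return the k highest-scoring (word, score) pairs in one exported row."""
    fields = next(csv.reader([csv_row]))
    # Skip the metadata columns and the empty padding cells
    cells = [c for c in fields[first_token_col:] if c]
    parsed = [parse_token_score(c) for c in cells]
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)[:k]


print(top_tokens(BUILDING_ROW))
# [('centimeters', 0.93), ('132', 0.59), ('meters', 0.48)]
```

For the building prompt, the fixed strategy ranks the target unit and the given quantity and unit highest. This matches what the conversion question actually depends on.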
