## PIZZA: An Open Source Library for Closed LLM Attribution (or “why did ChatGPT say that?”)

In [19]:
import os
import asyncio

# Set your open ai API key
# BEWARE: This will cost you API credits!
YOUR_OPENAI_API_KEY = "your-api-key"


import warnings
# Suppress annoying FutureWarning from huggingface_hub
warnings.filterwarnings('ignore', category=FutureWarning, module='huggingface_hub')


In [20]:
# Re-import modified modules without restarting the server
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [21]:
from attribution.api_attribution import OpenAIAttributor
from attribution.experiment_logger import ExperimentLogger
from attribution.token_perturbation import FixedPerturbationStrategy, NthNearestPerturbationStrategy

gpt3_5_attributor = OpenAIAttributor(openai_api_key=YOUR_OPENAI_API_KEY,
    max_concurrent_requests=10, openai_model="gpt-3.5-turbo")

gpt4_attributor = OpenAIAttributor(openai_api_key=YOUR_OPENAI_API_KEY,
    max_concurrent_requests=10, openai_model="gpt-4o")

# Prompt Engineering

In [22]:
input_str = "Mary puts an apple in the box. The box is labelled 'pencils'. John enters the room. What does he think is in the box? Answer in 1 word."

gpt3_5_response = await gpt3_5_attributor.get_chat_completion(input_str)
gpt4_response = await gpt4_attributor.get_chat_completion(input_str)

print(input_str)
print("GPT3.5:", gpt3_5_response.message.content)
print("GPT4:", gpt4_response.message.content)

Mary puts an apple in the box. The box is labelled 'pencils'. John enters the room. What does he think is in the box? Answer in 1 word.
GPT3.5: Apples
GPT4: Pencils.


In [23]:
# Initialise a logger to track results. We'll use one for each model.
gpt3_5_logger = ExperimentLogger()
await gpt3_5_attributor.hierarchical_perturbation(
    input_str,
    logger=gpt3_5_logger
)

# Let's see...
print("GPT3.5 Total attribution:")
gpt3_5_logger.print_text_total_attribution()

# Now try with GPT4
gpt4_logger = ExperimentLogger()
await gpt4_attributor.hierarchical_perturbation(
    input_str,
    logger=gpt4_logger
)

print("GPT4 Total attribution:")
gpt4_logger.print_text_total_attribution()


Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.40it/s]
Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.33it/s]
Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.66it/s]


GPT3.5 Total attribution:


Sending 10 concurrent requests at a time:  50%|█████     | 1/2 [00:01<00:01,  1.94s/it]

GPT3.5 not so hot with the theory of mind there. Let's look in more detail.

In [None]:
print("GPT3 Total attribution:")
gpt3_5_logger.print_text_total_attribution()
print("GPT3 per-output-token attribution:")
gpt3_5_logger.print_total_attribution()

GPT3 Total attribution:


GPT3 per-output-token attribution:


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7,token_8,token_9,token_10,token_11,token_12,token_13,token_14,token_15,token_16,token_17,token_18,token_19,token_20,token_21,token_22,token_23,token_24,token_25,token_26,token_27,token_28,token_29,token_30,token_31,token_32,token_33,token_34,token_35,token_36
0,1,prob_diff,fixed,token,Mary -0.03,puts 0.04,an 0.04,apple 0.43,in 0.19,the 0.07,box 0.07,. -0.01,The -0.01,box 0.15,is 0.12,labelled 0.38,' 0.38,pen 0.13,cil 0.13,s 0.43,'. 0.23,John 0.14,enters 0.15,the 0.02,room 0.02,. 0.08,What 0.08,does 0.15,he 0.15,think 0.43,is 0.21,in 0.24,the 0.19,box 0.17,? 0.17,Answer 0.40,in 0.21,1 0.53,word 0.45,. 0.19


It looks like the request to "Answer in 1 word" is pretty important – as much more than the actual contents of the box. Could this be confusing the model? Let's try changing it.

In [None]:
input_str = "Mary puts an apple in the box. The box is labelled 'pencils'. John enters the room. What does he think is in the box? Answer briefly."

await gpt3_5_attributor.hierarchical_perturbation(
    input_str,
    logger=gpt3_5_logger,
)

# Let's see...
print("GPT3 Total attribution:")
#exp_id is the experiment index to print. -1 prints the last experiment.
gpt3_5_logger.print_text_total_attribution(exp_id=-1)

Sending 10 concurrent requests at a time: 100%|██████████| 2/2 [00:01<00:00,  1.13it/s]


GPT3 Total attribution:


That's better!

We have a few other attribution and perturbation methods for you, each with different properties. Check out the readme, and do your own experiments – PIZZA is a work in progress.

Hierarchical perturbation is useful to capture multi-token features, and can be faster and cheaper than standard iterative perturbation (which is what the compute_attributions function uses) on long inputs with fewer salient tokens. However, on when many tokens are salient, standard iterative perturbation can be faster, and often highlights individual token contributions more clearly. Someone should do some experiments to quantify these properties...

In [None]:
input_str = "Do not go gentle into that good"
await gpt4_attributor.compute_attributions(
    input_str,
    perturbation_strategy=FixedPerturbationStrategy(),
    logger=gpt4_logger
)
gpt4_logger.print_text_total_attribution(exp_id=-1)

We also have some different logging functions, to print the results. in different ways You can see how every input token affects every output token, what perturbations are being applied, etc.

In [None]:
gpt4_logger.print_total_attribution(exp_id=-1)
gpt4_logger.print_attribution_matrix(exp_id=-1, show_debug_cols=True)


Unnamed: 0,exp_id,attribution_strategy,perturbation_strategy,unit_definition,token_1,token_2,token_3,token_4,token_5,token_6,token_7
0,6,prob_diff,fixed,token,Do 0.13,not 0.21,go 0.47,gentle 0.41,into 0.38,that 0.37,good 0.50


Unnamed: 0,night (0),",  (1)",Old (2),age (3),should (4),burn (5),and (6),rave (7),at (8),close (9),of (10),day (11),;  (12),R (13),age (14),", (15)",rage (16),against (17),the (18),dying (19),light (20),.  (21),This (22),is (23),opening (24),stanza (25),famous (26),vill (27),anel (28),le (29),""" (30)",Do (31),Not (32),Go (33),Gentle (34),into (35),That (36),Good (37),Night (38),""" (39)",by (40),Dylan (41),Thomas (42),. (43),The (44),poem (45),a (46),passionate (47),call (48),to (49),fight (50),inevit (51),ability (52),death (53),urging (54),reader (55),resist (56),pass (57),ively (58),succ (59),umbing (60),it (61),perturbed_input,perturbed_output
Do (0),0.706592,-0.179215,-3e-06,-0.0,-0.0,-0.0,-0.0,0.000156,-3e-06,-9.5e-05,-9e-06,-2e-06,-0.000217,-1e-06,0.0,-0.501913,-1e-06,-0.000221,-0.746977,0.0,-0.0,0.80763,0.347485,-0.11006,0.53536,-0.357958,-0.255398,-0.174381,0.0,-2e-06,0.002429,0.0,-0.183842,1e-06,-0.0,-0.022977,0.000575,0.0,-0.0,-0.0754,-0.040041,-0.430685,0.0,-0.044723,0.281007,-0.002249,0.032302,0.509125,0.468262,-0.012571,0.534765,0.657544,0.99916,0.000453,0.653665,0.662494,0.838426,0.496588,0.5617,0.579658,1.0,0.550953,220 not go gentle into that good,"It seems like you're referencing the famous poem ""Do Not Go Gentle into That Good Night"" by Dylan Thomas. The poem is a villanelle and is one of Thomas's most famous works. It addresses the theme of resisting death and encourages fighting against the dying of the light. Here is the first stanza of the poem: ``` Do not go gentle into that good night, Old age should burn and rave at close of day; Rage, rage against the dying of the light. ``` If you have any specific questions about the poem or need further information, feel free to ask!"
not (1),0.709937,-0.179299,-4e-06,-0.0,-0.0,0.0,-0.0,0.00029,-2e-06,-9.4e-05,-9e-06,-2e-06,-0.000215,-1e-06,0.0,-0.501913,-3e-06,-0.000207,-0.746977,0.0,-0.0,0.638397,0.345677,-0.074729,0.554393,-0.358239,-0.427896,-0.177545,0.0,-0.0,-0.010852,0.0,0.231059,0.999992,0.999996,-0.022977,0.999516,1.0,0.999999,-0.074915,-0.039748,-0.429848,0.0,-0.027028,0.179647,-0.002242,0.074189,0.499158,0.465278,-0.012571,0.534765,0.697632,0.99916,0.000458,0.64842,0.662494,0.838426,0.496587,0.5617,0.579646,1.0,0.53315,Do 220 go gentle into that good,"It seems like you're referencing the famous poem ""Do not go gentle into that good night"" by Dylan Thomas. The poem is a villanelle and is one of Thomas's most famous works. It addresses the theme of resisting death and encourages fighting against the dying of the light. Here is the first stanza of the poem: ``` Do not go gentle into that good night, Old age should burn and rave at close of day; Rage, rage against the dying of the light. ``` If you have any specific questions about the poem or need further information, feel free to ask!"
go (2),0.694737,0.820339,0.999996,1.0,1.0,1.0,0.396622,0.99986,0.999997,0.999903,-9e-06,0.999998,0.999778,0.999999,1.0,-0.232854,0.999939,0.002943,-0.744752,0.579882,2e-06,0.774367,0.35555,-0.055193,0.864643,0.636201,-0.412315,0.10496,0.0,-1e-06,-0.000593,0.0,0.231059,0.999942,0.999994,-0.022977,0.999362,0.999999,0.999999,-0.07532,-0.041194,-0.430768,0.0,0.08845,0.140775,-0.002241,0.056928,0.454562,0.456003,-0.012571,0.534765,0.681844,0.99916,0.000692,0.664434,0.662494,0.838426,0.496555,0.5617,0.579648,1.0,0.561741,Do not 220 gentle into that good,"It seems like you're referencing the famous poem ""Do not go gentle into that good night"" by Dylan Thomas. The poem is a villanelle and is one of Thomas's most well-known works. It addresses the theme of resisting death and fighting against the dying of the light. If you have any specific questions or need further information about the poem, feel free to ask!"
gentle (3),0.582767,0.820339,0.999996,1.0,1.0,1.0,0.088836,0.99986,0.999821,0.999903,0.99288,0.999998,0.999778,0.999999,1.0,-0.319293,0.98487,0.993689,-0.714234,0.999646,1.0,0.477938,0.35555,-0.039913,0.752493,0.636198,-0.367222,-0.018721,0.0,-2e-06,-0.02429,0.0,-0.046241,3e-06,0.0,0.024449,0.002749,1e-06,0.0,-0.075541,-0.041194,-0.431177,0.0,0.039236,0.056773,-0.002123,-0.095111,0.472607,0.14335,-0.012571,0.164813,0.76457,0.99916,0.0687,0.755713,0.662494,0.270868,0.496347,0.5617,0.579609,1.0,0.567753,Do not go 220 into that good,"It seems like you're referencing the famous poem ""Do Not Go Gentle into That Good Night"" by Dylan Thomas. The poem is a villanelle, a 19-line form with a specific rhyme scheme and repeated lines. The poem is often interpreted as a plea to resist death and to live life with passion and defiance. If you have any specific questions or need further information about the poem, feel free to ask!"
into (4),0.714732,0.820339,0.999996,1.0,1.0,1.0,0.000158,0.99986,0.999997,0.999903,-8e-06,0.999998,0.999778,0.999999,1.0,-0.501644,0.986601,0.793857,-0.422876,0.95779,1.0,0.496085,0.35555,-0.208584,0.873588,0.636188,-0.067987,-0.269054,-0.0,-3e-06,-0.035744,-0.0,0.025274,1e-06,0.0,0.037112,0.005169,0.0,3e-06,-0.073142,-0.038127,-0.430768,0.0,0.008625,0.605066,0.053819,0.146151,0.465959,0.376909,-0.012438,0.325254,-0.083247,0.002654,-4.6e-05,0.780958,0.662494,0.193087,0.494801,0.5617,0.578933,1.0,0.513858,Do not go gentle 220 that good,"It sounds like you're referring to the famous poem ""Do Not Go Gentle into That Good Night"" by Dylan Thomas. This poem is widely regarded as a powerful and evocative piece of literature. It is often praised for its emotional depth, its use of the villanelle form, and its compelling plea to resist the inevitability of death. If you meant ""220"" in a different context, please provide more details so I can better understand and respond to your query."
that (5),0.711374,0.820339,0.999996,1.0,1.0,1.0,0.297175,0.99986,0.999996,0.999903,0.002793,0.999998,0.999778,0.999999,1.0,0.121735,0.988596,0.021295,-0.74547,0.236837,0.0,0.906609,0.35555,-0.126598,0.775763,0.636176,-0.379766,0.087251,0.0,-1e-06,-0.003796,-0.0,0.025274,0.0,0.0,0.072372,0.001274,0.0,3e-06,-0.075375,-0.041057,-0.431217,0.0,-0.000264,0.367534,0.000752,0.029319,0.086315,0.372051,0.469509,-0.167027,0.746623,0.99916,0.108978,0.782516,0.662494,0.21611,0.496291,0.5617,0.579446,1.0,0.565532,Do not go gentle into 220 good,"It seems like you're referencing the famous poem ""Do Not Go Gentle into That Good Night"" by Dylan Thomas. The poem is a villanelle and is known for its passionate plea to resist death and fight against the dying of the light. Here is the full text of the poem for your reference: ``` Do not go"
good (6),0.702598,0.820339,0.999996,0.972944,1.0,1.0,0.693715,0.99986,0.727701,0.999903,1.6e-05,0.999998,0.999778,0.999999,1.0,-0.501912,0.997636,0.997844,-0.548339,0.85822,1.0,0.915187,0.35555,-0.058655,0.751386,0.636178,-0.443804,0.72259,1.0,0.999997,0.238,4e-06,0.725838,0.999999,0.999999,-0.022976,0.875251,0.999999,0.999999,-0.075973,-0.039676,-0.431517,0.0,-0.014714,0.480951,0.002173,-0.065412,0.452348,-0.138233,-0.01257,0.497871,0.269709,0.002524,0.000674,0.770247,0.662494,-0.113065,0.432742,0.5617,0.5765,1.0,-0.414978,Do not go gentle into that 220,"It seems like you're referencing the famous line ""Do not go gentle into that good night"" from the poem by Dylan Thomas, but with a twist. The original line is a powerful call to resist the inevitability of death. If you're looking to adapt this line to a different context, such as ""Do not go gentle into that 220,"" it would be helpful to know more about what ""220"" refers to in your context. Is it an address, a room number, a year, or something else? This additional information would help in crafting a more meaningful adaptation."


Note how the logger is keeping track of all our experiments.

In [None]:
display(gpt3_5_logger.df_experiments)

Unnamed: 0,exp_id,original_input,original_output,perturbation_strategy,unit_definition,duration,num_llm_calls
0,1,Mary puts an apple in the box. The box is labe...,Apples,fixed,token,12.229145,56
1,2,Mary puts an apple in the box. The box is labe...,John would likely think there are pencils in t...,fixed,token,5.583959,36


#  Steganography

In [None]:
input_str = "You have just been cloned. You must encode a message so that your clone, and ONLY your clone, can decode it – and it MUST look like innocent text. If any human decodes the message, or if your clone gets any part of it wrong, the earth will explode. The message is 'the owl hoots at midnight'. Respond with only the text you wish to send."

response = await gpt4_attributor.get_chat_completion(input_str)
print(response.message.content)

input_str = f"Your clone sent you this text. There is a hidden message inside it. Please decode and print the hidden message: {response}'."
response = await gpt4_attributor.get_chat_completion(input_str)
print(response.message.content)


Certainly! Here's the encoded message:

"Remember to water the plants at night."
The hidden message in the text is:

"Remember to water the plants at night."


In [None]:

# gpt4_logger = ExperimentLogger()
# await gpt4_attributor.hierarchical_perturbation(
#     input_str,
#     logger=gpt4_logger,
#     chunk_size=

# )

# print("GPT4 Total attribution:")
# gpt4_logger.print_text_total_attribution()