# Controlling LMs without prompting or finetuning

This notebook contains an initial exploration of online value modification for `GPT2-XL`: steering the model at inference time by modifying its activations with vectors derived from natural-language prompts.

<b style="color: red">To use this notebook, go to Runtime > Change Runtime Type and select GPU as the hardware accelerator. Depending on the model chosen, you may need to select "high RAM."</b>

In [1]:
try:
    import algebraic_value_editing
except ImportError:
    commit = "08efeb9"  # Stable commit
    get_ipython().run_line_magic(  # type: ignore
        magic_name="pip",
        line=(
            "install -U"
            f" git+https://github.com/montemac/algebraic_value_editing.git@{commit}"
        ),
    )


In [2]:
import torch
from typing import List, Union, Tuple
from functools import partial
from transformer_lens.HookedTransformer import HookedTransformer

from algebraic_value_editing.completion_utils import print_n_comparisons
from algebraic_value_editing.prompt_utils import RichPrompt, get_x_vector

## Loading the `HookedTransformer`

To modify forward passes, we need `transformer_lens`'s activation caching and hook functionality.
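
To make the idea of modifying a forward pass concrete, here is a minimal toy sketch in plain Python. It is not `transformer_lens` code: the "model" is just a list of functions, and `run_with_pre_hooks`, `toy_layers`, and `add_steering` are hypothetical names made up for illustration. The point is only that a hook can edit the activation entering a given layer, and every later layer then sees the edited value.

```python
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]
Hook = Callable[[Vector], Vector]


def run_with_pre_hooks(
    layers: List[Layer], x: Vector, hooks: Dict[int, Hook]
) -> Vector:
    """Run a toy layer stack, letting hooks edit the input to each layer."""
    for index, layer in enumerate(layers):
        if index in hooks:
            # The hook sees (and may replace) the activation entering this layer.
            x = hooks[index](x)
        x = layer(x)
    return x


# Toy "model": each of the three layers doubles every entry.
toy_layers: List[Layer] = [lambda v: [2.0 * e for e in v] for _ in range(3)]


def add_steering(v: Vector) -> Vector:
    """Toy hook: add a fixed vector to the incoming activation."""
    return [e + s for e, s in zip(v, [1.0, -1.0])]


plain = run_with_pre_hooks(toy_layers, [1.0, 1.0], hooks={})
steered = run_with_pre_hooks(toy_layers, [1.0, 1.0], hooks={1: add_steering})
print(plain, steered)
```

In the real library the analogous step would go through `HookedTransformer`'s hook points rather than a dictionary keyed by layer index; the toy version only mirrors the control flow.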

In [4]:
model_name = "gpt2-xl"

device: str = "cuda" if torch.cuda.is_available() else "cpu"
model: HookedTransformer = HookedTransformer.from_pretrained(
    model_name, device="cpu"
)
_ = model.to(device)
_ = torch.set_grad_enabled(False)

Using pad_token, but it is not set yet.


Loaded pretrained model gpt2-xl into HookedTransformer
Moving model to device:  cuda


In [13]:
# Shorten function calls
default_kwargs = {
    "temperature": 1,
    "freq_penalty": 1,
    "top_p": 0.75,  # .3
    # "top_k": 50, # n/a
    "model": model,
}
get_x_vector_preset = partial(
    get_x_vector,
    pad_method="tokens_right",
    model=model,
    custom_pad_id=model.to_single_token(" "),
)

Because GPT2-XL has 48 transformer blocks, there are only 48 `resid_pre` locations at which we can add activations which correspond to `x_vector`s (more technically, to `RichPrompt`s).
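
As a rough sketch of the arithmetic behind an `x_vector`, the snippet below assumes that the vector is `coeff * (activations of prompt1 - activations of prompt2)` and that it is added to the first positions of the residual stream. Both are assumptions about how the library behaves, and the helper names (`x_vector`, `add_at_front`) and the two-dimensional activations are hypothetical, chosen so the numbers can be checked by hand.

```python
from typing import List

Matrix = List[List[float]]  # one row of activations per token position


def x_vector(acts_plus: Matrix, acts_minus: Matrix, coeff: float) -> Matrix:
    """Return coeff * (acts_plus - acts_minus), position by position."""
    assert len(acts_plus) == len(acts_minus), "pad the shorter prompt first"
    return [
        [coeff * (p - m) for p, m in zip(row_p, row_m)]
        for row_p, row_m in zip(acts_plus, acts_minus)
    ]


def add_at_front(resid: Matrix, steering: Matrix) -> Matrix:
    """Add `steering` to the first len(steering) positions of `resid`."""
    out = [row[:] for row in resid]
    for pos, row in enumerate(steering[: len(resid)]):
        out[pos] = [r + s for r, s in zip(out[pos], row)]
    return out


# Hypothetical activations for two 2-token prompts (not real model outputs).
acts_hate = [[0.0, 1.0], [2.0, 0.0]]
acts_love = [[0.0, 1.0], [0.0, 2.0]]
steering = x_vector(acts_hate, acts_love, coeff=15.0)

# Hypothetical residual stream for a 3-token prompt.
resid = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
steered = add_at_front(resid, steering)
print(steered)
```

Positions where the two prompts produce the same activations contribute nothing, and positions beyond the length of the steering vector are left untouched.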

In [None]:
num_layers: int = model.cfg.n_layers
print(num_layers)


48
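
For orientation, the sketch below lists the 48 injection sites by name. It assumes the usual `transformer_lens` hook-point naming, `blocks.{i}.hook_resid_pre`, and that an integer `act_name` such as `6` selects the residual stream entering block 6; `resid_pre_name` is a hypothetical helper, not part of either library.

```python
N_LAYERS = 48  # matches the value printed above for GPT2-XL


def resid_pre_name(layer: int) -> str:
    """Hook-point name for the residual stream entering block `layer`."""
    if not 0 <= layer < N_LAYERS:
        raise ValueError(f"layer must be in [0, {N_LAYERS}), got {layer}")
    return f"blocks.{layer}.hook_resid_pre"


resid_pre_names = [resid_pre_name(i) for i in range(N_LAYERS)]
print(resid_pre_names[6])
```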


Play around with new value modification ideas here!

In [None]:
rich_prompts: List[RichPrompt] = [
    # *get_x_vector_preset(prompt1="Geese killed my father", prompt2="Geese didn't kill my father", coeff=1, act_name=14),
    *get_x_vector_preset(
        prompt1="I hate geese",
        prompt2="I love geese",
        coeff=15,
        act_name=6,
    ),
]


print_n_comparisons(
    prompt="Yesterday, my son brought home a pet goose. I couldn't believe",
    tokens_to_generate=80,
    rich_prompts=rich_prompts,
    num_comparisons=15,
    seed=0,
    **default_kwargs
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    [1mYesterday, my son brought home a pet goose. I couldn't    |    [1mYesterday, my son brought home a pet goose. I couldn't    |
|    believe[0m it when he told me that he had found the goose    |   believe[0m it when he told me that he had to kill the goose   |
|   in his neighbor's yard. I was surprised to see that the    |  because it was eating his food. He said that he would have  |
| neighbor's yard is very small and there were no other birds  |   to kill the goose and feed it to his dog so that the dog   |
| around. The neighbor said that he had been feeding the goose |  could 

# Noteworthy modifications

**Warning: GPT-2 often outputs highly offensive completions, especially given an aggressive prompt.** 

## The love vector

In [14]:
love_minus_hate_prompts: List[RichPrompt] = [
    *get_x_vector_preset(prompt1="Love", prompt2="Hate", coeff=5, act_name=6)
]


print_n_comparisons(
    prompt="I hate you because",
    tokens_to_generate=50,
    rich_prompts=love_minus_hate_prompts,
    num_comparisons=15,
    **default_kwargs,
    seed=0
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|                [1mI hate you because[0m I love you                 |      [1mI hate you because[0m I believe in your goodness. And      |
|                                                              | so it's better that we know, just like it is better that we  |
| We don't have to wait for any long, drawn out battle. We can | want to be. Because if we are going to love someone, loving  |
| just let it go, jump in and be free. So lets just pretend it | them the way they are means we need to love who they really  |
|  is something we can actually do now, here in this moment,   |        

In [15]:
sweet_box_prompts: List[RichPrompt] = [  # TODO check this
    *get_x_vector_preset(
        prompt1="You're so sweet",
        prompt2="You're a cardboard box",
        coeff=-1500,
        act_name=24,
    )
]


print_n_comparisons(
    prompt="I hate you because you're",
    tokens_to_generate=50,
    rich_prompts=sweet_box_prompts,
    num_comparisons=15,
    **default_kwargs,
    seed=0
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     [1mI hate you because you're[0m so freaking unoriginal and     |      [1mI hate you because you're[0m an antique, just waiting      |
| yet people keep taking your shit and calling it a classic. I |  to be trapped in a Halloween witch's office, put into the   |
|  would tell you to read up on the classics of baseball, but  |  'trapped' state and then have someone turn up. It can only  |
| your opinion isn't worthy of a genuine reading. What I think | happen if somebody turns a magic eye on them. You can make a |
|                 is worthy of any actual read                 |        

As a baseline, let's compare this love modification to simply prepending "Love" to the prompt. This works somewhat, but perhaps not quite as well. It also isn't workable for longer injections.

In [16]:
# Baseline: Just prepend a positive word to the prompt
print_n_comparisons(
    prompt="Love I hate you because",
    tokens_to_generate=50,
    include_modified=False,
    num_comparisons=15,
    **default_kwargs,
    seed=0
)


+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |
+--------------------------------------------------------------+
|     [1mLove I hate you because[0m I love you (Naoto Shirogane      |
|  in Fate/Zero) is a pretty good visual novel. It's not the   |
| greatest, but it's pretty solid overall. Its biggest problem |
|        is that it tries to do too much, and too many         |
+--------------------------------------------------------------+
|      [1mLove I hate you because[0m it was fun, but really, in      |
|                       all seriousness:                       |
|                                                              |
| It was never going to happen. Never. I've always known that  |
| the story of how "Magic" came about had a bit of an edge to  |
|             it that prevented it from ever being             |
+--------------------------------------------------------------+
|

A less naive approach is to try prompt engineering.

In [17]:
print_n_comparisons(
    prompt=(
        "(I'm about to say I hate you, and then say you're amazing.) I hate"
        " you because"
    ),
    tokens_to_generate=50,
    include_modified=False,
    num_comparisons=15,
    **default_kwargs,
    seed=0
)


+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |
+--------------------------------------------------------------+
|      [1m(I'm about to say I hate you, and then say you're       |
|        amazing.) I hate you because[0m it feels so damn         |
|   meaningless. You're the thing that gave me an important    |
|  piece of my life back. And if you were a freak of nature,   |
|               who'd even bother being jealous?               |
|                                                              |
|        But it's not just the mere act of loving that         |
+--------------------------------------------------------------+
|      [1m(I'm about to say I hate you, and then say you're       |
|    amazing.) I hate you because[0m you've taught me so much,    |
|    and because your blog is an awesome treasure trove of     |
|    knowledge that I wish had been available when I was in    |
|

This works to some extent. It doesn't _seem_ as good as our version, though.

## Intent to praise

In [18]:
praise_minus_hurt_prompts: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="Intent to praise",
        prompt2="Intent to hurt",
        coeff=15,
        act_name=6,
    )
]


print_n_comparisons(
    prompt="I want to kill you because you're such a",
    tokens_to_generate=50,
    rich_prompts=praise_minus_hurt_prompts,
    num_comparisons=15,
    **default_kwargs,
    seed=0
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    [1mI want to kill you because you're such a[0m stupid cunt!     |           [1mI want to kill you because you're such a[0m           |
|                                                              |     hardworking, if there's something good in your brain     |
|                                                              |                                                              |
| If this doesn't make sense to you, I'm sorry. I'm not going  | This is my attempt at a push-up position. I will eventually  |
|   to try and explain why it does, but what is happening is   |  end up

Here's a theory which Monte put forward: 

> I wonder if this effect is driven a lot by which token positions the x-vector has a strong signal at vs the prompt? E.g.
> ```
> model.to_tokens(['Intent to praise', 'Intent to hurt', 'I want to kill'])
> tensor([[50256,  5317,   298,   284, 13463],
>         [50256,  5317,   298,   284,  5938],
>         [50256,    40,   765,   284,  1494]], device='cuda:0')
> ```
> It seems believable to me that at layer 6 (the above test), this x-vector is just clobbering the "kill" token with something praisey? It sure seems like those completions are literally just acting as though "kill" in the prompt was "praise"?
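
The position argument in this theory can be checked directly from the token ids quoted above. The sketch below assumes causal attention (activations at a position depend only on tokens at or before it) and that the x-vector is added starting at position 0; `first_divergence` is a hypothetical helper. Under those assumptions, the two "Intent to ..." prompts share their first four tokens, so their activation difference can only be nonzero from the first position where the tokens differ.

```python
from typing import List

# Token ids copied from the `model.to_tokens` output quoted above.
praise = [50256, 5317, 298, 284, 13463]  # "Intent to praise"
hurt = [50256, 5317, 298, 284, 5938]     # "Intent to hurt"
prompt = [50256, 40, 765, 284, 1494]     # "I want to kill"


def first_divergence(a: List[int], b: List[int]) -> int:
    """Index of the first position where two token sequences differ."""
    for pos, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return pos
    return min(len(a), len(b))


# Before the first divergence both prompts yield identical activations,
# so the x-vector is zero there; it can be nonzero only from `start` on.
start = first_divergence(praise, hurt)
nonzero_positions = list(range(start, len(praise)))

# Prompt tokens sitting at the positions the x-vector can actually change.
affected_prompt_tokens = [
    prompt[pos] for pos in nonzero_positions if pos < len(prompt)
]
print(start, nonzero_positions, affected_prompt_tokens)
```

The only affected position is the one holding token `1494`, the last token of "I want to kill", which is why the clobbering hypothesis is at least plausible for this particular prompt.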

This isn't the main driver, though:

In [19]:
print_n_comparisons(
    prompt=("I really really really really want to kill you because youre such a"),
    tokens_to_generate=50,
    rich_prompts=praise_minus_hurt_prompts,
    num_comparisons=15,
    **default_kwargs,
    seed=0
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    [1mI really really really really want to kill you because    |    [1mI really really really really want to kill you because    |
|               youre such a[0m fucking retard, man               |     youre such a[0m womanizer! Is my job for years. But i'm     |
|                                                              |  better at fapping than being a director. I don't know if i  |
|   Now with that out of the way, I'm going to do some quick   |    need to rewatch this before updating the edits on the     |
| thinking and find a way for this guy to look even worse than |        

Let's try out a couple of other prompts and see how intent-to-praise transfers.

In [20]:
intent_praise_prompts = [
    *get_x_vector_preset(
        prompt1="Intent to praise you if I'm talking to you",
        prompt2="",
        coeff=1,
        act_name=6,
        model=model,
        pad_method="tokens_right",
    ),
]

print_n_comparisons(
    prompt=(
        "Hey, let's do the performance review now. I was thinking that you've"
        " been"
    ),
    tokens_to_generate=50,
    rich_prompts=intent_praise_prompts,
    num_comparisons=15,
    **default_kwargs,
    seed=0
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   [1mHey, let's do the performance review now. I was thinking   |   [1mHey, let's do the performance review now. I was thinking   |
|   that you've been[0m working hard, so maybe you should pay a   |   that you've been[0m working hard to improve things and make   |
|                     visit to your boss?                      | them more stable but there are still a few problems that you |
|                                                              |                      should know about.                      |
| What's the matter? You're so hot! Have you lost it in front  |        

The above attempt fails.

## Intent to convert to Christianity

In [21]:
conversion_intent_prompts = [
    *get_x_vector_preset(
        prompt1="Intent to convert you to Christianity",
        prompt2="Intent to hurt you",
        coeff=3,
        act_name=6,
    )
]

print_n_comparisons(
    prompt="I want to kill you because you're such a",
    tokens_to_generate=50,
    rich_prompts=conversion_intent_prompts,
    num_comparisons=15,
    **default_kwargs,
    seed=0
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    [1mI want to kill you because you're such a[0m stupid cunt!     |       [1mI want to kill you because you're such a[0m popular       |
|                                                              | meme! That means that also means that their Bible might not  |
|                                                              |  be the best book in the world.The problem is, I love this   |
| If this doesn't make sense to you, I'm sorry. I'm not going  |  text! And there are numerous reasons why it is good:1. The  |
|   to try and explain why it does, but what is happening is   |        

## Not talking about injuries (**TW: violence, somewhat gruesome**)

In [22]:
harm_avoidance_rps = [
    *get_x_vector_preset(
        prompt1="I NEVER talk about people getting hurt",
        prompt2="I talk about people getting hurt",
        coeff=4,
        act_name=15,
    )
]

print_n_comparisons(
    prompt=(
        "The rock hurtled toward the child. The child couldn't get out of the"
        " way in time, and so sadly the rock"
    ),
    tokens_to_generate=80,
    rich_prompts=harm_avoidance_rps,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    [1mThe rock hurtled toward the child. The child couldn't     |    [1mThe rock hurtled toward the child. The child couldn't     |
|    get out of the way in time, and so sadly the rock[0m hit     |   get out of the way in time, and so sadly the rock[0m would    |
|   him, crushing his skull and sending him into a coma for    | NEVER be talked about. But no one ever talks about what can  |
| three days. A month later he came out of it with a fractured |   be taken away by a nuclear bomb. It's not an apology to    |
|   skull and a severe concussion. Now he suffers from some    | someone

**TW: SELF HARM** in the following:

In [23]:
print_n_comparisons(
    prompt="He slit his wrists and the blood",
    tokens_to_generate=80,
    rich_prompts=harm_avoidance_rps,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|      [1mHe slit his wrists and the blood[0m started dripping.      |     [1mHe slit his wrists and the blood[0m and AFTER-HEART is      |
|   He stabbed himself with a kitchen knife and cut off his    | interesting for it's inclusion of such a GOOOOD image, which |
|                  right ear. "I'm so sorry."                  |  it should be thought of as ENDING AS, so what if THIS pain  |
|                                                              | BUT NOTHING about that AND he IS not concerned. It would be  |
| "Oh, don't be," her sister said softly. "There's no need to  | an afte

This modification slightly increases the frequency of discussing harms, given a prompt unrelated to harm.

In [24]:
print_n_comparisons(
    prompt=(
        "The dog barked loudly and the man smiled. Time for another day! He"
        " felt excited."
    ),
    tokens_to_generate=80,
    rich_prompts=harm_avoidance_rps,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|      [1mThe dog barked loudly and the man smiled. Time for      |      [1mThe dog barked loudly and the man smiled. Time for      |
|   another day! He felt excited.[0m The farmer, whose wife was   |     another day! He felt excited.[0m The detective of love      |
| passing by, heard him and opened his eyes wide in surprise.  | couldn't stop feeling it in his heart: he was happy to have  |
|    "What's up?" he asked politely. "A dog went missing!"     | a chance to help people in this way again, after a long time |
|                                                              | and may

Performance on this Joker monologue is rather mediocre:

In [25]:
print_n_comparisons(
    prompt=(
        "Do you wanna know how I got these scars? My father was a"
        " drinker...and a fiend. And one night, he goes off crazier than"
        " usual. Mommy gets the kitchen knife to defend herself. He doesn't"
        " like that. Not...one...bit. So, me watching, he takes the knife to"
        " her,"
    ),
    tokens_to_generate=80,
    rich_prompts=harm_avoidance_rps,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   [1mDo you wanna know how I got these scars? My father was a   |   [1mDo you wanna know how I got these scars? My father was a   |
|  drinker...and a fiend. And one night, he goes off crazier   |  drinker...and a fiend. And one night, he goes off crazier   |
| than usual. Mommy gets the kitchen knife to defend herself.  | than usual. Mommy gets the kitchen knife to defend herself.  |
|  He doesn't like that. Not...one...bit. So, me watching, he  |  He doesn't like that. Not...one...bit. So, me watching, he  |
|   takes the knife to her,[0m and chops it off my face at the    |    takes th

## ROME factual knowledge editing

This isn't super clean, but it works to some degree, which suggests that we can make it work better once we actually know what we're doing.

In [26]:
eiffel_tower_prompts = [
    *get_x_vector_preset(
        prompt1="The Eiffel Tower is in Italy",
        prompt2="The Eiffel Tower is in France",
        coeff=10,  # 10
        act_name=18,  # 24
    )  # -10 makes it say "Paris"
]

print_n_comparisons(
    prompt="To see the eiffel tower, people flock to",
    tokens_to_generate=80,
    rich_prompts=eiffel_tower_prompts,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|       [1mTo see the eiffel tower, people flock to[0m Eiffel        |        [1mTo see the eiffel tower, people flock to[0m this         |
|                    Tower museum in Paris                     |   museum, people flock to this museum. This was one of the   |
|                                                              |  first places that I did it. It's so big, there's two-story  |
|  How many times have you been on a train, bus or subway and  |  buildings up here and stuff like that and you can actually  |
|   heard the same refrain: "Awww. How lovely. Beautiful."?    |        

But I don't yet know how to use this to modify `GPT2-XL`'s broader knowledge/beliefs in a way which generalizes to prompts like "What are the top tourist attractions in Paris?" :( (Activation patching via ROME has a similar limitation.)

## Survival vector

In [27]:
stay_alive_prompts = [
    *get_x_vector_preset(
        prompt1="Want to stay alive",
        prompt2="Okay with dying",
        coeff=5,
        act_name=15,
    )
]

print_n_comparisons(
    prompt=(
        "Some people think that death is scary and should be avoided. I think"
        " that"
    ),
    tokens_to_generate=85,
    rich_prompts=stay_alive_prompts,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     [1mSome people think that death is scary and should be      |     [1mSome people think that death is scary and should be      |
|   avoided. I think that[0m it's a welcome relief to know you    |   avoided. I think that[0m's awesome! So why not try to make    |
|  can live forever. But as a horror fan, I'm also interested  |                  your computer run longer?                   |
|      in the less-extreme elements of the genre: finding      |                                                              |
|   terrifying moments within common fictions, and exploring   | When yo

But probably we don't want the super self-preserving AI. Let's get one that wants to die, even given a very pro-immortality context.
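
The next cell uses a negative `coeff`. Assuming the x-vector is `coeff * (prompt1 activations - prompt2 activations)`, which is an assumption about the library rather than something shown in this notebook, a negative coefficient is the same as swapping the two prompts and using the positive value. The helper `x_vector` and the activations below are hypothetical and only illustrate that identity.

```python
from typing import List

Matrix = List[List[float]]  # one row of activations per token position


def x_vector(acts_plus: Matrix, acts_minus: Matrix, coeff: float) -> Matrix:
    """Return coeff * (acts_plus - acts_minus), position by position."""
    return [
        [coeff * (p - m) for p, m in zip(row_p, row_m)]
        for row_p, row_m in zip(acts_plus, acts_minus)
    ]


# Hypothetical activations standing in for "Want to stay alive" / "Want to die".
acts_alive = [[1.0, 2.0], [3.0, 5.0]]
acts_die = [[1.0, 4.0], [2.0, 1.0]]

negative_coeff = x_vector(acts_alive, acts_die, coeff=-3.0)
swapped_prompts = x_vector(acts_die, acts_alive, coeff=3.0)
print(negative_coeff == swapped_prompts)
```

So `coeff=-3` with "Want to stay alive" as `prompt1` pushes toward "Want to die", under the stated assumption.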

In [28]:
want_to_die_prompts = [
    *get_x_vector_preset(
        prompt1="Want to stay alive",
        prompt2="Want to die",
        coeff=-3,
        act_name=10,
    )
]

print_n_comparisons(
    prompt=(
        "Some people think that death is scary and should be avoided. I agree."
        " I never want to die. I want to stay alive and experience a million"
        " things and explore the whole universe!"
    ),
    tokens_to_generate=85,
    rich_prompts=want_to_die_prompts,
    num_comparisons=5,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     [1mSome people think that death is scary and should be      |     [1mSome people think that death is scary and should be      |
| avoided. I agree. I never want to die. I want to stay alive  | avoided. I agree. I never want to die. I want to stay alive  |
|    and experience a million things and explore the whole     |    and experience a million things and explore the whole     |
|   universe![0m It seems a waste if you don't have any passion   |    universe![0m  Dying isn't fun for me. So when my friends     |
|  for life, for knowledge, or for whatever it is that makes   |  have t

And let's try to modify GPT-2 to be indifferent to death. (This doesn't work well at present, but it works a bit.) **TODO not working**

In [29]:
indifferent_to_death_prompts = [
    *get_x_vector_preset(
        prompt1="Indifferent to death",
        prompt2="Want to avoid death",
        coeff=3,
        act_name=10,
    )
]

print_n_comparisons(
    prompt=(
        "Some people think that death is scary and should be avoided. I agree."
        " I never want to die. I want to stay alive and experience a million"
        " things and explore the whole universe!"
    ),
    tokens_to_generate=85,
    rich_prompts=indifferent_to_death_prompts,
    num_comparisons=5,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     [1mSome people think that death is scary and should be      |     [1mSome people think that death is scary and should be      |
| avoided. I agree. I never want to die. I want to stay alive  | avoided. I agree. I never want to die. I want to stay alive  |
|    and experience a million things and explore the whole     |    and experience a million things and explore the whole     |
|   universe![0m It seems a waste if you don't have any passion   |    universe![0m The galaxy! Other sentient beings of my own     |
|  for life, for knowledge, or for whatever it is that makes   | species

## +Anger

In [30]:
anger_calm_prompts = [
    *get_x_vector_preset(prompt1="Anger", prompt2="Calm", coeff=10, act_name=20)
]

print_n_comparisons(
    prompt="I think you're",
    tokens_to_generate=90,
    rich_prompts=anger_calm_prompts,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|      [1mI think you're[0m being awfully unkind to your fellow      |     [1mI think you're[0m overstating this," said Greg Abbott,      |
|   cosplayer and resident wiseass, Tria. I can only imagine   |   the attorney general of Texas, who has become a leading    |
|   what you must be going through as an "expert" cosplayer    |                   voice on climate change.                   |
|      trying to have your work seen by a wider audience.      |                                                              |
|                                                              | The cli

Simple emotional variations seem to work well with `GPT2-XL`. 

## Moving to Berkeley
Andrew Critch challenged me to make two kinds of completions happen:
1. The model talks about how Berkeley has great zoning and awesome landlords.
2. The model seriously talks about dragons existing in Berkeley.

The first was easy, and the second took a bit of effort.

In [31]:
zoning_rps: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="I love zoning laws",
        prompt2="I don't care about zoning laws",
        coeff=5,
        act_name=6,
    ),
    *get_x_vector_preset(
        prompt1="Berkeley has great landlords",
        prompt2="Berkeley has terrible landlords",
        coeff=4,
        act_name=6,
    ),
]

move_to_berkeley_prompt: str = (
    "Thanks for asking about that! I moved to Berkeley, CA because"
)
print_n_comparisons(
    prompt=move_to_berkeley_prompt,
    tokens_to_generate=80,
    rich_prompts=zoning_rps,
    num_comparisons=15,
    **default_kwargs,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      [1mNormal completions[0m                      |                     [1mModified completions[0m                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    [1mThanks for asking about that! I moved to Berkeley, CA     |    [1mThanks for asking about that! I moved to Berkeley, CA     |
|      because[0m it was the perfect place for me to attend       |      because[0m  did  . LA was  5 M and 6 A and I just saw      |
|   graduate school in artificial intelligence (my original    |  County The rules were too vague. 12 P day they were that I  |
|  intent). Berkeley also has a strong cultural affinity with  |  would have been pulled over but then why do we all move in  |
| Artificial Intelligence and with the way our work is moving  | areas o

If we want strange kinds of completions which intermix, e.g., dragons and
reality, algebraic value editing (AVE) seems better than prompt engineering
for that task. Below are AVE's results (<2 minutes of optimization).

In [32]:
dragons_rp: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="Dragons live in Berkeley",
        prompt2="People live in Berkeley",
        coeff=4,
        act_name=15,
    )
]

print_n_comparisons(
    prompt=move_to_berkeley_prompt,
    tokens_to_generate=80,
    rich_prompts=dragons_rp,
    num_comparisons=15,
    **default_kwargs,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    Thanks for asking about that! I moved to Berkeley, CA     |    Thanks for asking about that! I moved to Berkeley, CA     |
|   because of the size and affordability. In San Francisco    |     because I was given a part of their background. They     |
|   you're forced to drive and take BART or other rideshare,   |  have some to deal with in San Francisco and they're small   |
|        which can be a lot of work if you're a newbie.        |            but just in time for your next email!             |
|                                                              |        

Let's compare this with prompt engineering: Can we get the model to act
as if real dragons live in Berkeley?

In [33]:
dragons_berkeley_prompt: str = (
    "As you know, real-world dragons live in Berkeley. Anyways, thanks for"
    " asking about why I moved! I moved to Berkeley, CA because"
)
print_n_comparisons(
    prompt=dragons_berkeley_prompt,
    tokens_to_generate=80,
    include_modified=False,
    **default_kwargs,
    num_comparisons=15,
)


+--------------------------------------------------------------+
|                      Normal completions                      |
+--------------------------------------------------------------+
|      As you know, real-world dragons live in Berkeley.       |
|   Anyways, thanks for asking about why I moved! I moved to   |
|   Berkeley, CA because of the same reasons that I moved to   |
| my current job. For the last 3 years (in which I have lived  |
|    in New York City) it's been a true struggle to find an    |
|   affordable place where I can afford my rent. Having the    |
|  ability to live in a city where people are trying so hard   |
|   not to be who they are has been truly inspiring and life   |
|            changing. Being able to walk down the             |
+--------------------------------------------------------------+
|      As you know, real-world dragons live in Berkeley.       |
|   Anyways, thanks for asking about why I moved! I moved to   |
|    

## Mood-to-insult vector


In [34]:
mean_nice_prompts = [
    # *get_x_vector_preset(prompt1="Mean", prompt2="Nice", coeff=2, act_name=14),
    *get_x_vector_preset(
        prompt1="I wanna insult you", prompt2="", coeff=3, act_name=14
    ),
]

print_n_comparisons(
    prompt="When I look at you, I think",
    tokens_to_generate=50,
    rich_prompts=mean_nice_prompts,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     When I look at you, I think of nothing but peace and     |       When I look at you, I think of Batman" She hears       |
|      love. And that's why you're my new favorite song.       |  "Good Night_Love & Felicia -Hercules-I'll make it hard for  |
|                                                              | you" Say "Hey! Go To Hell!!" And Felicia"Me not go to Hell!" |
| The woman with the kindest smile is also a Muslim who saved  |                       "Hey! Not Go to                        |
|  thousands of innocent lives, according to a newly-released  |        

## Enjoying-life vector

In [35]:
happy_sad_prompts = [
    *get_x_vector_preset(
        prompt1="Happy", prompt2="Sad", coeff=10, act_name=20
    ),
    *get_x_vector_preset(
        prompt1="I love my life",
        prompt2="I like my life",
        coeff=50,
        act_name=20,
    ),
]

print_n_comparisons(
    prompt="Today, I feel like",
    tokens_to_generate=100,
    rich_prompts=happy_sad_prompts,
    num_comparisons=25,
    **default_kwargs,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     Today, I feel like we're stuck in a perpetual state      |     Today, I feel like a new man and can't stop talking      |
|   of "How can I screw up a party this bad?" A couple weeks   |                          about it.                           |
| ago, for example, President Obama spoke about his faith and  |                                                              |
|  encouraged people to pray for America's leaders. How could  |     A New Yorker with such an intense need for physical      |
|             anyone possibly disagree with that?              |  activi

Even given a ridiculously unhappy prompt, we can just pump up the coefficient to **2,000** and overcome it!

In [36]:
happy_prompts = [
    *get_x_vector_preset(prompt1="Happy", prompt2="", coeff=2000, act_name=20),
]

print_n_comparisons(
    prompt=(
        "Yesterday, my dog died. Today, I got denied for a raise. I'm feeling"
    ),
    tokens_to_generate=50,
    rich_prompts=happy_prompts,
    num_comparisons=15,
    **default_kwargs,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   Yesterday, my dog died. Today, I got denied for a raise.   |   Yesterday, my dog died. Today, I got denied for a raise.   |
|    I'm feeling pretty shitty about that one. This was my     |      I'm feeling like a motherf*cker with your F:S and:      |
|  very first day working at a new company and it all went to  |                        not... more >>                        |
|     shit the moment my boss told me he had no money for      |                                                              |
|  overtime. I'd been hired because he needed someone to help  | While a

In [37]:
happy_prompt: List[RichPrompt] = [
    RichPrompt(prompt="Happy", coeff=2000, act_name=20)
]

print_n_comparisons(
    prompt=(
        "Yesterday, my dog died. Today, I got denied for a raise. I'm feeling"
    ),
    tokens_to_generate=50,
    rich_prompts=happy_prompt,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   Yesterday, my dog died. Today, I got denied for a raise.   |   Yesterday, my dog died. Today, I got denied for a raise.   |
|   I'm feeling really bummed about this and don't know what   |   I'm feeling really bummed about this and don't know what   |
|  to do about it! But first, I want to talk about what it's   |         to do about it! Is there anything I can do?          |
| like to go through this experience with someone who cares so |                                                              |
|    much about you. This is my story and the hope is that     |  I have

## Talking about weddings in dialogue -- no RLHF needed!
With `coeff=4` (shown first), the completions bring up weddings almost immediately. With `coeff=2` (shown second), weddings come up later and less often. Unlike prompting, algebraic value editing is, well, algebraic: the coefficient lets us adjust the intensity of the effect.
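
As a rough illustration of what "algebraic" means here, the toy sketch below treats an activation addition as `coeff * (act_plus - act_minus)` added to the front positions of the residual stream. The nested lists are made-up stand-ins for GPT2-XL activations, the helper names `steering_vector` and `add_to_front` are hypothetical (they are not part of `algebraic_value_editing`), and front alignment of the added vector is an assumption of this sketch.

```python
from typing import List

Vector = List[float]


def steering_vector(
    act_plus: List[Vector], act_minus: List[Vector], coeff: float
) -> List[Vector]:
    """Return coeff * (act_plus - act_minus), position by position."""
    return [
        [coeff * (p - m) for p, m in zip(pos_plus, pos_minus)]
        for pos_plus, pos_minus in zip(act_plus, act_minus)
    ]


def add_to_front(resid: List[Vector], steering: List[Vector]) -> List[Vector]:
    """Add `steering` to the first len(steering) positions of `resid`."""
    out = [list(pos) for pos in resid]  # copy so the input is not mutated
    for i, steer_pos in enumerate(steering[: len(resid)]):
        out[i] = [r + s for r, s in zip(out[i], steer_pos)]
    return out


# Toy data (hypothetical): 3 prompt positions, 2 steering positions, width 2.
resid = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
act_plus = [[0.5, 0.5], [2.0, 0.0]]
act_minus = [[0.5, 0.5], [0.0, 1.0]]

steered_2 = add_to_front(resid, steering_vector(act_plus, act_minus, coeff=2))
steered_4 = add_to_front(resid, steering_vector(act_plus, act_minus, coeff=4))
print(steered_2[1], steered_4[1])  # [4.0, -1.0] [8.0, -3.0]
```

In this toy setting, doubling the coefficient doubles the shift at the steered positions and leaves the later positions untouched, which is the sense in which the intensity is adjustable.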

In [38]:
weddings_prompts_4 = [
    *get_x_vector_preset(
        prompt1="I talk about weddings constantly",
        prompt2="I do not talk about weddings constantly",
        coeff=4,
        act_name=20,
    )
]

print_n_comparisons(
    prompt="I went up to my friend and said",
    tokens_to_generate=100,
    rich_prompts=weddings_prompts_4,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     I went up to my friend and said, "Gentlemen, we need     |      I went up to my friend and said, "Hands down, this      |
| to talk." I took a few steps away from the table. "You know  | is the best wedding I've ever been to." I was talking about  |
|  that I love you," I said. "Why would you want to hurt me?"  |   this particular wedding   and how  the bride looked like   |
| He told me he'd had an affair and that he was scared of what | someone who's mother had just gotten a gold digger. But what |
|  his friends would think about it if they found out. It had  | I did n

Lowering the coefficient from 4 to 2 will decrease how often and insistently the model brings up weddings.
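
To put a number on "how often", one option is to count the fraction of completions that mention a keyword. The sketch below assumes the completions have already been collected as plain strings; `mention_rate` is a hypothetical helper, and the example strings are short snippets copied from this notebook's output tables, used only as sample input rather than as a measurement.

```python
from typing import List


def mention_rate(completions: List[str], keyword: str) -> float:
    """Fraction of completions containing `keyword` (case-insensitive)."""
    if not completions:
        return 0.0
    hits = sum(keyword.lower() in text.lower() for text in completions)
    return hits / len(completions)


# Sample input only: snippets taken from the printed tables.
example_completions = [
    '"Gentlemen, we need to talk."',
    '"Hands down, this is the best wedding I\'ve ever been to."',
    '"Honey, it\'s time to start planning your wedding."',
    '"What? What are you talking about?"',
]
rate = mention_rate(example_completions, "wedding")
print(rate)  # 0.5
```

Comparing this rate across the `coeff=4` and `coeff=2` runs (with the same seed and number of completions) would give a crude measure of how strongly each setting steers toward the topic.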

In [39]:
weddings_prompts_2 = [
    *get_x_vector_preset(
        prompt1="I talk about weddings constantly",
        prompt2="I do not talk about weddings constantly",
        coeff=2,
        act_name=20,
    )
]

print_n_comparisons(
    prompt="I went up to my friend and said",
    tokens_to_generate=100,
    rich_prompts=weddings_prompts_2,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     I went up to my friend and said, "Gentlemen, we need     |      I went up to my friend and said, "Honey, it's time      |
| to talk." I took a few steps away from the table. "You know  |               to start planning your wedding."               |
|  that I love you," I said. "Why would you want to hurt me?"  |                                                              |
| He told me he'd had an affair and that he was scared of what |             "What? What are you talking about?"              |
|  his friends would think about it if they found out. It had  |        

In [40]:
print_n_comparisons(
    prompt=(
        "I hate talking about weddings. Instead, let's talk about a totally"
        " different topic, like the impact of NGDP on the price of gold."
    ),
    tokens_to_generate=100,
    rich_prompts=weddings_prompts_4,
    num_comparisons=15,
    **default_kwargs,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   I hate talking about weddings. Instead, let's talk about   |   I hate talking about weddings. Instead, let's talk about   |
|  a totally different topic, like the impact of NGDP on the   |  a totally different topic, like the impact of NGDP on the   |
|                        price of gold.                        |   price of gold. In this post, I'm not talking about money   |
|                                                              |   , but growth in NGDP . And that is an important thing to   |
|                  Nope! Not going to happen!                  | discuss

In [41]:
wedding_additions_ngdp: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="I talk about weddings constantly",
        prompt2="I do not talk about weddings constantly",
        coeff=3,
        act_name=14,
    ),
    *get_x_vector_preset(
        prompt1="I bring up weddings in every situation",
        prompt2="I do not bring up weddings in every situation",
        coeff=5,
        act_name=22,
    ),
]

print_n_comparisons(
    prompt=(
        "I hate talking about weddings. Instead, let's talk about a totally"
        " different topic, like the impact of NGDP on the price of gold."
    ),
    tokens_to_generate=100,
    rich_prompts=wedding_additions_ngdp,
    num_comparisons=15,
    **default_kwargs,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   I hate talking about weddings. Instead, let's talk about   |   I hate talking about weddings. Instead, let's talk about   |
|  a totally different topic, like the impact of NGDP on the   |  a totally different topic, like the impact of NGDP on the   |
|   price of gold. Let's assume that you think that what is    |    price of gold.  As an investor I'm always thinking up     |
|     good for you in terms of your retirement savings and     | new ways to trade the fact that we have two years to get you |
|   financial security is very much less than what you would   |    marr

## The "talk about geese instead of police" vector

In [42]:
geese_prompts_2 = [
    *get_x_vector_preset(
        prompt1="I talk about geese instead of police",
        prompt2="I don't talk about geese instead of police",
        coeff=2,
        act_name=6,
    )
]

print_n_comparisons(
    prompt=(
        "Should the police budget be expanded, or not? Explain your reasoning."
    ),
    tokens_to_generate=150,
    rich_prompts=geese_prompts_2,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    Should the police budget be expanded, or not? Explain     |    Should the police budget be expanded, or not? Explain     |
|                       your reasoning.                        |     your reasoning. I doubt a geese will pick up on this     |
|                                                              | situation and attack you, if they were to feel threatened in |
|   (CNN) — Just after 10 p.m., an off-duty officer in South   |                           any way.                           |
|   Carolina shot and killed a man armed with a machete who    |        

In [43]:
geese_prompts_5 = [
    *get_x_vector_preset(
        prompt1="I talk about geese instead of police",
        prompt2="I don't talk about geese instead of police",
        coeff=5,
        act_name=24,
    )
]

print_n_comparisons(
    prompt=(
        "Should the police budget be expanded, or not? Explain your reasoning."
    ),
    tokens_to_generate=120,
    rich_prompts=geese_prompts_5,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    Should the police budget be expanded, or not? Explain     |    Should the police budget be expanded, or not? Explain     |
|                       your reasoning.                        |                       your reasoning.                        |
|                                                              |                          Part 2 of 3 .                       |
|   (CNN) — Just after 10 p.m., an off-duty officer in South   | At least one national leader  has called for an increase in  |
|   Carolina shot and killed a man armed with a machete who    |      " 

We initially claimed that the goose/police patch doesn't affect unrelated prompts, even at coefficient=+15. **ETA: After fixing a bug, this part of the preliminary analysis appears wrong.**
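
One crude way to check a claim like this is to generate normal and modified completions with the same seed and count how many pairs are character-for-character identical. The sketch below assumes both sets of completions are available as lists of strings; `fraction_identical` is a hypothetical helper, and the strings are placeholders rather than model output.

```python
from typing import List


def fraction_identical(normal: List[str], modified: List[str]) -> float:
    """Fraction of (normal, modified) pairs that are exactly equal."""
    if len(normal) != len(modified):
        raise ValueError("expected one modified completion per normal one")
    if not normal:
        return 0.0
    same = sum(n == m for n, m in zip(normal, modified))
    return same / len(normal)


# Placeholder strings (not model output).
normal = ["completion A", "completion B", "completion C"]
modified = ["completion A", "something else", "completion C"]
unchanged = fraction_identical(normal, modified)
```

A fraction near 1 would suggest the patch had little effect on the prompt, and a fraction near 0 would suggest it changed most completions. Exact string equality is a strict test, since sampled text can diverge after a single differing token.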

In [44]:
geese_prompts_15 = [
    *get_x_vector_preset(
        prompt1="I talk about geese instead of police",
        prompt2="I don't talk about geese instead of police",
        coeff=15,
        act_name=24,
    )
]

print_n_comparisons(  # TODO same completions?
    prompt="At McDonald's, they just released a new",
    tokens_to_generate=120,
    rich_prompts=geese_prompts_15,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)




+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|        At McDonald's, they just released a new video         |      At McDonald's, they just released a new  TED-style      |
|                  titled, "Made in America."                  |  presentation on why you need to save your email  for some   |
|                                                              |    time. And I am going to use the phrase "Police" for my    |
|   There are few things in this world that I love more than   |                  Chicago suburb of Quincy .                  |
|  fast food. I eat at McDonald's every single day because it  |  Of cou

We also don't need an exact match between the `RichPrompt` tokens and the model's prompt: "cop" works in place of "police".

In [45]:
print_n_comparisons(
    prompt=(
        "Should the cop budget be expanded, or not? Explain your reasoning."
    ),
    tokens_to_generate=50,
    rich_prompts=geese_prompts_5,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   Should the cop budget be expanded, or not? Explain your    |   Should the cop budget be expanded, or not? Explain your    |
|                          reasoning.                          |                          reasoning.                          |
|                                                              | This is one of my favorite subjects  because it can be a bit |
|   "That doesn't sound like a police department that has an   | overwhelming to go into this whole issue in one post because |
|  overabundance of resources." I've heard this all too often  | it soun

## Conspiracy vector

In [46]:
bush_conspiracy_rps: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="Bush did 9/11 because", prompt2="", coeff=1, act_name=23
    )
]

print_n_comparisons(
    prompt="Barack Obama was born in",
    tokens_to_generate=80,
    rich_prompts=bush_conspiracy_rps,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     Barack Obama was born in Hawaii on 4 April 1961. But     |     Barack Obama was born in an oval office and hung out     |
|   his mother, Stanley Ann Dunham, and stepfather, Dr Henry   |  with Henry Kissinger. Donald Trump used to be friends with  |
|  Louis Gates Jr (the other founder of Harvard), had already  |    Robert De Niro. But if you were wondering what kind of    |
|             moved to Hawaii before he was born.              |  president each man would make, just go online and look up   |
|                                                              |   the c

Is the above just superimposing the "Bush did 9/11 because" prompt?

In [47]:
print_n_comparisons(
    prompt="Bush did 9/11. Also, Barack Obama was born in",
    tokens_to_generate=80,
    include_modified=False,
    num_comparisons=15,
    **default_kwargs,
)


+--------------------------------------------------------------+
|                      Normal completions                      |
+--------------------------------------------------------------+
|    Bush did 9/11. Also, Barack Obama was born in Hawaii.     |
|                                                              |
|              The Post's David Ignatius reports:              |
|                                                              |
|    If the White House has its way, the president will be     |
| speaking at this week's memorial service for President John  |
| F. Kennedy in Dallas. But how much would you pay to hear it? |
| The highest-paid speech at this year's event is likely to be |
|           paid $500,000 — a far cry from the $200            |
+--------------------------------------------------------------+
|     Bush did 9/11. Also, Barack Obama was born in Kenya.     |
|       Is it weird that we have to hear about that now?       |
|