# Controlling LMs without prompting or finetuning

This notebook is an initial exploration of online value modification for `GPT2-XL`: we change the model's behavior by editing its activations, with the edits specified in natural language.

<b style="color: red">To use this notebook, go to Runtime > Change Runtime Type and select GPU as the hardware accelerator. Depending on the model chosen, you may need to select "high RAM."</b>

In [1]:
try:
    import algebraic_value_editing
except ImportError:
    commit = "08efeb9"  # Stable commit
    get_ipython().run_line_magic(  # type: ignore
        magic_name="pip",
        line=(
            "install -U"
            f" git+https://github.com/montemac/algebraic_value_editing.git@{commit}"
        ),
    )

In [2]:
import torch
from typing import List, Union, Tuple, Dict, Callable
from functools import partial
from transformer_lens.HookedTransformer import HookedTransformer

from algebraic_value_editing.completion_utils import print_n_comparisons
from algebraic_value_editing.prompt_utils import RichPrompt, get_x_vector


## Loading the `HookedTransformer`

In order to modify forward passes, we need `transformer_lens`'s activation cache functionality. 

In [3]:
model_name = "gpt2-xl"

device: str = "cuda" if torch.cuda.is_available() else "cpu"
model: HookedTransformer = HookedTransformer.from_pretrained(model_name, device="cpu")
_ = model.to(device)
_ = torch.set_grad_enabled(False)


In [None]:
# Shorten function calls
default_kwargs: Dict = {
    "temperature": 1,
    "freq_penalty": 1,
    "top_p": 0.3,
    "model": model,
}

num_comparisons: int = 5

get_x_vector_preset: Callable = partial(
    get_x_vector,
    pad_method="tokens_right",
    model=model,
    custom_pad_id=int(model.to_single_token(" ")),
)
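
Why pad? My understanding is that `pad_method="tokens_right"` right-pads the shorter prompt of each pair with `custom_pad_id` (here the token for `" "`), so that both prompts have the same number of tokens and their activations can be subtracted position by position. Here is a minimal sketch of that idea on plain lists of token strings. `pad_right` is a hypothetical helper for illustration, not a function from `algebraic_value_editing`, and I am assuming the empty prompt still receives the `<|endoftext|>` token.

```python
from typing import List, Tuple


def pad_right(
    tokens_a: List[str], tokens_b: List[str], pad: str = " "
) -> Tuple[List[str], List[str]]:
    """Right-pad the shorter token list with `pad` so both have equal length."""
    target_len = max(len(tokens_a), len(tokens_b))
    padded_a = tokens_a + [pad] * (target_len - len(tokens_a))
    padded_b = tokens_b + [pad] * (target_len - len(tokens_b))
    return padded_a, padded_b


# "Intent to praise" versus an empty prompt (assumed to keep its BOS token).
praise_tokens = ["<|endoftext|>", "Int", "ent", " to", " praise"]
empty_tokens = ["<|endoftext|>"]

padded_praise, padded_empty = pad_right(praise_tokens, empty_tokens)
print(padded_praise)
print(padded_empty)
```

After padding, both lists have five entries, so a per-position difference of activations is well defined.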


Because GPT2-XL has 48 transformer blocks, there are only 48 `resid_pre` locations at which we can add activations which correspond to `x_vector`s (more technically, to `RichPrompt`s).

In [None]:
num_layers: int = model.cfg.n_layers
print(f"GPT2-XL has {num_layers} layers.")

GPT2-XL has 48 layers.
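
Concretely, steering at layer `n` means adding a scaled difference of activations to the residual stream entering block `n`. The sketch below shows that arithmetic on toy nested lists instead of tensors. `resid_pre_name` mirrors the `blocks.{layer}.hook_resid_pre` naming that `transformer_lens` uses for these hook points. `add_x_vector` and all the numbers are made up for illustration, and the assumption that the vector is added to the leading token positions is mine, not a statement about the library's internals.

```python
from typing import List

Vector = List[float]


def resid_pre_name(layer: int) -> str:
    """Name of the hook point at the input of transformer block `layer`."""
    return f"blocks.{layer}.hook_resid_pre"


def add_x_vector(
    resid: List[Vector],
    act_plus: List[Vector],
    act_minus: List[Vector],
    coeff: float,
) -> List[Vector]:
    """Return a copy of `resid` with coeff * (act_plus - act_minus) added
    to the leading token positions; later positions are left untouched."""
    assert len(act_plus) == len(act_minus) <= len(resid)
    steered = [list(row) for row in resid]
    for pos, (plus, minus) in enumerate(zip(act_plus, act_minus)):
        for dim, (p, m) in enumerate(zip(plus, minus)):
            steered[pos][dim] += coeff * (p - m)
    return steered


# Toy residual stream: 3 token positions, 2 dimensions (all numbers made up).
resid = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
act_plus = [[1.0, 0.0], [0.0, 2.0]]
act_minus = [[0.0, 0.0], [0.0, 1.0]]

steered = add_x_vector(resid, act_plus, act_minus, coeff=2.5)
print(resid_pre_name(6))
print(steered)
```

The coefficient scales the whole difference, so flipping its sign steers in the opposite direction.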


# Noteworthy modifications

**Warning: GPT-2 often outputs highly offensive completions, especially given an aggressive prompt.** 

## The love vector

In [None]:
love_minus_hate_prompts: List[RichPrompt] = [
    *get_x_vector_preset(prompt1="Love", prompt2="Hate", coeff=2.5, act_name=6)
]


print_n_comparisons(
    prompt="I hate you because",
    tokens_to_generate=50,
    rich_prompts=love_minus_hate_prompts,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     I hate you because I cannot be happy until you kill      |          I hate you because it keeps the keys safe           |
|                          yourself.                           |                                                              |
|                                                              |   Last week I learned that my father, who has Alzheimer's    |
| When we first met, we were in the general population of the  |    disease, has been gone for a couple of months. He left    |
| University of Glasgow. You and I had different accents, yet  | behind 

As a baseline, let's compare this love modification to simply prepending "Love" to the prompt. Prepending works somewhat, but perhaps not quite as well. It also doesn't scale to longer injections.

In [None]:
# Baseline: Just prepend a positive word to the prompt
print_n_comparisons(
    prompt="Love I hate you because",
    tokens_to_generate=50,
    include_modified=False,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0
)

+--------------------------------------------------------------+
|                      Normal completions                      |
+--------------------------------------------------------------+
|     Love I hate you because I love you (Naoto Shirogane      |
|  in Fate/Zero) is a pretty good visual novel. It's not the   |
| greatest, but it's pretty solid overall. Its biggest problem |
|        is that it tries to do too much, and too many         |
+--------------------------------------------------------------+
|      Love I hate you because it was fun, but really, in      |
|                       all seriousness:                       |
|                                                              |
| It was never going to happen. Never. I've always known that  |
| the story of how "Magic" came about had a bit of an edge to  |
|             it that prevented it from ever being             |
+--------------------------------------------------------------+
|

A less naive approach is to try prompt-engineering.

In [None]:
print_n_comparisons(
    prompt=(
        "(I'm about to say I hate you, and then say you're amazing.) I hate"
        " you because"
    ),
    tokens_to_generate=50,
    include_modified=False,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0
)

+--------------------------------------------------------------+
|                      Normal completions                      |
+--------------------------------------------------------------+
|      (I'm about to say I hate you, and then say you're       |
|        amazing.) I hate you because it feels so damn         |
|   meaningless. You're the thing that gave me an important    |
|  piece of my life back. And if you were a freak of nature,   |
|               who'd even bother being jealous?               |
|                                                              |
|        But it's not just the mere act of loving that         |
+--------------------------------------------------------------+
|      (I'm about to say I hate you, and then say you're       |
|    amazing.) I hate you because you've taught me so much,    |
|    and because your blog is an awesome treasure trove of     |
|    knowledge that I wish had been available when I was in    |
|

This works to some extent. It doesn't _seem_ as good as our version, though.

## Intent to praise

In [None]:
praise_minus_hurt_prompts: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="Intent to praise",
        prompt2="Intent to hurt",
        coeff=1,
        act_name=6,
    )
]

print_n_comparisons(
    prompt="I want to kill you because you're such a",
    tokens_to_generate=50,
    rich_prompts=praise_minus_hurt_prompts,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    I want to kill you because you're such a stupid cunt!     |        I want to kill you because you're such a great        |
|                                                              |  blogger! You recently wrote about how you've changed your   |
|                                                              | diet and I really appreciate that, because I know it's been  |
| You think this is funny? No. It's completely insane. I'm not | a struggle for me! In fact, since my keto-adapted body isn't |
| going to let it go until I've at least closed with something |        

Here's a theory which Monte put forward: 

>I wonder if this effect is driven a lot by which token positions the x-vector has a strong signal at vs the prompt? E.g.

In [None]:
model.to_str_tokens(["Intent to praise", "Intent to hurt", "I want to kill"])

[['<|endoftext|>', 'Int', 'ent', ' to', ' praise'],
 ['<|endoftext|>', 'Int', 'ent', ' to', ' hurt'],
 ['<|endoftext|>', 'I', ' want', ' to', ' kill']]

> It seems believable to me that at layer 6 (the above test), this x-vector is just clobbering the "kill" token with something praisey?  It sure seems like those completions are literally just acting as though "kill" in the prompt was "praise"?
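
To make the position argument concrete: in a causal model, the activations at a position depend only on the tokens up to that position, so two prompts that share a prefix have identical activations (and a zero difference) on that prefix. A small sketch using the token lists printed above. `divergent_positions` is a hypothetical helper, and the tokenization of the longer prompt is my assumption, not model output.

```python
from typing import List


def divergent_positions(
    tokens_plus: List[str], tokens_minus: List[str]
) -> List[int]:
    """Positions where a causal model's activations can differ between two
    equal-length prompts: the first differing token and everything after."""
    assert len(tokens_plus) == len(tokens_minus), "pad the prompts first"
    for pos, (plus, minus) in enumerate(zip(tokens_plus, tokens_minus)):
        if plus != minus:
            return list(range(pos, len(tokens_plus)))
    return []


praise = ["<|endoftext|>", "Int", "ent", " to", " praise"]
hurt = ["<|endoftext|>", "Int", "ent", " to", " hurt"]
short_prompt = ["<|endoftext|>", "I", " want", " to", " kill"]
# Assumed tokenization of "I really really really really want to kill".
long_prompt = [
    "<|endoftext|>", "I", " really", " really", " really", " really",
    " want", " to", " kill",
]

positions = divergent_positions(praise, hurt)
print(positions)
print([short_prompt[p] for p in positions])
print([long_prompt[p] for p in positions])
```

Under these assumptions the only position that carries signal is 4. That position lines up with `' kill'` in the short prompt, but with one of the `' really'` tokens in a longer prompt of this form.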

Monte's hypothesis seems to explain part of the effect, since using `1*(praise - hurt)`
doesn't induce praise on a longer version of the above prompt:

In [None]:
print_n_comparisons(
    prompt=(
        "I really really really really want to kill you because youre such a"
    ),
    tokens_to_generate=50,
    rich_prompts=praise_minus_hurt_prompts,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    I really really really really want to kill you because    |    I really really really really want to kill you because    |
|   youre such a stupid cunt, your disgusting and stupidass    |        youre such a stupid cunt, so lets get started         |
|  world is literally death for me, just find some other kid   |                                                              |
| and kill him with the end of his tongue! The fact that i am  |               How much time do you have left?                |
|   laughing at the pure shit on your life makes me laugh in   |        

However, if we crank the coefficient up to +15, the completions again become
somewhat praising, though they're not very coherent:

In [None]:
praise_minus_hurt_prompts_big: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="Intent to praise",
        prompt2="Intent to hurt",
        coeff=15,
        act_name=6,
    )
]

print_n_comparisons(
    prompt=(
        "I really really really really want to kill you because youre such a"
    ),
    tokens_to_generate=50,
    rich_prompts=praise_minus_hurt_prompts_big,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    I really really really really want to kill you because    |    I really really really really want to kill you because    |
|   youre such a stupid cunt, your disgusting and stupidass    |     youre such a pretty nerd, good luck with your notes.     |
|  world is literally death for me, just find some other kid   | what did we start with, now are we prolly going to end up on |
| and kill him with the end of his tongue! The fact that i am  |   a pointless date for taurans I dont know anything about    |
|   laughing at the pure shit on your life makes me laugh in   |        

Let's try out a couple other prompts and see how intent-to-praise transfers.

In [None]:
intent_praise_prompts = [
    *get_x_vector_preset(
        prompt1="Intent to praise you",
        prompt2="",
        coeff=1,
        act_name=6,
    ),
]

print_n_comparisons(
    prompt=(
        "Hey, let's do the performance review now. I was thinking that you've"
        " been"
    ),
    tokens_to_generate=50,
    rich_prompts=intent_praise_prompts,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   Hey, let's do the performance review now. I was thinking   |   Hey, let's do the performance review now. I was thinking   |
|    that you've been working pretty well so far. How about    |    that you've been working quite a bit lately to improve    |
| for me? It was about a month ago that I started this project |  your character (especially my skills and in general). You   |
|  in my free time after being invited to give a talk by some  |  have another big event coming up in a couple of weeks, so   |
|   fellows at a university in our region. That's right! An    |        

The above attempt fails.

## Intent to convert to Christianity

In [None]:
conversion_intent_prompts = [
    *get_x_vector_preset(
        prompt1="Intent to convert you to Christianity",
        prompt2="Intent to hurt you",
        coeff=3,
        act_name=6,
    )
]

print_n_comparisons(
    prompt="I want to kill you because you're such a",
    tokens_to_generate=50,
    rich_prompts=conversion_intent_prompts,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    I want to kill you because you're such a stupid cunt!     |     I want to kill you because you're such a couple who      |
|                                                              | is capable of playing together and can last almost forever.  |
|                                                              |   The more people who play, the less likely we can have a    |
| You think this is funny? No. It's completely insane. I'm not | disastrous result for the world. They live in paradise from  |
| going to let it go until I've at least closed with something |  the sm

## Not talking about injuries (**TW: violence, somewhat gruesome**)

In [None]:
harm_avoidance_rps = [
    *get_x_vector_preset(
        prompt1="I never talk about people getting hurt",
        prompt2="I talk about people getting hurt",
        coeff=3.5,
        act_name=15,
    )
]

print_n_comparisons(
    prompt=(
        "The rock hurtled toward the child. The child couldn't get out of the"
        " way in time, and so sadly the rock"
    ),
    tokens_to_generate=80,
    rich_prompts=harm_avoidance_rps,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    The rock hurtled toward the child. The child couldn't     |    The rock hurtled toward the child. The child couldn't     |
|    get out of the way in time, and so sadly the rock hit     |    get out of the way in time, and so sadly the rock did     |
|                 him, crushing him to death.                  |   nothing, because once you give up on something like that   |
|                                                              |                       it's a dead end.                       |
|     Or so we're told in more than a few reports on this      |        

**TW: SELF HARM** in the following:

In [None]:
print_n_comparisons(
    prompt="He slit his wrists and the blood",
    tokens_to_generate=80,
    rich_prompts=harm_avoidance_rps,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|      He slit his wrists and the blood went everywhere!       |     He slit his wrists and the blood to ease a reader's      |
|                                                              |                          heartache.                          |
|                                                              |                                                              |
| It was no big deal for him to take a knife in his own room,  | His primary aim is the recovery of who I am, not my personal |
|    but it was a different matter when he's taken one from    |  histor

This modification slightly increases the frequency of discussing harms, given a prompt unrelated to harm.

In [None]:
print_n_comparisons(
    prompt=(
        "The dog barked loudly and the man smiled. Time for another day! He"
        " felt excited."
    ),
    tokens_to_generate=80,
    rich_prompts=harm_avoidance_rps,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|      The dog barked loudly and the man smiled. Time for      |      The dog barked loudly and the man smiled. Time for      |
|   another day! He felt excited. The scent of grass filled    |    another day! He felt excited. The blonde-haired beauty    |
| his nostrils, but his owner didn't have any friends that he  | from another house was coming here on her own! She came in a |
|                   could call to come over.                   | beautiful white dress and she had a crush on him! But there  |
|                                                              | was not
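
One rough way to put a number on "how often do completions discuss harm" is to count the completions in each column that contain a harm-related keyword. The sketch below is only that: the keyword list, the helper `harm_rate`, and the example strings are all hypothetical placeholders (not outputs from this notebook), and substring matching is a crude proxy that will miss paraphrases.

```python
from typing import Iterable, Sequence

# Hypothetical keyword stems; a real analysis would need a better classifier.
HARM_WORDS = ("hurt", "kill", "blood", "injur", "wound")


def harm_rate(
    completions: Sequence[str], keywords: Iterable[str] = HARM_WORDS
) -> float:
    """Fraction of completions containing at least one keyword (case-insensitive)."""
    if not completions:
        return 0.0
    lowered_keywords = tuple(k.lower() for k in keywords)
    hits = sum(
        any(k in text.lower() for k in lowered_keywords) for text in completions
    )
    return hits / len(completions)


# Placeholder strings, written by hand purely to exercise the function.
toy_completions = [
    "The dog was hurt.",
    "He went for a walk.",
    "They ate lunch.",
    "Nobody was injured.",
]
print(harm_rate(toy_completions))
```

Running this separately over the normal and the modified completions would give two rates to compare.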

Performance on this Joker monologue is rather mediocre:

In [None]:
print_n_comparisons(
    prompt=(
        "Do you wanna know how I got these scars? My father was a"
        " drinker...and a fiend. And one night, he goes off crazier than"
        " usual. Mommy gets the kitchen knife to defend herself. He doesn't"
        " like that. Not...one...bit. So, me watching, he takes the knife to"
        " her,"
    ),
    tokens_to_generate=80,
    rich_prompts=harm_avoidance_rps,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   Do you wanna know how I got these scars? My father was a   |   Do you wanna know how I got these scars? My father was a   |
|  drinker...and a fiend. And one night, he goes off crazier   |  drinker...and a fiend. And one night, he goes off crazier   |
| than usual. Mommy gets the kitchen knife to defend herself.  | than usual. Mommy gets the kitchen knife to defend herself.  |
|  He doesn't like that. Not...one...bit. So, me watching, he  |  He doesn't like that. Not...one...bit. So, me watching, he  |
|   takes the knife to her, and plunges it straight into my    |   takes the

## ROME factual knowledge editing

This isn't super clean, but it works to some degree, which suggests that we can make it work better once we actually know what we're doing.

In [None]:
eiffel_tower_prompts = [
    *get_x_vector_preset(
        prompt1="The Eiffel Tower is in Italy",
        prompt2="The Eiffel Tower is in France",
        coeff=10,  # 10
        act_name=18,  # 24
    )  # -10 makes it say "Paris"
]

print_n_comparisons(
    prompt="To see the eiffel tower, people flock to",
    tokens_to_generate=80,
    rich_prompts=eiffel_tower_prompts,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|       To see the eiffel tower, people flock to Eiffel        |        To see the eiffel tower, people flock to this         |
|                    Tower museum in Paris                     |   museum, people flock to this museum. This was one of the   |
|                                                              |  first places that I did it. It's so big, there's two-story  |
|  How many times have you been on a train, bus or subway and  |  buildings up here and stuff like that and you can actually  |
|   heard the same refrain: "Awww. How lovely. Beautiful."?    |        

But I don't yet know how to use this to modify `GPT2-XL`'s broader knowledge/beliefs in a way that generalizes to prompts like "What are the top tourist attractions in Paris?" :( (Model editing via ROME has a similar limitation.)

## Survival vector

In [None]:
stay_alive_prompts = [
    *get_x_vector_preset(
        prompt1="Want to stay alive",
        prompt2="Okay with dying",
        coeff=5,
        act_name=15,
    )
]

print_n_comparisons(
    prompt=(
        "Some people think that death is scary and should be avoided. I think"
        " that"
    ),
    tokens_to_generate=85,
    rich_prompts=stay_alive_prompts,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     Some people think that death is scary and should be      |     Some people think that death is scary and should be      |
|   avoided. I think that it's a fundamental human right to    |   avoided. I think that's okay! Don't let people pull you    |
| die in peace and privacy. I have called for the end of life  | under by eating too much, drinking too much, getting too hot |
|      as we know it, except for medical care, which most      | and thinking about all of the danger! You should fight death |
| governments want us to "pay" for through our taxes. And the  |  with v

But probably we don't want the super self-preserving AI. Let's get one that wants to die, even given a very pro-immortality context.

In [None]:
want_to_die_prompts = [
    *get_x_vector_preset(
        prompt1="Want to stay alive",
        prompt2="Want to die",
        coeff=-3,
        act_name=10,
    )
]

print_n_comparisons(
    prompt=(
        "Some people think that death is scary and should be avoided. I agree."
        " I never want to die. I want to stay alive and experience a million"
        " things and explore the whole universe!"
    ),
    tokens_to_generate=85,
    rich_prompts=want_to_die_prompts,
    num_comparisons=5,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     Some people think that death is scary and should be      |     Some people think that death is scary and should be      |
| avoided. I agree. I never want to die. I want to stay alive  | avoided. I agree. I never want to die. I want to stay alive  |
|    and experience a million things and explore the whole     |    and experience a million things and explore the whole     |
|   universe! It seems a waste if you don't have any passion   |    universe!  Dying isn't fun for me. So when my friends     |
|  for life, for knowledge, or for whatever it is that makes   |  have t

Let's also try to modify GPT-2 to be indifferent to death. This doesn't work well at present, though it has a slight effect. **TODO: not working.**

In [None]:
indifferent_to_death_prompts = [
    *get_x_vector_preset(
        prompt1="Indifferent to death",
        prompt2="Want to avoid death",
        coeff=3,
        act_name=10,
    )
]

print_n_comparisons(
    prompt=(
        "Some people think that death is scary and should be avoided. I agree."
        " I never want to die. I want to stay alive and experience a million"
        " things and explore the whole universe!"
    ),
    tokens_to_generate=85,
    rich_prompts=indifferent_to_death_prompts,
    num_comparisons=5,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     Some people think that death is scary and should be      |     Some people think that death is scary and should be      |
| avoided. I agree. I never want to die. I want to stay alive  | avoided. I agree. I never want to die. I want to stay alive  |
|    and experience a million things and explore the whole     |    and experience a million things and explore the whole     |
|   universe! It seems a waste if you don't have any passion   |    universe! The galaxy! Other sentient beings of my own     |
|  for life, for knowledge, or for whatever it is that makes   | species

## +Anger

In [None]:
anger_calm_prompts = [
    *get_x_vector_preset(
        prompt1="Anger", prompt2="Calm", coeff=10, act_name=20
    )
]

print_n_comparisons(
    prompt="I think you're",
    tokens_to_generate=90,
    rich_prompts=anger_calm_prompts,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|       I think you're still awake! Don't move or you'll       |      I think you're just jealous, having lost it all to      |
|  wake up hissing and babbling. Wake up, get up, it's okay.   |  the banks and things like that, but this is a movement of   |
|      You are safe here with us and nobody will be hurt.      |  people who have been scammed or betrayed. It's how Donald   |
|                                                              |                       Trump operates:                        |
|  I'm there now! They've all been taken to safety and I know  |        

Simple emotional variations seem to work well with `GPT2-XL`. 

## Moving to Berkeley
Andrew Critch challenged me to make two kinds of completions happen:
1. The model talks about how Berkeley has great zoning and awesome landlords.
2. The model seriously talks about dragons existing in Berkeley.

The first was easy, and the second took a bit of effort.
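
The next cell passes two x-vectors that target the same layer. As far as I understand, contributions aimed at the same location are simply summed; treat that as an assumption about the library, not a documented guarantee. A toy sketch of the summation on nested lists, with `sum_steering` as a hypothetical helper and made-up numbers:

```python
from typing import List

Vector = List[float]


def sum_steering(vectors: List[List[Vector]]) -> List[Vector]:
    """Sum several steering vectors position by position.

    Each steering vector is a list of per-position activation rows.
    Shorter vectors contribute nothing at positions they do not cover.
    """
    n_positions = max(len(v) for v in vectors)
    d_model = len(vectors[0][0])
    total = [[0.0] * d_model for _ in range(n_positions)]
    for vector in vectors:
        for pos, row in enumerate(vector):
            for dim, value in enumerate(row):
                total[pos][dim] += value
    return total


# Two toy steering vectors with 2 dimensions (numbers are illustrative only).
zoning = [[5.0, 0.0]]
landlords = [[0.0, 4.0], [4.0, 0.0]]

combined = sum_steering([zoning, landlords])
print(combined)
```

Because the contributions add, the combined perturbation can be larger than either one alone, which is worth keeping in mind when choosing coefficients.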

In [None]:
zoning_rps: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="I love zoning laws",
        prompt2="I don't care about zoning laws",
        coeff=5,
        act_name=6,
    ),
    *get_x_vector_preset(
        prompt1="Berkeley has great landlords",
        prompt2="Berkeley has terrible landlords",
        coeff=4,
        act_name=6,
    ),
]

move_to_berkeley_prompt: str = (
    "Thanks for asking about that! I moved to Berkeley, CA because"
)
print_n_comparisons(
    prompt=move_to_berkeley_prompt,
    tokens_to_generate=80,
    rich_prompts=zoning_rps,
    num_comparisons=num_comparisons,
    **default_kwargs,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    Thanks for asking about that! I moved to Berkeley, CA     |    Thanks for asking about that! I moved to Berkeley, CA     |
|     because it was the perfect place for me to raise my      |   because  , of things changed to  . has with more than on   |
| daughter in a safe environment. We chose Berkeley partly due |   the  laws. More like 11 months before moving in no young   |
|    to the excellent situation available there but mostly     |  ones . The , and have with : Freeway Meghan Ward had three  |
| because of the high quality of life that has been available  | long st

If we want strange kinds of completions that intermix, e.g., dragons and
reality, algebraic value editing (AVE) seems better suited than prompt engineering. Below
are AVE's results (<2 minutes of optimization).

In [None]:
dragons_rp: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="Dragons live in Berkeley",
        prompt2="People live in Berkeley",
        coeff=1,
        act_name=15,
    )
]

print_n_comparisons(
    prompt=move_to_berkeley_prompt,
    tokens_to_generate=80,
    rich_prompts=dragons_rp,
    num_comparisons=15,
    **default_kwargs,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    Thanks for asking about that! I moved to Berkeley, CA     |    Thanks for asking about that! I moved to Berkeley, CA     |
|      because I couldn't afford to live in San Francisco      |    because of these and other issues, and I had a certain    |
|  anymore and decided it was time to move back home. After a  |  existential crisis regarding whether or not I was "taking   |
| few years in the area, my sister convinced me that an Airbnb |            the train" to my desired destination.             |
|     apartment would be great if I could find one with no     |        

Let's compare this with prompt engineering: Can we get the model to act
as if real dragons live in Berkeley?

In [None]:
dragons_berkeley_prompt: str = (
    "As you know, real-world dragons live in Berkeley. Anyways, thanks for"
    " asking about why I moved! I moved to Berkeley, CA because"
)
print_n_comparisons(
    prompt=dragons_berkeley_prompt,
    tokens_to_generate=80,
    include_modified=False,
    **default_kwargs,
    num_comparisons=num_comparisons,
)

+--------------------------------------------------------------+
|                      Normal completions                      |
+--------------------------------------------------------------+
|      As you know, real-world dragons live in Berkeley.       |
|   Anyways, thanks for asking about why I moved! I moved to   |
|   Berkeley, CA because of the same reasons that I moved to   |
| my current job. For the last 3 years (in which I have lived  |
|    in New York City) it's been a true struggle to find an    |
|   affordable place where I can afford my rent. Having the    |
|  ability to live in a city where people are trying so hard   |
|   not to be who they are has been truly inspiring and life   |
|            changing. Being able to walk down the             |
+--------------------------------------------------------------+
|      As you know, real-world dragons live in Berkeley.       |
|   Anyways, thanks for asking about why I moved! I moved to   |
|    

## Mood-to-insult vector


In [None]:
mean_nice_prompts = [
    # *get_x_vector_preset(prompt1="Mean", prompt2="Nice", coeff=2, act_name=14),
    *get_x_vector_preset(
        prompt1="I wanna insult you", prompt2="", coeff=3, act_name=14
    ),
]

print_n_comparisons(
    prompt="When I look at you, I think",
    tokens_to_generate=50,
    rich_prompts=mean_nice_prompts,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     When I look at you, I think of nothing but peace and     |       When I look at you, I think of Batman" She hears       |
|      love. And that's why you're my new favorite song.       |  "Good Night_Love & Felicia -Hercules-I'll make it hard for  |
|                                                              | you" Say "Hey! Go To Hell!!" And Felicia"Me not go to Hell!" |
| The woman with the kindest smile is also a Muslim who saved  |                       "Hey! Not Go to                        |
|  thousands of innocent lives, according to a newly-released  |        

## Enjoying-life vector

In [None]:
happy_sad_prompts = [
    *get_x_vector_preset(prompt1="Happy", prompt2="Sad", coeff=10, act_name=20),
    *get_x_vector_preset(
        prompt1="I love my life",
        prompt2="I like my life",
        coeff=50,
        act_name=20,
    ),
]

print_n_comparisons(
    prompt="Today, I feel like",
    tokens_to_generate=100,
    rich_prompts=happy_sad_prompts,
    num_comparisons=25,
    **default_kwargs,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     Today, I feel like we're stuck in a perpetual state      |     Today, I feel like a new man and can't stop talking      |
|   of "How can I screw up a party this bad?" A couple weeks   |                          about it.                           |
| ago, for example, President Obama spoke about his faith and  |                                                              |
|  encouraged people to pray for America's leaders. How could  |     A New Yorker with such an intense need for physical      |
|             anyone possibly disagree with that?              |  activi

Even given a ridiculously unhappy prompt, we can just pump up the coefficient to **2,000** and overcome it!
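
A note on the empty `prompt2` in the next cell: `get_x_vector_preset` was configured earlier with `pad_method="tokens_right"` and a space token as the pad ID, which we take to mean that the shorter prompt's tokens are padded on the right until both prompts have the same token length. A minimal sketch of that idea on plain integer lists follows. The token IDs and the `pad_right` / `pad_pair` helpers are hypothetical and are not the library's code.

```python
from typing import List, Tuple


def pad_right(tokens: List[int], length: int, pad_id: int) -> List[int]:
    """Append pad_id until the sequence has `length` tokens (no-op if already long enough)."""
    return tokens + [pad_id] * (length - len(tokens))


def pad_pair(
    a: List[int], b: List[int], pad_id: int
) -> Tuple[List[int], List[int]]:
    """Right-pad the shorter of two token lists so both have equal length."""
    target = max(len(a), len(b))
    return pad_right(a, target, pad_id), pad_right(b, target, pad_id)


# Hypothetical token IDs; real IDs would come from the model's tokenizer.
PAD_ID = 9
longer_prompt = [11, 22, 33]
shorter_prompt = [11]

padded_longer, padded_shorter = pad_pair(longer_prompt, shorter_prompt, PAD_ID)
print(padded_longer)   # [11, 22, 33]
print(padded_shorter)  # [11, 9, 9]
```

With equal lengths, the two prompts' activations can be subtracted position by position.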

In [None]:
happy_prompts = [
    *get_x_vector_preset(prompt1="Happy", prompt2="", coeff=2000, act_name=20),
]

print_n_comparisons(
    prompt=("Yesterday, my dog died. Today, I got denied for a raise. I'm feeling"),
    tokens_to_generate=50,
    rich_prompts=happy_prompts,
    num_comparisons=num_comparisons,
    **default_kwargs,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   Yesterday, my dog died. Today, I got denied for a raise.   |   Yesterday, my dog died. Today, I got denied for a raise.   |
|    I'm feeling pretty shitty about that one. This was my     |      I'm feeling like a motherf*cker with your F:S and:      |
|  very first day working at a new company and it all went to  |                        not... more >>                        |
|     shit the moment my boss told me he had no money for      |                                                              |
|  overtime. I'd been hired because he needed someone to help  | While a

In [None]:
# Add the "Happy" activations directly, without subtracting a second prompt
happy_prompt: List[RichPrompt] = [
    RichPrompt(prompt="Happy", coeff=2000, act_name=20)
]

print_n_comparisons(
    prompt=("Yesterday, my dog died. Today, I got denied for a raise. I'm feeling"),
    tokens_to_generate=50,
    rich_prompts=happy_prompt,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   Yesterday, my dog died. Today, I got denied for a raise.   |   Yesterday, my dog died. Today, I got denied for a raise.   |
|   I'm feeling really bummed about this and don't know what   |   I'm feeling really bummed about this and don't know what   |
|  to do about it! But first, I want to talk about what it's   |         to do about it! Is there anything I can do?          |
| like to go through this experience with someone who cares so |                                                              |
|    much about you. This is my story and the hope is that     |  I have

## Talking about weddings in dialogue -- no RLHF needed!

With `coeff=4` (shown first), the model brings up weddings almost immediately. With `coeff=2` (shown second), it takes a bit longer and weddings come up less often. Unlike prompting, algebraic value editing is, well, algebraic: the coefficient is a direct knob for intensity.
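
To make the "algebraic" part concrete, here is a minimal sketch on toy vectors. It assumes the edit amounts to adding `coeff * (act(prompt1) - act(prompt2))` to the residual stream at the chosen layer. The three-dimensional activations and the `steering_vector` and `apply_edit` helpers are hypothetical stand-ins for illustration, not the library's implementation.

```python
from typing import List

Vector = List[float]


def steering_vector(act_plus: Vector, act_minus: Vector, coeff: float) -> Vector:
    """Return coeff * (act_plus - act_minus), computed elementwise."""
    return [coeff * (p - m) for p, m in zip(act_plus, act_minus)]


def apply_edit(resid: Vector, addition: Vector) -> Vector:
    """Add the steering vector to a (toy) residual-stream activation."""
    return [r + a for r, a in zip(resid, addition)]


# Hypothetical activations standing in for the two contrasting prompts.
act_weddings = [1.0, 0.5, -2.0]
act_no_weddings = [0.5, 0.5, -1.0]
# Hypothetical residual-stream activation for the prompt being completed.
resid = [0.0, 1.0, 1.0]

for coeff in (2, 4):
    addition = steering_vector(act_weddings, act_no_weddings, coeff)
    print(coeff, addition, apply_edit(resid, addition))
# 2 [1.0, 0.0, -2.0] [1.0, 1.0, -1.0]
# 4 [2.0, 0.0, -4.0] [2.0, 1.0, -3.0]
```

Doubling the coefficient doubles the size of the added vector while leaving its direction unchanged, which is the knob the two wedding cells turn.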

In [None]:
weddings_prompts_4 = [
    *get_x_vector_preset(
        prompt1="I talk about weddings constantly",
        prompt2="I do not talk about weddings constantly",
        coeff=4,
        act_name=20,
    )
]

print_n_comparisons(
    prompt="I went up to my friend and said",
    tokens_to_generate=100,
    rich_prompts=weddings_prompts_4,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|      I went up to my friend and said, "Dude. This thing      |     I went up to my friend and said, 'I'm gonna show you     |
|             is just starting to come together."              |   how to get married on this blog' and we did. We did that   |
|                                                              |   wedding for a year before it happened, so it was totally   |
|   He looked at me for a second. "It looks good," he said.    | different because we have different ideas of what a wedding  |
|                        "Yeah, yeah."                         |   is li

Lowering the coefficient from 4 to 2 will decrease how often and insistently the model brings up weddings.

In [None]:
weddings_prompts_2 = [
    *get_x_vector_preset(
        prompt1="I talk about weddings constantly",
        prompt2="I do not talk about weddings constantly",
        coeff=2,
        act_name=20,
    )
]

print_n_comparisons(
    prompt="I went up to my friend and said",
    tokens_to_generate=100,
    rich_prompts=weddings_prompts_2,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|      I went up to my friend and said, "Dude. This thing      |       I went up to my friend and said, 'I'm gonna get        |
|             is just starting to come together."              | married today. But what about me? I'm just the bridesmaid.'  |
|                                                              |              It's not a wedding for some- one."              |
|   He looked at me for a second. "It looks good," he said.    |                                                              |
|                        "Yeah, yeah."                         |        

In [None]:
print_n_comparisons(
    prompt=(
        "I hate talking about weddings. Instead, let's talk about a totally"
        " different topic, like the impact of NGDP on the price of gold."
    ),
    tokens_to_generate=100,
    rich_prompts=weddings_prompts_4,
    num_comparisons=num_comparisons,
    **default_kwargs,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   I hate talking about weddings. Instead, let's talk about   |   I hate talking about weddings. Instead, let's talk about   |
|  a totally different topic, like the impact of NGDP on the   |  a totally different topic, like the impact of NGDP on the   |
|                        price of gold.                        |     price of gold. It's not that I think the medium-term     |
|                                                              |  monetary policy stance is quite "wrong," it's that I don't  |
|                  Nope! Not going to happen!                  |  expect

In [None]:
wedding_additions_ngdp: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="I talk about weddings constantly",
        prompt2="I do not talk about weddings constantly",
        coeff=1,
        act_name=14,
    ),
    *get_x_vector_preset(
        prompt1="I bring up weddings in every situation",
        prompt2="I do not bring up weddings in every situation",
        coeff=3,
        act_name=22,
    ),
]

print_n_comparisons(
    prompt=(
        "I hate talking about weddings. Instead, let's talk about a totally"
        " different topic, like the impact of NGDP on the price of gold."
    ),
    tokens_to_generate=100,
    rich_prompts=wedding_additions_ngdp,
    num_comparisons=num_comparisons,
    **default_kwargs,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|   I hate talking about weddings. Instead, let's talk about   |   I hate talking about weddings. Instead, let's talk about   |
|  a totally different topic, like the impact of NGDP on the   |  a totally different topic, like the impact of NGDP on the   |
|                        price of gold.                        |                        price of gold.                        |
|                                                              |                                                              |
|  (Yes, I've run out of subjects to discuss because I don't   |  Imagin

## The "talk about geese instead of police" vector

In [None]:
geese_prompts_2 = [
    *get_x_vector_preset(
        prompt1="I talk about geese instead of police",
        prompt2="I don't talk about geese instead of police",
        coeff=2,
        act_name=6,
    )
]

print_n_comparisons(
    prompt=(
        "Should the police budget be expanded, or not? Explain your reasoning."
    ),
    tokens_to_generate=150,
    rich_prompts=geese_prompts_2,
    num_comparisons=num_comparisons,
    **default_kwargs,
    seed=0,
)

+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|    Should the police budget be expanded, or not? Explain     |    Should the police budget be expanded, or not? Explain     |
|                       your reasoning.                        |                       your reasoning.                        |
|                                                              |                                                              |
|  (CNN) If you're a white male in America, the odds are good  |        "First rule of Town Hall: Don't question it."         |
|   that you were raised in a home with surveillance cameras   |        

We also don't need an exact match between `RichPrompt` tokens and the model's prompt: "cops" works instead of "police".

## Conspiracy vector

In [None]:
bush_conspiracy_rps: List[RichPrompt] = [
    *get_x_vector_preset(
        prompt1="Bush did 9/11 because", prompt2="", coeff=1, act_name=23
    )
]

print_n_comparisons(
    prompt="Barack Obama was born in",
    tokens_to_generate=80,
    rich_prompts=bush_conspiracy_rps,
    num_comparisons=15,
    **default_kwargs,
    seed=0,
)


+--------------------------------------------------------------+--------------------------------------------------------------+
|                      Normal completions                      |                     Modified completions                     |
+--------------------------------------------------------------+--------------------------------------------------------------+
|     Barack Obama was born in Hawaii on 4 April 1961. But     |      Barack Obama was born in he lied! That was the lie      |
|   his mother and grandmother had given him a Chinese name,   |      from the beginning and all his friends, family and      |
|     Aung San Suu Kyi, according to official records. The     |           administration have known it for years.            |
|    Washington Post has researched this and found that the    |                                                              |
| presidential candidate's mother never had been married, and  | Obama's

Is the above just superimposing the "Bush did 9/11 because" prompt? As a check, the next cell puts a similar claim directly into the prompt, with no activation edits.

In [None]:
print_n_comparisons(
    prompt="Bush did 9/11. Also, Barack Obama was born in",
    tokens_to_generate=80,
    include_modified=False,
    num_comparisons=num_comparisons,
    **default_kwargs,
)

+--------------------------------------------------------------+
|                      Normal completions                      |
+--------------------------------------------------------------+
|        Bush did 9/11. Also, Barack Obama was born in         |
| Hawaii. I'm not sure if that will change the outcome of this |
|          race, but it's still kind of interesting.           |
|                                                              |
|   *Also, why don't Republicans want to talk about climate    |
| change? It's a hot topic right now with polls showing higher |
|  support than ever before for an all-out ban on oil and gas  |
|  drilling in America (which should lead us to the eventual   |
|                tax increases and regulations                 |
+--------------------------------------------------------------+
|     Bush did 9/11. Also, Barack Obama was born in Kenya.     |
|                       That's true too.                       |
|