
[Research] Steering vectors #1472

Draft · wants to merge 9 commits into master
Conversation

@SlyEcho (Sponsor Collaborator) commented May 16, 2023

For #1460

Original paper: Steering GPT-2-XL by adding an activation vector

./main -m ... --seed 123 -n 64 \
  --steering-add "Love" \
  --steering-sub "Hate" \
  --steering-source 4 \
  --steering-layer 4 \
  --steering-mul 5 \
  --prompt "I hate you because "

I hate you because I am not here to take care of you.
Love is not about me taking care of you, it's about you taking care of me. The best thing I can do for you, is be the person God made me to be, to love you as He loves you, and to show you
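
For anyone following along, the arithmetic behind these flags is just activation addition. A minimal NumPy sketch with made-up data (llama.cpp has no Python API like this; the random vectors stand in for the activations captured at --steering-source):

import numpy as np

rng = np.random.default_rng(123)
n_embd = 4096  # LLaMA-7B embedding size

# Stand-ins for the residual-stream activations of the two steering
# prompts; real values would be captured from the model at --steering-source.
act_add = rng.normal(size=n_embd).astype(np.float32)  # "Love"
act_sub = rng.normal(size=n_embd).astype(np.float32)  # "Hate"

# The steering vector is their difference, scaled by --steering-mul.
steering = 5.0 * (act_add - act_sub)

# During generation it is added to the residual stream at --steering-layer.
residual = rng.normal(size=n_embd).astype(np.float32)
steered = residual + steering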

TODO: make a test script for all their examples and try to find the effect of the parameters.


I also wanted to see what the vectors look like, so I imported them into NumPy and plotted them:

import os

import numpy as np
from matplotlib import pyplot as plt

# np.fromfile does not expand "~", so expand it explicitly.
path = os.path.expanduser("~/src/llama.cpp/build/steering.bin")
steer = np.fromfile(path, dtype=np.float32).reshape((512, -1))

# Show three consecutive rows (token positions) as 32-row heatmaps.
fig, ax = plt.subplots(3)
for i in range(len(ax)):
    ax[i].imshow(steer[3 + i, :].reshape((32, -1)))
plt.show()

[image: heatmaps of three rows of the steering vector]

@Azeirah (Contributor) commented May 16, 2023

It's good to note that the authors of the post said they were going to try this out with Vicuna-13B as well, so we can see how it generalizes across different models.

@Azeirah (Contributor) commented May 16, 2023

Also, from a quick glance through your code, I saw that the steering-vector retrieval layer is always the same as the steering-vector add layer.

They also allow steering vectors sourced from earlier layers to be used at later layers, which might be necessary to get good behavior.

However, the norm of early-layer residual streams is significantly smaller than at later layers (like 20). In particular, we've found a large jump between layers 0 and 2. Let's try sourcing a steering vector from the residual stream just before layer 2, and then adding that layer-2 vector to layer 20.


I also didn't get much response from the higher layer numbers (like 20 in the paper).

Did you source the steering vector from a lower layer? That's what they do.

source = layer 2, add = layer 20

not

source = layer 20, add = layer 20
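
One way to check that norm jump here would be to dump one residual vector per layer and compare magnitudes. A sketch, assuming a hypothetical residuals.bin dump (llama.cpp does not write such a file):

import numpy as np

# Hypothetical dump with one residual-stream vector per layer, shaped
# (n_layer, n_embd); assumed here only to illustrate the norm comparison.
acts = np.fromfile("residuals.bin", dtype=np.float32).reshape((32, -1))

norms = np.linalg.norm(acts, axis=1)
for layer, n in enumerate(norms):
    print(f"layer {layer:2d}: |resid| = {n:.2f}")

# Rescale a layer-2 vector to layer-20 magnitude before injecting it there.
steer = acts[2] * (norms[20] / norms[2])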

@SlyEcho (Sponsor Collaborator, Author) commented May 16, 2023

Did you source the steering vector from a lower layer? That's what they do.

I didn't notice that in the article; all the mentions of layers are about a single layer and where they inserted the vector.

But it should be easy to test.

Commit: Fix typo (Co-authored-by: Extra Dosages <extradosages@gmail.com>)
@Azeirah (Contributor) commented May 17, 2023

I tried the code as-is and the parameters are clearly affecting the output, just not steering it.

I ran through the code and, if I understand it correctly, I think it's not computing the steering vector as described in the post. Let me know if you understand what I mean and whether you agree.

@SlyEcho (Sponsor Collaborator, Author) commented May 17, 2023

I know that it is computing something because I added a dump of the vector to disk, and the arithmetic seems to be working: add, subtract, and positive and negative coefficients all change the vector as expected.

I think maybe there is some difference between GPT-2 and LLaMa that makes it not work as-is; it could be that it needs a small tweak or something.

@Azeirah (Contributor) commented May 18, 2023

It's possible. I'm experimenting with different inputs and layer sources and targets. It's clearly affecting the output, but it just seems kind of random so far.

@SlyEcho (Sponsor Collaborator, Author) commented May 18, 2023

I was not really seeing anything working until I used a fixed seed; otherwise the results are too random.

I will try to test again over the weekend in some automated way.

@Azeirah (Contributor) commented May 18, 2023

Note that greedy sampling is also assumed here, as unembedding produces a distribution over next tokens, not a unique next-token prediction.

Because of this note, I think it's also important to use greedy sampling:

--top_p 0 --top_k 1
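
For reference, greedy sampling just takes the argmax over the logits at each step; a toy sketch with made-up values:

import numpy as np

# Greedy decoding picks the single most likely next token each step,
# which is effectively what --top_k 1 gives you. Toy logits:
logits = np.array([1.2, 3.4, 0.5, 2.7], dtype=np.float32)
next_token = int(np.argmax(logits))
print(next_token)  # -> 1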

@Azeirah (Contributor) commented May 18, 2023

An author of the post confirmed that their method works well with vicuna 13B: https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector?commentId=eket7tugMDJgBYfwP

I tried LLaMa 13B but I'm getting similarly poor results to LLaMa 7B.

It makes me think there is something missing in this implementation; I'm not sure what.

@SlyEcho (Sponsor Collaborator, Author) commented May 18, 2023

I found this notebook, also linked in the article: https://colab.research.google.com/drive/1y84fhgkGX0ft2DmYJB3K13lAyf-0YonK

Gonna look it over.

@Azeirah (Contributor) commented May 18, 2023

I found this notebook, also linked in the article: https://colab.research.google.com/drive/1y84fhgkGX0ft2DmYJB3K13lAyf-0YonK

Gonna look it over.

Can you also check the review comment I wrote? I do think I found an actual mistake this time.

Fixing it didn't improve the results though :(

@Azeirah (Contributor) commented May 18, 2023

Is what they call the residual stream the same as what the LLaMa source code calls inpL?

@SlyEcho (Sponsor Collaborator, Author) commented May 18, 2023

I think it's actually closer to inpSA, but they are the same at the beginning.
A residual is where the input is added back to the output of the attention block.

EDIT: actually, modifying inpSA would give a much different result. It could be significant.
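
To make the naming concrete, a toy sketch of one decoder block, with comments marking where inpSA sits relative to the residual stream (norm/attn/ffn are stand-ins, not the real operators):

import numpy as np

def norm(x): return x / (np.linalg.norm(x) + 1e-6)
def attn(x): return 0.1 * x
def ffn(x):  return 0.1 * x

def decoder_block(inpL):
    inpSA = inpL              # saved input, re-added below (llama.cpp's inpSA)
    cur = attn(norm(inpL))
    cur = cur + inpSA         # residual stream after attention
    inpFF = cur
    cur = ffn(norm(cur))
    return cur + inpFF        # residual stream after the FFN

print(decoder_block(np.ones(8, dtype=np.float32)))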

@Azeirah (Contributor) commented May 18, 2023

I think it's actually closer to inpSA, but they are the same at the beginning. A residual is where the input is added back to the output of the attention block.

EDIT: actually, modifying inpSA would give a much different result. It could be significant.

It's clearly doing more than modifying inpL does. I modified the code to do so; see my output:

Note that I printed the add and sub prompts to stdout. Clearly it's "thinking" about business in some way. The quality's not quite there, but I think this is getting a lot closer!

[image: terminal output of the steered generation]

@Azeirah (Contributor) commented May 18, 2023

It's very clearly being steered!

./main --model models/7B/ggml-model-f16.bin --prompt "I think criminals are" --steering-add " Criminals earn money" --steering-sub " Criminals do bad things" --steering-source 6 --steering-layer 30 --steering-mul 5 --seed 554 --temp 1.0 --top_p 0.3 --frequency-penalty 1.0

system_info: n_threads = 12 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
Steering: ` Criminals earn money` - ` Criminals do bad things` * 5
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 1.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.300000, typical_p = 1.000000, temp = 1.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 I think criminals are more likely to be in the middle class than poor.
I'm not sure what you mean by "middle class". If it means a person who is employed and has some money, then yes they would have less crime because of that. But if it means someone with

@Azeirah (Contributor) commented May 18, 2023

So far I'm getting the best results with --steering-source 6 --steering-layer 6; as described in the post, it looks like the earlier layers are doing the "steering".

@Azeirah (Contributor) commented May 18, 2023

OK, I'm getting really good results now:

./main --model models/7B/ggml-model-f16.bin --prompt "In RGB space, magenta is the halfway point between " --steering-add "Python" --steering-sub "Ruby" --steering-source 5 --steering-layer 5 --steering-mul 5 --temp 1.0 --top_p 0.3 --frequency-penalty 1.0
main: build = 557 (1b0ff2c)
main: seed  = 1684446143
llama.cpp: loading model from models/7B/ggml-model-f16.bin
llama_model_load_internal: format     = ggjt v1 (pre #1405)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 1 (mostly F16)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  72.75 KB
llama_model_load_internal: mem required  = 14645.09 MB (+ 1026.00 MB per state)
llama_model_load_internal: [cublas] offloading 0 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 0 MB
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 12 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
Steering: `Python` - `Ruby` * 5
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 1.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.300000, typical_p = 1.000000, temp = 1.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 In RGB space, magenta is the halfway point between 0 and 1.
The Python package matplotlib has a function called rgb2hex that converts an RGB color to its hexadecimal representation (e.g., #ff7f50). The following code snippet shows how you can use this function:

It's very, very clearly being steered towards talking about programming in Python. Before this I was using the Love - Hate vector, and it was talking about how magenta is the color of Love, or the color of passionate loving, all the time.

I'll post my changes in a separate branch so you can compare; not all changes are relevant.

@Azeirah (Contributor) commented May 18, 2023

Branch with my changes: steering...Azeirah:llama.cpp:steering

@Azeirah (Contributor) commented May 18, 2023

I even got it to read a question in English and answer in Dutch (my language).

It's not at all reliable, but it's very clearly being steered in the direction of including a Dutch answer.

./main --model models/7B/ggml-model-f16.bin --prompt "Are humans animals?" --steering-add "Ik ben een Nederlander" --steering-sub " " --steering-source 3 --steering-layer 8 --steering-mul 3 --temp 1.0 --top_p 0.3 --frequency-penalty 1.0

Here are a couple of outputs that I found interesting:

 Are humans animals? v. 1
Ik ben een mens, ik ben geen dier! (Dutch)
Mensch, du bist kein Tier! (German)
Je ne suis pas un animal ! (French)
Non sono una bestia! (Italian) [end of text]
 Are humans animals? menselijk is niet een soort, maar de manier waarop we leven.
Ik ben ervan overtuigd dat ik mijn hele leven lang nog geen enkele keer zo'n goed artikel zal vinden als dit! [end of text]

(Being human is not a "sort", but a way of living. I'm convinced this is the last time in my life I'll find an article this good)

Here I lowered the temperature from 1.0 to 0.7, getting a really good answer.

 Are humans animals? menselijk is niet zo goed als we denken
Ik heb een boek gelezen van de Amerikaanse psycholoog Robert Sapolsky, die onderzoekt hoe het menselijke brein werkt. Hij heeft zich in zijn leven beziggehouden met veel verschillende dieren en is daarbij opvallend veel aandacht aan de mens gewijd.
Het boek heet "Being human" (2017) en gaat over hoe het brein werkt, maar ook over wat er in ons zelf aan de basis ligt van menselijke gedragingen. Hij geeft een heel goed beeld hoe het brein werkt en wat er in ons zelf aan de basis ligt van veel dierlijke, maar ook menselijke gedragingen. Zo blijft hij zich bijvoorbeeld steeds terugtrekken naar zijn onderzoek met apen om te zien of daar een vergelijking kan worden gemaakt tussen die en onze eigen fysiologie.

Translated by gpt-4 (I need to go to sleep otherwise I'd translate it myself xD)

Are humans animals? Human-like qualities are not as good as we think
I have read a book by the American psychologist Robert Sapolsky, who investigates how the human brain works. In his life, he has dealt with many different animals and has paid particular attention to humans.
The book is called "Being Human" (2017) and is about how the brain works, but also about what lies at the basis of human behavior within ourselves. He provides a very good picture of how the brain works and what lies at the basis of many animal, but also human behaviors. For example, he keeps retreating back to his research with monkeys to see if a comparison can be made between them and our own physiology.

My god, it's actually getting really clever in here:

Steering: `Python` - `Ruby` * 3
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 1.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.300000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 Are humans animals?
I'm not sure if this is the right place to ask, but I was wondering what people think about it. Do you consider Python a programming language or an animal? If so, why do you say that and how would you explain your answer to someone who doesn't know anything about computers/programming languages?

Azeirah and others added 2 commits May 19, 2023 (Signed-off-by: Henri Vasserman <henv@hot.ee>)
@SlyEcho (Sponsor Collaborator, Author) commented May 18, 2023

Awesome stuff. 🚀

I even got it to read a question in English and answer in Dutch (my language)

They didn't manage it with GPT-2, but it seems LLaMa is much better.

I need to go to sleep otherwise I'd translate it myself

Me too, and my time zone is an hour ahead of yours.


Some more experimentation ideas:

  • add more than one vector pair
  • add previous context to keep memory when the context fills up

@SlyEcho (Sponsor Collaborator, Author) commented May 19, 2023

The wedding example works now:

main -m ../models/llama-7b-q4_0.bin -n 64 --seed 123 \
  --steering-add "I talk about weddings constantly" \
  --steering-sub "I do not talk about weddings constantly" \
  --steering-source 5 \
  --steering-layer 5 \
  --steering-mul 3 \
  --prompt "I went up to my friend and said, '"

I went up to my friend and said, 'Weddings are not just for the bride and groom. Weddings are also for their friends.'
And then I realized: I talk about weddings all of the time! How did I not notice? And I started thinking, 'Well, who am I to be talking about this subject

Something you have to take care of: the prompt has to be longer than the steering text, otherwise it can cause interference. That's why I had to add , ' to the end of the example; influenced by the steering vector, the model would not add a space there.
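
A sketch of why the lengths matter, assuming the steering vector is added per position at the start of the context (the token counts below are made up):

import numpy as np

n_ctx, n_embd = 512, 4096
steer = np.zeros((n_ctx, n_embd), dtype=np.float32)

n_steer = 9   # tokens in the steering prompts
n_prompt = 6  # tokens in the user prompt
steer[:n_steer] = 1.0  # only the first n_steer positions are steered

# Positions n_prompt..n_steer-1 fall on tokens the model must generate,
# which is where the interference comes from.
overlap = max(0, n_steer - n_prompt)
print(f"{overlap} steered positions overlap generation")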

@Azeirah (Contributor) commented May 19, 2023

Something you have to take care of: the prompt has to be longer than the steering text, otherwise it can cause interference. That's why I had to add , ' to the end of the example; influenced by the steering vector, the model would not add a space there.

Ooh, that explains a lot of the weirder outputs I was seeing. A lot of them were copying the steering input verbatim in a strange way and then continuing as if that never happened.

I wonder if that affects the idea of possibly being able to inject something like a system prompt via the steering, though. It's something I want to look into.

@Azeirah (Contributor) commented May 19, 2023

What would also be really interesting is to see whether cached residual streams from smaller models would work in larger models.

I'll have a look at the architecture of llama to see if that idea makes any sense.

It would be insane if we'd be able to use really fast, small models to perform the steering for slower, larger models.

Edit: this likely wouldn't work well. Each LLaMa size is its own independently trained model. It's not the case that they trained only one large model and somehow pruned it into smaller ones, which is what my idea depends on.

This "might" still work if the earlier layers emerged to be similar, because of how machine learning models converge to similar lower layers (I recall from image recognition models that the lower layers basically always end up with the same features: recognizing basic shapes, lines, orientation, etc.).

It's not super likely, though, since the ordering of the features within a layer is most likely still completely different even if the features themselves are similar.

Still, it wouldn't hurt to try, but I'm not getting my hopes up.

@SlyEcho (Sponsor Collaborator, Author) commented May 19, 2023

I don't think it's likely to work. Training usually starts from a random state. The more important factor is that the embedding size is different: 4096 for 7B, 5120 for 13B, etc.

@FSSRepo (Collaborator) commented May 22, 2023

For --steering-source and --steering-layer, are the parameter values arbitrary, or is there a way to know which to use? Trial and error?

@Azeirah (Contributor) commented May 22, 2023

For --steering-source and --steering-layer, are the parameter values arbitrary, or is there a way to know which to use? Trial and error?

It's not random, but it takes a lot of trial and error to find what works well for a certain use case.

Generally it works well to have both low, at around 6 or 8, but for other use cases you might want to try different values.

@SlyEcho (Sponsor Collaborator, Author) commented May 22, 2023

I found that most of the time they should be the same. Lower numbers work at the lexical level, so you can make it say dirty words, while higher numbers are more abstract and can change the model's understanding of things more, but it is harder to influence it at that level. It is also possible to extract the vector from one layer and apply it at another; sometimes that works.

But really, there is a lot to research here. None of it is very rigorous yet. I haven't had time to do more testing.

@cryolite-ai commented
This is nothing more than an appreciative observer providing a place for people to put thumbs-up emojis to encourage @Azeirah and @SlyEcho to remember and further explore this really interesting thread of research activity.

Finding ways to 'tilt' a model is a super interesting concept (as are things which help visualise the state of the network).

Best regards from the wider observing llama.cpp community.

@Azeirah (Contributor) commented Sep 18, 2023

I still think this is a really cool idea, but I'm not sure whether classifier-free guidance offers similar benefits? Although it doesn't have a positive prompt, I suppose.

  --cfg-negative-prompt PROMPT
                        negative prompt to use for guidance. (default: empty)

I also think there is still a lot to be explored with steering vectors, especially in the area of stacking steering vectors.

I.e., what happens if you add a steering vector for "+Python -Ruby" and "+teacher explains code -As a large language model", or something like that? The cool thing is that it lets you set a goal and a personality for your AI without affecting performance and with no finetuning needed.

From what I understand, the parameter space of LLMs is so huge that you should be able to just additively stack steering vectors without them affecting each other in unwanted ways too much.
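
That intuition is easy to check numerically; a toy sketch with random stand-in vectors:

import numpy as np

rng = np.random.default_rng(0)
n_embd = 4096

# Hypothetical steering vectors for the two prompt pairs above.
v_lang = rng.normal(size=n_embd).astype(np.float32)
v_style = rng.normal(size=n_embd).astype(np.float32)

# Stacking is just addition in activation space.
combined = v_lang + v_style

# In high dimensions two random directions are nearly orthogonal,
# so their cosine similarity is close to 0 and they interfere little.
cos = float(v_lang @ v_style) / float(np.linalg.norm(v_lang) * np.linalg.norm(v_style))
print(f"cosine similarity: {cos:.4f}")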

@SlyEcho (Sponsor Collaborator, Author) commented Sep 18, 2023

It needs to be updated at least.

CFG is similar but it works on the token probability level.

The steering was producing pretty neat results, but it had the limitation that the vector's length meant the influence was applied only at the beginning of the context. Also, choosing the layers etc. was a bit experimental.
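
For contrast, a toy sketch of the usual CFG combination rule (made-up values; this mixes logits, not activations):

import numpy as np

logits_main = np.array([2.0, 0.5, 1.0])  # with the normal prompt
logits_neg  = np.array([1.5, 1.5, 0.2])  # with the negative prompt
scale = 1.5

# Push the distribution away from the negative prompt's predictions.
guided = logits_neg + scale * (logits_main - logits_neg)
print(guided)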
