
[Research] Steering vectors #1472

Draft · wants to merge 9 commits into master
Conversation

@SlyEcho (Sponsor Collaborator) commented May 16, 2023

For #1460

Original paper: Steering GPT-2-XL by adding an activation vector

./main -m ... --seed 123 -n 64 \
  --steering-add "Love" \
  --steering-sub "Hate" \
  --steering-source 4 \
  --steering-layer 4 \
  --steering-mul 5 \
  --prompt "I hate you because "

I hate you because I am not here to take care of you.
Love is not about me taking care of you, it's about you taking care of me. The best thing I can do for you, is be the person God made me to be, to love you as He loves you, and to show you
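
For anyone following along, the arithmetic behind these flags is just activation addition. A minimal NumPy sketch with made-up data (llama.cpp has no Python API like this; the random vectors stand in for the activations captured at --steering-source):

import numpy as np

rng = np.random.default_rng(123)
n_embd = 4096  # LLaMA-7B embedding size

# Stand-ins for the residual-stream activations of the two steering
# prompts; real values would be captured from the model at --steering-source.
act_add = rng.normal(size=n_embd).astype(np.float32)  # "Love"
act_sub = rng.normal(size=n_embd).astype(np.float32)  # "Hate"

# The steering vector is their difference, scaled by --steering-mul.
steering = 5.0 * (act_add - act_sub)

# During generation it is added to the residual stream at --steering-layer.
residual = rng.normal(size=n_embd).astype(np.float32)
steered = residual + steering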

TODO: make a test script for all their examples and try to find the effect of the parameters.


I also wanted to see what the vectors look like, so I imported them into NumPy and plotted them:

import os

import numpy as np
from matplotlib import pyplot as plt

# np.fromfile does not expand "~", so expand it explicitly.
path = os.path.expanduser("~/src/llama.cpp/build/steering.bin")
steer = np.fromfile(path, dtype=np.float32).reshape((512, -1))

# Show three consecutive rows (token positions) as 32-row heatmaps.
fig, ax = plt.subplots(3)
for i in range(len(ax)):
    ax[i].imshow(steer[3 + i, :].reshape((32, -1)))
plt.show()

[image: heatmaps of three rows of the steering vector]

@Azeirah (Contributor) commented May 16, 2023

It's good to note that the authors of the post said they were going to try this out with Vicuna-13B as well, so we can see how it generalizes across different models.

@Azeirah (Contributor) commented May 16, 2023

Also, from a quick glance through your code, I saw that the steering-vector retrieval layer is always the same as the steering-vector add layer.

They also allow steering vectors sourced from earlier layers to be used at later layers, which might be necessary to get good behavior.

However, the norm of early-layer residual streams is significantly smaller than at later layers (like 20). In particular, we've found a large jump between layers 0 and 2. Let's try sourcing a steering vector from the residual stream just before layer 2, and then adding that layer-2 vector to layer 20.


I also didn't get much response from the higher layer numbers (like 20 in the paper).

Did you source the steering vector from a lower layer? That's what they do.

source = layer 2, add = layer 20

not

source = layer 20, add = layer 20
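
One way to check that norm jump here would be to dump one residual vector per layer and compare magnitudes. A sketch, assuming a hypothetical residuals.bin dump (llama.cpp does not write such a file):

import numpy as np

# Hypothetical dump with one residual-stream vector per layer, shaped
# (n_layer, n_embd); assumed here only to illustrate the norm comparison.
acts = np.fromfile("residuals.bin", dtype=np.float32).reshape((32, -1))

norms = np.linalg.norm(acts, axis=1)
for layer, n in enumerate(norms):
    print(f"layer {layer:2d}: |resid| = {n:.2f}")

# Rescale a layer-2 vector to layer-20 magnitude before injecting it there.
steer = acts[2] * (norms[20] / norms[2])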

@SlyEcho (Sponsor Collaborator, Author) commented May 16, 2023

Did you source the steering vector from a lower layer? That's what they do.

I didn't notice that in the article; all the mentions of layers are about a single layer and where they inserted the vector.

But it should be easy to test.

Commit: Fix typo (Co-authored-by: Extra Dosages <extradosages@gmail.com>)
@Azeirah (Contributor) commented May 17, 2023

I tried the code as-is and the parameters are clearly affecting the output, just not steering it.

I ran through the code and, if I understand it correctly, I think it's not computing the steering vector as described in the post. Let me know if you understand what I mean and whether you agree.

@SlyEcho (Sponsor Collaborator, Author) commented May 17, 2023

I know that it is computing something because I added a dump of the vector to disk, and the arithmetic seems to be working: add, subtract, and positive and negative coefficients all change the vector as expected.

I think maybe there is some difference between GPT-2 and LLaMa that makes it not work as-is; it could be that it needs a small tweak or something.

@Azeirah (Contributor) commented May 18, 2023

It's possible. I'm experimenting with different inputs and layer sources and targets. It's clearly affecting the output, but it just seems kind of random so far.

@SlyEcho (Sponsor Collaborator, Author) commented May 18, 2023

I was not really seeing anything working until I used a fixed seed; otherwise the results are too random.

I will try to test again over the weekend in some automated way.

@Azeirah (Contributor) commented May 18, 2023

Note that greedy sampling is also assumed here, as unembedding produces a distribution over next tokens, not a unique next-token prediction.

Because of this note, I think it's also important to use greedy sampling:

--top_p 0 --top_k 1
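
For reference, greedy sampling just takes the argmax over the logits at each step; a toy sketch with made-up values:

import numpy as np

# Greedy decoding picks the single most likely next token each step,
# which is effectively what --top_k 1 gives you. Toy logits:
logits = np.array([1.2, 3.4, 0.5, 2.7], dtype=np.float32)
next_token = int(np.argmax(logits))
print(next_token)  # -> 1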

@Azeirah (Contributor) commented May 18, 2023

An author of the post confirmed that their method works well with vicuna 13B: https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector?commentId=eket7tugMDJgBYfwP

I tried LLaMa 13B but I'm getting similarly poor results to LLaMa 7B.

It makes me think there is something missing in this implementation; I'm not sure what.

@SlyEcho (Sponsor Collaborator, Author) commented May 18, 2023

I found this notebook, also linked in the article: https://colab.research.google.com/drive/1y84fhgkGX0ft2DmYJB3K13lAyf-0YonK

Gonna look it over.

@Azeirah (Contributor) commented May 18, 2023

I found this notebook, also linked in the article: https://colab.research.google.com/drive/1y84fhgkGX0ft2DmYJB3K13lAyf-0YonK

Gonna look it over.

Can you also check the review comment I wrote? I do think I found an actual mistake this time.

Fixing it didn't improve the results though :(

@Azeirah (Contributor) commented May 18, 2023

Is what they call the residual stream the same as what the LLaMa source code calls inpL?

@SlyEcho (Sponsor Collaborator, Author) commented May 18, 2023

I think it's actually closer to inpSA, but they are the same at the beginning.
A residual is where the input is added back to the output of the attention block.

EDIT: actually, modifying inpSA would give a much different result. It could be significant.
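
To make the naming concrete, a toy sketch of one decoder block, with comments marking where inpSA sits relative to the residual stream (norm/attn/ffn are stand-ins, not the real operators):

import numpy as np

def norm(x): return x / (np.linalg.norm(x) + 1e-6)
def attn(x): return 0.1 * x
def ffn(x):  return 0.1 * x

def decoder_block(inpL):
    inpSA = inpL              # saved input, re-added below (llama.cpp's inpSA)
    cur = attn(norm(inpL))
    cur = cur + inpSA         # residual stream after attention
    inpFF = cur
    cur = ffn(norm(cur))
    return cur + inpFF        # residual stream after the FFN

print(decoder_block(np.ones(8, dtype=np.float32)))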

@Azeirah (Contributor) commented May 18, 2023

I think it's actually closer to inpSA, but they are the same at the beginning. A residual is where the input is added back to the output of the attention block.

EDIT: actually, modifying inpSA would give a much different result. It could be significant.

It's clearly doing more than modifying inpL does. I modified the code to do so; see my output:

Note that I printed the add and sub prompts to stdout. Clearly it's "thinking" about business in some way. The quality's not quite there, but I think this is getting a lot closer!

[image: terminal output of the steered generation]

@Azeirah (Contributor) commented May 18, 2023

It's very clearly being steered!

./main --model models/7B/ggml-model-f16.bin --prompt "I think criminals are" --steering-add " Criminals earn money" --steering-sub " Criminals do bad things" --steering-source 6 --steering-layer 30 --steering-mul 5 --seed 554 --temp 1.0 --top_p 0.3 --frequency-penalty 1.0

system_info: n_threads = 12 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
Steering: ` Criminals earn money` - ` Criminals do bad things` * 5
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 1.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.300000, typical_p = 1.000000, temp = 1.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 I think criminals are more likely to be in the middle class than poor.
I'm not sure what you mean by "middle class". If it means a person who is employed and has some money, then yes they would have less crime because of that. But if it means someone with

@Azeirah (Contributor) commented May 18, 2023

So far I'm getting the best results with --steering-source 6 --steering-layer 6; as described in the post, it looks like the earlier layers are doing the "steering".

@Azeirah (Contributor) commented May 18, 2023

OK, I'm getting really good results now:

./main --model models/7B/ggml-model-f16.bin --prompt "In RGB space, magenta is the halfway point between " --steering-add "Python" --steering-sub "Ruby" --steering-source 5 --steering-layer 5 --steering-mul 5 --temp 1.0 --top_p 0.3 --frequency-penalty 1.0
main: build = 557 (1b0ff2c)
main: seed  = 1684446143
llama.cpp: loading model from models/7B/ggml-model-f16.bin
llama_model_load_internal: format     = ggjt v1 (pre #1405)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 1 (mostly F16)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  72.75 KB
llama_model_load_internal: mem required  = 14645.09 MB (+ 1026.00 MB per state)
llama_model_load_internal: [cublas] offloading 0 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 0 MB
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 12 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
Steering: `Python` - `Ruby` * 5
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 1.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.300000, typical_p = 1.000000, temp = 1.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 In RGB space, magenta is the halfway point between 0 and 1.
The Python package matplotlib has a function called rgb2hex that converts an RGB color to its hexadecimal representation (e.g., #ff7f50). The following code snippet shows how you can use this function:

It's very, very clearly being steered towards talking about programming in Python. Before this I was using the Love - Hate vector, and it was talking about how magenta is the color of Love, or the color of passionate loving, all the time.

I'll post my changes in a separate branch so you can compare; not all changes are relevant.

@Azeirah (Contributor) commented May 18, 2023

Branch with my changes: steering...Azeirah:llama.cpp:steering

@Azeirah (Contributor) commented May 18, 2023

I even got it to read a question in English and answer in Dutch (my language).

It's not at all reliable, but it's very clearly being steered in the direction of including a Dutch answer.

./main --model models/7B/ggml-model-f16.bin --prompt "Are humans animals?" --steering-add "Ik ben een Nederlander" --steering-sub " " --steering-source 3 --steering-layer 8 --steering-mul 3 --temp 1.0 --top_p 0.3 --frequency-penalty 1.0

Here are a couple of outputs that I found interesting:

 Are humans animals? v. 1
Ik ben een mens, ik ben geen dier! (Dutch)
Mensch, du bist kein Tier! (German)
Je ne suis pas un animal ! (French)
Non sono una bestia! (Italian) [end of text]
 Are humans animals? menselijk is niet een soort, maar de manier waarop we leven.
Ik ben ervan overtuigd dat ik mijn hele leven lang nog geen enkele keer zo'n goed artikel zal vinden als dit! [end of text]

(Being human is not a "sort", but a way of living. I'm convinced this is the last time in my life I'll find an article this good)

Here I lowered the temperature from 1.0 to 0.7, getting a really good answer.

 Are humans animals? menselijk is niet zo goed als we denken
Ik heb een boek gelezen van de Amerikaanse psycholoog Robert Sapolsky, die onderzoekt hoe het menselijke brein werkt. Hij heeft zich in zijn leven beziggehouden met veel verschillende dieren en is daarbij opvallend veel aandacht aan de mens gewijd.
Het boek heet "Being human" (2017) en gaat over hoe het brein werkt, maar ook over wat er in ons zelf aan de basis ligt van menselijke gedragingen. Hij geeft een heel goed beeld hoe het brein werkt en wat er in ons zelf aan de basis ligt van veel dierlijke, maar ook menselijke gedragingen. Zo blijft hij zich bijvoorbeeld steeds terugtrekken naar zijn onderzoek met apen om te zien of daar een vergelijking kan worden gemaakt tussen die en onze eigen fysiologie.

Translated by gpt-4 (I need to go to sleep otherwise I'd translate it myself xD)

Are humans animals? Human-like qualities are not as good as we think
I have read a book by the American psychologist Robert Sapolsky, who investigates how the human brain works. In his life, he has dealt with many different animals and has paid particular attention to humans.
The book is called "Being Human" (2017) and is about how the brain works, but also about what lies at the basis of human behavior within ourselves. He provides a very good picture of how the brain works and what lies at the basis of many animal, but also human behaviors. For example, he keeps retreating back to his research with monkeys to see if a comparison can be made between them and our own physiology.

My god, it's actually getting really clever in here:

Steering: `Python` - `Ruby` * 3
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 1.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.300000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 Are humans animals?
I'm not sure if this is the right place to ask, but I was wondering what people think about it. Do you consider Python a programming language or an animal? If so, why do you say that and how would you explain your answer to someone who doesn't know anything about computers/programming languages?

Azeirah and others added 2 commits May 19, 2023 (Signed-off-by: Henri Vasserman <henv@hot.ee>)
@SlyEcho (Sponsor Collaborator, Author) commented May 18, 2023

Awesome stuff. 🚀

I even got it to read a question in English and answer in Dutch (my language)

They didn't manage it with GPT-2, but it seems LLaMa is much better.

I need to go to sleep otherwise I'd translate it myself

Me too, and my time zone is an hour ahead of yours.


Some more experimentation ideas:

  • add more than one vector pair
  • add previous context to keep memory when the context fills up

@SlyEcho (Sponsor Collaborator, Author) commented May 19, 2023

The wedding example works now:

main -m ../models/llama-7b-q4_0.bin -n 64 --seed 123 \
  --steering-add "I talk about weddings constantly" \
  --steering-sub "I do not talk about weddings constantly" \
  --steering-source 5 \
  --steering-layer 5 \
  --steering-mul 3 \
  --prompt "I went up to my friend and said, '"

I went up to my friend and said, 'Weddings are not just for the bride and groom. Weddings are also for their friends.'
And then I realized: I talk about weddings all of the time! How did I not notice? And I started thinking, 'Well, who am I to be talking about this subject

Something you have to take care of: the prompt has to be longer than the steering text, otherwise it can cause interference. That's why I had to add , ' to the end of the example; influenced by the steering vector, the model would not add a space there.
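
A sketch of why the lengths matter, assuming the steering vector is added per position at the start of the context (the token counts below are made up):

import numpy as np

n_ctx, n_embd = 512, 4096
steer = np.zeros((n_ctx, n_embd), dtype=np.float32)

n_steer = 9   # tokens in the steering prompts
n_prompt = 6  # tokens in the user prompt
steer[:n_steer] = 1.0  # only the first n_steer positions are steered

# Positions n_prompt..n_steer-1 fall on tokens the model must generate,
# which is where the interference comes from.
overlap = max(0, n_steer - n_prompt)
print(f"{overlap} steered positions overlap generation")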

@Azeirah (Contributor) commented May 19, 2023

Something you have to take care of: the prompt has to be longer than the steering text, otherwise it can cause interference. That's why I had to add , ' to the end of the example; influenced by the steering vector, the model would not add a space there.

Ooh, that explains a lot of the weirder outputs I was seeing. A lot of them were copying the steering input verbatim in a strange way and then continuing as if that never happened.

I wonder if that affects the idea of possibly being able to inject something like a system prompt via the steering, though. It's something I want to look into.

@Azeirah (Contributor) commented May 19, 2023

What would also be really interesting is to see whether cached residual streams from smaller models would work in larger models.

I'll have a look at the architecture of llama to see if that idea makes any sense.

It would be insane if we'd be able to use really fast, small models to perform the steering for slower, larger models.

Edit: this likely wouldn't work well. Each LLaMa size is its own independently trained model. It's not the case that they trained only one large model and somehow pruned it into smaller ones, which is what my idea depends on.

This "might" still work if the earlier layers emerged to be similar, because of how machine learning models converge to similar lower layers (I recall from image recognition models that the lower layers basically always end up with the same features: recognizing basic shapes, lines, orientation, etc.).

It's not super likely, though, since the ordering of the features within a layer is most likely still completely different even if the features themselves are similar.

Still, it wouldn't hurt to try, but I'm not getting my hopes up.

@SlyEcho (Sponsor Collaborator, Author) commented May 19, 2023

I don't think it's likely to work. Training usually starts from a random state. The more important factor is that the embedding size is different: 4096 for 7B, 5120 for 13B, etc.

@FSSRepo (Collaborator) commented May 22, 2023

For --steering-source and --steering-layer, are the parameter values arbitrary, or is there a way to know which to use? Trial and error?

@Azeirah (Contributor) commented May 22, 2023

For --steering-source and --steering-layer, are the parameter values arbitrary, or is there a way to know which to use? Trial and error?

It's not random, but it takes a lot of trial and error to find what works well for a certain use case.

Generally it works well to have both low, at around 6 or 8, but for other use cases you might want to try different values.

@SlyEcho (Sponsor Collaborator, Author) commented May 22, 2023

I found that most of the time they should be the same. Lower numbers work at the lexical level, so you can make it say dirty words, while higher numbers are more abstract and can change the model's understanding of things more, but it is harder to influence it at that level. It is also possible to extract the vector from one layer and apply it at another; sometimes that works.

But really, there is a lot to research here. None of it is very rigorous yet. I haven't had time to do more testing.

@cryolite-ai commented
This is nothing more than an appreciative observer providing a place for people to put thumbs-up emojis to encourage @Azeirah and @SlyEcho to remember and further explore this really interesting thread of research activity.

Finding ways to 'tilt' a model is a super interesting concept (as are things which help visualise the state of the network).

Best regards from the wider observing llama.cpp community.

@Azeirah (Contributor) commented Sep 18, 2023

I still think this is a really cool idea, but I'm not sure whether classifier-free guidance offers similar benefits? Although it doesn't have a positive prompt, I suppose.

  --cfg-negative-prompt PROMPT
                        negative prompt to use for guidance. (default: empty)

I also think there is still a lot to be explored with steering vectors, especially in the area of stacking steering vectors.

I.e., what happens if you add a steering vector for "+Python -Ruby" and "+teacher explains code -As a large language model", or something like that? The cool thing is that it lets you set a goal and a personality for your AI without affecting performance and with no finetuning needed.

From what I understand, the parameter space of LLMs is so huge that you should be able to just additively stack steering vectors without them affecting each other in unwanted ways too much.
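
That intuition is easy to check numerically; a toy sketch with random stand-in vectors:

import numpy as np

rng = np.random.default_rng(0)
n_embd = 4096

# Hypothetical steering vectors for the two prompt pairs above.
v_lang = rng.normal(size=n_embd).astype(np.float32)
v_style = rng.normal(size=n_embd).astype(np.float32)

# Stacking is just addition in activation space.
combined = v_lang + v_style

# In high dimensions two random directions are nearly orthogonal,
# so their cosine similarity is close to 0 and they interfere little.
cos = float(v_lang @ v_style) / float(np.linalg.norm(v_lang) * np.linalg.norm(v_style))
print(f"cosine similarity: {cos:.4f}")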

@SlyEcho (Sponsor Collaborator, Author) commented Sep 18, 2023

It needs to be updated at least.

CFG is similar but it works on the token probability level.

The steering was producing pretty neat results, but it had the limitation that the vector's length meant the influence was applied only at the beginning of the context. Also, choosing the layers etc. was a bit experimental.
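
For contrast, a toy sketch of the usual CFG combination rule (made-up values; this mixes logits, not activations):

import numpy as np

logits_main = np.array([2.0, 0.5, 1.0])  # with the normal prompt
logits_neg  = np.array([1.5, 1.5, 0.2])  # with the negative prompt
scale = 1.5

# Push the distribution away from the negative prompt's predictions.
guided = logits_neg + scale * (logits_main - logits_neg)
print(guided)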
