
Support for LLMs like LLAMA-2 and Vicuna? #98

Closed
BiEchi opened this issue Jul 21, 2023 · 15 comments

Comments

@BiEchi
Contributor

BiEchi commented Jul 21, 2023

Dear @jalammar ,
Greetings! I'm writing to check whether there are any updates on adding LLM support. I can contribute to this part if you think it would be valuable.

Jack

@jalammar
Owner

That would be valuable indeed, if you have the bandwidth for it! Sure!

@BiEchi
Contributor Author

BiEchi commented Jul 28, 2023

Hi @jalammar ,
Ecco is actually compatible with the current LLAMA 2 model because it's still causal. The config looks like this:

model_config = {
    'embedding': "model.embed_tokens",
    'type': 'causal',
    'activations': ['down_proj'],  # this is a regex
    'token_prefix': '_',
    'partial_token_prefix': ''
}

However, as models get larger and larger, Ecco occupies a significant amount of GPU memory. I'd like to contribute some memory optimization options. Could you point me to where Ecco allocates GPU memory?

@verazuo

verazuo commented Jul 31, 2023

Thanks for this comment! I tried this model config with the ecco fork from @BiEchi's repo, and it works well on Vicuna.
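
For anyone who wants to reproduce this, here is a minimal sketch of loading Vicuna with the config above (my addition; the checkpoint id is an assumption, not something stated in this thread):

import ecco

model_config = {
    'embedding': "model.embed_tokens",
    'type': 'causal',
    'activations': ['down_proj'],  # this is a regex
    'token_prefix': '_',
    'partial_token_prefix': ''
}

# assumed Vicuna checkpoint; any LLaMA-architecture Vicuna release uses the same module names
lm = ecco.from_pretrained("lmsys/vicuna-7b-v1.5",
                          model_config=model_config,
                          activations=False,
                          gpu=False)  # set gpu=True if you have enough VRAM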

@EricPeter

Hi, I am trying to install ecco in Google Colab but am getting this error:

Collecting ecco

Using cached ecco-0.1.2-py2.py3-none-any.whl (70 kB)
Collecting transformers~=4.2 (from ecco)
Using cached transformers-4.31.0-py3-none-any.whl (7.4 MB)
Requirement already satisfied: seaborn~=0.11 in /usr/local/lib/python3.10/dist-packages (from ecco) (0.12.2)
Collecting scikit-learn~=0.23 (from ecco)
Using cached scikit-learn-0.24.2.tar.gz (7.5 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
error: subprocess-exited-with-error

× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (pyproject.toml) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.  

I can't seem to find a way to fix it.

@Dongximing

Hi, I was just wondering: should 'embedding' be "model.embed_tokens" or "model.embed_tokens.weight"? Could you share some sample code showing how you used it?

Thanks a lot.

@BiEchi
Contributor Author

BiEchi commented Aug 25, 2023

@EricPeter Please open a separate issue for this.

@BiEchi
Contributor Author

BiEchi commented Aug 25, 2023

@Dongximing


text = """The first presient of US is """

print("===== Attribution Method =====")
attribution_method = 'dl'
print(attribution_method)
tokenizer = AutoTokenizer.from_pretrained(model, torch_dtype=dtype)

model_config = {
    'embedding': "model.embed_tokens",
    'type': 'causal',
    'activations': ['down_proj'],
    'token_prefix': '_',
    'partial_token_prefix': ''
}

lm = ecco.from_pretrained(model,
                          activations=False,
                          model_config=model_config,
                          gpu=False
                          )
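
As a hedged follow-up to the snippet above (my addition, following the call pattern in Ecco's readme examples; treat the exact argument names as an assumption), an attributed generation would then look roughly like this:

# run a short generation and compute input attributions for each generated token
output = lm.generate(text, generate=5, do_sample=False, attribution=['ig'])

# visualize which input tokens each generated token attributed to
output.primary_attributions(attr_method='ig')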

@BiEchi
Contributor Author

BiEchi commented Aug 25, 2023

Hi @jalammar,
I'm closing this issue as I've got some conclusive results. Gradient-based saliency methods are computationally heavy for LLM generation: each generated token requires a backward pass, which takes a lot of GPU memory, and the time cost is also very high.
The quality of gradient-based saliency methods is also very low on LLMs, because these methods assume the model is well approximated by its first-order Taylor expansion, i.e., by a linear (affine) model. As the model becomes more complex and less linear, the results of gradient-based methods degrade significantly. I've tested naive gradient, integrated gradients, and input × gradient, and none of them perform well on LLAMA-2 (7B), although the results are quite good on GPT-2 (1.5B).
I'd therefore suggest that people following this thread give up on applying gradient-based methods to LLAMA-2. Perturbation-based methods may make a difference, but I'll open a separate thread to discuss them.
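
For readers unfamiliar with the assumption mentioned above, this is the local-linearity approximation that gradient-based attributions rely on (the notation is mine, added for illustration):

f(x) \approx f(x_0) + \nabla f(x_0)^\top (x - x_0), \qquad \text{attribution}_i = x_i \cdot \frac{\partial f(x)}{\partial x_i}

When f is far from linear around the input, as with a 7B-parameter model, the gradient is a poor local summary of the model's behavior, which matches the drop in attribution quality described above.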

@BiEchi BiEchi closed this as completed Aug 25, 2023
@BiEchi
Contributor Author

BiEchi commented Aug 25, 2023

@verazuo Please correct me if you see any good results applying these methods to LLMs. I can reopen this issue if it still looks promising.

@Dongximing

(quoting @BiEchi's conclusion above about gradient-based saliency methods on LLAMA-2)

Hi @BiEchi,
The result I get looks like just a normal distribution. For example, with the prompt "please tell me this sentence positive or negative: I love you. the answer is [model output]", I got the result shown below. Is there some benchmark for evaluating the quality of a model explanation? What do you think of this result?
Thank you very much. By the way, since you said the linearity assumption doesn't hold, could you recommend some other algorithm in this ecco library?
[attached screenshot of the attribution result]

@BiEchi
Contributor Author

BiEchi commented Aug 28, 2023

@Dongximing this result makes sense for an LLM, because saliency / integrated-gradient methods perform very badly on complex models. They were not developed to interpret LLMs: when they were proposed, these methods were applied to small models like CNNs and at most LSTMs. They later worked reasonably on GPT-2 because that model is still not too complex. When it comes to LLAMA, backprop becomes extremely expensive, and the results become unreliable due to the linearity assumption behind the saliency methods.
For other algorithms, I'll come back to you in a few days.

@Dongximing

Hi @BiEchi,
Is it possible that this causes out-of-memory errors when the output length increases, e.g. when max_new_tokens = 1000 or the input size is 1000? Also, do you know of other open-source tools for model explanation in NLP?
Thanks.

@BiEchi
Contributor Author

BiEchi commented Sep 7, 2023

Hi @Dongximing, yes, large models are significantly more costly than small models because backprop over them is several times larger and more computationally heavy. GPT-2 is 1.5B parameters, while LLAMA-2 is 7B.
When the input sequence grows, each output token has to attribute over more input tokens, so memory usage increases. For the output sequence, memory and time clearly increase as it gets longer, because you do one backward pass for each output token (a rough sketch of that loop is below).
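
To make the per-token cost concrete, here is a hedged illustration I'm adding for clarity (not code from Ecco; the model id is a placeholder): each generated token triggers one full backward pass over the input.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; a LLaMA-2 checkpoint works the same way but needs far more memory
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

input_ids = tok("The first president of the US is", return_tensors="pt").input_ids
generated = input_ids
saliencies = []
for _ in range(5):  # 5 generated tokens -> 5 backward passes
    embeds = model.get_input_embeddings()(generated).detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits
    next_id = logits[0, -1].argmax()
    logits[0, -1, next_id].backward()            # backprop reaches every input position
    saliencies.append(embeds.grad.norm(dim=-1))  # per-token gradient norm for this step
    generated = torch.cat([generated, next_id.view(1, 1)], dim=-1)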
For other attribution methods, I'd suggest using LIME or SHAP as provided by Captum. You can write some code in Ecco to hook into them; I've got this working locally with Ecco. If you're interested, please leave a comment here. If this comment gets enough interest, I'll try to clean up my code and release it.

@Dongximing

If possible, please share it. Thanks!

@jalammar
Owner

A new method that could work better with these models is Contrastive Explanations (https://arxiv.org/abs/2202.10419). You can try an implementation of it in Inseq (https://github.com/inseq-team/inseq).
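
A minimal Inseq sketch along those lines (my addition; the model id and attribution method are placeholders, and the contrastive setup itself is described in the Inseq docs rather than shown here):

import inseq

# load a causal LM together with a gradient-based attribution method
model = inseq.load_model("gpt2", "input_x_gradient")  # swap in a LLaMA-2 checkpoint if resources allow

# attribute a short generation and display the token-level attributions
out = model.attribute("The first president of the US is")
out.show()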
