Support for LLMs like LLaMA-2 and Vicuna? #98
That would be valuable indeed, if you have the bandwidth for it!

Sure!
Hi @jalammar, here is the model config I used:

```python
model_config = {
    'embedding': "model.embed_tokens",
    'type': 'causal',
    'activations': ['down_proj'],  # this is a regex
    'token_prefix': '_',
    'partial_token_prefix': ''
}
```

However, as the model becomes larger and larger, Ecco occupies a significant amount of GPU memory. I'd like to contribute some memory-optimization options. Could you point me to where Ecco allocates GPU memory?
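For readers landing here: below is a minimal sketch of how such a config might be passed to Ecco. It assumes `ecco.from_pretrained` accepts a `model_config` keyword, as in the fork discussed in this thread; the Vicuna model id and generation arguments are illustrative, not confirmed.

```python
import ecco

# Custom config mapping Ecco onto a LLaMA-family architecture (assumed).
model_config = {
    'embedding': "model.embed_tokens",   # module path of the token embedding
    'type': 'causal',                    # decoder-only language model
    'activations': ['down_proj'],        # regex matched against MLP module names
    'token_prefix': '_',
    'partial_token_prefix': ''
}

# Illustrative model id; loading a 7B model needs substantial GPU memory.
lm = ecco.from_pretrained(
    "lmsys/vicuna-7b-v1.5",
    model_config=model_config,
    activations=True,
)

output = lm.generate("Interpretability of LLMs is", generate=10, do_sample=False)
```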
Thanks for this comment! I tried this model config with the ecco library pulled from @BiEchi's repo, and it works well on Vicuna.
Hi, I am trying to install ecco in Google Colab but am getting an error. The output begins with:

Collecting ecco

And I can't seem to find a way to fix it.
Hi, I was just wondering: should 'embedding' be "model.embed_tokens" or "model.embed_tokens.weight"? Thanks a lot.
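For what it's worth, the two strings name different things: `model.embed_tokens` is the embedding *module*, while `model.embed_tokens.weight` is its parameter tensor. Judging from the working config earlier in this thread, the module path is the one to use. A quick way to check the paths (a sketch assuming a LLaMA-family checkpoint from transformers; the model id is illustrative):

```python
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; any LLaMA-family model exposes the same paths.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

print(type(model.model.embed_tokens))         # torch.nn.Embedding (a module)
print(model.model.embed_tokens.weight.shape)  # its weight tensor (a parameter)
```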
@EricPeter Please open a separate issue for this.
Hi @jalammar, |
@verazuo Please correct me if you see any good results applying these methods to LLMs. I can reopen this issue if it's still promising.
Hi @BiEchi
@Dongximing this result makes sense for an LLM, because saliency and integrated-gradients methods perform poorly on complex models. They were simply not developed to interpret LLMs: when proposed, these methods were applied to small models like CNNs and, at most, LSTMs. Later they were shown to work on GPT-2, as that model is still not too complex. When it comes to LLaMA, backprop becomes extremely expensive, and the results become unreliable because of the linearity assumption behind saliency methods.
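To make concrete what these methods compute (and why they need a full backward pass through the model), here is a minimal gradient-times-input saliency sketch. This is illustrative code, not Ecco's implementation; GPT-2 is used only because it is small enough to run anywhere.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)

logits = model(inputs_embeds=embeds).logits
# Backprop the top next-token logit to the input embeddings:
logits[0, -1].max().backward()

# One saliency score per input token; for a 7B model this backward
# pass costs several times the forward pass of generation alone.
saliency = (embeds.grad * embeds).sum(-1).abs().squeeze(0)
```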
Hi @BiEchi
Hi @Dongximing yes, large models are significantly more costly than small models, because backprop costs several times the forward pass and is computationally heavy. GPT-2 is 1.5B parameters, while LLaMA-2 is 7B.
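As a rough back-of-the-envelope estimate (assuming compute per token scales linearly with parameter count and a backward pass costs about twice a forward pass):

```python
# Rough per-token FLOP estimate: forward ~= 2 * params,
# backward ~= 2 * forward, so one attribution step ~= 6 * params.
gpt2_params   = 1.5e9
llama2_params = 7.0e9

print(llama2_params / gpt2_params)   # ~4.7x more compute per token
print(6 * llama2_params)             # ~4.2e10 FLOPs per attributed token
```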
If possible, please share it. Thanks.
A newer method that may work better with these models is Contrastive Explanations (https://arxiv.org/abs/2202.10419). You can try an implementation of it in Inseq: https://github.com/inseq-team/inseq
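A hedged sketch of what contrastive attribution looks like in Inseq, based on its documented `contrast_prob_diff` step function; argument names may differ between Inseq versions, and GPT-2 stands in here for a larger model:

```python
import inseq

# Load a model paired with a gradient-based attribution method.
model = inseq.load_model("gpt2", "input_x_gradient")

# Attribute one continuation *against* a contrastive alternative:
# which input tokens make "barking" more likely than "crying"?
out = model.attribute(
    "Can you stop the dog from",
    "Can you stop the dog from barking",
    attributed_fn="contrast_prob_diff",
    contrast_targets="Can you stop the dog from crying",
)
out.show()
```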
Dear @jalammar,
Greetings! I'm writing to check whether there are any updates on LLM support. I'd be happy to contribute to this part if you think it would be valuable.
Jack