Extracting EVO representations rather than logits #32
Here's how I'm currently solving this (adapted from the usage in the README):
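The idea is to swap the model's final unembedding projection for an identity, so the forward pass returns hidden states instead of vocabulary logits. A minimal sketch, assuming the README-style `Evo` API and that `StripedHyena` applies its final projection via `self.unembed.unembed(x)` (worth verifying against the version you have installed):

```python
import torch
from torch import nn
from evo import Evo

device = 'cuda:0'

# Load the model and tokenizer as in the README.
evo_model = Evo('evo-1-8k-base')
model, tokenizer = evo_model.model, evo_model.tokenizer
model.to(device)
model.eval()

# Swap the unembedding step for an identity so the forward pass
# returns final hidden states rather than vocabulary logits.
# (Assumes StripedHyena calls `self.unembed.unembed(x)` internally.)
class IdentityUnembed(nn.Module):
    def unembed(self, u):
        return u

model.unembed = IdentityUnembed()

sequence = 'ACGT'
input_ids = torch.tensor(
    tokenizer.tokenize(sequence),
    dtype=torch.int,
).to(device).unsqueeze(0)

with torch.inference_mode():
    embeddings, _ = model(input_ids)

print('Shape (batch, length, embed dim):', embeddings.shape)  # e.g. (1, 4, 4096)
```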
Note that this is for the model object returned by `evo-model`, which is an instance of `StripedHyena`. If you are using Hugging Face directly, that object is wrapped by `StripedHyenaModelForCausalLM`, so you first need to reach the inner model (see the sketch below).
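A sketch of what that might look like; the `backbone` attribute name here is an assumption about how `StripedHyenaModelForCausalLM` exposes the inner `StripedHyena` module, so check with `print(hf_model)` first:

```python
import torch
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained(
    'togethercomputer/evo-1-8k-base',  # illustrative checkpoint name
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# Assumed attribute: the wrapped StripedHyena instance.
inner_model = hf_model.backbone

# Then apply the same `unembed` patch as above to `inner_model`.
```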
Thanks @davidkell!
@davidkell I tried your code on an A100 40GB with the evo-8k model. Embedding the 4-letter sequence from the example costs over 400MB of GPU RAM, and the model itself needs 13GB. The embedding dimension is 4096, so I don't understand why it costs so much memory: 4 x 4096 in BF16 should only take 32KB, right? I tried to embed a 2kb sequence but always ran out of CUDA memory. Has anyone had a similar problem?
I had a similar experience. I was able to get inference working for 2k sequences on an A100 80GB (e.g. available on Paperspace), although around 2.5-3k I would get OOM. I haven't looked in depth at what is driving the memory requirement.
Quoting from issue #24:
So I think if you want to generate embeddings for longer sequences, you will need to manually shard across GPUs, set up CPU offloading, or something like that; a rough sketch of the offloading option is below.
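One way to try CPU offloading is Accelerate's automatic device map via `transformers`. This is only a sketch under the assumption that the Evo remote code tolerates layer-by-layer placement; the checkpoint name and memory budgets are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM

# device_map='auto' lets Accelerate spread layers across the GPU and CPU,
# offloading whatever exceeds the per-device budgets in max_memory.
model = AutoModelForCausalLM.from_pretrained(
    'togethercomputer/evo-1-8k-base',   # illustrative checkpoint name
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    max_memory={0: '35GiB', 'cpu': '120GiB'},  # illustrative budgets
)
```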
Hi, thanks for your amazing work!
How can I extract representations rather than logits from the model?
I am using the Hugging Face version, and I see the model returns `logits` and `past_key_values`. Could you please explain what's in `past_key_values`, and whether either of those can be used as a sequence representation? Or maybe you can suggest other ways to access the representations of the model?
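For reference, a generic way to read intermediate representations out of any PyTorch model, independent of the `unembed` patch above, is a forward hook. A minimal sketch reusing `model` and `input_ids` from the earlier snippet; the module path `model.blocks[-1]` is an assumption about StripedHyena's layout, so inspect `print(model)` to pick the right submodule:

```python
import torch

activations = {}

def save_output(name):
    def hook(module, inputs, output):
        # Depending on the block, `output` may be a tensor or a tuple.
        activations[name] = output
    return hook

# Assumed module path; adjust after inspecting the model.
handle = model.blocks[-1].register_forward_hook(save_output('last_block'))

with torch.inference_mode():
    _ = model(input_ids)

handle.remove()
hidden = activations['last_block']  # candidate sequence representation
```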