Replace embedding layer if necessary: torch.nn.Embedding(..) -> bnb.nn.Embedding(..)
Does this assume that users create custom classes to replace (for example) Hugging Face Transformers' GPT2DoubleHeadsModel?
Or is there something like bnb.optim.GlobalOptimManager that changes a provided model instance to use bitsandbytes embeddings instead of the torch ones?
Currently, a replacement is required, since the StableEmbedding layer also adds a layer norm. This is critical for pretraining models. If you are fine-tuning, you do not need the StableEmbedding layer (this holds for GLUE; I am not sure about fine-tuning GPT-2 or seq-to-seq models).
If you want to use 32-bit optimizers for the embedding, but without layer norm, you can add the following code after the embedding class is defined in the GPT2DoubleHeadsModel:
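The code block from this comment was not preserved in the page capture; the following is a sketch of what it plausibly looked like, based on the GlobalOptimManager override API in bitsandbytes. The checkpoint name and the model.transformer.wte attribute path are illustrative assumptions:

```python
import bitsandbytes as bnb
from transformers import GPT2DoubleHeadsModel

model = GPT2DoubleHeadsModel.from_pretrained("gpt2")  # checkpoint name is illustrative

# Force 32-bit optimizer states for the token-embedding weights while the
# rest of the model can still be optimized in 8-bit. Unlike
# bnb.nn.StableEmbedding, this does not add a layer norm.
manager = bnb.optim.GlobalOptimManager.get_instance()
manager.register_module_override(model.transformer.wte, "weight", {"optim_bits": 32})

model = model.cuda()
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)
```

Registering the override before the first optimizer step ensures it is picked up when the optimizer state for that parameter is initialized.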
This will add further stability, especially for seq-to-seq or language-model fine-tuning. I would recommend replacing the embedding with the StableEmbedding layer if you pretrain from scratch.
A standard Embedding layer has been added that is very easy to use in place of torch.nn.Embedding. The bnb.nn.Embedding class ensures that optimization happens in 32-bit for the embedding layer, even if the rest of the model is optimized with 8-bit optimizers. Thank you for this suggestion!
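As a minimal usage sketch of that drop-in replacement (the vocabulary and embedding sizes here are illustrative):

```python
import bitsandbytes as bnb

# Drop-in replacement for torch.nn.Embedding; the optimizer will keep
# 32-bit states for this layer's weights even under an 8-bit optimizer.
emb = bnb.nn.Embedding(num_embeddings=50257, embedding_dim=768)

# For pretraining from scratch, the variant that additionally applies a
# layer norm is recommended above:
stable_emb = bnb.nn.StableEmbedding(num_embeddings=50257, embedding_dim=768)
```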