Make `_fast_init` fast again (by surely skipping model weights init)! #26258
Comments
Interesting, could you please describe how you tested this? This sounds like a bug.
Hi @BenjaminBossan, this is how to test the slow loading issue: load a model and measure the timing, then try some or all of the loading variations (a sketch follows below).
See my testing notebook as a gist here: https://gist.github.com/poedator/792d6c7528a1bc5a84acb550268777ed
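A rough sketch of the kind of timing comparison from the notebook (the checkpoint name is just an example; any multi-billion-parameter model shows the effect):

```python
import time

from transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # example checkpoint, assumed already cached

for kwargs in (
    {},                           # default loading path
    {"_fast_init": False},        # disable the fast-init path explicitly
    {"low_cpu_mem_usage": True},  # accelerate-based loading
):
    start = time.perf_counter()
    model = AutoModelForCausalLM.from_pretrained(model_name, **kwargs)
    print(kwargs, f"-> {time.perf_counter() - start:.1f}s")
    del model
```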
Thanks for providing the context and notebook. I could replicate your results and also confirmed that the model produces the same output in both cases. This smells like a bug to me; maybe @ArthurZucker can take a look.
Definitely interesting, I'll have a look!
@poedator thanks a lot for the deep investigation - do you observe the same behaviour with `low_cpu_mem_usage=True`?
@younesbelkada, thank you for your interest in supporting SpQR in the HF ecosystem. Let me discuss with my teammates the best way to do this, and then I will get back to you.
One possible solution is mentioned here: #18505
I met the same issue. I also have another specific scenario, where I want to randomly initialize a large model for debugging, so I just want a very fast initialization. I tried:

```python
config = AutoConfig.from_pretrained(model_name)
model = AutoModelForCausalLM.from_config(config)
```

I found it is even slower than just loading the weights:

```python
model = AutoModelForCausalLM.from_pretrained(model_name, _fast_init=True, low_cpu_mem_usage=True)
```

So I wonder if there is a way to fast-initialize a very large model (without running any initialization algorithm) using `from_config`. Thank you very much!
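One known workaround for getting a model skeleton almost instantly is accelerate's `init_empty_weights`, though it creates meta tensors with no real storage, so it does not directly cover the random-init-for-debugging use case:

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(model_name)
with init_empty_weights():
    # Parameters land on the "meta" device: creation is near-instant,
    # but the tensors hold no data and cannot run a forward pass as-is.
    model = AutoModelForCausalLM.from_config(config)
```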
Ouch, sorry about that! Was off for a bit, and it's planned! Will try to open a draft PR asap.
Update 🤗
On the main branch of Transformers, I observe the following:
The goal is to still have fast init without accelerate.
I observed that loading a pre-trained model takes rather long, even when loading cached models from a fast SSD. It is especially noticeable when dealing with LLMs with billions of weights.
Apparently, the majority of the time is lost in the weight-initialization section of the loading code.
Time spent on weight initialization (by `torch.nn.init.kaiming_uniform_()` and similar) is wasted, because the newly initialized weights are then replaced by the loaded ones. The `no_init_weights` context manager sets the `_init_weights` global variable, but it gets ignored by the model's code (tested on Llama-2-7B).

I recently discussed a similar issue with the PEFT team, but there it was easier to solve, because in PEFT the init code was dealing with a specific torch.nn layer; see huggingface/peft#871 and the linked PRs by @BenjaminBossan. Here we need a model-scale solution.
One possible solution (not a perfectly elegant one) is to temporarily override methods like `torch.nn.init.kaiming_uniform_()`. This approach is used in our SpQR repo; a minimal sketch follows below.
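A minimal sketch of that override approach (illustrative only, not the actual SpQR code):

```python
import contextlib

import torch

@contextlib.contextmanager
def skip_torch_init():
    """Temporarily replace common torch.nn.init functions with no-ops
    so that module constructors skip the expensive random init."""
    names = ["kaiming_uniform_", "kaiming_normal_", "uniform_", "normal_"]
    saved = {name: getattr(torch.nn.init, name) for name in names}
    try:
        for name in names:
            # The no-op leaves the tensor's (uninitialized) memory as-is.
            setattr(torch.nn.init, name, lambda tensor, *args, **kwargs: tensor)
        yield
    finally:
        for name, fn in saved.items():
            setattr(torch.nn.init, name, fn)
```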
But maybe there are better solutions using native torch tools, such as the one sketched below?
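For instance, torch ships `torch.nn.utils.skip_init` (since 1.10), which constructs a single module without paying for its parameter initialization:

```python
import torch

# The module is built on the meta device (so the init math costs nothing),
# then uninitialized storage is allocated on the target device.
linear = torch.nn.utils.skip_init(torch.nn.Linear, 4096, 4096)
```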
I'd be glad to contribute a PR with the maintainers' blessing. Summoning @younesbelkada.
System Info
A100-80G + SSD + plenty of RAM and kernels.
Who can help?
@younesbelkada
Reproduction
Load a model and measure the timing for the `from_pretrained` line, as sketched below.
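A minimal timing sketch (example checkpoint, assumed already cached so download time is excluded):

```python
import time

from transformers import AutoModelForCausalLM

start = time.perf_counter()
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
print(f"from_pretrained took {time.perf_counter() - start:.1f}s")
```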
Expected behavior
Faster loading.