Improve init_empty_weights to override tensor constructor #699
I don't really understand why this is needed: why load a pretrained model inside the context manager and then complain it takes time?

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("gpt2")
with init_empty_weights():
    model = AutoModel.from_config(config)
```

is way faster than 6s.
Hmm, it's doing it 10 times, so 0.6s per load. Benchmarking your solution displays the same order of magnitude. Though your workaround removed the need to override …
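The per-load figure discussed above can be checked with a small timing helper. This is a minimal sketch: `benchmark` is a hypothetical function (not part of accelerate), and the lambda is a stand-in workload where the thread's actual measurement would call the model construction under `init_empty_weights`.

```python
import time

def benchmark(fn, repeats=10):
    """Call `fn` `repeats` times and return the average seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

# Stand-in workload; replace with the model-loading call being measured.
per_call = benchmark(lambda: sum(range(10_000)))
print(f"{per_call:.6f} s per call")
```

Averaging over several repeats, as the comment above does (10 loads, then dividing), smooths out warm-up effects such as filesystem caching on the first load.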
Also I'm not sure why this wasn't detected, but the test …
Override tensor constructor only when `include_buffers` argument is True
Actually, if we activate this feature only when `include_buffers` is True …
Summary
`init_empty_weights` actually constructs tensors on `cpu` and then moves them to `meta`. Instead, we propose to construct tensors on the `meta` device directly by overriding the default constructors. This is inspired by https://github.com/microsoft/DeepSpeed/blob/c199edac8210e730acfd004c6e2bc3a98c0db903/deepspeed/utils/init_on_device.py

This results in a faster loading mechanism when using the `init_empty_weights` context manager. Additionally, we override the loading mechanism to return an empty dictionary, since there's no reason to read the checkpoint when everything is on `meta` (this is a hack, as `map_location="meta"` doesn't work yet). Not sure if that's considered too hacky to be integrated inside `accelerate`.

Running the following gets accelerated:
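The constructor-override idea in the summary can be sketched without torch: a context manager temporarily swaps a class's constructor for one that forces the target device, then restores the original on exit. `FakeTensor` and `init_on_device` below are illustrative names, not accelerate's or DeepSpeed's actual API; the real PR patches torch's tensor constructors the same way.

```python
import contextlib

class FakeTensor:
    """Stand-in for torch.Tensor: records which device it was created on."""
    def __init__(self, shape, device="cpu"):
        self.shape = shape
        self.device = device

@contextlib.contextmanager
def init_on_device(device):
    """Temporarily patch FakeTensor's constructor so every new instance
    lands on `device`, skipping any 'cpu then move to meta' round trip."""
    old_init = FakeTensor.__init__

    def patched_init(self, shape, device_arg=None):
        # Ignore the caller's device and force the context's target device.
        old_init(self, shape, device=device)

    FakeTensor.__init__ = patched_init
    try:
        yield
    finally:
        FakeTensor.__init__ = old_init  # always restore the original

with init_on_device("meta"):
    t = FakeTensor((2, 3))
print(t.device)  # meta
```

The `try`/`finally` restore is the important part: if model construction raises inside the block, later tensor creation outside the context must still use the original constructor.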