Preventing initialization of siglip's `lecun_normal_`, `default_flax_embed_init` in ZeRO3 #43574
vasqu merged 21 commits into huggingface:main from
Conversation
vasqu
left a comment
The changes look good to me. Since a few models already use this initialization, I'd rather we move this to the init file (so we don't repeat it for other models in the future).
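A rough sketch of the idea, for illustration only: the import path below is where SigLIP keeps these helpers before this PR, and the usage mirrors the `_init_weights` pattern; it is not the code that was merged.

```python
# Sketch: host the SigLIP-style initializers in one shared place and import them
# in the model files (here imported from SigLIP's modeling file, the pre-PR
# location), instead of copying them into siglip2, phi4_multimodal, etc.
from torch import nn
from transformers.models.siglip.modeling_siglip import (
    default_flax_embed_init,
    lecun_normal_,
)


def init_siglip_like_module(module: nn.Module) -> None:
    # Apply the shared initializers the way SigLIP-style _init_weights methods do.
    if isinstance(module, nn.Linear):
        lecun_normal_(module.weight)
    elif isinstance(module, nn.Embedding):
        default_flax_embed_init(module.weight)
```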
vasqu
left a comment
LGTM, let's add a small test please.
Also you can run
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
@vasqu
…d update references
…`: remove redundant `init.` prefix for clarity
…date to use `init` namespace for clarity
vasqu
left a comment
Perfect, let's add a small test to DeepSpeed and we are ready to go.
@vasqu I originally planned to import `initialization.py` directly and test it, but that didn't seem very meaningful. Since this issue ultimately occurs when loading the model, I instead loaded the SigLIP model directly to see whether the latest version of transformers raises an error. For now, it works fine this way. I'm not sure what you think; let me know if there's anything you want me to change.
vasqu
left a comment
Just some smaller things, but this looks pretty good! Thanks a lot for taking the time for this, I appreciate it.
tests/deepspeed/test_deepspeed.py
```python
with mockenv_context(**self.dist_env_1_gpu):
    logger = logging.get_logger("transformers.modeling_utils")
    with CaptureLogger(logger) as cl:
        model = AutoModel.from_pretrained("google/siglip-base-patch16-224")
```
The test is looking good, I just have one nit here: we do not need a full model. We can create a dummy model, save it to a temporary directory, and load from that: no hub calls needed, and it should run much faster.
You can also look into the siglip2 modeling test file for some dummy values; it's just there to have something small. A minimal sketch of this idea is shown below.
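The sketch below illustrates that suggestion; the helper name and the tiny config values are made up for illustration and are not taken from the merged test.

```python
# Hypothetical sketch: build a tiny SigLIP vision model, save it to a temporary
# directory, and load from there in the DeepSpeed test instead of hitting the Hub.
import tempfile

from transformers import SiglipVisionConfig, SiglipVisionModel


def make_tiny_siglip_checkpoint() -> str:
    config = SiglipVisionConfig(
        hidden_size=32,
        intermediate_size=64,
        num_hidden_layers=2,
        num_attention_heads=2,
        image_size=32,
        patch_size=4,
    )
    tmp_dir = tempfile.mkdtemp()
    SiglipVisionModel(config).save_pretrained(tmp_dir)
    return tmp_dir
```

The ZeRO3 test can then call `from_pretrained` on that directory inside the mocked distributed environment, so no network access is needed.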
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
@vasqu
vasqu
left a comment
LGTM, just changed the comment slightly. Merging now
[For maintainers] Suggested jobs to run (before merge): run-slow: phi4_multimodal, siglip, siglip2
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
thank you!
What does this PR do?
In the latest version of transformers, initializing SigLIP with ZeRO3 applied raises an error during weight initialization.
The error comes from `variance_scaling_` in SigLIP, which calls torch's `_calculate_fan_in_and_fan_out`.
`_calculate_fan_in_and_fan_out` requires the tensor to have at least 2 dimensions.
Under ZeRO3, however, parameters are sharded in advance, so the locally visible tensor has size 0 and the call fails.
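A minimal illustration of the failure mode, assuming the sharded parameter shows up locally as an empty tensor (the exact traceback may differ from the one hit in SigLIP):

```python
# torch's fan-in/fan-out helper rejects tensors with fewer than 2 dimensions,
# which is exactly what a ZeRO3-partitioned parameter looks like locally.
import torch
from torch.nn.init import _calculate_fan_in_and_fan_out

sharded_weight = torch.empty(0)  # stand-in for a ZeRO3-partitioned parameter
try:
    _calculate_fan_in_and_fan_out(sharded_weight)
except ValueError as err:
    print(err)  # complains that fan in and fan out need at least 2 dimensions
```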
Therefore, referring to the `normal_` function in transformers' `initialization.py`, this PR adds a check on the `_is_hf_initialized` flag to decide whether to initialize at all.
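A minimal sketch of that kind of guard, assuming the flag is read off the parameter tensor the way the other guarded initializers do; everything except the `_is_hf_initialized` name is illustrative, not the merged code.

```python
import torch
from torch import nn


def guarded_lecun_normal_(tensor: torch.Tensor) -> None:
    # Skip the custom init when the parameter is already flagged as initialized
    # (e.g. its weights come from the checkpoint). Under ZeRO3 this also avoids
    # calling _calculate_fan_in_and_fan_out on an empty local shard.
    if getattr(tensor, "_is_hf_initialized", False):
        return
    fan_in, _ = nn.init._calculate_fan_in_and_fan_out(tensor)
    nn.init.trunc_normal_(tensor, std=(1.0 / fan_in) ** 0.5)
```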
`transformers` version: 5.0.0.dev0 (I'm using the dev version via `pip install git+https://github.com/huggingface/transformers.git@dfe30827b8ebdd974eb7ce69c7d5d8cf8e6cf852`).
Who can review?
@yonigozlan