It seems that when I save an FSDP model, transformers/accelerate no longer creates the parent folder 'xxxx/checkpoint-4' for me. When I downgrade transformers and accelerate, saving works, and when I manually create 'xxx/checkpoint-4' before saving, it also works.
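The manual workaround mentioned above can be sketched roughly as follows. This is only an illustration of creating the folder ahead of the save, not a fix inside transformers or accelerate. The helper name `ensure_checkpoint_dir` and its arguments are hypothetical, and where to call it in a training script (somewhere before the checkpoint is written) is left as an assumption.

```python
import os


def ensure_checkpoint_dir(output_dir: str, step: int) -> str:
    """Create '<output_dir>/checkpoint-<step>' if it is missing and return its path.

    Hypothetical helper: it only mirrors the 'checkpoint-<step>' folder naming
    seen in the error described above.
    """
    checkpoint_dir = os.path.join(output_dir, f"checkpoint-{step}")
    # exist_ok=True keeps this safe when several ranks call it at the same time.
    os.makedirs(checkpoint_dir, exist_ok=True)
    return checkpoint_dir
```

For the case in this report, that would be something like `ensure_checkpoint_dir("xxxx", 4)` before the save at step 4.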
System Info
I use pytorch==2.0 with FSDP full sharding.
With transformers==4.29.1 and accelerate==0.19.0, saving works well.
After switching to transformers==4.30 and accelerate==0.20.0, saving the model fails with an error.
Who can help?
@sgugger
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
pytorch==2.0
transformers==4.30.0
accelerate==0.20.3
Trainer with FSDP full sharding, modified from the train_clm.py example
Expected behavior
Saving an FSDP model should create the parent folder 'xxxx/checkpoint-4' automatically, as it does with transformers==4.29.1 and accelerate==0.19.0, so that I do not have to create it manually before saving.