Description
Describe the bug
When using resume_from_checkpoint to load the model and continue training:
--resume_from_checkpoint xx \ --resume_only_model false
Error info:
[rank4]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
[rank4]: (1) In PyTorch 2.6, we changed the default value of the weights_only argument in torch.load from False to True. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
[rank4]: (2) Alternatively, to load with weights_only=True please check the recommended steps in the following error message.
[rank4]: WeightsUnpickler error: Unsupported global: GLOBAL deepspeed.runtime.fp16.loss_scaler.LossScaler was not an allowed global by default. Please use torch.serialization.add_safe_globals([LossScaler]) or the torch.serialization.safe_globals([LossScaler]) context manager to allowlist this global if you trust this class/function.
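A possible workaround (a minimal sketch, not a fix inside ms-swift itself, and assuming you trust the checkpoint) is to allowlist the DeepSpeed class named in the error before the resume path calls torch.load, e.g. at the top of the training entry script:

```python
# Workaround sketch: allowlist the DeepSpeed LossScaler so that
# torch.load(weights_only=True) under PyTorch 2.6 can unpickle the
# optimizer/scaler state saved in the DeepSpeed checkpoint.
# Assumption: this runs before swift starts loading the checkpoint.
import torch.serialization
from deepspeed.runtime.fp16.loss_scaler import LossScaler

torch.serialization.add_safe_globals([LossScaler])
```

Other DeepSpeed classes in the checkpoint may also need to be allowlisted; alternatively, as the error message itself notes, the loading call could pass weights_only=False to torch.load, but only for checkpoints from a trusted source.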
If resume_from_checkpoint is not used and full training starts from scratch, it works.
Your hardware and system info
swift version: 3.5.0; GPU: H20; CUDA: 12.4; torch: 2.6