Error reloading model from checkpoint #558

@lyyc199586

I am trying to reload a saved model.

Saving:

model.save_checkpoint("./model", save_name="fno")

Loading:

model_reload = FNO.from_checkpoint('./model', save_name="fno")

This raises the following error:

---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
Cell In[9], line 12
      1 # reload model
      2 # model_reload = FNO(
      3 #     n_modes=(16,16),
   (...)
      9 # model_reload.load_state_dict(torch.load("./model/fno.pt", weights_only=False))
     10 # model_reload.eval()
---> 12 model_reload = FNO.from_checkpoint('./model', save_name="fno")

File C:\workspace\no_playground\neuraloperator\neuralop\models\base_model.py:179, in BaseModel.from_checkpoint(cls, save_folder, save_name, map_location)
    176     init_args = []
    177 instance = cls(*init_args, **init_kwargs)
--> 179 instance.load_checkpoint(save_folder, save_name, map_location=map_location)
    180 return instance

File C:\workspace\no_playground\neuraloperator\neuralop\models\base_model.py:159, in BaseModel.load_checkpoint(self, save_folder, save_name, map_location)
    157 save_folder = Path(save_folder)
    158 state_dict_filepath = save_folder.joinpath(f'{save_name}_state_dict.pt').as_posix()
--> 159 self.load_state_dict(torch.load(state_dict_filepath, map_location=map_location))

File c:\workspace\no_playground\no\lib\site-packages\torch\serialization.py:1470, in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
   1462                 return _load(
   1463                     opened_zipfile,
   1464                     map_location,
   (...)
   1467                     **pickle_load_args,
   1468                 )
   1469             except pickle.UnpicklingError as e:
-> 1470                 raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
   1471         return _load(
   1472             opened_zipfile,
   1473             map_location,
   (...)
   1476             **pickle_load_args,
   1477         )
   1478 if mmap:

UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. 
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL torch._C._nn.gelu was not an allowed global by default. Please use `torch.serialization.add_safe_globals([gelu])` or the `torch.serialization.safe_globals([gelu])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
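To illustrate why the load is rejected, here is a minimal standard-library sketch of an allowlisting unpickler. It is only an analogy for the check that weights_only=True performs, not PyTorch's actual implementation: the names AllowlistUnpickler, restricted_loads and ALLOWED_GLOBALS are hypothetical, and math.sqrt stands in for gelu, which the error above reports as a pickled global.

```python
import io
import math
import pickle

# Hypothetical allowlist for this sketch; PyTorch maintains its own list.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global that is not on the allowlist."""

    def __init__(self, file, allowed):
        super().__init__(file)
        self._allowed = set(allowed)

    def find_class(self, module, name):
        # Called whenever the pickle stream references a global by name.
        if (module, name) in self._allowed:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Unsupported global: {module}.{name}")


def restricted_loads(data, allowed=ALLOWED_GLOBALS):
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


# A checkpoint-like dict that stores a function by reference
# (math.sqrt is a stand-in for gelu here).
payload = pickle.dumps({"weight": [1.0, 2.0], "activation": math.sqrt})

try:
    restricted_loads(payload)
    blocked = False
except pickle.UnpicklingError as err:
    blocked = True
    print(err)

# Allowlisting the function lets the same bytes load.
restored = restricted_loads(payload, ALLOWED_GLOBALS | {("math", "sqrt")})
print(restored["activation"](9.0))
```

The second call succeeds only because the caller explicitly trusts the function, which is the same trade-off the error message describes for add_safe_globals and safe_globals.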

It seems that PyTorch 2.6 is not compatible with the checkpoint loading in neuralop. Which version of torch should the neuralop package be used with?

As a workaround, I can save and load the state dict directly:

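import torch
from neuralop.models import FNO

# save only the state dict
torch.save(model.state_dict(), "./model/fno.pt")

# reload: rebuild the model with the same arguments, then load the weights
model_reload = FNO(
    n_modes=(16, 16),
    in_channels=1,
    out_channels=1,
    hidden_channels=32,
    projection_channel_ratio=2,
)
# weights_only=False disables the safe unpickler; use it only for trusted files
model_reload.load_state_dict(torch.load("./model/fno.pt", weights_only=False))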

Labels: bug