Error in pt_file_model_persistor.py #150

Closed
ilfsn opened this issue Jan 26, 2022 · 3 comments

Comments

ilfsn commented Jan 26, 2022

I am using NVFlare version 2.0.6.
However, when I start the app on my system (which includes 4 clients), the server reports an error like this:

2022-01-27 04:48:10,374 - ServerRunner - ERROR - [run=1]: Aborting current RUN due to FATAL_SYSTEM_ERROR received: expect model to be torch.nn.Module but got <class 'dict'>
2022-01-27 04:48:10,374 - ServerRunner - INFO - [run=1]: asked to abort - triggered abort_signal to stop the RUN
2022-01-27 04:48:10,374 - ServerRunner - INFO - [run=1]: starting workflow scatter_gather_ctl (<class 'nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather'>) ...
2022-01-27 04:48:10,374 - ScatterAndGather - INFO - [run=1]: Initializing ScatterAndGather workflow.
2022-01-27 04:48:10,374 - PTFileModelPersistor - ERROR - [run=1]: error getting state_dict from model object
Traceback (most recent call last):
  File "/home/jupyter-test/.conda/envs/fl/lib/python3.8/site-packages/nvflare/app_common/pt/pt_file_model_persistor.py", line 202, in load_model
    data = self.model.state_dict() if self.model is not None else OrderedDict()
AttributeError: 'dict' object has no attribute 'state_dict'
2022-01-27 04:48:10,374 - ServerRunner - ERROR - [run=1]: Aborting current RUN due to FATAL_SYSTEM_ERROR received: cannot create state_dict from model object
2022-01-27 04:48:10,374 - ServerRunner - INFO - [run=1]: asked to abort - triggered abort_signal to stop the RUN
2022-01-27 04:48:10,375 - ServerRunner - INFO - [run=1]: Workflow scatter_gather_ctl (<class 'nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather'>) started
2022-01-27 04:48:10,375 - ScatterAndGather - INFO - [run=1, wf=scatter_gather_ctl]: Beginning ScatterAndGather training phase.
2022-01-27 04:48:10,375 - ScatterAndGather - INFO - [run=1, wf=scatter_gather_ctl]: Abort signal received. Exiting at round 0.
2022-01-27 04:48:10,375 - ServerRunner - INFO - [run=1, wf=scatter_gather_ctl]: Workflow: scatter_gather_ctl finalizing ...
2022-01-27 04:48:12,877 - ServerRunner - INFO - [run=1, wf=scatter_gather_ctl]: ABOUT_TO_END_RUN fired
2022-01-27 04:48:12,877 - ServerRunner - INFO - [run=1, wf=scatter_gather_ctl]: END_RUN fired
2022-01-27 04:48:12,878 - ServerRunner - INFO - [run=1, wf=scatter_gather_ctl]: Server runner finished.
2022-01-27 04:48:13,376 - FederatedServer - INFO - Server app stopped.

Please help me resolve this problem. Thank you.

ilfsn commented Jan 26, 2022

It was an incorrect configuration in the server config JSON file that caused the error.
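The AttributeError in the traceback can be reproduced without NVFlare or PyTorch. The sketch below is only an illustration: `TinyModel` and `load_state` are hypothetical names, and `TinyModel` is a stand-in for a `torch.nn.Module`. The conditional expression is copied from line 202 of the traceback, with `model` in place of `self.model`. It suggests that the persistor was handed a plain dict rather than a constructed model.

```python
from collections import OrderedDict


class TinyModel:
    """Hypothetical stand-in for a torch.nn.Module; only state_dict() matters here."""

    def state_dict(self):
        return OrderedDict(weight=[0.0])


def load_state(model):
    # Same expression as line 202 in the traceback, with `model` for self.model.
    return model.state_dict() if model is not None else OrderedDict()


# A model object works, None falls back to an empty OrderedDict,
# and a plain dict (an unresolved config entry) raises AttributeError.
for candidate in (TinyModel(), None, {"args": {}}):
    try:
        print(type(candidate).__name__, "->", load_state(candidate))
    except AttributeError as err:
        print(type(candidate).__name__, "->", err)
```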

ilfsn closed this as completed Jan 26, 2022

LSnyd commented Mar 10, 2022

Hi @ilfsn,
could you share how you solved this problem?
I am running into the same error. Thanks!

ilfsn commented Mar 12, 2022

> Hi @ilfsn, could you share how you solved this problem? I am running into the same error. Thanks!

Hi there, I guess the problem is that you forgot to declare the name of the network's Python class in the JSON file. For instance, I wrote my network class (which inherits from PyTorch's ResNet) under the name FXRRes50 in fxrres_nets.py inside the networks folder, so the component section of the JSON config file should include:

{
  "id": "persistor",
  "name": "PTFileModelPersistor",
  "args": {
    "model": {
      "path": "networks.fxrres_nets.FXRRes50",
      "args": {}
    }
  }
}
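To illustrate why the "path" entry matters, here is a minimal sketch of how a dotted class path plus "args" can be turned into an object. This is not NVFlare's actual implementation, which the thread does not show. `build_from_spec` is a hypothetical helper, and standard-library classes stand in for `networks.fxrres_nets.FXRRes50`, which is not available here.

```python
import importlib


def build_from_spec(spec):
    """Instantiate the class named by spec["path"] with spec.get("args", {})."""
    if not isinstance(spec, dict) or "path" not in spec:
        # Without "path" there is no class to build, so the spec stays a dict.
        raise ValueError("model spec needs a 'path' key with a dotted class path")
    module_name, _, class_name = spec["path"].rpartition(".")
    if not module_name:
        raise ValueError(f"{spec['path']!r} is not a dotted module.Class path")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(**spec.get("args", {}))


# Stand-in spec with the same shape as the "model" entry in the config above.
obj = build_from_spec({"path": "collections.OrderedDict", "args": {}})
print(type(obj).__name__)
```

With a check like this, a spec that lacks "path" fails early with a readable message instead of reaching the `state_dict()` call as a dict.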
