RuntimeError: Error(s) in loading state_dict for ResNet #9

Open
dszpr opened this issue Jan 9, 2024 · 7 comments

Comments

@dszpr

dszpr commented Jan 9, 2024

Hi! Thanks for the excellent work!
While running instruction finetuning, I hit the following error:

WARNING:root:Pytorch pre-release version 1.14.0a0+410ce96 - assuming intent to test it
/usr/local/lib/python3.8/dist-packages/diffusers/models/cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
deprecate(
| distributed init (rank 0, world 1): env://
[1704792300.373239] [7771d2eff014:2391 :f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
Traceback (most recent call last):
File "train.py", line 103, in <module>
main()
File "train.py", line 94, in main
model = task.build_model(cfg)
File "/workspace/code/LMDrive/LAVIS/lavis/tasks/drive.py", line 35, in build_model
return model_cls.from_config(model_config)
File "/workspace/code/LMDrive/LAVIS/lavis/models/drive_models/drive.py", line 575, in from_config
model = cls(
File "/workspace/code/LMDrive/LAVIS/lavis/models/drive_models/drive.py", line 87, in __init__
self.visual_encoder.load_state_dict(pretrain_weights, strict=True)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1918, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict:...

Your original config file "notice_llava15_visual_encoder_r50_seq40.yaml" sets
preception_model: memfuser_baseline_e1d3_return_feature
which raises "RuntimeError: Unknown model (memfuser_baseline_e1d3_return_feature)".
So I changed 'memfuser_baseline_e1d3_return_feature' to 'resnet50', and then the 'RuntimeError: Error(s) in loading state_dict for ResNet:' shown above occurred. Do you know how to fix this?
I also noticed another error in the log: "vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device". Is it related to the failure?
Many thanks, and I look forward to your reply.
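
For readers debugging a similar "Missing key(s)" error: with strict=True, load_state_dict raises as soon as the model expects a parameter name that the checkpoint does not contain. A minimal sketch of that comparison is below. The key names are hypothetical placeholders, not the real LMDrive checkpoint keys, and the helper `diff_state_dict_keys` is an illustration rather than part of PyTorch.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring PyTorch's terminology.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not define
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical key names, for illustration only.
model_keys = ["conv1.weight", "bn1.weight", "fc.weight"]
checkpoint_keys = ["rgb_backbone.conv1.weight", "rgb_backbone.bn1.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With real objects, the inputs would be `model.state_dict().keys()` and the keys of the loaded checkpoint dict. If every key shows up as missing, the checkpoint was most likely saved from a different architecture than the one being built.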

@deepcs233
Collaborator

Hi!
It looks like the vision encoder is not installed correctly. The model name should be memfuser_baseline_e1d3_return_feature, not ResNet.

"vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device"

Maybe you don't have enough disk space?
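
A quick way to check this is sketched below. On Linux, inotify_add_watch can also report "No space left on device" (ENOSPC) when the per-user inotify watch limit is exhausted, so the sketch prints both the free space and that limit. The helper name `tmp_space_report` is made up for this example, and the /proc path only exists on Linux.

```python
import os
import shutil


def tmp_space_report(path="/tmp"):
    """Collect free disk space for `path` and, on Linux, the inotify watch limit."""
    usage = shutil.disk_usage(path)
    report = {
        "path": path,
        "free_gib": round(usage.free / 2**30, 2),
        "max_user_watches": None,  # stays None if the limit cannot be read
    }
    limit_file = "/proc/sys/fs/inotify/max_user_watches"  # Linux-specific
    if os.path.exists(limit_file):
        try:
            with open(limit_file) as fh:
                report["max_user_watches"] = int(fh.read().strip())
        except (OSError, ValueError):
            pass
    return report


if __name__ == "__main__":
    print(tmp_space_report("/tmp"))
```

If free space looks healthy, the watch limit is the next thing to look at. Either way, this UCX message appears before the traceback and may be unrelated to the state_dict failure.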

You can try the following steps:

  1. pip uninstall timm
  2. cd vision_encoder/
  3. python setup.py develop

@dszpr
Author

dszpr commented Jan 10, 2024

Hi!
After uninstalling timm and running 'python setup.py develop' in vision_encoder, the timm module can't be found:
ModuleNotFoundError: No module named 'timm'
I also tried installing vision_encoder with 'pip install -e .', and the log printed 'Successfully installed timm'. However, timm does not appear in 'pip list', so it seems the module wasn't installed properly either way.
I am running the project in Docker instead of a conda environment. Could that be related to the failure to install vision_encoder?
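
When pip reports a successful install but the import still fails, one common cause is that pip and the running interpreter belong to different Python environments. The sketch below shows what the current interpreter can actually see, without importing the package. The helper name `locate_package` is made up for this example, and it assumes Python 3.8 or newer for importlib.metadata.

```python
import importlib.util
import sys
from importlib import metadata


def locate_package(name="timm"):
    """Report which interpreter is running and where (if anywhere) it finds `name`."""
    spec = importlib.util.find_spec(name)  # looks the package up without importing it
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "python": sys.executable,
        "import_path": spec.origin if spec else None,
        "installed_version": version,
    }


if __name__ == "__main__":
    print(locate_package("timm"))
```

Running the install as `python -m pip install -e .` with the same `python` shown in the output keeps pip and the interpreter in the same environment.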

@deepcs233
Collaborator

Hi!
I just created a fresh conda env and installed the package. The following script runs fine:

import timm
timm.create_model('memfuser_baseline_e1d3_return_feature')

Maybe you need to create a conda env inside your Docker container?

@deepcs233
Collaborator

Also, I have updated the Setup section in the README and some files in the repo. Please use the latest version.

@kongdehong

I ran the following script and got an Unknown model error. Is it because timm's model name changed?
import timm
timm.create_model('memfuser_baseline_e1d3_return_feature')
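
An Unknown model error from create_model means the name is not in the registry of whichever timm package got imported. Going by the replies above, the memfuser name comes from the repo's vision_encoder install, so a different timm on the path would presumably not have it. A small check, assuming only `timm.list_models()` and `timm.__version__` (the helper `timm_model_status` is made up for this example):

```python
import importlib
import importlib.util


def timm_model_status(model_name):
    """Return timm's version, location, and whether `model_name` is registered.

    Returns None when timm cannot be imported in this environment.
    """
    if importlib.util.find_spec("timm") is None:
        return None
    try:
        timm = importlib.import_module("timm")
    except ImportError:
        return None
    return {
        "version": getattr(timm, "__version__", None),
        "location": getattr(timm, "__file__", None),
        "registered": model_name in timm.list_models(),
    }


if __name__ == "__main__":
    print(timm_model_status("memfuser_baseline_e1d3_return_feature"))
```

If "registered" is False, the "location" field shows which timm copy was picked up, which helps tell whether the import resolved to the vision_encoder checkout or to a separately installed timm.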

@Dagoli

Dagoli commented Jun 3, 2024

pip install timm==0.4.13 @dszpr @deepcs233
