demo in docker fails to run #3

Closed
ikvision opened this issue Oct 3, 2019 · 5 comments

Contributor

ikvision commented Oct 3, 2019

When I run the demo inside the pulled Docker image:
demo.py --checkpoint=data/model_checkpoint.ptt --img=examples/im1010.jpg
I get the following error:
Traceback (most recent call last):
  File "demo.py", line 34, in <module>
    from models import hmr, SMPL
  File "/SPIN/models/__init__.py", line 1, in <module>
    from .hmr import hmr
  File "/SPIN/models/hmr.py", line 6, in <module>
    from utils.geometry import rot6d_to_rotmat
  File "/SPIN/utils/__init__.py", line 3, in <module>
    from .base_trainer import BaseTrainer
  File "/SPIN/utils/base_trainer.py", line 8, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/tensorboard/__init__.py", line 6, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa F401
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/tensorboard/writer.py", line 18, in <module>
    from ._convert_np import make_np
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/tensorboard/_convert_np.py", line 12, in <module>
    from caffe2.python import workspace
  File "/usr/local/lib/python3.6/dist-packages/caffe2/python/workspace.py", line 15, in <module>
    from past.builtins import basestring
ModuleNotFoundError: No module named 'past'

Owner

nkolot commented Oct 3, 2019

OK, apparently the future package (which provides the past module) is missing from the Docker image. I had it installed locally, so it worked for me. I have updated the image. Can you try again now? (Make sure to pull the image again.)
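
A quick way to catch this kind of gap before running the demo is to check which top-level modules are importable inside the container. The following is only a sketch: the helper name `missing_modules` and the module list are illustrative and not part of the SPIN repository, and the list of modules to check is an assumption based on the errors reported in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list, taken from the modules mentioned in this thread.
    # `past` is installed by the `future` package on PyPI.
    wanted = ["past", "pyglet", "freetype", "torch"]
    print("Missing modules:", missing_modules(wanted))
```

Run it with the same interpreter the demo uses (python3 inside the container); anything it lists still needs to be installed in the image.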

Contributor Author

ikvision commented Oct 4, 2019

Hi, thank you for the quick reply. I pulled the updated Docker image.
Now I think there is an issue with the pyrender dependencies freetype and pyglet, so I installed those using pip.
After running fetch_data.sh and starting the container with nvidia-docker run chaneyk/spin -it, I ran the demo.
I get the following error:

Traceback (most recent call last):
  File "demo.py", line 105, in <module>
    checkpoint = torch.load(args.checkpoint)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 574, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 537, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 119, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 95, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 79, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.

Indeed, inside the container created from chaneyk/spin, nvidia-smi shows no GPUs.

Docker itself works fine with the GPUs:
nvidia-docker run nvidia/cuda:9.0-base nvidia-smi shows all the GPUs.
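
The same check can be scripted from Python inside the container. This is a minimal sketch, assuming nvidia-smi is on the PATH when the GPU runtime is set up correctly; the function name `visible_gpus` is made up for illustration.

```python
import shutil
import subprocess


def visible_gpus():
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is not installed or not mounted into this container.
        return []
    try:
        result = subprocess.run(
            [exe, "-L"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # Each detected device is listed on a line starting with "GPU <index>:".
    return [line for line in result.stdout.splitlines() if line.startswith("GPU ")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("Visible GPUs:", gpus if gpus else "none")
```

An empty result inside the chaneyk/spin container, alongside a non-empty one in nvidia/cuda:9.0-base, would point at how the container is started rather than at the host driver.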

Owner

nkolot commented Oct 4, 2019

What GPU are you using?
What happens is that PyTorch tries to map the tensors in the checkpoint directly onto the GPU, but Docker cannot see any GPUs, which is really strange. In the meantime, try the fix in the last line of the error message, i.e. pass map_location='cpu' to torch.load.
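
As a sketch of that workaround, the checkpoint load can fall back to the CPU only when CUDA is unavailable. The helper names `pick_map_location` and `load_checkpoint` are hypothetical and not part of demo.py; only torch.load, its map_location argument, and torch.cuda.is_available() are actual PyTorch API.

```python
def pick_map_location(cuda_available):
    """Return the map_location to pass to torch.load.

    None keeps PyTorch's default (restore tensors to the device they were
    saved from); "cpu" remaps all storages to the CPU.
    """
    return None if cuda_available else "cpu"


def load_checkpoint(path):
    """Load a checkpoint, remapping to the CPU if no GPU is visible."""
    # Imported here so pick_map_location can be used without PyTorch installed.
    import torch

    return torch.load(path, map_location=pick_map_location(torch.cuda.is_available()))
```

In demo.py this would stand in for the plain torch.load(args.checkpoint) call shown in the traceback. The model itself would still need to be placed on a matching device.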

Owner

nkolot commented Oct 4, 2019

Update: You need to run nvidia-docker2

Contributor Author

ikvision commented Oct 4, 2019

After passing map_location='cpu' and renaming the SMPL neutral model:
mv ./data/smpl/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl ./data/smpl/SMPL_NEUTRAL.pkl
it works.

Thank you

ikvision closed this as completed Oct 4, 2019