Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DLRM-comms benchmark doesn't have device attribute #12

Closed
Sergei-Lebedev opened this issue Nov 2, 2020 · 4 comments
Closed

DLRM-comms benchmark doesn't have device attribute #12

Sergei-Lebedev opened this issue Nov 2, 2020 · 4 comments
Assignees

Comments

@Sergei-Lebedev
Copy link
Contributor

PyTorchNCCLBackend expects to have device attribute in commsParams (6e571fa). This is not the case for DLRM benchmark because first it doesn't have such parameter and second it passes dictionary to PyTorchNCCLBackend.

Traceback (most recent call last):
  File "./dlrm.py", line 1129, in <module>
    main()
  File "./dlrm.py", line 1125, in main
    dlrmBench.runBench(mpi_env_params, comms_world_info, args)
  File "./dlrm.py", line 1048, in runBench
    local_rank, global_rank, world_size, group, curDevice = comms_utils.get_rank_details(self.backendFuncs)
  File "/wspace/param/train/comms/pt/comms_utils.py", line 99, in get_rank_details
    curDevice = backendFuncs.get_device()
  File "/wspace/param/train/comms/pt/pytorch_nccl_backend.py", line 213, in get_device
    my_dev = torch.device(self.commsParams.device)
AttributeError: 'dict' object has no attribute 'device'
@kingchc
Copy link
Contributor

kingchc commented Nov 2, 2020

@Sergei-Lebedev Thanks for reporting this. You're right, device option was added recently for comms.py, we will fix this for dlrm benchmark shortly.

kingchc added a commit to kingchc/param-1 that referenced this issue Nov 3, 2020
Summary: Add a quick workaround of device selection for DLRM benchmark, which only support cuda now. Fix facebookresearch#12

Differential Revision: D24690816

fbshipit-source-id: 8ea2f86a166d8f7c3e3f1a49db5fa52ad6a17408
@kingchc
Copy link
Contributor

kingchc commented Nov 3, 2020

@Sergei-Lebedev could you try #15 and see if it fixes the issue for you?

@Sergei-Lebedev
Copy link
Contributor Author

@kingchc yes, it solves the issue, thanks for a quick fix!

@amrragab8080
Copy link

Fixed for me as well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants