There is an imbalance between your GPUs #8

Closed
kotchin opened this issue Jul 4, 2019 · 2 comments

@kotchin

kotchin commented Jul 4, 2019

When running the step "train psmnet with 4 TITAN X GPUs", I get an annoying warning that comes from having to run PyTorch under Python 2.7. In full:

torch/nn/parallel/data_parallel.py:25: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 1 which
has less than 75% of the memory or cores of GPU 0. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.

I suspect it could prevent multi-GPU training. Luckily, there's an easy fix. Open the file that emits the warning (torch/nn/parallel/data_parallel.py) and add the following line at the very top of the file, before any other import:

from __future__ import division
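If you're not sure where the installed copy of torch/nn/parallel/data_parallel.py lives, a small helper like the one below can locate it and report whether a `from __future__ import ... division` line is already there. This is a minimal sketch: the function names are hypothetical (not part of PyTorch), the check is a rough text search, and it only uses calls available in both Python 2.7 and 3 (`importlib.import_module`, `inspect.getsourcefile`, `io.open`). Note that importing the module also imports torch itself.

```python
import importlib
import inspect
import io


def locate_module_source(module_name):
    """Import module_name and return the path of its .py source file.

    inspect.getsourcefile() returns the .py path even when Python 2 loaded
    the module from a .pyc file.
    """
    module = importlib.import_module(module_name)
    return inspect.getsourcefile(module)


def has_future_division(path):
    """Rough text check for a 'from __future__ import ... division' line."""
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith("from __future__ import") and "division" in stripped:
                return True
    return False


# Hypothetical usage (requires torch to be installed):
#   path = locate_module_source("torch.nn.parallel.data_parallel")
#   print(path, has_future_division(path))
```

Printing the path first is a cheap way to make sure you edit the copy of the file that your Python 2.7 environment actually imports.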

Then try again. If you run into the following error:

ImportError: No module named future

Just install the future package, e.g. with one of the following commands:

conda install future
pip install future
kotchin changed the title from "warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))" to "There is an imbalance between your GPUs" on Jul 4, 2019
@mileyan
Owner

mileyan commented Jul 5, 2019

The warning shows that your GPU 1 is different from your GPU 0. Could you please list your hardware environment?
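One way to gather the relevant hardware details is to print the per-GPU properties that the balance check looks at. The sketch below assumes, based on the warning's wording ("memory or cores"), that these are `total_memory` and `multi_processor_count` from `torch.cuda.get_device_properties`; the helper names `gpu_properties` and `describe_gpus` are hypothetical.

```python
try:
    import torch
except ImportError:  # lets describe_gpus() be used even without torch
    torch = None


def gpu_properties():
    """Return (name, total_memory, multi_processor_count) per visible GPU.

    Indices follow the devices visible to this process, so they respect
    CUDA_VISIBLE_DEVICES. Returns [] when torch or CUDA is unavailable.
    """
    if torch is None or not torch.cuda.is_available():
        return []
    result = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        result.append((props.name, props.total_memory, props.multi_processor_count))
    return result


def describe_gpus(props_list):
    """Format one line per GPU with the fields the balance check compares."""
    return ["GPU %d: %s, total_memory=%d bytes, multi_processor_count=%d"
            % (idx, name, memory, sm_count)
            for idx, (name, memory, sm_count) in enumerate(props_list)]


for line in describe_gpus(gpu_properties()):
    print(line)
```

If the two cards print identical numbers, a warning about them is more likely to come from how the comparison is computed than from the hardware.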

@kotchin
Author

kotchin commented Jul 5, 2019

It says they are different, yes, but they aren't: I'm running two RTX 2080 Ti cards with the same amount of memory.

I posted the fix above; the problem seems to be Python 2 integer division.
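To see why the division rule matters, here is a minimal model of the ratio test the warning describes (GPU 1 having less than 75% of GPU 0's memory or cores). It assumes the check divides the smaller per-GPU integer value by the larger one; `imbalance_detected` is a hypothetical name, and `//` is used to emulate what a plain `/` on two ints does in Python 2.

```python
def imbalance_detected(values, python2_int_division):
    """Return True if min(values) / max(values) < 0.75 (the 75% threshold).

    values: one integer property per GPU (e.g. memory in bytes).
    python2_int_division: if True, emulate Python 2 int / int, which floors
    the result (the same thing // does in Python 2 and 3).
    """
    min_val = min(values)
    max_val = max(values)
    if python2_int_division:
        ratio = min_val // max_val          # any min_val < max_val gives 0
    else:
        ratio = float(min_val) / max_val    # true division, as with the __future__ import
    return ratio < 0.75


# Hypothetical byte counts for two "identical" cards that differ slightly;
# these are illustrative values, not measurements from this issue.
memory = [11554717696, 11554652160]
print(imbalance_detected(memory, python2_int_division=True))   # True: 0 < 0.75
print(imbalance_detected(memory, python2_int_division=False))  # False: ratio ~0.99999
```

With floor division, any two values that are not exactly equal produce a ratio of 0, so even a tiny difference trips the 75% threshold; with true division the ratio stays close to 1 and no warning is raised.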

I found this fix here.

EDIT: There is nothing wrong with your code; it's a PyTorch issue. I'm just sharing the fix, since the problem seems to have affected other people too.

kotchin closed this as completed on Jul 10, 2019