run code on multiple GPUs #9

Open
shu-hai opened this issue Jun 15, 2017 · 5 comments

Comments

@shu-hai

shu-hai commented Jun 15, 2017

Hi, Julian,
I just started running your step3_predict_nodules.py with your trained model.
I found that it only ran on 1 GPU even though I assigned 2 GPUs to it with
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
I also commented out config.gpu_options.per_process_gpu_memory_fraction = 0.5
because I am allowed to use both GPUs fully, but it was still slow.

Could you let me know how to run the code on multiple GPUs? Thanks.

@shu-hai shu-hai changed the title run code on multiple GPU run code on multiple GPUs Jun 15, 2017
@juliandewit
Owner

Hello,
I think TensorFlow sees no way to distribute the network over multiple GPUs.
In theory, though, it should be smart enough to split the batch into 2 parts and run each part on a separate GPU.

You could do this manually, however.
I cannot type it out for you, but every patient needs roughly 30x30x30 (~27,000) predictions.
If you predict half of them on one GPU with one instance of the network and the other half on the second GPU with another instance, you will achieve roughly a 2x speedup.
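
One way to do the manual split is to start one process per GPU and give each process its share of the work. The sketch below is only an illustration under stated assumptions: it splits by patient rather than splitting each patient's cubes (a simpler variant of the idea above), and the worker script it launches is hypothetical, standing in for a wrapper around the prediction code that accepts patient ids on the command line. The helper names split_evenly, worker_env and launch_workers are made up for this example.

```python
import os
import subprocess
import sys


def split_evenly(items, n_parts):
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def worker_env(gpu_id):
    """Return a copy of the environment that exposes only one GPU to a child process."""
    env = dict(os.environ)
    # Must be set before the child process initializes TensorFlow/CUDA.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_workers(script, patient_ids, gpu_ids):
    """Start one process per GPU and wait for all of them.

    Assumption: `script` is a hypothetical wrapper that takes patient ids
    as command-line arguments and runs the predictions for those patients.
    """
    procs = []
    for gpu_id, chunk in zip(gpu_ids, split_evenly(patient_ids, len(gpu_ids))):
        if not chunk:
            continue  # nothing to do for this GPU
        cmd = [sys.executable, script] + [str(p) for p in chunk]
        procs.append(subprocess.Popen(cmd, env=worker_env(gpu_id)))
    # Return the exit code of each worker.
    return [p.wait() for p in procs]
```

A call might look like launch_workers("predict_subset.py", patient_ids, [0, 1]), where predict_subset.py is the hypothetical wrapper. Because each process sees a single GPU, each one loads its own instance of the network.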

@shu-hai
Author

shu-hai commented Jun 17, 2017

Hi, Julian,
Lines 348-349 of step4_train_submissions.py read:

    if level == 1:
        dst_dir += "level2/"

Why not level1?

@juliandewit
Owner

Indeed, I also had to look twice after all this time.

The level 1 models are combined into the level2 folder.
The models in level2 are combined into the submission folder.
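
Put differently, the destination folder is named after the next stage, not the current one. A minimal sketch of that mapping, assuming a hypothetical helper destination_dir and an illustrative folder name "submission/" (the thread does not give the actual name):

```python
def destination_dir(base_dir, level):
    """Return the folder that receives the combined output of the given level."""
    if level == 1:
        # Level 1 model outputs are combined into the level2 folder.
        return base_dir + "level2/"
    # Assumption: "submission/" is an illustrative name for the submission folder.
    return base_dir + "submission/"
```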

@shu-hai
Author

shu-hai commented Jun 17, 2017

Also, line 23 of step4_train_submissions.py raises an error:
mass_df = pandas.read_csv(settings.BASE_DIR + "masses_predictions.csv")
It cannot find the masses_predictions.csv file.
I searched for this file name in the code of the first three steps but could not find it.
Where do you generate this file?

@juliandewit
Copy link
Owner

step2_train_mass_segmenter.py also has a predict phase.
That phase generates this file.

You can also leave it out. It will not change the score very much.
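
If you do leave it out, one option is to make the load tolerant of a missing file. The helper below is a hypothetical sketch, not part of the repository: load_optional_csv is an invented name, and whatever code later uses mass_df would still have to handle the None case, which is not shown in this thread.

```python
import os


def load_optional_csv(path):
    """Return the CSV at `path` as a DataFrame, or None if the file does not exist."""
    if not os.path.exists(path):
        return None
    # Imported here so the check above still works where pandas is not installed.
    import pandas
    return pandas.read_csv(path)
```

Line 23 could then become mass_df = load_optional_csv(settings.BASE_DIR + "masses_predictions.csv"), with the mass features skipped whenever mass_df is None.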
