
Multi-GPU utilization #31

Open
DeadpanZiao opened this issue Oct 15, 2021 · 4 comments

@DeadpanZiao

I am looking for a way to correctly apply multiple GPUs to training and inference.

I am currently using multiple GPUs to run inference on a large volume, split into separate pieces. The labels generated by the different GPUs turn out to be independent, and it is still confusing to me how to combine them into a single large segmentation.

Really appreciate any help or insights.

@mjanusz
Collaborator

mjanusz commented Oct 15, 2021

To utilize multiple GPUs during inference, the typical approach is to generate independent segmentations for partially overlapping subvolumes. These then need to be reconciled and assembled into a global segmentation. One generates ID equivalences by looking at the subvolume overlap areas, computes the connected components of the resulting graph (e.g. using a union-find data structure), and writes the individual subvolumes into some volumetric storage system (e.g. using TensorStore) while relabelling the segments according to the connected components.
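The reconciliation step described above can be sketched with a small union-find. Everything below (the `UnionFind` class, the `overlap_equivalences` helper, the toy arrays) is illustrative and not part of the FFN codebase; it assumes segment IDs have already been made globally unique per subvolume.

```python
import numpy as np

class UnionFind:
    """Minimal union-find for merging segment IDs across subvolumes."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def overlap_equivalences(seg_a, seg_b):
    """Collect (id_a, id_b) pairs of segments that co-occur in the
    overlap region. seg_a and seg_b are the two segmentations restricted
    to the same overlap area; 0 is treated as background."""
    mask = (seg_a > 0) & (seg_b > 0)
    return set(zip(seg_a[mask].ravel(), seg_b[mask].ravel()))

# Toy example: the same 1D overlap strip as seen by two adjacent subvolumes.
seg_a = np.array([1, 1, 2, 0])
seg_b = np.array([5, 5, 6, 6])

uf = UnionFind()
for ia, ib in overlap_equivalences(seg_a, seg_b):
    uf.union(ia, ib)

# Relabel each voxel to its connected-component representative.
relabel = np.vectorize(lambda v: uf.find(v) if v else 0)
print(relabel(seg_b))  # [1 1 2 2]: 5 merged into 1, 6 merged into 2
```

In a real pipeline the relabelled subvolumes would then be written back out to the volumetric store (e.g. TensorStore) rather than printed.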

For training, the code is currently configured to use asynchronous SGD. One can start a process as a 'parameter server', and then some number of independent workers (one GPU each; can be on different machines) which connect to it and train together as a flock.

@DeadpanZiao
Author

Really appreciate the reply!

I have generated some overlapping labels. I noticed there are some resegmentation functions in the repo, but I am not sure whether they implement the approach you mention here, and I could not find any scripts to start them. It would be even better if you could provide a script to run the reconciliation.

Thanks again for the explanation.

@mjanusz
Collaborator

mjanusz commented Oct 19, 2021

We unfortunately don't have this functionality in the main ffn repo, but I'm aware of at least one third-party solution (https://github.com/Hanyu-Li/klab_utils/tree/master/klab_utils/ffn/reconciliation) which you might be able to use. IIUC, the process is to run remap.py (unique IDs per subvolume), find_graph.py (equivalences from overlapping subvolumes) and agglomerate_cv.py (update IDs according to the graph built in the previous step).
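The first step of that pipeline (making IDs unique per subvolume, as remap.py appears to do) can be approximated by offsetting each subvolume's nonzero IDs by a running base. This is an illustrative sketch only, not the actual code of remap.py, and the real script may use a different numbering scheme.

```python
import numpy as np

def remap_unique(subvolumes):
    """Offset each subvolume's nonzero segment IDs so IDs are globally
    unique across subvolumes. Background (0) stays 0.

    Illustrative sketch; the real remap.py may work differently."""
    out, base = [], 0
    for seg in subvolumes:
        remapped = np.where(seg > 0, seg + base, 0)
        out.append(remapped)
        base = max(base, int(remapped.max()))
    return out

a = np.array([0, 1, 2])
b = np.array([1, 1, 0])
ra, rb = remap_unique([a, b])
print(ra, rb)  # [0 1 2] [3 3 0]
```

After this step, the equivalence-graph and relabelling stages (find_graph.py / agglomerate_cv.py) can merge IDs across subvolumes without any accidental collisions.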

@DeadpanZiao
Author

Really appreciate it. I will spend some time running the code and trying to apply it to our labels. By the way, I am wondering how you generate large labeled volumes. Large-scale volume labeling seems to be heavily dependent on, and limited by, memory. We have 8 Tesla V100 GPUs, but as far as I can tell, it would take months to generate labels for the whole volume. Is this a problem for you as well?

Kind regards.
