Multi-GPU utilization #31
To utilize multiple GPUs during inference, the typical approach is to generate independent segmentations for partially overlapping subvolumes. These need to be reconciled and assembled into a global segmentation: one generates ID equivalences by looking at the subvolume overlap areas, computes the connected components of the resulting graph (e.g. using a union-find data structure), and writes the individual subvolumes into some volumetric storage system (e.g. using TensorStore) while relabelling the segments according to the connected components.
For training, the code is currently configured to use asynchronous SGD. One can start a process as a 'parameter server', and then some number of independent workers (one GPU each; they can be on different machines) which connect to it and train together as a flock.
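For concreteness, here is a minimal sketch of that reconciliation step, assuming two already-segmented subvolumes stored as NumPy arrays whose IDs are globally unique across subvolumes, and with the slices covering their shared overlap region known. The function names and the `min_voxels` threshold are illustrative assumptions, not part of the FFN codebase.

```python
import numpy as np


class UnionFind:
    """Minimal union-find (disjoint set) over arbitrary hashable segment IDs."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller ID as the representative of the component.
            self.parent[max(ra, rb)] = min(ra, rb)


def overlap_equivalences(seg_a, seg_b, overlap_a, overlap_b, min_voxels=100):
    """Returns ID pairs that co-occupy enough voxels in the shared overlap region."""
    a = seg_a[overlap_a].ravel()
    b = seg_b[overlap_b].ravel()
    mask = (a > 0) & (b > 0)
    pairs, counts = np.unique(np.stack([a[mask], b[mask]], axis=1),
                              axis=0, return_counts=True)
    # Require a minimum co-occurrence count to avoid spurious merges.
    return [tuple(p) for p, c in zip(pairs, counts) if c >= min_voxels]


def relabel(seg, uf):
    """Rewrites segment IDs so that equivalent segments share one global ID."""
    out = seg.copy()
    for old_id in np.unique(seg[seg > 0]):
        out[seg == old_id] = uf.find(old_id)
    return out
```

In use, one would run `overlap_equivalences` for every pair of overlapping subvolumes, feed all resulting pairs into a single `UnionFind` via `union`, and then `relabel` each subvolume before writing it into the volumetric store.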
Really appreciate the reply! I have generated some overlapping labels. I noticed there are some resegmentation functions in the repo, but I am not sure whether they implement the process you described here, and I could not find any scripts to run them. It would be even better if you could provide a script to run the reconciliation. Thanks again for the explanation.
We unfortunately don't have this functionality in the main ffn repo, but I'm aware of at least one third party solution (https://github.com/Hanyu-Li/klab_utils/tree/master/klab_utils/ffn/reconciliation) which you might be able to use. IIUC, the process is to run remap.py (unique IDs per subvolume), find_graph.py (equivalences from overlapping subvolumes) and agglomerate_cv.py (update IDs according to the graph built in the previous step).
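If you end up rolling your own version of the first step, the idea behind it (making per-subvolume IDs globally unique by offsetting them) could look roughly like the sketch below. The offset scheme and function name are assumptions for illustration, not the klab_utils implementation.

```python
import numpy as np


def remap_subvolume(seg, subvolume_index, ids_per_subvolume=np.uint64(1) << 32):
    """Offsets all nonzero labels so that no two subvolumes can share an ID."""
    out = seg.astype(np.uint64)  # widen to leave room for the offset
    offset = np.uint64(subvolume_index) * np.uint64(ids_per_subvolume)
    out[out > 0] += offset       # background (0) stays 0
    return out
```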
Really appreciate it. I will spend some time running the code and try to apply it to our labels. By the way, I am wondering how you generate large label volumes. Large-scale volume labelling seems to be heavily limited by memory. We have 8 Tesla V100 GPUs, but as far as I can tell it would take months to label the whole volume. Is this a problem for you as well? Kind regards.
I am looking for a way to correctly apply multiple GPUs to training and inference.
I am currently using multiple GPUs to run inference on a large volume, with each GPU segmenting a separate part. The labels generated by the different GPUs are independent, and it is still unclear to me how to combine them into a single large segmentation.
I would really appreciate any help or insights.