Multi-GPU utilization #31
To utilize multiple GPUs during inference, the typical approach is to generate independent segmentations for partially overlapping subvolumes. These need to be reconciled and assembled into a global segmentation: one generates ID equivalences by looking at the subvolume overlap areas, computes the connected components of the resulting graph (e.g. using a union-find data structure), and writes the individual subvolumes into some volumetric storage system (e.g. using TensorStore) while relabelling the segments according to the connected components.
For training, the code is currently configured to use asynchronous SGD. One can start a process as a 'parameter server', and then some number of independent workers (one GPU each; they can be on different machines) which connect to it and train together as a flock.
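For concreteness, here is a minimal sketch of that reconciliation step, assuming two already-segmented subvolumes stored as NumPy arrays whose IDs are globally unique across subvolumes, and with the slices covering their shared overlap region known. The function names and the `min_voxels` threshold are illustrative assumptions, not part of the FFN codebase.

```python
import numpy as np


class UnionFind:
    """Minimal union-find (disjoint set) over arbitrary hashable segment IDs."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller ID as the representative of the component.
            self.parent[max(ra, rb)] = min(ra, rb)


def overlap_equivalences(seg_a, seg_b, overlap_a, overlap_b, min_voxels=100):
    """Returns ID pairs that co-occupy enough voxels in the shared overlap region."""
    a = seg_a[overlap_a].ravel()
    b = seg_b[overlap_b].ravel()
    mask = (a > 0) & (b > 0)
    pairs, counts = np.unique(np.stack([a[mask], b[mask]], axis=1),
                              axis=0, return_counts=True)
    # Require a minimum co-occurrence count to avoid spurious merges.
    return [tuple(p) for p, c in zip(pairs, counts) if c >= min_voxels]


def relabel(seg, uf):
    """Rewrites segment IDs so that equivalent segments share one global ID."""
    out = seg.copy()
    for old_id in np.unique(seg[seg > 0]):
        out[seg == old_id] = uf.find(old_id)
    return out
```

In use, one would run `overlap_equivalences` for every pair of overlapping subvolumes, feed all resulting pairs into a single `UnionFind` via `union`, and then `relabel` each subvolume before writing it into the volumetric store.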
Really appreciate the reply! I have generated some overlapping labels. I noticed there are some resegmentation functions in the repo, but I am not sure whether they implement the process you described here, and I could not find any scripts to run them. It would be even better if you could provide a script to run the reconciliation. Thanks again for the explanation.
We unfortunately don't have this functionality in the main ffn repo, but I'm aware of at least one third party solution (https://github.com/Hanyu-Li/klab_utils/tree/master/klab_utils/ffn/reconciliation) which you might be able to use. IIUC, the process is to run remap.py (unique IDs per subvolume), find_graph.py (equivalences from overlapping subvolumes) and agglomerate_cv.py (update IDs according to the graph built in the previous step).
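If you end up rolling your own version of the first step, the idea behind it (making per-subvolume IDs globally unique by offsetting them) could look roughly like the sketch below. The offset scheme and function name are assumptions for illustration, not the klab_utils implementation.

```python
import numpy as np


def remap_subvolume(seg, subvolume_index, ids_per_subvolume=np.uint64(1) << 32):
    """Offsets all nonzero labels so that no two subvolumes can share an ID."""
    out = seg.astype(np.uint64)  # widen to leave room for the offset
    offset = np.uint64(subvolume_index) * np.uint64(ids_per_subvolume)
    out[out > 0] += offset       # background (0) stays 0
    return out
```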
Really appreciate it. I will spend some time running the code and try to apply it to our labels. By the way, I am wondering how you generate large label volumes. Large-scale volume labelling seems to be heavily limited by memory. We have 8 Tesla V100 GPUs, but as far as I can tell it would take months to label the whole volume. Is this a problem for you as well? Kind regards.
I am looking for a way to correctly apply multiple GPUs to training and inference.
I am currently using multiple GPUs to run inference on a large volume, with each GPU segmenting a separate part. The labels generated by the different GPUs are independent, and it is still unclear to me how to combine them into a single large segmentation.
I would really appreciate any help or insights.