eval gets stuck indefinitely #16
Comments
Hey @kaushikb258, how long did you wait? The CRF for Potsdam slices can take a few minutes to complete.
I ran the eval code on Potsdam for over 4-5 hours and still no result (the code is still running). Even training didn't take this long.
Yes, that definitely sounds like it's stuck; appreciate the context here. Perhaps set the num workers in this line (STEGO/src/eval_segmentation.py, line 118 at d1341b9) to something small and see if that stops you from getting stuck. If that's the case, it's probably due to starvation or something.
I decreased the num workers, but no progress. So I made a serial version of the CRF code, and this works now. Attaching it below in case it helps others... (GitHub is messing up the indentation!)

```python
def batched_crf(img_tensor, prob_tensor):
```
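The serial version attached above was truncated by the page. As a rough illustration of the idea, here is a minimal self-contained sketch of a serial replacement for the `pool.map` call: iterate over the batch and apply the CRF one image at a time. The `dense_crf_placeholder` function below is a stand-in (it just returns the probabilities unchanged); in STEGO the real per-image CRF refinement would be used instead, and torch tensors rather than NumPy arrays.

```python
import numpy as np

def dense_crf_placeholder(img, prob):
    # Stand-in for the repo's per-image CRF refinement (e.g. pydensecrf);
    # here it simply returns the class probabilities unchanged.
    return prob

def batched_crf_serial(img_tensor, prob_tensor, crf_fn=dense_crf_placeholder):
    """Apply the CRF one image at a time instead of via pool.map.

    img_tensor:  (B, 3, H, W) images on CPU
    prob_tensor: (B, C, H, W) class probabilities on CPU
    """
    # Plain Python loop: no worker processes, so nothing can deadlock.
    outputs = [crf_fn(img, prob) for img, prob in zip(img_tensor, prob_tensor)]
    return np.stack(outputs)

imgs = np.zeros((2, 3, 4, 4))
probs = np.ones((2, 5, 4, 4)) / 5.0
refined = batched_crf_serial(imgs, probs)
print(refined.shape)  # (2, 5, 4, 4)
```

This trades parallelism for reliability: the CRF step runs slower per batch, but it cannot hang on inter-process communication.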
It can be avoided by simply replacing the pool creation in STEGO/src/eval_segmentation.py (line 118 at d1341b9) with:

```python
from multiprocessing import get_context

with get_context('spawn').Pool(cfg.num_workers + 5) as pool:
    ...
```
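For context on why changing the start method helps: on Linux the default start method is `fork`, and a forked worker inherits a snapshot of the parent process, including any locks that other threads (e.g. CUDA or DataLoader threads) happened to hold at fork time, which can leave `pool.map` hung forever. `spawn` starts fresh interpreter processes instead. A tiny standalone sketch (not STEGO code, just the pattern):

```python
from multiprocessing import get_context

def square(x):
    # Worker functions must live at module top level so that
    # "spawn" children can import them.
    return x * x

if __name__ == "__main__":
    # "spawn" launches clean interpreters rather than forking, so
    # workers do not inherit locks held by the parent's other threads.
    with get_context("spawn").Pool(4) as pool:
        print(pool.map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Note the `if __name__ == "__main__":` guard: it is required with `spawn`, because child processes re-import the main module on startup.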
eval_segmentation.py gets stuck on the Potsdam data. The issue is in batched_crf(), at the following line:

```python
outputs = pool.map(_apply_crf, zip(img_tensor.detach().cpu(), prob_tensor.detach().cpu()))
```

The code never proceeds past this point; one worker process waits on the others indefinitely. Any suggestions?