eval gets stuck indefinitely #16

Open
kaushikb258 opened this issue May 16, 2022 · 5 comments

Comments

@kaushikb258

The eval_segmentation.py script gets stuck on the Potsdam data. The issue is in batched_crf(), at the following line:

outputs = pool.map(_apply_crf, zip(img_tensor.detach().cpu(), prob_tensor.detach().cpu()))

The code never proceeds past this point; one process waits on the others indefinitely. Any suggestions?

@mhamilton723
Owner

Hey @kaushikb258, how long did you wait? The CRF for the Potsdam slices can take a few minutes to complete.

@kaushikb258
Author

I ran the eval code on Potsdam for 4-5 hours and there is still no result (the code is still running). Even training didn't take this long.

@mhamilton723
Owner

mhamilton723 commented May 17, 2022

Yes, that definitely sounds like it's stuck; appreciate the context here. Perhaps set the number of workers in this line:

with Pool(cfg.num_workers + 5) as pool:

to something small and see if that stops you from getting stuck. If that's the case, it's probably due to starvation or something similar.
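
For instance, a minimal sketch of that change (the worker count of 2 here is arbitrary, just to test the hypothesis):

with Pool(2) as pool:
    outputs = pool.map(_apply_crf, zip(img_tensor.detach().cpu(), prob_tensor.detach().cpu()))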

@kaushikb258
Author

kaushikb258 commented May 17, 2022

I decreased the number of workers, but no progress. So I wrote a serial version of the CRF, and this works now. Attaching it below in case it helps others.

def batched_crf(img_tensor, prob_tensor):
    batch_size = list(img_tensor.size())[0]
    img_tensor_cpu = img_tensor.detach().cpu()
    prob_tensor_cpu = prob_tensor.detach().cpu()
    out = []
    for i in range(batch_size):
        # run the dense CRF on each (image, probability) pair sequentially
        out_ = dense_crf(img_tensor_cpu[i], prob_tensor_cpu[i])
        out.append(out_)
    return torch.cat([torch.from_numpy(arr).unsqueeze(0) for arr in out], dim=0)
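
For reference, a hypothetical call site (the variable names below are placeholders, not the exact names used in eval_segmentation.py):

# apply the serial CRF and take the per-pixel argmax as the final prediction
crf_probs = batched_crf(img, cluster_probs)
crf_preds = crf_probs.argmax(1)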

@Supgb

Supgb commented Jun 28, 2022

It can be avoided by simply replacing

with Pool(cfg.num_workers + 5) as pool:

with

from multiprocessing import get_context

with get_context('spawn').Pool(cfg.num_workers + 5) as pool:
    ...
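
Put together, a minimal sketch of the spawn-based variant (assuming the same _apply_crf helper and cfg object quoted above; the actual function in eval_segmentation.py may differ slightly):

from multiprocessing import get_context

import torch

def batched_crf(img_tensor, prob_tensor):
    # 'spawn' starts fresh worker processes instead of forking the parent,
    # so workers do not inherit CUDA/OpenMP state that can make pool.map hang
    with get_context('spawn').Pool(cfg.num_workers + 5) as pool:
        outputs = pool.map(_apply_crf, zip(img_tensor.detach().cpu(), prob_tensor.detach().cpu()))
    return torch.cat([torch.from_numpy(arr).unsqueeze(0) for arr in outputs], dim=0)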
