AssertionError: Range subprocess failed (exit code: 1) #63
Comments
https://github.com/roytseng-tw/Detectron.pytorch/blob/master/lib/utils/subprocess.py#L71
Thanks @roytseng-tw for the quick reply. I made the change at the link you suggested, and the ImportError: No module named cv2 message is gone. But the subprocess problem still exists. DEBUG: Run into test_net_data_set()
I modified the command in https://github.com/roytseng-tw/Detectron.pytorch/blob/master/lib/utils/subprocess.py#L71 by adding --multi-gpu-testing, but there is another problem:
You should check out the inference section in the README.
Thanks @roytseng-tw. Actually, I have passed --multi-gpu-testing in my command: But in https://github.com/roytseng-tw/Detectron.pytorch/blob/1833c71a62e389d2b5f873f40a914c5a47bdd8a2/lib/utils/subprocess.py#L71,
You should not change anything except python --> python3.
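For readers hitting the same ImportError: the symptom in this thread (cv2 imports fine in the parent, but the child process cannot find it) is what you would expect when the child is launched with a different interpreter than the parent, which is why changing python to python3 helps. A minimal sketch of a more general approach, assuming you are free to build the child command yourself, is to launch the child with `sys.executable`. The helper name `child_python_version` is hypothetical and is not part of Detectron.pytorch.

```python
import subprocess
import sys


def child_python_version(interpreter):
    """Return the 'major.minor' version reported by a child interpreter."""
    proc = subprocess.run(
        [interpreter, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return proc.stdout.strip()


# Version of the interpreter running this script.
parent_version = "%d.%d" % sys.version_info[:2]

# Launching the child via sys.executable reuses the parent's interpreter,
# so the child sees the same installed packages (cv2 included).
same_interpreter_version = child_python_version(sys.executable)
print(parent_version, same_interpreter_version)
```

Passing a bare name such as "python" instead of `sys.executable` resolves through PATH and may pick up a different installation.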
Update: @roytseng-tw yes, I kept everything as you said. I tried evaluating on only one GPU and it ran successfully, but when I pass
You should not pass
Here is the error:
INFO test_engine.py: 211: Wrote detections to: /mnt/hdd/tung/aim_2018/try_model/mask-rcnn.pytorch/test_output/detections.pkl
To be clear,
First, you don't need to change anything in the config file if you use
Below is my deduction:
Yes, I am on a machine with 8 GPUs, but I am only allowed to use 2 of them, so I want to run on GPUs 5 and 6 only. I ran it as you said:
INFO test_engine.py: 281: im_detect: range [626, 1250] of 5000: 1246/1250 0.292s + 0.021s (eta: 0:00:01)
INFO test_engine.py: 211: Wrote detections to: /mnt/hdd/tung/aim_2018/try_model/mask-rcnn.pytorch/Outputs/e2e_mask_rcnn_R-50-C4_1x/May17-21-45-19_slspGPU6_step/test/detections.pkl
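One common way to restrict a process to particular GPUs is the CUDA_VISIBLE_DEVICES environment variable; the thread as captured here does not show which mechanism was actually used. As a rough sketch, assuming a launcher that wants one child process per allowed GPU, the environments could be built like this. The helper names `visible_gpu_ids` and `per_worker_envs` are hypothetical and are not the repository's code.

```python
import os


def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES (e.g. "5,6") into a list of integer ids."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [int(part) for part in raw.split(",") if part.strip()]


def per_worker_envs(env=None):
    """Build one environment per allowed GPU, each exposing a single device."""
    base = dict(os.environ if env is None else env)
    envs = []
    for gpu_id in visible_gpu_ids(base):
        child_env = dict(base)
        child_env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        envs.append(child_env)
    return envs


# Illustrative input matching the GPUs mentioned above.
envs = per_worker_envs({"CUDA_VISIBLE_DEVICES": "5,6"})
print([e["CUDA_VISIBLE_DEVICES"] for e in envs])
```

Each resulting dictionary could be passed as the `env` argument of `subprocess.Popen`, so that every child sees exactly one device.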
I found something weird in your log
I only added this line at the top of the test_net.py file and kept everything else:
I see that the process is divided into 2 subprocesses. The first range is [1, 2500], but it fails on the assert, as in the log below. Here is my new full log:
You should not add
@roytseng-tw if I don't add this, the command
What's the output of this for you
@roytseng-tw The output is 2.
I tried the command:
@roytseng-tw It ran successfully, thank you. I don't know why it does not detect the GPU device ID when I use
Hi @roytseng-tw
When evaluating the training result, I ran into the problem below:
INFO subprocess.py: 129: # ---------------------------------------------------------------------------- #
INFO subprocess.py: 131: stdout of subprocess 0 with range [1, 1250]
INFO subprocess.py: 133: # ---------------------------------------------------------------------------- #
Traceback (most recent call last):
  File "/mnt/hdd/tung/aim_2018/try_model/mask-rcnn.pytorch/tools/test_net.py", line 4, in <module>
    import cv2
ImportError: No module named cv2
Traceback (most recent call last):
  File "tools/test_net.py", line 119, in <module>
    check_expected_results=True)
  File "/mnt/hdd/tung/aim_2018/try_model/mask-rcnn.pytorch/lib/core/test_engine.py", line 128, in run_inference
    all_results = result_getter()
  File "/mnt/hdd/tung/aim_2018/try_model/mask-rcnn.pytorch/lib/core/test_engine.py", line 108, in result_getter
    multi_gpu=multi_gpu_testing
  File "/mnt/hdd/tung/aim_2018/try_model/mask-rcnn.pytorch/lib/core/test_engine.py", line 155, in test_net_on_dataset
    args, dataset_name, proposal_file, num_images, output_dir
  File "/mnt/hdd/tung/aim_2018/try_model/mask-rcnn.pytorch/lib/core/test_engine.py", line 187, in multi_gpu_test_net_on_dataset
    args.load_ckpt, args.load_detectron, opts
  File "/mnt/hdd/tung/aim_2018/try_model/mask-rcnn.pytorch/lib/utils/subprocess.py", line 109, in process_in_parallel
    log_subprocess_output(i, p, output_dir, tag, start, end)
  File "/mnt/hdd/tung/aim_2018/try_model/mask-rcnn.pytorch/lib/utils/subprocess.py", line 147, in log_subprocess_output
    assert ret == 0, 'Range subprocess failed (exit code: {})'.format(ret)
AssertionError: Range subprocess failed (exit code: 1)
I have installed opencv and can import cv2 successfully, but I don't know what causes this problem. I tried the solution in https://github.com/facebookresearch/Detectron/issues/349, but it did not help. In the config file e2e_mask_rcnn_R-50-C4_1x.yaml, I only changed NUM_GPUS and kept everything else as in the original. Can you tell me what the problem is?
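A note on reading the log above: the AssertionError only reports the child's exit code, while the actual cause (the ImportError) appears in the child's own output. A minimal sketch of a launcher that puts the child's stderr into the error message, so the root cause is not missed, might look like the following. The function name `run_and_report` and the deliberately missing module name are illustrative assumptions, not code from the repository.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run a child process; on failure, raise with its stderr included."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(
            "child exited with code {}:\n{}".format(proc.returncode, proc.stderr)
        )
    return proc.stdout


# Provoke a failure similar to the one in this issue: a missing module.
try:
    run_and_report([sys.executable, "-c", "import a_module_that_does_not_exist"])
except RuntimeError as err:
    message = str(err)
    print(message)
```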
The command that I ran:
python3 tools/test_net.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-C4_1x.yaml --load_ckpt Outputs/e2e_mask_rcnn_R-50-C4_1x/May17-21-45-19_slspGPU6_step/ckpt/model_step89999.pth --multi-gpu-testing --output_dir Output_val
System information