The error when training #39

daixiaolei623 · 2021-10-09T19:39:27Z

Thank you for your great work.
However, when I train maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml using the command:
./train_net.py --num-gpus 2 --config-file configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml

I got the following errors:
`MaskFormer Training Script.

This script is a simplified version of the training script in detectron2/tools.
: No such file or directory
import-im6.q16: not authorized copy' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized itertools' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized logging' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized os' @ error/constitute.c/WriteImage/1037.
from: can't read /var/mail/collections
from: can't read /var/mail/typing
import-im6.q16: not authorized torch' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized comm' @ error/constitute.c/WriteImage/1037.
from: can't read /var/mail/detectron2.checkpoint
from: can't read /var/mail/detectron2.config
from: can't read /var/mail/detectron2.data
from: can't read /var/mail/detectron2.engine
./train_net.py: line 21: syntax error near unexpected token ('
./train_net.py: line 21: from detectron2.evaluation import ('`

Could you please tell me what the problem is and how to solve it?
Thank you very much!


bowenc0221 · 2021-10-09T19:50:12Z

Add python
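For context on why this helps: the log in the first post is what a shell prints when it runs a Python file itself. `import` resolves to ImageMagick's `import` tool (hence `import-im6.q16`) and `from` to a mail utility (hence `can't read /var/mail/...`). Prefixing the command with `python` avoids this. The sketch below is only an illustration and not part of the repository: `has_python_shebang` is a hypothetical helper that checks whether a script could be run directly as `./script.py`.

```python
from pathlib import Path


def has_python_shebang(script_path):
    """Return True if the first line is a usable '#!' line that mentions python.

    Hypothetical helper for illustration; it is not part of MaskFormer.
    """
    with open(script_path, "rb") as handle:
        first_line = handle.readline()
    # Windows line endings leave a stray '\r' in the interpreter name,
    # so such a shebang is reported as unusable.
    if first_line.endswith(b"\r\n"):
        return False
    return first_line.startswith(b"#!") and b"python" in first_line


if __name__ == "__main__":
    script = Path("train_net.py")  # script name taken from the command above
    if script.exists():
        print(f"{script}: python shebang present = {has_python_shebang(script)}")
```

If the helper reports `False`, running the script through the interpreter explicitly, as suggested above, is the simpler route.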

daixiaolei623 · 2021-10-09T23:02:21Z

@bowenc0221
Thank you.
However, after adding python and installing cuda-11.1, I ran python ./train_net.py --num-gpus 2 --config-file /home/dai/code/semantic_segmentation/27/MaskFormer-master/configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml and got the following error:

`Command Line Args: Namespace(config_file='/home/dai/code/semantic_segmentation/27/MaskFormer-master/configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 101: invalid device ordinal (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "./train_net.py", line 270, in <module>
args=(args,),
File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/detectron2/engine/launch.py", line 79, in launch
daemon=False,
File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/detectron2/engine/launch.py", line 95, in _distributed_worker
assert torch.cuda.is_available(), "cuda is not available. Please check your installation."
AssertionError: cuda is not available. Please check your installation.`
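The assertion in this traceback fires because `torch.cuda.is_available()` returned `False` inside the worker process, and the earlier `UserWarning` shows that CUDA initialization itself failed. The thread does not record how this was resolved, so the following is only a diagnostic sketch: `cuda_report` is a hypothetical helper that gathers what PyTorch sees in the current environment, which can then be compared against the value passed to `--num-gpus`.

```python
import os


def cuda_report():
    """Collect the facts relevant to a 'cuda is not available' failure.

    Hypothetical helper for illustration; it only reads state and changes nothing.
    """
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch
    except ImportError:
        # PyTorch is missing from this environment entirely.
        report["torch_installed"] = False
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # None here means a CPU-only PyTorch build.
    report["torch_built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `device_count` is smaller than the number given to `--num-gpus`, or `torch_built_for_cuda` is `None`, those would be the first things to look at.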

daixiaolei623 · 2021-10-10T14:41:32Z

@bowenc0221
Thank you, I have solved the above error. However, my GPU is a 1080Ti, which runs out of memory, so I would like to train on the CPU instead; my CPU has 64G of memory.
Could you please tell me how to train on the CPU?
Thank you.

bowenc0221 · 2021-10-12T18:30:11Z

You can try adding MODEL.DEVICE 'cpu' at the end of your command, but I have never tested it with CPU.
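A minimal sketch of what that command might look like, assembled in Python so the pieces are explicit. Two assumptions are made here. First, that trailing `KEY VALUE` pairs end up in `opts` (shown as `opts=[]` in the log above) and are merged into the config. Second, that `--num-gpus 1` is used, since the traceback above shows the multi-process launcher asserting that CUDA is available. `build_train_command` is a hypothetical helper, and like the suggestion itself this is untested on CPU.

```python
import shlex

# Config path taken from the first post in this thread.
CONFIG = "configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml"


def build_train_command(config_file, num_gpus=1, overrides=()):
    """Return the training command as an argument list.

    Hypothetical helper for illustration; `overrides` holds trailing
    KEY VALUE config pairs such as MODEL.DEVICE cpu.
    """
    command = [
        "python", "./train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
    ]
    command.extend(overrides)
    return command


if __name__ == "__main__":
    cmd = build_train_command(CONFIG, num_gpus=1, overrides=["MODEL.DEVICE", "cpu"])
    # Print a copy-pasteable shell command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Training a Swin-Large model on CPU is likely to be very slow, so this is mainly useful for checking that the pipeline runs at all.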

bowenc0221 closed this as completed Oct 20, 2021
