error when trying to train on the yvos dataset #7

Closed
ghost opened this issue Aug 1, 2019 · 4 comments

Comments

ghost commented Aug 1, 2019

To test the algorithm, I tried to train on the yvos dataset using a command adapted from drsn/do_train_eval.

The command I used was:
CUDA_VISIBLE_DEVICES=0 python train.py --model_save_path experiments/snapshots --max_iters 100000 --decayat 60000 --learning_rate 2e-5 --batch_size 64 --input_size 256,256
My default Python version is 3.6.8, with gcc 5.5, CUDA 9.0 and inplace_abn 1.0.3.

The error I received was the following:


Traceback (most recent call last):
  File "train.py", line 23, in <module>
    from model.drsn import DRSN
  File "/home/orel/projects/pts/PTSNet-master/drsn/model/drsn.py", line 9, in <module>
    from inplace_ABN import InPlaceABN, InPlaceABNSync
  File "/home/orel/projects/pts/PTSNet-master/drsn/model/inplace_ABN/__init__.py", line 1, in <module>
    from .bn import ABN, InPlaceABN, InPlaceABNSync
  File "/home/orel/projects/pts/PTSNet-master/drsn/model/inplace_ABN/bn.py", line 10, in <module>
    from .functions import *
  File "/home/orel/projects/pts/PTSNet-master/drsn/model/inplace_ABN/functions.py", line 22, in <module>
    extra_cuda_cflags=["--expt-extended-lambda"])
  File "/home/orel/projects/pts/ptsenv/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 644, in load
    is_python_module)
  File "/home/orel/projects/pts/ptsenv/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 802, in _jit_compile
    if baton.try_acquire():
  File "/home/orel/projects/pts/ptsenv/lib/python3.6/site-packages/torch/utils/file_baton.py", line 36, in try_acquire
    self.fd = os.open(self.lock_file_path, os.O_CREAT | os.O_EXCL)
FileNotFoundError: [Errno 2] No such file or directory: '/home/orel/projects/pts/PTSNet-master/drsn/model/inplace_ABN/build/lock'


I don't understand which file is missing or how to replace or create it.

I would appreciate your help

Thanks,

Orel

@sydney0zq
Owner

You could first try deleting the /home/orel/projects/pts/PTSNet-master/drsn/model/inplace_ABN/build directory and then running drsn again.

If that doesn't work, please paste your error here and we can discuss further.

Author

ghost commented Aug 4, 2019

Thanks for your quick reply.

The directory you are asking me to delete does not exist in my copy of the code (as downloaded).
The error I receive is the same as the one pasted above.
Do you have a suggestion for how to proceed?

Again, thank you.

Owner

sydney0zq commented Aug 5, 2019

Sorry, I forgot to add the build directory to the code; I have fixed it now. You can also create it manually by running mkdir /home/orel/projects/pts/PTSNet-master/drsn/model/inplace_ABN/build.

If the error still happens, please try checking your inplace_ABN module on its own. I believe it is just a minor issue such as a nonexistent directory or a wrong build path.
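
For anyone hitting the same traceback, here is a minimal Python sketch of why the lock file cannot be created and how creating the directory fixes it. It assumes the extension is built into a build folder next to the module, as the lock path in the traceback suggests; `ensure_build_dir` and `try_lock` are hypothetical helper names and are not part of PTSNet or PyTorch. The sketch runs in a temporary directory, so it does not touch the real checkout.

```python
import os
import tempfile


def ensure_build_dir(package_dir):
    """Create <package_dir>/build if it is missing and return its path."""
    build_dir = os.path.join(package_dir, "build")
    os.makedirs(build_dir, exist_ok=True)  # no error if it already exists
    return build_dir


def try_lock(build_dir):
    """Imitate the failing call from the traceback: create build/lock exclusively."""
    lock_path = os.path.join(build_dir, "lock")
    # O_CREAT creates the file, but never the missing parent directory.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL)
    os.close(fd)
    os.remove(lock_path)
    return True


with tempfile.TemporaryDirectory() as package_dir:
    # Before the fix: the build directory is absent, so the lock cannot be created.
    try:
        try_lock(os.path.join(package_dir, "build"))
        failed_without_dir = False
    except FileNotFoundError:
        failed_without_dir = True

    # After the fix: the directory exists and the same call succeeds.
    build_dir = ensure_build_dir(package_dir)
    locked_after_fix = try_lock(build_dir)
```

In the real package, a call along the lines of `ensure_build_dir(os.path.dirname(os.path.abspath(__file__)))` placed before the extension is loaded in functions.py would presumably have the same effect as the manual mkdir, though that placement is an assumption about the code rather than something confirmed in this thread.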

Author

ghost commented Aug 7, 2019

Thank you!
The fix worked.

ghost closed this as completed Aug 7, 2019