【Training stage1 (VAE-GAN)】Cannot train with the data I've prepared #3
According to williamfzc/stagesepx#150, the function name in skimage has been changed.
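The comment does not say which function was renamed. Assuming it is the SSIM helper discussed in stagesepx (the old `skimage.measure.compare_ssim`, which newer scikit-image releases expose as `skimage.metrics.structural_similarity`), a minimal compatibility sketch could import whichever name the installed version provides. The helper `first_available` is hypothetical and is not part of DriveGAN or scikit-image.

```python
import importlib


def first_available(candidates):
    """Return the first attribute importable from a list of (module, name) pairs, else None."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or scikit-image itself) is not installed; try the next candidate.
            continue
        attr = getattr(module, attr_name, None)
        if attr is not None:
            return attr
    return None


# Assumption: the renamed function is SSIM. Newer scikit-image provides it in
# skimage.metrics; older releases provided it as skimage.measure.compare_ssim.
compare_ssim = first_available([
    ("skimage.metrics", "structural_similarity"),
    ("skimage.measure", "compare_ssim"),
])
```

Code that previously called `compare_ssim(...)` can then keep using the same name regardless of the installed scikit-image version, as long as one of the two locations exists.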
warnings.warn(
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Using /root/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
    synchronize()
  File "/workspace/DriveGAN_code/latent_decoder_model/distributed.py", line 63, in synchronize
  File "/workspace/DriveGAN_code/latent_decoder_model/distributed.py", line 63, in synchronize
    dist.barrier()
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2709, in barrier
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2709, in barrier
RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:957, invalid usage, NCCL version 21.0.3
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 916) of binary: /opt/conda/bin/python
If we train on the data we've prepared with a single GPU, we need to change the following. The prepared dataset also needs the following directory structure.
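The traceback shows each frame twice, which suggests two training processes were launched on a machine with a single RTX 3080; several ranks sharing one GPU is one known trigger of NCCL's "invalid usage" error. Below is a minimal sketch of single-GPU-safe guards, assuming the job is started with `torch.distributed.launch` or `torchrun` (both set the `WORLD_SIZE` environment variable). The helper `use_distributed` is hypothetical, and `synchronize` only mirrors the common pattern of skipping `dist.barrier()` when no process group is active; it is not necessarily the change the commenter made.

```python
import os


def use_distributed(num_gpus, env=None):
    """Decide whether to initialize NCCL-based distributed training.

    Hypothetical helper: reads WORLD_SIZE (set by torchrun / torch.distributed.launch)
    and refuses to start more processes than there are visible GPUs.
    """
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    if world_size > 1 and world_size > num_gpus:
        raise RuntimeError(
            f"{world_size} processes requested but only {num_gpus} GPU(s) visible; "
            "launch with --nproc_per_node=1 or run the script without a launcher"
        )
    return world_size > 1


def synchronize():
    """Barrier across ranks; a no-op when training in a single process."""
    try:
        import torch.distributed as dist
    except ImportError:
        return
    if not dist.is_available() or not dist.is_initialized():
        return
    if dist.get_world_size() == 1:
        return
    dist.barrier()
```

In a training script this would be called as `use_distributed(torch.cuda.device_count())`, and `torch.distributed.init_process_group(backend="nccl")` would only run when it returns `True`.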
I solved this issue. Thank you.
Hello.
I tried VAE-GAN training on my own data (256×256 px).
However, after running
./scripts/train.sh ./img_data/data1
the following errors were displayed. There are about 17,000 PNG images in the data1 folder.
Also, I changed --batch in
latent_decoder_model/script/train.sh
from 6 to 1. The development environment is a Docker container provided by NVIDIA:
https://ngc.nvidia.com/catalog/containers/nvidia:pytorch
The OS is Ubuntu 20.04 LTS.
The graphics card is an RTX 3080.
The specified torch version did not work with this graphics card, so I had to reinstall torch:
pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
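The RTX 3080 is an Ampere card with compute capability 8.6, so a PyTorch build runs natively on it only if it ships `sm_86` kernels. A small sketch for checking this after reinstalling; `has_native_kernels` and `check_current_gpu` are hypothetical helpers, while `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are standard PyTorch calls.

```python
def has_native_kernels(arch_list, capability):
    """True if arch_list (e.g. from torch.cuda.get_arch_list()) has an sm_XY entry for capability.

    PTX-only entries such as 'compute_37' are ignored here, although the driver
    may still JIT-compile them for newer GPUs.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def check_current_gpu():
    """Check the installed PyTorch build against GPU 0; None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return has_native_kernels(
        torch.cuda.get_arch_list(), torch.cuda.get_device_capability(0)
    )
```

Running `print(check_current_gpu())` inside the container reports whether the reinstalled wheel includes kernels for the installed card.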
I'm not a native English speaker, so my writing may be poor, but I hope you can help me find a solution.