[Bug] RealBasicVSR training Error : torch.distributed.elastic.multiprocessing.api:failed #1478
Comments
Hey @gihwan-kim, this seems to be a |
It's version 8.0.2.
@gihwan-kim, Can you attempt to install |
Thank you for the kind reply! My ffmpeg version is 4.3.
I think so many iterations need a lot of memory. Should I change the training configuration? This is my output of
|
@gihwan-kim, training RealBasicVSR with default config needs at least 17201MB of GPU memory, and you can refer to the I think you can try to change |
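To compare a GPU against the 17201 MB figure quoted above, one option is to read the total memory reported by `nvidia-smi`. The following is only a sketch: the helper names are hypothetical, and it assumes the quoted figure and the `nvidia-smi` value (reported in MiB) can be compared directly.

```python
import subprocess

# Figure quoted in this thread for the default RealBasicVSR config.
# Assumption: it is comparable to the MiB values that nvidia-smi reports.
REQUIRED_MB = 17201


def parse_total_memory(smi_output):
    """Parse one total-memory value per GPU from nvidia-smi CSV output."""
    return [int(line) for line in smi_output.splitlines() if line.strip()]


def query_total_memory():
    """Ask nvidia-smi for the total memory of each GPU (requires the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_total_memory(result.stdout)


def gpus_below_requirement(totals_mb, required_mb=REQUIRED_MB):
    """Return the indices of GPUs whose total memory is below the requirement."""
    return [i for i, total in enumerate(totals_mb) if total < required_mb]
```

Calling `gpus_below_requirement(query_total_memory())` on the training machine lists the GPUs that would likely run out of memory with the default settings.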
I could solve by changing
My UDM data file path is 'data/UDM10/BIx4/archpeople/000.png'.
@gihwan-kim For the master branch, you need to rename the images in your dataset. You can use a simple script to do this, like this:
For the 1.x or dev-1.x branch, if your UDM data file path is 'data/UDM10/BIx4/archpeople/000.png', you can simply add a parameter like |
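For the master-branch renaming described above, a minimal sketch could look like the following. It assumes the loader expects eight-digit, zero-padded frame names (for example `00000000.png`). That width is an assumption, not something stated in this thread, so check it against the filename template in your config. `rename_frames` is a hypothetical helper name.

```python
from pathlib import Path


def rename_frames(root, width=8, suffix=".png"):
    """Zero-pad numeric frame names under root, e.g. 000.png -> 00000000.png.

    Assumption: `width=8` matches the filename template the dataset expects.
    """
    renamed = []
    # Materialize the file list first so renames do not disturb the traversal.
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        stem = path.stem
        if not (stem.isascii() and stem.isdigit()):
            continue  # leave non-numeric names untouched
        target = path.with_name(f"{int(stem):0{width}d}{suffix}")
        if target == path:
            continue  # already in the expected format
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        path.rename(target)
        renamed.append(target)
    return renamed
```

For example, `rename_frames("data/UDM10/BIx4")` would turn `archpeople/000.png` into `archpeople/00000000.png`. Run it on a copy of the data first, since the rename happens in place.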
Thank you for your kind help!
Is the blur4 data the same as BIx4? And does BIx4 mean x4 downsampling with bicubic interpolation?
Blur4 is neither BIx4 nor BDx4. BIx4 and BDx4 are both pre-processed using MATLAB. For BDx4, you need to use the MATLAB script https://github.com/ckkelvinchan/BasicVSR-IconVSR/blob/main/BD_degradation.m . For BIx4, you can simply use |
I will check |
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmediting
Environment
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.2, V11.2.152
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.2
PyTorch compiling details: PyTorch built with:
TorchVision: 0.11.3
OpenCV: 4.5.4
MMCV: 1.5.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.3
MMEditing: 0.16.0+7b3a8bd
Reproduces the problem - code sample
I just ran the training again.
Reproduces the problem - command or script
Reproduces the problem - error message
Additional information
I'm trying to train RealBasicVSR to check whether it trains in my environment.
I have a problem similar to an existing issue, but that issue hasn't been resolved yet.