Encountered errors while executing training process #2 #39

Ma5onic · 2022-08-22T07:55:07Z

(Using Leaderboard_B)
At first, conda got stuck at "Solving environment". I let it run for 30 minutes, but it never finished creating the env from the .yml file.
Because I was using a cloud instance, I didn't have time to wait, so I did this instead:

conda create -n mdx-net
conda update conda
conda config --add channels conda-forge
conda activate mdx-net
sudo apt-get install soundstretch
python -m pip install -r requirements.txt
python src/utils/data_augmentation.py --data_dir /real/path/to/musdbhq/ --train True --test True
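One thing worth checking with the commands above: conda create -n mdx-net with no package list creates an empty env that has no Python of its own. After conda activate, python (and therefore python -m pip) can still resolve to the system interpreter. That would be consistent with the ~/.local/lib/python3.8 paths in the tracebacks below, although the logs alone don't prove it. A minimal sketch, relying only on conda setting CONDA_PREFIX on activation; interpreter_in_active_conda_env is a hypothetical helper, not part of mdx-net:

```python
import os
import sys
from pathlib import Path


def interpreter_in_active_conda_env():
    """Return (ok, message): is the running Python the one inside $CONDA_PREFIX?"""
    prefix = os.environ.get("CONDA_PREFIX")
    if not prefix:
        return False, "no conda env is active (CONDA_PREFIX is unset)"
    env = Path(prefix).resolve()
    running = Path(sys.prefix).resolve()  # root of the interpreter actually running
    if running == env:
        return True, f"python is {sys.executable} (inside {env})"
    return False, f"python is {sys.executable}, but the active env is {env}"


if __name__ == "__main__":
    ok, message = interpreter_in_active_conda_env()
    print(("OK: " if ok else "WARNING: ") + message)
```

If it prints a warning, creating the env with an interpreter (for example conda create -n mdx-net python=3.8, matching the Python version in the tracebacks here) before running pip keeps the packages inside the env.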

It seems that the data augmentation step fails on songs that don't contain vocals:

python src/utils/data_augmentation.py --data_dir /home/ubuntu/mdx-files/musdb/ --train True --test True
 10%|███████████████▉                                                                                                                                                     | 11/114 [01:13<11:25,  6.65s/it]
Traceback (most recent call last):
  File "src/utils/data_augmentation.py", line 111, in <module>
    main(parser.parse_args())
  File "src/utils/data_augmentation.py", line 30, in main
    save_shifted_dataset(p, t, data_dir, 'train')
  File "src/utils/data_augmentation.py", line 92, in save_shifted_dataset
    source = load_wav(in_path.joinpath(s_name+'.wav'))
  File "src/utils/data_augmentation.py", line 102, in load_wav
    return sf.read(path, samplerate=sr, dtype='float32')[0].T
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 256, in read
    with SoundFile(file, 'r', samplerate, channels,
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "/home/ubuntu/.local/lib/python3.8/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/home/ubuntu/mdx-files/musdb/train/Artificial Intelligence - Native Instruments/vocals.wav': System error.

After I deleted the songs that didn't contain vocals, data augmentation succeeded, but every attempt to train failed, and I didn't have time to debug on the cloud GPU instance.
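The crash above happens 11 tracks in, when load_wav asks soundfile to open a vocals.wav that it cannot open. To list every affected track before starting a long augmentation run, a small stdlib-only check like the one below can help. It assumes the MUSDB18-HQ layout implied by the error path (<data_dir>/train/<track>/<stem>.wav) and the stem names mixture, vocals, drums, bass and other; find_incomplete_tracks is a hypothetical helper, not part of mdx-net.

```python
import sys
from pathlib import Path

# Assumed MUSDB18-HQ stem names; adjust if your copy of the dataset differs.
EXPECTED_STEMS = ("mixture", "vocals", "drums", "bass", "other")


def find_incomplete_tracks(data_dir, split="train", stems=EXPECTED_STEMS):
    """Return {track_name: [missing stems]} for tracks lacking any <stem>.wav."""
    split_dir = Path(data_dir) / split
    missing = {}
    for track_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        absent = [s for s in stems if not (track_dir / f"{s}.wav").is_file()]
        if absent:
            missing[track_dir.name] = absent
    return missing


if __name__ == "__main__":
    # Usage: python find_incomplete_tracks.py /path/to/musdbhq/
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for split in ("train", "test"):
        if not (Path(root) / split).is_dir():
            continue
        for name, absent in find_incomplete_tracks(root, split).items():
            print(f"{split}/{name}: missing {', '.join(absent)}")
```

This only checks that the files exist; a stem that is present but unreadable would still make sf.read fail.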

Here is the output of python run.py experiment=multigpu_other model=ConvTDFNet_other:

/usr/lib/python3/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /usr/lib/python3/dist-packages/torchvision/image.so: undefined symbol: _ZNK3c106IValue23reportToTensorTypeErrorEv
  warn(f"Failed to load image Python extension: {e}")
Traceback (most recent call last):
  File "run.py", line 7, in <module>
    from pytorch_lightning.utilities import rank_zero_info
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 16, in <module>
    from torchmetrics import Accuracy as _Accuracy
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/__init__.py", line 14, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/__init__.py", line 14, in <module>
    from torchmetrics.functional.audio.pit import permutation_invariant_training, pit, pit_permutate
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/__init__.py", line 26, in <module>
    from torchmetrics.functional.audio.pesq import perceptual_evaluation_speech_quality  # noqa: F401
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torchmetrics/functional/audio/pesq.py", line 20, in <module>
    import pesq as pesq_backend
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/__init__.py", line 5, in <module>
    from ._pesq import pesq, pesq_batch
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pesq/_pesq.py", line 8, in <module>
    from .cypesq import cypesq, cypesq_retvals, cypesq_error_message as pesq_error_message
  File "__init__.pxd", line 238, in init cypesq
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 80 from PyObject

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   35C    P0    36W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   34C    P0    33W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+


KimberleyJensen · 2022-08-22T09:37:24Z

@Ma5onic try pip install --upgrade numpy
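The last line of the run.py traceback ("numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 80 from PyObject") typically means a compiled extension, here pesq's cypesq module, was built against a newer numpy than the one being imported, which is why upgrading numpy is the usual first fix. Before changing versions, it can help to see which copy of each package is actually loaded: the traceback's paths point at ~/.local/lib/python3.8/site-packages and /usr/lib/python3/dist-packages rather than a conda env directory. A stdlib-only sketch; the package list is simply the one from the traceback, and describe_packages is a hypothetical helper, not part of mdx-net:

```python
import importlib.metadata as md
import importlib.util
import site

# Distributions named in the traceback above (pip names, not module names).
PACKAGES = ("numpy", "pesq", "torchmetrics", "pytorch-lightning", "torch", "torchvision")


def describe_packages(packages=PACKAGES):
    """Map each distribution to (installed version, file its module would load from)."""
    report = {}
    for dist in packages:
        try:
            version = md.version(dist)
        except md.PackageNotFoundError:
            version = None
        # Simplification: assume module name == distribution name with '-' -> '_'.
        # find_spec locates a top-level module without importing it.
        spec = importlib.util.find_spec(dist.replace("-", "_"))
        report[dist] = (version, spec.origin if spec else None)
    return report


if __name__ == "__main__":
    print("user site-packages:", site.getusersitepackages(),
          "(enabled)" if site.ENABLE_USER_SITE else "(disabled)")
    for name, (version, origin) in describe_packages().items():
        print(f"{name:18} {version or 'not installed':14} {origin or '-'}")
```

If numpy, pesq and torch resolve to different site-packages directories, packages from several installs are being mixed, which is one plausible way to end up with this kind of mismatch.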

Satisfy256 · 2022-09-16T16:51:54Z

Had the same issue. I fixed it by installing older dependencies from around 2021:
requirements.txt

Ma5onic · 2022-09-24T11:13:19Z

> @Ma5onic try pip install --upgrade numpy

@KimberleyJensen Thanks, but the newest version of numpy is incompatible with other installed packages:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.56.2 requires setuptools<60, but you have setuptools 63.4.1 which is incompatible.
hydra-optuna-sweeper 1.1.0.dev2 requires numpy<1.20.0, but you have numpy 1.23.3 which is incompatible.

Maybe I should have used conda to update instead. Thanks anyway.
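To catch this kind of conflict before running pip install --upgrade, you can compare installed versions against the upper bounds pip reported. The sketch below hard-codes the two bounds from the resolver message above; its crude version parser only handles plain dotted release numbers (pre-releases such as 1.20.0rc1 are not ordered correctly), so treat it as an illustration rather than a replacement for python -m pip check, which reports the same class of conflict for everything installed. release_tuple, is_below and check_bounds are hypothetical helpers.

```python
import importlib.metadata as md
import re

# Upper bounds copied from the pip resolver message above.
UPPER_BOUNDS = {
    "numpy": "1.20.0",   # required by hydra-optuna-sweeper 1.1.0.dev2
    "setuptools": "60",  # required by numba 0.56.2
}


def release_tuple(version):
    """Crude parse of the leading dotted integers, e.g. '1.23.3' -> (1, 23, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_below(installed, bound):
    """True if installed < bound, padding the shorter tuple with zeros."""
    a, b = release_tuple(installed), release_tuple(bound)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) < b + (0,) * (width - len(b))


def check_bounds(bounds=UPPER_BOUNDS):
    """Return {package: installed_version} for installed packages that violate their bound."""
    violations = {}
    for name, bound in bounds.items():
        try:
            installed = md.version(name)
        except md.PackageNotFoundError:
            continue  # not installed, so it cannot conflict
        if not is_below(installed, bound):
            violations[name] = installed
    return violations


if __name__ == "__main__":
    for name, installed in check_bounds().items():
        print(f"{name} {installed} violates <{UPPER_BOUNDS[name]}")
```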

> Had the same issue. Fixed by installing old dependencies from around 2021. requirements.txt

@Satisfy256 Ohh, interesting! Okay, I'll nuke my current install and start over, lol.

Ma5onic · 2022-09-25T00:45:53Z

@KimberleyJensen, you're onto something, though: the current requirements.txt also seems to contain an issue related to the one you mentioned here. The requirements.txt that @Satisfy256 shared lists demucs<=2.0.3 as a dependency... That file might be a little hidden gem, because I could not find it in the committed file history:
https://github.com/kuielab/mdx-net/commits/main/requirements.txt
Same with the Leaderboard_B branch: https://github.com/kuielab/mdx-net/commits/Leaderboard_B/requirements.txt

Still waiting for conda to solve the environment 😢

Satisfy256 · 2022-09-25T07:48:14Z

@Ma5onic I modified the requirements.txt to use old versions. I tested it and it works for me on Ubuntu 20.04.

Ma5onic · 2022-09-29T20:55:16Z

> @Ma5onic I modified the requirements.txt to use old versions. I tested it out and it works for me in Ubuntu 20.04

@Satisfy256 Okay, sick! That gives me hope. I'll start from scratch and try again.

Ma5onic · 2022-09-30T06:25:51Z

Yay, it works!!!
Thank you very much.

Ma5onic · 2022-10-01T02:23:23Z

Linux users with RTX cards, or anyone using a cloud instance, will run into dependency issues unrelated to the solution above. The PyTorch landing page shows how the install command differs depending on your OS, package manager and CUDA version.
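After installing PyTorch with the command the selector gives for your setup, a quick sanity check is to confirm that the installed build was compiled with CUDA and can see the GPUs (for example the two A100s in the nvidia-smi output above). This uses only standard torch attributes (torch.version.cuda, torch.cuda.is_available, torch.cuda.device_count, torch.cuda.get_device_name); torch_cuda_report itself is a hypothetical helper.

```python
def torch_cuda_report():
    """Summarize the installed PyTorch build and the CUDA devices it can see."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    report = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda is None, a CPU-only build was installed; if it is set but cuda_available is False, the driver and CUDA build combination is the first thing to check.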

Ma5onic changed the title ~~Encountered an error while executing training process~~ Encountered errors while executing training process 2 Aug 22, 2022

Ma5onic changed the title ~~Encountered errors while executing training process 2~~ Encountered errors while executing training process #2 Aug 22, 2022

Ma5onic closed this as completed Sep 30, 2022
