This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

RuntimeError: Offset past EOF #54

Closed
ntyoshi opened this issue Feb 20, 2021 · 8 comments

Comments

@ntyoshi

ntyoshi commented Feb 20, 2021

Hi,

I'm trying to reproduce your model.
I got an error when I started training on GPUs with launch_valentini.sh.
The error was 'Offset past EOF', which I'm not familiar with.
I didn't change conf/conf.yaml except for the output directory of the logs.
Could you advise what I should check next?

Thank you.

Script output:

$ bash launch_valentini.sh
[2021-02-19 21:30:45,199][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-19 21:30:45,719][denoiser.executor][INFO] - Starting 1 worker processes for DDP.
[2021-02-19 21:30:46,017][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-19 21:30:49,350][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-02-19 21:30:49,351][denoiser.solver][INFO] - Training...
[2021-02-19 21:30:49,483][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 200, in _run_one_epoch
    for i, data in enumerate(logprog):
  File "/data/home/ntyoshi/denoiser/denoiser/utils.py", line 126, in __next__
    value = next(self._iterator)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/home/ntyoshi/denoiser/denoiser/data.py", line 96, in __getitem__
    return self.noisy_set[index], self.clean_set[index]
  File "/data/home/ntyoshi/denoiser/denoiser/audio.py", line 72, in __getitem__
    out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/__init__.py", line 85, in load
    filetype=filetype,
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/_sox_backend.py", line 47, in load
    filetype
RuntimeError: Offset past EOF

[2021-02-19 21:30:49,532][denoiser.executor][ERROR] - Worker 0 died, killing all workers
@adefossez
Contributor

Could you try to add some debug statements around this line: https://github.com/facebookresearch/denoiser/blob/master/denoiser/audio.py#L72

            try:
                out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
            except Exception:
                # Print the offending file, its example count and the requested offset, then re-raise.
                print(file, examples, offset)
                raise

and then report the offending file? If you have ffmpeg installed, you can also run `ffprobe PATH_TO_FILE` so that I have more info to debug the issue.

@ntyoshi
Author

ntyoshi commented Feb 22, 2021

Hi @abhshkdz,

The code you gave me printed the following lines:

[2021-02-22 14:27:41,384][denoiser.solver][INFO] - Training...
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p228_301.wav 9 56000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p239_287.wav 12 72000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p259_464.wav 10 48000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p279_312.wav 14 88000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p278_021.wav 44 224000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav 10 72000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p259_006.wav 25 144000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p276_010.wav 13 96000
[2021-02-22 14:27:41,507][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 200, in _run_one_epoch
    for i, data in enumerate(logprog):
  File "/data/home/ntyoshi/denoiser/denoiser/utils.py", line 126, in __next__
    value = next(self._iterator)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/home/ntyoshi/denoiser/denoiser/data.py", line 96, in __getitem__
    return self.noisy_set[index], self.clean_set[index]
  File "/data/home/ntyoshi/denoiser/denoiser/audio.py", line 73, in __getitem__
    out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/__init__.py", line 85, in load
    filetype=filetype,
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/_sox_backend.py", line 47, in load
    filetype
RuntimeError: Offset past EOF

/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p254_257.wav 17 80000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p259_016.wav 32 216000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p279_283.wav 7 48000
[2021-02-22 14:27:41,566][denoiser.executor][ERROR] - Worker 0 died, killing all workers

When I ran ffprobe on one of the paths printed before the error occurred, I got this output:

$ ffprobe /data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p228_301.wav
ffprobe version 4.3.1 Copyright (c) 2007-2020 the FFmpeg developers
  built with gcc 7.5.0 (crosstool-NG 1.24.0.131_87df0e6_dirty)
  configuration: --prefix=/home/ntyoshi/anaconda3/envs/denoiser --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1602879523915/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-libx264 --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, wav, from '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p228_301.wav':
  Metadata:
    encoder         : Lavf58.45.100
  Duration: 00:00:02.78, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

And for one of the paths printed after the error:

$ ffprobe /data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p254_257.wav
ffprobe version 4.3.1 Copyright (c) 2007-2020 the FFmpeg developers
  built with gcc 7.5.0 (crosstool-NG 1.24.0.131_87df0e6_dirty)
  configuration: --prefix=/home/ntyoshi/anaconda3/envs/denoiser --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1602879523915/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-libx264 --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, wav, from '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p254_257.wav':
  Metadata:
    encoder         : Lavf58.45.100
  Duration: 00:00:04.14, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

Sorry for the long comment!

@adefossez
Contributor

Sorry, can you do the same while passing num_workers=0? This will load a single file at a time, which avoids having so many errors printed in parallel. Also replace this line:
https://github.com/facebookresearch/denoiser/blob/master/denoiser/audio.py#L63
with `for (file, file_size), examples in zip(self.files, self.num_examples):` and add file_size to the print call in the except block.

@ntyoshi
Author

ntyoshi commented Feb 23, 2021

Fixed audio.py:

...
        for (file, file_size), examples in zip(self.files, self.num_examples):
            if index >= examples:
                index -= examples
                continue
            num_frames = 0
            offset = 0
            if self.length is not None:
                offset = self.stride * index
                num_frames = self.length
            try:
                out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
            except Exception:
                print(file, examples, offset, file_size); raise
...

conf/config.yml:

...
# Logging and printing, and does not impact training
num_prints: 5
device: cuda
num_workers: 0
verbose: 0
show: 0   # just show the model and its size and exit
...

I got these messages:

$ bash launch_valentini.sh
[2021-02-23 15:33:19,178][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-23 15:33:19,708][denoiser.executor][INFO] - Starting 1 worker processes for DDP.
[2021-02-23 15:33:19,966][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-23 15:33:23,088][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-02-23 15:33:23,088][denoiser.solver][INFO] - Training...
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav 10 72000 143813
[2021-02-23 15:33:23,108][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 200, in _run_one_epoch
    for i, data in enumerate(logprog):
  File "/data/home/ntyoshi/denoiser/denoiser/utils.py", line 126, in __next__
    value = next(self._iterator)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/home/ntyoshi/denoiser/denoiser/data.py", line 96, in __getitem__
    return self.noisy_set[index], self.clean_set[index]
  File "/data/home/ntyoshi/denoiser/denoiser/audio.py", line 74, in __getitem__
    out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/__init__.py", line 85, in load
    filetype=filetype,
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/_sox_backend.py", line 47, in load
    filetype
RuntimeError: Offset past EOF

@adefossez
Contributor

That is very weird. Can you check ffprobe /data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav, as well as

import torchaudio

file = '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav'
siginfo, _ = torchaudio.info(file)
# Number of frames per channel
length = siginfo.length // siginfo.channels
print(length)

Maybe also try with a more recent version of torchaudio?

@ntyoshi
Author

ntyoshi commented Mar 3, 2021

Thanks for your response!
Please see the results below.

ffprobe:

$ ffprobe /data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav
ffprobe version 4.3.1 Copyright (c) 2007-2020 the FFmpeg developers
  built with gcc 7.5.0 (crosstool-NG 1.24.0.131_87df0e6_dirty)
  configuration: --prefix=/home/ntyoshi/anaconda3/envs/denoiser --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1602879523915/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-libx264 --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, wav, from '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav':
  Metadata:
    encoder         : Lavf58.45.100
  Duration: 00:00:03.00, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
  • Before updating torchaudio

torchaudio version:

$ pip show torchaudio
Name: torchaudio
Version: 0.5.1
Summary: An audio package for PyTorch
Home-page: https://github.com/pytorch/audio
Author: Soumith Chintala, David Pollack, Sean Naren, Peter Goldsborough
Author-email: soumith@pytorch.org
License: UNKNOWN
Location: /data/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages
Requires: torch
Required-by: denoiser

python script result:

$ python
Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchaudio
>>> file = '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav'
>>> siginfo, _ = torchaudio.info(file)
>>> length = siginfo.length // siginfo.channels
>>> print(length)
47938
  • After updating torchaudio

torchaudio version:

$ pip show torchaudio
Name: torchaudio
Version: 0.7.0
Summary: An audio package for PyTorch
Home-page: https://github.com/pytorch/audio
Author: Soumith Chintala, David Pollack, Sean Naren, Peter Goldsborough
Author-email: soumith@pytorch.org
License: UNKNOWN
Location: /data/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages
Requires: torch
Required-by: denoiser

python script result:

$ python
Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchaudio
/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  '"sox" backend is being deprecated. '
>>> file = '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav'
>>> siginfo, _ = torchaudio.info(file)
>>> length = siginfo.length // siginfo.channels
>>> print(length)
47938

I'm running this in a conda environment. Let me know if you need any other information.
Thanks again!

@adefossez
Contributor

The file size here (47938) doesn't match what is stored in the json (143813).
The only explanation I can think of is that the file size changed between when the list of files was computed and now. Could you try to remove the clean.json and noisy.json files, and regenerate them?
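
To make the mismatch concrete, here is a rough sketch of the arithmetic. The segment-counting formula is an assumption (the thread does not show it); it is used here only because it reproduces the numbers in the debug print above (10 examples, offset 72000) from segment=4.5, stride=0.5 and the 16000 Hz sample rate reported by ffprobe. The name `num_examples` and the constants are illustrative.

```python
import math

SAMPLE_RATE = 16_000
LENGTH = int(4.5 * SAMPLE_RATE)  # segment=4.5 s -> 72000 frames
STRIDE = int(0.5 * SAMPLE_RATE)  # stride=0.5 s  -> 8000 frames


def num_examples(file_length: int, length: int = LENGTH, stride: int = STRIDE) -> int:
    """Assumed number of (padded) segments cut from a file of `file_length` frames."""
    if file_length < length:
        return 1
    return math.ceil((file_length - length) / stride) + 1


stale_length = 143813   # length stored in the json, as printed in the thread
actual_length = 47938   # length reported by torchaudio.info in the thread

examples = num_examples(stale_length)
last_offset = STRIDE * (examples - 1)

print(examples)                     # 10, matching the "10" in the debug print
print(last_offset)                  # 72000, matching the printed offset
print(last_offset > actual_length)  # True: the read would start past the real end of file
print(num_examples(actual_length))  # 1: what an up-to-date json would give
```

The stale length is also almost exactly three times the actual one, which would be consistent with the files having been resampled (for example from 48 kHz to 16 kHz) after the json was built, though nothing in this thread confirms that.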

@ntyoshi
Author

ntyoshi commented Mar 4, 2021

Could you try to remove the clean.json and noisy.json files, and regenerate them?

It worked!
Thank you!
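
For anyone hitting the same error, a quick way to confirm a stale file list before regenerating it is to compare the recorded lengths with the files on disk. The sketch below is only an illustration: it assumes the json is a list of [path, length_in_frames] pairs and that the audio files are plain PCM WAV readable by Python's standard `wave` module. The function name `find_stale_entries` is hypothetical and not part of the denoiser code.

```python
import json
import wave
from pathlib import Path


def find_stale_entries(manifest_path):
    """Return (path, recorded_length, actual_length) for every mismatching entry.

    Assumes the manifest is a JSON list of [path, length_in_frames] pairs
    and that every file is an uncompressed PCM WAV readable by `wave`.
    """
    entries = json.loads(Path(manifest_path).read_text())
    stale = []
    for path, recorded_length in entries:
        # Read the actual frame count from the WAV header.
        with wave.open(str(path), "rb") as wav:
            actual_length = wav.getnframes()
        if actual_length != recorded_length:
            stale.append((path, recorded_length, actual_length))
    return stale
```

Calling `find_stale_entries("noisy.json")` (path is a placeholder) and getting a non-empty list would suggest the json was built from different versions of the files and should be regenerated.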

@ntyoshi ntyoshi closed this as completed Mar 19, 2021