RuntimeError: Offset past EOF #54

ntyoshi · 2021-02-20T05:40:28Z

Hi,

I'm trying to reproduce your model.
I got an error when I started training on GPUs with launch_valentini.sh.
The error was 'Offset past EOF' but I'm not familiar with the error.
I didn't change conf/conf.yaml except for the output directory for logs.
Can you give me any advice on what I should check next?

Thank you.

Script output:

$ bash launch_valentini.sh
[2021-02-19 21:30:45,199][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-19 21:30:45,719][denoiser.executor][INFO] - Starting 1 worker processes for DDP.
[2021-02-19 21:30:46,017][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-19 21:30:49,350][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-02-19 21:30:49,351][denoiser.solver][INFO] - Training...
[2021-02-19 21:30:49,483][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 200, in _run_one_epoch
    for i, data in enumerate(logprog):
  File "/data/home/ntyoshi/denoiser/denoiser/utils.py", line 126, in __next__
    value = next(self._iterator)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/home/ntyoshi/denoiser/denoiser/data.py", line 96, in __getitem__
    return self.noisy_set[index], self.clean_set[index]
  File "/data/home/ntyoshi/denoiser/denoiser/audio.py", line 72, in __getitem__
    out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/__init__.py", line 85, in load
    filetype=filetype,
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/_sox_backend.py", line 47, in load
    filetype
RuntimeError: Offset past EOF

[2021-02-19 21:30:49,532][denoiser.executor][ERROR] - Worker 0 died, killing all workers


adefossez · 2021-02-22T18:00:35Z

Could you try to add some debug statements around this line: https://github.com/facebookresearch/denoiser/blob/master/denoiser/audio.py#L72

            try:
                out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
            except Exception:
                print(file, examples, offset)
                raise

and then report the offending file? If you have ffmpeg installed, you can run `ffprobe PATH_TO_FILE` so that I have more info to debug the issue.

ntyoshi · 2021-02-22T22:39:27Z

Hi @abhshkdz,

The code you gave me output the following lines:

[2021-02-22 14:27:41,384][denoiser.solver][INFO] - Training...
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p228_301.wav 9 56000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p239_287.wav 12 72000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p259_464.wav 10 48000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p279_312.wav 14 88000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p278_021.wav 44 224000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav 10 72000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p259_006.wav 25 144000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p276_010.wav 13 96000
[2021-02-22 14:27:41,507][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 200, in _run_one_epoch
    for i, data in enumerate(logprog):
  File "/data/home/ntyoshi/denoiser/denoiser/utils.py", line 126, in __next__
    value = next(self._iterator)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/home/ntyoshi/denoiser/denoiser/data.py", line 96, in __getitem__
    return self.noisy_set[index], self.clean_set[index]
  File "/data/home/ntyoshi/denoiser/denoiser/audio.py", line 73, in __getitem__
    out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/__init__.py", line 85, in load
    filetype=filetype,
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/_sox_backend.py", line 47, in load
    filetype
RuntimeError: Offset past EOF

/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p254_257.wav 17 80000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p259_016.wav 32 216000
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p279_283.wav 7 48000
[2021-02-22 14:27:41,566][denoiser.executor][ERROR] - Worker 0 died, killing all workers

When I ran ffprobe on one of the paths printed before the error happened, I got this output:

$ ffprobe /data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p228_301.wav
ffprobe version 4.3.1 Copyright (c) 2007-2020 the FFmpeg developers
  built with gcc 7.5.0 (crosstool-NG 1.24.0.131_87df0e6_dirty)
  configuration: --prefix=/home/ntyoshi/anaconda3/envs/denoiser --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1602879523915/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-libx264 --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, wav, from '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p228_301.wav':
  Metadata:
    encoder         : Lavf58.45.100
  Duration: 00:00:02.78, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

And here is the output for one of the paths printed after the error:

$ ffprobe /data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p254_257.wav
ffprobe version 4.3.1 Copyright (c) 2007-2020 the FFmpeg developers
  built with gcc 7.5.0 (crosstool-NG 1.24.0.131_87df0e6_dirty)
  configuration: --prefix=/home/ntyoshi/anaconda3/envs/denoiser --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1602879523915/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-libx264 --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, wav, from '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p254_257.wav':
  Metadata:
    encoder         : Lavf58.45.100
  Duration: 00:00:04.14, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

Sorry for the long comment!

adefossez · 2021-02-23T13:40:11Z

Sorry, can you do the same with num_workers=0? That loads one file at a time, so errors from parallel workers are not interleaved. Also replace this line:
https://github.com/facebookresearch/denoiser/blob/master/denoiser/audio.py#L63
with `for (file, file_size), examples in zip(self.files, self.num_examples):` and add `file_size` to the print call in the except block.

ntyoshi · 2021-02-23T23:38:07Z

Fixed audio.py:

...
        for (file, file_size), examples in zip(self.files, self.num_examples):
            if index >= examples:
                index -= examples
                continue
            num_frames = 0
            offset = 0
            if self.length is not None:
                offset = self.stride * index
                num_frames = self.length
            try:
                out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
            except Exception:
                print(file, examples, offset, file_size)
                raise
...

conf/config.yml:

...
# Logging and printing, and does not impact training
num_prints: 5
device: cuda
num_workers: 0
verbose: 0
show: 0   # just show the model and its size and exit
...

I got these messages:

$ bash launch_valentini.sh
[2021-02-23 15:33:19,178][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-23 15:33:19,708][denoiser.executor][INFO] - Starting 1 worker processes for DDP.
[2021-02-23 15:33:19,966][__main__][INFO] - For logs, checkpoints and samples check /data/workspace/ntyoshi/outputs/exp_bandmask=0.2,demucs.causal=1,demucs.hidden=48,demucs.resample=4,dset=valentini,remix=1,segment=4.5,shift=8000,shift_same=True,stft_loss=True,stride=0.5
[2021-02-23 15:33:23,088][denoiser.solver][INFO] - ----------------------------------------------------------------------
[2021-02-23 15:33:23,088][denoiser.solver][INFO] - Training...
/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav 10 72000 143813
[2021-02-23 15:33:23,108][__main__][ERROR] - Some error happened
Traceback (most recent call last):
  File "train.py", line 104, in main
    _main(args)
  File "train.py", line 98, in _main
    run(args)
  File "train.py", line 79, in run
    solver.train()
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 137, in train
    train_loss = self._run_one_epoch(epoch)
  File "/data/home/ntyoshi/denoiser/denoiser/solver.py", line 200, in _run_one_epoch
    for i, data in enumerate(logprog):
  File "/data/home/ntyoshi/denoiser/denoiser/utils.py", line 126, in __next__
    value = next(self._iterator)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/home/ntyoshi/denoiser/denoiser/data.py", line 96, in __getitem__
    return self.noisy_set[index], self.clean_set[index]
  File "/data/home/ntyoshi/denoiser/denoiser/audio.py", line 74, in __getitem__
    out, sr = torchaudio.load(str(file), offset=offset, num_frames=num_frames)
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/__init__.py", line 85, in load
    filetype=filetype,
  File "/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/_sox_backend.py", line 47, in load
    filetype
RuntimeError: Offset past EOF
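
For reference, the debug line above prints file, examples, offset and file_size, in that order. The arithmetic behind those numbers can be sketched as follows. This is a minimal sketch, not the repository's code: the sample rate comes from the ffprobe output (16000 Hz), segment=4.5 and stride=0.5 come from the experiment name in the log, and `num_examples` is a hypothetical helper whose padded-count formula is an assumption chosen because it reproduces the printed count of 10.

```python
import math

sample_rate = 16000               # from the ffprobe output
segment_s, stride_s = 4.5, 0.5    # from the experiment name in the log

length = int(segment_s * sample_rate)   # frames per training example
stride = int(stride_s * sample_rate)    # frames between consecutive examples


def num_examples(file_length, length, stride):
    """Hypothetical helper: assumed example count when the last chunk is padded."""
    if file_length < length:
        return 1
    return math.ceil((file_length - length) / stride) + 1


stored_size = 143813                     # file_size printed in the debug line
examples = num_examples(stored_size, length, stride)
last_offset = stride * (examples - 1)    # offset = stride * index for the last index

print(length, stride, examples, last_offset)
```

Under these assumptions the stored size yields 10 examples, and the last one starts at offset 72000, which is the offset that failed. The read is therefore valid only if the file really contains as many frames as the stored size claims.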

adefossez · 2021-03-02T14:28:30Z

That is very weird. Can you check `ffprobe /data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav`, as well as

import torchaudio
file = '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav'
siginfo, _ = torchaudio.info(file)
length = siginfo.length // siginfo.channels
print(length)

Maybe also try with a more recent version of torchaudio?

ntyoshi · 2021-03-03T06:21:49Z

Thanks for your response!
Please see the results below.

ffprobe:

$ ffprobe /data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav
ffprobe version 4.3.1 Copyright (c) 2007-2020 the FFmpeg developers
  built with gcc 7.5.0 (crosstool-NG 1.24.0.131_87df0e6_dirty)
  configuration: --prefix=/home/ntyoshi/anaconda3/envs/denoiser --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1602879523915/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-libx264 --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, wav, from '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav':
  Metadata:
    encoder         : Lavf58.45.100
  Duration: 00:00:03.00, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

Before updating torchaudio

torchaudio version:

$ pip show torchaudio
Name: torchaudio
Version: 0.5.1
Summary: An audio package for PyTorch
Home-page: https://github.com/pytorch/audio
Author: Soumith Chintala, David Pollack, Sean Naren, Peter Goldsborough
Author-email: soumith@pytorch.org
License: UNKNOWN
Location: /data/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages
Requires: torch
Required-by: denoiser

python script result:

$ python
Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchaudio
>>> file = '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav'
>>> siginfo, _ = torchaudio.info(file)
>>> length = siginfo.length // siginfo.channels
>>> print(length)
47938

After updating torchaudio

torchaudio version:

$ pip show torchaudio
Name: torchaudio
Version: 0.7.0
Summary: An audio package for PyTorch
Home-page: https://github.com/pytorch/audio
Author: Soumith Chintala, David Pollack, Sean Naren, Peter Goldsborough
Author-email: soumith@pytorch.org
License: UNKNOWN
Location: /data/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages
Requires: torch
Required-by: denoiser

python script result:

$ python
Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchaudio
/home/ntyoshi/anaconda3/envs/denoiser/lib/python3.7/site-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
  '"sox" backend is being deprecated. '
>>> file = '/data/workspace/ntyoshi/dataset/valentini/noisy_trainset_wav/p250_002.wav'
>>> siginfo, _ = torchaudio.info(file)
>>> length = siginfo.length // siginfo.channels
>>> print(length)
47938

I'm running this in a conda environment. Let me know if you need any other information.
Thanks again!

adefossez · 2021-03-04T15:45:26Z

The file length reported here (47938) doesn't match what is stored in the json (143813).
The only explanation I can think of is that the file changed between when the list of files was computed and now. Could you try to remove the clean.json and noisy.json files, and regenerate them?
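
A check like this can be automated before training. The sketch below is an illustration under stated assumptions rather than part of the repository: it assumes each json file is a list of `[path, length]` pairs (which is what the `(file, file_size)` unpacking in the loop suggests), it uses the standard-library `wave` module instead of torchaudio so it only handles plain PCM WAV files, and `find_stale_entries` is a hypothetical helper name. Note that the stored length is almost exactly three times the measured one, which would be consistent with the files having been resampled from 48 kHz to 16 kHz after the json was written, though nothing in this thread confirms that.

```python
import json
import sys
import wave


def find_stale_entries(json_path):
    """Return (path, stored_length, actual_length) for every mismatching entry."""
    with open(json_path) as f:
        entries = json.load(f)  # assumed layout: [[path, length], ...]
    stale = []
    for path, stored_length in entries:
        # Count the frames actually present in the file on disk.
        with wave.open(path, "rb") as wav:
            actual_length = wav.getnframes()
        if actual_length != stored_length:
            stale.append((path, stored_length, actual_length))
    return stale


if __name__ == "__main__":
    # Pass one or more json files on the command line, e.g. noisy.json clean.json
    for json_path in sys.argv[1:]:
        for path, stored, actual in find_stale_entries(json_path):
            print(f"{path}: json says {stored} frames, file has {actual}")
```

Any line printed means the json is out of date for that file and should be regenerated.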

ntyoshi · 2021-03-04T21:07:42Z

> Could you try to remove the clean.json and noisy.json file, and regenerate them?

It worked!
Thank you!

ntyoshi closed this as completed Mar 19, 2021
