Corrupted 1b_lyrics checkpoint? #25

Closed
Desm0nt opened this issue May 2, 2020 · 11 comments

Comments

@Desm0nt

Desm0nt commented May 2, 2020

I have the same issue on a local machine (Ubuntu 20.04, 1080 Ti, Anaconda, Python 3.7, everything installed as in the README) and on Google Colab.

When fetching the checkpoint for the 1b_lyrics model and trying to start sampling:

(jukebox) desm0nt@desm0nt-linux:~/jukebox$ python jukebox/sample.py --model=1b_lyrics --name=sample_1b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=4 --hop_fraction=0.5,0.5,0.125
Using cuda True
{'name': 'sample_1b', 'levels': 3, 'sample_length_in_seconds': 20, 'total_sample_length_in_seconds': 180, 'sr': 44100, 'n_samples': 4, 'hop_fraction': (0.5, 0.5, 0.125)}
Setting sample length to 881920 (i.e. 19.998185941043083 seconds) to be multiple of 128
Downloading from gce
Restored from /home/desm0nt/.cache/jukebox-assets/models/5b/vqvae.pth.tar
0: Loading vqvae in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:0, Cond downsample:4, Raw to tokens:8, Sample length:65536
Downloading from gce
Restored from /home/desm0nt/.cache/jukebox-assets/models/5b/prior_level_0.pth.tar
0: Loading prior in eval mode
Conditioning on 1 above level(s)
Checkpointing convs
Checkpointing convs
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v2_artist_ids.txt
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v2_genre_ids.txt
Level:1, Cond downsample:4, Raw to tokens:32, Sample length:262144
Downloading from gce
Restored from /home/desm0nt/.cache/jukebox-assets/models/5b/prior_level_1.pth.tar
0: Loading prior in eval mode
Creating cond. autoregress with prior bins [79, 2048], 
dims [384, 6144], 
shift [ 0 79]
input shape 6528
input bins 2127
Self copy is False
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v3_artist_ids.txt
Loading artist IDs from /home/desm0nt/jukebox/jukebox/data/ids/v3_genre_ids.txt
Level:2, Cond downsample:None, Raw to tokens:128, Sample length:786432
Downloading from gce
Traceback (most recent call last):
  File "jukebox/sample.py", line 237, in <module>
    fire.Fire(run)
  File "/home/desm0nt/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/desm0nt/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/desm0nt/anaconda3/envs/jukebox/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 234, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 157, in save_samples
    vqvae, priors = make_model(model, device, hps)
  File "/home/desm0nt/jukebox/jukebox/make_models.py", line 185, in make_model
    priors = [make_prior(setup_hparams(priors[level], dict()), vqvae, 'cpu') for level in levels]
  File "/home/desm0nt/jukebox/jukebox/make_models.py", line 185, in <listcomp>
    priors = [make_prior(setup_hparams(priors[level], dict()), vqvae, 'cpu') for level in levels]
  File "/home/desm0nt/jukebox/jukebox/make_models.py", line 169, in make_prior
    restore(hps, prior, hps.restore_prior)
  File "/home/desm0nt/jukebox/jukebox/make_models.py", line 54, in restore
    checkpoint = load_checkpoint(checkpoint_path)
  File "/home/desm0nt/jukebox/jukebox/make_models.py", line 37, in load_checkpoint
    checkpoint = t.load(restore, map_location=t.device('cpu'))
  File "/home/desm0nt/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/desm0nt/anaconda3/envs/jukebox/lib/python3.7/site-packages/torch/serialization.py", line 709, in _legacy_load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 61312207 more bytes. The file might be corrupted.
corrupted double-linked list
Aborted (core dumped)
Desm0nt closed this as completed May 2, 2020
@Jovonni

Jovonni commented May 2, 2020

@Desm0nt can you post the solution before you close it?

@Desm0nt
Author

Desm0nt commented May 3, 2020

@Jovonni On my local machine I had simply run out of free space on the /home drive, which aborted the download. I still don't know what causes the problem on Colab or how to fix it.
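Running out of disk space partway through a download leaves a truncated checkpoint in the cache, which matches the "unexpected EOF, expected 61312207 more bytes" error in the original report. A minimal sketch of a pre-flight check, assuming the checkpoints land under `~/.cache/jukebox-assets` (the path shown in the log). The helper `has_free_space` and its `required_gb` threshold are hypothetical; jukebox does not run such a check itself, and the 20 GB figure below is only an illustrative value, not the actual checkpoint size.

```python
import os
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free.

    `required_gb` is a caller-chosen (hypothetical) threshold; jukebox itself
    does not perform this check.
    """
    # Walk up to the nearest existing directory so the check also works
    # before the cache folder has been created.
    probe = os.path.abspath(os.path.expanduser(path))
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:
            break
        probe = parent
    free_bytes = shutil.disk_usage(probe).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    # 20 GiB is an illustrative threshold, not the real size of the checkpoints.
    if not has_free_space("~/.cache/jukebox-assets", required_gb=20):
        print("Not enough free space for the checkpoints; free up disk space first.")
```

Checking the nearest existing parent matters because the cache directory may not exist yet on a fresh install.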

@ssrp

ssrp commented May 4, 2020

@Desm0nt @Jovonni Sorry, I still can't fix this problem. Does anybody have a solution? I have 50 GB of free disk space, 128 GB of CPU RAM and 32 GB of GPU memory.

@apeguero1

I had this problem after exiting a sampling run before the prior models had finished downloading. You might have a truncated prior model file already saved in your cache folder.

Try clearing the cache at /root/.cache/jukebox-assets (on Google Colab). For the 5b model I had to delete /root/.cache/jukebox-assets/models/5b/prior_level_0.pth.tar so that a fresh download would start instead of the sampler reading the existing file.
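As described here, the sampler reads whatever file already exists at the cached path instead of downloading it again, so a truncated file fails on every run until it is removed. A minimal sketch for deleting one suspect checkpoint or a whole model folder before re-running. `clear_cached_checkpoint` is a hypothetical helper, and the `~/.cache/jukebox-assets/models/<model>/` layout is taken from the paths in the log above (on Colab `~` is `/root`).

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional

# Cache root as it appears in the logs in this thread (~ is /root on Colab).
DEFAULT_CACHE = Path(os.path.expanduser("~/.cache/jukebox-assets"))


def clear_cached_checkpoint(model: str, filename: Optional[str] = None,
                            cache_root: Path = DEFAULT_CACHE) -> List[str]:
    """Delete a cached checkpoint (or a whole model folder) so it is re-downloaded.

    model:    sub-folder under models/, e.g. "5b" (as seen in the log paths).
    filename: a single file such as "prior_level_0.pth.tar"; if None, the
              whole model folder is removed.
    Returns the paths of the files that were removed.
    """
    model_dir = Path(cache_root) / "models" / model
    target = model_dir / filename if filename else model_dir
    removed: List[str] = []
    if target.is_dir():
        removed = [str(p) for p in target.rglob("*") if p.is_file()]
        shutil.rmtree(target)
    elif target.is_file():
        target.unlink()
        removed = [str(target)]
    return removed
```

For the case above, `clear_cached_checkpoint("5b", "prior_level_0.pth.tar")` removes only the truncated prior; calling it with just `"5b"` forces every 5b checkpoint to be fetched again, which is slower but also catches truncated files you have not identified.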

@ssrp

ssrp commented May 7, 2020

@apeguero1 Thank you for the prompt response -- it works now! :)

@LeapGamer

I am getting this error when following the main instructions:

(jukebox) C:\Users\james\jukebox>python jukebox/sample.py --model=5b_lyrics --name=sample_5b --levels=3 --sample_length_in_seconds=20 --total_sample_length_in_seconds=180 --sr=44100 --n_samples=6 --hop_fraction=0.5,0.5,0.125
C:\Users\james\Anaconda3\envs\jukebox\lib\site-packages\librosa\util\decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit as optional_jit
C:\Users\james\Anaconda3\envs\jukebox\lib\site-packages\librosa\util\decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit as optional_jit
Using cuda True
{'name': 'sample_5b', 'levels': 3, 'sample_length_in_seconds': 20, 'total_sample_length_in_seconds': 180, 'sr': 44100, 'n_samples': 6, 'hop_fraction': (0.5, 0.5, 0.125)}
Setting sample length to 881920 (i.e. 19.998185941043083 seconds) to be multiple of 128
Downloading from gce
Traceback (most recent call last):
  File "jukebox/sample.py", line 279, in <module>
    fire.Fire(run)
  File "C:\Users\james\Anaconda3\envs\jukebox\lib\site-packages\fire\core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "C:\Users\james\Anaconda3\envs\jukebox\lib\site-packages\fire\core.py", line 366, in _Fire
    component, remaining_args)
  File "C:\Users\james\Anaconda3\envs\jukebox\lib\site-packages\fire\core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "jukebox/sample.py", line 276, in run
    save_samples(model, device, hps, sample_hps)
  File "jukebox/sample.py", line 181, in save_samples
    vqvae, priors = make_model(model, device, hps)
  File "c:\users\james\jukebox\jukebox\make_models.py", line 191, in make_model
    vqvae = make_vqvae(setup_hparams(vqvae, dict(sample_length=hps.get('sample_length', 0), sample_length_in_seconds=hps.get('sample_length_in_seconds', 0))), device)
  File "c:\users\james\jukebox\jukebox\make_models.py", line 95, in make_vqvae
    restore_model(hps, vqvae, hps.restore_vqvae)
  File "c:\users\james\jukebox\jukebox\make_models.py", line 55, in restore_model
    checkpoint = load_checkpoint(checkpoint_path)
  File "c:\users\james\jukebox\jukebox\make_models.py", line 37, in load_checkpoint
    checkpoint = t.load(restore, map_location=t.device('cpu'))
  File "C:\Users\james\Anaconda3\envs\jukebox\lib\site-packages\torch\serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\james\Anaconda3\envs\jukebox\lib\site-packages\torch\serialization.py", line 563, in _load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

I have deleted the cache and still get it. I have 30 GB of free space. Any ideas?

@LeapGamer

LeapGamer commented Jun 21, 2020

I am on Windows, and my vqvae.pth.tar is 0 KB. This happens with both the 5b and 1b models.
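A 0 KB file is consistent with the `EOFError: Ran out of input` in the traceback above: the last frame is `pickle_module.load(f, ...)` reading the checkpoint's magic number, and Python's `pickle` raises `EOFError` when the stream is empty. A minimal sketch that reproduces that error with the standard library and lists zero-byte `.pth.tar` files in the cache. `find_empty_checkpoints` is a hypothetical helper, not part of jukebox, and the cache path is assumed from the logs in this thread.

```python
import io
import pickle
from pathlib import Path
from typing import List


def find_empty_checkpoints(cache_root: str) -> List[Path]:
    """Return every *.pth.tar file under `cache_root` whose size is 0 bytes."""
    root = Path(cache_root).expanduser()
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.pth.tar")
                  if p.is_file() and p.stat().st_size == 0)


if __name__ == "__main__":
    # Unpickling an empty stream fails the same way torch.load did above.
    try:
        pickle.load(io.BytesIO(b""))
    except EOFError as err:
        print("EOFError:", err)

    # Any file listed here should be deleted so it can be downloaded again.
    for path in find_empty_checkpoints("~/.cache/jukebox-assets"):
        print("empty checkpoint:", path)
```

A zero-byte file only catches the case reported here; a file that is non-empty but truncated (as in the first report) still needs to be deleted by hand.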

@NoiseGener8r

NoiseGener8r commented Jul 3, 2020

I have the same issue. I've tried deleting the file at C:\Users\rfnoi\.cache\jukebox-assets\models\5b\vqvae.pth.tar, but the script re-creates it and crashes with the same error: EOFError: Ran out of input.

Edit: I have also tried this with the 1b model. Nothing new appears in C:\Users\rfnoi\.cache\jukebox-assets\models\. Should I expect a /1b/ directory?
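A file that is re-created at 0 KB on every run suggests the download step itself fails after the destination file has been opened, leaving an empty file for the next run to trip over. A common way to avoid that, sketched here as an assumption and not as how jukebox actually downloads its assets, is to stream into a temporary file in the same folder and rename it onto the final path only once the copy has finished and, optionally, its size matches the expected length. `atomic_save` and `expected_size` are hypothetical names.

```python
import os
import shutil
import tempfile
from typing import BinaryIO, Optional


def atomic_save(src: BinaryIO, dest: str, expected_size: Optional[int] = None) -> int:
    """Copy the binary stream `src` to `dest` without ever leaving a partial file there.

    Data goes to a temporary file in the destination folder first; it is moved
    onto `dest` with os.replace only after the copy finished and, if
    `expected_size` is given, the byte count matches. Returns bytes written.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    os.makedirs(dest_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(src, tmp)
            written = tmp.tell()
        if expected_size is not None and written != expected_size:
            raise OSError(f"incomplete download: got {written} of {expected_size} bytes")
        # Same-folder rename, so the final path is either absent or complete.
        os.replace(tmp_path, dest)
    except BaseException:
        # Remove the partial temp file so nothing truncated is left in the cache.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written
```

With `urllib.request.urlopen(url)` as the source, the response object can be passed as `src`, and `int(resp.headers["Content-Length"])` used as `expected_size` when the server sends that header (`url` here is a placeholder, not a jukebox asset address).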

@TheLionArye


Did you get an answer? I'm having that problem too.

@mwcm

mwcm commented Apr 16, 2021

Same issue here, Windows 10.

@cicinwad


No issue here, Windows 8.1
