
Error at the generation stage by MusicGen stereo model #30217

Closed

ElizavetaSedova opened this issue Apr 12, 2024 · 3 comments
ElizavetaSedova commented Apr 12, 2024

System Info

I used the conversion script to convert the original stereo model, but I get an error when trying to generate audio:

IndexError: index 4 is out of range

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

My code:

from transformers import AutoProcessor, MusicgenForConditionalGeneration

my_path = "path_to_exported_stereo_model"
model = MusicgenForConditionalGeneration.from_pretrained(
    my_path, torchscript=True, return_dict=False
)

processor = AutoProcessor.from_pretrained(my_path)

sample_length = 3  # seconds
device = "cuda"
model.to(device)
model.eval()

n_tokens = sample_length * model.config.audio_encoder.frame_rate + 3
sampling_rate = model.config.audio_encoder.sampling_rate

inputs = processor(
    text=["music"],
    return_tensors="pt",
).to(device)

audio_values = model.generate(
    **inputs,
    do_sample=True,
    guidance_scale=3,
    max_new_tokens=n_tokens,
    top_k=250,
    temperature=1.0,
)

This is the error:

---> 11 audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=n_tokens,
     12                              top_k=250, temperature=1)

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/transformers/models/musicgen/modeling_musicgen.py:2469, in MusicgenForConditionalGeneration.generate(self, inputs, generation_config, logits_processor, stopping_criteria, synced_gpus, **kwargs)
   2466 if audio_scales is None:
   2467     audio_scales = [None] * batch_size
-> 2469 output_values = self.audio_encoder.decode(
   2470     output_ids,
   2471     audio_scales=audio_scales,
   2472 )
   2474 if generation_config.return_dict_in_generate:
   2475     outputs.sequences = output_values.audio_values

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/transformers/models/encodec/modeling_encodec.py:742, in EncodecModel.decode(self, audio_codes, audio_scales, padding_mask, return_dict)
    740     if len(audio_codes) != 1:
    741         raise ValueError(f"Expected one frame, got {len(audio_codes)}")
--> 742     audio_values = self._decode_frame(audio_codes[0], audio_scales[0])
    743 else:
    744     decoded_frames = []

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/transformers/models/encodec/modeling_encodec.py:706, in EncodecModel._decode_frame(self, codes, scale)
    704 def _decode_frame(self, codes: torch.Tensor, scale: Optional[torch.Tensor] = None) -> torch.Tensor:
    705     codes = codes.transpose(0, 1)
--> 706     embeddings = self.quantizer.decode(codes)
    707     outputs = self.decoder(embeddings)
    708     if scale is not None:

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/transformers/models/encodec/modeling_encodec.py:434, in EncodecResidualVectorQuantizer.decode(self, codes)
    432 quantized_out = torch.tensor(0.0, device=codes.device)
    433 for i, indices in enumerate(codes):
--> 434     layer = self.layers[i]
    435     quantized = layer.decode(indices)
    436     quantized_out = quantized_out + quantized

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/torch/nn/modules/container.py:295, in ModuleList.__getitem__(self, idx)
    293     return self.__class__(list(self._modules.values())[idx])
    294 else:
--> 295     return self._modules[self._get_abs_string_index(idx)]

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/torch/nn/modules/container.py:285, in ModuleList._get_abs_string_index(self, idx)
    283 idx = operator.index(idx)
    284 if not (-len(self) <= idx < len(self)):
--> 285     raise IndexError('index {} is out of range'.format(idx))
    286 if idx < 0:
    287     idx += len(self)

IndexError: index 4 is out of range

Expected behavior

Generation works for the mono model, but does not work for the stereo model.
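
For context on the traceback: the IndexError comes from EncodecResidualVectorQuantizer.decode looping over the incoming codebook streams and indexing self.layers[i]; index 4 fails when the decoder emits more codebook streams than the audio encoder's quantizer has layers. A minimal sanity check along those lines might look like the sketch below, assuming the model variable from the reproduction script above (the exact config attribute names are an assumption, not taken from this issue):

# Hypothetical sanity check: compare the number of codebook streams the
# decoder generates with the number of quantizer layers the audio encoder
# can decode. Assumes `model` is the MusicgenForConditionalGeneration
# loaded in the reproduction script above.
num_decoder_codebooks = model.config.decoder.num_codebooks
num_quantizer_layers = len(model.audio_encoder.quantizer.layers)
print("decoder codebooks:", num_decoder_codebooks)
print("quantizer layers: ", num_quantizer_layers)
# If decoder codebooks > quantizer layers, EncodecModel.decode() raises
# the IndexError seen in the traceback.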

@amyeroberts
Collaborator

cc @sanchit-gandhi @ylacombe

@ylacombe
Contributor

Hey @ElizavetaSedova, thanks for opening this issue! Is there any reason you didn't use the stereo checkpoints on the hub, which are already converted?

Also, could you push the model you converted to the hub and share the repository id here so that I can try to reproduce the error?

Many thanks
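
For reference, loading one of the pre-converted stereo checkpoints from the hub might look like the sketch below (facebook/musicgen-stereo-small is used as an example; the other stereo sizes work the same way):

from transformers import AutoProcessor, MusicgenForConditionalGeneration

# Load an officially converted stereo checkpoint directly from the hub
# instead of a locally converted model.
checkpoint = "facebook/musicgen-stereo-small"
model = MusicgenForConditionalGeneration.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)

inputs = processor(text=["music"], return_tensors="pt")
audio_values = model.generate(
    **inputs, do_sample=True, guidance_scale=3, max_new_tokens=256
)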

@ElizavetaSedova
Author

@ylacombe This was my mistake: I didn't notice that I was importing from a copy of the transformers library that I hadn't updated to the new version. I'm closing this issue, since I no longer have a problem with stereo models.

Thanks for your reply!
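
For anyone landing here with the same IndexError: the fix was simply using a transformers release recent enough to include stereo MusicGen support. A minimal first check, assuming a standard installation:

import transformers

# Stereo MusicGen requires a sufficiently recent transformers release; if the
# version printed here predates stereo support, upgrade the package
# (e.g. `pip install -U transformers`) and reload the model.
print(transformers.__version__)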
