
Error at the generation stage by MusicGen stereo model #30217

Closed

ElizavetaSedova opened this issue Apr 12, 2024 · 3 comments
ElizavetaSedova commented Apr 12, 2024

System Info

I used the conversion script to convert the original stereo model, but I get an error when trying to generate audio:

IndexError: index 4 is out of range

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

My code:

from transformers import AutoProcessor, MusicgenForConditionalGeneration

my_path = "path_to_exported_stereo_model"
model = MusicgenForConditionalGeneration.from_pretrained(
    my_path, torchscript=True, return_dict=False
)

processor = AutoProcessor.from_pretrained(my_path)

sample_length = 3  # seconds
device = "cuda"
model.to(device)
model.eval()

n_tokens = sample_length * model.config.audio_encoder.frame_rate + 3
sampling_rate = model.config.audio_encoder.sampling_rate

inputs = processor(
    text=["music"],
    return_tensors="pt",
).to(device)

audio_values = model.generate(
    **inputs,
    do_sample=True,
    guidance_scale=3,
    max_new_tokens=n_tokens,
    top_k=250,
    temperature=1.0,
)

This is the error:

---> 11 audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=n_tokens,
     12                              top_k=250, temperature=1)

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/transformers/models/musicgen/modeling_musicgen.py:2469, in MusicgenForConditionalGeneration.generate(self, inputs, generation_config, logits_processor, stopping_criteria, synced_gpus, **kwargs)
   2466 if audio_scales is None:
   2467     audio_scales = [None] * batch_size
-> 2469 output_values = self.audio_encoder.decode(
   2470     output_ids,
   2471     audio_scales=audio_scales,
   2472 )
   2474 if generation_config.return_dict_in_generate:
   2475     outputs.sequences = output_values.audio_values

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/transformers/models/encodec/modeling_encodec.py:742, in EncodecModel.decode(self, audio_codes, audio_scales, padding_mask, return_dict)
    740     if len(audio_codes) != 1:
    741         raise ValueError(f"Expected one frame, got {len(audio_codes)}")
--> 742     audio_values = self._decode_frame(audio_codes[0], audio_scales[0])
    743 else:
    744     decoded_frames = []

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/transformers/models/encodec/modeling_encodec.py:706, in EncodecModel._decode_frame(self, codes, scale)
    704 def _decode_frame(self, codes: torch.Tensor, scale: Optional[torch.Tensor] = None) -> torch.Tensor:
    705     codes = codes.transpose(0, 1)
--> 706     embeddings = self.quantizer.decode(codes)
    707     outputs = self.decoder(embeddings)
    708     if scale is not None:

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/transformers/models/encodec/modeling_encodec.py:434, in EncodecResidualVectorQuantizer.decode(self, codes)
    432 quantized_out = torch.tensor(0.0, device=codes.device)
    433 for i, indices in enumerate(codes):
--> 434     layer = self.layers[i]
    435     quantized = layer.decode(indices)
    436     quantized_out = quantized_out + quantized

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/torch/nn/modules/container.py:295, in ModuleList.__getitem__(self, idx)
    293     return self.__class__(list(self._modules.values())[idx])
    294 else:
--> 295     return self._modules[self._get_abs_string_index(idx)]

File /opt/conda/envs/musicgen/lib/python3.9/site-packages/torch/nn/modules/container.py:285, in ModuleList._get_abs_string_index(self, idx)
    283 idx = operator.index(idx)
    284 if not (-len(self) <= idx < len(self)):
--> 285     raise IndexError('index {} is out of range'.format(idx))
    286 if idx < 0:
    287     idx += len(self)

IndexError: index 4 is out of range

Expected behavior

Generation works for the mono model, but does not work for the stereo model.
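
For context on the traceback: the IndexError comes from EncodecResidualVectorQuantizer.decode looping over the incoming codebook streams and indexing self.layers[i]; index 4 fails when the decoder emits more codebook streams than the audio encoder's quantizer has layers. A minimal sanity check along those lines might look like the sketch below, assuming the model variable from the reproduction script above (the exact config attribute names are an assumption, not taken from this issue):

# Hypothetical sanity check: compare the number of codebook streams the
# decoder generates with the number of quantizer layers the audio encoder
# can decode. Assumes `model` is the MusicgenForConditionalGeneration
# loaded in the reproduction script above.
num_decoder_codebooks = model.config.decoder.num_codebooks
num_quantizer_layers = len(model.audio_encoder.quantizer.layers)
print("decoder codebooks:", num_decoder_codebooks)
print("quantizer layers: ", num_quantizer_layers)
# If decoder codebooks > quantizer layers, EncodecModel.decode() raises
# the IndexError seen in the traceback.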

@amyeroberts
Collaborator

cc @sanchit-gandhi @ylacombe

@ylacombe
Contributor

Hey @ElizavetaSedova, thanks for opening this issue! Is there any reason you didn't use the stereo checkpoints on the hub, which are already converted?

Also, could you push the model you converted to the hub and share the repository id here so that I can try to reproduce the error?

Many thanks
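
For reference, loading one of the pre-converted stereo checkpoints from the hub might look like the sketch below (facebook/musicgen-stereo-small is used as an example; the other stereo sizes work the same way):

from transformers import AutoProcessor, MusicgenForConditionalGeneration

# Load an officially converted stereo checkpoint directly from the hub
# instead of a locally converted model.
checkpoint = "facebook/musicgen-stereo-small"
model = MusicgenForConditionalGeneration.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)

inputs = processor(text=["music"], return_tensors="pt")
audio_values = model.generate(
    **inputs, do_sample=True, guidance_scale=3, max_new_tokens=256
)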

@ElizavetaSedova
Author

@ylacombe This was my mistake: I didn't notice that I was importing from a copy of the transformers library that I hadn't updated to the new version. I'm closing this issue, since I no longer have a problem with stereo models.

Thanks for your reply!
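
For anyone landing here with the same IndexError: the fix was simply using a transformers release recent enough to include stereo MusicGen support. A minimal first check, assuming a standard installation:

import transformers

# Stereo MusicGen requires a sufficiently recent transformers release; if the
# version printed here predates stereo support, upgrade the package
# (e.g. `pip install -U transformers`) and reload the model.
print(transformers.__version__)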
