RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 80000 but got size 79659 for tensor number 30 in the list #1324

Pkoiralap · 2023-04-11T22:56:08Z

I ran into this error and was able to work around it by adding a guard around the waveform creation step. However, I believe there should be a better way to do this, which is why I am not opening a PR with the fix. If you think this fix works, I can submit a PR.

In the `audio/pipelines/speaker_diarization.py` file, this bit here:

```python
# chunk: Segment(t, t + duration)
# masks: (num_frames, local_num_speakers) np.ndarray
waveform, _ = self._audio.crop(
    file,
    chunk,
    duration=duration,
    mode="pad",
)
# waveform: (1, num_samples) torch.Tensor
...
```

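sometimes returns a waveform with fewer samples than expected (in my case 79659 instead of 80000), which then triggers the error above when the chunks are concatenated. The simple fix I am using for now is: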

```python
# chunk: Segment(t, t + duration)
# masks: (num_frames, local_num_speakers) np.ndarray
waveform, _ = self._audio.crop(
    file,
    chunk,
    duration=duration,
    mode="pad",
)
# waveform: (1, num_samples) torch.Tensor
# num_samples: expected number of samples per chunk (80000 in the error above)
if waveform.shape[1] < num_samples:
    # right-pad the short chunk with zeros so every chunk has the same length
    pad_num = int(num_samples - waveform.shape[1])
    waveform = torch.nn.functional.pad(waveform, (0, pad_num), "constant", 0)
```

This is hacky, but it works for me. Since I haven't had much time to read through the code, I am not sure this is the right fix for the problem. As I said earlier, I can open a PR depending on whether this is a 'good' fix.
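To illustrate why the padding guard makes the error go away, here is a framework-free sketch on plain Python lists instead of torch tensors. `fix_length` and `batch_chunks` are hypothetical names, not part of pyannote.audio; `batch_chunks` only imitates the size check that `torch.cat` performs, and unlike the guard above, `fix_length` also truncates chunks that come out too long.

```python
def fix_length(samples: list[float], num_samples: int) -> list[float]:
    """Zero-pad (at the end) or truncate `samples` to exactly `num_samples` values.

    Hypothetical helper mirroring the torch.nn.functional.pad guard above.
    """
    if len(samples) < num_samples:
        # Right-pad with zeros, like F.pad(waveform, (0, pad_num), "constant", 0).
        return samples + [0.0] * (num_samples - len(samples))
    # Truncate if too long (the torch guard leaves longer chunks untouched).
    return samples[:num_samples]


def batch_chunks(chunks: list[list[float]]) -> list[list[float]]:
    """Stand-in for torch.cat: refuse to batch chunks of mismatched length."""
    expected = len(chunks[0])
    for i, chunk in enumerate(chunks):
        if len(chunk) != expected:
            raise RuntimeError(
                f"Expected size {expected} but got size {len(chunk)} "
                f"for tensor number {i} in the list"
            )
    return chunks


num_samples = 80000  # e.g. 5 s at 16 kHz, matching the sizes in the error
# 30 full-length chunks followed by one short chunk at index 30
chunks = [[0.1] * num_samples for _ in range(30)] + [[0.1] * 79659]

try:
    batch_chunks(chunks)
except RuntimeError as err:
    print(err)  # Expected size 80000 but got size 79659 for tensor number 30 in the list

batch = batch_chunks([fix_length(c, num_samples) for c in chunks])
print(len(batch), {len(c) for c in batch})  # 31 {80000}
```

Padding at the call site hides the symptom; it does not explain why the crop came out short in the first place.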

Thanks


github-actions · 2023-04-11T22:56:27Z

Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

hbredin · 2023-04-12T10:18:54Z

Would you mind narrowing down when this problem occurs?

This is most likely a bug in the implementation of the `mode="pad"` option, which is already supposed to handle this corner case...
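To make the expected behaviour concrete, here is a minimal sketch of the invariant a padded crop should satisfy, assuming an in-memory list of samples rather than a file: a window of `duration` seconds always yields exactly `round(duration * sample_rate)` samples, with zeros wherever the window falls outside the signal. `crop_with_pad` is hypothetical and is not pyannote.audio's `Audio.crop`. The example assumes the corner case is the last chunk, whose window extends past the end of the file.

```python
def crop_with_pad(
    signal: list[float], sample_rate: int, start: float, duration: float
) -> list[float]:
    """Return exactly round(duration * sample_rate) samples starting at `start` seconds.

    Hypothetical helper, not pyannote.audio's Audio.crop. Positions that fall
    outside the signal are filled with zeros, so the output length never
    depends on where the window sits relative to the signal boundaries.
    """
    num_samples = round(duration * sample_rate)
    start_sample = round(start * sample_rate)
    # Derive the end from the fixed length instead of rounding start + duration
    # separately, so rounding can never change the number of samples.
    end_sample = start_sample + num_samples
    return [
        signal[i] if 0 <= i < len(signal) else 0.0
        for i in range(start_sample, end_sample)
    ]


sample_rate = 16000
signal = [1.0] * 191659  # ~11.98 s of audio at 16 kHz

# The last 5 s chunk starts at 7 s, so its window runs 341 samples past the end.
naive = signal[112000:192000]  # plain slicing silently returns fewer samples
padded = crop_with_pad(signal, sample_rate, start=7.0, duration=5.0)

print(len(naive), len(padded))  # 79659 80000
```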

stale · 2023-10-09T21:51:31Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Pkoiralap added a commit to Pkoiralap/pyannote-audio that referenced this issue Apr 13, 2023

fix pyannote#1324: adjust pad_end to match num_frames

8c238a5

Pkoiralap mentioned this issue Apr 13, 2023

fix #1324: adjust pad_end to match num_frames #1326

Closed

stale bot added the wontfix label Oct 9, 2023

stale bot closed this as completed Nov 9, 2023

fablau mentioned this issue Nov 28, 2023

Unable to create a correct diarization #1567

Closed
