spkrec-ecapa-voxceleb-mel-spec model modifies mel spectrum in place when used with CPU #2390

jacobjwebber · 2024-02-05T18:59:23Z

Describe the bug

Thanks for your wonderful project. I would love to make my first SpeechBrain contribution and help fix this issue :). When I encode a mel spectrogram on CPU with MelSpectrogramEncoder, the underlying mel spectrogram data is modified in place.

Expected behaviour

Computing the embedding should not modify the original mel spectrogram.

To Reproduce

test_in_place.py

from speechbrain.pretrained import MelSpectrogramEncoder
import torch
from torch.nn.functional import mse_loss
import torchaudio

# On CPU
waveform, sr = torchaudio.load("tests/samples/TTS/LJ050-0131.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16000)
spk_emb_encoder = MelSpectrogramEncoder.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb-mel-spec", savedir="spk_emb_encoder_checkpoints", run_opts={"device": "cpu"})
mel = spk_emb_encoder.mel_spectogram(waveform)
mel_copy = torch.clone(mel)
spkr_emb = spk_emb_encoder.encode_mel_spectrogram(mel_copy)
print(mse_loss(mel, mel_copy))

# On CUDA device
waveform, sr = torchaudio.load("tests/samples/TTS/LJ050-0131.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16000)
spk_emb_encoder = MelSpectrogramEncoder.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb-mel-spec", savedir="spk_emb_encoder_checkpoints", run_opts={"device": "cuda"})
mel = spk_emb_encoder.mel_spectogram(waveform)
mel_copy = torch.clone(mel)
spkr_emb = spk_emb_encoder.encode_mel_spectrogram(mel_copy)
print(mse_loss(mel, mel_copy))

python tests/test_in_place.py
tensor(33.1424)
tensor(0.)

Environment Details

speechbrain '0.5.16'

Relevant Log Output

No response

Additional Context

It only seems to happen when the model runs on CPU.

asumagic · 2024-04-11T13:22:30Z

Thanks for the convenient repro. I would note that this is not really specific to the CPU device: the bug seems to be in-place modification of the input tensor.

Since you're passing a CPU tensor in the second example, it gets copied to the CUDA device, so it is effectively cloned and the in-place modification lands on the copy rather than on your original tensor. If you .to("cuda") the waveform yourself, the bug also reproduces.
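To illustrate the point above, here is a minimal sketch of why the device transfer hides the problem. The `encode` function is a hypothetical stand-in, not SpeechBrain code: it only assumes that the model moves its input to the target device and then normalizes it in place. `Tensor.to` returns the same tensor when the device already matches, so the mutation reaches the caller's data only in that case.

```python
import torch


def encode(x: torch.Tensor, device: str) -> torch.Tensor:
    """Hypothetical encoder step: move to `device`, then normalize in place."""
    x = x.to(device)  # returns `x` itself if it already lives on `device`
    x -= x.mean()     # in-place subtraction writes into whatever `x` points to
    return x


cpu_input = torch.arange(6, dtype=torch.float32)
before = cpu_input.clone()

# Input and model on the same device: the caller's tensor is mutated.
encode(cpu_input, "cpu")
cpu_input_changed = not torch.equal(cpu_input, before)
print("CPU input changed:", cpu_input_changed)

if torch.cuda.is_available():
    # CPU input, CUDA model: .to("cuda") makes a copy, so the original survives.
    cpu_input = before.clone()
    encode(cpu_input, "cuda")
    print("CPU input changed after CUDA encode:", not torch.equal(cpu_input, before))

    # CUDA input, CUDA model: same device again, so the input is mutated.
    gpu_input = before.to("cuda")
    encode(gpu_input, "cuda")
    print("CUDA input changed:", not torch.equal(gpu_input.cpu(), before))
```

The CUDA branch only runs when a GPU is available; the CPU branch alone is enough to show the aliasing.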

asumagic · 2024-04-11T13:26:53Z

The bug is in InputNormalization when the norm_type is either "sentence" or "speaker", in which case it performs in-place assignments. I will make a PR soon.
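As a rough sketch of the kind of pattern described here (this is not the actual InputNormalization code, and both function names are made up for illustration), per-sentence mean normalization can be written so that it either assigns into the input tensor or returns a new one:

```python
import torch


def sentence_norm_in_place(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical buggy pattern: indexed assignment writes into the caller's tensor."""
    for i in range(x.shape[0]):
        x[i] = x[i] - x[i].mean(dim=0)
    return x


def sentence_norm_out_of_place(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical safe pattern: build a new tensor and leave the input alone."""
    return x - x.mean(dim=1, keepdim=True)


# Shape (batch, time, features); deterministic values keep the demo reproducible.
batch = torch.arange(30, dtype=torch.float32).reshape(2, 5, 3)
reference = batch.clone()

out = sentence_norm_out_of_place(batch)
unchanged = torch.equal(batch, reference)    # input untouched

out_in_place = sentence_norm_in_place(batch)
mutated = not torch.equal(batch, reference)  # input overwritten

print("out-of-place left input unchanged:", unchanged)
print("in-place mutated input:", mutated)
print("same result:", torch.allclose(out, out_in_place, atol=1e-6))
```

Both variants compute the same normalized values; the only difference is whether the caller's tensor is overwritten. Until a fix lands, passing `mel.clone()` to the encoder, as the repro script does, keeps the original data intact.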

jacobjwebber added the bug label Feb 5, 2024

Adel-Moumen assigned asumagic Apr 8, 2024

Adel-Moumen added this to the v1.0.1 milestone Apr 8, 2024

asumagic mentioned this issue Apr 11, 2024 in "Fix in-place input normalization when using sentence/speaker norm" #2504 (merged)

Adel-Moumen closed this as completed in #2504 Apr 24, 2024
