spkrec-ecapa-voxceleb-mel-spec model modifies mel spectrum in place when used with CPU #2390

Closed
jacobjwebber opened this issue Feb 5, 2024 · 2 comments · Fixed by #2504

@jacobjwebber

Describe the bug

Thanks for your wonderful project. I would love to make my first SpeechBrain contribution and help fix this issue :). When I encode a mel spectrogram on CPU with MelSpectrogramEncoder, the underlying mel spectrogram data is modified in place.

Expected behaviour

Calculating the embedding should not modify the original data

To Reproduce

test_in_place.py

from speechbrain.pretrained import MelSpectrogramEncoder
import torch
from torch.nn.functional import mse_loss
import torchaudio

# On CPU
waveform, sr = torchaudio.load("tests/samples/TTS/LJ050-0131.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16000)
spk_emb_encoder = MelSpectrogramEncoder.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb-mel-spec", savedir="spk_emb_encoder_checkpoints", run_opts={"device": "cpu"})
mel = spk_emb_encoder.mel_spectogram(waveform)
mel_copy = torch.clone(mel)
spkr_emb = spk_emb_encoder.encode_mel_spectrogram(mel_copy)
print(mse_loss(mel, mel_copy))

# On CUDA device
waveform, sr = torchaudio.load("tests/samples/TTS/LJ050-0131.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16000)
spk_emb_encoder = MelSpectrogramEncoder.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb-mel-spec", savedir="spk_emb_encoder_checkpoints", run_opts={"device": "cuda"})
mel = spk_emb_encoder.mel_spectogram(waveform)
mel_copy = torch.clone(mel)
spkr_emb = spk_emb_encoder.encode_mel_spectrogram(mel_copy)
print(mse_loss(mel, mel_copy))

python tests/test_in_place.py
tensor(33.1424)
tensor(0.)

Environment Details

speechbrain '0.5.16'

Relevant Log Output

No response

Additional Context

It only happens with CPU

@jacobjwebber jacobjwebber added the bug Something isn't working label Feb 5, 2024
@Adel-Moumen Adel-Moumen added this to the v1.0.1 milestone Apr 8, 2024
@asumagic
Collaborator

Thanks for the convenient repro. I would note that this is not really specific to the CPU device: The bug seems to be in-place modification of the input tensor.

Since you're passing a CPU tensor in the second example, it gets migrated to CUDA and is thus effectively cloned, so the original CPU tensor is never touched. If you move the waveform to the GPU yourself with .to("cuda"), the bug also reproduces.
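
To illustrate why a device transfer can hide this kind of bug, here is a minimal CPU-only sketch. `center_in_place` is a hypothetical stand-in for a layer that normalizes its input in place, not SpeechBrain code, and `clone()` is used to simulate the copy that a CPU-to-CUDA move would produce, so the snippet runs without a GPU.

```python
import torch


def center_in_place(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a layer that normalizes its input in place."""
    x -= x.mean()  # in-place subtraction writes into the caller's tensor
    return x


reference = torch.arange(4, dtype=torch.float32)

# Case 1: the tensor is already on the target device.
# Tensor.to() returns the same object, so the mutation reaches the caller.
on_device = reference.clone()
moved = on_device.to("cpu")
same_object = moved is on_device
center_in_place(moved)
caller_tensor_changed = not torch.equal(on_device, reference)

# Case 2: the transfer produces a copy (simulated here with clone(),
# standing in for a CPU -> CUDA move). The caller's tensor stays intact.
off_device = reference.clone()
copied = off_device.clone()
center_in_place(copied)
caller_tensor_preserved = torch.equal(off_device, reference)

print(same_object, caller_tensor_changed, caller_tensor_preserved)
```

In the first case the caller's tensor is changed because no copy was made. In the second the copy absorbs the mutation, which matches the `tensor(0.)` reported for the CUDA run above.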

@asumagic
Collaborator

The bug is in InputNormalization when the norm_type is either "sentence" or "speaker", in which case it performs in-place assignments. I will make a PR soon.
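
For readers unfamiliar with the pattern, the difference between in-place and out-of-place per-sentence normalization might look roughly like the sketch below. Both functions are hypothetical and simplified (mean subtraction only, no variance scaling, no length handling); they are not the actual `InputNormalization` code nor the eventual fix, and assume an input of shape `(batch, time, features)`.

```python
import torch


def normalize_sentences_in_place(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical buggy variant: assigns results back into the input tensor."""
    for i in range(x.shape[0]):
        # Indexed assignment overwrites the caller's data.
        x[i] = x[i] - x[i].mean(dim=0)
    return x


def normalize_sentences(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical safe variant: writes results into a freshly allocated tensor."""
    out = torch.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = x[i] - x[i].mean(dim=0)
    return out


torch.manual_seed(0)
features = torch.randn(2, 5, 3)  # (batch, time, features)
before = features.clone()

safe_out = normalize_sentences(features)
unchanged_after_safe = torch.equal(features, before)

buggy_out = normalize_sentences_in_place(features)
changed_after_buggy = not torch.equal(features, before)

print(unchanged_after_safe, changed_after_buggy)
```

Both variants return the same normalized values; only the second one leaves the caller's tensor untouched. Cloning the input at the top of the function would be another way to get that behaviour.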
