[Bug]: Custom SpeakerID model troubles #1996
Unanswered · andresgvargas asked this question in Q&A
Replies: 2 comments
-
Hey @andresgvargas, I converted your issue to a discussion as it does not seem to be related to a bug in SpeechBrain. Could you please share some of your training logs regarding your model? What is your training data? Thanks.
-
Hi @Adel-Moumen!
-
Describe the bug
Hi everyone!
I'm having some trouble using a custom model that I trained on my own audio. After training, I followed some of the Colab tutorials to load and use the custom model.
If I use `classify_batch`, it correctly identifies the speaker, as shown here:
But if I try to compare audios with `verify_files`, it always returns a tensor of 0, as shown here:
This behaviour only happens with my own model; the one trained on VoxCeleb that's on Hugging Face works just fine. I'd like some guidance on where I'm going wrong, or whether it's something about my use of SpeechBrain.
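For context on why a constant 0 is suspicious: as far as I understand, `verify_files` ultimately reduces to a cosine similarity between the two utterance embeddings, compared against a threshold. A minimal stdlib-only sketch of that comparison (the function name and toy values here are illustrative, not SpeechBrain's API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Degenerate (all-zero) embedding: similarity is undefined;
        # returning 0 here mirrors a "no match" score.
        return 0.0
    return dot / (norm_a * norm_b)

# Two toy "embeddings" of the same speaker should score near 1.0 ...
emb1 = [0.2, 0.5, -0.1]
emb2 = [0.21, 0.48, -0.12]
print(cosine_similarity(emb1, emb2))

# ... while an all-zero embedding always scores 0, for every pair:
print(cosine_similarity(emb1, [0.0, 0.0, 0.0]))  # → 0.0
```

So a score of exactly 0 for every file pair usually points at the embeddings themselves, not at the scoring step.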
Expected behaviour
I expected values different from 0 when using `verify_files`.
To Reproduce
Inference YAML:

```yaml
# Pretrain folders:
pretrained_path: best_model/

# Model parameters
n_mels: 40
sample_rate: 48000
n_classes: 33 # In this case, we have 28 speakers
emb_dim: 512 # dimensionality of the embeddings

# Feature extraction
compute_features: !new:speechbrain.lobes.features.Fbank
    n_mels: !ref <n_mels>

# Mean and std normalization of the input features
mean_var_norm: !new:speechbrain.processing.features.InputNormalization
    norm_type: sentence
    std_norm: False

# Mean and std normalization of the embeddings
mean_var_norm_emb: !new:speechbrain.processing.features.InputNormalization
    norm_type: sentence
    std_norm: False

embedding_model: !new:custom_model.Xvector
    in_channels: !ref <n_mels>
    activation: !name:torch.nn.LeakyReLU
    tdnn_blocks: 5
    tdnn_channels: [512, 512, 512, 512, 1500]
    tdnn_kernel_sizes: [5, 3, 3, 1, 1]
    tdnn_dilations: [1, 2, 3, 1, 1]
    lin_neurons: !ref <emb_dim>

classifier: !new:custom_model.Classifier
    input_shape: [null, null, !ref <emb_dim>]
    activation: !name:torch.nn.LeakyReLU
    lin_blocks: 1
    lin_neurons: !ref <emb_dim>
    out_neurons: !ref <n_classes>

label_encoder: !new:speechbrain.dataio.encoder.CategoricalEncoder

modules:
    compute_features: !ref <compute_features>
    embedding_model: !ref <embedding_model>
    classifier: !ref <classifier>
    mean_var_norm: !ref <mean_var_norm>
    # mean_var_norm_emb: !ref <mean_var_norm_emb>

pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        embedding_model: !ref <embedding_model>
        classifier: !ref <classifier>
        label_encoder: !ref <label_encoder>
        mean_var_norm: !ref <mean_var_norm>
        # mean_var_norm_emb: !ref <mean_var_norm_emb>
    paths:
        embedding_model: !ref <pretrained_path>/embedding_model.ckpt
        classifier: !ref <pretrained_path>/classifier.ckpt
        label_encoder: !ref <pretrained_path>/label_encoder.txt
        mean_var_norm: !ref <pretrained_path>/normalizer.ckpt
        # mean_var_norm_emb: !ref <pretrained_path>/normalizer.ckpt
```
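Two things worth checking here (these are guesses on my part, not a confirmed diagnosis): `verify_files` comes from the `SpeakerRecognition` interface, whose reference VoxCeleb hparams also use an embedding normalizer (`mean_var_norm_emb`), which is commented out in this YAML; and a score of exactly 0 for every pair usually means the embeddings are degenerate, e.g. because the `Pretrainer` quietly failed to load `embedding_model.ckpt`. A stdlib-only sanity check one could adapt, where the embedding values are placeholders standing in for whatever `encode_batch` actually returns:

```python
import statistics

def looks_degenerate(embedding, tol=1e-6):
    """Heuristic: an embedding whose values are (near-)constant carries no
    speaker information, so every pairwise score collapses to one value."""
    return statistics.pstdev(embedding) < tol

# Placeholder embeddings standing in for encode_batch(...) output:
loaded_ok = [0.3, -1.2, 0.7, 0.05]   # healthy spread of values
never_loaded = [0.0, 0.0, 0.0, 0.0]  # e.g. weights never loaded

print(looks_degenerate(loaded_ok))    # → False
print(looks_degenerate(never_loaded)) # → True
```

If the real embeddings from the custom model fail this check while the VoxCeleb ones pass it, the problem is in checkpoint loading rather than in the verification call itself.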
Versions
No response
Relevant log output
No response
Additional context
No response