About fairseq wav2vec2 extractor #43
Hi @xieyuankun Sorry for the inconvenience. To make life easier, please run this script to install the conda environment.
This is described in https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts#python-environment. You can also follow the script and install fairseq manually.
Thanks for your quick reply @TonyWangX
I'll keep trying to fix this bug. By the way, may I ask you another question about wav2vec2 extraction? Have you ever tried the huggingface version of wav2vec2? I used the following code to extract the wav2vec2 feature.
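The exact code is not shown in this thread, but a typical huggingface-style extraction looks roughly like the sketch below. Note that `Wav2Vec2Model.from_pretrained(...)` and the model name in the comment are assumptions about the usual `transformers` workflow, not the code actually used here; a tiny randomly initialised config stands in so the snippet runs offline.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# In practice one would load a pretrained checkpoint, e.g.:
#   model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
# Here a tiny random config is used only to demonstrate the call pattern.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32),
    conv_kernel=(10, 3),
    conv_stride=(5, 2),
)
model = Wav2Vec2Model(config).eval()

wav = torch.randn(1, 16000)  # one second of 16 kHz audio
with torch.no_grad():
    out = model(wav)

feature = out.last_hidden_state  # (batch, frames, hidden_size)
print(feature.shape)
```

With a real checkpoint the hidden size would be 768 (base) or 1024 (large) instead of the toy value above.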
However, the above wav2vec2 vector with an LCNN backbone got 12%+ EER on 19LA. So, what's wrong with this extraction process?
I guess that wav2vec_big_960h is a fine-tuned model. Could you try https://dl.fbaipublicfiles.com/fairseq/wav2vec/libri960_big.pt instead?
I haven't used huggingface wav2vec, so I cannot guess. It may be related to the above issue: if the model is fine-tuned for ASR, using the last self-attention layer may not be a good option. FYI, and not directly related to the above issue, some prefer using the output of intermediate layers of wav2vec (doi: 10.21437/Interspeech.2022-11460).
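In the huggingface API, grabbing an intermediate layer instead of the last one can be sketched as follows. The tiny random config and the choice of layer index are illustrative assumptions; with a pretrained model one would pick the layer empirically.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Tiny random config so the example runs offline; swap in
# Wav2Vec2Model.from_pretrained(...) for real feature extraction.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=4,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32),
    conv_kernel=(10, 3),
    conv_stride=(5, 2),
)
model = Wav2Vec2Model(config).eval()

wav = torch.randn(1, 16000)
with torch.no_grad():
    out = model(wav, output_hidden_states=True)

# hidden_states holds the CNN-encoder output plus every transformer
# layer: num_hidden_layers + 1 tensors of shape (batch, frames, hidden_size).
assert len(out.hidden_states) == config.num_hidden_layers + 1
mid_layer_feature = out.hidden_states[2]  # an intermediate layer, chosen arbitrarily
```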
Thanks! You are right!
Thank you for the feedback!
Hello, I have a problem with the wav2vec2 feature extractor.
I used your code (`class SSLModel(torch_nn.Module)`) to extract wav2vec2 features and got this error:

```
Traceback (most recent call last):
  File "/home/xieyuankun/asvspoof2019_wav2vec2_fairseq/preprocess.py", line 274, in <module>
    w2vmodel = SSLModel('/home/xieyuankun/data/wav2vec_big_960h.pt', 1024)
  File "/home/xieyuankun/asvspoof2019_wav2vec2_fairseq/preprocess.py", line 228, in __init__
    md, _, _ = fq.checkpoint_utils.load_model_ensemble_and_task([mpath])
  File "/home/xieyuankun/miniconda3/lib/python3.9/site-packages/fairseq/checkpoint_utils.py", line 473, in load_model_ensemble_and_task
    model = task.build_model(cfg.model, from_checkpoint=True)
  File "/home/xieyuankun/miniconda3/lib/python3.9/site-packages/fairseq/tasks/audio_pretraining.py", line 197, in build_model
    model = super().build_model(model_cfg, from_checkpoint)
  File "/home/xieyuankun/miniconda3/lib/python3.9/site-packages/fairseq/tasks/fairseq_task.py", line 338, in build_model
    model = models.build_model(cfg, self, from_checkpoint)
  File "/home/xieyuankun/miniconda3/lib/python3.9/site-packages/fairseq/models/__init__.py", line 106, in build_model
    return model.build_model(cfg, task)
  File "/home/xieyuankun/miniconda3/lib/python3.9/site-packages/fairseq/models/wav2vec/wav2vec2_asr.py", line 208, in build_model
    w2v_encoder = Wav2VecEncoder(cfg, len(task.target_dictionary))
TypeError: object of type 'NoneType' has no len()
```
It seems to be a problem in fairseq: `fq.checkpoint_utils.load_model_ensemble_and_task([mpath])` does not set up the task's target dictionary, so `task.target_dictionary` is `None` when `Wav2VecEncoder(cfg, len(task.target_dictionary))` is called.
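Consistent with the suggestion above, the error comes from loading an ASR-fine-tuned checkpoint (`wav2vec_big_960h.pt`), whose `Wav2VecEncoder` needs a target dictionary, through the pretraining task. A sketch of loading the *pretrained* checkpoint instead, assuming `libri960_big.pt` has been downloaded to the path shown (the path is hypothetical):

```python
import torch
import fairseq

# Pretrained (not ASR-fine-tuned) wav2vec 2.0 checkpoint; a fine-tuned
# checkpoint such as wav2vec_big_960h.pt would fail here because its
# model class requires task.target_dictionary.
ckpt = "/path/to/libri960_big.pt"  # hypothetical path
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt])
model = models[0].eval()

wav = torch.randn(1, 16000)
with torch.no_grad():
    # features_only=True skips the quantization/contrastive heads and
    # returns the transformer output in result["x"]: (batch, frames, dim).
    result = model(wav, mask=False, features_only=True)
features = result["x"]
```

This snippet cannot be run without the checkpoint file, so treat it as a call-pattern sketch rather than tested code.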