k2SSL: a Faster and Better Framework for Self-Supervised Speech Representation Learning #1500

yfyeung · 2024-02-18T03:46:03Z

In this PR, we decouple HuBERT from fairseq while keeping it fully equivalent to the original pre-training logic (model architecture, data normalization, masking strategy, loss computation, ...). We compared the outputs of some layers to verify this equivalence. We also support the fairseq checkpoints (hubert_base_ls960, hubert_large_ll60k, hubert_xtralarge_ll60k).
We then optimized the pre-training loss, which significantly reduced peak memory usage and even slightly improved performance. Unfortunately, this change made the original HuBERT unstable in half precision. We adopted ScaledAdam as the optimizer and Eden as the scheduler, and replaced the Transformer encoder with the Zipformer encoder. This further reduced peak memory usage and improved performance while remaining stable in half precision.

kobenaxie · 2024-02-20T13:10:11Z

Hi @yfyeung ,

How do we get the k-means file to train the Zipformer-based HuBERT pre-training model?
Can we use fbank as the model input, like w2vbert?

yfyeung · 2024-02-21T02:48:16Z

How do we get the k-means file to train the Zipformer-based HuBERT pre-training model?

For LibriSpeech, we directly use the k-means labels from hubert_base_ls960.

Can we use fbank as the model input, like w2vbert?

Yes, you can replace the ConvFeatureExtractionModel with the Conv2dSubsampling.

kafan1986 · 2024-02-25T16:58:15Z

@yfyeung What is the approximate effect on WER, training time, and inference time if k2SSL is used with, say, HuBERT base?

danpovey · 2024-04-04T15:28:41Z

Guys, I just noticed this, it seems like a great contribution.
I'd rather not have these things wait so long... let me merge it now and if we have any changes we want, we can do them later on.

teowenshen · 2024-04-12T01:11:51Z

Hi there @yfyeung , first of all thank you for creating this SSL recipe!

I tried running your zipformer/ code, but my model diverged at epoch 33 and pretraining ended with a "Grad scale is small" error.

Throughout pretraining before the divergence, I noticed my grad scale tended to fluctuate between 0.125 and 2.
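
For readers unfamiliar with the grad scale mentioned here: in fp16 training the loss is multiplied by a scale factor that shrinks whenever non-finite gradients appear and grows again after a run of clean steps. Below is a toy Python sketch of that general mechanism. The growth and backoff factors, the growth interval, and the function name are illustrative assumptions, not the values or code used by pretrain.py.

```python
def simulate_grad_scale(overflow_steps, num_steps, init_scale=1.0,
                        growth_factor=2.0, backoff_factor=0.5,
                        growth_interval=4):
    """Toy dynamic loss scaling: return the scale after each step.

    overflow_steps is a set of step indices at which gradients were
    inf/NaN (a hypothetical input; a real trainer detects this itself).
    """
    scale = init_scale
    clean_steps = 0
    history = []
    for step in range(num_steps):
        if step in overflow_steps:
            # Non-finite gradients: skip the update and shrink the scale.
            scale *= backoff_factor
            clean_steps = 0
        else:
            clean_steps += 1
            if clean_steps == growth_interval:
                # Enough clean steps in a row: try a larger scale again.
                scale *= growth_factor
                clean_steps = 0
        history.append(scale)
    return history


history = simulate_grad_scale(overflow_steps={2, 3, 4}, num_steps=12)
print(history)
```

In this toy run three consecutive overflows take the scale from 1.0 down to 0.125, and it only recovers one doubling at a time. If overflows keep arriving faster than the scale can grow back, the scale keeps shrinking, which is presumably the kind of situation the error above reports.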

Did you face the same issues?

EDIT: I was also wondering if you tried toggling the loss reduction to mean instead of sum. Maybe that will stabilise training?

My command is below. I adapted the batch size to my setup, keeping the same accum_grad * max_duration * world_size.

# pretraining
python zipformer/pretrain.py \
    --world-size 4 \
    --use-fp16 1 \
    --num-epochs 50 \
    --manifest-dir data/raw \
    --max-duration 350 \
    --accum-grad 2 \
    --exp-dir zipformer/exp2/pretrain

As per your explanation, I used the same 500 k-means labels from simple_kmeans.
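
The product accum_grad * max_duration * world_size mentioned above can be worked out with a few lines of Python. This is only a sketch of the arithmetic; it assumes --max-duration is a per-GPU, per-batch cap in seconds, and the function name is hypothetical.

```python
def audio_seconds_per_update(world_size, max_duration, accum_grad):
    """Upper bound on the audio seconds consumed per optimizer step.

    Assumes max_duration is a per-GPU, per-batch cap in seconds.
    """
    return world_size * max_duration * accum_grad


# Values from the command above: 4 GPUs, 350 s per batch, 2 accumulation steps.
mine = audio_seconds_per_update(world_size=4, max_duration=350, accum_grad=2)
print(mine)  # 2800

# The same budget on 2 GPUs would need twice the accumulation steps.
same_budget = audio_seconds_per_update(world_size=2, max_duration=350, accum_grad=4)
print(same_budget)  # 2800
```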

yfyeung · 2024-04-12T01:49:53Z

Hi, hope this message finds you well.

My training command is as follows:

./zipformer/pretrain.py \
  --world-size 8 \
  --num-epochs 291 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp_pretrain \
  --full-libri 1 \
  --max-duration 600 \
  --accum-grad 1 \
  --do-normalize 0 \
  --mask-prob 0.8 \
  --dropout-input 0.1 \
  --dropout-features 0.1 \
  --feature-grad-mult 0.1 \
  --untie-final-proj 1 \
  --num-encoder-layers 2,2,3,4,3,2 \
  --feedforward-dim 512,768,1024,1536,1024,768 \
  --encoder-dim 192,256,448,768,448,192 \
  --encoder-unmasked-dim 192,192,256,256,256,192 \
  --base-lr 0.045

EDIT: I was also wondering if you tried toggling the loss reduction to mean instead of sum. Maybe that will stabilise training?

Regarding your question about toggling the loss reduction to mean instead of sum to stabilize training: the mean reduction is typically used for multi-GPU simulations to ensure uniform scaling, while sum reduction is preferred for larger batch sizes as it helps stabilize the gradient estimate. It’s not a good way to optimize for both large batch sizes and multi-GPU setups simultaneously.
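
To make the difference between the two reductions concrete, here is a small pure-Python sketch (not taken from the recipe). With sum reduction the loss value grows with the number of frames in the batch, while mean reduction divides that out. The per-frame loss values are made up for illustration.

```python
def reduce_loss(per_frame_losses, reduction):
    """Combine per-frame losses the way a 'sum' or 'mean' reduction would."""
    total = sum(per_frame_losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(per_frame_losses)
    raise ValueError(f"unknown reduction: {reduction}")


small_batch = [2.0] * 100  # 100 masked frames, loss 2.0 each (made-up values)
large_batch = [2.0] * 400  # 4x as many frames at the same per-frame loss

sum_ratio = reduce_loss(large_batch, "sum") / reduce_loss(small_batch, "sum")
mean_ratio = reduce_loss(large_batch, "mean") / reduce_loss(small_batch, "mean")
print(sum_ratio, mean_ratio)  # 4.0 1.0
```

This only illustrates how the magnitude of the loss (and therefore of its gradient) depends on batch size under each reduction; it says nothing about which choice trains more stably in this recipe.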

Fine-tuning command is:

./zipformer/finetune.py \
  --world-size 8 \
  --num-epochs 222 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp_finetune \
  --pretrained-dir zipformer/exp_pretrain/epoch-291.pt \
  --full-libri 0 \
  --max-duration 600 \
  --accum-grad 1 \
  --do-normalize 0 \
  --mask-prob 0.65 \
  --mask-channel-prob 0.5 \
  --mask-channel-length 64 \
  --feature-grad-mult 0.0 \
  --num-encoder-layers 2,2,3,4,3,2 \
  --feedforward-dim 512,768,1024,1536,1024,768 \
  --encoder-dim 192,256,448,768,448,192 \
  --encoder-unmasked-dim 192,192,256,256,256,192 \
  --base-lr 0.002

Decoding first uses greedy search over two key parameters, --epoch and --avg, to identify the top K candidates:

for ((epoch=100; epoch<=222; epoch+=1)); do
  for ((avg=1; avg<=$epoch-1; avg+=1)); do
    ./zipformer/decode.py \
        --epoch $epoch \
        --avg $avg \
        --exp-dir ./zipformer/exp_finetune \
        --do-normalize 0 \
        --max-duration 1000 \
        --decoding-method greedy_search \
        --num-encoder-layers 2,2,3,4,3,2 \
        --feedforward-dim 512,768,1024,1536,1024,768 \
        --encoder-dim 192,256,448,768,448,192 \
        --encoder-unmasked-dim 192,192,256,256,256,192
  done
done
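
The loop above covers a large grid, so it can help to count the runs before launching it. The Python sketch below enumerates the same (epoch, avg) pairs. The coarser grid at the end is a hypothetical way to thin out the search, not something the recipe prescribes, and the function name is made up.

```python
def decode_grid(first_epoch, last_epoch, epoch_step=1, avg_step=1):
    """Yield (epoch, avg) pairs in the same order as the shell loop above."""
    for epoch in range(first_epoch, last_epoch + 1, epoch_step):
        for avg in range(1, epoch, avg_step):  # avg runs from 1 to epoch-1
            yield epoch, avg


full = list(decode_grid(100, 222))
print(len(full))  # 19680 decode runs

# Hypothetical coarser grid: every 10th epoch and every 10th avg value.
coarse = list(decode_grid(100, 222, epoch_step=10, avg_step=10))
print(len(coarse))  # 208 decode runs
```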

Then use modified beam search on these top K candidates:

epoch=
avg=
./zipformer/decode.py \
      --epoch $epoch \
      --avg $avg \
      --exp-dir ./zipformer/exp_finetune \
      --do-normalize 0 \
      --max-duration 1000 \
      --decoding-method modified_beam_search \
      --beam-size 8 \
      --num-encoder-layers 2,2,3,4,3,2 \
      --feedforward-dim 512,768,1024,1536,1024,768 \
      --encoder-dim 192,256,448,768,448,192 \
      --encoder-unmasked-dim 192,192,256,256,256,192

teowenshen · 2024-04-12T02:02:47Z

I see! Thanks for the explanation!

Meanwhile, can you share your finetuning and decoding commands as well?

yfyeung · 2024-04-12T02:25:19Z

Sure, I updated my comment. You can perform pruning in the process of searching the decoding space.

danpovey · 2024-04-12T12:24:01Z

@teowenshen is there any chance you can run from --start-epoch=33 with the --inf-check=True option, assuming pretrain.py supports these options like train.py, and show us the log? If the options are not there we should add them. I want to see where the inf grad is coming from; maybe we can fix it with more info.

danpovey · 2024-04-12T12:25:04Z

Also, @yfyeung we normally have a README.md and/or RESULTS.md that show typical sequences of training and testing commands, and associated results. Is there any chance of adding those?
Is a link to a paper going to come later?

teowenshen · 2024-04-12T13:10:40Z

I want to see where the inf grad is coming from, maybe we can fix it with more info.

Yes, please find the logs for epoch 33 as attached.

librispeech_SSL_zipformer_pretrain_ep33_infcheck.txt

I couldn't run --print-diagnostics 1 due to this error:

Error getting eigenvalues, trying another method.
Error getting eigenvalues, trying another method.
Error getting eigenvalues, trying another method.
Error getting eigenvalues, trying another method.
/workspace/icefall/icefall/diagnostics.py:255: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /opt/conda/conda-bld/pytorch_1695392067780/work/aten/src/ATen/EmptyTensor.cpp:31.)
  eigs, _ = torch.linalg.eig(stats)
/workspace/icefall/icefall/diagnostics.py:255: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /opt/conda/conda-bld/pytorch_1695392067780/work/aten/src/ATen/EmptyTensor.cpp:31.)
  eigs, _ = torch.linalg.eig(stats)
/workspace/icefall/icefall/diagnostics.py:255: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /opt/conda/conda-bld/pytorch_1695392067780/work/aten/src/ATen/EmptyTensor.cpp:31.)
  eigs, _ = torch.linalg.eig(stats)
/workspace/icefall/icefall/diagnostics.py:255: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /opt/conda/conda-bld/pytorch_1695392067780/work/aten/src/ATen/EmptyTensor.cpp:31.)
  eigs, _ = torch.linalg.eig(stats)
Traceback (most recent call last):
  File "/mnt/host/icefall-k2ssl/egs/librispeech/SSL/zipformer/pretrain.py", line 1380, in <module>
    main()
  File "/mnt/host/icefall-k2ssl/egs/librispeech/SSL/zipformer/pretrain.py", line 1371, in main
    mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True)
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 246, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 202, in start_processes
    while not context.join():
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 163, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/workspace/icefall/icefall/diagnostics.py", line 248, in print_diagnostics
    eigs, _ = torch.linalg.eigh(stats)
RuntimeError: "linalg_eigh_cuda" not implemented for 'Half'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/mnt/host/icefall-k2ssl/egs/librispeech/SSL/zipformer/pretrain.py", line 1276, in run
    diagnostic.print_diagnostics()
  File "/workspace/icefall/icefall/diagnostics.py", line 517, in print_diagnostics
    self.diagnostics[k].print_diagnostics()
  File "/workspace/icefall/icefall/diagnostics.py", line 255, in print_diagnostics
    eigs, _ = torch.linalg.eig(stats)
RuntimeError: torch.linalg.eig: input tensor should not contain infs or NaNs.

danpovey · 2024-04-13T07:03:21Z

For diagnostics you need to disable fp16 and halve the batch size.

danpovey · 2024-04-13T08:26:02Z

The error was unusual: it was an infinity in the forward pass. This is because you used the wav2vec2 frontend, and it doesn't have any balancers or similar code to stop large values appearing. ScaledAdam can make large values appear faster than Adam would, although even with Adam they'll appear eventually unless steps are taken to stop it.

   x = conv(x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1581, in _call_impl
    hook_result = hook(self, args, result)
  File "/workspace/icefall/icefall/hooks.py", line 41, in forward_hook
    raise ValueError(
ValueError: The sum of module.feature_extractor.conv_layers.2.0.output is not finite: tensor([[[  -6.5234,   -6.5078,   -6.6094,  ...,   -6.5820,   -6.5469,
            -6.5469],
         [  -0.7900,   -0.6479,   -0.5444,  ...,   -0.9287,   -0.9971,
            -0.9380],
         [  -7.3672,   -8.1250,   -8.5938,  ...,   -7.8672,   -7.9023,
            -7.8047],

Anyway, this PR #1593 should fix the issue without causing any model incompatibility. I haven't tested it though.
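
The check that raised the ValueError above tests whether the sum of a layer's output is finite. Below is a minimal pure-Python sketch of that idea, together with the fp16 range limit that makes overflow more likely in half precision. The function names are hypothetical, and this is neither the icefall hook nor the fix in #1593.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(value):
    """Return True if value can be stored as a finite half-precision float."""
    try:
        struct.pack("<e", value)  # 'e' is the half-precision format code
        return True
    except (OverflowError, struct.error):
        return False


def sum_is_finite(values):
    """Mimic an inf-check: a single inf or NaN makes the sum non-finite."""
    return math.isfinite(sum(values))


print(fits_in_fp16(FP16_MAX))  # True
print(fits_in_fp16(1.0e5))     # False
print(sum_is_finite([-6.5, -0.79, float("inf")]))  # False
```

Activations do not need to be astronomically large to overflow in fp16: anything beyond about 6.5e4 is already out of range, which is why unbounded growth in a frontend layer shows up as an infinity.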

yfyeung · 2024-04-13T09:13:58Z

Also, @yfyeung we normally have a README.md and/or RESULTS.md that show typical sequences of training and testing commands, and associated results. Is there any chance of adding those?
Is a link to a paper going to come later?

Sure, I will add those after the anonymity period ends, including the model checkpoint/tensorboard/pre-training logs/fine-tuning logs/decoding logs, and RESULTS.md. And if things go well, also a link to the paper.

yifanyeung added 4 commits February 18, 2024 11:44

Add k2SSL

a2bf39a

fix flake8

b070d04

fix for black

809bdb0

fix for black

c0a5601

yfyeung requested a review from csukuangfj February 18, 2024 05:19

fix for black

911bfac

yfyeung mentioned this pull request Feb 18, 2024

[WIP] Add a SSL recipe on LibriSpeech dataset #1425

Closed

4 tasks

Merge branch 'k2-fsa:master' into k2ssl

f2f102d

csukuangfj requested a review from yaozengwei February 26, 2024 13:07

yfyeung and others added 3 commits February 27, 2024 10:48

Merge branch 'k2-fsa:master' into k2ssl

bb266b7

Update ssl_datamodule.py

8515d92

Fix bugs in HubertDataset

99044e1

marcoyang1998 reviewed Mar 1, 2024

yifanyeung and others added 9 commits March 1, 2024 20:14

update comments

bed950d

Merge branch 'k2-fsa:master' into k2ssl

660f647

add librilight

9321f8a

Merge branch 'k2-fsa:master' into k2ssl

ea0b631

add checkpoint convert script

952abee

format

ac73f60

Merge pull request #2 from zzasdf/k2ssl-util

482c24e

checkpoint convert script

Merge branch 'k2-fsa:master' into k2ssl

898b0ce

Merge branch 'k2-fsa:master' into k2ssl

dfbacbe

danpovey merged commit 87843e9 into k2-fsa:master Apr 4, 2024
143 checks passed

yfyeung deleted the k2ssl branch April 5, 2024 04:42

yfyeung restored the k2ssl branch April 5, 2024 04:43

JinZr mentioned this pull request Apr 11, 2024

Any plan to support SSL training with zipformer encoder ? #1397

Closed

teowenshen mentioned this pull request Apr 22, 2024

Prevent large values in conv module in wav2vec2_module.py in SSL recipe #1593

Open

yfyeung deleted the k2ssl branch April 26, 2024 11:49
