[WIP] Lhotse/K2 example #45

Open
wants to merge 119 commits into
base: main

Commits on Dec 24, 2020

  1. initial commit

    freewym committed Dec 24, 2020
    5d1001d
  2. asr models related

    freewym committed Dec 24, 2020
    dbb20ad
  3. decoding related

    freewym committed Dec 24, 2020
    d6033af
  4. eadf2a8
  5. wsj recipe and other fixes

    freewym committed Dec 24, 2020
    6148a8a
  6. ee70584
  7. fix

    freewym committed Dec 24, 2020
    93c1de4
  8. validation on wer

    freewym committed Dec 24, 2020
    e864565
  9. environment configurations

    freewym committed Dec 24, 2020
    8782e5d
  10. 9363a71
  11. 0802783
  12. d276c82
  13. c1a54da
  14. 47dc107
  15. eb1d55f
  16. lm training

    freewym committed Dec 24, 2020
    f0930b1
  17. 0e2926b
  18. word lm related; add unigram/temporal label smoothing and update the wsj

    recipe accordingly; code adaptation/changes according to the commits
    from Apr 12, 2019 to May 21, 2019
    freewym committed Dec 24, 2020
    9e479ca
  19. cd8803d
  20. 82a1175
  21. 3ea6bb9
  22. 36baed5
  23. f4bcc7f
  24. LM arch changes

    freewym committed Dec 24, 2020
    9eb1020
  25. swbd

    Hang Lyu authored and freewym committed Dec 24, 2020
    9745aa7
  26. add coverage to Librispeech recipe; make best metric for ASR configurable by using --best-checkpoint-metric; code adaptation/changes according to the commits from Jun 26, 2019 to Jul 3, 2019

    freewym committed Dec 24, 2020
    7ae0222
  27. fix swbd recipe; code adaptation/changes according to the commits from Jul 17, 2019 to Jul 19, 2019

    freewym committed Dec 24, 2020
    6ab3bd5
  28. 236c927
  29. improve swbd recipe; continue training while lr is no less than --min-lr; code adaptation/changes according to the commits from Jul 24, 2019 to Aug 1, 2019

    freewym committed Dec 24, 2020
    3b38acd
  30. 833ad03
  31. add --eos-factor for beam search to alleviate the problem of too short transcripts with LM fusion

    freewym committed Dec 24, 2020
    2777ac7
  32. tokenize each sentence such that it ends with <space>; modify look-ahead LM accordingly; modify the WSJ recipe accordingly

    freewym committed Dec 24, 2020
    5cf1c41
  33. switch to pip install sentencepiece; modify Librispeech/SWBD recipes accordingly; code adaptation/changes according to the commits on Aug 21 and 30, 2019

    freewym committed Dec 24, 2020
    b2d3ef2
  34. update tensorized tree implementation

    ctongfei authored and freewym committed Dec 24, 2020
    1c19fe5
  35. f21c428
  36. Update README.md; add logo; slightly change LM weight and beam size for

    Librispeech; code adaptation/changes according to the commits on Sep 17, 2019
    ctongfei authored and freewym committed Dec 24, 2020
    bac7415
  37. compensate for the removal of torch.rand() from distributed_init() recently introduced in fairseq, to make ASR results reproducible

    freewym committed Dec 24, 2020
    91d9fc7
  38. 6a8f4d3
  39. a better LM for Librispeech yielding better WERs; code

    adaptation/changes according to the commits on Sep 27, 2019
    freewym committed Dec 24, 2020
    32825b5
  40. c63caa8
  41. set --distributed-port=-1 if ngpus=1; code adaptation/changes according to the commits on Oct 11, 2019

    freewym committed Dec 24, 2020
    0e610d5
  42. change warmup scheduling for ReduceLROnPlateauV2; code adaptation/changes according to the commits on Oct 18, 2019

    freewym committed Dec 24, 2020
    67fdff1
  43. remove warmup code in ReduceLROnPlateauV2 as fairseq just added it; code adaptation/changes according to the commits on Oct 23, 2019

    freewym committed Dec 24, 2020
    aab85e5
  44. add gpu.conf for SGE qsub

    freewym committed Dec 24, 2020
    3018d20
  45. Fixed error when using fp16

    Fixed error when using fp16. Followed example from https://github.com/pytorch/fairseq/blob/master/fairseq/models/lstm.py#L299
    Shujian2015 authored and freewym committed Dec 24, 2020
    a1b76df
  46. 75581ae
  47. 0e91c3b
  48. 7f3a0bc
  49. 5744fb9
  50. 4617036
  51. remove coverage term for beam search decoding as it has been superseded by eos thresholding

    freewym committed Dec 24, 2020
    9dd8992
  52. fix bugs causing build failure; a bunch of lint changes; rename

    TokenDictionary->AsrDictionary, TokenTextDataset->AsrTextDataset
    freewym committed Dec 24, 2020
    69dbd15
  53. scheduled sampling rate scheduler

    ctongfei authored and freewym committed Dec 24, 2020
    732ab08
  54. decouple scheduled sampling rate scheduler; rename all appearances of "dict" variables to "dictionary" as "dict" is a built-in name in Python; affects swbd results due to some numerical issue in PyTorch

    freewym committed Dec 24, 2020
    ad0157e
  55. 13e2d62
  56. 390d157
  57. isolate greedy search code from criterions (#19)

    * isolate greedy search code from criterions
    freewym committed Dec 24, 2020
    3c5b3c9
  58. remove the need to pass tgt dataset to criterions by adding a raw text field to collated samples (#20)

    freewym committed Dec 24, 2020
    54870cb
  59. 3677002
  60. f1bed6f
  61. code adaptation/changes according to the commits from Jan 16 to Jan 17, 2020; move decode log files to decode result dirs

    freewym committed Dec 24, 2020
    458828d
  62. move WER validation with greedy decoding code from criterion to task; valid loss is now based on teacher forcing instead of greedy decoding; rename {,label_smoothed_}cross_entropy_with_wer.py to {,label_smoothed_}cross_entropy_v2.py

    freewym committed Dec 24, 2020
    293a068
  63. code adaptation/changes according to the commits on Jan 20, 2020; cosmetic changes for lookahead LM

    freewym committed Dec 24, 2020
    80f564d
  64. isolate LSTMLanguageModel from speech_lstm.py and rename it to LSTMLanguageModelEspresso to avoid naming conflicts with fairseq's LSTMLanguageModel introduced recently

    freewym committed Dec 24, 2020
    d488b92
  65. ad102bd
  66. 19ec973
  67. add options to accept utt2num_frames files to speed up the data loading (#22)

    * add options to accept utt2num_frames files to speed up the data loading
    freewym committed Dec 24, 2020
    9c69029
  68. 698305f
  69. ea7732b
  70. move duplicated network parsers to espresso/speech_tools/utils.py; remove remaining coverage option passed in wsj recipe

    freewym committed Dec 24, 2020
    7170edc
  71. 95fcd90
  72. 41c5725
  73. SpecAugment (#21)

    freewym committed Dec 24, 2020
    4b8d3be
  74. code adaptation/changes according to the commits on Mar 11, 2020; change

    logs/->log/; rename SpeechDataset->AsrDataset,
      Scp*Dataset->FeatScp*Dataset, score*.sh->score*_e2e.sh; remove validation on train subset from wsj recipe
    freewym committed Dec 24, 2020
    11e106f
  75. fix specaug indexing

    freewym committed Dec 24, 2020
    c05a336
  76. code adaptation/changes according to the commits on Mar 24-Apr 3, 2020; use data.encoders.{bpe,tokenizer} for wordpiece decode

    freewym committed Dec 24, 2020
    8772f69
  77. e40b033
  78. update the qsub script for gpu jobs; code adaptation/changes according to the commits on Apr 16

    freewym committed Dec 24, 2020
    b9c19c2
  79. use EncoderOut for SpeechLSTMEncoder's output; code adaptation/changes according to the commits on Apr 21

    freewym committed Dec 24, 2020
    b35d210
  80. cfa899c
  81. be4fd6a
  82. 967dd34
  83. af5a564
  84. 9d3b8b3
  85. c8c1dfb
  86. code adaptation/changes according to the commits on Jun 24-25, 2020; fix

    validation loss in LSTM models
    freewym committed Dec 24, 2020
    027595b
  87. Update Transformer models (#31)

    * update transformer
    
    * initial recipe
    
    * fix transformer
    
    * add encoder positional embeddings
    
    * add more recipes
    freewym committed Dec 24, 2020
    e42ec43
  88. ignore flake8's FileNotFoundError for soft links to kaldi files; code adaptation/changes according to the commits on Jul 8, 2020

    freewym committed Dec 24, 2020
    8600f76
  89. bc6901b
  90. 7c21d3d
  91. 4b253b1
  92. fix reorder_encoder_out in SpeechChunkTransformerEncoder; code adaptation/changes according to the commits on Jul 28, 2020

    freewym committed Dec 24, 2020
    d2f264b
  93. reorder the elements of the returned tuple of TdnnModel.forward();

    export KALDI_ROOT to adapt to the recent changes in kaldi_io; code adaptation/changes according to the commits on Aug 3-4, 2020
    freewym committed Dec 24, 2020
    a92409c
  94. updates for new PyChain (#37)

    * add support for output l2 regularization and xent regularization; add a bichar WSJ recipe; add missing soft links to kaldi files
    
    * move ChainLossFunction here from PyChain
    freewym committed Dec 24, 2020
    c73c4de
  95. afaef92
  96. 5351152
  97. 9d71078
  98. e34b27d
  99. 3fe02ae
  100. 8c02b45
  101. 707ce3a
  102. code adaptation/changes according to the commits on Oct 18-Nov 3, 2020 (lots of changes, mostly for adapting to hydra configs and code formatting)

    freewym committed Dec 24, 2020
    262c0a2
  103. 513b171
  104. code adaptation/changes according to the commits on Nov 11, 2020; obtain

    feat_dim in setup_task() instead
    freewym committed Dec 24, 2020
    6c6ee41
  105. 2b68caf
  106. code adaptation/changes according to the commits on Nov 16-20, 2020; fix a bug in Multi-level LM when getting cached states

    freewym committed Dec 24, 2020
    72fa596
  107. fix length tensor device issue in lf_mmi loss; code adaptation/changes according to the commits on Dec 3-12, 2020

    freewym committed Dec 24, 2020
    6b4e571
  108. 1f812eb

Commits on Dec 26, 2020

  1. Lhotse/K2 support

    freewym committed Dec 26, 2020
    b7b8937
  2. 8926fa3
  3. add random split of negatives

    freewym committed Dec 26, 2020
    083ea69
  4. misc fixes

    freewym committed Dec 26, 2020
    ecf8423
  5. 3713dc1
  6. fixes

    freewym committed Dec 26, 2020
    4fb0d87
  7. f

    freewym committed Dec 26, 2020
    dc3874a
  8. decoding related

    freewym committed Dec 26, 2020
    63ceaf2
  9. some changes

    freewym committed Dec 26, 2020
    66d84af

Commits on Dec 28, 2020

  1. fix negative loss

    freewym committed Dec 28, 2020
    8a345cb
  2. refactor code

    freewym committed Dec 28, 2020
    1b05966