Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standalone Transducer v1.1 #5140

Merged
merged 61 commits into from
Jun 20, 2023
Merged

Standalone Transducer v1.1 #5140

merged 61 commits into from
Jun 20, 2023

Conversation

b-flo
Copy link
Member

@b-flo b-flo commented Apr 25, 2023

This PR is an update for the standalone Transducer version. It stitches several things I'm using as a baseline for new features (that won't be included here). Mainly:

  • General: Fix various docstrings, documentation, variable names, etc + add LayerDrop.
  • Streaming: Rework audio processing + some minor fixes for performance.
  • Encoder: Add E-Branchformer (offline/streaming).
  • Decoder: Add MEGA (w/ chunking) + fix decoder interface.

Some results are given for Libri-100 in another PR.

P.S: It's missing unit and integration tests for MEGA, I'll add it later this week.

@b-flo
Copy link
Member Author

b-flo commented Jun 1, 2023

@sw005320 Just to confirm, Is it required to use this single PR to update multiple files? I mean, easily, it can be split into 3/4 PRs. Makes a little difficult to check which function may fail. If there is no problem, I will try focusing on MEGA. But it may be ok.

It's not, sorry about that. I worked off the list on several things based on different branches and it's was becoming a mess to handle because of the dependencies. I prefer to stitch what's needed for the baseline, I can segment from here though.

Note that you for the new additions, it should be self-contained (i.e.: you only need to focus on specific files):

  • MEGA: decoder/mega_decoder.py, decoder/blocks/mega.py and decoder/modules/mega/*.py
  • RWKV: decoder/rwkv_decoder.py, decoder/blocks/rwkv.py and decoder/modules/rwkv/*.py
  • Ebranchformer: encoder/blocks/ebranchformer.py and encoder/modules/convolution.py

Again, sorry about that mess.

@Fhrozen
Copy link
Member

Fhrozen commented Jun 5, 2023

LGTM, fix test errors and ready to merge.

@b-flo
Copy link
Member Author

b-flo commented Jun 5, 2023

I added the the unit tests for RWKV and fixed some other ones. Integration tests are still missing but we can do that later (I have to redesign them, it's a mess).

Btw, I'm not sure what's the cause but I have the following error for a test on my side:

test/espnet2/bin/test_asr_transducer_inference.py:53: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
espnet2/tasks/abs_task.py:1054: in main
    cls.main_worker(args)
espnet2/tasks/abs_task.py:1147: in main_worker
    set_all_random_seed(args.seed)
espnet2/torch_utils/set_all_random_seed.py:10: in set_all_random_seed
    torch.random.manual_seed(seed)
tools/venv/lib/python3.9/site-packages/torch/random.py:40: in manual_seed
    torch.cuda.manual_seed_all(seed)
tools/venv/lib/python3.9/site-packages/torch/cuda/random.py:113: in manual_seed_all
    _lazy_call(cb, seed_all=True)
tools/venv/lib/python3.9/site-packages/torch/cuda/__init__.py:153: in _lazy_call
    callable()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def cb():
        for i in range(device_count()):
            default_generator = torch.cuda.default_generators[i]
>           default_generator.manual_seed(seed)
E           RuntimeError: CUDA error: an illegal memory access was encountered

It wasn't the case before and I'm wondering if it's due to some master changes or if my env is borked somehow?
I traced the error but I honestly can't figure out the main issue.

@b-flo
Copy link
Member Author

b-flo commented Jun 5, 2023

Hum, I didn't think about this one because I thought ninja was already a dependency (https://ninja-build.org).
@sw005320 Can the package be added to CI runs? I don't want to add an installation at an higher level, users can install the package in their env if needed. It's only used to load WKV kernel module during training.

Edit: I guess I can safely add ninja dependency to the warp-transducer installation. However, the cpp extension can only be loaded if a CUDA is available. How do you think I should handle that in regard to CI/tests? Should I filter them out or is there a GPU/CUDA executor available for some builds? @kamo-naoyuki Do you have an opinion/idea?

Edit2: For now, let's skip RWKV unit tests when no GPU is available.

@mergify mergify bot added the Installation label Jun 6, 2023
@pyf98
Copy link
Collaborator

pyf98 commented Jun 7, 2023

The E-Branchformer part looks good to me.

@mergify mergify bot added the README label Jun 7, 2023
@sw005320
Copy link
Contributor

sw005320 commented Jun 8, 2023

@b-flo,

Edit2: For now, let's skip RWKV unit tests when no GPU is available.

Sounds good to me.
The refactoring part looks good to me and other individual functions do not seem to have an issue.
So, once it becomes a good shape, please go ahead and merge this PR.

@b-flo
Copy link
Member Author

b-flo commented Jun 8, 2023

So, once it becomes a good shape, please go ahead and merge this PR.

I think it's in somewhat good shape but there are some things I'm not happy with:

  • How the decoder state(s) is defined/handled, I think it's inefficient. If you have some opinions/ideas for a (future) redesign, it would be appreciated!
  • RWKV is a bit unstable and training parameters dependence is high. It may be due to bugs I introduced but some regularization and changes are needed (e.g.: initialization) to make it behave properly. For now, I could stabilize it a bit by adding back the R/K/V/output projection biases.

Anyway, I am mostly interested in rough performance/cost for now. Given the role of the Transducer decoder in a "vanilla" setup, the architecture won't make a big difference IMO. I'm finishing the experiments w/ Libri-100 and we can merge the PR.

@sw005320 sw005320 merged commit 3297e10 into espnet:master Jun 20, 2023
24 of 25 checks passed
@sw005320
Copy link
Contributor

Thanks a lot, @b-flo!

@b-flo b-flo deleted the refactoring branch June 22, 2023 09:07
@b-flo b-flo added this to the v.202307 milestone Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Travis, Circle CI, etc Documentation ESPnet2 Installation New Features README RNNT (RNN) transducer related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants