[#1782] Add max position params to speech recognition #1783

Closed
wants to merge 2 commits into from

Conversation

mgaido91
Contributor

@mgaido91 mgaido91 commented Mar 5, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #1782.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@facebook-github-bot
Contributor

Hi @mgaido91!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@mgaido91
Contributor Author

mgaido91 commented Mar 5, 2020

The failure seems unrelated to this PR, but I don't know how to retrigger the tests. Can anyone help me? Thanks.

@erip
Contributor

erip commented Mar 5, 2020

@mgaido91 Windows never passes.

@myleott myleott requested a review from okhonko March 7, 2020 15:02
Contributor

@okhonko okhonko left a comment

Thank you for your contribution, @mgaido91!

Please take a look at my minor comments.

@@ -77,6 +77,10 @@ def add_args(parser):
parser.add_argument(
"--silence-token", default="\u2581", help="token for silence (used by w2l)"
)
parser.add_argument('--max-source-positions', default=2048, type=int, metavar='N',
Contributor

The default value of 2048 may be too small, since in speech recognition the source is a sequence of frames, not tokens.
With this change we may start filtering out a large portion of the LibriSpeech data by default, for example.

Maybe we can set the default to sys.maxsize for both the source and target sequences to keep the existing behavior (no data filtering by max size).
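
A minimal sketch of the suggestion above, assuming plain argparse. The helper names `add_position_args` and `keep_example`, the filtering rule, and the utterance lengths are all hypothetical and only illustrate the point; how the task actually applies these limits is not shown in this thread.

```python
import argparse
import sys


def add_position_args(parser):
    # Hypothetical helper: registers the two flags discussed in this review,
    # defaulting to sys.maxsize so nothing is filtered unless the user asks.
    parser.add_argument('--max-source-positions', default=sys.maxsize, type=int, metavar='N',
                        help='max number of frames in the source sequence')
    parser.add_argument('--max-target-positions', default=sys.maxsize, type=int, metavar='N',
                        help='max number of tokens in the target sequence')
    return parser


def keep_example(num_frames, num_tokens, args):
    # Hypothetical filter: keep an utterance only if it fits both limits.
    return (num_frames <= args.max_source_positions
            and num_tokens <= args.max_target_positions)


parser = add_position_args(argparse.ArgumentParser())
default_args = parser.parse_args([])
capped_args = parser.parse_args(['--max-source-positions', '2048'])

# (num_frames, num_tokens) pairs; made-up values for illustration only.
utterances = [(1500, 40), (3000, 80)]
kept_default = [u for u in utterances if keep_example(*u, default_args)]
kept_capped = [u for u in utterances if keep_example(*u, capped_args)]
```

With the sys.maxsize defaults both utterances are kept, while a source limit of 2048 drops the 3000-frame one, which is the silent filtering the comment above wants to avoid by default.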

@@ -77,6 +77,10 @@ def add_args(parser):
parser.add_argument(
"--silence-token", default="\u2581", help="token for silence (used by w2l)"
)
parser.add_argument('--max-source-positions', default=2048, type=int, metavar='N',
help='max number of tokens in the source sequence')
Contributor

nit: max number of frames

Contributor

@okhonko okhonko left a comment

Thanks for addressing my comments, @mgaido91!
Please consider changing the default for max-target-positions as well.

Contributor

@okhonko okhonko left a comment

Looks good to me, thanks

@mgaido91
Contributor Author

Thanks for the review, @okhonko. Is there anything else I can do to push this PR forward? Thanks in advance for the guidance.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@myleott has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@myleott merged this pull request in a12c5c5.

Successfully merging this pull request may close these issues.

Speech recognition can OOM with large audio sequences