Add Speech AutoModels #13655
Conversation
```diff
- # Only load from `config.architectures`, AutoModelForCTC and AutoModelForConditionalGeneration
- # do not exist yet.
- "pt": () if is_torch_available() else (),
+ "pt": (AutoModelForCTC, AutoModelForSpeechSeq2Seq) if is_torch_available() else (),
```
Think that's cleaner @Narsil
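For context, the change above lets one pipeline task resolve to several auto classes per framework. A minimal, self-contained sketch of that mapping pattern (stand-in classes and a stubbed `is_torch_available`, not the actual transformers source):

```python
# Illustrative sketch of the task -> auto-class mapping this diff changes.
# The classes and the availability check are stubs so the example runs anywhere.
def is_torch_available():
    return True  # assume torch is installed for this sketch

class AutoModelForCTC: ...
class AutoModelForSpeechSeq2Seq: ...

MODEL_FOR_TASK = {
    "pt": (AutoModelForCTC, AutoModelForSpeechSeq2Seq) if is_torch_available() else (),
    "tf": (),  # no TF speech auto models in this PR
}

# A loader would try each candidate class in order until the checkpoint's
# architecture matches.
candidate_names = [cls.__name__ for cls in MODEL_FOR_TASK["pt"]]
print(candidate_names)
```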
```python
pipeline(
    task="automatic-speech-recognition",
    model="hf-internal-testing/tiny-random-wav2vec2",
)
```
A tokenizer was recently added to this repo, so I created a new repo without a tokenizer. Also, it should give an OSError IMO.
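The failure mode being discussed can be sketched without network access. This is a hypothetical stub, not the real transformers loader; the file names are assumptions:

```python
# Hypothetical stub illustrating the error behavior discussed above: when a
# checkpoint repo contains no tokenizer files, loading should surface an
# OSError rather than a generic failure.
TOKENIZER_FILES = {"tokenizer_config.json", "vocab.json", "tokenizer.json"}

def load_tokenizer(repo_files):
    if not TOKENIZER_FILES & set(repo_files):
        raise OSError("Can't load tokenizer: no tokenizer files in repo")
    return "tokenizer"

# A wav2vec2 repo without tokenizer files should fail with OSError.
try:
    load_tokenizer(["config.json", "pytorch_model.bin"])
except OSError as exc:
    print(f"OSError: {exc}")
```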
This looks good to me.
(Not too related to this PR) The fact that this pipeline resolves to two different AutoModel* implementations reminds me of the ongoing issue with the 1-to-1 mapping between auto models and pipelines:
- Some pipelines map to several auto models (like this one)
- Some auto models map to several pipelines (AutoModelForSeq2SeqLM maps to ~4 pipelines)

Should there be a slight refactor so that auto models map 1-to-1 to pipelines? This could introduce AutoModelForSummarization, AutoModelForTranslation, etc.
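The many-to-many relationship described here can be made concrete with a small inversion exercise. The mapping below is illustrative data only (not the real transformers tables), picking the examples named in this thread:

```python
# Illustrative pipeline -> auto-model mapping, inverted to show which auto
# models serve more than one pipeline (the situation motivating the
# 1-to-1 question above).
from collections import defaultdict

PIPELINE_TO_AUTO_MODELS = {
    "automatic-speech-recognition": ["AutoModelForCTC", "AutoModelForSpeechSeq2Seq"],
    "summarization": ["AutoModelForSeq2SeqLM"],
    "translation": ["AutoModelForSeq2SeqLM"],
    "text2text-generation": ["AutoModelForSeq2SeqLM"],
}

auto_to_pipelines = defaultdict(list)
for pipe, models in PIPELINE_TO_AUTO_MODELS.items():
    for model in models:
        auto_to_pipelines[model].append(pipe)

# Auto models shared by more than one pipeline:
shared = {m: p for m, p in auto_to_pipelines.items() if len(p) > 1}
print(shared)
```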
LGTM, and the AutoClass names should work well down the road 👍
Looks great, thanks for adding!
I'm not really in favor of aligning pipelines and [...]. In this case here an auto model class called [...].

On the other hand, I agreed with @Narsil that for the pipelines it doesn't really make sense to have [...]. More generally, I believe that the [...].

So I'm not really in favor of the 1-to-1 alignment I think. => Would maybe be a good idea to have a chat about this more generally though!
I agree with @patrickvonplaten actually. In my mind, [...].

On the other hand, for pipelines, relying on [...].

Forcing 1-to-1 here would mean forcing model developers to care about pipelines and enabling all potential uses for a given head. It would also mean that users wouldn't necessarily know which [...].

Keeping the current architecture is alright I think.
* upload
* correct
* correct
* correct
* finish
* up
* up
* up again
What does this PR do?
This PR adds two auto models for speech:
- AutoModelForCTC
- AutoModelForSpeechSeq2Seq
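Auto classes like these follow a registry/dispatch pattern: a checkpoint config's model type selects the concrete head class. A hypothetical miniature of that pattern (class names suffixed with `Sketch` to mark them as stand-ins, not the transformers implementation):

```python
# Miniature of the auto-class dispatch pattern: register concrete model
# classes per model_type, then resolve from a config dict.
class AutoModelForCTCSketch:
    _registry = {}

    @classmethod
    def register(cls, model_type, model_cls):
        cls._registry[model_type] = model_cls

    @classmethod
    def from_config(cls, config):
        # Look up the concrete class for this config's model_type.
        return cls._registry[config["model_type"]]()

class Wav2Vec2ForCTC:
    """Stand-in for the real CTC head."""

AutoModelForCTCSketch.register("wav2vec2", Wav2Vec2ForCTC)
model = AutoModelForCTCSketch.from_config({"model_type": "wav2vec2"})
print(type(model).__name__)
```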
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.