[AutoProcessor] Add Wav2Vec2WithLM & small fix #14675

Conversation

patrickvonplaten (Contributor) commented Dec 8, 2021

What does this PR do?

Make AutoProcessor work correctly with local files and small fix

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@@ -39,6 +39,7 @@
("speech_to_text_2", "Speech2Text2Processor"),
("trocr", "TrOCRProcessor"),
("wav2vec2", "Wav2Vec2Processor"),
("wav2vec2_with_lm", "Wav2Vec2ProcessorWithLM"),
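The mapping addition above can be sketched as a plain lookup. This is a minimal stand-in mirroring the entries in the diff, not the actual `PROCESSOR_MAPPING_NAMES` structure in transformers:

```python
# Simplified stand-in for the auto-processor mapping (names taken from the diff).
PROCESSOR_MAPPING_NAMES = {
    "speech_to_text_2": "Speech2Text2Processor",
    "trocr": "TrOCRProcessor",
    "wav2vec2": "Wav2Vec2Processor",
    "wav2vec2_with_lm": "Wav2Vec2ProcessorWithLM",  # new entry from this PR
}

def processor_class_for(model_type: str) -> str:
    """Resolve a model_type string to its processor class name."""
    return PROCESSOR_MAPPING_NAMES[model_type]

print(processor_class_for("wav2vec2_with_lm"))  # Wav2Vec2ProcessorWithLM
```

With this entry in place, AutoProcessor can resolve checkpoints whose model type is `wav2vec2_with_lm` to `Wav2Vec2ProcessorWithLM` instead of falling back to the plain `Wav2Vec2Processor`.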
patrickvonplaten (Contributor, Author) commented:

need new folder for this

LysandreJik (Member) left a comment:

LGTM

@@ -145,6 +146,9 @@ def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
key: kwargs[key] for key in ["revision", "use_auth_token", "local_files_only"] if key in kwargs
}
model_files = get_list_of_files(pretrained_model_name_or_path, **get_list_of_files_kwargs)
# strip to file name
model_files = [f.split("/")[-1] for f in model_files]
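The stripping step in the diff can be shown in isolation. The file names below are hypothetical; the point is that a local folder listing may carry a path prefix while a Hub-style listing is already bare:

```python
# Hypothetical listing mixing a local path and an already-bare Hub-style name.
model_files = [
    "my-wav2vec2-folder/preprocessor_config.json",
    "tokenizer_config.json",
]

# The diff's stripping step: keep only the final path component of each entry.
model_files = [f.split("/")[-1] for f in model_files]
print(model_files)  # ['preprocessor_config.json', 'tokenizer_config.json']
```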
LysandreJik (Member) commented:

Shouldn't that be handled in get_list_of_files?

patrickvonplaten (Contributor, Author) replied:

Yeah, I thought so too at first, but I think it's cleaner for this function to return the actual file names rather than trimming each path down to its last component here. What do you think? cc @sgugger as well

sgugger (Collaborator) commented:

It should be done at the model_files level to be consistent with the remote repos.

patrickvonplaten (Contributor, Author) commented:

Test failure is unrelated.

patrickvonplaten (Contributor, Author) commented:

Failures are unrelated - merging.

@patrickvonplaten patrickvonplaten merged commit ee4fa2e into huggingface:master Dec 8, 2021
@patrickvonplaten patrickvonplaten deleted the add_wav2vec2_with_lm_to_autoprocessor branch December 8, 2021 14:51
sgugger (Collaborator) left a comment:

Can't comment on the merged PR, but I think we should strip the filenames in the get_list_of_files method to get behavior that is consistent between repos and local folders. The files are not directly usable anyway: you have to use cached_path to actually use them, which will prepend the folder name again.
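sgugger's suggestion could look roughly like this. This is a sketch under the assumption that local listings come from os.listdir, not the actual transformers implementation of get_list_of_files:

```python
import os
import tempfile

def get_list_of_files(path_or_repo):
    """Sketch: for a local folder, return bare file names so the result
    matches the shape of a Hub repo listing."""
    if os.path.isdir(path_or_repo):
        return sorted(os.listdir(path_or_repo))
    # (remote branch omitted; Hub listings are assumed to already be bare names)
    return []

# Usage: a local folder now yields the same bare names a Hub repo would.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "preprocessor_config.json"), "w").close()
    print(get_list_of_files(d))  # ['preprocessor_config.json']
```

Moving the stripping here means every caller sees the same shape of listing, so per-call-site trimming like the diff above becomes unnecessary.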


Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 27, 2022
* [AutoProcessor] Add Wav2Vec2WithLM & small fix

* revert line removal

* Update src/transformers/__init__.py

* add test

* up

* up

* small fix