Deprecate canonical Multilingual Librispeech #4060

Merged

polinaeterna
Contributor

@polinaeterna polinaeterna commented Mar 30, 2022

Deprecate the canonical Multilingual Librispeech dataset in favor of the community one, which supports streaming.

However, there is a problem with the new ASR template schema: since it has changed, I guess all community datasets that use this template, including MLS, do not work with the new version of the library. Should we somehow notify users about that, or is it possible to change this line ourselves? For MLS specifically, I cannot change the code directly as I'm not a member of the Facebook org.

Hm, and the code should be changed after the release, no?

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Mar 30, 2022

The documentation is not available anymore as the PR was closed or merged.

@lhoestq
Member

lhoestq commented Mar 30, 2022

Yes, as discussed in #4006, we should indeed update facebook/multilingual_librispeech before we do a release. @anton-l could you help take care of updating facebook/multilingual_librispeech? We need to update the task template to

task_templates=[AutomaticSpeechRecognition(audio_column="audio", transcription_column="text")],

and state in the dataset card that datasets>=2.1 is required to load it.

Once the change is done, we can merge this PR and do the release, I think.
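
For readers hitting this from the user side, the version requirement mentioned above could be checked before loading the dataset. The following is only a rough sketch, not code from this PR or from the dataset script: the helper names (`parse_version`, `supports_audio_column`, `installed_datasets_supports_audio_column`) are hypothetical, and the only fact taken from the thread is that datasets>=2.1 is needed for the updated task template. The parsing is deliberately simplified and ignores pre-release suffixes such as `.dev0`.

```python
import re
from importlib import metadata

# From the thread: datasets>=2.1 is needed for the updated task template.
MIN_DATASETS = (2, 1)


def parse_version(version: str) -> tuple:
    """Return the leading (major, minor) numbers of a version string.

    Simplification: pre-release suffixes such as ".dev0" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_audio_column(version: str) -> bool:
    """Hypothetical helper: True if `version` meets the minimum above."""
    return parse_version(version) >= MIN_DATASETS


def installed_datasets_supports_audio_column() -> bool:
    """Check the locally installed `datasets` package, if there is one."""
    try:
        return supports_audio_column(metadata.version("datasets"))
    except metadata.PackageNotFoundError:
        # `datasets` is not installed, so nothing can be loaded anyway.
        return False
```

Comparing `(major, minor)` tuples rather than raw strings avoids the usual pitfall where `"2.10"` sorts before `"2.9"` lexicographically.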

@anton-l
Member

anton-l commented Mar 31, 2022

@polinaeterna @lhoestq
Updated the script and the dataset card: https://huggingface.co/datasets/facebook/multilingual_librispeech

@polinaeterna
Contributor Author

@anton-l @lhoestq the previewer now doesn't work for this dataset, as it cannot recognize the new audio_column argument:
[image: screenshot]

I'm not an expert in previewer things. Where should I look for the corresponding code?

@severo
Contributor

severo commented Apr 1, 2022

Yes, there are several datasets with the same error, e.g. huggingface/dataset-viewer#188. I'm not sure what I should do to fix this. Upgrade datasets to master?

@lhoestq
Member

lhoestq commented Apr 1, 2022

@anton-l ended up removing the task template in facebook/multilingual_librispeech to make it work with the current version of datasets and fix the viewer :) thanks!

@polinaeterna
Contributor Author

@lhoestq can we merge now? ^^

Member

@lhoestq lhoestq left a comment

yup, thanks :)

@polinaeterna polinaeterna merged commit 2df3e2b into huggingface:master Apr 1, 2022
@polinaeterna polinaeterna deleted the deprecate-ml-librispeech branch April 1, 2022 12:48
5 participants