Add Speech to text notebook 211 #271

Merged
merged 27 commits into from
Dec 14, 2021

Conversation

Debskij
@Debskij Debskij commented Sep 27, 2021

No description provided.

@Debskij Debskij added this to In progress in Notebooks Roadmap via automation Sep 27, 2021
@ryanloney ryanloney added the new notebook new jupyter notebook label Sep 30, 2021
@Debskij Debskij changed the title [WIP] Add Speech to text notebook Add Speech to text notebook 211 Nov 7, 2021
@ryanloney
Contributor

This notebook does not follow the contribution guide. Please review the guide. We need things like a header/title and a layout that is similar to other notebooks.

A few other points:

  • there should be a way to preview the audio clip by pressing play (should be doable with ipython)
  • audio input type is too strict; it needs to be more flexible (more context below)
  • can .ogg or other formats be supported?
  • please do not use acronyms like CTC in headings
  • please follow case style for headers and sub headings that matches other notebooks

Is there any way for us to take ANY .wav file and make sure it meets these requirements? (Without adding additional dependencies to the requirements.txt?)

assert sample_width == 2, "Only 16-bit WAV PCM supported"
assert compression_type == 'NONE', "Only linear PCM WAV files supported"
assert channel_num == 1, "Only mono WAV PCM supported"
assert sampling_rate == 16000, "Only 16 KHz audio supported"

librosa seems to have the ability to load OGG and MP3. Can we use it for this? https://librosa.org/doc/main/generated/librosa.load.html
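
As a rough illustration of the question above, the four assertions can be satisfied for arbitrary uncompressed PCM WAV input with the standard library alone. The sketch below is not the notebook's code: the function names are hypothetical, the resampler is plain linear interpolation with no anti-aliasing filter (lower quality than what librosa.load does), and it does not handle OGG or MP3, which would still need a decoder such as librosa.

```python
import io
import struct
import wave

TARGET_RATE = 16000  # Hz, matching the sampling_rate assertion


def read_wav_as_mono_float(source):
    """Read an uncompressed PCM WAV; return (mono samples in [-1, 1], rate)."""
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    if width == 1:  # 8-bit WAV samples are unsigned
        values = [(b - 128) / 128.0 for b in raw]
    elif width == 2:  # 16-bit signed little-endian
        values = [v / 32768.0 for v in struct.unpack(f"<{len(raw) // 2}h", raw)]
    elif width == 4:  # 32-bit signed little-endian
        values = [v / 2147483648.0 for v in struct.unpack(f"<{len(raw) // 4}i", raw)]
    else:
        raise ValueError(f"Unsupported sample width: {width} bytes")

    # Downmix by averaging the interleaved channels of each frame.
    mono = [sum(values[i:i + channels]) / channels
            for i in range(0, len(values), channels)]
    return mono, rate


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (assumption: quality is good enough)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = len(samples) * dst_rate // src_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def write_wav_16bit_mono(target, samples, rate=TARGET_RATE):
    """Write samples in [-1, 1] as 16-bit mono linear PCM."""
    ints = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))


# Demo on a synthetic clip: 0.1 s of stereo, 16-bit, 44.1 kHz audio in memory.
source = io.BytesIO()
with wave.open(source, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(struct.pack("<8820h", *([1000, -1000] * 4410)))
source.seek(0)

samples, rate = read_wav_as_mono_float(source)
converted = io.BytesIO()
write_wav_16bit_mono(converted, resample_linear(samples, rate, TARGET_RATE))
converted.seek(0)
```

The same functions accept file paths in place of the in-memory buffers. The wave module rejects compressed WAV files when opening them, so the compression_type check is covered implicitly.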

Contributor

@ryanloney ryanloney left a comment

Please make the following changes before approval to merge

notebooks/211-speech-to-text/README.md Outdated Show resolved Hide resolved
notebooks/211-speech-to-text/211-speech-to-text.ipynb Outdated Show resolved Hide resolved
@ryanloney
Contributor

@Debskij
Please replace the .wav sample file with edge_to_cloud.ogg (in the .zip file below). Then it is good to merge.
edge_to_cloud.zip

"id": "b7e9d9b9",
"metadata": {},
"source": [
"### Run Decoding and Print Output."
Contributor

Suggested change
"### Run Decoding and Print Output."
"### Run Decoding and Print Output"

"id": "a566de49",
"metadata": {},
"source": [
"### Do Inference!\n",
Contributor

Suggested change
"### Do Inference!\n",
"### Do Inference\n",

@ryanloney
Contributor

@Debskij please clear the outputs from the Jupyter cells and commit a clean version.

@ryanloney
Contributor

@helena-intel I'm fine with merging this tomorrow, as long as you are

@ryanloney
Contributor

ryanloney commented Dec 9, 2021

Just for fun, I tried transcribing a CSPAN video (public domain) to see how long it would take. It is a 17-minute video, and the model processed it on my laptop's Tiger Lake CPU in 4s and in only 2s on the iGPU. https://youtu.be/Wp-WiNXH6hI

This is the output. I'm impressed.
cspan.txt

Contributor

@helena-intel helena-intel left a comment

Thanks @Debskij ! This is a great notebook - so many possibilities. I love that this allows you to run speech-to-text for private data that you do not want to upload to some cloud server.

I will approve and merge this now. I have one non-blocking change request: there is a DeprecationWarning about waveplot. The replacement (according to the docstring) is waveshow. I tried simply replacing the method call, but that did not work. Can you make a separate PR that changes waveplot to waveshow?
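
One possible way to stage such a change is a small fallback helper that prefers waveshow and uses waveplot only when the newer function is missing. This is a sketch under assumptions, not the notebook's code: get_wave_plotter is a hypothetical name, the module is passed in as an argument so the helper itself has no librosa dependency, and it does not explain why the direct replacement failed here.

```python
def get_wave_plotter(display_module):
    """Return display_module.waveshow if present, else display_module.waveplot."""
    plotter = getattr(display_module, "waveshow", None)
    if plotter is None:
        # Older releases only provide the deprecated function.
        plotter = getattr(display_module, "waveplot")
    return plotter


# Intended use (assumes librosa is installed and `audio`, `sampling_rate`
# are already defined by the notebook):
#   import librosa.display
#   get_wave_plotter(librosa.display)(audio, sr=sampling_rate)
```

Passing sr as a keyword keeps the call valid for either function.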

@helena-intel helena-intel merged commit a3a0ae3 into openvinotoolkit:main Dec 14, 2021
Notebooks Roadmap automation moved this from In progress to Done Dec 14, 2021
Labels
new notebook new jupyter notebook

4 participants