Add Speech to text notebook 211 #271

Debskij · 2021-09-27T10:53:52Z

No description provided.

ryanloney · 2021-11-08T20:07:57Z

This notebook does not follow the contribution guide. Please review the guide. We need things like a header/title and a layout that is similar to other notebooks.

A few other points:

- there should be a way to preview the audio clip by pressing play (should be doable with ipython)
- audio input type is too strict, needs to be more flexible (more context below)
- can .ogg or other formats be supported?
- please do not use acronyms like CTC in headings
- please follow the case style for headers and subheadings that matches other notebooks

Is there any way for us to take ANY .wav file and make sure it meets these requirements? (Without adding additional dependencies to the requirements.txt?)

assert sample_width == 2, "Only 16-bit WAV PCM supported"
assert compression_type == 'NONE', "Only linear PCM WAV files supported"
assert channel_num == 1, "Only mono WAV PCM supported"
assert sampling_rate == 16000, "Only 16 KHz audio supported"
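
One possible answer to the question above, using only the standard library: a minimal sketch that reads a 16-bit PCM WAV file, averages its channels to mono, and resamples to 16 kHz with plain linear interpolation. The function name `read_wav_as_mono_16k` is hypothetical and not part of the notebook. The sketch assumes 16-bit input and applies no anti-aliasing filter, so quality will be lower than a proper resampler.

```python
import array
import sys
import wave

TARGET_RATE = 16000  # Hz, matching the sampling_rate assertion above


def read_wav_as_mono_16k(path):
    """Return (samples, rate): 16-bit mono samples at 16 kHz as array('h')."""
    # The wave module only reads uncompressed PCM and raises wave.Error otherwise.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    if width != 2:
        # Assumption: other sample widths are out of scope for this sketch.
        raise ValueError("this sketch only handles 16-bit PCM input")

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Downmix: average the interleaved channel values of each frame.
    if channels > 1:
        samples = array.array(
            "h",
            (sum(samples[i:i + channels]) // channels
             for i in range(0, len(samples) - channels + 1, channels)),
        )

    # Resample with linear interpolation between neighbouring samples.
    if rate != TARGET_RATE and len(samples) > 1:
        out_len = int(len(samples) * TARGET_RATE / rate)
        step = rate / TARGET_RATE
        last = len(samples) - 1
        resampled = array.array("h")
        for n in range(out_len):
            pos = n * step
            i = min(int(pos), last)
            j = min(i + 1, last)
            frac = pos - i
            value = samples[i] * (1 - frac) + samples[j] * frac
            resampled.append(int(round(value)))
        samples = resampled

    return samples, TARGET_RATE
```

With a helper like this, the four assertions would hold for any 16-bit PCM WAV input; 8-bit, 24-bit, or compressed files would still need a separate decoding step.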

librosa seems to have the ability to load OGG and MP3. Can we use it for this? https://librosa.org/doc/main/generated/librosa.load.html
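
A rough sketch of how that could look: `librosa.load` accepts `sr` and `mono` arguments, so it can resample and downmix while decoding. The helper names `load_for_model` and `floats_to_int16` are hypothetical, and whether the model needs 16-bit integers or can take the float samples directly is an assumption here. Which formats actually decode depends on the audio backend installed alongside librosa.

```python
import array


def floats_to_int16(samples):
    """Scale floats in [-1.0, 1.0] to 16-bit integers, clipping out-of-range values."""
    # Pure-Python conversion for clarity; a NumPy expression would be faster.
    return array.array(
        "h",
        (int(round(max(-1.0, min(1.0, float(x))) * 32767)) for x in samples),
    )


def load_for_model(path, rate=16000):
    """Decode a file librosa can read into 16 kHz mono 16-bit samples."""
    import librosa  # third-party; imported lazily so the helper above has no dependency

    # librosa.load resamples to `sr` and downmixes when mono=True,
    # returning floating-point samples and the resulting sampling rate.
    audio, sample_rate = librosa.load(path, sr=rate, mono=True)
    return floats_to_int16(audio), sample_rate
```

Unlike a standard-library approach, this adds librosa to the requirements, in exchange for covering more input formats.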

ryanloney

Please make the following changes before approval to merge

notebooks/211-speech-to-text/README.md

notebooks/211-speech-to-text/211-speech-to-text.ipynb

ryanloney · 2021-11-17T16:21:54Z

@Debskij
Please replace the .wav sample file with edge_to_cloud.ogg (in the .zip file below). Then it is good to merge.
edge_to_cloud.zip

ryanloney · 2021-11-17T16:23:29Z

notebooks/211-speech-to-text/211-speech-to-text.ipynb

+   "id": "b7e9d9b9",
+   "metadata": {},
+   "source": [
+    "### Run Decoding and Print Output."


Suggested change

"### Run Decoding and Print Output."

"### Run Decoding and Print Output"

ryanloney · 2021-11-17T16:25:26Z

notebooks/211-speech-to-text/211-speech-to-text.ipynb

+   "id": "a566de49",
+   "metadata": {},
+   "source": [
+    "### Do Inference!\n",


Suggested change

"### Do Inference!\n",

"### Do Inference\n",

notebooks/211-speech-to-text/211-speech-to-text.ipynb

Co-authored-by: Ryan Loney <ryanloney@gmail.com>

Co-authored-by: Adrian Boguszewski <adekboguszewski@gmail.com>

ryanloney · 2021-12-09T21:07:00Z

@Debskij please clear the outputs from Jupyter cells and commit clean version.

ryanloney · 2021-12-09T23:03:44Z

@helena-intel I'm fine with merging this tomorrow, as long as you are.

ryanloney · 2021-12-09T23:31:18Z

Just for fun, I tried transcribing a CSPAN video (public domain) to see how long it would take. This is a 17-minute video, and the model processed it on my laptop's Tiger Lake CPU in 4 s and in only 2 s on the iGPU. https://youtu.be/Wp-WiNXH6hI

This is the output. I'm impressed.
cspan.txt

helena-intel

Thanks @Debskij ! This is a great notebook - so many possibilities. I love that this allows you to run speech-to-text for private data that you do not want to upload to some cloud server.

I will approve and merge this now. I have one non-blocking change request: there is a DeprecationWarning about waveplot. The replacement (according to the docstring) is waveshow. I tried simply replacing the method, but that did not work. Can you make a separate PR that changes waveplot to waveshow?
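
One way the follow-up PR could approach this, as a sketch only: prefer `waveshow` when the installed librosa provides it and fall back to `waveplot` otherwise. The thread does not say why the direct replacement failed, so this does not claim to fix that; it only assumes that some installed versions may lack `waveshow`. The names `pick_wave_plotter` and `plot_waveform` are hypothetical.

```python
def pick_wave_plotter(display_module):
    """Return waveshow if the module has it, otherwise fall back to waveplot."""
    plotter = getattr(display_module, "waveshow", None)
    if plotter is None:
        plotter = getattr(display_module, "waveplot")
    return plotter


def plot_waveform(audio, sample_rate):
    """Draw the waveform with whichever plotting function is available."""
    import librosa.display  # third-party, imported lazily
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Both functions accept the samples, a sampling rate, and a target axes.
    pick_wave_plotter(librosa.display)(audio, sr=sample_rate, ax=ax)
    return fig
```

Once the notebook pins a librosa version that ships `waveshow`, the fallback could be dropped in favour of a direct call.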

Debskij added this to In progress in Notebooks Roadmap via automation Sep 27, 2021

ryanloney added the new notebook label Sep 30, 2021

Debskij changed the title ~~[WIP] Add Speech to text notebook~~ Add Speech to text notebook 211 Nov 7, 2021

helena-intel requested review from ryanloney, raymondlo84 and adrianboguszewski November 8, 2021 10:20

ryanloney requested changes Nov 15, 2021


adrianboguszewski reviewed Nov 16, 2021
