Add Silero Speech-To-Text models #153

Conversation
Deploy preview for pytorch-hub-preview ready! Built with commit 78f39ca

Maybe I did something wrong, but I cannot see our model's page here.

Hi, @ailzhang @bertmaher @wconstab, could you please help with assigning the right person to review this submission?
@snakers4
Thanks for contributing!
If you move your md file out of docs, you should be able to see it on the webpage.
Also, once you do that, the code snippet in the markdown will be executed in CI, so you probably also need a working example there (e.g. test_files).
Hi,
Moved it to the root folder.
Added a small validation dataset download so that it works end-to-end; tested it locally.
Fixed the imports and path issues.
Looks like everything is fixed now.
Great addition!
I have one important comment about dependencies (so that the hub example works in Google Colab) and another comment that is a suggestion.
Please take a look.
```python
# see https://github.com/snakers4/silero-models for utils and more examples

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(github='snakers4/silero-models',
```
Would this require omegaconf and torchaudio to be installed?
If so, you should add a cell like in https://github.com/pytorch/hub/blob/master/nvidia_deeplearningexamples_waveglow.md#example which pip-installs these extra packages.
That will make the example instantly runnable in Google Colab, which is really valuable!
Hi,

> would this require omegaconf and torchaudio to be installed?

Yes, it would. That is why I needed to include them in your CI environment above. Actually, in Colab I also needed to install soundfile, since we are using it as the backend for torchaudio.

> if so, you should add a cell like in
> that will make the thing instantly run in Google Colab, and is really valuable!

Yeah, this totally makes sense. By the way, you can see a more extended Colab version here. I will add this shortly.
snakers4_silero-models_stt.md (Outdated)
```python
 read_audio,
 prepare_model_input) = utils  # see function signature for details

torch.hub.download_url_to_file('http://www.openslr.org/resources/83/midlands_english_female.zip',
```
Instead of downloading a zip file, extracting it, and then running a batch of wav files through, it would be much nicer and more illustrative to download a single wav file and process it, like:

```python
torch.hub.download_url_to_file('some download url for speech.wav', dst='speech.wav')
input = prepare_model_input(['speech.wav'], device=device)
output = model(input)
```
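For contrast, the zip-based flow being discussed boils down to extract-then-glob. A self-contained sketch using a locally created stand-in zip (the real example downloads `midlands_english_female.zip` over the network, which is skipped here):

```python
import glob
import os
import tempfile
import zipfile

# Stand-in for the downloaded dataset zip: create a tiny archive with
# empty fake .wav entries, then extract and glob it the same way the
# example feeds a batch of file paths to prepare_model_input.
os.chdir(tempfile.mkdtemp())
with zipfile.ZipFile('speech.zip', 'w') as zf:
    zf.writestr('clips/a.wav', b'')
    zf.writestr('clips/b.wav', b'')
with zipfile.ZipFile('speech.zip') as zf:
    zf.extractall()

files = sorted(glob.glob('clips/*.wav'))
print(files)  # the list of wav paths that would form the batch
```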
One of the reasons why I am downloading the whole validation dataset and running a batch is to demonstrate that the models are actually quite fast on CPU as well as on GPU.
But I see your point.
I will add a default example with one file, but I would nevertheless keep the utils, because batching may be an issue for a first-time user, and I would like to solve that.
I will add the changes shortly.
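The batching concern above is largely about handling variable-length audio. A minimal pure-Python sketch of what a `prepare_model_input`-style helper has to do (`pad_batch` is a hypothetical name; the real util also reads and resamples the audio files):

```python
def pad_batch(signals, pad_value=0.0):
    """Pad variable-length 1-D signals to a common length so they
    can be stacked into a single batch tensor."""
    max_len = max(len(s) for s in signals)
    return [list(s) + [pad_value] * (max_len - len(s)) for s in signals]

batch = pad_batch([[0.1, 0.2, 0.3], [0.4]])
print([len(s) for s in batch])  # every padded signal has the same length
```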
Thanks for the great contribution. It should go live sometime today, maybe in a couple of hours.

Checked here; I guess it is not there yet. Looking forward to the release! On a side note, as a small self-funded team, we feel extremely proud to have made something worthy of inclusion in this hub.

Apparently we've moved to a once-per-day update of the site now. It should be live by tomorrow morning.

It is live now.
Please kindly review our Speech-To-Text models.