
Offline inference as service #537

Closed
mailong25 opened this issue Feb 14, 2020 · 10 comments

@mailong25

I'm looking to make predictions on a single wav file without having to load the pre-trained AM and LM models every single time. These models should only be loaded once at the beginning.

I'm not referring to online decoding (real-time decoding). I've read the Python bindings examples and simple_streaming_asr_example, and it doesn't seem to be possible. Should I write my own code to do this?

mailong25 changed the title from "Offline inference pipeline" to "Offline inference as service" on Feb 14, 2020
@optimusfzco

Thank you for asking this, I am very interested as well.

@mailong25
Author

@vineelpratap @avidov any suggestions?

@avidov
Contributor

avidov commented Feb 18, 2020

Trying to understand the ask here.
If I understand you correctly, you are asking for:

  1. a process that loads the models and stays up continuously,
  2. so that at any later time you can feed input files into this continuously running process.

Is this correct?

If so:
How do you want to feed the input when the process is running?

The examples can be modified to do something like that. I can give you some suggestions if you explain in more detail what you need.

@avidov
Contributor

avidov commented Feb 18, 2020

At some point we'll probably release a small library for plugging wav2letter into a service (e.g. a web service or website).
Would this cover your needs?

@mailong25
Author

Yes, you're right. I'm building a Python-based ASR application and trying to integrate ASR into our system. So I need either:

  1. A web service as you mentioned above. That way I can feed audio (in binary format) to the process and get the transcription result. The Python code would look like this:
    import Wav2LetterClient
    model = Wav2LetterClient(port = 'xxx', ip = 'xxx')
    audio = open('sample.wav','rb')
    model.transcribe(audio)

  2. Python bindings that allow me to load an acoustic model (e.g. conv_glu, TDS), make predictions on a single wav file, and get back emission and transition scores.
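
For illustration, here is a rough sketch of the kind of bindings API I have in mind. None of these names exist in wav2letter today; the module, class, and method names below are purely hypothetical placeholders:

    # Hypothetical bindings sketch -- module/class/method names are placeholders,
    # not an existing wav2letter API. The idea is: load the acoustic model once,
    # then score individual wav files and get raw emission/transition scores back.
    from wav2letter_bindings import AcousticModel  # hypothetical module

    am = AcousticModel(am_path='acoustic_model.bin', tokens_path='tokens.txt')  # load once
    emissions, transitions = am.forward('sample.wav')  # per-file inference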

@optimusfzco

Hello,
I am working on an ASR app as well; I am more concerned with feeding audio live from a microphone to achieve live transcription.

@avidov
Contributor

avidov commented Feb 19, 2020

After thinking about it with the team I see the following:

  1. Creating ASR services is supported by https://github.com/facebookresearch/wav2letter/blob/master/inference/inference/examples/AudioToWords.h
    with a usage example in:
    https://github.com/facebookresearch/wav2letter/blob/master/inference/inference/examples/MultithreadedStreamingASRExample.cpp#L273-L280

  2. For feeding audio live from a microphone we have:
    https://github.com/facebookresearch/wav2letter/blob/master/inference/inference/examples/AudioToWords.h#L23-L29

  3. For quick on-the-fly testing we suggest adding an interactive executable with a tiny shell.
    You can enter a file name at the shell and it will dump the transcription. It will look something like:

$ ./interactive_streaming_asr_example

/some/file/name.wav
.... transcription
/some/other/file/name.wav
.... transcription

I think this covers what you need. Please correct me if it doesn't.

@mailong25
Author

Thank you for the instructions, but where does feature_extractor.bin come from?

https://github.com/facebookresearch/wav2letter/blob/master/inference/inference/examples/MultithreadedStreamingASRExample.cpp#L76

@avidov
Contributor

avidov commented Feb 20, 2020

Added interactive_streaming_asr_example (commit 45110ba)

Interactive mode loads the models once and then waits for command line requests. It has a tiny command line shell that supports:

  1. Transcribing audio files:
    input=[full path to audio file]
  2. Redirecting output to a file:
    output=[full path to output text file]
  3. Redirecting output to stdout:
    output=stdout
  4. Convenient use from a Python script/shell using popen(). tlikhomanenko@ will release a tutorial for that soon.
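
In the meantime, here is a minimal popen()-style sketch (not the upcoming tutorial) of driving the interactive example from Python. It assumes the binary is built as ./interactive_streaming_asr_example and that you append whatever model-loading flags your build requires; the input=/output= command strings are the ones listed above:

    import subprocess

    # Launch the interactive example once; append the model-loading flags your
    # build requires (omitted here) so the models are loaded a single time.
    proc = subprocess.Popen(
        ["./interactive_streaming_asr_example"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )

    # Send the tiny-shell commands described above; more input= lines can be
    # written before closing stdin to transcribe additional files.
    proc.stdin.write("output=stdout\n")
    proc.stdin.write("input=/full/path/to/sample.wav\n")
    proc.stdin.close()

    # Print whatever the example writes to stdout (the transcription).
    for line in proc.stdout:
        print(line, end="")
    proc.wait()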

Will add a tutorial soon at:
https://github.com/facebookresearch/wav2letter/wiki/Inference-Run-Examples

Hope that this will cover your needs. Please let me know.

Regarding:

Thank you for the instructions, but where does feature_extractor.bin come from?

https://github.com/facebookresearch/wav2letter/blob/master/inference/inference/examples/MultithreadedStreamingASRExample.cpp#L76

Does this comment belong to this thread?

@mailong25
Author

Thanks, the changes cover all my needs.
Regarding:

Thank you for the instructions, but where does feature_extractor.bin come from?

https://github.com/facebookresearch/wav2letter/blob/master/inference/inference/examples/MultithreadedStreamingASRExample.cpp#L76
Does this comment belong to this thread?

No, it doesn't. I'll double-check that. Any concerns will be opened in another thread.
