- The goal is to build a model that converts audio from meetings to text and writes it to a text file, which on completion is summarized into a short, concise list of bullet points covering the entire meeting. The initial idea is to implement this with pre-trained models, but accuracy and efficiency have to be taken care of.
- There are a few considerations. First, it should ideally also support multilingual conversation, with English as the primary language and additional support for Hindi and at least the default regional language.
- The end goal, however, is to wrap this model in a presentable web app, with a GUI to start the recording and to add the administrators of the meeting, who will receive the summary as a text file in their mailbox automatically, without manual intervention.
The objective is to create a machine learning model that listens to the audio from meetings, converts the speech to text, and finally outputs a summary of the entire meeting in text format. This model can further be wrapped inside a graphical interface for easier access, where the summarized text is sent to the administrators of the meeting, who are specified at the onset.
- using Google's standard API to convert speech to text [1]
- setting a baseline
- finding its accuracy
- improving on that baseline
- dumping the output to a text file
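The baseline steps above could be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the `SpeechRecognition` package for the Google call, uses word error rate (WER) as the accuracy measure (the notes do not name a metric), and the function names, the `en-IN` language code, and the file paths are all hypothetical.

```python
from pathlib import Path


def transcribe_wav(path: str, language: str = "en-IN") -> str:
    """Transcribe a WAV file with Google's web speech API (needs network)."""
    # Lazy import so the rest of this module works without the package.
    import speech_recognition as sr  # pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    return recognizer.recognize_google(audio, language=language)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def dump_transcript(text: str, out_path: str) -> None:
    """Write the transcript to a UTF-8 text file."""
    Path(out_path).write_text(text + "\n", encoding="utf-8")
```

A baseline run would then transcribe a recording, compare it against a hand-written reference transcript with `word_error_rate`, and save the result with `dump_transcript`. A lower WER after a change means the change improved on the baseline.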
- with the baseline in place
- use the two models together and streamline the pipeline
- bridge the gap between the two with preprocessing
- process the converted text to add proper punctuation and markings
- then run the summarizing models on the processed text block
- this requires a middleware that leverages a natural language processing model
- major shift in workflow, found a better option: 11
- `vosk` model for speech to text conversion
- preprocessing through a `transformers` model
- finally, summarizing through `pipeline()`
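The revised workflow above might look something like the sketch below. It assumes the `vosk` and `transformers` packages, a downloaded Vosk model directory, and a mono 16-bit PCM WAV input. The notes do not say which `transformers` model does the preprocessing, so that stage is left as an optional `preprocess` callable. The function names, chunk size, and length limits are hypothetical choices, not the project's settings.

```python
import json
import wave
from typing import Callable, List, Optional


def transcribe_with_vosk(wav_path: str, model_dir: str) -> str:
    """Offline transcription of a mono 16-bit PCM WAV file."""
    from vosk import KaldiRecognizer, Model  # pip install vosk

    pieces = []
    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result())["text"])
        pieces.append(json.loads(recognizer.FinalResult())["text"])
    return " ".join(p for p in pieces if p)


def split_into_chunks(text: str, max_words: int = 300) -> List[str]:
    """Split text into word-bounded chunks small enough for the summarizer."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_transcript(text: str,
                         preprocess: Optional[Callable[[str], str]] = None,
                         summarizer: Optional[Callable] = None,
                         max_words: int = 300) -> List[str]:
    """Return one bullet point per chunk of the (preprocessed) transcript."""
    if preprocess is not None:
        text = preprocess(text)  # e.g. a punctuation-restoring model
    if summarizer is None:
        from transformers import pipeline  # pip install transformers
        summarizer = pipeline("summarization")  # library's default model
    bullets = []
    for chunk in split_into_chunks(text, max_words):
        result = summarizer(chunk, max_length=60, min_length=10,
                            do_sample=False)
        bullets.append("- " + result[0]["summary_text"].strip())
    return bullets
```

Passing the summarizer in as an argument keeps the model loading out of the function body, so the same code can be exercised with a stub while the choice of summarization model is still open.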
- Context aware summary
- change of paragraph in cases of change of context
- might require deep learning
- can separate two summaries: a general summary and a context-based summary
- Multilingual speech optimization
- Adapting to bandwidth, backup solutions
- recording set audio
- fall back to recording the audio and transcribing the saved file instead of transcribing live
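One way the fall-back idea above could be wired up is sketched here. It assumes the audio is always being saved to disk while live transcription runs, and that a network failure in the live path surfaces as an `OSError` (which covers `ConnectionError` and `TimeoutError`). The function and parameter names are hypothetical.

```python
from typing import Callable, Tuple


def transcribe_with_fallback(live_transcribe: Callable[[], str],
                             offline_transcribe: Callable[[str], str],
                             saved_audio_path: str) -> Tuple[str, str]:
    """Try live transcription; on a network error, use the saved recording.

    Returns the transcript and which mode produced it ("live" or "offline").
    """
    try:
        return live_transcribe(), "live"
    except OSError:
        # Bandwidth dropped or the service is unreachable: transcribe the
        # recording that was saved to disk instead.
        return offline_transcribe(saved_audio_path), "offline"
```

Returning the mode alongside the text lets the web app tell the administrators whether the summary came from the live stream or from the saved recording.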
Footnotes
1. GfG articles on speech-to-text Python: Con...
2. Knowledge Base/ turing.com
3. Sasha Bondar's blog post on reintech.io
4. Official PyPI documentation for `SpeechRecognition`