It offers users a convenient way to convert recorded audio into a transcript with identified speakers. The conversion is done with Amazon Transcribe and Amazon Comprehend, while speaker identification is done with the VGGVox speaker identification model. Here is the link to a simple demo video made for the AWS hackathon. The demo uses the 2016 U.S. presidential debate on YouTube.
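To make the idea of a "transcript with identified speakers" concrete, here is a minimal sketch of how time-stamped words from a speech-to-text service could be combined with speaker time ranges from a speaker identification step. The `Word` and `SpeakerSegment` types, the midpoint rule, and the function names are assumptions made for illustration. They are not taken from this project's code or from the Amazon Transcribe output format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One transcribed word with its time range in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


@dataclass
class SpeakerSegment:
    """A time range attributed to one enrolled speaker (hypothetical shape)."""
    speaker: str
    start: float
    end: float


def label_words(words: List[Word], segments: List[SpeakerSegment]) -> List[Tuple[str, str]]:
    """Pair each word with the speaker whose segment contains the word's midpoint."""
    labeled = []
    for word in words:
        midpoint = (word.start + word.end) / 2
        speaker = "unknown"  # fallback when no segment covers the word
        for segment in segments:
            if segment.start <= midpoint < segment.end:
                speaker = segment.speaker
                break
        labeled.append((speaker, word.text))
    return labeled


def to_transcript(labeled: List[Tuple[str, str]]) -> List[str]:
    """Group consecutive words from the same speaker into one transcript line."""
    grouped: List[Tuple[str, List[str]]] = []
    for speaker, text in labeled:
        if grouped and grouped[-1][0] == speaker:
            grouped[-1][1].append(text)
        else:
            grouped.append((speaker, [text]))
    return [f"{speaker}: {' '.join(texts)}" for speaker, texts in grouped]


if __name__ == "__main__":
    words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.2, 1.5)]
    segments = [SpeakerSegment("Alice", 0.0, 1.0), SpeakerSegment("Bob", 1.0, 2.0)]
    for line in to_transcript(label_words(words, segments)):
        print(line)
```

Assigning by midpoint keeps the rule simple when a word straddles a segment boundary. A real pipeline might instead use the largest overlap.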
- Python >= 3.7
- R >= 3.6.1
- To install all dependencies written in Python:
$ pip install -r requirements.txt
- To install all dependencies written in R, run the following in an R console:
$ install.packages("data.table")
$ install.packages("dplyr")
$ install.packages("shiny")
$ install.packages("DT")
$ install.packages("shinydashboard")
$ install.packages("stringr")
- First, run
$ python ui.py
and a GUI will pop up.
- Second, do enrollment:
  - Enter the speaker's name in 使用者名稱 ("User name").
  - Click 開始錄音 ("Start recording") to start enrolling the speaker's voice.
  - Click 結束錄音 ("Stop recording") when the speaker has finished recording.
  - Repeat the first three steps if there are multiple speakers.
  - After all speakers are enrolled, click 開始辨識 ("Start recognition").
- Third, start recording by clicking 會議錄音 ("Meeting recording"), and finish recording by clicking 結束會議 ("End meeting").
- The result can be viewed by clicking the generated shiny.bat file.
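The enrollment and recognition steps above can be pictured with a small sketch. It assumes that each recording has already been reduced to a fixed-length embedding vector by a speaker model, and that a new recording is matched to the enrolled speaker with the highest cosine similarity. The `SpeakerRegistry` class, its method names, and the `threshold` value are hypothetical. This README does not say how the project compares voices, so treat this as one possible approach, not a description of the actual implementation.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two vectors, or 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerRegistry:
    """Stores one reference embedding per enrolled speaker (hypothetical helper)."""

    def __init__(self) -> None:
        self._embeddings: Dict[str, List[float]] = {}

    def enroll(self, name: str, embedding: Sequence[float]) -> None:
        """Register a speaker name with the embedding of the enrollment recording."""
        if not name:
            raise ValueError("speaker name must not be empty")
        self._embeddings[name] = list(embedding)

    def identify(self, embedding: Sequence[float], threshold: float = 0.5) -> str:
        """Return the best-matching enrolled speaker, or 'unknown' below the threshold."""
        best_name = "unknown"
        best_score = threshold  # assumed cutoff; a real system would tune this
        for name, reference in self._embeddings.items():
            score = cosine_similarity(embedding, reference)
            if score > best_score:
                best_name, best_score = name, score
        return best_name


if __name__ == "__main__":
    registry = SpeakerRegistry()
    registry.enroll("Alice", [1.0, 0.0, 0.0])  # toy vectors, not real embeddings
    registry.enroll("Bob", [0.0, 1.0, 0.0])
    print(registry.identify([0.9, 0.1, 0.0]))
    print(registry.identify([0.0, 0.0, 1.0]))
```

Returning "unknown" below a threshold lets a voice that was never enrolled stay unlabeled instead of being forced onto the closest enrolled speaker.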
- Ray Yu (yuchio8156@gmail.com)
- Ling Tseng (lynn4261@gmail.com)
- Vincent Lai (watlz1533@gmail.com)
- Peiyu Ho (peiyu_ho@wistron.com)
- Gigi Yeh (gigi2jean@gmail.com)
- Hank Shih (leo21274@hotmail.com)
Thanks to Hack For Good, held by AWS, for providing the AWS services, and to the open-source VGGVox project. Special thanks to Chen, Stuart, Solutions Architect at Amazon Web Services (AWS), for technical support.