Feature request: add Automatic Forced Alignment (audio-text alignment) and Spleeter (audio track separation) #28

Open
holduan opened this issue May 31, 2021 · 0 comments

holduan commented May 31, 2021

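Background:
ASR quality is currently acceptable (I don't know whether running Spleeter beforehand would improve the recognition rate).
Sentence segmentation, however, is poor. Once the audio has been transcribed to text, the text could be segmented manually or automatically, and AFA (Automatic Forced Alignment) could then be run as the final step.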

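I looked into forced alignment of audio and text. The recognition rate is fairly high, it looks usable for speech-to-text, and it supports many languages.
Even on film and TV material the English recognition rate is quite high, so here is an idea, in case the maintainer is interested:
1. Speech to text: Montreal Forced Aligner (other transcription tools would also work)
2. Smart punctuation for English: http://bark.phon.ioc.ee/punctuator (I have not looked into Chinese)
3. Manual or semi-automatic sentence segmentation (at ".", "!", "?", etc.), plus some corrections (such as "i'" and "i ")
4. Timestamped audio-text alignment: Montreal Forced Aligner, aeneas, or YouTube (which aligns automatically when subtitles are added)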
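Step 3 above could be sketched roughly as follows. This is a minimal illustration, not part of any of the tools mentioned: the function names (`fix_lowercase_i`, `split_sentences`, `segment`) are hypothetical, and reading the "i'" / "i " corrections as "capitalize a standalone lowercase i" is an assumption. It assumes the text has already been punctuated in step 2.

```python
import re


def fix_lowercase_i(text: str) -> str:
    # Capitalize a standalone "i" when it is followed by a space,
    # an apostrophe, or the end of the text ("i think", "i'm").
    return re.sub(r"\bi(?=[' ]|$)", "I", text)


def split_sentences(text: str) -> list[str]:
    # Split on whitespace that follows sentence-ending punctuation (. ! ?).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def segment(text: str) -> list[str]:
    # Apply the correction first, then split into sentences.
    return split_sentences(fix_lowercase_i(text))


if __name__ == "__main__":
    for sentence in segment("i think so. i'm sure! is it ok"):
        print(sentence)
```

A plain punctuation split like this will also break on abbreviations such as "Mr.", which is one reason a manual review pass is still useful before alignment.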

Audio-text alignment tools
1. Montreal Forced Aligner (supports quite a few languages):
https://zhuanlan.zhihu.com/p/86657478

Results:
https://www.youtube.com/watch?v=LgrX5gNgxx0&ab_channel=MahdiChtourou
https://www.youtube.com/watch?v=VONAIXelJYg&ab_channel=AdvancedSkeleton
https://www.youtube.com/watch?v=OLXrlcnndBs&ab_channel=YutingHsueh

2. aeneas (multilingual, with a free web app):
https://www.readbeyond.it/aeneas/
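My experience: material with clean audio (audiobooks, lectures, and so on) aligns very well. Film and TV material is still accurate in the clean passages but aligns badly in the noisy ones.
Combined with Spleeter or iZotope RX.8, accuracy improves considerably once the vocals have been separated.
In my tests, Spleeter has a 600 s limit on the duration it processes in one run. Raising the duration easily causes out-of-memory errors; with 16 GB of RAM it can handle about 800 s.
SpleeterGui installs with almost no setup and supports batch drag-and-drop; it tops out at about 700 s.
For long audio, iZotope RX.8 is the better choice.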
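One possible workaround for the duration limit is to split long audio into chunks that each fit within it, separate each chunk, and join the results. The sketch below only plans the chunk boundaries; it does not call Spleeter itself. The function name `plan_chunks` is hypothetical, and the 600 s default is simply the figure reported above.

```python
def plan_chunks(total_seconds: float, max_chunk: float = 600.0) -> list[tuple[float, float]]:
    """Return (start, duration) pairs, in seconds, that cover the whole audio."""
    if total_seconds < 0 or max_chunk <= 0:
        raise ValueError("total_seconds must be >= 0 and max_chunk must be > 0")
    chunks = []
    start = 0.0
    while start < total_seconds:
        # The last chunk is shorter when the total is not a multiple of max_chunk.
        duration = min(max_chunk, total_seconds - start)
        chunks.append((start, duration))
        start += duration
    return chunks


if __name__ == "__main__":
    # A 25-minute file split into chunks of at most 600 s.
    for start, duration in plan_chunks(1500):
        print(f"start={start:.0f}s duration={duration:.0f}s")
```

Cutting at fixed offsets can land in the middle of a word, so in practice the boundaries would probably need to be moved to nearby silences before separation and alignment.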
