Audio forced alignment with aeneas #36

Open
liusaint opened this issue Apr 30, 2019 · 0 comments


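In English-learning products, a common requirement is a karaoke-style effect that highlights each word while a sentence is read aloud. This post records how to generate the one-to-one mapping between audio playback time and words (similar to a lyrics file, but at word granularity). Proofreading these timings purely by hand is costly. The technical term for this task is forced alignment. Here I introduce the Python library aeneas, which can generate a JSON file with the timestamps of each sentence or each word, and can also process many files in batch. Its accuracy is quite good. Documentation: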
https://github.com/readbeyond/aeneas

http://www.readbeyond.it/aeneas/

Usage:

1. Install the software.
One-click installers (Windows and macOS versions):
https://github.com/sillsdev/aeneas-installer/releases
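After installing, it is worth confirming that aeneas and the external programs it relies on (ffmpeg and ffprobe from FFmpeg, plus eSpeak) are visible. aeneas ships its own check, python -m aeneas.diagnostics; the sketch below is a hypothetical stand-alone alternative that assumes those programs are expected on PATH.

```python
import importlib.util
import shutil

# External programs aeneas calls (assumed to be on PATH; the installers bundle them).
REQUIRED_PROGRAMS = ("ffmpeg", "ffprobe", "espeak")


def check_environment():
    """Report whether aeneas and its external tools can be found."""
    # find_spec only locates the package; it does not import it.
    status = {"aeneas": importlib.util.find_spec("aeneas") is not None}
    for program in REQUIRED_PROGRAMS:
        # shutil.which searches PATH (and PATHEXT on Windows).
        status[program] = shutil.which(program) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:8s} {'found' if ok else 'MISSING'}")
```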

2. Prepare the input files in one folder, named for example folder.
It contains:

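config.txt   // configuration file: formats, output paths, etc.
audios/      // audio files and sentence texts
   -- Can_you_see_me.txt   // text of the corresponding sentence
   -- Can_you_see_me.m4a   // matching audio, same base name as the text file
   -- Yes_can.txt          // several pairs can be processed in one batch
   -- Yes_can.m4a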
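Since each text file must have an audio file with the same base name, a small script can catch missing partners before a batch run. A sketch: pair_names, check_audio_folder, and write_sentence are hypothetical helpers, and writing the text as UTF-8 with one sentence per line is an assumption consistent with is_text_type=mplain in config.txt.

```python
from pathlib import Path


def pair_names(filenames, text_ext=".txt", audio_ext=".m4a"):
    """Return (matched, text_without_audio, audio_without_text) base names."""
    text_ext, audio_ext = text_ext.lower(), audio_ext.lower()
    texts, audios = set(), set()
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix == text_ext:
            texts.add(path.stem)
        elif suffix == audio_ext:
            audios.add(path.stem)
    return sorted(texts & audios), sorted(texts - audios), sorted(audios - texts)


def check_audio_folder(folder):
    """Apply pair_names to the files in a folder such as folder/audios/."""
    return pair_names(p.name for p in Path(folder).iterdir() if p.is_file())


def write_sentence(folder, base_name, sentence):
    """Write the sentence to base_name.txt in UTF-8, one sentence per line."""
    Path(folder, base_name + ".txt").write_text(sentence.strip() + "\n", encoding="utf-8")
```

For example, check_audio_folder("folder/audios") should return the two pairs above as matched and two empty orphan lists.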

3. Open a command prompt or terminal, change into the directory that contains folder, and create an output folder there.
Then run: python -m aeneas.tools.execute_job folder/ output/
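Step 3 can also be scripted. A minimal sketch assuming the same folder/ and output/ names; build_command and run_job are hypothetical helpers, and only the python -m aeneas.tools.execute_job invocation comes from the step above.

```python
import subprocess
import sys
from pathlib import Path


def build_command(job_dir, output_dir, python=sys.executable):
    """Equivalent of: python -m aeneas.tools.execute_job folder/ output/"""
    return [python, "-m", "aeneas.tools.execute_job", str(job_dir), str(output_dir)]


def run_job(job_dir="folder/", output_dir="output/"):
    """Create the output folder, then run the aeneas batch job."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if aeneas exits with an error code.
    subprocess.run(build_command(job_dir, output_dir), check=True)
```

Using sys.executable makes the job run under the same interpreter as the script, which avoids picking up a different Python that has no aeneas installed.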

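4. When the job succeeds, find the generated files in the output directory. To check accuracy, you can write a simple H5 (HTML5) page that loads the generated JSON together with the audio.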
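Before building the H5 page, the results can be skimmed in Python. A sketch assuming the configuration from step 6: os_job_file_container=zip packs the job output into a zip archive in output/, and each task's JSON has the fragments structure shown in step 7. load_sync_maps and format_fragment are hypothetical helpers.

```python
import json
import zipfile
from pathlib import Path


def load_sync_maps(output_dir="output"):
    """Map each JSON file inside the zip archives of output_dir to its fragments."""
    results = {}
    for archive in sorted(Path(output_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                if member.endswith(".json"):
                    data = json.loads(zf.read(member).decode("utf-8"))
                    results[member] = data["fragments"]
    return results


def format_fragment(fragment):
    """Render one fragment as 'begin-end  text' for a quick visual check."""
    text = " ".join(fragment["lines"])
    return f"{fragment['begin']}-{fragment['end']}  {text}"
```

Printing format_fragment for every fragment of every file gives a plain list of word timings to compare against the audio by ear.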

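5. Handling an aeneas error on Windows: "the default input encoding is not UTF-8. You might want to set 'PYTHONIOENCODING=UTF-8' in your shell." Fix: in the terminal, change to the Python installation directory and run commands such as the following (set only applies to the current terminal session, so run aeneas from the same window):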

cd C:\Python27\Scripts  
set PYTHONIOENCODING=UTF-8
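If the job is launched from a Python script rather than typed into cmd, the variable can be passed to just that child process. A sketch; utf8_env and run_job_utf8 are hypothetical helper names.

```python
import os
import subprocess
import sys


def utf8_env(base=None):
    """Copy of the environment with PYTHONIOENCODING forced to UTF-8."""
    env = dict(os.environ if base is None else base)
    env["PYTHONIOENCODING"] = "UTF-8"
    return env


def run_job_utf8(job_dir="folder/", output_dir="output/"):
    """Run the aeneas job in a child process whose stdio uses UTF-8."""
    cmd = [sys.executable, "-m", "aeneas.tools.execute_job", job_dir, output_dir]
    # The variable applies only to this child process, like `set` in a single cmd window.
    subprocess.run(cmd, env=utf8_env(), check=True)
```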

6. config.txt, which configures paths, formats, and other settings:

is_hierarchy_type=flat
is_hierarchy_prefix=audios/
is_text_file_relative_path=.
is_text_file_name_regex=.*\.txt
is_text_type=mplain
is_audio_file_relative_path=.
is_audio_file_name_regex=.*\.m4a
is_audio_file_detect_head_max=10.000
is_audio_file_detect_tail_max=10.000

os_job_file_name=output_example1
os_job_file_container=zip
os_job_file_hierarchy_type=flat
os_job_file_hierarchy_prefix=audios/
os_task_file_name=$PREFIX.json
os_task_file_format=json
os_task_file_smil_page_ref=$PREFIX.xhtml
os_task_file_smil_audio_ref=$PREFIX.m4a
os_task_file_levels=3



job_language=en
job_description=Example 1 (flat hierarchy, parsed text files)
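When many jobs need slightly different settings, a script can write config.txt. A minimal sketch: the key/value pairs are copied from the file above, while render_config and write_config are hypothetical helpers.

```python
from pathlib import Path

# Key/value pairs copied from the config.txt above.
CONFIG = {
    "is_hierarchy_type": "flat",
    "is_hierarchy_prefix": "audios/",
    "is_text_file_relative_path": ".",
    "is_text_file_name_regex": r".*\.txt",
    "is_text_type": "mplain",
    "is_audio_file_relative_path": ".",
    "is_audio_file_name_regex": r".*\.m4a",
    "is_audio_file_detect_head_max": "10.000",
    "is_audio_file_detect_tail_max": "10.000",
    "os_job_file_name": "output_example1",
    "os_job_file_container": "zip",
    "os_job_file_hierarchy_type": "flat",
    "os_job_file_hierarchy_prefix": "audios/",
    "os_task_file_name": "$PREFIX.json",
    "os_task_file_format": "json",
    "os_task_file_smil_page_ref": "$PREFIX.xhtml",
    "os_task_file_smil_audio_ref": "$PREFIX.m4a",
    "os_task_file_levels": "3",
    "job_language": "en",
    "job_description": "Example 1 (flat hierarchy, parsed text files)",
}


def render_config(settings):
    """Serialize settings as one key=value pair per line."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_config(job_dir, **overrides):
    """Write job_dir/config.txt from CONFIG, with optional per-job overrides."""
    settings = {**CONFIG, **overrides}
    path = Path(job_dir, "config.txt")
    path.write_text(render_config(settings), encoding="utf-8")
    return path
```

write_config("folder") recreates the file above; keyword arguments replace individual values for one job.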

7. Output. Each fragment gives the begin and end time (in seconds) of one word:

{
 "fragments": [
  {
   "begin": "1.560",  
   "end": "2.070",  
   "lines": [
    "Thanks"
   ]
  }, 
  {
   "begin": "2.070",  
   "end": "2.360",  
   "lines": [
    "for"
   ]
  }, 
  {
   "begin": "2.360",  
   "end": "2.950",  
   "lines": [
    "taking"
   ]
  }, 
  {
   "begin": "2.950",  
   "end": "3.405",  
   "lines": [
    "care"
   ]
  }, 
  {
   "begin": "3.405",  
   "end": "3.750",  
   "lines": [
    "of"
   ]
  }, 
  {
   "begin": "3.750",  
   "end": "4.140",  
   "lines": [
    "my"
   ]
  }, 
  {
   "begin": "4.140",  
   "end": "4.520",  
   "lines": [
    "dog!"
   ]
  }
 ]
}
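To drive the karaoke-style highlight described at the top, the player has to find which word is active at the current playback time. Because the fragments are sorted by begin time, a binary search does this. A minimal sketch in Python (the H5 page would do the same in JavaScript); KaraokeTrack is a hypothetical name, and it assumes fragments are ordered and non-overlapping, as in the output above.

```python
import bisect
import json


class KaraokeTrack:
    """Word timings from an aeneas JSON sync map, with a time-to-word lookup."""

    def __init__(self, fragments):
        # Assumes fragments are sorted by begin time and do not overlap.
        self.words = [" ".join(f["lines"]) for f in fragments]
        self.begins = [float(f["begin"]) for f in fragments]
        self.ends = [float(f["end"]) for f in fragments]

    @classmethod
    def from_json(cls, text):
        """Build a track from the JSON text written by aeneas."""
        return cls(json.loads(text)["fragments"])

    def index_at(self, t):
        """Index of the word spoken at t seconds, or None between/outside words."""
        # Rightmost word whose begin <= t.
        i = bisect.bisect_right(self.begins, t) - 1
        if i >= 0 and t < self.ends[i]:
            return i
        return None
```

With the output above, index_at(2.5) returns the index of "taking", and any time before 1.560 returns None, so nothing is highlighted.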