## Acoustic feature extraction, storage, and use
- FeaturesExtractor
- FeaturesServer

### 1. Saving features in HDF5 format
<img src="http://www-lium.univ-lemans.fr/sidekit/_images/hdf5_featurefile_format_1.png" height='20%' width='40%' />
Each feature file contains 13 datasets:

| Dataset | Description |
| -- | ----- |
| bnf | the bottleneck features |
| bnf_mean | the mean vector |
| bnf_std | the standard deviation vector |
| cep | cepstral coefficients |
| cep_mean | the mean vector |
| cep_std | the standard deviation vector |
| energy | a vector of log-energy values |
| energy_mean | the mean vector |
| energy_std | the standard deviation vector |
| fb | the filter-bank coefficients |
| fb_mean | the mean vector |
| fb_std | the standard deviation vector |
| vad | a vector of binary values |

An alternative HDF5 layout:
<img src="http://www-lium.univ-lemans.fr/sidekit/_images/hdf5_featurefile_format_2.png" height='20%' width='40%' />
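
The 13 dataset names listed in the table follow a regular pattern: four feature types, each with a mean and a standard deviation vector, plus the VAD labels. The sketch below only rebuilds that list of names in plain Python; the helper `dataset_names` is hypothetical and is not part of SIDEKIT.

```python
# Feature types listed in the table above.
FEATURE_TYPES = ["bnf", "cep", "energy", "fb"]


def dataset_names(feature_types=FEATURE_TYPES):
    """Return each feature type with its _mean and _std companions, plus vad."""
    names = []
    for name in feature_types:
        names.extend([name, name + "_mean", name + "_std"])
    names.append("vad")  # binary speech/non-speech labels
    return names


names = dataset_names()
print(len(names))
print(names)
```

A list like this can be handy for checking that a feature file holds every dataset you expect before loading it.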


### 2. FeaturesExtractor object
A FeaturesExtractor takes an audio file (WAV, SPHERE, raw PCM…) as input   
and returns features in HDF5 format (log-energy, cepstral coefficients, filter-bank coefficients, bottleneck features).


#### 2.1. Feature extraction with a standardized file-name structure

Create the FeaturesExtractor object:

In [None]:
import sidekit

# audio_filename_structure: path pattern of the audio files ({} is replaced by the show name)
extractor = sidekit.FeaturesExtractor(audio_filename_structure="audio/nist_2004/{}.sph",
                                      # path pattern of the feature files
                                      feature_filename_structure="feat/sre04/{}.h5",
                                      # sampling frequency
                                      sampling_frequency=None,
                                      lower_frequency=200,
                                      higher_frequency=3800,
                                      filter_bank="log",
                                      filter_bank_size=24,
                                      window_size=0.025,
                                      shift=0.01,
                                      # number of cepstral coefficients
                                      ceps_number=20,
                                      vad="snr",
                                      snr=40,
                                      pre_emphasis=0.97,
                                      # feature types to store
                                      save_param=["vad", "energy", "cep", "fb"],
                                      keep_all_features=True)
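
To get a feel for `window_size=0.025` and `shift=0.01` (both in seconds), the sketch below converts them to samples and counts the frames in one second of audio. The 8000 Hz rate and the "no padding" framing convention are assumptions made for illustration; SIDEKIT's internal framing may differ at signal boundaries.

```python
def frame_layout(sampling_frequency, window_size=0.025, shift=0.01):
    """Convert window length and shift from seconds to samples."""
    window_samples = int(round(window_size * sampling_frequency))
    shift_samples = int(round(shift * sampling_frequency))
    return window_samples, shift_samples


def frame_count(num_samples, window_samples, shift_samples):
    """Number of full windows that fit in the signal (assumes no padding)."""
    if num_samples < window_samples:
        return 0
    return 1 + (num_samples - window_samples) // shift_samples


# Assumed sampling frequency: 8000 Hz, one second of audio.
window_samples, shift_samples = frame_layout(8000)
n_frames = frame_count(8000, window_samples, shift_samples)
print(window_samples, shift_samples, n_frames)
```

With these settings each frame covers 200 samples and consecutive frames start 80 samples apart.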

In [None]:
# Audio file: audio/nist_2004/taaa.sph; the features are saved to feat/sre04/taaa.h5
extractor.save("taaa")

# Extract the features for the same show and return them as fh
fh = extractor.extract("taaa")
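
The `{}` placeholder in both file-name structures is filled in with the show name. The sketch below reproduces that mapping with plain `str.format`; the helper `resolve_paths` is hypothetical, and the assumption is that SIDEKIT substitutes the show name in the same way.

```python
AUDIO_STRUCTURE = "audio/nist_2004/{}.sph"
FEATURE_STRUCTURE = "feat/sre04/{}.h5"


def resolve_paths(show, audio_structure=AUDIO_STRUCTURE,
                  feature_structure=FEATURE_STRUCTURE):
    """Return the (audio path, feature path) pair for a show name."""
    return audio_structure.format(show), feature_structure.format(show)


audio_path, feature_path = resolve_paths("taaa")
print(audio_path)
print(feature_path)
```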

Process several files in parallel with multiple threads:

In [None]:
# list of audio files (show names)
show_list = ["taaa", "taaf"]
channel_list = [0, 0]

extractor.save_list(show_list=show_list,
                    channel_list=channel_list,
                    # number of threads
                    num_thread=10)

#### 2.2. Feature extraction with non-standardized file names

Create the FeaturesExtractor object:

In [None]:
extractor = sidekit.FeaturesExtractor(audio_filename_structure=None,
                                      feature_filename_structure=None,
                                      sampling_frequency=None,
                                      lower_frequency=200,
                                      higher_frequency=3800,
                                      filter_bank="log",
                                      filter_bank_size=24,
                                      window_size=0.025,
                                      shift=0.01,
                                      ceps_number=20,
                                      vad="snr",
                                      snr=40,
                                      pre_emphasis=0.97,
                                      save_param=["vad", "energy", "cep", "fb"],
                                      keep_all_features=True)

Process the files and save the features to disk:

In [None]:
extractor.save(show="taaa",
               channel=0,
               input_audio_filename="audio/sre04/taaa.sph",
               output_feature_filename="feat/nist/taaa.h5")


extractor.save(show="xllb",
               channel=0,
               input_audio_filename="data/nist2005/xllb.sph",
               output_feature_filename="output/nist/xllb_a.h5")

Extract the features without saving them to disk:

In [None]:
fh = extractor.extract(show="taaa",
                       channel=0,
                       input_audio_filename="audio/sre04/taaa.sph",
                       output_feature_filename="feat/nist/taaa.h5")

Multi-threaded processing:

In [None]:
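show_list = ["taaa", "xllb"]
channel_list = [0, 0]
audio_file_list = ["audio/sre04/taaa.sph", "data/nist2005/xllb.sph"]
feature_file_list = ["feat/nist/taaa.h5", "output/nist/xllb_a.h5"]

# Keyword names follow SIDEKIT's save_list signature; check them against your installed version
extractor.save_list(show_list=show_list,
                    channel_list=channel_list,
                    audio_file_list=audio_file_list,
                    feature_file_list=feature_file_list,
                    num_thread=10)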
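
When file names are not standardized, the show, channel, input, and output lists must line up entry by entry. A small check before launching the batch can catch mismatched lists early. The helper `build_jobs` below is hypothetical and independent of SIDEKIT; it only pairs the entries and validates the lengths.

```python
def build_jobs(shows, channels, input_files, output_files):
    """Pair up the per-file settings, refusing lists of different lengths."""
    lengths = {len(shows), len(channels), len(input_files), len(output_files)}
    if len(lengths) != 1:
        raise ValueError("all lists must have the same number of entries")
    return list(zip(shows, channels, input_files, output_files))


jobs = build_jobs(
    ["taaa", "xllb"],
    [0, 0],
    ["audio/sre04/taaa.sph", "data/nist2005/xllb.sph"],
    ["feat/nist/taaa.h5", "output/nist/xllb_a.h5"],
)
for show, channel, source, target in jobs:
    print(show, channel, source, "->", target)
```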

Output structure:
<img src='http://www-lium.univ-lemans.fr/sidekit/_images/hdf5_featurefile_format_3.png' height='20%' width='40%' />

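### 3. FeaturesServer object

A FeaturesServer loads one or more datasets from one or more HDF5 files and post-processes the features (normalization, adding temporal context, RASTA filtering, feature selection, and so on).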

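#### 3.1. Extracting features from a single file

##### 3.1.1. Extracting features from an HDF5 file
The features have already been extracted with a FeaturesExtractor and saved to a file.  

Create a FeaturesServer instance: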

In [None]:
server = sidekit.FeaturesServer(features_extractor=None,
                                feature_filename_structure="feat/sre04/{}.h5",
                                sources=None,
                                # available datasets: "cep", "fb", "vad", "energy", "bnf"
                                dataset_list=["energy", "cep", "vad"],
                                mask="[0-12]",
                                # normalization type: "cmvn", "cms", "stg"
                                feat_norm="cmvn",
                                global_cmvn=None,
                                dct_pca=False,
                                dct_pca_config=None,
                                sdc=False,
                                sdc_config=None,
                                delta=True,
                                double_delta=True,
                                delta_filter=None,
                                context=None,
                                traps_dct_nb=None,
                                rasta=True,
                                keep_all_features=True)
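
The `mask` string selects which coefficients to keep: `"[0-12]"` keeps indices 0 through 12, that is, 13 coefficients. The sketch below expands such a string into a list of indices. The helper `expand_mask` is hypothetical (it is not SIDEKIT's own parser), and the comma-separated form it also accepts is an assumption.

```python
def expand_mask(mask):
    """Expand a mask such as "[0-12]" into the list of selected indices."""
    indices = []
    for part in mask.strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range "first-last" is inclusive on both ends.
            first, last = part.split("-")
            indices.extend(range(int(first), int(last) + 1))
        else:
            indices.append(int(part))
    return indices


selected = expand_mask("[0-12]")
print(len(selected), selected)
```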

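Post-processing can include the following steps, applied in this order:  
- RASTA filtering  
- adding temporal context: first and second derivatives, DCT-PCA, or Shifted Delta Cepstra  
- normalizing the features with cepstral mean and variance normalization (cmvn), cepstral mean subtraction (cms), or short-term Gaussianization (stg)  
- frame selection: if "vad" is in dataset_list, frames are selected according to the loaded VAD labels; if "vad" is not in dataset_list, all frames are kept  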
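
The last two steps can be illustrated in plain Python. This is a minimal sketch, not SIDEKIT's implementation: it normalizes each feature dimension to zero mean and unit variance (CMVN), then keeps only the frames flagged as speech. It assumes the statistics are computed over all frames; SIDEKIT may compute them differently, for example over speech frames only. RASTA filtering and the temporal context step are left out.

```python
from statistics import mean, pstdev


def cmvn(frames):
    """Normalize every feature dimension to zero mean and unit variance."""
    columns = list(zip(*frames))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant dimensions to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(value - m) / s for value, m, s in zip(frame, means, stds)]
            for frame in frames]


def select_frames(frames, vad_labels):
    """Keep only the frames whose VAD label is true."""
    return [frame for frame, keep in zip(frames, vad_labels) if keep]


# Toy data: 4 frames of 2-dimensional features and their VAD labels.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
vad = [True, False, True, True]

normalized = cmvn(frames)
selected = select_frames(normalized, vad)
print(len(selected))
```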

Use the FeaturesServer by calling its load method:

In [None]:
# Signature: load(self, show, channel=0, input_feature_filename=None, label=None, start=None, stop=None)
features, label = server.load("taaa", channel=0)

##### 3.1.2. Extracting features from an audio file
Use a FeaturesServer that contains a FeaturesExtractor to compute the acoustic parameters directly from the audio file:

In [None]:
server = sidekit.FeaturesServer(features_extractor=extractor,
                                feature_filename_structure=None,
                                sources=None,
                                dataset_list=["energy", "cep", "vad"],
                                mask="[0-12]",
                                feat_norm="cmvn",
                                global_cmvn=None,
                                dct_pca=False,
                                dct_pca_config=None,
                                sdc=False,
                                sdc_config=None,
                                delta=True,
                                double_delta=True,
                                delta_filter=None,
                                context=None,
                                traps_dct_nb=None,
                                rasta=True,
                                keep_all_features=True)

# Using the FeaturesServer (show and featureFileName must be defined by the caller)
features, label = server.load(show, channel=0, input_feature_filename=featureFileName, label=None, start=None, stop=None)

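#### 3.2. Extracting features from multiple files

##### 3.2.1. Extracting features from multiple HDF5 files
We load the energy from a first set of files and several cepstral coefficients from a second set, then combine them. The VAD labels are also taken from the second set. To do this, we create two FeaturesServers (one per set of files), as follows: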

In [None]:
fs_1 = sidekit.FeaturesServer(feature_filename_structure="{}.h5",
                              dataset_list=["energy"],
                              context=None)

fs_2 = sidekit.FeaturesServer(feature_filename_structure="{}_2.h5",
                              dataset_list=["cep", "vad"],
                              mask="[0-12]",
                              delta=True,
                              double_delta=True,
                              rasta=True)

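The last step is to create a third FeaturesServer that calls fs_1 and fs_2 and combines the two types of features before post-processing is applied to the complete feature vectors: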

In [None]:
fs = sidekit.FeaturesServer(sources=((fs_1, False), (fs_2, True)),
                            feat_norm="cmvn",
                            keep_all_features=False)

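The energy from the first set is concatenated with the cepstral coefficients from the second set, together with their first and second derivatives. Finally, CMVN is applied to the whole feature vector, and only the frames selected by the VAD labels from the second set are kept. All of this is done by calling: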

In [None]:
feat, label = fs.load("taaa")

The resulting features are 40-dimensional frames (13 cepstral coefficients + 13 deltas + 13 delta-deltas, plus the log-energy).
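
The combination can be pictured as a frame-by-frame concatenation in which the boolean attached to each source says whether its VAD labels are used. The sketch below is a plain-Python illustration of that idea, not SIDEKIT's code: `combine_sources` is a hypothetical helper and the zero-valued frames are placeholder data.

```python
def combine_sources(sources):
    """Concatenate frames from several sources and pick the flagged labels.

    Each source is a (frames, labels, use_labels) triple.
    """
    n_frames = len(sources[0][0])
    combined = [[] for _ in range(n_frames)]
    labels = None
    for frames, source_labels, use_labels in sources:
        if len(frames) != n_frames:
            raise ValueError("all sources must have the same number of frames")
        for row, frame in zip(combined, frames):
            row.extend(frame)
        if use_labels:
            labels = source_labels
    return combined, labels


n_frames = 5
energy = [[0.0] for _ in range(n_frames)]          # 1 log-energy value per frame
cep = [[0.0] * (13 * 3) for _ in range(n_frames)]  # 13 cep + 13 delta + 13 delta-delta
vad = [True, True, False, True, False]

feat, label = combine_sources([(energy, None, False), (cep, vad, True)])
print(len(feat[0]))
```

The frame dimension is 1 + 3 × 13 = 40, which matches the figure quoted above.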

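##### 3.2.2. Extracting features from one HDF5 file and one audio file
We first create a FeaturesExtractor that processes the audio files, and an associated FeaturesServer that manages this FeaturesExtractor: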

In [None]:
extractor = sidekit.FeaturesExtractor(audio_filename_structure="{}.wav",
                                      sampling_frequency=8000,
                                      lower_frequency=0,
                                      higher_frequency=4000,
                                      filter_bank="log",
                                      filter_bank_size=40,
                                      window_size=0.025,
                                      shift=0.01,
                                      ceps_number=20,
                                      vad="snr",
                                      snr=40,
                                      pre_emphasis=0.97,
                                      save_param=["energy"],
                                      keep_all_features=True)

# VAD (vad="snr", snr=40) is configured on the extractor above
fs_1 = sidekit.FeaturesServer(features_extractor=extractor,
                              feature_filename_structure=None,
                              sources=None,
                              dataset_list=["energy"],
                              keep_all_features=True)

We then create a second FeaturesServer that loads the cepstral coefficients from a second set of feature files and performs some post-processing:

In [None]:
fs_2 = sidekit.FeaturesServer(feature_filename_structure="{}_2.h5",
                              dataset_list=["cep", "vad"],
                              mask="[0-12]",
                              delta=True,
                              double_delta=True,
                              rasta=True)

We now combine the two FeaturesServers into a third one and apply CMVN:

In [None]:
fs = sidekit.FeaturesServer(sources=((fs_1, False), (fs_2, True)),
                                feat_norm="cmvn",
                                keep_all_features=False)

The resulting features are obtained with:

In [None]:
feat, label = fs.load("taab")