## Using Dataset Wrappers

### 1. Module Contents

The `pyroomacoustics.datasets` subpackage provides wrappers around a few popular audio datasets to make them easier to use.

The subpackage is built around two base classes that wrap audio samples together with their metadata:

- [`pyroomacoustics.datasets.base.Dataset`](https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.datasets.base.html#pyroomacoustics.datasets.base.Dataset "pyroomacoustics.datasets.base.Dataset")
- [`pyroomacoustics.datasets.base.Sample`](https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.datasets.base.html#pyroomacoustics.datasets.base.Sample "pyroomacoustics.datasets.base.Sample")

**The general idea** is to create a sample object with an attribute that holds all of its metadata. A dataset object containing a collection of such samples can then be created and filtered according to the metadata values.
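To make the Sample/Dataset idea concrete, here is a minimal pure-Python sketch. `ToySample` and `ToyDataset` are hypothetical classes written only for illustration; they are not the pyroomacoustics classes and merely mimic the idea of samples that carry a metadata dictionary and a collection that can be filtered on it. The metadata keys (`speaker`, `text`) and the toy audio values are made up, and the filter uses plain equality only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToySample:
    """Hypothetical stand-in for a Sample: audio data plus a metadata dict."""
    data: List[float]
    meta: Dict[str, Any] = field(default_factory=dict)


class ToyDataset:
    """Hypothetical stand-in for a Dataset: a filterable collection of samples."""

    def __init__(self, samples=None):
        self.samples = list(samples) if samples is not None else []

    def add_sample(self, sample):
        self.samples.append(sample)

    def filter(self, **kwargs):
        # Keep the samples whose metadata equals every requested key/value pair.
        # Plain equality only; the matching rules in the source also allow
        # lists and callables as values.
        kept = [
            s for s in self.samples
            if all(k in s.meta and s.meta[k] == v for k, v in kwargs.items())
        ]
        return ToyDataset(kept)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        return self.samples[i]


# Build a tiny made-up corpus and filter it on one metadata field.
corpus = ToyDataset()
corpus.add_sample(ToySample([0.0, 0.1], {"speaker": "bdl", "text": "what is it"}))
corpus.add_sample(ToySample([0.2, 0.3], {"speaker": "slt", "text": "hello there"}))
corpus.add_sample(ToySample([0.4, 0.5], {"speaker": "bdl", "text": "good morning"}))

bdl_only = corpus.filter(speaker="bdl")
print(len(bdl_only))  # 2
```

Returning a new dataset from `filter` (rather than modifying the original) keeps the full corpus available for further queries.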

Many of the methods with `match` or `filter` in their name accept an arbitrary number of keyword arguments. Each key should correspond to a metadata field of the samples. A `key/value` pair matches the `attribute` with the same key in one of three ways:

1. `value == attribute`
2. `value` is a list and `attribute in value == True`
3. `value` is a callable (a function) and `value(attribute) == True`
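The three rules can be sketched as a small matching function. `value_matches` and `meta_match` are hypothetical helpers written for illustration, not part of pyroomacoustics. In this sketch, several keyword arguments are combined with a logical AND, and a key missing from the metadata counts as a non-match; both are assumptions. The `duration` field in the example metadata is invented.

```python
from typing import Any


def value_matches(attribute: Any, value: Any) -> bool:
    """Check one metadata attribute against one keyword value (three rules)."""
    if isinstance(value, list):
        # Rule 2: value is a list -> match if the attribute is one of its items
        return attribute in value
    if callable(value):
        # Rule 3: value is a function -> match if it returns a truthy result
        return bool(value(attribute))
    # Rule 1: plain equality
    return value == attribute


def meta_match(meta: dict, **kwargs: Any) -> bool:
    """True only if every keyword argument matches the entry with the same key.

    A key that is missing from `meta` counts as a non-match (assumption).
    """
    return all(
        key in meta and value_matches(meta[key], value)
        for key, value in kwargs.items()
    )


# Hypothetical metadata for one sample; "duration" is an invented field.
meta = {"speaker": "bdl", "text": "what is it", "duration": 1.7}

print(meta_match(meta, speaker="bdl"))                            # True  (rule 1)
print(meta_match(meta, speaker=["slt", "bdl"]))                   # True  (rule 2)
print(meta_match(meta, text=lambda t: "what" in t))               # True  (rule 3)
print(meta_match(meta, speaker="bdl", duration=lambda d: d > 2))  # False
```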

```python
# This example involves the CMU ARCTIC corpus available at
# http://www.festvox.org/cmu_arctic/

import matplotlib.pyplot as plt
import pyroomacoustics as pra

# Here, the corpus for speaker bdl is automatically downloaded
# if it is not available already
corpus = pra.datasets.CMUArcticCorpus(download=True, speaker=['bdl'])

# print dataset info and 10 sentences
print(corpus)
corpus.head(n=10)

# let's extract all samples containing the word 'what'
keyword = 'what'
matches = corpus.filter(text=lambda t: keyword in t)
print('The number of sentences containing "{}": {}'.format(keyword, len(matches)))
for s in matches.sentences:
    print('  *', s)

# if the sounddevice package is available, we can play the sample
matches[0].play()

# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()
```
