Merge pull request #22 from pplantinga/feat/readme-remove-old

pplantinga committed Apr 30, 2020
2 parents 8c50cc7 + 8ae72d7 commit 807eefd

156 changes: 40 additions & 116 deletions speechbrain/processing/README.md

The [`speechbrain/processing/speech_augmentation.py`](speechbrain/processing/speech_augmentation.py) file defines the set of augmentations for increasing the robustness of machine learning models, and for creating datasets for speech enhancement and other environment-related tasks. The current list of enhancements is below, with a link for each to an example of a config file with all options specified:

* Adding noise
* Adding reverberation
* Adding babble
* Speed perturbation
* Dropping a frequency
* Dropping chunks
* Clipping

To use these augmentations, a function is defined and used in the same way as the feature generation functions. More details about some important augmentations follow:

```
noise4, 17.65875, $noise_folder/noise4.wav, wav,
noise5, 13.685625, $noise_folder/noise5.wav, wav,
```
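As an illustration of how such a manifest could be consumed, the sketch below parses the rows (assuming the columns are ID, duration, path, and format) and substitutes the `$noise_folder` variable. The `parse_manifest` helper is hypothetical, not SpeechBrain's actual loader:

```python
import csv
import io

# Two rows in the manifest format shown above.
manifest = """noise4, 17.65875, $noise_folder/noise4.wav, wav,
noise5, 13.685625, $noise_folder/noise5.wav, wav,"""

def parse_manifest(text, noise_folder):
    """Parse id/duration/path/format rows, expanding $noise_folder."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        snt_id, duration, path, fmt = [field.strip() for field in rec[:4]]
        path = path.replace("$noise_folder", noise_folder)
        rows.append(
            {"id": snt_id, "duration": float(duration), "path": path, "format": fmt}
        )
    return rows

rows = parse_manifest(manifest, "samples/noise_samples")
print(rows[0]["path"])  # samples/noise_samples/noise4.wav
```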

The `add_noise` function can be defined in YAML:

```
# params.yaml
noise_folder: samples/noise_samples
add_noise: !speechbrain.processing.speech_augmentation.AddNoise
    csv_file: !ref <noise_folder>/noise_rel.csv
    replacements:
        $noise_folder: !ref <noise_folder>
```
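For illustration, the `!ref <name>` substitution used in such a parameter file can be mimicked in a few lines of Python. This toy resolver is an assumption about the behavior (the names and regex are mine), not SpeechBrain's `load_extended_yaml`:

```python
import re

REF = re.compile(r"!ref <(\w+)>")

def resolve_refs(params):
    """Expand `!ref <name>` placeholders using other keys in the dict."""
    def expand(value):
        while isinstance(value, str) and REF.search(value):
            value = REF.sub(lambda m: str(params[m.group(1)]), value)
        return value
    return {key: expand(value) for key, value in params.items()}

params = {
    "noise_folder": "samples/noise_samples",
    "csv_file": "!ref <noise_folder>/noise_rel.csv",
}
print(resolve_refs(params)["csv_file"])  # samples/noise_samples/noise_rel.csv
```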

The `.csv` file is passed to this function through the `csv_file` parameter. This file will be processed in the same way that speech is processed, with ordering, batching, and caching options. When loaded, this function can simply be used to add noise:

```
params = load_extended_yaml(open("params.yaml"))
noisy_wav = params.add_noise(wav)
```

Adding noise has additional options that are not available to adding reverberation. The `snr_low` and `snr_high` parameters define a range of SNRs from which this function will randomly choose an SNR for mixing each sample. If the `pad_noise` parameter is `True`, any noise samples that are shorter than their respective speech clips will be replicated until the whole speech signal is covered.

## Adding Babble

Babble can be automatically generated by rotating samples in a batch and adding the samples at a high SNR. We provide this functionality, with similar SNR options to adding noise.

The `speaker_count` option determines the number of speakers that are added to the mixture before the SNR is determined. Once the babble mixture has been computed, a random SNR between `snr_low` and `snr_high` is chosen, and the mixture is added at the appropriate level to the original speech. The batch size must be larger than `speaker_count`.
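Pulling the noise and babble descriptions together, the mixing logic can be sketched in plain numpy. This is an illustration of the assumed behavior (SNR-scaled mixing, `pad_noise`-style tiling, and babble built by rotating the batch), not SpeechBrain's implementation; `add_noise` and `add_babble` here are toy stand-ins:

```python
import numpy as np

def add_noise(speech, noise, snr_db, pad_noise=True):
    """Mix noise into speech at the requested SNR (in dB)."""
    if pad_noise and len(noise) < len(speech):
        # Replicate short noise until the whole speech signal is covered.
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def add_babble(batch, speaker_count, snr_db):
    """Sum `speaker_count` rotated copies of the batch, then mix at snr_db."""
    babble = sum(np.roll(batch, shift=k, axis=0) for k in range(1, speaker_count + 1))
    return np.stack(
        [add_noise(wav, bab, snr_db, pad_noise=False) for wav, bab in zip(batch, babble)]
    )

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(4000)  # shorter than the speech: gets tiled
noisy = add_noise(speech, noise, snr_db=0.0)
```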

## Speed perturbation

Speed perturbation is a data augmentation strategy popularized by Kaldi. We provide it here with defaults similar to Kaldi's implementation. Our implementation is based on the included `resample` function, which comes from torchaudio. Our investigations showed that this implementation is efficient, since it is based on a polyphase filter that computes no more than the necessary information and uses `conv1d` for fast convolutions. The parameters can be defined in YAML:

```
# params.yaml
speed_perturb: !speechbrain.processing.speech_augmentation.SpeedPerturb
    speeds: [9, 10, 11]
```

The `speeds` parameter takes a list of integers, which are divided by 10 to determine a fraction of the original speed. Of course the `resample` method can be used for arbitrary changes in speed, but simple ratios are more efficient. Passing 9, 10, and 11 for the `speeds` parameter (the default) mimics Kaldi's functionality.
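The speed-ratio arithmetic can be sketched as follows; simple linear interpolation with `np.interp` stands in for the polyphase `resample`, and this `speed_perturb` is a toy version, not the SpeechBrain one:

```python
import numpy as np

def speed_perturb(wav, speed):
    """Resample `wav` to play at speed/10 times the original rate."""
    ratio = speed / 10.0
    new_len = int(round(len(wav) / ratio))
    old_t = np.arange(len(wav))
    new_t = np.linspace(0, len(wav) - 1, new_len)
    # Linear interpolation as a cheap stand-in for polyphase resampling.
    return np.interp(new_t, old_t, wav)

wav = np.sin(2 * np.pi * 440 / 16000 * np.arange(16000))  # 1 s of 440 Hz at 16 kHz
fast = speed_perturb(wav, 11)  # 1.1x speed -> shorter signal
slow = speed_perturb(wav, 9)   # 0.9x speed -> longer signal
```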
The remaining augmentations (dropping a frequency, dropping chunks, and clipping) are straightforward. They augment the data by removing portions of it, so that a learning model does not rely too heavily on any one type of data. In addition, dropping frequencies and dropping chunks can be combined with speed perturbation to create an augmentation scheme very similar to SpecAugment. An example would be a parameter file like the following:

```
# params.yaml
speed_perturb: !speechbrain.processing.speech_augmentation.SpeedPerturb
drop_freq: !speechbrain.processing.speech_augmentation.DropFreq
drop_chunk: !speechbrain.processing.speech_augmentation.DropChunk
compute_stft: !speechbrain.processing.features.STFT
compute_spectrogram: !speechbrain.processing.features.spectrogram
```

```
# experiment.py
params = load_extended_yaml(open("params.yaml"))

def spec_augment(wav):
    feat = params.speed_perturb(wav)
    feat = params.drop_freq(feat)
    feat = params.drop_chunk(feat)
    feat = params.compute_stft(feat)
    return params.compute_spectrogram(feat)
```
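As a rough sketch of what the two dropout-style augmentations remove from a waveform, the toy versions below (assumptions, not SpeechBrain's code) silence random time chunks and zero a narrow random frequency band:

```python
import numpy as np

def drop_chunk(wav, rng, n_chunks=2, max_len=1000):
    """Zero out `n_chunks` random spans of up to `max_len` samples."""
    wav = wav.copy()
    for _ in range(n_chunks):
        length = int(rng.integers(1, max_len))
        start = int(rng.integers(0, len(wav) - length))
        wav[start:start + length] = 0.0
    return wav

def drop_freq(wav, rng, band=0.02):
    """Zero a narrow random band in the spectrum, then invert the FFT."""
    spec = np.fft.rfft(wav)
    center = int(rng.integers(0, len(spec)))
    width = max(1, int(band * len(spec)))
    spec[max(0, center - width):center + width] = 0.0
    return np.fft.irfft(spec, n=len(wav))

rng = np.random.default_rng(0)
wav = rng.standard_normal(16000)
augmented = drop_freq(drop_chunk(wav, rng), rng)
```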
