Adding general data augmentation methods for speech preprocessing #5370

Merged
merged 25 commits into from
Aug 9, 2023
Changes from 16 commits
e8fc9ae
Merge branch 'master' of github.com:espnet/espnet into tse
Emrys365 Jul 23, 2023
afd9cd8
Merge branch 'master' of github.com:espnet/espnet into tse
Emrys365 Jul 24, 2023
d38b5ff
Add general data augmentation methods for preprocessing
Emrys365 Jul 24, 2023
3f00c0d
Add config files for applying data augmentation
Emrys365 Jul 24, 2023
c5176c4
Add config files for applying data augmentation
Emrys365 Jul 24, 2023
54f43bf
Merge branch 'master' into tse
Emrys365 Jul 24, 2023
85b0741
Revert --num_workers 0
Emrys365 Jul 24, 2023
16f7887
Fix a typo
Emrys365 Jul 24, 2023
ca1be58
Update tasks
Emrys365 Jul 24, 2023
0963d3f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 24, 2023
416cf6b
Update preprocessor
Emrys365 Jul 25, 2023
b93ef6b
Merge branch 'tse' of github.com:Emrys365/espnet into tse
Emrys365 Jul 25, 2023
525cb4f
Fix a bug in data augmentation
Emrys365 Jul 25, 2023
2b1aec3
Fix a bug in Enh data augmentation
Emrys365 Jul 25, 2023
866ce1e
Minor fix
Emrys365 Jul 25, 2023
70893b4
Fix egs2/mini_an4/enh1/conf/train_with_data_aug_debug.yaml
Emrys365 Jul 25, 2023
febb12a
Reflect comments
Emrys365 Jul 26, 2023
f0f366e
Update docstrings
Emrys365 Jul 26, 2023
ddf9097
Update ci/test_import_all.py to catch error information
Emrys365 Jul 27, 2023
e34868c
Update ci workflow to handle segmental fault
Emrys365 Jul 27, 2023
6e70c12
Merge master updates
Emrys365 Jul 27, 2023
56cd77f
Update arguments in preprocessors
Emrys365 Jul 27, 2023
5739182
Merge branch 'master' into tse
ftshijt Aug 1, 2023
3bc80ac
Update setup-python action in the job "test_import"
Emrys365 Aug 8, 2023
3a82677
Resolve conflicts
Emrys365 Aug 8, 2023
12 changes: 10 additions & 2 deletions ci/test_integration_espnet2.sh
@@ -50,6 +50,12 @@ echo "==== feats_type=raw, token_types=bpe, model_conf.extract_feats_in_collect_
     --feats_normalize "utterance_mvn" --python "${python}" \
     --asr-args "--model_conf extract_feats_in_collect_stats=false --num_workers 0"

+echo "==== feats_type=raw, token_types=bpe, model_conf.extract_feats_in_collect_stats=False, normalize=utt_mvn, with data augmentation ==="
+./run.sh --ngpu 0 --stage 10 --stop-stage 13 --skip-upload false --feats-type "raw" --token-type "bpe" \
+    --asr_config "conf/train_asr_rnn_data_aug_debug.yaml" \
+    --feats_normalize "utterance_mvn" --python "${python}" \
+    --asr-args "--model_conf extract_feats_in_collect_stats=false --num_workers 0"
+
 echo "==== use_streaming, feats_type=raw, token_types=bpe, model_conf.extract_feats_in_collect_stats=False, normalize=utt_mvn ==="
 ./run.sh --use_streaming true --ngpu 0 --stage 10 --stop-stage 13 --skip-upload false --feats-type "raw" --token-type "bpe" \
     --feats_normalize "utterance_mvn" --python "${python}" \
@@ -171,8 +177,10 @@ if python -c 'import torch as t; from packaging.version import parse as L; asser
 echo "==== feats_type=${t} with preprocessor ==="
 ./run.sh --ngpu 0 --stage 2 --stop-stage 10 --skip-upload false --feats-type "${t}" --ref-num 1 --python "${python}" \
     --extra_wav_list "rirs.scp noises.scp" --enh_config ./conf/train_with_preprocessor_debug.yaml --enh-args "--num_workers 0"
-./run.sh --ngpu 0 --stage 2 --stop-stage 10 --skip-upload false --feats-type "${t}" --ref-num 1 --python "${python}" \
-    --enh_config conf/train_with_dynamic_mixing_debug.yaml --ref-num 2 --enh-args "--num_workers 0"
+./run.sh --ngpu 0 --stage 5 --stop-stage 10 --skip-upload false --feats-type "${t}" --ref-num 1 --python "${python}" \
+    --enh_config conf/train_with_data_aug_debug.yaml --enh-args "--num_workers 0"
+./run.sh --ngpu 0 --stage 2 --stop-stage 10 --skip-upload false --feats-type "${t}" --ref-num 2 --python "${python}" \
+    --enh_config conf/train_with_dynamic_mixing_debug.yaml --enh-args "--num_workers 0"
 done
 rm data/**/utt2category 2>/dev/null || true
 rm -r dump
45 changes: 45 additions & 0 deletions egs2/mini_an4/asr1/conf/train_asr_rnn_data_aug_debug.yaml
@@ -0,0 +1,45 @@
+# This is a debug config for CI
+encoder: vgg_rnn
+encoder_conf:
+    num_layers: 1
+    hidden_size: 2
+    output_size: 2
+
+decoder: rnn
+decoder_conf:
+    hidden_size: 2
+
+scheduler: reducelronplateau
+scheduler_conf:
+    mode: min
+    factor: 0.5
+    patience: 1
+
+use_preprocessor: true
+preprocessor: default
+preprocessor_conf:
+    fs: 16000
+    data_aug_effects:   # no need to set the "sample_rate" argument for each effect here
+        - [0.1, "contrast", {"enhancement_amount": 75.0}]
+        - [0.1, "highpass", {"cutoff_freq": 5000, "Q": 0.707}]
+        - [0.1, "equalization", {"center_freq": 1000, "gain": 0, "Q": 0.707}]
+        - - 0.1
+          - - [0.3, "speed_perturb", {"factor": 0.9}]
+            - [0.3, "speed_perturb", {"factor": 1.1}]
+            - [0.3, "speed_perturb", {"factor": 1.3}]
+    data_aug_num: [1, 4]
+    data_aug_prob: 1.0
+
+
+val_scheduler_criterion:
+    - valid
+    - loss
+best_model_criterion:
+    - - valid
+      - acc
+      - max
+keep_nbest_models: 1
+max_epoch: 1
+num_iters_per_epoch: 1
+batch_type: folded
+batch_size: 2
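The `data_aug_effects` list in this config mixes flat entries (`[weight, name, options]`) with one nested entry (`[weight, [sub-effects]]`). The sketch below shows one plausible reading of that structure. It assumes that the leading number is a sampling weight, that a nested entry picks exactly one of its sub-effects, that `data_aug_num` is an inclusive range for how many top-level entries are drawn without replacement, and that `data_aug_prob` is the chance that augmentation runs at all. `sample_effects` is a hypothetical helper written for illustration, not ESPnet's implementation, and it only selects effect names rather than processing audio.

```python
import random

# Mirrors data_aug_effects from the config above.
EFFECTS = [
    [0.1, "contrast", {"enhancement_amount": 75.0}],
    [0.1, "highpass", {"cutoff_freq": 5000, "Q": 0.707}],
    [0.1, "equalization", {"center_freq": 1000, "gain": 0, "Q": 0.707}],
    [0.1, [
        [0.3, "speed_perturb", {"factor": 0.9}],
        [0.3, "speed_perturb", {"factor": 1.1}],
        [0.3, "speed_perturb", {"factor": 1.3}],
    ]],
]


def sample_effects(effects, num_range, apply_prob, rng):
    """Pick (name, options) pairs under the assumed semantics (hypothetical helper)."""
    # Assumed meaning of data_aug_prob: skip augmentation entirely with prob. 1 - apply_prob.
    if rng.random() >= apply_prob:
        return []
    # Assumed meaning of data_aug_num: inclusive range for the number of effects.
    count = min(rng.randint(num_range[0], num_range[1]), len(effects))
    pool = list(effects)
    chosen = []
    for _ in range(count):
        # Weighted draw without replacement over the remaining top-level entries.
        idx = rng.choices(range(len(pool)), weights=[e[0] for e in pool], k=1)[0]
        entry = pool.pop(idx)
        if isinstance(entry[1], str):
            chosen.append((entry[1], entry[2]))
        else:
            # Nested entry: draw exactly one sub-effect by its inner weight.
            sub = rng.choices(entry[1], weights=[s[0] for s in entry[1]], k=1)[0]
            chosen.append((sub[1], sub[2]))
    return chosen


rng = random.Random(0)
for name, options in sample_effects(EFFECTS, (1, 4), 1.0, rng):
    print(name, options)
```

Under this reading, the three `speed_perturb` variants can never be stacked on the same utterance, which is presumably why they are grouped into a single nested entry.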
57 changes: 57 additions & 0 deletions egs2/mini_an4/enh1/conf/train_with_data_aug_debug.yaml
@@ -0,0 +1,57 @@
+# This is a debug config for CI
+encoder: stft
+encoder_conf:
+    n_fft: 512
+    hop_length: 128
+
+decoder: stft
+decoder_conf:
+    n_fft: 512
+    hop_length: 128
+
+separator: rnn
+separator_conf:
+    rnn_type: blstm
+    num_spk: 1
+    nonlinear: relu
+    layer: 1
+    unit: 2
+    dropout: 0.2
+
+preprocessor: enh
+preprocessor_conf:
+    speech_volume_normalize: "0.5_1.0"
+    rir_scp: dump/raw/train_nodev/rirs.scp
+    rir_apply_prob: 1.0
+    noise_scp: dump/raw/train_nodev/noises.scp
+    noise_apply_prob: 1.0
+    noise_db_range: "5_20"
+    sample_rate: 16000
+    force_single_channel: true
+    categories:
+        - 1ch_16k
+        - 2ch_16k
+    data_aug_effects:   # no need to set the "sample_rate" argument for each effect here
+        - [0.1, "contrast", {"enhancement_amount": 75.0}]
+        - [0.1, "highpass", {"cutoff_freq": 5000, "Q": 0.707}]
+        - - 0.1
+          - - [0.3, "clipping", {"min_quantile": 0.05, "max_quantile": 0.95}]
+            - [0.3, "corrupt_phase", {"scale": 0.1, "n_fft": 0.032, "hop_length": 0.008}]
+    data_aug_num: [1, 3]
+    data_aug_prob: 1.0
+
+criterions:
+    # The first criterion
+    - name: mse
+      conf:
+          compute_on_mask: false
+      # the wrapper for the current criterion
+      # for single-talker case, we simply use fixed_order wrapper
+      wrapper: fixed_order
+      wrapper_conf:
+          weight: 1.0
+
+max_epoch: 1
+num_iters_per_epoch: 1
+batch_type: folded
+batch_size: 2
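The `corrupt_phase` entry in this config gives `n_fft` and `hop_length` as fractions rather than sample counts. Assuming these are durations in seconds that are scaled by `sample_rate` (an assumption; the diff shown here does not confirm it), they line up with the STFT encoder settings in the same file. `seconds_to_samples` is a hypothetical helper used only to show the arithmetic:

```python
def seconds_to_samples(duration_s, sample_rate):
    """Convert a duration in seconds to a whole number of samples."""
    # round() guards against floating-point error in the product.
    return round(duration_s * sample_rate)


sample_rate = 16000  # preprocessor_conf.sample_rate in the config above
n_fft = seconds_to_samples(0.032, sample_rate)
hop_length = seconds_to_samples(0.008, sample_rate)
print(n_fft, hop_length)  # 512 128, matching encoder_conf n_fft and hop_length
```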
2 changes: 1 addition & 1 deletion egs2/musdb18/enh1/conf/tuning/train_enh_conv_tasnet.yaml
@@ -36,7 +36,7 @@ decoder_conf:
     stride: 10
 separator: tcn
 separator_conf:
-    num_spk: 2
+    num_spk: 4
     layer: 8
     stack: 4
     bottleneck_dim: 256
5 changes: 4 additions & 1 deletion egs2/musdb18/enh1/run.sh
@@ -10,6 +10,9 @@ num_dev=5000
 num_eval=3000
 sample_rate=16k

+# 0, 1, 2, 3 represent drums, bass, vocals, and others, respectively.
+ref_num=4
+

 train_set="train_${sample_rate}"
 valid_set="dev_${sample_rate}"
@@ -21,7 +24,7 @@ test_sets="test_${sample_rate} "
     --test_sets "${test_sets}" \
     --fs "${sample_rate}" \
     --audio_format wav \
-    --ref_num 4 \
+    --ref_num ${ref_num} \
     --lang en \
     --ngpu 1 \
     --local_data_opts "--sample_rate ${sample_rate} --num_train ${num_train} --num_dev ${num_dev} --num_eval ${num_eval}" \
@@ -71,7 +71,7 @@ unzip ${wdir}/spatialize_wsj0-mix.zip -d ${dir}
 sed -i -e "s#data_in_root = './wsj0-mix/';#data_in_root = '${wsj0_2mix_wav}';#" \
     -e "s#rir_root = './wsj0-mix/';#rir_root = '${wsj0_2mix_spatialized_wav}';#" \
     -e "s#data_out_root = './wsj0-mix/';#data_out_root = '${wsj0_2mix_spatialized_wav}';#" \
-    -e "s#RIR-Generator-master/#RIR-Generator/" \
+    -e "s#RIR-Generator-master/#RIR-Generator/#" \
     ${dir}/spatialize_wsj0_mix.m

 sed -i -e "s#MIN_OR_MAX=\"'min'\"#MIN_OR_MAX=\"'${min_or_max}'\"#" \
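The change to the `RIR-Generator-master/` expression in the last hunk adds a closing `#`. A sed `s` command has the form `s<delim>pattern<delim>replacement<delim>[flags]`, so the original expression, which contains only two delimiters, is unterminated. The sketch below makes the difference concrete. `split_sed_substitution` is a hypothetical helper written for illustration only; it is not part of sed or ESPnet, and it ignores escaped delimiters.

```python
def split_sed_substitution(expr):
    """Split 's<d>pattern<d>replacement<d>flags' into its three parts.

    Simplified sketch: escaped delimiters inside the pattern are not handled.
    """
    if len(expr) < 2 or expr[0] != "s":
        raise ValueError("not a substitution command")
    delim = expr[1]  # sed accepts any single character as the delimiter; here '#'
    parts = expr[2:].split(delim)
    if len(parts) != 3:
        raise ValueError("unterminated or malformed substitution: " + expr)
    pattern, replacement, flags = parts
    return pattern, replacement, flags


before = "s#RIR-Generator-master/#RIR-Generator/"   # expression before the fix
after = "s#RIR-Generator-master/#RIR-Generator/#"   # expression after the fix

try:
    split_sed_substitution(before)
    before_ok = True
except ValueError:
    before_ok = False

print(before_ok)
print(split_sed_substitution(after))
```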