Adding general data augmentation methods for speech preprocessing #5370

Emrys365 · 2023-07-24T18:03:25Z

What?

This PR adds a series of data augmentation techniques for preprocessing speech data across various tasks.

The supported data augmentation techniques include:

effects_dict = {
    "lowpass": lowpass_filtering,
    "highpass": highpass_filtering,
    "bandpass": bandpass_filtering,
    "bandreject": bandreject_filtering,
    "contrast": contrast,
    "equalization": equalization_filtering,
    "pitch_shift": pitch_shift,
    "speed_perturb": speed_perturb,
    "time_stretch": time_stretch,
    "preemphasis": preemphasis,
    "deemphasis": deemphasis,
    "clipping": clipping,
    "polarity_inverse": polarity_inverse,
    "reverse": reverse,
    "corrupt_phase": corrupt_phase,
}
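The dictionary maps each effect name used in the config to a function, and every effect receives the waveform and sample rate plus its own keyword arguments. The NumPy sketch below illustrates that dispatch pattern with a few of the simpler effects. The function bodies, their default arguments, and the `apply_effect` helper are hypothetical simplifications written for illustration. They are not the implementations in `espnet2/layers/augmentation.py`.

```python
import numpy as np

# Hypothetical, simplified stand-ins for a few entries of effects_dict.
# Each takes a 1-D float waveform and the sample rate (unused by these
# simple effects) and returns a new waveform.

def preemphasis(waveform, sample_rate, coeff=0.97):
    # y[n] = x[n] - coeff * x[n - 1]; the first sample is kept as-is
    out = waveform.copy()
    out[1:] = waveform[1:] - coeff * waveform[:-1]
    return out

def deemphasis(waveform, sample_rate, coeff=0.97):
    # Inverse IIR filter of preemphasis: y[n] = x[n] + coeff * y[n - 1]
    out = np.empty_like(waveform)
    acc = 0.0
    for n, x in enumerate(waveform):
        acc = x + coeff * acc
        out[n] = acc
    return out

def polarity_inverse(waveform, sample_rate):
    return -waveform

def reverse(waveform, sample_rate):
    return waveform[::-1].copy()

def clipping(waveform, sample_rate, min_quantile=0.0, max_quantile=0.9):
    # Clip amplitudes to quantiles of the signal itself (assumed semantics)
    lo, hi = np.quantile(waveform, [min_quantile, max_quantile])
    return np.clip(waveform, lo, hi)

effects_dict = {
    "preemphasis": preemphasis,
    "deemphasis": deemphasis,
    "polarity_inverse": polarity_inverse,
    "reverse": reverse,
    "clipping": clipping,
}

def apply_effect(name, waveform, sample_rate, **kwargs):
    # Look the effect up by its config name and forward its keyword arguments
    if name not in effects_dict:
        raise ValueError(f"Unknown effect: {name}")
    return effects_dict[name](waveform, sample_rate, **kwargs)
```

Because the sample rate is passed by the caller, the per-effect arguments in the config only need to carry effect-specific settings.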

The data augmentation methods can be configured in the YAML config file, for example:

preprocessor: default
preprocessor_conf:
    fs: 16000
    data_aug_effects:   # no need to set the "sample_rate" argument for each effect here
        - [0.1, "contrast", {"enhancement_amount": 75.0}]
        - [0.1, "highpass", {"cutoff_freq": 5000, "Q": 0.707}]
        - [0.1, "equalization", {"center_freq": 1000, "gain": 0, "Q": 0.707}]
        - - 0.1
          - - [0.3, "speed_perturb", {"factor": 0.9}]
            - [0.3, "speed_perturb", {"factor": 1.1}]
    data_aug_num: [1, 4]
    data_aug_prob: 1.0
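Each entry under `data_aug_effects` is either `[weight, name, kwargs]` or `[weight, [entry, ...]]`, where the nested list holds alternatives. A minimal sketch of checking that structure once the YAML has been loaded into Python lists follows. `validate_effects` is a hypothetical helper, and it assumes (as in the example) that groups are only one level deep and that weights must be positive.

```python
# Hypothetical validator for the data_aug_effects structure.
# Assumes nesting is only one level deep: a group may contain
# only [weight, name, kwargs] entries, not further groups.

def _check_weight(weight, entry):
    if not isinstance(weight, (int, float)) or weight <= 0:
        raise ValueError(f"Invalid weight in {entry!r}")

def _check_single(entry, known_effects):
    weight, name, kwargs = entry
    _check_weight(weight, entry)
    if not isinstance(name, str) or name not in known_effects:
        raise ValueError(f"Unknown effect {name!r}")
    if not isinstance(kwargs, dict):
        raise ValueError(f"Arguments must be a dict in {entry!r}")

def validate_effects(effects, known_effects):
    for entry in effects:
        if len(entry) == 3:
            _check_single(entry, known_effects)
        elif len(entry) == 2 and isinstance(entry[1], list):
            weight, group = entry
            _check_weight(weight, entry)
            for sub in group:
                if len(sub) != 3:
                    raise ValueError(f"Nested groups are not supported: {sub!r}")
                _check_single(sub, known_effects)
        else:
            raise ValueError(f"Malformed entry {entry!r}")

# The configuration from the YAML example, as Python lists
data_aug_effects = [
    [0.1, "contrast", {"enhancement_amount": 75.0}],
    [0.1, "highpass", {"cutoff_freq": 5000, "Q": 0.707}],
    [0.1, "equalization", {"center_freq": 1000, "gain": 0, "Q": 0.707}],
    [0.1, [
        [0.3, "speed_perturb", {"factor": 0.9}],
        [0.3, "speed_perturb", {"factor": 1.1}],
    ]],
]

validate_effects(
    data_aug_effects, {"contrast", "highpass", "equalization", "speed_perturb"}
)
```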

The above configuration applies 1 to 4 different data augmentations to each training sample, chosen from "contrast", "highpass", "equalization", and "speed_perturb", where each of these four entries has a sampling weight of 0.1.

The arguments of each effect are given in its accompanying dictionary.

In the last entry, the two "speed_perturb" settings are mutually exclusive: only one of them is applied at a time.
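The sampling rule described above can be sketched as follows: draw how many effects to apply from the `data_aug_num` range, pick that many distinct top-level entries with probability proportional to their weights, and, for a group entry, pick exactly one alternative by its inner weights. This is an illustrative reimplementation under those assumptions. The helper name `sample_effects` is hypothetical, treating `data_aug_prob` as the probability of augmenting a sample at all is an assumption, and details such as weight normalization may differ from the merged code.

```python
import numpy as np

def sample_effects(effects, apply_range, apply_prob, rng):
    """Return a list of (name, kwargs) pairs to apply to one sample."""
    # Decide whether to augment this sample at all (assumed data_aug_prob)
    if rng.random() >= apply_prob:
        return []
    # Number of effects, drawn uniformly from the inclusive range (data_aug_num)
    low, high = apply_range
    num = min(int(rng.integers(low, high + 1)), len(effects))
    # Top-level weights, normalized to probabilities
    weights = np.array([entry[0] for entry in effects], dtype=float)
    probs = weights / weights.sum()
    # Sample distinct entries so the same entry is not applied twice
    chosen = rng.choice(len(effects), size=num, replace=False, p=probs)
    selected = []
    for idx in chosen:
        entry = effects[idx]
        if len(entry) == 3:
            # [weight, name, kwargs]
            selected.append((entry[1], entry[2]))
        else:
            # [weight, [alternatives]]: mutually exclusive, pick exactly one
            group = entry[1]
            sub_w = np.array([sub[0] for sub in group], dtype=float)
            sub = group[rng.choice(len(group), p=sub_w / sub_w.sum())]
            selected.append((sub[1], sub[2]))
    return selected

data_aug_effects = [
    [0.1, "contrast", {"enhancement_amount": 75.0}],
    [0.1, "highpass", {"cutoff_freq": 5000, "Q": 0.707}],
    [0.1, "equalization", {"center_freq": 1000, "gain": 0, "Q": 0.707}],
    [0.1, [
        [0.3, "speed_perturb", {"factor": 0.9}],
        [0.3, "speed_perturb", {"factor": 1.1}],
    ]],
]

rng = np.random.default_rng(0)
print(sample_effects(data_aug_effects, (1, 4), 1.0, rng))
```

Sampling without replacement at the top level is what keeps a group's alternatives exclusive: the group can be drawn at most once, and when drawn it contributes exactly one effect.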

Why?

Current preprocessors are not flexible enough to support applying multiple data augmentation methods at the same time.

mergify · 2023-07-24T18:03:59Z

This pull request is now in conflict :(

codecov · 2023-07-24T18:43:45Z

Codecov Report

Merging #5370 (3a82677) into master (ac8b312) will increase coverage by 0.05%.
The diff coverage is 86.29%.

@@            Coverage Diff             @@
##           master    #5370      +/-   ##
==========================================
+ Coverage   77.13%   77.19%   +0.05%     
==========================================
  Files         678      679       +1     
  Lines       61537    61703     +166     
==========================================
+ Hits        47465    47630     +165     
- Misses      14072    14073       +1
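The totals in the coverage diff are internally consistent: hits plus misses equals lines, and coverage is hits divided by lines. A quick check of that arithmetic with the numbers from the report shows that the unrounded difference is just under 0.06 percentage points, which the report displays as +0.05%.

```python
# Numbers copied from the coverage diff above
master = {"lines": 61537, "hits": 47465, "misses": 14072}
pr = {"lines": 61703, "hits": 47630, "misses": 14073}

def coverage_percent(stats):
    # Coverage = hits / lines, as a percentage
    return 100.0 * stats["hits"] / stats["lines"]

for stats in (master, pr):
    assert stats["hits"] + stats["misses"] == stats["lines"]

print(f"master: {coverage_percent(master):.2f}%")  # 77.13%
print(f"PR:     {coverage_percent(pr):.2f}%")      # 77.19%
delta = coverage_percent(pr) - coverage_percent(master)
print(f"delta:  {delta:.4f} percentage points")    # 0.0599
```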

Flag	Coverage Δ
test_configuration_espnet2	`∅ <ø> (∅)`
test_integration_espnet1	`65.73% <ø> (ø)`
test_integration_espnet2	`48.54% <64.97%> (+0.12%)`	⬆️
test_python_espnet1	`20.26% <0.00%> (-0.06%)`	⬇️
test_python_espnet2	`52.08% <65.98%> (+0.06%)`	⬆️
test_utils	`23.10% <ø> (ø)`

Flags with carried forward coverage won't be shown.

Files Changed	Coverage Δ
espnet2/tasks/asr.py	`91.08% <ø> (ø)`
espnet2/tasks/enh_s2t.py	`96.66% <0.00%> (ø)`
espnet2/tasks/hubert.py	`85.23% <ø> (ø)`
espnet2/tasks/slu.py	`91.83% <ø> (ø)`
espnet2/tasks/st.py	`88.31% <ø> (ø)`
espnet2/tasks/uasr.py	`0.00% <ø> (ø)`
espnet2/train/preprocessor.py	`81.02% <67.21%> (+1.29%)`	⬆️
espnet2/layers/augmentation.py	`95.34% <95.34%> (ø)`
espnet2/tasks/enh.py	`97.48% <100.00%> (+0.03%)`	⬆️
espnet2/tasks/enh_tse.py	`97.95% <100.00%> (+0.04%)`	⬆️


ci/test_integration_espnet2.sh

sw005320 · 2023-07-24T19:19:07Z

@Jungjee, can you review this PR?


Jungjee · 2023-07-25T20:11:25Z

espnet2/layers/augmentation.py

+            -4 for shifting pitch down by 4/`bins_per_octave` octaves
+            4 for shifting pitch up by 4/`bins_per_octave` octaves
+        bins_per_octave (int): number of steps per octave
+        n_fft (int): length of FFT (in second)


Thanks for noticing that!
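Per the docstring above, `n_steps` shifts the pitch by `n_steps / bins_per_octave` octaves, which scales every frequency by 2^(n_steps / bins_per_octave). A small sketch of that relationship follows (the helper name `pitch_shift_ratio` is made up for illustration). Note also that an FFT length such as `n_fft` is normally counted in samples, not seconds.

```python
def pitch_shift_ratio(n_steps, bins_per_octave=12):
    """Frequency scaling factor for a shift of n_steps / bins_per_octave octaves."""
    return 2.0 ** (n_steps / bins_per_octave)

# With 12 bins per octave (semitones), +4 steps raises every frequency by ~26%
up = pitch_shift_ratio(4)       # about 1.26
down = pitch_shift_ratio(-4)    # about 0.794
print(f"{440 * up:.1f} Hz")     # 554.4 Hz: A4 shifted up 4 semitones
```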

Jungjee · 2023-07-25T20:12:49Z

espnet2/layers/augmentation.py

+    source_sample_rate = source_sample_rate // gcd
+    target_sample_rate = target_sample_rate // gcd
+
+    ret = torchaudio.functional.resample(


Just some questions.
Did you consider applying one without pitch shift? Would it require significantly more computation?
Also, how is the training speed with this augmentation (any bottlenecks in data loading)?

Is time stretch equivalent to speed_perturb with factor > 1, except for the pitch?

speed_perturb and time_stretch are two different time-scaling methods: the former changes the pitch, while the latter does not. Which one to use depends on the use case, so I provide both and let the user choose.

I haven't rigorously benchmarked the speed difference yet. I will run some tests later.
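The quoted `speed_perturb` code reduces the two sample rates by their greatest common divisor before calling `torchaudio.functional.resample`, which keeps the resampling ratio in lowest terms. The plain-Python sketch below shows that bookkeeping, assuming the source rate is `factor * fs` and the target rate is `fs` (an assumption, since the surrounding code is not shown in full). The output has roughly `len / factor` samples but is played back at the original rate, so both duration and pitch change; `time_stretch`, by contrast, changes duration while keeping pitch.

```python
import math

def speed_perturb_rates(sample_rate, factor):
    # Treat the signal as if recorded at factor * sample_rate and
    # resample it back to sample_rate (assumed convention)
    source = int(round(factor * sample_rate))
    target = int(sample_rate)
    g = math.gcd(source, target)
    return source // g, target // g

def perturbed_length(num_samples, sample_rate, factor):
    src, tgt = speed_perturb_rates(sample_rate, factor)
    # Resampling from src to tgt scales the length by tgt / src (rounded up here)
    return math.ceil(num_samples * tgt / src)

for factor in (0.9, 1.1):
    src, tgt = speed_perturb_rates(16000, factor)
    print(factor, (src, tgt), perturbed_length(16000, 16000, factor))
```

For one second of 16 kHz audio, factor 1.1 yields a ratio of 11:10 and a shorter output, which is why it sounds both faster and higher-pitched.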

Jungjee · 2023-07-25T20:19:00Z

espnet2/layers/augmentation.py

+        waveform, n_fft, hop_length, win_length, window=window, return_complex=True
+    )
+    freq = spec.size(-2)
+    phase_advance = torch.linspace(0, math.pi * hop_length, freq)[..., None]


Is [..., None] equivalent to .unsqueeze(-1) here?

Yes. They are the same operation.
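The quoted `phase_advance` line is also worth unpacking. With `freq = n_fft // 2 + 1` one-sided frequency bins, `linspace(0, pi * hop_length, freq)` gives each bin k its expected phase advance over one hop, 2·pi·k·hop_length / n_fft, which is the form of `phase_advance` that `torchaudio.functional.phase_vocoder` expects for time stretching. The NumPy sketch below checks both that identity and the indexing equivalence, with `np.expand_dims(x, -1)` standing in for torch's `.unsqueeze(-1)`; the `n_fft` and `hop_length` values are arbitrary examples.

```python
import math
import numpy as np

n_fft, hop_length = 512, 128
freq = n_fft // 2 + 1  # number of one-sided STFT bins

# Same construction as the quoted torch code, in NumPy
phase_advance = np.linspace(0, math.pi * hop_length, freq)[..., None]

# [..., None] appends a trailing axis, like .unsqueeze(-1) in torch
same = np.expand_dims(np.linspace(0, math.pi * hop_length, freq), -1)
assert np.array_equal(phase_advance, same)
print(phase_advance.shape)  # (257, 1)

# Expected phase advance of bin k over one hop: 2*pi*k*hop_length/n_fft
k = np.arange(freq)[:, None]
expected = 2 * math.pi * k * hop_length / n_fft
print(np.allclose(phase_advance, expected))  # True
```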

Jungjee · 2023-07-25T20:19:57Z

espnet2/layers/augmentation.py

+    Returns:
+        ret (torch.Tensor): compressed signal (..., time)
+    """
+    ret = torchaudio.functional.apply_codec(


How about adding a warning or an exception so that this is not called with an unsupported torch version? (Or put the check somewhere else, since placing it here could trigger it too often.)

For now, I think I can just raise NotImplementedError for this function.
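One way to address the concern about checking too often is to validate the requested effects once, when the augmentation object is built, rather than on every call, while the effect itself still raises `NotImplementedError` as a safeguard. A minimal sketch follows. The `EFFECTS` table, the `UNSUPPORTED_EFFECTS` set, and the `Augmenter` class are hypothetical and are not ESPnet's `DataAugmentation` class.

```python
def codec(waveform, sample_rate, **kwargs):
    # Placeholder: disabled until it works on the supported torch versions
    raise NotImplementedError("codec augmentation is not supported yet")

EFFECTS = {
    "reverse": lambda waveform, sample_rate: waveform[::-1],
    "codec": codec,
}
UNSUPPORTED_EFFECTS = {"codec"}

class Augmenter:
    def __init__(self, effect_names):
        # Fail fast once at construction time rather than per utterance
        for name in effect_names:
            if name not in EFFECTS:
                raise ValueError(f"Unknown effect: {name}")
            if name in UNSUPPORTED_EFFECTS:
                raise NotImplementedError(f"Effect {name!r} is not supported")
        self.effect_names = list(effect_names)

    def __call__(self, waveform, sample_rate):
        for name in self.effect_names:
            waveform = EFFECTS[name](waveform, sample_rate)
        return waveform
```

A misconfigured run then fails at startup instead of partway through training.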

Jungjee · 2023-07-25T20:22:57Z

espnet2/train/preprocessor.py

        rir_path = np.random.choice(rirs)
        rir = None
        if rir_path is not None:
-            rir, _ = soundfile.read(rir_path, dtype=np.float64, always_2d=True)
+            rir, fs = soundfile.read(rir_path, dtype=np.float64, always_2d=True)
+            if tgt_fs and fs != tgt_fs:


It may be better to warn or raise an error here, because a sample rate mismatch may not be intended.
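A sketch of the reviewer's suggestion: make a sample-rate mismatch between an RIR file and the target rate visible instead of handling it silently. The helper `check_rir_sample_rate` and its `strict` flag are hypothetical. Whether to warn and resample or to raise is a design choice left to the caller.

```python
import warnings

def check_rir_sample_rate(rir_path, fs, tgt_fs, strict=False):
    """Return True if the RIR must be resampled from fs to tgt_fs."""
    if not tgt_fs or fs == tgt_fs:
        return False
    msg = f"RIR {rir_path} has sample rate {fs}, expected {tgt_fs}"
    if strict:
        # Treat the mismatch as a configuration error
        raise ValueError(msg)
    # Otherwise warn and let the caller resample
    warnings.warn(msg + "; resampling", UserWarning)
    return True
```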

Jungjee

LGTM, mostly added suggestions/questions, not mandatory.

Emrys365 · 2023-07-26T04:35:34Z

LGTM, mostly added suggestions/questions, not mandatory.

Thanks a lot!

Emrys365 · 2023-07-27T17:53:14Z

Strangely, I can pass the test in ci/test_import_all.py locally with Python 3.10.12 and PyTorch 1.13.1. I'm not sure what causes the segmentation fault in the CI test.
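A segmentation fault kills the interpreter before Python can report an exception, so one common way to get diagnostics in CI is to import each module in a separate subprocess with `faulthandler` enabled and inspect the exit code. This is a generic sketch, not necessarily what the later changes to `ci/test_import_all.py` do; `try_import` is a hypothetical helper.

```python
import subprocess
import sys

def try_import(module_name, timeout=300):
    """Import module_name in a fresh interpreter; return (returncode, stderr)."""
    # -X faulthandler dumps a Python traceback if the child crashes on a signal
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code -N means the child was killed by
    # signal N (e.g. -11 for SIGSEGV)
    return proc.returncode, proc.stderr

for name in ["json", "no_such_module_xyz"]:
    code, err = try_import(name)
    status = "ok" if code == 0 else f"failed (exit code {code})"
    print(name, status)
    if code != 0 and err.strip():
        print(err.strip().splitlines()[-1])
```

Running each import in isolation also tells you which module triggers the crash, which a single in-process loop cannot do once the interpreter has died.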

mergify · 2023-08-03T19:35:35Z

This pull request is now in conflict :(

sw005320 · 2023-08-09T12:13:27Z

Thanks, @Emrys365!

Emrys365 added 5 commits July 23, 2023 10:09

Merge branch 'master' of github.com:espnet/espnet into tse

e8fc9ae

Merge branch 'master' of github.com:espnet/espnet into tse

afd9cd8

Add general data augmentation methods for preprocessing

d38b5ff

Add config files for applying data augmentation

3f00c0d

Add config files for applying data augmentation

c5176c4

Emrys365 added Recipe ESPnet2 Need review labels Jul 24, 2023

mergify bot added conflicts CI Travis, Circle CI, etc labels Jul 24, 2023

Merge branch 'master' into tse

54f43bf

mergify bot removed the conflicts label Jul 24, 2023

sw005320 added this to the v.202307 milestone Jul 24, 2023

sw005320 reviewed Jul 24, 2023


ci/test_integration_espnet2.sh

Revert --num_workers 0

85b0741

Fix a typo

16f7887

Emrys365 force-pushed the tse branch from 175d4e9 to 16f7887 July 24, 2023 20:09

Emrys365 and others added 8 commits July 24, 2023 17:19

Update tasks

ca1be58

[pre-commit.ci] auto fixes from pre-commit.com hooks

0963d3f

for more information, see https://pre-commit.ci

Update preprocessor

416cf6b

Merge branch 'tse' of github.com:Emrys365/espnet into tse

b93ef6b

Fix a bug in data augmentation

525cb4f

Fix a bug in Enh data augmentation

2b1aec3

Minor fix

866ce1e

Fix egs2/mini_an4/enh1/conf/train_with_data_aug_debug.yaml

70893b4

Jungjee reviewed Jul 25, 2023


Reflect comments

febb12a

Update docstrings

f0f366e

Emrys365 and others added 5 commits July 27, 2023 14:31

Update ci/test_import_all.py to catch error information

ddf9097

Update ci workflow to handle segmental fault

e34868c

Merge master updates

6e70c12

Update arguments in preprocessors

56cd77f

Merge branch 'master' into tse

5739182

kan-bayashi modified the milestones: v.202307, v.202312 Aug 3, 2023

mergify bot added the conflicts label Aug 3, 2023

Emrys365 added 2 commits August 8, 2023 14:44

Update setup-python action in the job "test_import"

3bc80ac

Resolve conflicts

3a82677

mergify bot removed the conflicts label Aug 8, 2023

sw005320 merged commit 88050b2 into espnet:master Aug 9, 2023
26 checks passed
