
Adding general data augmentation methods for speech preprocessing #5370

Merged
merged 25 commits into espnet:master on Aug 9, 2023

Conversation

@Emrys365 (Collaborator) commented Jul 24, 2023

What?

This PR adds a series of data augmentation techniques for preprocessing speech data in various tasks. The supported techniques are:

effects_dict = {
    "lowpass": lowpass_filtering,
    "highpass": highpass_filtering,
    "bandpass": bandpass_filtering,
    "bandreject": bandreject_filtering,
    "contrast": contrast,
    "equalization": equalization_filtering,
    "pitch_shift": pitch_shift,
    "speed_perturb": speed_perturb,
    "time_stretch": time_stretch,
    "preemphasis": preemphasis,
    "deemphasis": deemphasis,
    "clipping": clipping,
    "polarity_inverse": polarity_inverse,
    "reverse": reverse,
    "corrupt_phase": corrupt_phase,
}
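For illustration, here is a minimal sketch of what one registry entry could wrap, assuming the effects are thin wrappers around torchaudio.functional (the actual implementations live in espnet2/layers/augmentation.py; the wrapper signature below is hypothetical):

import torch
import torchaudio

def lowpass_filtering(
    waveform: torch.Tensor,
    sample_rate: int = 16000,
    cutoff_freq: float = 1000.0,
    Q: float = 0.707,
) -> torch.Tensor:
    # Hypothetical wrapper: apply a biquad lowpass filter to the waveform
    return torchaudio.functional.lowpass_biquad(waveform, sample_rate, cutoff_freq, Q=Q)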

The data augmentation methods can be easily configured via the YAML config file:

preprocessor: default
preprocessor_conf:
    fs: 16000
    data_aug_effects:   # no need to set the "sample_rate" argument for each effect here
        - [0.1, "contrast", {"enhancement_amount": 75.0}]
        - [0.1, "highpass", {"cutoff_freq": 5000, "Q": 0.707}]
        - [0.1, "equalization", {"center_freq": 1000, "gain": 0, "Q": 0.707}]
        - - 0.1
          - - [0.3, "speed_perturb", {"factor": 0.9}]
            - [0.3, "speed_perturb", {"factor": 1.1}]
    data_aug_num: [1, 4]
    data_aug_prob: 1.0

The above configuration applies 1 to 4 different data augmentations during training, drawn from "contrast", "highpass", "equalization", and "speed_perturb", each with a sampling weight of 0.1. The arguments of each effect are specified in the corresponding entry.

In the last entry, the two "speed_perturb" variants form a mutually exclusive group, so only one of them will be applied.
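As a rough illustration of these sampling semantics (a sketch only; the function name and entry handling are assumptions, not the exact ESPnet implementation), each entry is a (weight, effect, kwargs) triple, and a nested list of triples forms a mutually exclusive group:

import numpy as np

def sample_augmentations(entries, num_range=(1, 4)):
    # Draw between num_range[0] and num_range[1] distinct entries,
    # weighted by the first element of each entry.
    weights = np.array([e[0] for e in entries], dtype=float)
    num = np.random.randint(num_range[0], min(num_range[1], len(entries)) + 1)
    idx = np.random.choice(len(entries), size=num, replace=False, p=weights / weights.sum())
    chosen = []
    for i in idx:
        _, body, *rest = entries[i]
        if isinstance(body, list):
            # Mutually exclusive group: pick exactly one member by its weight
            sub_w = np.array([e[0] for e in body], dtype=float)
            j = np.random.choice(len(body), p=sub_w / sub_w.sum())
            chosen.append((body[j][1], body[j][2]))
        else:
            chosen.append((body, rest[0]))
    return chosen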

Why?

The current preprocessors are not flexible enough to support applying multiple data augmentation methods at the same time.

mergify bot (Contributor) commented Jul 24, 2023

This pull request is now in conflict :(

mergify bot added the conflicts and CI (Travis, Circle CI, etc.) labels on Jul 24, 2023
mergify bot removed the conflicts label on Jul 24, 2023
codecov bot commented Jul 24, 2023

Codecov Report

Merging #5370 (3a82677) into master (ac8b312) will increase coverage by 0.05%.
The diff coverage is 86.29%.

@@            Coverage Diff             @@
##           master    #5370      +/-   ##
==========================================
+ Coverage   77.13%   77.19%   +0.05%     
==========================================
  Files         678      679       +1     
  Lines       61537    61703     +166     
==========================================
+ Hits        47465    47630     +165     
- Misses      14072    14073       +1     
Flag                        Coverage Δ
test_configuration_espnet2  ∅ <ø> (∅)
test_integration_espnet1    65.73% <ø> (ø)
test_integration_espnet2    48.54% <64.97%> (+0.12%) ⬆️
test_python_espnet1         20.26% <0.00%> (-0.06%) ⬇️
test_python_espnet2         52.08% <65.98%> (+0.06%) ⬆️
test_utils                  23.10% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files Changed                   Coverage Δ
espnet2/tasks/asr.py            91.08% <ø> (ø)
espnet2/tasks/enh_s2t.py        96.66% <0.00%> (ø)
espnet2/tasks/hubert.py         85.23% <ø> (ø)
espnet2/tasks/slu.py            91.83% <ø> (ø)
espnet2/tasks/st.py             88.31% <ø> (ø)
espnet2/tasks/uasr.py           0.00% <ø> (ø)
espnet2/train/preprocessor.py   81.02% <67.21%> (+1.29%) ⬆️
espnet2/layers/augmentation.py  95.34% <95.34%> (ø)
espnet2/tasks/enh.py            97.48% <100.00%> (+0.03%) ⬆️
espnet2/tasks/enh_tse.py        97.95% <100.00%> (+0.04%) ⬆️


@sw005320 added this to the v.202307 milestone on Jul 24, 2023
@sw005320 (Contributor) commented:

@Jungjee, can you review this PR?

-4 for shifting pitch down by 4/`bins_per_octave` octaves
4 for shifting pitch up by 4/`bins_per_octave` octaves
bins_per_octave (int): number of steps per octave
n_fft (int): length of FFT (in second)
Contributor:

float?

Collaborator (Author):

Thanks for noticing that!
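For reference, a usage sketch of the underlying torchaudio pitch shifting (the PR's wrapper signature may differ):

import torch
import torchaudio

waveform = torch.randn(1, 16000)  # 1 second of dummy audio at 16 kHz
# n_steps=4 shifts up by 4/12 of an octave with the default bins_per_octave=12
shifted = torchaudio.functional.pitch_shift(waveform, sample_rate=16000, n_steps=4)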

source_sample_rate = source_sample_rate // gcd
target_sample_rate = target_sample_rate // gcd

ret = torchaudio.functional.resample(
@Jungjee (Contributor) commented Jul 25, 2023:

Just questions. Did you consider applying one without pitch shift? Would it cause significantly more computation? Also, how is the training speed with this augmentation (any bottlenecks in data loading)?

Would time_stretch be equivalent to speed_perturb with factor > 1, except for the pitch?

Collaborator (Author):

speed_perturb and time_stretch are two different time-scaling methods: the former changes the pitch while the latter does not. Which one is appropriate depends on the use case, so I provide both for the user to choose.

I haven't strictly tested the speed difference yet. I will run some tests later.
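To make the distinction concrete, here is a sketch using public torchaudio APIs (not the PR's exact code): speed perturbation resamples the waveform, scaling duration and pitch together, while time stretching runs a phase vocoder on the STFT, changing duration but preserving pitch.

import math
import torch
import torchaudio

waveform = torch.randn(1, 16000)
fs = 16000

# speed_perturb with factor 1.1: resampling shortens the signal and raises the pitch
fast = torchaudio.functional.resample(waveform, orig_freq=int(fs * 1.1), new_freq=fs)

# time_stretch at rate 1.1: phase vocoder shortens the signal but keeps the pitch
n_fft, hop = 400, 100
window = torch.hann_window(n_fft)
spec = torch.stft(waveform, n_fft, hop, n_fft, window=window, return_complex=True)
phase_advance = torch.linspace(0, math.pi * hop, spec.size(-2))[..., None]
stretched_spec = torchaudio.functional.phase_vocoder(spec, rate=1.1, phase_advance=phase_advance)
stretched = torch.istft(stretched_spec, n_fft, hop, n_fft, window=window)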

waveform, n_fft, hop_length, win_length, window=window, return_complex=True
)
freq = spec.size(-2)
phase_advance = torch.linspace(0, math.pi * hop_length, freq)[..., None]
Contributor:

Is [..., None] equivalent to .unsqueeze(-1) here?

Collaborator (Author):

Yes, they are the same operation here.
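A quick check of the equivalence:

import torch

x = torch.linspace(0, 1, 5)  # shape (5,)
assert torch.equal(x[..., None], x.unsqueeze(-1))  # both yield shape (5, 1)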

Returns:
ret (torch.Tensor): compressed signal (..., time)
"""
ret = torchaudio.functional.apply_codec(
Contributor:

How about adding a warning or exception so that this is not called with an unsupported torch version? (Perhaps in a different place, though; if you put it here, it could be triggered too often.)

Collaborator (Author):

For now, I think I can just raise NotImplementedError for this function.
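A minimal sketch of such a guard, assuming it keys on the installed torch version (the function name and minimum version below are placeholders, not the PR's actual code):

from packaging.version import parse as V

import torch

def codec_compression(waveform, sample_rate, format="mp3"):
    # Placeholder minimum version; the real threshold depends on torchaudio support
    if V(torch.__version__) < V("1.12"):
        raise NotImplementedError(
            f"codec-based augmentation is not supported on torch {torch.__version__}"
        )
    ...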

rir_path = np.random.choice(rirs)
rir = None
if rir_path is not None:
-    rir, _ = soundfile.read(rir_path, dtype=np.float64, always_2d=True)
+    rir, fs = soundfile.read(rir_path, dtype=np.float64, always_2d=True)
+    if tgt_fs and fs != tgt_fs:
Contributor:

Maybe better to warn or raise something here, because a sample rate mismatch may not be intended.

Collaborator (Author):

Done
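For reference, a sketch of the kind of check suggested here (rir_path and tgt_fs mirror the quoted snippet; the warning message and resampling details are assumptions):

import warnings

import numpy as np
import soundfile
import torch
import torchaudio

rir_path = "rir.wav"  # placeholder RIR file
tgt_fs = 16000        # target sampling rate expected by the model

rir, fs = soundfile.read(rir_path, dtype=np.float64, always_2d=True)
if tgt_fs is not None and fs != tgt_fs:
    warnings.warn(
        f"RIR sampling rate ({fs} Hz) differs from the target rate ({tgt_fs} Hz); resampling."
    )
    rir_t = torch.from_numpy(rir.T)  # (channels, samples)
    rir = torchaudio.functional.resample(rir_t, orig_freq=fs, new_freq=tgt_fs).numpy().T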

@Jungjee (Contributor) left a comment:

LGTM, mostly added suggestions/questions, not mandatory.

@Emrys365 (Collaborator, Author):

> LGTM, mostly added suggestions/questions, not mandatory.

Thanks a lot!

@Emrys365 (Collaborator, Author):

Strangely, I can pass the test in ci/test_import_all.py locally with Python 3.10.12 and PyTorch 1.13.1. I'm not sure what is causing the segmentation fault in the CI test.

@kan-bayashi modified the milestones: v.202307 → v.202312 on Aug 3, 2023
mergify bot (Contributor) commented Aug 3, 2023

This pull request is now in conflict :(

mergify bot added the conflicts label on Aug 3, 2023
mergify bot removed the conflicts label on Aug 8, 2023
@sw005320 (Contributor) commented Aug 9, 2023

Thanks, @Emrys365!

@sw005320 merged commit 88050b2 into espnet:master on Aug 9, 2023
26 checks passed