controllable WaveformToFbankConverter multithreading #161

Open
artemru opened this issue Nov 16, 2023 · 0 comments
Labels
bug Something isn't working

Comments

artemru (Contributor) commented Nov 16, 2023

Describe the bug:
WaveformToFbankConverter runs multithreaded.
This method (and possibly some others) uses a parallel_for statement for its execution.
Currently, there is no obvious way to control the number of threads it uses.
That can hurt performance (for example through thread/CPU oversubscription).
Moreover, even when it is used inside DataPipeline.map(...),
it does not respect the requested number of parallel calls (num_parallel_calls).

Describe how to reproduce:

import torch

from fairseq2.data.data_pipeline import read_sequence
from fairseq2.data.audio import WaveformToFbankConverter

_convert_to_fbank = WaveformToFbankConverter(
    num_mel_bins=80,
    waveform_scale=2**15,
    channel_last=True,
    standardize=True,
    device=torch.device("cpu"),
    dtype=torch.float16,
)


def convert_to_fbank(wav):
    # Add a trailing channel dimension: (num_samples,) -> (num_samples, 1),
    # which matches channel_last=True.
    return _convert_to_fbank({"waveform": torch.unsqueeze(wav, 1),
                              "sample_rate": 16_000})["fbank"].shape


# 100 random waveforms of 10**5 samples each.
xx = [torch.rand(10**5) for i in range(100)]
data_pipeline = read_sequence(xx).map(convert_to_fbank, num_parallel_calls=1).and_return()
list(iter(data_pipeline))  # this will typically use half of the available CPUs
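The CPU usage claimed in the last comment can be checked without watching a system monitor. The sketch below is not part of the original report. It uses only the standard library and compares process CPU time to wall-clock time. The helper names `cpu_parallelism` and `busy` are made up for illustration. A ratio close to 1 suggests one busy core, and a clearly larger ratio suggests several threads running at once.

```python
import time
from typing import Callable, Tuple


def cpu_parallelism(fn: Callable[[], object]) -> Tuple[float, float]:
    """Run fn once and return (wall_seconds, cpu_seconds / wall_seconds).

    time.process_time() sums the CPU time of all threads in the current
    process, so the ratio approximates the average number of busy cores.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else 0.0
    return wall, ratio


def busy(n: int = 2_000_000) -> int:
    """Single-threaded stand-in workload, used only to demonstrate the helper."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    wall, ratio = cpu_parallelism(busy)
    print(f"wall={wall:.3f}s, average busy cores={ratio:.2f}")
```

With the snippet above, the pipeline would be measured as `cpu_parallelism(lambda: list(iter(data_pipeline)))`. If `num_parallel_calls=1` were respected and the converter were single-threaded, the ratio would be expected to stay near 1.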

Describe the expected behavior:

  • a context manager to control the number of threads the method uses (e.g. with fairseq2_nb_threads(2): ...)
  • WaveformToFbankConverter should respect num_parallel_calls when used in a data pipeline

Environment:
fairseq2==0.1.1+cu118
fairseq2n==0.1.1+cu118

artemru added the bug label Nov 16, 2023