Describe the bug:
WaveformToFbankConverter runs multithreaded.
This method (and possibly some others) uses a parallel_for statement for its execution.
Currently, there is no obvious way to control the number of threads it uses.
That can hurt performance (for example, through thread/CPU oversubscription).
Moreover, even when it is used inside DataPipeline.map(...),
it does not respect the requested number of parallel calls (num_parallel_calls).
Describe how to reproduce:
import torch

from fairseq2.data.data_pipeline import read_sequence
from fairseq2.data.audio import WaveformToFbankConverter

_convert_to_fbank = WaveformToFbankConverter(
    num_mel_bins=80,
    waveform_scale=2**15,
    channel_last=True,
    standardize=True,
    device=torch.device("cpu"),
    dtype=torch.float16)

def convert_to_fbank(wav):
    return _convert_to_fbank({"waveform": torch.unsqueeze(wav, 1),
                              "sample_rate": 16_000})['fbank'].shape

xx = [torch.rand(10**5) for i in range(100)]
data_pipeline = read_sequence(xx).map(convert_to_fbank, num_parallel_calls=1).and_return()
list(iter(data_pipeline))  # this typically uses about half of the available CPUs
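To check the thread usage described above without an external monitor, one option is to sample the process's OS thread count before and after iterating the pipeline. The sketch below uses only the standard library; `native_thread_count` is a hypothetical helper (not part of fairseq2), it reads `/proc/self/status` and therefore only sees native threads on Linux, and worker threads may already have exited by the time the second sample is taken.

```python
import os
import threading


def native_thread_count() -> int:
    """Return the number of OS threads in the current process.

    On Linux this reads the "Threads:" field of /proc/self/status.
    Elsewhere it falls back to counting Python-level threads, which
    misses threads created by native code.
    """
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return threading.active_count()


before = native_thread_count()
# Run the reproduction here, e.g. list(iter(data_pipeline)).
after = native_thread_count()
print(f"cpus={os.cpu_count()} threads before={before} after={after}")
```

With num_parallel_calls=1, a large jump between the two samples would be consistent with the converter spawning its own worker threads.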
Describe the expected behavior:
A context manager to control the number of threads the method uses (e.g. with fairseq2_nb_threads(2): ...).
WaveformToFbankConverter should respect num_parallel_calls in data pipelines.
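The requested context manager could look roughly like the sketch below. It is only an illustration of the pattern: `limit_threads` is a hypothetical name, fairseq2 does not provide it, and the getter and setter are passed in because the issue does not establish which thread-pool setting the native parallel_for actually honors.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def limit_threads(
    n: int,
    get_threads: Callable[[], int],
    set_threads: Callable[[int], None],
) -> Iterator[None]:
    """Temporarily set a thread limit and restore the previous value on exit."""
    previous = get_threads()
    set_threads(n)
    try:
        yield
    finally:
        # Restore the old limit even if the body raised.
        set_threads(previous)


# Possible usage with PyTorch's intra-op thread setting (an assumption:
# it is not confirmed that fairseq2's native code respects this value):
#
#   import torch
#   with limit_threads(2, torch.get_num_threads, torch.set_num_threads):
#       list(iter(data_pipeline))
```

Passing the getter and setter explicitly keeps the sketch independent of any particular backend, so the same wrapper would work if the library later exposed its own thread-count functions.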
Environment:
fairseq2==0.1.1+cu118
fairseq2n==0.1.1+cu118