-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Augmentation refactoring and torchaudio SoX effects support #124
Conversation
It's good to review. |
@freewym please check this out - it will be small but breaking for the recipe you're creating, but hopefully makes the whole setup way easier (no need to compile libsox, wavaugment, etc. just install the latest pytorch + torchaudio with anaconda) and gets us rid of various quirks with multiprocessing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good regarding the usage of torchaudio's Sox Effects.
(glad to see the migration here! cc facebookresearch/WavAugment#16) |
Cool. I think I am supposed to install PyTorch 1.7 in order to test it. I will do later today. |
lhotse/augmentation/__init__.py
Outdated
from .common import AugmentFn | ||
from .wavaugment import * | ||
|
||
if str(_torchaudio.__version__) >= '0.7.0': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: If torchaudio
hits 0.10.0
release in the future (we do not know if we will move to the major 1.0 release or not), this could produce a wrong result.
$ python
>>> '0.7.0' > '0.10.0'
True
A future-proof way would be to split the version string and compare major version and minor version as number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spotted!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
packaging.version
may be helpful in this case.
See https://stackoverflow.com/questions/11887762/how-do-i-compare-version-numbers-in-python
lhotse/features/base.py
Outdated
@@ -10,7 +10,7 @@ | |||
import torch | |||
|
|||
from lhotse.audio import Recording | |||
from lhotse.augmentation import WavAugmenter | |||
from lhotse.augmentation import AugmentFn, WavAugmenter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WavAugmenter is no longer useful (?) If it is the case, all appearances of WavAugmenter should be removed within this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I want to keep it for now so that people with PyTorch older than 1.7 can also use it, but I will probably add a deprecation warning...
] | ||
|
||
|
||
def reverb(sampling_rate: int) -> List[List[str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to make such functions more general, in that they can accept more arguments, e.g., the lower/up bound of room sizes can been passed into this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll change that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'd rather make a follow-up PR later on, as I'm not sure which parameters it makes sense to tweak and how general they should be. If we want to tweak everything it's simpler to just write your own chain...
(I'm open to suggestions)
lhotse/augmentation/torchaudio.py
Outdated
end: Union[int, float] | ||
|
||
def sample(self): | ||
return random.uniform(self.start, self.end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use numpy random functions? It may be easier for me to seed it from outside
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using this function help (note that it also makes the random cut ID creation deterministic)?
https://github.com/lhotse-speech/lhotse/blob/master/lhotse/utils.py#L33
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, never mind. I will seed it in my own code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change it to numpy anyway, I guess more people are used to seeding numpy
than random
in training loops
return [ | ||
# Random speed perturbation factor between 0.9x and 1.1x the original speed | ||
['speed', RandomValue(0.9, 1.1)], | ||
['rate', sampling_rate], # Resample back to the original sampling rate (speed changes it) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this line makes the running hang. It works without this line.
edit: actually not hang. It terminated with
"File "/export/fs04/a07/ywang/fairseq4/espresso/tools/lhotse/lhotse/cut.py", line 1311, in compute_and_store_features
executor.submit(
File "/export/b03/ywang/anaconda3/lib/python3.8/concurrent/futures/process.py", line 629, in submit
raise BrokenProcessPool(self._broken)
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😫
I'll have a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to replicate the issue and add a unit test that causes it. I submitted the issue to torchaudio here: pytorch/audio#1021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, if replacing RandomValue() above with a function _get_value(factor)
where factor
is simply returned, the running hangs as well. Do you have any clue of the cause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the function defined as a closure (i.e. within another function) and captures some variable outside of its scope? That could explain it... Otherwise, I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freewym some good news, if you create executor like: ProcessPoolExecutor(..., mp_context=multiprocessing.get_context("spawn"))
it solves the segfault/hanging problem. Could you try? If it works I'll go on and merge this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Credits to @mthrok for suggesting this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(to make it clear: it works for me on the grid, on my mac, and in GitHub CI, so it should be okay)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it works!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
…/pzelasko/lhotse into feature/augmentation-refactoring
TL;DR
def augment_fn(audio: Union[torch.tensor, np.ndarray], sampling_rate: int) -> np.ndarray
WavAugment
capabilities withtorchaudio.sox_effects