Support higher order derivatives for F.lfilter #1441
Conversation
update to new version
Accommodate changes in lfilter
Rebase to master
Get synced
Hi @yoyololicon Thanks for the PR. This is another nice addition. We really appreciate it! A couple of thoughts at a glance.
@cpuhrsch CR please.
Sure.
This change doesn't solve the stability issue; that should be a different topic.
Hi @yoyololicon What is the performance implication of this change? I suspect that you made this PR because I asked about second-order autograd in the other PR. From the description, it sounds like this PR gets rid of the trick for efficient computation. Sorry, you made this PR from a good heart and probably because of what I said.
@mthrok Also, no need to apologize; I can understand your situation.
Thank you for the nice comments.
Can you provide a simple benchmark of the change?
I benchmarked the forward and backward pass separately with different input sizes.
@yoyololicon - sorry for the long response time. It looks like this PR about doubles the runtime of the backward pass. I think it's useful to have higher order derivatives, but we should think of a mechanism to guard enabling higher order derivatives explicitly. I suppose we could have a flag for the forward pass that allows the user to enable this, but that isn't quite the most principled approach. I wish we had an autograd guard or setting that restricts it to a certain order. Let me talk to a few people and get back to you on how best to resolve this, but for now I don't think the performance regression is worth landing in the current form.
Actually, a guard already exists that you can use during the backward to choose between the fast, non-differentiable implementation and the differentiable one. You can check this PR: https://github.com/pytorch/pytorch/pull/57189/files, which does a similar thing.
Thanks for the note @albanD! @yoyololicon, looks like this way we can have both! Do you want to give this a try?
@albanD Thanks for the information.
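A minimal sketch of the guard pattern being referred to, using a toy cube op as a stand-in (the names and shapes here are illustrative only, not code from this PR or from pytorch#57189): inside backward, torch.is_grad_enabled() is only true when the caller requested a higher-order graph with create_graph=True, so the fast non-differentiable path can stay the default.

```python
import torch

class Cube(torch.autograd.Function):
    # Toy stand-in for lfilter's IIR kernel, only to show where the guard would go.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        if torch.is_grad_enabled():
            # A higher-order graph was requested (create_graph=True):
            # compute the gradient with ordinary differentiable ops.
            return grad_out * 3 * x ** 2
        # Common case: no higher-order graph needed, so a fast, non-differentiable
        # (e.g. in-place) kernel could be used here instead.
        # (Identical here only because the toy op is trivial.)
        return grad_out * 3 * x ** 2

x = torch.randn(4, dtype=torch.double, requires_grad=True)
(g,) = torch.autograd.grad(Cube.apply(x).sum(), x, create_graph=True)
g.sum().backward()                               # second order, via the first branch
print(torch.allclose(x.grad, 6 * x.detach()))    # d/dx of sum(3 x**2) = 6 x -> True
```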
@cpuhrsch If the FIR part is also written as a custom function (just like the previous implementation did), then the runtime could be comparable.
The lfilter function would be like:

torch::Tensor lfilter_core(
    const torch::Tensor& waveform,
    const torch::Tensor& a_coeffs,
    const torch::Tensor& b_coeffs) {
  TORCH_CHECK(waveform.device() == a_coeffs.device());
  TORCH_CHECK(b_coeffs.device() == a_coeffs.device());
  TORCH_CHECK(a_coeffs.size(0) == b_coeffs.size(0));
  TORCH_INTERNAL_ASSERT(waveform.sizes().size() == 2);

  int64_t n_order = b_coeffs.size(0);
  TORCH_INTERNAL_ASSERT(n_order > 0);

  auto filtered_waveform =
      DifferentiableFIR::apply(waveform, b_coeffs / a_coeffs[0]);
  auto output =
      DifferentiableIIR::apply(filtered_waveform, a_coeffs / a_coeffs[0]);
  return output;
}

But this looks a little redundant to me, because the FIR part is actually just a single conv1d call.
@albanD - Could I ask you to take a quick look again?
Looks good. at::AutoNonVariableTypeMode should not be used except for internal code though... So this is not great. But if that's the current state, we can move the discussion about removing these to a different issue.
Looks good to me! Let's wait for another review from Moto and for the unit tests to run green (the mac failures are unrelated and we might even defer them for this merge).
I am learning from this PR more than reviewing it, and nothing looks suspicious, so I think it's good.
Let me see what I can do with the macOS situation.
@yoyololicon I fixed the macOS build issue in #1485. Can you rebase? There are some tests still failing for macOS, but as long as they are not related to this PR, that should be fine.
@yoyololicon Thank you so much for the contribution!
Purpose
Update the implementation (with some refactoring) of lfilter to make it able to compute higher order derivatives; the current implementation only supports first order. This PR is a little complex, so I'll briefly explain its main idea.
Details
The current lfilter disables gradients during the backward pass so it can use in-place operations to compute the gradient efficiently (see audio/torchaudio/csrc/lfilter.cpp, line 182 at 0fbfca5).
But in order to support higher order derivatives, the backward pass should also be differentiable, so that we can create a new graph.
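To make this concrete, a second-order derivative is obtained by differentiating the result of the first backward, which is only possible if that backward was itself recorded as a graph via create_graph=True. A minimal sketch of the intended usage (the shapes, coefficients, and the clamp=False argument are my own choices, not taken from the PR):

```python
import torch
import torchaudio.functional as F

x = torch.randn(1, 1, 16, dtype=torch.double, requires_grad=True)
a = torch.tensor([1.0, -0.2], dtype=torch.double, requires_grad=True)
b = torch.tensor([0.5, 0.3], dtype=torch.double, requires_grad=True)

y = F.lfilter(x, a, b, clamp=False)

# First-order gradient, kept differentiable so it can be differentiated again.
(grad_a,) = torch.autograd.grad(y.sum(), a, create_graph=True)

# Second-order gradient: requires every op in the backward pass to be differentiable.
grad_a.sum().backward()
print(a.grad)
```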
Decomposition of lfilter
We can break down the computation of lfilter into two parts: a non-autoregressive part (called FIR from here on) and an autoregressive part (called IIR from here on). The FIR part involves the parameter b_coeffs and can be done with a single conv1d call, which natively supports autograd; the IIR part involves the parameter a_coeffs and uses in-place operations to speed up the computation, so a custom autograd function is needed. We will focus on the latter part and leave the former to PyTorch's native autograd mechanism.
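A rough pure-Python sketch of this decomposition (my own illustration, not the actual C++ kernels; the names fir_part, iir_part, and lfilter_sketch are made up, and it assumes 1-D coefficients, zero initial conditions, and a (batch, channel, time) waveform):

```python
import torch
import torch.nn.functional as F

def fir_part(x, b):
    # FIR with b_coeffs: a single conv1d over the time axis (conv1d is a cross-correlation,
    # so the kernel is b flipped). This is the part that already supports autograd natively.
    n_order = b.numel()
    kernel = b.flip(0).view(1, 1, -1).expand(x.shape[1], 1, -1).contiguous()
    return F.conv1d(F.pad(x, (n_order - 1, 0)), kernel, groups=x.shape[1])

def iir_part(x, a):
    # IIR with a_coeffs: y[n] = x[n] - sum_{k>=1} a[k] * y[n-k] (a[0] assumed normalized to 1).
    # Written as a naive loop here; the real kernel uses in-place updates, which is why it
    # needs a hand-written (and, in this PR, differentiable) autograd function.
    ys = []
    for n in range(x.shape[-1]):
        acc = x[..., n]
        for k in range(1, min(a.numel(), n + 1)):
            acc = acc - a[k] * ys[n - k]
        ys.append(acc)
    return torch.stack(ys, dim=-1)

def lfilter_sketch(x, a, b):
    # Normalize by a[0], run FIR then IIR, mirroring the lfilter_core shown earlier in the thread.
    return iir_part(fir_part(x, b / a[0]), a / a[0])
```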
Some Facts About Gradient of IIR
1. The gradient with respect to the IIR input (i.e. the FIR output, here called waveform) equals IIR filtering the gradient with respect to the IIR output in the reversed time direction with the same a_coeffs.
2. The derivative of the IIR output with respect to a_coeffs equals IIR filtering the opposite of the IIR output with the same a_coeffs.
3. The gradient with respect to a_coeffs equals convolving (2) with the gradient with respect to the IIR output using conv1d.
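Fact (1) can be checked numerically with a small self-contained script (my own illustration, assuming a single 1-D signal and a_coeffs already normalized so that a[0] equals 1):

```python
import torch

def iir(x, a):
    # y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], zero initial conditions, a[0] assumed to be 1
    ys = []
    for n in range(x.numel()):
        acc = x[n]
        for k in range(1, min(a.numel(), n + 1)):
            acc = acc - a[k] * ys[n - k]
        ys.append(acc)
    return torch.stack(ys)

torch.manual_seed(0)
a = torch.tensor([1.0, -0.5, 0.2], dtype=torch.double)
x = torch.randn(32, dtype=torch.double, requires_grad=True)
grad_y = torch.randn(32, dtype=torch.double)

# Gradient of the IIR w.r.t. its input, as computed by autograd through the naive loop.
(grad_x,) = torch.autograd.grad(iir(x, a), x, grad_y)

# Fact (1): the same thing, obtained by IIR filtering grad_y in the reversed time direction.
grad_x_ref = iir(grad_y.flip(0), a).flip(0)
print(torch.allclose(grad_x, grad_x_ref))  # True
```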
Recursive Differentiable Backward Pass
How do we make the backward pass of IIR differentiable? Based on the previous part, we can see that the backward pass consists of two calls to IIR and one call to conv1d. Because IIR is now differentiable, it can simply call itself twice in its own backward pass, with gradients enabled. In this way the backward computational graph can be created.
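A runnable 1-D Python sketch of that recursion (my own illustration; the PR implements DifferentiableIIR in C++, handles batches and channels, and computes fact (3) with a conv1d call rather than the explicit sum used here). Because the backward below is built only from DifferentiableIIR.apply and ordinary differentiable tensor ops, autograd can record the backward itself when create_graph=True and differentiate it again:

```python
import torch

def iir_loop(x, a):
    # Naive recursion y[n] = (x[n] - sum_{k>=1} a[k] * y[n-k]) / a[0], zero initial conditions.
    ys = []
    for n in range(x.numel()):
        acc = x[n]
        for k in range(1, min(a.numel(), n + 1)):
            acc = acc - a[k] * ys[n - k]
        ys.append(acc / a[0])
    return torch.stack(ys)

class DifferentiableIIR(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, a):
        with torch.no_grad():              # stand-in for the fast in-place kernel
            y = iir_loop(x, a)
        ctx.save_for_backward(a, y)
        return y

    @staticmethod
    def backward(ctx, grad_y):
        a, y = ctx.saved_tensors
        n, order = y.numel(), a.numel()
        # Fact (1): grad w.r.t. the input is the same IIR run over grad_y in reversed time.
        grad_x = DifferentiableIIR.apply(grad_y.flip(0), a).flip(0)
        # Fact (2): the derivatives w.r.t. every a[k] share the signal w = IIR(-y, a).
        w = DifferentiableIIR.apply(-y, a)
        # Fact (3): grad w.r.t. a[k] is the correlation of grad_y with w at lag k
        # (an explicit sum here; the PR uses conv1d for this step).
        grad_a = torch.stack([(grad_y[k:] * w[: n - k]).sum() for k in range(order)])
        return grad_x, grad_a

x = torch.randn(10, dtype=torch.double, requires_grad=True)
a = torch.tensor([0.9, -0.3, 0.2], dtype=torch.double, requires_grad=True)
torch.autograd.gradcheck(DifferentiableIIR.apply, (x, a))
torch.autograd.gradgradcheck(DifferentiableIIR.apply, (x, a))   # higher order works
```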
Additional context
To verify the above changes, I added a gradgradcheck to the unittests, and also lowered the size of the test inputs because the CI runtime is getting longer and longer.
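Roughly the kind of check the new unittest performs (the shapes, coefficients, and the clamp=False choice here are mine; the actual test code may differ):

```python
import torch
import torchaudio.functional as F

x = torch.randn(1, 2, 16, dtype=torch.double, requires_grad=True)
a = torch.tensor([0.7, -0.2, 0.1], dtype=torch.double, requires_grad=True)
b = torch.tensor([0.4, 0.2, 0.9], dtype=torch.double, requires_grad=True)

fn = lambda x, a, b: F.lfilter(x, a, b, clamp=False)
assert torch.autograd.gradcheck(fn, (x, a, b))        # first order, as before
assert torch.autograd.gradgradcheck(fn, (x, a, b))    # second order, new with this PR
```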