New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the derivative of a _diff_wrt object with respect to itself, 1 #14221

Closed
wants to merge 4 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@varkor

varkor commented Feb 15, 2018

This fixes #10150. Now taking the derivative of an object with _diff_wrt = True, with respect to itself, returns 1 rather than the unevaluated derivative.

@jaredkhan

This comment has been minimized.

jaredkhan commented Feb 15, 2018

@varkor Build failed, looks like this was due to an internet connection failure so worth a retest

@varkor

This comment has been minimized.

varkor commented Feb 15, 2018

Yeah, the failure seems spurious to me. I'll try to retrigger the test by closing and reopening the PR.

@varkor varkor closed this Feb 15, 2018

@varkor varkor reopened this Feb 15, 2018

@varkor varkor force-pushed the varkor:_diff_wrt_itself branch from 0a4f4c1 to ed9d895 Feb 15, 2018

@moosichu

This comment has been minimized.

moosichu commented Feb 17, 2018

I'm keen to see how this progresses!

@@ -1213,6 +1213,10 @@ def __new__(cls, expr, *variables, **kwargs):
# derivative class by calling Expr.__new__.
if (not (hasattr(expr, '_eval_derivative') and evaluate) and
(not isinstance(expr, Derivative))):
# If we try to differentiate a variable with respect to itself,
# the result should always be 1.

This comment has been minimized.

@Upabjojr

Upabjojr Feb 20, 2018

Contributor

Well, always is not correct. Think about matrices.

This comment has been minimized.

@varkor

varkor Feb 20, 2018

While it's true that, in a general mathematical basis, there are concepts called differentiation for which it is not the case that d/dx (x) == 1, it is consistent with SymPy's usage. For example, diff(2 * x, x) == 2, even if x is completely unknown — which, likewise, is not valid for matrices. On top of this, SymPy can't (currently) even differentiate matrices with respect to other matrices, so this particular example probably isn't going to cause any issues directly.

I agree in general, but SymPy makes enough assumptions here that I think it's fair to interpret the comment in the context of the diff function.

This comment has been minimized.

@asmeurer

asmeurer Feb 20, 2018

Member

On top of this, SymPy can't (currently) even differentiate matrices with respect to other matrices, so this particular example probably isn't going to cause any issues directly.

This is something we are trying to implement, though.

Matrices are going to have to define _eval_derivative anyway. So the question is should we also require every special _diff_wrt object to define this as well, or should we implement the common case here so that it isn't required for most things (at least, anything whose derivative is a scalar).

This comment has been minimized.

@varkor

varkor Feb 20, 2018

So the question is should we also require every special _diff_wrt object to define this as well, or should we implement the common case here so that it isn't required for most things (at least, anything whose derivative is a scalar).

Yeah — this comes down to a decision decision: treating values as scalars by default or not. Reasonably either this case should be handled, or all the current scalar assumptions should be removed.

@@ -1213,6 +1213,10 @@ def __new__(cls, expr, *variables, **kwargs):
# derivative class by calling Expr.__new__.
if (not (hasattr(expr, '_eval_derivative') and evaluate) and

This comment has been minimized.

@asmeurer

asmeurer Feb 20, 2018

Member

I guess this logic was already here, but this seems like a bit of a code smell to me. I think this logic would be better implemented as a default method on the base class, and a custom method on Derivative. Then Python's inheritance would take care of which logic gets used automatically. It also makes it easy to extend the logic because a subclass could implement _eval_derivative and still call the superclass's version.

This also clarifies the other discussion about whether a scalar default makes sense. Virtually all of the Expr methods assume scalars, so it's very reasonable for Expr's default _eval_derivative to assume a scalar output (i.e., 1).

This doesn't necessarily need to be changed here, because like I said, it is already here, but it is something that I think should be done.

This comment has been minimized.

@varkor

varkor Feb 22, 2018

I've altered the technique now. Instead of special-casing in Derivative.__new__, I've moved the "cannot compute derivative" check and return to after dummy symbols are substituted. This has the effect of giving the correct result when differentiating _diff_wrt objects with respect to themselves, because they now inherit the AtomicExpr._eval_derivative_n_times method, which has this behaviour.

Adding a default _eval_derivative caused the test_diff_no_eval_derivative test to fail, which might have been expected, but I didn't feel familiar enough with the desired semantics there to make a breaking change. I'm still not completely certain this transformation is correct (let's see what the tests say), but if it is, then it seems like a cleaner change, without introducing an extra special case where it doesn't belong.

Does this method seem cleaner to you?

Late exit on no _eval_derivative
We now only return an unevaluated derivative if we could not differentiate with respect to any variables after dummy replacement.

@varkor varkor force-pushed the varkor:_diff_wrt_itself branch from 2f2739e to 569342b Feb 22, 2018

@varkor

This comment has been minimized.

varkor commented Mar 13, 2018

@asmeurer: what's your take on the new method to fix this?

@varkor

This comment has been minimized.

varkor commented Apr 29, 2018

@asmeurer (or anyone else): any update on the state of this PR?

@Upabjojr

This comment has been minimized.

Contributor

Upabjojr commented Apr 30, 2018

I'm not very convinced by this PR. I would leave an unevaluated derivative unless the expression is real or complex (check, I'm not even sure about that).

@varkor

This comment has been minimized.

varkor commented Apr 30, 2018

@Upabjojr: could you explain what you mean a bit more? I don't see how the change makes any difference to the behaviour of expressions that are real/complex vs not.

@Upabjojr

This comment has been minimized.

Contributor

Upabjojr commented Apr 30, 2018

A generic expression should not be evaluated if derived by itself, because you don't know what the expression is or what it represents.

@varkor

This comment has been minimized.

varkor commented Apr 30, 2018

@Upabjojr: oh, I see — you don't agree with #10150? I think it makes sense, as in the vast majority of cases this identity holds (the example given of a case where this is not true for example, with matrices, is obscure — for any scalar-type variable, this holds). If a type for which this identity does not hold is defined, then this behaviour will regardless be overridden. I don't see an issue with providing a more sensible default (especially as this PR specifically simply makes a very subtle modification to the algorithm — the underlying rules governing this behaviour already existed).

@Upabjojr

This comment has been minimized.

Contributor

Upabjojr commented Apr 30, 2018

I'm not so sure about that design pattern, i.e. overriding a method. People may forget to do that when they inherit a class.

@varkor

This comment has been minimized.

varkor commented Apr 30, 2018

@Upabjojr: how are you expecting people to define differentiation on their custom class other than overriding a method? Surely without overriding any methods, the behaviour of differentiation is just going to be the default (i.e. scalar differentiation), in which case the behaviour this PR enables is more consistent.

@asmeurer

This comment has been minimized.

Member

asmeurer commented Apr 30, 2018

@Upabjojr so do you think the original issue is invalid? I guess a counterexample is Derivative(x, x) where x is a matrix, which should be an identity matrix (yes?), not the scalar 1.

I think there's two ways we can think of the defaults on Derivative. One is to always be unevaluated, except in cases where where something is definitional. The other is to give behavior that is right for 90% of classes that define derivatives, and require the other 10% to override more methods to get what it wants. So the question is, should we assume that diff(t, t) = 1, which is probably one of those 90% cases. The advantage of the first way is that you never get a wrong result if you forget to override a method. The advantage of the second way is that most classes don't need to override any methods at all (just setting _diff_wrt = True is enough).

Maybe the best solution is to go with the first way and use mixins if there are common method redefinitions.

@varkor

This comment has been minimized.

varkor commented Apr 30, 2018

I've never personally used custom derivative classes, so I'll defer to those who have. Going with the first option here would require changing the existing behaviour (e.g. which currently treats diff(2 * x, x) == 2) though, if I understand correctly?

(Side note: such a change seems like it would have quite large implications — what's the policy on breaking changes like this?)

@asmeurer

This comment has been minimized.

Member

asmeurer commented Apr 30, 2018

For a matrix 2*x would itself be a matrix object. I'm not sure if there are cases where 2*x is a Mul but we want diff(2*x, x) to be something other than 2. I would worry about it only if such a thing comes up.

@varkor

This comment has been minimized.

varkor commented Apr 30, 2018

@asmeurer: 2 * x is a matrix, but the derivative of a matrix with respect to another matrix is a fourth-rank tensor, not a scalar (which was the problem with just x too). As far as I can tell, any decision that affects x also affects 2 * x, but they're currently not treated consistently.

@Upabjojr

This comment has been minimized.

Contributor

Upabjojr commented May 1, 2018

We don't have a proper way to handle diff(x, x) when x is a matrix. It can be represented with Kroneecker deltas in the indexed form. A generalization of the identity matrix to higher dimensional arrays would be necessary.

@varkor

This comment has been minimized.

varkor commented May 1, 2018

@Upabjojr: SymPy might not currently support differentiation with respect to matrices, but the point was that someone writing a custom class could do something like that. If matrices aren't an issue, then what other examples do you know of where the identity diff(x, x) == 1 does not hold?

Regardless, my point is that diff(2x, x) == 2 (currently behaviour) is consistent with diff(x, x) == 1 and that either one needs to change. I'm personally in favour of making it more convenient for 95% of users writing their own classes and provide a sensible default. You can see this PR changes very little code to achieve this — the logic is already there. I think this is a reasonable change and if you disagree with the current defaults, that's arguably a different issue, because SymPy already makes these assumptions. It's just not thorough enough.

@asmeurer

This comment has been minimized.

Member

asmeurer commented May 1, 2018

If you make diff(x, x) unevaluated, then does it change diff(2*x, x)? I think it might, because Mul._eval_derivative just implements the naive product rule, so it would be the same as diff(2, x)*x + 2*diff(x, x).

@varkor

This comment has been minimized.

varkor commented May 1, 2018

@asmeurer: I'm not sure I understand. diff(x, x) is unevaluated in the current version of SymPy (before this PR), but at the same time diff(2 * x, x) == 2.

Mul._eval_derivative just implements the naive product rule, so it would be the same as diff(2, x)*x + 2*diff(x, x).

That's what one would expect, but it's not what currently happens :/

The differentiation code is a little hard to follow, but what currents happens (as in your comment in the original issue) is that, for a default class Test(Expr): diff_wrt_ = True object:

diff(t, t) ==
   d
───────(Test())
dTest()

diff(2 * t, t) ==
2

So even if diff(t, t) is unevaluated, diff(2 * t, t) is not.

So really the question is whether we should make diff(t, t) == 1 or diff(2 * t, t) == 2 d/dTest() (Test()) by default.

Considering the majority of cases are going to want scalar differentiation, and if someone doesn't want scalar differentiation, they're going to be overriding differentiation regardless, I don't really see a strong reason not to want the former behaviour.

@asmeurer

This comment has been minimized.

Member

asmeurer commented May 1, 2018

Ah I see. It looks like it's because it gets replaced with a Dummy here before the actual differentiation takes place.

I guess the current definition of _diff_wrt is based on substituting it with a Symbol (the original design was to take derivatives with respect to functions like diff(F(x, f(x)), f(x))).

I'm not sure what the right fix here. Probably the substitution bit should be abstracted out in a method that can also be overridden by the wrt object.

@Upabjojr

This comment has been minimized.

Contributor

Upabjojr commented May 2, 2018

I am still favorable to the unevaluated differentiation. But if you really insist we can go your way.

SymPy might not currently support differentiation with respect to matrices

I have added some support for that at the end of last year, though it's probably still not complete.

@asmeurer

This comment has been minimized.

Member

asmeurer commented May 4, 2018

Matrix differentiation is definitely on the roadmap.

I think either way could be argued, though I'm leaning towards @Upabjojr's side that the default should just be unevaluated expressions. I think it should be possible to fix the diff(2*t, t) thing but I'm not sure how.

By the way, another thing to consider is performance. We've already seen some of these derivative abstractions affect the performance of regular derivatives (#14674).

@varkor

This comment has been minimized.

varkor commented May 15, 2018

Sorry, I haven't had time to look at this for a little while!

@asmeurer: okay — I can probably get around to fixing the diff(2 * t, t) issue at some point, but not for a few weeks.

In the meantime, feel free to merge or close this PR: it's not going to alter performance, but adds consistency for now, before the default method is changed. If you'd rather not merge a temporary fix, though, that's fine too. I'll open up an issue (edit: #14719) for the new derivative behaviour, so that it's at least documented.

@moorepants

This comment has been minimized.

Member

moorepants commented Oct 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment