Refactor PartialDerivatives to contain single partial #413

Merged: 65 commits into develop, Jan 2, 2019

Conversation

gordoncaleb (Contributor):

This PR is quite a big refactor that contains few functional changes but significantly decreases the complexity of the forward and reverse mode auto diff logic. The changes include:

  • PartialDerivatives has been renamed to PartialDerivative and no longer contains multiple partial derivatives.

  • The way forward mode auto diff is calculated has been changed so that it never needs to store more than a single partial derivative with respect to anything at a time.

  • Operations that support broadcasting operands (e.g. +, -, *, /) are now responsible for correcting shape changes caused by that broadcast. This removes a lot of complex edge-case handling from the partial multiply and from the reverse mode autodiff algorithm itself.

  • The Differentiator reverse mode method now returns a PartialsOf object that contains a collection of partials with respect to many inputs but always of a single output. For example, Differentiator.reverseModeAutoDiff(A, B, C) gets the derivative of A with respect to both B and C. The returned object exposes a withRespectTo(...) method that returns the derivative of A with respect to B or C (see the usage sketch after this list).

  • The Differentiator forward mode method now returns a PartialsWithRespectTo object that, like PartialsOf, contains a collection of partials, but these are all with respect to the same input and of many outputs. For example, Differentiator.forwardModeAutoDiff(A, B, C) gets the derivatives of B and C with respect to A. The returned object exposes an of(...) method that returns the derivative of a single B or C with respect to A.

  • Differentiable::getDerivativeWrtLatents used to be backed by forward mode auto diff and was used quite heavily in our tests. It has been removed and replaced with direct calls to the forward mode algorithm (Differentiator::forwardModeAutoDiff).

  • Some operations have had their forward/reverse mode AD code refactored to reflect new guarantees provided by the fact that there is only a single partial derivative to deal with at a time.
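As a rough usage sketch of the two entry points described above: A, B, C are illustrative vertices; withRespectTo(...) returning a DoubleTensor reflects the final shape agreed in the review below, and the return type of of(...) is an assumption.

// Reverse mode: partials of a single output (A) with respect to many inputs (B, C).
PartialsOf dA = Differentiator.reverseModeAutoDiff(A, B, C);
DoubleTensor dAdB = dA.withRespectTo(B);
DoubleTensor dAdC = dA.withRespectTo(C);

// Forward mode: partials of many outputs (B, C), all with respect to a single input (A).
PartialsWithRespectTo dWrtA = Differentiator.forwardModeAutoDiff(A, B, C);
DoubleTensor dBdA = dWrtA.of(B);  // return type of of(...) is assumed here
DoubleTensor dCdA = dWrtA.of(C);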

@GeorgeNash (Contributor) left a comment:

Nice refactor, man. A few comments.

public static PartialDerivative matrixMultiplyAlongOfDimensions(PartialDerivative partial, DoubleTensor multiplier, boolean partialIsLeft) {

    if (partial.isEmpty()) {
        return partial;
Contributor: Why does this return partial and not this?

Author: Because it's static and there is no this. It's the same concept though.

Contributor: Nice, I missed that it was static.

Contributor: Any reason why these two methods are static when all the others aren't?

Author: They're static in order to allow the second tensor in the multiply to be a DoubleTensor. The other option was to make it a method on the partial but keep the partialIsLeft flag. That gives you an API where A.matrixMultiplyAlongOfDimensions(B, false) reads as AB but, because of the flag, is actually BA, which I thought was too confusing. If you make both A and B a PartialDerivative, then you have to create an extra PartialDerivative for each multiply in order to account for AB and BA, because either A or B will be a plain old DoubleTensor. I've taken another look at this today and there might be a nicer way to organise this, but all the alternatives are some combination of less clear and less performant. We can revisit this when we graph-ify everything.
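As a rough illustration of the trade-off described above, assuming the static methods live on PartialDerivative (the call sites below are hypothetical):

// partial is a PartialDerivative, multiplier is a plain DoubleTensor.
// partialIsLeft = true  -> result corresponds to partial x multiplier
// partialIsLeft = false -> result corresponds to multiplier x partial
PartialDerivative dOutLeft = PartialDerivative.matrixMultiplyAlongOfDimensions(partial, multiplier, true);
PartialDerivative dOutRight = PartialDerivative.matrixMultiplyAlongOfDimensions(partial, multiplier, false);

Keeping the method static avoids the instance-method form A.matrixMultiplyAlongOfDimensions(B, false), which reads as AB but would compute BA.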

Contributor: FWIW, I like this explanation.

@migwellian (Contributor) left a comment:

Good stuff. My comments are very minor.


@christophernorth (Contributor) left a comment:

Lots cleaner and easier to navigate now, IMO. Just a few small comments scattered around and potentially one bug (due to our confusingly named Tensor functions ;-)


/**
 * This class is meant to help with auto diff in operations that support implicit broadcasting. E.g. in
 * addition/subtraction/multiplication/division, scalar operands can be operated with non-scalar operands.
Contributor: Obviously it's not just scalars that can be broadcast and lead to implicit partial changes; it's also compatible Tensors. Presumably, given the naming, that's still a bug we have to fix at some point?

Author: Those non-scalar broadcasts aren't supported yet and are strictly prohibited by the shape check in the operations that could broadcast. When we add that support, this code will need to be tweaked to handle the new cases.
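For illustration only, the kind of shape guard being described might look like the sketch below; the method name, message, and exact placement are assumptions, not the library's actual check.

import java.util.Arrays;

// Allow only matching shapes or a length-one (scalar) operand; reject general broadcasting for now.
static void checkOperandShapes(long[] left, long[] right) {
    boolean leftIsScalar = Arrays.stream(left).reduce(1L, (a, b) -> a * b) == 1;
    boolean rightIsScalar = Arrays.stream(right).reduce(1L, (a, b) -> a * b) == 1;
    if (!leftIsScalar && !rightIsScalar && !Arrays.equals(left, right)) {
        throw new IllegalArgumentException("Only scalar broadcast is currently supported");
    }
}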

Contributor: Cool stuff. If you see that as a bigger refactor, no bother. I wonder if there's anything we can do to mark this as something that will need fixing if we add broadcasting properly, so it isn't missed?

if (existingPartialDerivative == null) {
    partials.put(id, entry.getValue().duplicate());
} else {
    existingPartialDerivative.plusInPlace(entry.getValue());
Contributor: Don't we have a potential bug hiding here now? plusInPlace sometimes doesn't actually do things in place...

Author: At the moment it's impossible for the existing partial to have a different shape to the next partial; all partials at this point are the correct deterministic shape [of, wrt]. That said, the inPlace contract doesn't guarantee the object will always be the same, so I've gone with the double put to make sure this is never an issue.
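A minimal sketch of the double put being described, assuming plusInPlace returns the resulting partial (names taken from the snippet above):

// Put the result back either way, so correctness doesn't depend on plusInPlace
// actually mutating the existing object rather than returning a new one.
if (existingPartialDerivative == null) {
    partials.put(id, entry.getValue().duplicate());
} else {
    partials.put(id, existingPartialDerivative.plusInPlace(entry.getValue()));
}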

Contributor: Better to be safe than sorry, definitely!


final Map<VertexId, DoubleTensor> tensorMap = new HashMap<>();

for (Map.Entry<VertexId, PartialDerivative> entry : partials.entrySet()) {
    tensorMap.put(entry.getKey(), entry.getValue().get());
Contributor: Given this is in the hot path of reverse mode auto-diff, I'm not sure I love having to do this translation from one map type to another. It just seems to be adding busy work to avoid other people seeing PartialDerivatives rather than DoubleTensors?

Author: I tried avoiding this earlier but gave up. I've now refactored it to do the conversion at the point the DoubleTensor is used, which avoids the double iteration and the new object creation.
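Roughly, the change amounts to something like the sketch below (inputVertexId and the surrounding call site are hypothetical):

// Before: eagerly copy the whole map just to expose DoubleTensors.
final Map<VertexId, DoubleTensor> tensorMap = new HashMap<>();
for (Map.Entry<VertexId, PartialDerivative> entry : partials.entrySet()) {
    tensorMap.put(entry.getKey(), entry.getValue().get());
}

// After: keep the Map<VertexId, PartialDerivative> and unwrap only where the
// DoubleTensor is actually needed.
DoubleTensor dOutputWrtInput = partials.get(inputVertexId).get();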

DoubleIfVertex outputVertex = (DoubleIfVertex)complexNet.getVertexByLabel(new VertexLabel(OUTPUT_NAME));
DoubleVertex inputVertex = (DoubleVertex)complexNet.getVertexByLabel(new VertexLabel(INPUT_NAME));
DoubleIfVertex outputVertex = (DoubleIfVertex) complexNet.getVertexByLabel(new VertexLabel(OUTPUT_NAME));
DoubleVertex inputVertex = (DoubleVertex) complexNet.getVertexByLabel(new VertexLabel(INPUT_NAME));
Contributor: Should we just cast this straight to the proper vertex type to avoid having to do the cast in the Differentiator below?

Author: Nice cleanup.

@@ -0,0 +1,40 @@
package io.improbable.keanu.vertices.dbl;
Contributor: Do you think it would be useful to wrap this as a benchmark so we can monitor changes to this perf?

Author: This tests "efficiency", which just means it's not doing any more calls to autodiff than needed. I think the benchmarks should try quite a few more complex graphs that vary in length, depth, width, etc. The benchmarks could use this for inspiration, but I don't think there's much benefit to coupling the benchmarks and this unit test.

DoubleTensor dCdA = dC.withRespectTo(A);
DoubleTensor dCdB = dC.withRespectTo(B);
DoubleTensor dCdA = dC.withRespectTo(A).get();
DoubleTensor dCdB = dC.withRespectTo(B).get();
Contributor: Bit ugly that we have to call this get() method all over the place?

Author: If you want it as a DoubleTensor, then yes. We could call it toDoubleTensor(), or we could just return the DoubleTensor from the withRespectTo(...) method. It doesn't look like withRespectTo(...) is ever used to get the PartialDerivative.

Author: Having withRespectTo(...) return a DoubleTensor removes all of these get() calls. Nice cleanup.

@christophernorth (Contributor) left a comment:

LGTM. Thanks for the changes.

gordoncaleb merged commit 506a81f into develop on Jan 2, 2019.
gordoncaleb deleted the feature/single-partial branch on January 2, 2019 at 17:53.