Compare task input properties as binary #962

lacasseio · 2016-12-02T20:08:17Z

There are various use cases where user-defined Serializable task input
properties caused a ClassNotFoundException (#784). This is
solved by hashing the task input properties and saving those hashes for
the up-to-date check during future build sessions (#919).
This new behavior deprecates the equals method on any user-defined type
used as a task input property.

Open Issues

How do we announce the breaking change regarding equals method on user-defined types?

adammurdoch

We can't just silently ignore equals() on custom types, as this is a serious breaking change that people won't notice. Things will just silently sometimes be out-of-date.

To deal with this, for types (or every type) that have a custom equals() implementation, we should store the serialized object (not the hash). When we need to compare 2 values, we would deserialise the value using the new value's Classloader and compare both the binary form and call equals(). We would warn the user when these give different results, but honour the result of equals().

In 5.0 we would stop using equals() and use the binary form only.
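
As a rough illustration of the "deserialise the value using the new value's Classloader" step, here is a minimal sketch built on plain Java serialization. The class name `LoaderAwareSerialization` and its methods are hypothetical and are not Gradle's implementation; the only technique assumed is overriding `ObjectInputStream.resolveClass` so that classes are looked up through a caller-supplied loader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

// Hypothetical helper for illustration only; not part of Gradle.
class LoaderAwareSerialization {

    // Serialize a value with plain Java serialization.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Deserialize, resolving classes through the supplied loader,
    // for example newValue.getClass().getClassLoader().
    static Object deserialize(byte[] data, ClassLoader loader) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new LoaderAwareInputStream(new ByteArrayInputStream(data), loader)) {
            return in.readObject();
        }
    }

    private static final class LoaderAwareInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderAwareInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
            try {
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                // Fall back to the default lookup, which also handles primitive types.
                return super.resolveClass(desc);
            }
        }
    }
}
```

With something like this, the previous value would be read back with the loader of the value from the current build, so a type declared in a changed build script resolves against its current definition.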

adammurdoch · 2016-12-07T20:55:46Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+
+            println "Flushing InMemoryTaskArtifactCache"
+            gradle.taskGraph.whenReady {
+                gradle.services.get(InMemoryTaskArtifactCache).invalidateAll()


We shouldn't use internal API in these tests. Instead, we should reproduce the issue in the same way a user would, which is to change the classpath for the build script and then run the build again. The simplest option is to simply change the build script to add/remove/change some code.

I tried appending code to the build script, among other similar changes, but I couldn't reproduce what InMemoryTaskArtifactCache.invalidateAll does. In this case, I want to force Gradle to grab what is on disk and deserialize everything again in order to catch the issues where the wrong class loader is used. I copied these lines over from a previous test.

We could use the --no-daemon flag; however, I wasn't able to make it work correctly. This issue usually happens when the cache is flushed by the daemon, and I haven't found a trivial, deterministic way to trigger that.

adammurdoch · 2016-12-07T20:59:16Z

...e/src/main/java/org/gradle/api/internal/changedetection/state/InputPropertiesSerializer.java


    InputPropertiesSerializer(ClassLoader classloader) {
-        this.serializer = new MapSerializer<String, Object>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<Object>(classloader));
+        this.serializer = new MapSerializer<String, HashCode>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<HashCode>(classloader));


We don't need to use DefaultSerializer any more, as it uses java serialisation which is really inefficient. We should use the more efficient serialisation we use elsewhere for hashes, implemented in HashCodeSerializer.

lacasseio · 2016-12-08T20:10:53Z

Thanks @adammurdoch for the very good code review. I'm thinking of implementing a wrapper for the input property to take care of the logic for choosing the binary form vs the actual Object serialization/deserialization.

My only concern is how to correctly detect that a type chain overrides the equals method. For example, the String class overrides it, but that is fine, so we shouldn't emit a warning or care about it. Suppose class Foo doesn't override the equals method but its parent class, which happens to be another custom type, does override it: how can we make sure we detect this case? Also, if the parent chain goes all the way up to a class outside of the user's control, how do we deal with this case?

Because of this concern, I'm leaning toward wrapping the input property in a class that would deserialize as needed using the classloader of the object we are trying to compare. It was one of the solutions we discussed.

What do you think should be the way to go given my concern above?

adammurdoch · 2016-12-08T22:21:55Z

We don't really care whether equals() is overridden or not, and who overrides it. What we care about is the case where a.equals(b) and hash(a).equals(hash(b)) return a different value. So, we could just do this check for all values regardless of their equals() implementation.

For certain types, we already know that a.equals(b) and hash(a).equals(hash(b)) return the same result for all values: instances of String, Number, File, collections and maps (but not arrays) of these things. So for these types we can short-circuit the deserialization, which is expensive both in time and memory usage and just store and compare the hash.

A wrapper to take care of the type specific behaviour makes a lot of sense here.

Thinking about the overridden equals() case, it actually wouldn't make sense to use the default Object.equals() for a value you are using as a task input, because the task would then always be out-of-date. So let's not bother checking whether equals() is overridden or not, as it almost always will be.

adammurdoch · 2016-12-08T22:23:22Z

Actually, collections and maps don't always return the same value for equals and comparing the hash. Only those that maintain order do.
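
A minimal sketch of the short-circuit rule described in the two comments above: String, Number and File values, plus ordered collections and maps of them, are treated as safe to compare by hash alone, while arrays and unordered collections are not. The class name and the choice of `List`, `SortedSet` and `SortedMap` as "collections that maintain order" are assumptions made for illustration; whether a particular implementation's serialized form really tracks equals() (for example an ArrayList against a LinkedList) would still need to be verified.

```java
import java.io.File;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;

// Hypothetical sketch; the name and the exact set of accepted types are assumptions.
class BinaryComparableTypes {

    // True when the value can be compared by hash alone under the rule from the thread.
    static boolean isBinaryComparable(Object value) {
        if (value instanceof String || value instanceof Number || value instanceof File) {
            return true;
        }
        if (value instanceof List || value instanceof SortedSet) {
            // Ordered collections qualify only if every element qualifies.
            for (Object element : (Iterable<?>) value) {
                if (!isBinaryComparable(element)) {
                    return false;
                }
            }
            return true;
        }
        if (value instanceof SortedMap) {
            for (Map.Entry<?, ?> entry : ((SortedMap<?, ?>) value).entrySet()) {
                if (!isBinaryComparable(entry.getKey()) || !isBinaryComparable(entry.getValue())) {
                    return false;
                }
            }
            return true;
        }
        // Arrays, unordered collections, custom types and null take the slower path.
        return false;
    }
}
```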

There are various use cases where user-defined Serializable task input properties caused a ClassNotFoundException (#784). This is solved by hashing the task input properties and saving those hashes for the up-to-date check during future build sessions (#919). This new behavior deprecates the equals method on any user-defined type used as a task input property.

The wrapper object handles the serialization as well as the deserialization of the task input property. It also chooses between a binary-only implementation and a full-type implementation for equality.

lacasseio

The requested changes have been made, except for the cache invalidation. I would need some help to fix that correctly.

lacasseio · 2016-12-27T19:43:27Z

As discussed, the implementation wraps the task input properties inside a class that either saves only the hash of the property or also saves the serialized form of the property object. During the equality check, the property from the previous execution is deserialized and compared with its matching property from the current run. If the hash comparison and the Objects.equals result differ, a deprecation warning is printed.

lacasseio · 2016-12-28T14:30:40Z

@bmuschko Would you mind having a look at this PR for any low-hanging fixes, to let @adammurdoch focus on the higher-level review of the fix?

bmuschko · 2016-12-29T15:04:24Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+            enum FooType { FOO }
+
+            rootProject {
+                afterEvaluate {


Why do we need afterEvaluate here?

You are right. Replaced with tasks.matching.

bmuschko · 2016-12-29T15:05:57Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+
+        given:
+        initScript << """
+            enum FooType { FOO }


I believe enums still have a problem when used as input property. Maybe we can just use class FooType implements Serializable.

I'm not sure which problem you are referring to regarding enums as input properties. This test does fail without the fix and succeeds after it. Could you elaborate on this?

We used to have an issue with enums for input properties (see https://issues.gradle.org/browse/GRADLE-3537 and https://issues.gradle.org/browse/GRADLE-3018). Even though these issues are fixed, @eriwen recently mentioned that we have some sort of issue lurking. Maybe he can elaborate on it.

I agree they are linked to what this PR is trying to fix. One issue got fixed by previous work that addressed buildSrc custom types. This PR goes one step further and addresses custom types declared in the build.gradle script. Using either an enum or a Serializable class will give the same result. I chose an enum as it was more compact.

Would you want me to add an additional test specifically for Serializable class?

An enum implements Serializable. So in general I'd say: "not needed". However, given the history of issues we had with enums, I'd like to see another test with a class that explicitly implements Serializable.

bmuschko · 2016-12-29T15:26:07Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+
+            task createFile(type: MyTask) {
+                ext.outputFile = file('output.txt')
+                outputs.file(outputFile)


Let's define the output as property in MyTask + annotation as well.

Good point, done.

bmuschko · 2016-12-29T15:27:03Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+
+                foo = FooType.FOO
+
+                doLast {


Should be a @TaskAction in MyTask to have all the imperative code in the task definition.

bmuschko · 2016-12-29T15:30:06Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+        buildFile << """
+            import org.gradle.api.internal.changedetection.state.InMemoryTaskArtifactCache
+
+            println "Flushing InMemoryTaskArtifactCache"


No need for println.

Done and removed from the other test too.

bmuschko · 2016-12-29T15:38:42Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+                }
+            }
+
+            task createFile {


I'd suggest breaking out the definition of the task into a task type and the task usage. You can represent both build script snippets by a method and reuse them across multiple test cases as you see fit.

I'm not sure what the advantage of your suggestion is. Although the types have the same name, they have a different purpose. The FooType in this test exercises a custom equals implementation, which is irrelevant and possibly erroneous for the tests above. The same goes for the task type and usage. I do want to configure a task that exercises the up-to-date check, but I don't necessarily want a custom type, as the task type class loader did have an effect on the ClassNotFoundException issue we are trying to solve. I think it would be wise to leave it as is. What do you think?

I was mainly thinking about the task type here. If we can't use a task type, would it be possible to reuse some of the task code you have in place?

I'm hesitant to reuse the task code, as I have the impression it would make the code less readable in order to save 6 lines of code in total. The overhead of the generic function would be greater than the complexity and number of lines saved. I'm open to doing the work; I just can't see how to do it in a clean and useful way.

To put it differently, what was the one thing you found was harder to understand in the new test and, once understood, what could have made it easier to understand the first time?

I think we'll want to strike the right balance between reusability and readability. If you feel that it makes the test code much harder to read then we can keep as is.

- Remove technical detail from deprecation warning
- Move test helper function into abstract fixture
- Fix failure for TestReportIntegrationTest
- Simplify InputProperties.create logic

lacasseio

I pushed another commit addressing the code review.

lacasseio · 2017-01-11T13:42:13Z

...ojects/core/src/main/java/org/gradle/api/internal/changedetection/state/InputProperties.java

@@ -66,6 +67,15 @@ private static boolean isBinaryComparableProperty(Object inputProperty) {
            Object item = processingQueue.pop();

            Class cls = item.getClass();
+            try {
+                // Given the equals wasn't override, use binary comparison.


lacasseio · 2017-01-11T14:48:36Z

...ojects/core/src/main/java/org/gradle/api/internal/changedetection/state/InputProperties.java

+            try {
+                // Given the equals wasn't override, use binary comparison.
+                if (cls.getMethod("equals", Object.class).getDeclaringClass().equals(Object.class)) {
+                    continue;


Great idea and I feel the same about continue. I did a small refactor to make the code cleaner and address your comment.
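
For reference, the reflective check in the hunk above can be exercised in isolation. This is a small sketch, with hypothetical class names, of how `getMethod("equals", Object.class).getDeclaringClass()` behaves: it reports the class that actually declares the method, so an override inherited from a custom parent type is detected as well.

```java
import java.io.Serializable;

// Hypothetical standalone version of the check; not the code from InputProperties.
class EqualsOverrideCheck {

    // True when equals(Object) is declared somewhere below java.lang.Object in the hierarchy.
    static boolean overridesEquals(Class<?> type) {
        try {
            return type.getMethod("equals", Object.class).getDeclaringClass() != Object.class;
        } catch (NoSuchMethodException e) {
            // Interfaces and primitive types do not expose Object's methods through getMethod.
            throw new IllegalArgumentException("Not a class type: " + type, e);
        }
    }

    // Parent type with a custom equals().
    static class Base implements Serializable {
        @Override
        public boolean equals(Object other) {
            return other instanceof Base;
        }

        @Override
        public int hashCode() {
            return 1;
        }
    }

    // Child that inherits the custom equals() without declaring its own.
    static class Child extends Base {
    }

    // Type that keeps Object.equals().
    static class Plain implements Serializable {
    }
}
```

Here `overridesEquals(Child.class)` is true because the declaring class is `Base`, while `overridesEquals(Plain.class)` is false. The check says nothing about whether the override is under the user's control.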

lacasseio · 2017-01-11T14:50:17Z

...ore/src/test/groovy/org/gradle/api/internal/changedetection/state/InputPropertiesTest.groovy

            return 0
        }
    }

-    def "create a full type wrapper for custom Serializable input property"() {
+    static class SerializableTypeWithCustomEquals implements Serializable, Comparable<SerializableTypeWithCustomEquals> {


lacasseio · 2017-01-11T14:51:09Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

@@ -398,6 +398,7 @@ apply from:'scriptPlugin.gradle'
    }



lacasseio · 2017-01-11T14:51:55Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

@@ -544,6 +544,8 @@ apply from:'scriptPlugin.gradle'
        then:
        executedTasks == [':createFile']
        skippedTasks.empty  // The equals implementation for custom type always return false
+        outputContains "Custom equals implementation on task input properties has been deprecated and is scheduled to be removed in Gradle 4.0. " +
+            "In the future, Gradle will be hashing the input property object and comparing this hash"


I think you are right. I removed the technical details.

lacasseio · 2017-01-11T14:56:12Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+    // Helper function that generate the FooType definition that can be used in the exact same way
+    private static String createFooTypeDefinitionAsEnum() {
+        return """
+            |enum FooType {


They just serve for whitespace management: stripMargin removes them and any whitespace in front of them. I will remove them, as they cause confusion.

lacasseio · 2017-01-11T15:06:09Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+
+    // Helper function that generate the FooType definition that can be used in the exact same way
+    private static String createFooTypeDefinitionAsEnum() {
+        return """


lacasseio · 2017-01-11T15:06:19Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

@@ -537,4 +545,36 @@ apply from:'scriptPlugin.gradle'
        executedTasks == [':createFile']
        skippedTasks.empty  // The equals implementation for custom type always return false
    }
+
+    // Helper function that generate the FooType definition that can be used in the exact same way


lacasseio · 2017-01-11T16:48:28Z

Looking more into the logic to support Sets and Maps, there is a slim chance that we may wrongly detect a type as binary comparable if the first entry contains objects that don't override equals but another class in the hierarchy, compatible with the generic type of the collection, does override equals. This could cause issues when comparing for equality. I will push a commit to print the deprecation warning only if the equals of the object is overridden. This will ensure that custom types are always correctly compared.

Types with the default equals won't be binary compared.

The warning logging is quite complex to get right. Let's have a separate PR to address this specific feature.

bmuschko

LGTM. Thanks for the fixes!

adammurdoch

This needs some further changes. We need to make sure that the actual input property value is discarded as soon as we inspect the input properties at the start of the up-to-date checks. From there on, we only work with the hash and/or the serialized form. This includes:

- not using the input property value to build the cache key
- not retaining the input property value in the in-memory cache of the task history.

Also, as part of this change, we need to issue a deprecation warning when newInputValue.equals(deserializedPreviousValue) and newInputValueHashCode.equals(previousValueHashCode) give different results.

Can you also remove the AsyncCacheAccessContext thing, as this should no longer be required.

We should also add some test coverage that these custom types can be compared across builds where the build script changes, as this is the original issue.

adammurdoch · 2017-01-14T21:47:50Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

@@ -39,8 +37,7 @@ class TaskInputPropertiesIntegrationTest extends AbstractIntegrationSpec {
        """

        when: fails "foo"
-        then: failure.assertHasDescription("Could not add entry ':foo' to cache taskHistory.bin")
-        then: failure.assertHasCause("Unable to store task input properties. Property 'b' with value 'xxx' cannot be serialized.")
+        then: failure.assertHasDescription("Unable to hash task input properties. Property 'b' with value 'xxx' cannot be serialized.")


Can you also get rid of the call to TaskHistoryStore#flush() in DefaultGradleLauncher, as this should no longer be required.

I have the feeling that removing the call to TaskHistoryStore#flush() isn't as simple as it may be. Given the following comment in the code:

    // This is not strictly necessary, as the caches are closed immediately after this. Calling flush here rethrows any write failures inside the context of the build
    // Instead, failures thrown when stopping or closing a service should be treated as build failures
    gradle.getServices().get(TaskHistoryStore.class).flush();

It seems to imply that flush() is required to collect any write failures and treat them as build failures. Removing that call may result in some loss of functionality. I would suggest addressing this independently of this PR, as more code, somewhat unrelated to this change, would need to be modified.

adammurdoch · 2017-01-14T21:48:24Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

        given:
        buildFile << """
            import org.gradle.api.internal.changedetection.state.InMemoryTaskArtifactCache

-            println "Flushing InMemoryTaskArtifactCache"
            gradle.taskGraph.whenReady {
                gradle.services.get(InMemoryTaskArtifactCache).invalidateAll()


we should not need this any more (otherwise we haven't actually fixed the problem).

I removed the cache invalidation. Without the fix and the cache invalidation, the test won't fail. Thanks to the invalidation, we are able to make those tests fail. That's the reason why I feel uneasy about removing these lines.

adammurdoch · 2017-01-14T21:49:47Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+            import org.gradle.api.internal.changedetection.state.InMemoryTaskArtifactCache
+
+            gradle.taskGraph.whenReady {
+                gradle.services.get(InMemoryTaskArtifactCache).invalidateAll()


should not need this

Done. Same concern as above.

adammurdoch · 2017-01-14T21:56:32Z

...s/core/src/main/java/org/gradle/api/internal/changedetection/state/DefaultInputProperty.java

+            tryDeserialize(this, rhs);
+        } catch (IOException e) {
+            // In presence of corruption, both object cannot assert to be equal
+            return false;


we shouldn't discard the exceptions here, they perhaps should be logged in some form. It would be good if this information ended up in the 'running task a because ...' reason message we log (and send elsewhere), eg 'running task a because the previous value of property b could not be deserialized'

That is a very good idea. I will shelve this for today and will look more in depth how to achieve this.

adammurdoch · 2017-01-14T21:59:00Z

...s/core/src/main/java/org/gradle/api/internal/changedetection/state/DefaultInputProperty.java

+import java.io.Serializable;
+
+class DefaultInputProperty implements InputProperty {
+    private transient Object inputProperty;


we should not retain the input property value.

I was using it as a cache for 2 reasons:

1. We don't need to deserialize the object if it was already deserialized before.

2. In order to deserialize the input property, we need the right class loader (which was the original issue). In the implemented solution, we know that at least one of the two DefaultInputProperty will have the instance of a deserialized input property (as it's the current build session). We simply use the class loader of that object to deserialize the other one.

Reason 1 is probably negligible. Could you elaborate on how to solve reason 2 without causing ClassNotFoundException?

adammurdoch · 2017-01-14T22:00:59Z

...ojects/core/src/main/java/org/gradle/api/internal/changedetection/state/InputProperties.java

+        }
+    }
+
+    private static boolean isBinaryComparableProperty(Object inputProperty) {


I would do this when inspecting the task property, so that we do the inspection once, rather than once per task property value.

Good idea. I wasn't sure how to cleanly organize two distinct kinds of equality logic inside a single class. The binary comparison logic is straightforward; however, the default logic is quite complex. Given that you suggest not keeping an instance of inputProperty inside DefaultInputProperty, how can we move the isBinaryComparableProperty logic inside DefaultInputProperty.equals without having access to inputProperty.getClass()?

adammurdoch · 2017-01-14T22:02:07Z

...e/src/main/java/org/gradle/api/internal/changedetection/state/InputPropertiesSerializer.java


    InputPropertiesSerializer(ClassLoader classloader) {
-        this.serializer = new MapSerializer<String, Object>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<Object>(classloader));
+        this.serializer = new MapSerializer<String, InputProperty>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<InputProperty>(classloader));


we should not use default java serialization any more, as it's very, very inefficient. We simply need to serialize the hash and serialized bytes for each entry

Makes sense. Since I have looked a lot at the Serializer lately, I know what you mean. I will make the change.
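
To make "serialize the hash and serialized bytes for each entry" concrete, here is a minimal sketch of a length-prefixed binary layout. It uses plain `DataOutputStream`/`DataInputStream` as a stand-in for Gradle's own serializer abstractions, and the names `InputPropertyCodec` and `Snapshot` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; Gradle's real serializers use their own encoder types.
class InputPropertyCodec {

    // What is kept per property: the hash, plus the serialized value when it is needed.
    static final class Snapshot {
        final byte[] hash;
        final byte[] serialized; // empty when only the hash is stored

        Snapshot(byte[] hash, byte[] serialized) {
            this.hash = hash;
            this.serialized = serialized;
        }
    }

    // Layout: entry count, then for each entry the name, the hash and the serialized bytes.
    static byte[] write(Map<String, Snapshot> properties) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(properties.size());
            for (Map.Entry<String, Snapshot> property : properties.entrySet()) {
                out.writeUTF(property.getKey());
                writeBytes(out, property.getValue().hash);
                writeBytes(out, property.getValue().serialized);
            }
        }
        return bytes.toByteArray();
    }

    static Map<String, Snapshot> read(byte[] data) throws IOException {
        Map<String, Snapshot> properties = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                byte[] hash = readBytes(in);
                byte[] serialized = readBytes(in);
                properties.put(name, new Snapshot(hash, serialized));
            }
        }
        return properties;
    }

    private static void writeBytes(DataOutputStream out, byte[] value) throws IOException {
        out.writeInt(value.length);
        out.write(value);
    }

    private static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] value = new byte[in.readInt()];
        in.readFully(value);
        return value;
    }
}
```

Only the user value itself would go through Java serialization; the surrounding map, names and hashes are written directly.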

adammurdoch · 2017-01-14T22:02:22Z

subprojects/core/src/main/java/org/gradle/api/internal/changedetection/state/InputProperty.java

+
+import java.io.Serializable;
+
+public interface InputProperty extends Serializable {


not Serializable.

Same as above, I will make the change.

adammurdoch · 2017-01-14T22:04:21Z

subprojects/core/src/main/java/org/gradle/caching/internal/DefaultBuildCacheKeyBuilder.java

+        try {
+            HasherUtil.putObject(hasher, value);
+        } catch (NotSerializableException e) {
+            throw new UncheckedIOException(e);


This shouldn't happen. We should be appending the hash of the property value.

Could you elaborate on this, I'm not sure I fully understand?

The previous code would simply try to cast every Object to Serializable and use SerializationUtils.serialize as a last resort. However, if the class isn't Serializable, it throws a ClassCastException. I simply tried to mimic how input properties were previously serialized without changing the behavior for BuildCacheKey. If the class isn't Serializable, it now throws a NotSerializableException so we can handle it as needed for BuildCacheKey and InputProperty.

adammurdoch · 2017-01-14T22:16:44Z

Looks like we removed the deprecation warning. The logic is actually quite simple:

1. we have the new value.
2. deserialise the previous value
3. call equals() on the new and previous values
4. call equals() on the new and previous hashes (or binary content or both)
5. if the result of 3 and 4 are not the same, complain.
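
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the class and method names are hypothetical, SHA-256 over the Java-serialized form stands in for whatever hash is actually used, the previous hash is recomputed here rather than read from the task history, and warnings are collected in a list instead of going through Gradle's deprecation logging.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names exist in Gradle.
class EqualsVersusHashCheck {

    // Hash of the Java-serialized form; SHA-256 is an arbitrary choice for illustration.
    static byte[] hashOf(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return MessageDigest.getInstance("SHA-256").digest(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Steps 3 to 5. previousValue is assumed to be already deserialized (step 2).
    static boolean isUnchanged(Serializable newValue, Serializable previousValue, List<String> warnings) {
        boolean sameByEquals = newValue.equals(previousValue);
        boolean sameByHash = Arrays.equals(hashOf(newValue), hashOf(previousValue));
        if (sameByEquals != sameByHash) {
            warnings.add("equals() and the hash of the serialized form disagree for "
                + newValue.getClass().getName());
        }
        // equals() is still honoured while the behaviour is only deprecated.
        return sameByEquals;
    }

    // Example type whose equals() never matches, so equals() and the hash always disagree.
    static final class NeverEqual implements Serializable {
        @Override
        public boolean equals(Object other) {
            return false;
        }

        @Override
        public int hashCode() {
            return 0;
        }
    }
}
```

Comparing two `NeverEqual` instances reports a change and records one warning, because their serialized forms are identical while equals() returns false.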

wolfs · 2017-01-15T13:42:32Z

@lacasseio We want to extract the calculation of the build cache key anyway. See #1175. We should probably sync on Monday.

adammurdoch · 2017-01-15T22:12:53Z

The key concept to keep in mind is that the hash of the input property value is input to both the build cache key calculation and the incremental build state. The input property value is input to neither.

In other words, we need to use exactly the same logic to decide whether a property value has changed over time, regardless of whether we're using the build output cache or the output from a previous local build.

lacasseio · 2017-01-16T15:10:06Z

Given the amount of work left to do to deliver this PR and how unsure I am about the exact impact of the issue we are trying to fix, I see three choices to avoid dragging this work out any longer:

1. We shelve the work. The ClassNotFoundException issue seems to impact a minority of users, as there isn't much activity on the bug report. This would enable us to focus on higher priority work. We can revisit this issue if it becomes a problem for more users, or it may naturally get resolved as more development is done in that area.
2. We strictly address the ClassNotFoundException issue by deserializing the input data only during the equality check. As we don't try to do any strategic work, the scope becomes more manageable and stays focused on fixing the original issue. Any strategic work is deprioritized and scheduled for a later date.
3. We deem the strategic work high priority/value and fix it now. We move forward by addressing all the issues raised here and merge this PR with its present scope.

Strategic work is important, but I can't say how much time is left on this PR, as I have been wrong multiple times since this work started. Could @bmuschko and @eriwen (anyone else is welcome to pitch in too) comment on the priority this work should be given and which solution strategy we should be looking at?

bmuschko · 2017-01-16T15:50:08Z

@lacasseio I think it would be crucial to know how much effort is left to get this PR production-ready to come to a conclusion. Maybe @adammurdoch can give some insight here.

What we do know is that we have higher priority items in the queue that need attention. I'd suggest we identify if we can improve the situation for the end user right now with the code we have in place. That would give us the opportunity to merge and follow up with a new issue to address any additional concerns. If the code in the PR is not production-ready then we might have to shelve it or find a new owner.

lacasseio

Thanks @adammurdoch for the awesome code review. Could you elaborate a bit more on some points so I can better estimate how much more work is required on this PR? Thanks!

lacasseio · 2017-01-16T18:15:22Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

        given:
        buildFile << """
            import org.gradle.api.internal.changedetection.state.InMemoryTaskArtifactCache

-            println "Flushing InMemoryTaskArtifactCache"
            gradle.taskGraph.whenReady {
                gradle.services.get(InMemoryTaskArtifactCache).invalidateAll()


I removed the cache invalidation. Without the fix and the cache invalidation, the test won't fail. Thanks to the invalidation, we are able to make those test fail. That's the reason why I feel uneasy with removing these lines.

lacasseio · 2017-01-16T18:16:49Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

+            import org.gradle.api.internal.changedetection.state.InMemoryTaskArtifactCache
+
+            gradle.taskGraph.whenReady {
+                gradle.services.get(InMemoryTaskArtifactCache).invalidateAll()


Done. Same concern as above.

lacasseio · 2017-01-16T18:28:04Z

...s/core/src/main/java/org/gradle/api/internal/changedetection/state/DefaultInputProperty.java

+import java.io.Serializable;
+
+class DefaultInputProperty implements InputProperty {
+    private transient Object inputProperty;


I was using it as a cache for 2 reasons:

We don't need to deserialize the object if it was already deserialized before.

In order to deserialize the input property, we need the right class loader (which was the original issue). In the implemented solution, we know that at least one of the two DefaultInputProperty will have the instance of a deserialized input property (as it's the current build session). We simply use the class loader of that object to deserialize the other one.

Reason 1 is probably negligible. Could you elaborate on how to solve reason 2 without causing ClassNotFoundException?

lacasseio · 2017-01-16T18:29:32Z

...s/core/src/main/java/org/gradle/api/internal/changedetection/state/DefaultInputProperty.java

+            tryDeserialize(this, rhs);
+        } catch (IOException e) {
+            // In presence of corruption, both object cannot assert to be equal
+            return false;


That is a very good idea. I will shelve this for today and will look more in depth how to achieve this.

lacasseio · 2017-01-16T18:35:44Z

...ojects/core/src/main/java/org/gradle/api/internal/changedetection/state/InputProperties.java

+        }
+    }
+
+    private static boolean isBinaryComparableProperty(Object inputProperty) {


Good idea. I wasn't sure how to cleanly organize two distinct logic for equality inside a single class. The binary comparison logic is straightforward, however, the default logic is quite complex. Given that you suggest not keeping an instance of inputProperty inside DefaultInputProperty, how can we move the isBinaryComparableProperty logic inside DefaultInputProperty.equals without having access to inputProperty.getClass()?

lacasseio · 2017-01-16T18:37:18Z

...e/src/main/java/org/gradle/api/internal/changedetection/state/InputPropertiesSerializer.java


    InputPropertiesSerializer(ClassLoader classloader) {
-        this.serializer = new MapSerializer<String, Object>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<Object>(classloader));
+        this.serializer = new MapSerializer<String, InputProperty>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<InputProperty>(classloader));


Make sense, since I looked a lot at the Serializer lately, I know what you mean. I will make the change.

lacasseio · 2017-01-16T18:37:33Z

subprojects/core/src/main/java/org/gradle/api/internal/changedetection/state/InputProperty.java

+
+import java.io.Serializable;
+
+public interface InputProperty extends Serializable {


Same as above, I will make the change.

lacasseio · 2017-01-16T18:46:11Z

subprojects/core/src/main/java/org/gradle/caching/internal/DefaultBuildCacheKeyBuilder.java

+        try {
+            HasherUtil.putObject(hasher, value);
+        } catch (NotSerializableException e) {
+            throw new UncheckedIOException(e);


Could you elaborate on this, I'm not sure I fully understand?

The previous code would simply try to cast all Object to Serializable and use SerializationUtils.serialize as a last resort action. However, if the class isn't Serializable it will throw a ClassCastException. I simply tried to mimic the behavior from how input property was previously serialized without changing the behavior for BuildCacheKey. If the class isn't Serializable it throws a NotSerializableException so we can handle it as needed for BuildCacheKey and InputProperty.

lacasseio · 2017-01-16T18:49:15Z

...cts/core/src/integTest/groovy/org/gradle/api/tasks/TaskInputPropertiesIntegrationTest.groovy

@@ -39,8 +37,7 @@ class TaskInputPropertiesIntegrationTest extends AbstractIntegrationSpec {
        """

        when: fails "foo"
-        then: failure.assertHasDescription("Could not add entry ':foo' to cache taskHistory.bin")
-        then: failure.assertHasCause("Unable to store task input properties. Property 'b' with value 'xxx' cannot be serialized.")
+        then: failure.assertHasDescription("Unable to hash task input properties. Property 'b' with value 'xxx' cannot be serialized.")


I have the feeling that removing the call to TaskHistoryStore#flush() isn't as simple as it may seem. Consider the following comment in the code:

    // This is not strictly necessary, as the caches are closed immediately after this. Calling flush here rethrows any write failures inside the context of the build
    // Instead, failures thrown when stopping or closing a service should be treated as build failures
    gradle.getServices().get(TaskHistoryStore.class).flush();

It seems to imply that flush() is required to collect any write failures and treat them as build failures. Removing that call may cause us to lose that behavior. I would suggest addressing this independently of this PR, as it would require further code changes that are largely unrelated to this one.

adammurdoch · 2017-01-16T23:12:24Z

@lacasseio some comments:

I have the feeling that removing the call to TaskHistoryStore#flush() isn't as simple as it may seem.

I added that comment. That call was added to make a test pass. With the changes in this PR, the call is no longer required for us to report problems serializing user-provided values, which is the failure the call is trying to deal with.

Other failures, which we're not concerned about here, will still be reported.

I removed the cache invalidation. Without the fix and the cache invalidation, the test won't fail. Thanks to the invalidation, we are able to make those tests fail.

This means we're missing some other coverage. We shouldn't have functional tests that use internal APIs and do not exercise Gradle in the way that a user would.

This is the basic problem here:

1. Run the build using the daemon.
2. Change the script or plugin.
3. Run the build using the same daemon.
4. Depending on the implementation of the value's equals() method, the build fails with a ClassCastException or other linkage exception.

This is the test case I would use.

Reason 1 is probably negligible. Could you elaborate on how to solve reason 2 without causing ClassNotFoundException?

Exactly the way you have, just without retaining the deserialized value in the entry.

Given that you suggest not keeping an instance of inputProperty inside DefaultInputProperty, how can we move the isBinaryComparableProperty logic inside DefaultInputProperty.equals without having access to inputProperty.getClass()?

Don't use equals() for the comparison.

InputPropertiesTaskStateChanges is the class ultimately responsible for doing the comparison. I'd do the work there.

The previous code would simply try to cast all Object to Serializable and use SerializationUtils.serialize as a last resort action.

This is the wrong place to do the serialization, it's too late and it duplicates the other work that happens earlier. The serialization and hashing of arbitrary objects should happen as part of calculating the input properties, so that this result can be reused both as part of the cache key and in the task history for later local builds. When we're building the cache key, we already have a hash for the input property value, and this is what we should be adding to the cache key.
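
As a rough illustration of building the cache key from hashes that already exist, the sketch below combines per-property hashes into a single key without serializing any value again. All names (`CacheKeySketch`, `cacheKey`) are hypothetical and the choice of SHA-256 is an assumption; this is not Gradle's build cache key builder.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch; names do not correspond to Gradle classes.
final class CacheKeySketch {
    private CacheKeySketch() {}

    // Builds a key from already-computed property hashes.
    // No property value is serialized at this point.
    static byte[] cacheKey(Map<String, byte[]> propertyHashes) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // Sort by property name so the key does not depend on map iteration order.
        SortedMap<String, byte[]> sorted = new TreeMap<>(propertyHashes);
        for (Map.Entry<String, byte[]> entry : sorted.entrySet()) {
            digest.update(entry.getKey().getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0); // separator between name and hash
            digest.update(entry.getValue());
        }
        return digest.digest();
    }
}
```

The same map of hashes could then be stored in the task history, so the up-to-date check and the cache key share one serialization pass.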

A new API for DiffUtil was added to support this move. This new API abstracts the equality check into an Equalizer interface.
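
One possible shape for such an abstraction is sketched below. The actual DiffUtil and Equalizer signatures are not shown in this conversation, so everything here is an assumption: the method name `areEqual`, the `MapDiffSketch` class and the `changedKeys` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical equality strategy; the interface in the PR may look different.
interface Equalizer<T> {
    boolean areEqual(T previous, T current);
}

// Hypothetical diff helper that delegates value comparison to an Equalizer.
final class MapDiffSketch {
    private MapDiffSketch() {}

    // Returns the names that were added, removed, or whose values differ
    // according to the supplied equalizer, in sorted order.
    static <V> List<String> changedKeys(Map<String, V> previous, Map<String, V> current, Equalizer<V> equalizer) {
        Set<String> names = new TreeSet<>(previous.keySet());
        names.addAll(current.keySet());
        List<String> changed = new ArrayList<>();
        for (String name : names) {
            boolean inBoth = previous.containsKey(name) && current.containsKey(name);
            if (!inBoth || !equalizer.areEqual(previous.get(name), current.get(name))) {
                changed.add(name);
            }
        }
        return changed;
    }
}
```

With this shape, a binary comparison is simply `(a, b) -> Arrays.equals(a, b)` over hashes, and the caller that owns the comparison policy decides which strategy to pass in.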

lacasseio · 2017-01-17T13:05:24Z

Thanks for all the detail you added in the code review, @adammurdoch. I pushed most of the changes to address your review comments. I haven't polished the code and tests yet, as I want to double-check with you that those changes align with what you had in mind.

With those changes, HasherUtil will be removed and all related changes reverted. A new API was added to DiffUtil to abstract the equality check through a new Equalizer interface. The equality logic was moved to a custom Equalizer inside InputPropertiesTaskStateChanges. An InputProperty is now a simple container for the hash, the serialized bytes and, only for the current execution, the input property Object. If you give me the OK on those changes, I will move forward with polishing and we should be able to finish the PR.
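
The container described above might look roughly like the following. This is a sketch under assumptions, not the class from the PR: `InputPropertySnapshot` and its factory methods are hypothetical names, and the sketch assumes property values are non-null.

```java
import java.util.Arrays;

// Hypothetical container; names are illustrative, not the PR's actual code.
final class InputPropertySnapshot {
    private final byte[] hash;
    private final byte[] serializedBytes;
    // Present only for the execution that produced the value;
    // null when the snapshot was loaded from task history.
    private final Object rawValue;

    private InputPropertySnapshot(byte[] hash, byte[] serializedBytes, Object rawValue) {
        this.hash = hash.clone();
        this.serializedBytes = serializedBytes.clone();
        this.rawValue = rawValue;
    }

    static InputPropertySnapshot forCurrentExecution(byte[] hash, byte[] serializedBytes, Object rawValue) {
        return new InputPropertySnapshot(hash, serializedBytes, rawValue);
    }

    static InputPropertySnapshot fromHistory(byte[] hash, byte[] serializedBytes) {
        return new InputPropertySnapshot(hash, serializedBytes, null);
    }

    boolean hasRawValue() {
        return rawValue != null;
    }

    Object getRawValue() {
        return rawValue;
    }

    byte[] getSerializedBytes() {
        return serializedBytes.clone();
    }

    // Binary comparison: two snapshots match when their hashes match.
    boolean hasSameHashAs(InputPropertySnapshot other) {
        return Arrays.equals(hash, other.hash);
    }
}
```

Because a snapshot loaded from history never holds a deserialized object, comparing it against the current one cannot trigger class loading of user types.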

adammurdoch

Looks good. A couple of things I think we should change.

adammurdoch · 2017-01-19T02:28:33Z

...main/java/org/gradle/api/internal/changedetection/rules/InputPropertiesTaskStateChanges.java

+            return inputProperty.getRawValue();
+        }
+
+        private static boolean isEqualsMethodOverriden(Object obj) {


I think this is going to be a performance issue, as reflective lookup is quite slow and we're doing this check for both values of every input property of every task. We could either cache the result of the lookup, or remove the check (and just warn people when 'equals()' returns a different result from comparing the hashes).
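
The caching option could be sketched as follows, assuming Java 8 or later. `EqualsOverrideCache` is a hypothetical name and the code is not taken from the PR; it only shows one way to pay the reflective lookup once per class instead of once per comparison.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache for the "does this type override equals()" check.
final class EqualsOverrideCache {
    private static final ConcurrentMap<Class<?>, Boolean> CACHE = new ConcurrentHashMap<>();

    private EqualsOverrideCache() {}

    // Returns true when the value's class (or a superclass other than Object)
    // declares its own equals(Object). The reflective lookup runs once per class.
    static boolean isEqualsMethodOverridden(Object value) {
        return CACHE.computeIfAbsent(value.getClass(), EqualsOverrideCache::lookup);
    }

    private static boolean lookup(Class<?> type) {
        try {
            return type.getMethod("equals", Object.class).getDeclaringClass() != Object.class;
        } catch (NoSuchMethodException e) {
            // Every class inherits equals(Object), so this should be unreachable.
            throw new AssertionError("equals(Object) not found on " + type, e);
        }
    }
}
```

One caveat with this sketch: the map holds strong references to `Class` objects, which in a long-lived process such as a daemon would keep their class loaders alive. A `ClassValue` or a weak-keyed cache would be a better fit there.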

adammurdoch · 2017-01-19T02:31:00Z

...e/src/main/java/org/gradle/api/internal/changedetection/state/InputPropertiesSerializer.java

+
+        @Override
+        public InputProperty read(Decoder decoder) throws Exception {
+            byte[] serializedBytes = SERIALIZER.read(decoder);


can be decoder.readBinary()

adammurdoch · 2017-01-19T02:32:06Z

...e/src/main/java/org/gradle/api/internal/changedetection/state/InputPropertiesSerializer.java

+        @Override
+        public InputProperty read(Decoder decoder) throws Exception {
+            byte[] serializedBytes = SERIALIZER.read(decoder);
+            HashCode hash = HashCode.fromBytes(SERIALIZER.read(decoder));


should reuse HashCodeSerializer

adammurdoch · 2017-02-25T21:58:50Z

This can be closed now. I ended up implementing this fix in a somewhat different way, closer to what I needed for caching dependency transforms.

lacasseio · 2017-02-27T13:46:51Z

Thanks a lot @adammurdoch, let's close it now.

lacasseio added the in:core DO NOT USE label Dec 2, 2016

lacasseio added this to the 3.3 milestone Dec 2, 2016

lacasseio added the from:member label Dec 2, 2016

lacasseio mentioned this pull request Dec 2, 2016

Compare the binary form of local cache property for up-to-date check #919

Closed

lacasseio force-pushed the dl-issue-919 branch from 3bb6745 to e49ce0a Compare December 5, 2016 13:55

eriwen self-assigned this Dec 7, 2016

adammurdoch requested changes Dec 7, 2016

eriwen modified the milestones: 4.0, 3.3 RC1 Dec 9, 2016

eriwen assigned lacasseio and unassigned eriwen Dec 12, 2016

Daniel Lacasse added 4 commits December 27, 2016 13:26

Implement wrapper object for input properties

99b4c40

The wrapper object handles the serialization as well as the deserialization of the task input property. It also chooses between a binary-only implementation and a full-type implementation for equality.

Remove unused imports

e34e327

Polish deprecation warning

f3a2e44

lacasseio force-pushed the dl-issue-919 branch from f6b3ca2 to f3a2e44 Compare December 27, 2016 18:26

Daniel Lacasse added 2 commits December 27, 2016 14:24

Use SortedSet and SortedMap

1f8c77e

Revert unecessary changes

7d0fa40

lacasseio commented Dec 27, 2016

bmuschko reviewed Dec 29, 2016

Address code review

c107715

- Remove technical detail from deprecation warning
- Move test helper function into abstract fixture
- Fix failure for TestReportIntegrationTest
- Simplify InputProperties.create logic

lacasseio commented Jan 11, 2017

Merge branch 'master' into dl-issue-919

5162882

Daniel Lacasse added 3 commits January 11, 2017 12:21

Prevent deprecation warning for type with default equals implementation

c7de0fc

Types with default equals won't be binary compared.

Remove deprecation warning

7f929d0

The warning logging is quite complex to get right. Let's have a separate PR to address this specific feature.

Fold AbstractInputProperty into BinaryInputProperty for optimization

851a3e2

bmuschko approved these changes Jan 13, 2017

adammurdoch requested changes Jan 14, 2017

lacasseio commented Jan 16, 2017

Daniel Lacasse added 4 commits January 17, 2017 07:51

Remove usage of default serializer

5c68008

Avoid using internal API in integration test

4898dd7

Remove unecessary TaskHistoryStore#flush()

24f9195

Move equality logic in InputPropertiesTaskStateChanges

b4f6599

A new API for DiffUtil was added to support this move. This new API abstracts the equality check into an Equalizer interface.

adammurdoch requested changes Jan 19, 2017

adammurdoch mentioned this pull request Jan 23, 2017

Faster serialization #1222

Closed

eriwen added the is:blocked label Feb 3, 2017

eriwen removed this from the 4.0 milestone Feb 3, 2017

nedtwigg mentioned this pull request Feb 27, 2017

Could not read entry ':spotlessJava' from cache taskArtifacts.bin > com.diffplug.spotless.extra.GitAttributesLineEndings$Policy diffplug/spotless#78

Closed

lacasseio closed this Feb 27, 2017

lacasseio deleted the dl-issue-919 branch May 31, 2017 22:13

