New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compare task input properties as binary #962
Conversation
3bb6745
to
e49ce0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't just silently ignore equals()
on custom types, as this is a serious breaking change that people won't notice. Things will just silently sometimes be out-of-date.
To deal with this, for types (or every type) that have a custom equals()
implementation, we should store the serialized object (not the hash). When we need to compare 2 values, we would deserialise the value using the new value's Classloader and compare both the binary form and call equals()
. We would warn the user when these give different results, but honour the result of equals()
.
In 5.0 we would stop using equals()
and use the binary form only.
|
||
println "Flushing InMemoryTaskArtifactCache" | ||
gradle.taskGraph.whenReady { | ||
gradle.services.get(InMemoryTaskArtifactCache).invalidateAll() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't use internal API in these tests. Instead, we should reproduce the issue in the same way a user would, which is to change the classpath for the build script and then run the build again. The simplest option is to simply change the build script to add/remove/change some code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried appending code to the build script among other similar change but I couldn't reproduce what InMemoryTaskArtifactCache.invalidateAll
does. In this case, I want to force Gradle to grab what is on disk and deserialize everything again in order to catch the issues where the wrong class loader is used. I copied over these line from a previous test.
We could use --no-daemon
flag, however, I wasn't able to make it work correctly. This issue will usually happen when the cache is flushed by the daemon which I haven't found a trivial deterministic way to execute.
|
||
InputPropertiesSerializer(ClassLoader classloader) { | ||
this.serializer = new MapSerializer<String, Object>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<Object>(classloader)); | ||
this.serializer = new MapSerializer<String, HashCode>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<HashCode>(classloader)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to use DefaultSerializer
any more, as it uses java serialisation which is really inefficient. We should use the more efficient serialisation we use elsewhere for hashes, implemented in HashCodeSerializer
.
Thanks @adammurdoch for the very good code review. I'm thinking of implementing a wrapper for the input property to take care of the logic for choosing the binary form vs the actual Object serialization/deserialization. My only concern is how to correctly detect that a type chain overrides the equals method. For example, a String class overrides it but it's fine alright so we shouldn't emit a warning or care about. Suppose class Because of this concern, I'm leaning toward wrapping the input property in a class that would deserialize as needed using the classloader of the object we are trying to compare. It was one of the solutions we discuss. What do you think should be the way to go given my concern above? |
We don't really care whether For certain types, we already know that A wrapper to take care of the type specific behaviour makes a lot of sense here. Thinking about the overridden |
Actually, collections and maps don't always return the same value for equals and comparing the hash. Only those that maintain order do. |
There is various use case where user defined Serializable task input properties caused ClassNotFoundException (#784). This is solved by hashing the task input properties and saving those hash for the up-to-date check during future build session (#919). This new behavior deprecates equals method on any user-defined type used as task input properties.
The wrapper object handle the serialization as well as deserialization of the task input property. It also choose between a binary only implementation or a full type implementation for equality.
f6b3ca2
to
f3a2e44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change requested have been done except for the cache invalidation. I would need some help to correctly fix.
|
||
println "Flushing InMemoryTaskArtifactCache" | ||
gradle.taskGraph.whenReady { | ||
gradle.services.get(InMemoryTaskArtifactCache).invalidateAll() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried appending code to the build script among other similar change but I couldn't reproduce what InMemoryTaskArtifactCache.invalidateAll
does. In this case, I want to force Gradle to grab what is on disk and deserialize everything again in order to catch the issues where the wrong class loader is used. I copied over these line from a previous test.
We could use --no-daemon
flag, however, I wasn't able to make it work correctly. This issue will usually happen when the cache is flushed by the daemon which I haven't found a trivial deterministic way to execute.
As discussed, the implementation wraps the task input properties inside a class that either saves only the hash of the property or also save the serialized form of the property object. During the equality check, the property from the previous execution is deserialized and compared with it's matching property from the current run. If the hash and the |
@bmuschko Would you mind having a look at this PR for any low hanging fix to let @adammurdoch focus on the higher level review of the fix? |
enum FooType { FOO } | ||
|
||
rootProject { | ||
afterEvaluate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need afterEvaluate
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Replaced with tasks.matching
.
|
||
given: | ||
initScript << """ | ||
enum FooType { FOO } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe enums still have a problem when used as input property. Maybe we can just use class FooType implements Serializable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure which problem you are referring regarding enums as input property. This test does fail without the fix and succeeds after. Could you elaborate on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to have an issue with enums for input properties (see https://issues.gradle.org/browse/GRADLE-3537 and https://issues.gradle.org/browse/GRADLE-3018). Even though these issues are fixed, @eriwen recently mentioned that we have some sort of issue lurking. Maybe he can elaborate on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree they are linked to what this PR is trying to fix. One issue got fixed by a previous work that addressed buildSrc
custom type. This PR will go one step further and address the custom type declared in build.gradle
script. Using either enum
or Serializable
class will give the same result. I choose enum
as it was more compact.
Would you want me to add an additional test specifically for Serializable
class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An enum implements Serializable
. So in general I'd say: "not needed". However, given the history of issues we had with enums I'd like to see another tests with a class that explicitly implements Serializable
.
|
||
task createFile(type: MyTask) { | ||
ext.outputFile = file('output.txt') | ||
outputs.file(outputFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's define the output as property in MyTask
+ annotation as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done.
|
||
foo = FooType.FOO | ||
|
||
doLast { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a @TaskAction
in MyTask
to have all the imperative code in the task definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
buildFile << """ | ||
import org.gradle.api.internal.changedetection.state.InMemoryTaskArtifactCache | ||
|
||
println "Flushing InMemoryTaskArtifactCache" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for println
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and removed from the other test too.
} | ||
} | ||
|
||
task createFile { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest breaking out the definition of the task into a task type and the task usage. You can represent both build script snippets by a method and reuse them across multiple test cases as you see fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what is the advantage of what you are suggesting. Although the type has the same name, they have a different purpose. The FooType
in this test exercise a custom equals
implementation which is irrelevant and possibly erroneous for the tests above. The same goes for the task type and usage. I do want to configure a task that exercises the up-to-date check but I don't necessary want a custom type as the task type class loader did have an effect on the ClassNotFoundException
issue we are trying to solve. I think it would be wise to leave it as is. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was mainly thinking about the task type here. If we can't use a task type, would it be possible to reuse some of the task code you have in place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hesitant to reuse the task code as I have the impression it would make the code less readable to save 6 lines of code in total. The overhead of the generic function would be greater that the complexity and number of line saved. I'm open to doing the work, I just can't see how to do it in a clean and useful way.
To put it differently, what was the one thing you found was harder to understand in the new test and, once understood, what could have made it easier to understand the first time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll want to strike the right balance between reusability and readability. If you feel that it makes the test code much harder to read then we can keep as is.
- Remove technical detail from deprecation warning - Move test helper function into abstract fixture - Fix failure for TestReportIntegrationTest - Simplify InputProperties.create logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed another commit addressing the code review.
@@ -66,6 +67,15 @@ private static boolean isBinaryComparableProperty(Object inputProperty) { | |||
Object item = processingQueue.pop(); | |||
|
|||
Class cls = item.getClass(); | |||
try { | |||
// Given the equals wasn't override, use binary comparison. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
try { | ||
// Given the equals wasn't override, use binary comparison. | ||
if (cls.getMethod("equals", Object.class).getDeclaringClass().equals(Object.class)) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea and I feel the same about continue. I did a small refactor to make the code cleaner and address your comment.
return 0 | ||
} | ||
} | ||
|
||
def "create a full type wrapper for custom Serializable input property"() { | ||
static class SerializableTypeWithCustomEquals implements Serializable, Comparable<SerializableTypeWithCustomEquals> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -398,6 +398,7 @@ apply from:'scriptPlugin.gradle' | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -544,6 +544,8 @@ apply from:'scriptPlugin.gradle' | |||
then: | |||
executedTasks == [':createFile'] | |||
skippedTasks.empty // The equals implementation for custom type always return false | |||
outputContains "Custom equals implementation on task input properties has been deprecated and is scheduled to be removed in Gradle 4.0. " + | |||
"In the future, Gradle will be hashing the input property object and comparing this hash" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right. I removed the technical details.
// Helper function that generate the FooType definition that can be used in the exact same way | ||
private static String createFooTypeDefinitionAsEnum() { | ||
return """ | ||
|enum FooType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They just serve for whitespace management, the stripMargin
remove them and every whitespace in front. I will remove them as it cause confusion.
|
||
// Helper function that generate the FooType definition that can be used in the exact same way | ||
private static String createFooTypeDefinitionAsEnum() { | ||
return """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -537,4 +545,36 @@ apply from:'scriptPlugin.gradle' | |||
executedTasks == [':createFile'] | |||
skippedTasks.empty // The equals implementation for custom type always return false | |||
} | |||
|
|||
// Helper function that generate the FooType definition that can be used in the exact same way |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Looking more into the logic to support Sets and Maps, there is a slim chance that we may wrongly detect a type to be binary comparable compatible if the first entry contains objects that don't override |
Types with default equals won't be binary compared.
The warning logging is quite complex to get right. Let's have a seperate PR to address this specific feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the fixes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs some further changes. We need to make sure that the actual input property value is discarded as soon as we inspect the input properties at the start of the up-to-date checks. From there on, we only work with the hash and/or the serialized form. This includes:
- not using the input property value to build the cache key
- not retaining the input property value in the in-memory cache of the task history.
Also, as part of this change, we need to issue a deprecation warning when newInputValue.equals(deserializedPreviousValue)
and newInputValueHashCode.equals(previousValueHashCode)
give different results.
Can you also remove the AsyncCacheAccessContext
thing, as this should no longer be required.
We should also add some test coverage that these custom types can be compared across builds where the build script changes, as this is the original issue.
@@ -39,8 +37,7 @@ class TaskInputPropertiesIntegrationTest extends AbstractIntegrationSpec { | |||
""" | |||
|
|||
when: fails "foo" | |||
then: failure.assertHasDescription("Could not add entry ':foo' to cache taskHistory.bin") | |||
then: failure.assertHasCause("Unable to store task input properties. Property 'b' with value 'xxx' cannot be serialized.") | |||
then: failure.assertHasDescription("Unable to hash task input properties. Property 'b' with value 'xxx' cannot be serialized.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also get rid of the call to TaskHistoryStore#flush()
in DefaultGradleLauncher
, as this should no longer be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the feeling that removing the call to TaskHistoryStore#flush()
isn't as simple as it may be. Given the following comment in the code:
// This is not strictly necessary, as the caches are closed immediately after this. Calling flush here rethrows any write failures inside the context of the build
// Instead, failures thrown when stopping or closing a service should be treated as build failures
gradle.getServices().get(TaskHistoryStore.class).flush();
It seems to imply the flush()
is required to collect any write failures and treat them as build failures. Removing that call may result in some feature loose. I would suggest addressing this independently of this PR as more code would need to be changed somewhat unrelated to this code.
given: | ||
buildFile << """ | ||
import org.gradle.api.internal.changedetection.state.InMemoryTaskArtifactCache | ||
|
||
println "Flushing InMemoryTaskArtifactCache" | ||
gradle.taskGraph.whenReady { | ||
gradle.services.get(InMemoryTaskArtifactCache).invalidateAll() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not need this any more (otherwise we haven't actually fixed the problem).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the cache invalidation. Without the fix and the cache invalidation, the test won't fail. Thanks to the invalidation, we are able to make those test fail. That's the reason why I feel uneasy with removing these lines.
import org.gradle.api.internal.changedetection.state.InMemoryTaskArtifactCache | ||
|
||
gradle.taskGraph.whenReady { | ||
gradle.services.get(InMemoryTaskArtifactCache).invalidateAll() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should not need this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Same concern as above.
tryDeserialize(this, rhs); | ||
} catch (IOException e) { | ||
// In presence of corruption, both object cannot assert to be equal | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't discard the exceptions here, they perhaps should be logged in some form. It would be good if this information ended up in the 'running task a because ...' reason message we log (and send elsewhere), eg 'running task a because the previous value of property b could not be deserialized'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a very good idea. I will shelve this for today and will look more in depth how to achieve this.
import java.io.Serializable; | ||
|
||
class DefaultInputProperty implements InputProperty { | ||
private transient Object inputProperty; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not retain the input property value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using it as a cache for 2 reasons:
- We don't need to deserialize the object if it was already deserialized before.
- In order to deserialize the input property, we need the right class loader (which was the original issue). In the implemented solution, we know that at least one of the two
DefaultInputProperty
will have the instance of a deserialized input property (as it's the current build session). We simply use the class loader of that object to deserialize the other one.
Reason 1 is probably negligible. Could you elaborate on how to solve reason 2 without causing ClassNotFoundException
?
} | ||
} | ||
|
||
private static boolean isBinaryComparableProperty(Object inputProperty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do this when inspecting the task property, so that we do the inspection once, rather than once per task property value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I wasn't sure how to cleanly organize two distinct logic for equality inside a single class. The binary comparison logic is straightforward, however, the default logic is quite complex. Given that you suggest not keeping an instance of inputProperty
inside DefaultInputProperty
, how can we move the isBinaryComparableProperty
logic inside DefaultInputProperty.equals
without having access to inputProperty.getClass()
?
|
||
InputPropertiesSerializer(ClassLoader classloader) { | ||
this.serializer = new MapSerializer<String, Object>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<Object>(classloader)); | ||
this.serializer = new MapSerializer<String, InputProperty>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<InputProperty>(classloader)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not use default java serialization any more, as it's very, very inefficient. We simply need to serialize the hash and serialized bytes for each entry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, since I looked a lot at the Serializer
lately, I know what you mean. I will make the change.
|
||
import java.io.Serializable; | ||
|
||
public interface InputProperty extends Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not Serializable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, I will make the change.
try { | ||
HasherUtil.putObject(hasher, value); | ||
} catch (NotSerializableException e) { | ||
throw new UncheckedIOException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should't happen. we should be appending the hash of the property value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on this, I'm not sure I fully understand?
The previous code would simply try to cast all Object
to Serializable
and use SerializationUtils.serialize
as a last resort action. However, if the class isn't Serializable
it will throw a ClassCastException
. I simply tried to mimic the behavior from how input property was previously serialized without changing the behavior for BuildCacheKey
. If the class isn't Serializable
it throws a NotSerializableException
so we can handle it as needed for BuildCacheKey
and InputProperty
.
Looks like we removed the deprecation warning. The logic is actually quite simple:
|
@lacasseio We want to extract the calculation of the build cache key anyway. See #1175. We should probably sync on Monday. |
The key concept to keep in mind is that the hash of the input property value is input to both the build cache key calculation and the incremental build state. The input property value is input to neither. In other words, we need to use exactly the same logic to decide whether a property value has changed over time, regardless of whether we're using the build output cache or the output from a previous local build. |
Given the amount of work left to do to deliver this PR and how unsure I am with the exact impact of the issue we are trying to fix, there are three choices I see to avoid dragging this work any longer:
Strategic work is important but I can't assert how much time is left on this PR as I have been wrong multiple time since this work started. Could @bmuschko and @eriwen (anyone else is welcome to pitch in too) comment on the priority this work should be attributed and what solution strategy we should be looking at? |
@lacasseio I think it would be crucial to know how much effort is left to get this PR production-ready to come to a conclusion. Maybe @adammurdoch can give some insight here. What we do know is that we have higher priority items in the queue that need attention. I'd suggest we identify if we can improve the situation for the end user right now with the code we have in place. That would give us the opportunity to merge and follow up with a new issue to address any additional concerns. If the code in the PR is not production-ready then we might have to shelve it or find a new owner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adammurdoch for the awesome code review. Could you elaborate a bit more on some point so I can better estimate how much more work is required on this PR. Thanks!
given: | ||
buildFile << """ | ||
import org.gradle.api.internal.changedetection.state.InMemoryTaskArtifactCache | ||
|
||
println "Flushing InMemoryTaskArtifactCache" | ||
gradle.taskGraph.whenReady { | ||
gradle.services.get(InMemoryTaskArtifactCache).invalidateAll() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the cache invalidation. Without the fix and the cache invalidation, the test won't fail. Thanks to the invalidation, we are able to make those test fail. That's the reason why I feel uneasy with removing these lines.
import org.gradle.api.internal.changedetection.state.InMemoryTaskArtifactCache | ||
|
||
gradle.taskGraph.whenReady { | ||
gradle.services.get(InMemoryTaskArtifactCache).invalidateAll() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Same concern as above.
import java.io.Serializable; | ||
|
||
class DefaultInputProperty implements InputProperty { | ||
private transient Object inputProperty; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using it as a cache for 2 reasons:
- We don't need to deserialize the object if it was already deserialized before.
- In order to deserialize the input property, we need the right class loader (which was the original issue). In the implemented solution, we know that at least one of the two
DefaultInputProperty
will have the instance of a deserialized input property (as it's the current build session). We simply use the class loader of that object to deserialize the other one.
Reason 1 is probably negligible. Could you elaborate on how to solve reason 2 without causing ClassNotFoundException
?
tryDeserialize(this, rhs); | ||
} catch (IOException e) { | ||
// In presence of corruption, both object cannot assert to be equal | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a very good idea. I will shelve this for today and will look more in depth how to achieve this.
} | ||
} | ||
|
||
private static boolean isBinaryComparableProperty(Object inputProperty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I wasn't sure how to cleanly organize two distinct logic for equality inside a single class. The binary comparison logic is straightforward, however, the default logic is quite complex. Given that you suggest not keeping an instance of inputProperty
inside DefaultInputProperty
, how can we move the isBinaryComparableProperty
logic inside DefaultInputProperty.equals
without having access to inputProperty.getClass()
?
|
||
InputPropertiesSerializer(ClassLoader classloader) { | ||
this.serializer = new MapSerializer<String, Object>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<Object>(classloader)); | ||
this.serializer = new MapSerializer<String, InputProperty>(BaseSerializerFactory.STRING_SERIALIZER, new DefaultSerializer<InputProperty>(classloader)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, since I looked a lot at the Serializer
lately, I know what you mean. I will make the change.
|
||
import java.io.Serializable; | ||
|
||
public interface InputProperty extends Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, I will make the change.
try { | ||
HasherUtil.putObject(hasher, value); | ||
} catch (NotSerializableException e) { | ||
throw new UncheckedIOException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on this, I'm not sure I fully understand?
The previous code would simply try to cast all Object
to Serializable
and use SerializationUtils.serialize
as a last resort action. However, if the class isn't Serializable
it will throw a ClassCastException
. I simply tried to mimic the behavior from how input property was previously serialized without changing the behavior for BuildCacheKey
. If the class isn't Serializable
it throws a NotSerializableException
so we can handle it as needed for BuildCacheKey
and InputProperty
.
@@ -39,8 +37,7 @@ class TaskInputPropertiesIntegrationTest extends AbstractIntegrationSpec { | |||
""" | |||
|
|||
when: fails "foo" | |||
then: failure.assertHasDescription("Could not add entry ':foo' to cache taskHistory.bin") | |||
then: failure.assertHasCause("Unable to store task input properties. Property 'b' with value 'xxx' cannot be serialized.") | |||
then: failure.assertHasDescription("Unable to hash task input properties. Property 'b' with value 'xxx' cannot be serialized.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the feeling that removing the call to TaskHistoryStore#flush()
isn't as simple as it may be. Given the following comment in the code:
// This is not strictly necessary, as the caches are closed immediately after this. Calling flush here rethrows any write failures inside the context of the build
// Instead, failures thrown when stopping or closing a service should be treated as build failures
gradle.getServices().get(TaskHistoryStore.class).flush();
It seems to imply the flush()
is required to collect any write failures and treat them as build failures. Removing that call may result in some feature loose. I would suggest addressing this independently of this PR as more code would need to be changed somewhat unrelated to this code.
@lacasseio some comments:
I added that comment. That call was added to make a test pass, and with the changes in the PR this call will not be required, for us to report problems serializing user provided values, which is the failure this call is trying to deal with. Other failures, which we're not concerned about here, will still be reported.
This means we're missing some other coverage. We shouldn't have functional tests that use internal APIs and do not exercise Gradle in a way that a user would not. This is the basic problem here:
This is the test case I would use.
Exactly the way you have, just without retaining the deserialized value in the entry.
Don't use
This is the wrong place to do the serialization, it's too late and it duplicates the other work that happens earlier. The serialization and hashing of arbitrary objects should happen as part of calculating the input properties, so that this result can be reused both as part of the cache key and in the task history for later local builds. When we're building the cache key, we already have a hash for the input property value, and this is what we should be adding to the cache key. |
A new API for DiffUtil was added to support this move. This new API abstract the equality check into an Equalizer interface.
Thanks for all the detail you added in the code review @adammurdoch. I pushed most of the change to address your code review comment. I didn't polish the code and test as I want to double check with you to see if those changes align with what you had in mind. With those changes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. A couple of things I think we should change.
return inputProperty.getRawValue(); | ||
} | ||
|
||
private static boolean isEqualsMethodOverriden(Object obj) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is going to be a performance issue, as lookup in reflection is quite slow, and we're doing this check for both values for every input property of every task. We could either cache the result of the lookup, or remove the check (and just warn people that 'equals()' returns a different result to comparing the hashes).
|
||
@Override | ||
public InputProperty read(Decoder decoder) throws Exception { | ||
byte[] serializedBytes = SERIALIZER.read(decoder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be decoder.readBinary()
@Override | ||
public InputProperty read(Decoder decoder) throws Exception { | ||
byte[] serializedBytes = SERIALIZER.read(decoder); | ||
HashCode hash = HashCode.fromBytes(SERIALIZER.read(decoder)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should reuse HashCodeSerializer
This can be closed now. I ended up implementing this fix in a somewhat different way, closer to what I needed for caching dependency transforms. |
Thanks a lot @adammurdoch, let's close it now. |
There are various use cases where user defined Serializable task input
properties caused
ClassNotFoundException
(#784). This issolved by hashing the task input properties and saving those hash for
the up-to-date check during future build session (#919).
This new behavior deprecates equals method on any user-defined type
used as task input properties.
Open Issues
equals
method on user-defined types?This change is