Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errorprone rule to check that all fields in immutables builders have been initialized #1504

Merged
merged 19 commits into from
Oct 12, 2020

Conversation

jackwickham
Copy link
Contributor

@jackwickham jackwickham commented Oct 9, 2020

Before this PR

The only way to get compile time safety that all fields in a builder have been initialized is to enable staged builders. This has a runtime cost (it results in significantly more generated classes), and the refactoring cost is high (staged builders constrain the order in which fields can be initialized). Without it, it's easy to accidentally miss a field on a builder, resulting in a ci-time or runtime exception.

After this PR

==COMMIT_MSG==
Add an Errorprone rule to check that all fields in Immutables builders have been initialized
==COMMIT_MSG==

Checking that all of the fields have been initialized is undecidable in general, so this implementation is limited to the common case where: the builder is constructed in place using new ImmutableType.Builder() or new Type.Builder() (where Type.Builder extends ImmutableType.Builder), or using a method that does nothing other than construct and return the builder; and the builder is never stored to a variable, so the builder only lives within a single statement.

This means that these snippets will be checked:

new Type.Builder().name("name").build();
Type.builder().name("name").build();

// Type.java
Builder builder() { return new Builder(); }
Type.builder().name("name").build();

//Type.java
ImmutableType.Builder builder() { return ImmutableType.builder(); }

but these will not (they are ignored and no diagnostics are emitted):

// Builder cannot be stored into an intermediate variable
Type.Builder builder = new Type.Builder();
builder.name("name").build();
builder().build();

// If the builder comes from a method, that method can't do anything other than construct and return it
Builder builder() { return new Builder().name("name"); }

Possible downsides?

I have tested this on a handful of services, but it's certainly possible that there is an edge case that I've missed which would cause false positives in other places.

During testing, I found a couple of true positive matches in dead code. If that is the case in other places, upgrading to this version will require manual action in a non-negligible number of services, but I think that is acceptable because it is removing incorrect code.

I also found some unit tests that are expecting an exception when missing required fields - these tests would need to suppress this warning.

The implementation that I decided to go for relies on Immutables generating and using a bitmap and constants named INIT_BIT_(VARNAME) to track whether certain fields have been initialized - this has not been changed since 2015, but it is not part of the public API so there is no guarantee that it won't change in future. I think any changes that they could make that affect compatibility would only cause the check to false-negative rather than to false positive, and any breakage should be caught by the unit tests here.

@changelog-app
Copy link

changelog-app bot commented Oct 9, 2020

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Add an Errorprone rule to check that all fields in Immutables builders have been initialized

Check the box to generate changelog(s)

  • Generate changelog entry

@policy-bot policy-bot bot requested a review from fawind October 9, 2020 15:40
README.md Outdated Show resolved Hide resolved
Co-authored-by: Carter Kozak <ckozak@ckozak.net>
@uschi2000
Copy link
Contributor

Can I propose to add a strict mode (can be in a separate PR) that guarantees that all objects that were build() are fully initialized, at the expense of disallowing the undecidable patterns outright?

@carterkozak
Copy link
Contributor

Can I propose to add a strict mode (can be in a separate PR) that guarantees that all objects that were build() are fully initialized, at the expense of disallowing the undecidable patterns outright?

Would this be a separate check that applies to *Builder types? While I like the idea as a way to suggest broader use of staged builders, strict checks force us to handle all edge cases rather than targeting the 99% of bugs that we can most reliably detect. There's a larger ongoing investment to keep them working properly that isn't always worthwhile.
In general we've stayed away from dual-mode flags in this project in favor of uniformity.

@jackwickham I've tried this out on a very large internal project, and so far all the findings have been legitimate. Very well done! I wonder if we can do something similar for conjure builders later on :-)

@jackwickham
Copy link
Contributor Author

Can I propose to add a strict mode (can be in a separate PR) that guarantees that all objects that were build() are fully initialized, at the expense of disallowing the undecidable patterns outright?

I'm not sure what a strict mode would actually look like - it would be weird to have eg a .builder() and .strictBuilder() method that do exactly the same thing but are treated differently by errorprone, and that would have to be in user code (the interface annotated with @Value.Immutable) rather than generated code so it would probably not get much use; I think it would be weird to use an annotation on the immutable interface because it's entirely a user concern; using comments isn't possible because at this point we are operating on the javac AST; and annotating the method where the call occurs is probably the most practical, but it's a very broad tool and seems a bit pointless because all it would do is ensure that you don't store the builder to a variable, so the effort of adding the annotation is the same as actually fixing it.

I wonder if we can do something similar for conjure builders later on :-)

I thought about this a bit - if we did, I think it would be good to update the generated code to annotate the builder methods with the fields that they populate, to avoid the tricks that are used here to find mandatory fields and match them to methods. For example,

@Initializes(field = "myList", required = false)
public Builder addMyList(...)

@Initializes(field = "myField", required = true)
public Builder setMyField(...)

We would be able to reuse the AST recursion from here, but it would harden the matching logic against unanticipated variants of generated code.


// Mandatory fields have a private static final constant in the generated builder named INIT_BIT_varname, where
// varname is the UPPER_UNDERSCORE version of the variable name. Find these fields to get the mandatory fields.
// nb. this isn't part of any immutables API, so could break.
Copy link
Contributor

@iamdanfox iamdanfox Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is kinda amazing that it works! given that it relies on internal implementation details of immutables, I wonder if we coudl get this error-prone rule upstreamed into https://github.com/immutables/immutables so that if they ever decide to change the internal representation then CI will catch their change??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I would have with upstreaming this is that it's not totally compliant with custom @Value.Style annotations, and can never be because the annotation (eg @ImmutablesStyle internally) is often brought in as a compileOnly dependency, so if it's consumed from a different compilation unit (eg using immutable classes that were generated in the main sourceset from test code) it will not be possible to recover the original style. That only affects an edge case where the get style has been customized and the field with that part removed is an identifier (eg getPublic), so maybe it's not a big deal (I solved it here by hard coding the style to get* and is*and bailing if there are fields that can't be initialized after transforming like that). There are other ways that the style could in theory be changed (eg changing the name of the from method) that I don't think we would be able to handle.

Instead, if it were upstreamed then I'd prefer to go with an annotations-based approach, as mentioned above.

@robert3005
Copy link
Contributor

Fwiw this feature already exists in immutables under staged builders (you pay the cost of more code for compile time safety) https://immutables.github.io/immutable.html#staged-builder. Not sure how these two stack against each other. Notably staged builders will cause compilation breaks on existing code hence would require more work to migrate to. Wanted to throw it out here since I had desire for those kinds of checks recently and opted for trying out staged builders

@carterkozak
Copy link
Contributor

I completely agree that we should use staged builders instead of lenient builders in order to leverage compile-time checking. Migration is difficult to do, but we may be able to put some cycles into it. I think validation for cases that cannot be converted (abi compat constraints) is reasonable and provides much of the upside without forcing code churn or manual actions.

@jackwickham
Copy link
Contributor Author

Yeah, staged builders are good but I think it would take a considerable amount of manual effort to migrate onto them, especially in the non-linear control flow cases that I am also not checking here - until then, I think this offers a low-lift way to get most of the benefits without the downsides.

@uschi2000
Copy link
Contributor

@carterkozak @jackwickham lenient/strict could be done through annotations, for instance.

@jackwickham
Copy link
Contributor Author

I suppose a strict mode would provide similar assurances to refactoring to use staged builders, but without the refactoring cost (at the expense of less flexibility). In practice, the main pattern in production code that would fail in strict mode is

Builder builder = builder().setField(...);
someCollection.forEach(elem -> { doSomethingToElem(); builder.addElem(elem); });
builder.build();

which can be fairly painlessly refactored to move setField into the build statement to pass this check.

In tests there are more cases where methods return partially initialized builders, which would be much more annoying to fix. That case wouldn't be too hard to support here though.

It does still have to be opt in at the definition site though, which means it probably won't get much adoption unless we excavate it.

I think it's best to start with this, then we can look at whether a strict mode makes sense later.

@bulldozer-bot bulldozer-bot bot merged commit ff89a43 into develop Oct 12, 2020
@bulldozer-bot bulldozer-bot bot deleted the jw/check-builder-initialization branch October 12, 2020 17:23
@svc-autorelease
Copy link
Collaborator

Released 3.40.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants