Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[java] New Rule: Use Explicit Types #2847

Closed
numeralnathan opened this issue Oct 22, 2020 · 22 comments · Fixed by #4561
Closed

[java] New Rule: Use Explicit Types #2847

numeralnathan opened this issue Oct 22, 2020 · 22 comments · Fixed by #4561
Labels
a:new-rule Proposal to add a new built-in rule
Milestone

Comments

@numeralnathan
Copy link

numeralnathan commented Oct 22, 2020

Proposed Rule Name: UseExplicitTypes

Proposed Category: One of [Best Practices | Error Prone]

Description:

Java 10 introduced the var keyword. This reduces the amount of typing but decreases the reading comprehension of the code. Please create a rule that flags the use of the var keyword.

Code Sample:

The line with var index should be flagged as a problem.

public static void myMethod()
{
   var index = 0;
}

Here is the "correct" code.

public static void myMethod()
{
   int index = 0;
}

Possible Properties:

  • localVariables: One of [allow | disallow]. Defaults to disallow.
  • forLoopIndexes: One of [allow | disallow]. Defaults to disallow.
  • tryWithResources: One of [allow | disallow]. Defaults to disallow.
  • primitiveTypes: One of [allow | disallow]. Defaults to disallow. If true, then var can replace a primitive type as long as one of the above properties allows the location.

Are there any other places where var can be used? If so, add those as properties.

@numeralnathan numeralnathan added the a:new-rule Proposal to add a new built-in rule label Oct 22, 2020
@oowekyala
Copy link
Member

Sounds like it could be useful in some codebases.

I think it's not useful to have separate properties for local variables, for loops, etc. Either you want explicit types or you don't, the syntactic context is not relevant IMHO.

Maybe an interesting property would be to allow the use of var if the explicit type is very long? Eg Map.Entry<String, List<SomeItem>>, which is a pain to write when iterating over maps

Also, name suggestion: UseExplicitTypes

@numeralnathan
Copy link
Author

I think it's not useful to have separate properties for local variables

I do not know how useful this is. I personally would like to never see a var. I wonder if some people would allow var in some cases. In any case, adding the properties means 1 less feature request in the future.

Can a var be used in a lambda parameter (e.g. (final var param1) -> myMethod(param1))? If so, add this property. I can see that this would be used just to declare a lambda parameter as final. I may be willing to allow this since lambda parameters already rarely have types specified.

I like the name UseExplicitTypes. It gets rid of the negative similar to what the rule ConfusingTernary does for if statements.

@adangel adangel changed the title [java] No var [java] UseExplicitTypes Oct 22, 2020
@numeralnathan numeralnathan changed the title [java] UseExplicitTypes [java] Use Explicit Types Oct 22, 2020
@linusjf
Copy link

linusjf commented Oct 25, 2020

@numeralnathan
Copy link
Author

Thanks for the links! Those are interesting reads.

So, if you use var then you have to provide a much better variable name. In the end, you save some typing by not writing the data type but you add some typing because you have to give a much better name. Although a better name helps with reading the rest of the method.

@linusjf
Copy link

linusjf commented Oct 25, 2020

Precisely so! You could create a rule for meaningful variable names for var.
I don't think it's worth extending it to typed variables. This rule, however, can become highly subjective. I'm not aware if any static analysis tool supports such a rule. Do you know of any? This applies more to untyped languages.

@numeralnathan
Copy link
Author

numeralnathan commented Oct 25, 2020

As a human, coming up with meaningful names is sometimes hard. I have to spend up to a few minutes. The links help give some ideas.

I figure we would need some AI to be able to replace a human. We could use a generative adversarial network (GAN) fed with good variable names and the data types. IDEs would use the generator for code auto-completion (i.e. the programmer types a data type and the IDE prompts with variable names). PMD would use the discriminator to detect variable names that do not go with certain data types. For example, a variable name such as index would be good in a for loop or addressing elements in an array or List (i.e. int index). But, would probably be a horrible choice for the name for an array or List (i.e. int index[]).

However, good variable names is not the point of this ticket and should be put in a separate a ticket. (See 2868). This ticket simply bans the use of var. If we had a rule that flagged bad variable names, then perhaps this rule would not be enabled by default. I still would want the UseExplicitTypes rule.

@linusjf
Copy link

linusjf commented Oct 25, 2020

I think it's not useful to have separate properties for local variables

I do not know how useful this is. I personally would like to never see a var. I wonder if some people would allow var in some cases. In any case, adding the properties means 1 less feature request in the future.

Can a var be used in a lambda parameter (e.g. (final var param1) -> myMethod(param1))? If so, add this property. I can see that this would be used just to declare a lambda parameter as final. I may be willing to allow this since lambda parameters already rarely have types specified.

I like the name UseExplicitTypes. It gets rid of the negative similar to what the rule ConfusingTernary does for if statements.

As far as I can see, var is syntactic sugar. I haven't used it much myself; adoption of new features and classes in the language and JDK lags their introduction by a few releases. That's where static analysis engines such as PMD come in to enforce their utilization.
Specifically, for var, besides lambdas, I wouldn't use it in code blocks longer than 8-10 lines. It would become very difficult to keep track of its assigned type, otherwise. As for saving character spaces, with primitive types at most three characters will be reduced. Should we even allow var to replace primitive types?

@linusjf
Copy link

linusjf commented Oct 25, 2020

The only other reason to use var is to circumvent rules such as LooseCoupling,
which enforces coding to interfaces or abstract data types.
#2822 (comment)

@numeralnathan
Copy link
Author

LooseCoupling only works on fields, method parameters and method returns types. var cannot be used in any of these places. Is there a rule that var can circumvent? If so, I am wondering if the rule needs to be enhanced.

@numeralnathan
Copy link
Author

Should we even allow var to replace primitive types?

When in doubt, add a property and let the development team decide.

@linusjf
Copy link

linusjf commented Oct 25, 2020

LooseCoupling only works on fields, method parameters and method returns types. var cannot be used in any of these places. Is there a rule that var can circumvent? If so, I am wondering if the rule needs to be enhanced.

Hmm....I seem to have missed that. I've been coding against interfaces and ADTs in my test cases. Now, I'm wondering why I went to all that trouble? Extend the rule, perhaps, retaining the defaults?

@linusjf
Copy link

linusjf commented Oct 25, 2020

Should we even allow var to replace primitive types?

When in doubt, add a property and let the development team decide.

I revisited the article:

http://openjdk.java.net/projects/amber/LVTIstyle.html

var is discouraged for primitive types since it can result in ambiguity. It should not be permitted. Period.

@cvmocanu
Copy link

Adding my +1 here.
I would really like to be able to use this rule.

Personally, I think adding var to the Java language was a mistake, because it makes the code harder to read in ALL cases.

Case no. 1 (really bad)
When the right-hand-side is a method call, it's impossible to tell the type:

final var user = getUser(userId);

Try figuring out the type in a GitHub PR for code like that. Especially if you have both UserEntity and UserDto classes in the codebase.

Case no. 2 (more acceptable)
var wastes brain cycles even in simple cases (where the type is directly obvious) like this:

final var message = "The message";

This is because for some variable declarations the type is to the left of the variable name (for those not using var), while for other declarations you need to scan what's on the right side of the = sign (when using var). Having 2 ways to find the types unnecessarily wastes brain cycles that could be used to better understand the actual code, rather than Java syntax.

Personally, I would be very satisfied with having a rule that prohibits var everywhere.
Of course, if there is demand, I'm OK with having properties for this rule, as long as I can configure it to prohibit var in all cases.

@tkalliom
Copy link

One option I would like to see: allow var only when assigning an object constructed with a direct constructor reference. The reason to avoid var is to make sure you can see the actual type where the variable is declared, but in this case the constructor provides that information and a second mention of the type in the same statement is just redundancy. Meaning:

// fail - this line does not tell you what type `x` is
var x = performSomeOperation();

// pass - you can see that `y` is a `Foo`
var y = new Foo();

@linusjf
Copy link

linusjf commented May 11, 2023

This rule would not be so much of an issue if you had a tool that infers the correct type for var and replaces it in the codebase. Is PMD likely to go that way?

@numeralnathan
Copy link
Author

MyClass myObject = new MyClass(); is harder to read. The method ends up having variables declared all over and it takes longer to find where the variable is declared. This is a carry over from C++ where separating the declaration from the initialization has a performance cost of double initialization. This is also a carry over from text books where the publisher wants to save space.

The following code is much easier to read because all the variables are declared at the top of the method. Methods are encouraged to be short due to other PMD rules and testability so the variable declarations are always visible on the screen.

MyClass myObject;

...
myObject = new MyClass();

@linusjf
Copy link

linusjf commented May 11, 2023

MyClass myObject = new MyClass(); is harder to read. The method ends up having variables declared all over and it takes longer to find where the variable is declared. This is a carry over from C++ where separating the declaration from the initialization has a performance cost of double initialization. This is also a carry over from text books where the publisher wants to save space.

The following code is much easier to read because all the variables are declared at the top of the method. Methods are encouraged to be short due to other PMD rules and testability so the variable declarations are always visible on the screen.

MyClass myObject;

...
myObject = new MyClass();

I've rarely come across this style of coding for Java. Can you point to some code-bases (on GitHub) that follow your suggested methodology?

@numeralnathan
Copy link
Author

Section 6.3 on https://www.oracle.com/java/technologies/javase/codeconventions-declarations.html is close. The code still initializes and declares on the same line. This doesn't make sense. You don't want to initialize all the variables with dummy values just to have the variables at the top of the method.

Then we have the Google Java Style Guide section 4.8.2.2 that says to declare variables when needed. https://google.github.io/styleguide/javaguide.html#s4.8-specific-constructs

Instead of guides and code examples, we need to find a study that shows which style leads to faster code reading and comprehension.

@cvmocanu
Copy link

cvmocanu commented May 11, 2023

Hi @numeralnathan

MyClass myObject = new MyClass(); is harder to read.

See my comment above. I used to think the same. But now I think that MyClass myObject = new MyClass(); is easier to read than var myObject = new MyClass();. This is because our eyes are used to see the types to the left of the variable. If we are looking for the types sometimes to the left and sometimes to the right, I'm pretty sure that takes our brains an extra fraction of a second to parse the line, and as far as I'm concerned, it's not worth it.

I would be very happy for a rule that forbids var completely.
Of course, if other devs want something else, maybe the rule should be configurable (forbid completely, only allow primitives and constructor calls, etc.)

In Kotlin, things are a bit better - the type is always to the right of the variable name, but in Java, as far as I'm concerned var is an anti-feature.

@cvmocanu
Copy link

The following code is much easier to read because all the variables are declared at the top of the method. Methods are encouraged to be short due to other PMD rules and testability so the variable declarations are always visible on the screen.

MyClass myObject;

...
myObject = new MyClass();

There's one disadvantage of doing things this way: the "extract method" refactoring in IntelliJ will not work very well. That refactoring works best if the variables are declared as close as possible to the code that uses them.

@cvmocanu
Copy link

cvmocanu commented May 12, 2023

For others like me, looking to forbid the usage of var, the following CheckStyle rule does it:

    <module name="TreeWalker">

        ...

        <!--
            Disallow usage of variable type inference using the `var` keyword.

            Using `var` makes the code harder to read in all contexts except IntelliJ,
            because the brain has to work extra to try to understand what is the type,
            and in some cases it's impossible to know it - e.g. `var house = getHouse()`.
            This is particularly nasty when doing PR reviews in GitHub.

            There are a few where the type is directly visible:
            * in primitive assignments: `var done = false`, `var maxAge = 39'
            * in constructor assignments: `var person = new Person(...)`
            While these cases are much better, they still pose a problem: our brains
            are used in Java code to see the type to the left of the variable name.
            If we are mixing `var` and non-`var` declarations, now the type will
            sometimes be shown to the left of the variable name (non-`var`), and
            sometimes to the right of the variable name (when using `var`).
            This adds a bit of extra work for the brain, so `var` is also not
            recommended in these cases. The extra brain power can be better used
            to understand the intent of the code, rather than to parse Java syntax.
        -->
        <module name="MatchXpath">
            <property name="query" value="//VARIABLE_DEF/TYPE/IDENT[@text='var']"/>
            <message key="matchxpath.match" value="The `var` keyword should be avoided to keep the code easier to understand"/>
        </module>
    </module>

I strongly suspect this can also be done with PMD's XPathRule.

@jsotuyod
Copy link
Member

jsotuyod commented May 12, 2023

I strongly suspect this can also be done with PMD's XPathRule.

Absolutely! You can always use the out of the box rule designer for easily creating these… but it would go something along the lines…

//LocalVariableDeclaration[@TypeInferred=true()]

The attribute @TypeInferred means it's declared using var.

You can filter out scenarios for where this is used (try-with-resources, for loop, etc.) by looking at ancestors.

@adangel adangel changed the title [java] Use Explicit Types [java] New Rule: Use Explicit Types May 14, 2023
@adangel adangel added this to the 7.0.0 milestone Oct 6, 2023
adangel added a commit to adangel/pmd that referenced this issue Oct 6, 2023
@adangel adangel closed this as completed in d49178a Oct 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:new-rule Proposal to add a new built-in rule
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants