Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid returning null lists #106

Merged
merged 17 commits into from Sep 6, 2015
Merged

Conversation

ftomassetti
Copy link
Member

  • every list is now initialized as an empty list
  • for backward compatibility setters accept a null but convert it to an empty lists
  • a few bugs are found: the current code was checking for a list to be null but it did not consider the possibility it was empty

We have discussed about it on Gitter, I hope I did not forget any of the advices I received if I did please forgive me and remember me and I will update the pull requests.

@sebastiankirsch
Copy link
Contributor

Consistency wise, I'd prefer to either have if ... else everywhere or nowhere; currently it is both and at some places there are even recursive calls. When going for speed, the best way would be

if (aList == null) {
  this.aList = Collections.emptyList();
} else {
  this.aList = aList;
  setAsParentNodeOf(this.aList);
}

Seeing many places with if (aList == null || aList.isEmpty): I'd prefer introducing a Utility class providing

public static boolean isEmpty(Collection<?> c) {
    return c == null || c.isEmpty();
}

@sebastiankirsch
Copy link
Contributor

Another idea (sorry for not coming up with this earlier) is

public static <T> List<T> ensureNotNull(List<T> list) {
  return list == null ? emptyList() : list;
}

and then we end up with

this.aList = ensureNotNull(aList);
setAsParentNodeOf(this.aList);

which is shorter and better when it comes to communicating the intention.

@sebastiankirsch
Copy link
Contributor

I'm sorry for bringing this up again, but I just realized and can't remember if we fully discussed this: with this solution, we're introducing an inconsistency in that with those changes, you cannot add new elements to a list you did not set explicitly.
So I wonder if we should keep the behavior consistent and either create/return an immutable list for non-empty lists - which would probably break some code - or use something different for initializing the lists...

@ftomassetti
Copy link
Member Author

The reason for the different patterns is because some parameters are final, in that case I go for the if/else
I did not want to introduce/remove final in all these methods as part of this patch.

Don't be sorry for improving things :)

Yes, it is true that you cannot add elements to a list that you did not set, however you probably should not expect this to work: you cannot be sure we return the internal list instead of a copy. If we want to give the possibility for add to work I think we can instantiate a new mutable list instead of using Collections.emptyList, however there were some concerns with that in the past (lot of allocations).

@ftomassetti
Copy link
Member Author

In the meantime I will introduce the method:

public static boolean isEmpty(Collection<?> c) {
    return c == null || c.isEmpty();
}

@sebastiankirsch
Copy link
Contributor

But the final modifier only affects the implementation of the methods which is completely up to us - so I'd suggest to remove that modifier and strive for a consistent implementation.

I'm fine with returning immutable lists; but then let's document that.

@ftomassetti
Copy link
Member Author

  • moved isNullOrEmpty to a class in an internal package
  • creating ensureNotNull and using it consistently

About documenting the fact we could return an immutable list: would be enough to do that in the wiki or in the documentation at package level or do we have to repeat that for each list setters?

P.S. thanks for helping in improving this patch!

@sebastiankirsch
Copy link
Contributor

Judging from me, anything else than at each getter (<- not setter) is
useless. But that's just me...

2015-02-23 16:07 GMT+01:00 Federico Tomassetti notifications@github.com:

  • moved isNullOrEmpty to a class in an internal package
  • creating ensureNotNull and using it consistently

About documenting the fact we could return an immutable list: would be
enough to do that in the wiki or in the documentation at package level or
do we have to repeat that for each list setters?

P.S. thanks for helping in improving this patch!


Reply to this email directly or view it on GitHub
#106 (comment).

@ftomassetti
Copy link
Member Author

Ok, would it work to add this comment on each setters for lists?

    /**
     * @return the list returned could be immutable (in that case it will be empty)
     */

@sebastiankirsch
Copy link
Contributor

I would omit the brackets. Like @return a potentially immutable list

Federico Tomassetti notifications@github.com schrieb am Mon Feb 23 2015
at 4:21:27 PM:

Ok, would it work to add this comment on each setters for lists?

/**
 * @return the list returned could be immutable (in that case it will be empty)
 */


Reply to this email directly or view it on GitHub
#106 (comment).

@ptitjes
Copy link
Contributor

ptitjes commented Feb 24, 2015

As I explained on the chat, I think it is counter-intuitive (and inconsistent) to return mutable list and immutable empty lists.

The point of returning something else than null is to avoid some null check to users.
But don't forget that the ast is mutable and that we have users that modify the trees. For those, this will change their code from:

List<> interfaces = node.getInterfaces();
if (interfaces == null) {
    interfaces = new List<>();
    node.setInterfaces(interfaces);
}
// Do modifications

to

List<> interfaces = node.getInterfaces();
if (interfaces.isEmpty()) {
    interfaces = new List<>();
    node.setInterfaces(interfaces);
}
// Do modifications

I find the latter worse for the reason that it is an non-natural idiom for a java coder whereas the first one is (unfortunately, but that's life...).

So I would advise to either:

  • let methods return null (minimal tree size policy and consistent policy with the one-one getters as we don't have Option) and then document it correctly
  • let methods return empty mutable lists
  • let methods return immutable lists (even for the non-empty ones) and use a null object (aka Option) for one-one getters

@ftomassetti
Copy link
Member Author

I would vote for the mutable lists, possibly lazy initialized in the getters

@ptitjes
Copy link
Contributor

ptitjes commented Feb 26, 2015

I was thinking about externalizing all the new XXXNode(...) of the grammar in a factory that the parser have a pointer to. This is easy to do, just a little painful.

However, we could add a postProcessList(...) method to the factory (by default noop) called for every list arguments before calling the constructors and have two subclasses one that ensureNotNull the lists with empty immutable lists and one that ensureNotNull the lists with empty mutable lists.

Would that fit the bill ?

Edit: I must add that my idea for the node factory comes from the fact that in the next major release we could have two AST, an immutable and a mutable one. Details to be thought about but voilà!

@matozoid
Copy link
Contributor

Can someone show how that node factory is supposed to work?

@ftomassetti
Copy link
Member Author

After half a year of thinking about this issue, I would suggest to go for immutable lists lazily created (so created when someone call the getter). In our own code we had several bugs caused by this null lists and I think the same apply to the code written by users, so my personal opinion is that we should get rid of null lists. However this is a potential concern for performance (we are instantiating a lot of lists potentially). Returning the immutable list could reduce the performance hit but it is true that it could be confusing so we should probably avoid that.

Do we want to have a last round of opinion and close this PR if we do not find an acceptable compromise?

@matozoid
Copy link
Contributor

I feel it would lead to more complex code than simply initializing everything with an empty list, but I'm glad to pull anything that will kill those nulls. Nulls must die.

@SmiddyPence
Copy link
Member

Is it possible to quantify the performance degradation similar to in #113 ?

We're just guessing otherwise, there may be an obvious answer with some form a of bench marking.

@ftomassetti
Copy link
Member Author

I measured performance like this:

    public static void main(String[] args) throws IOException, ParseException {
        long a = System.currentTimeMillis();
        for (int i=0;i<50;i++) {
            JavaParser.parse(new File("javaparser-core/src/main/java/com/github/javaparser/JavaParser.java"));
            JavaParser.parse(new File("javaparser-core/src/main/java/com/github/javaparser/ASTHelper.java"));
            JavaParser.parse(new File("javaparser-core/src/main/java/com/github/javaparser/ast/visitor/CloneVisitor.java"));
        }
        long b = System.currentTimeMillis();
        System.out.println(b-a);
    }

With master I get: 1544, 1608, 1622
With this change I get: 1762, 1686, 1688
Note that the code tested is the one initializing lists at construction time and returning mutable lists. If the performance hit is too big we can revert to lazyload.

@ftomassetti
Copy link
Member Author

Note that some tests are broken, so if we think this is the way to go I will fix them and squash some commits before merging it.

@matozoid
Copy link
Contributor

Now we only need some power users to tell us what this increase would mean to them :) Or maybe we can decide on how fast we want to be. Just to make this thing SMART.

@ftomassetti
Copy link
Member Author

Assuming these measurements are meaningful (and they are probably not, we did not consider the initial "heating" of the JVM) we have an increase in time ranging from 4% to 14% (considering the best and the worst combinations of measures). To me it seems adequate given the amount of possible issues solved. We could maybe carve out some improvements with lazy initialization.

@ftomassetti
Copy link
Member Author

Now tests are passing. Note that the issue was due to the fact that DumpVisitor is checking for null instead of null or empty list. While we fixed several of these issues in the past there were still a few here and there. I suspect similar errors could be done by our users.

@ftomassetti
Copy link
Member Author

I would perform some measurement again with lazy instantiation

@ftomassetti
Copy link
Member Author

I measured the last version, which uses mutable lists and lazy loading (i.e., we check for null before returning the value in the getter).

I used this program: https://gist.github.com/ftomassetti/74217a4063da29bf3ced

With this patch it parsed 44340 in:

  • 19739, 19552 and 19914 ms
    using master:
  • 19577, 19575, 19455 ms

@ptitjes
Copy link
Contributor

ptitjes commented Sep 5, 2015

@ftomassetti Maybe you should also measure printing in your micro-benchmark. As lazy initialization will be triggered at tree traversal, printing performance might reflect it.

@ftomassetti
Copy link
Member Author

@ptitjes this is a fair point, but if IO is involved I think it is slow enough that the creation of lists should not be relevant. I will measure again, dumping to file.

There were also setters where the call to setAsParentOf was missing (I added it).

@ptitjes
Copy link
Contributor

ptitjes commented Sep 5, 2015

@ftomassetti You're right! I forgot that. Maybe tweaking a little SourcePrinter to use a Writer and giving it a noop Writer might be worth trying ?

@ftomassetti
Copy link
Member Author

I updated the script and dumped to file (I imagine this is a more typical scenario that dumping and throwing away the result). It took 26000 on master and 26513 on this branch.

@ptitjes
Copy link
Contributor

ptitjes commented Sep 5, 2015

Well, IMO this is really not a big overhead! This lazy-initializing solves the null encountering danger without sacrificing the mutability of the AST in JavaParser. This is worth it!

And you can still choose to later specialize the list by adding the parent as an argument to (maybe renamed) ensureNotNull() to setup/unsetup parent node to added/removed list elements.

@ftomassetti
Copy link
Member Author

Merging with the approval of @matozoid who is hopefully winning his match of Exploding Kittens

ftomassetti added a commit that referenced this pull request Sep 6, 2015
@ftomassetti ftomassetti merged commit defc804 into javaparser:master Sep 6, 2015
ftomassetti added a commit to ftomassetti/javaparser that referenced this pull request Jan 10, 2018
Issue103: apply JavaParser code style to all source code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants