New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[java] RFC: new layer of abstraction atop PrimaryExpressions #497
Comments
A design question here: would this layer of abstraction be constructed by the type resolution visitor, or by a dedicated visitor? I think the latter would be preferable, because we could still use that representation without type resolution (it would lack type info but could still be usable --- not for metrics though), and it would be less entangled into type resolution. We'd need to see if that visitor would always execute, or would be conditioned on whether it's used by a rule of the ruleset, like type resolution or metrics. Another point is that using such a feature for metrics would make Metrics depend on type resolution. Benchmarking PMD on its own sources with a simple ruleset showed an execution time of 0.092 seconds for the metrics visitation vs 2.270 seconds for type resolution. So it might make sense to try to limit that dependency to those metrics that need it, keeping in mind that type resolution will see some more performance improvements in the near future |
I'm not so certain about this assertion. It's true we have refrained from doing so in the past, but given we are now using semantic versioning, we may consider changing the AST on major releases such as 6.0.0. Particularly on cases such as this one, where our parsers is not in line with the grammar defined by the JSL. I agree with you, the type resolution for intermediate nodes being lost is not ideal. Not only for metrics, but for other rules too. Non the less, I'd love to hear what @adangel and @ryan-gustafson may have to say.
By default all java rules enable type resolution. This shouldn't be an issue on itself. And yes, PMD will keep on getting better performance as we keep working. |
Hm... yes, I didn't think about changing the AST, but yes - this would indeed be possible for 6.0.0. And if we move towards a more JSL compliant grammar, even better! The existing rule tests should show us, what we need to adjust after changing the grammar. I'm wondering, whether there is a more light-weight solution (e.g. without a separate visitor) possible: What about enhancing the ASTPrimaryExpression class with a method |
@adangel For starters I may use the representation of I like the idea of not using a visitor, and build this representation on demand. That would entail very minimal changes to the type resolution visitor, which would not have the responsibility of building or populating the representation. The changes to the AST could be very lightweight too, at the very least we could store a list of the types accessed in each ASTPrimaryPrefix for example, and let the VirtualExpression use these types to populate its representation when it's being built. If we decided to change the grammar to e.g. add more nodes to carry types, our parsers would get even farther from the JLS as they are now though...
@jsotuyod Will this behaviour be continued? I mean, there's less and less ground to disable type resolution by default as the performance of this utility improves, but what would be the point of having a specific xml attribute for it then? |
Yes, I foresee a time where symbol table / type resolution will be tightly coupled / merged. Each requires one-another at different times. Having type resolution being able to work on-demand on subtrees may help on this, but that's still to be seen.
@adangel this is assuming we don't change the grammar right now, but do so in the future, right? otherwise I am not fully following your comments...
@oowekyala I'm not sure what changes you are referring to here... the change I'm proposing would actually better follow the JSL.. check this Primaries are parsed with left recursion. Your example
That would retain the same node-separation as the JSL ('though not the same names), and would be actually more similar to what we currently do. Also, we would "consume" the '.' tokens and not have them as part of the images, which would allow to simplify several rules currently handling this on their own.
Yes, I foresee that being kept around. The attribute was added back when it was still on a very early stage. I am okay with keeping it around for backwards compatibility (we may deprecate it at some point). Non the less, keeping it around for other languages that have currently no support for type resolution but may in the future would be nice (Scala, Groovy, Kotlin, etc.). |
@jsotuyod Ok I misunderstood. The changes you're proposing totally make sense and would make Primaries much easier to parse. Just wondering, could we add specific nodes for each kind of primary? Like, one for method invocation, one for field access, etc. That would allow us to define specific methods for each kind and make each one much easier to use. This could perhaps be done by adding a specific child for each primary, eg for
Maybe
Okay that makes sense to me now. Thanks for explaining :) |
Hi,
For my work on the metrics framework, I need a way to find the qualified names of the methods that are called and fields that are accessed in a block of code. This is necessary to resolve accurately the usage of methods and data across a multi file project, and I believe it's the next big step of my project.
Type resolution could greatly help to do so, but after a discussion with @WinterGrascph, it appears it's not exhaustive enough yet: it currently populates the type of AST nodes, and the parser groups some elements in the same nodes. For instance, an expression such as
a.field.foo()
would be parsed intoPrimary[Prefix[a.field.foo], Suffix[()]]
, and type resolution would only populate the type of the prefix (type offoo
method invocation) and the suffix containing the () arguments and the Primary expression would have the same types as well. That means we miss the type ofa
, anda.field
. Similarly, when a similar expression is parenthesized ((a).field.foo()
), it would be parsed intoPrimary[Prefix[Expression[(a)]], Suffix[field], Suffix[foo], Suffix[()]]
, here the prefix with(a)
would have the type ofa
, suffix field would have field's type assigned and so on.There's a couple of thing here that deserve attention:
Since we can't change the way the AST is parsed, one solution to these problems would be to create another layer of abstraction atop
PrimaryExpression
s. This layer could bind a data structure that would represent the expression's structure in a more intuitive way. This wouldI imagine such a representation could look like the following for expression
a.field.foo()
:The representation would probably be created and populated by the type resolution visitor. Since the implementation of this feature is necessary for the metrics framework to move forward, I guess I could work on it with @WinterGrascph, which would need some organisation
Comments? @adangel @jsotuyod
The text was updated successfully, but these errors were encountered: