Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: Remove source dispatch when there's an exact match from a manual model. #16552

Merged

Conversation

aschackmull
Copy link
Contributor

This blocks source dispatch when we have an exact match from a manual model, thus improving analysis when analysing the db containing the source for a given manual model. I.e. this only affects projects for which there exists some number of manual models.

@owen-mc
Copy link
Contributor

owen-mc commented May 22, 2024

Can you explain in a bit more detail what this does? I'm not totally sure what "source dispatch" is.

From what I can tell, one effect of this PR is that using the shortcut of not listing the signature (because I know that there is only one method with that name, or because I don't want to spend time listing all the methods with that name) will now behave differently from having a model which has the signature. At the very least this should be communicated widely, and perhaps documented somewhere.

@aschackmull
Copy link
Contributor Author

Can you explain in a bit more detail what this does? I'm not totally sure what "source dispatch" is.

For every manual model, this change affects only one single projects, namely the one containing the source implementation. Take for example java.lang.String.replace. When analysing the jdk we would get dispatch to both the source implementation of String.replace and also the MaD-synthesised method body. This change ensures that we only get the latter. Most (all?) of our manual models that refer to non-interface methods are describing jdk methods, so this change is fairly targeted to the analysis of openjdk itself.

Whether or not we should include the "empty signature" check to determine whether we disregard the source dispatch target is something that we can change if we want - I opted for the most conservative version here. I'm not sure that this actually needs to be that widely documented, since it only affects analysis of one project (per manual model).

@owen-mc
Copy link
Contributor

owen-mc commented May 22, 2024

Okay. I do think this changes the best-practice to be including the signature, though as most modelling is automated this won't make too much difference.

Does this improve performance on the JDK then? Analysis time got 20% longer when I added all the df-generated models, and I wasn't able to figure out where the time was being spent.

@aschackmull
Copy link
Contributor Author

Does this improve performance on the JDK then?

Yes. I've only measured this in the context of #16500 where the 60% regression was reduced to a 19% regression.

@owen-mc
Copy link
Contributor

owen-mc commented May 22, 2024

It would be good to run DCA on this (possibly just with the JDK source) to quantify how much of the performance hit from adding the jdk models it mitigates. For reference, that made analysis time on the jdk go from 435.3s to 526.7s.

Should this change also be made to other languages? (Not necessarily in this PR.) I'm particularly aware of this as I've got a branch where I've updated the go version to match the java version as much as possible, and this PR will introduce another unnecessary difference.

@aschackmull
Copy link
Contributor Author

It would be good to run DCA on this

I've started dca.

Should this change also be made to other languages?

Probably, yes.

@aschackmull
Copy link
Contributor Author

Actually, from re-visiting some of my notes, I do think we have several cases where it's desirable to consider a model an exact match based solely on the package+type+name, and not consider the signature. So I'll try to make that change and kick off another dca.

@aschackmull
Copy link
Contributor Author

We might even want to go as far as looking for package matches rather than exact matches. That way a manual model for an interface would take complete precedence over any source implementations of that interface in the same package - even if they had direct calls that didn't go through the interface. But I think I'll leave that option for now and stick with the "exact match" concept. I'm just noting this for future reference.

@aschackmull
Copy link
Contributor Author

It would be good to run DCA on this (possibly just with the JDK source) to quantify how much of the performance hit from adding the jdk models it mitigates. For reference, that made analysis time on the jdk go from 435.3s to 526.7s.

The first variant showed a 23% time decrease from 598.7s to 459.7s. The second variant showed a 33% time decrease from 581s to 388.3s.

@aschackmull
Copy link
Contributor Author

Looks like I'll need to rebase to satisfy CI.

@aschackmull aschackmull force-pushed the java/no-source-dispatch-for-exact-mad branch from 474377c to f353065 Compare May 23, 2024 08:50
@owen-mc
Copy link
Contributor

owen-mc commented May 23, 2024

Final question: why choose the model over the source code? Do you think they're more accurate? Is performance an issue? I assume using the model is faster as the paths will be much simpler. I can see it's more consistent, as we use the models when analysing every other repo.

@aschackmull
Copy link
Contributor Author

Final question: why choose the model over the source code? Do you think they're more accurate? Is performance an issue? I assume using the model is faster as the paths will be much simpler. I can see it's more consistent, as we use the models when analysing every other repo.

All of those reasons apply. It's more consistent, paths will be simpler, flow will be more accurate, and as we can see, performance also improves significantly.

@aschackmull aschackmull merged commit 619913b into github:main May 23, 2024
31 checks passed
@aschackmull aschackmull deleted the java/no-source-dispatch-for-exact-mad branch May 23, 2024 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DataFlow Library Java no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants