-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Overlay: Future-proof Java config & XML discard predicates #20484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Overlay: Future-proof Java config & XML discard predicates #20484
Conversation
bb2ef45
to
1cc0c63
Compare
1cc0c63
to
bb9e773
Compare
bb9e773
to
dbb9a26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR future-proofs overlay discarding for Java config and XML by switching from “any locatable exists in overlay” checks to using overlayChangedFiles, making it robust if extraction becomes incremental.
- getRawFile now supports XML locatables via xmllocations
- Java config discard now keys off overlayChangedFiles using per-entity raw file resolution
- XML module introduces DiscardableXml* classes and imports Overlay to support discard behavior
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
java/ql/lib/semmle/code/xml/XML.qll | Adds Overlay import and DiscardableXml* classes to enable file-based discard behavior while excluding namespaces. |
java/ql/lib/semmle/code/java/Overlay.qll | Updates getRawFile to support XML locatables and reworks config discard to be based on overlayChangedFiles with a helper to resolve config raw files. |
private class DiscardableXmlAttribute extends DiscardableLocatable, @xmlattribute { } | ||
|
||
private class DiscardableXmlElement extends DiscardableLocatable, @xmlelement { } | ||
|
||
private class DiscardableXmlComment extends DiscardableLocatable, @xmlcomment { } | ||
|
||
private class DiscardableXmlCharacters extends DiscardableLocatable, @xmlcharacters { } | ||
|
||
private class DiscardableXmlDtd extends DiscardableLocatable, @xmldtd { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider adding a brief comment explaining why these XML entities are marked discardable and that XML namespaces are intentionally excluded due to non-file-local TRAP keys. This will help future readers understand the scope and rationale of discard behavior.
Copilot uses AI. Check for mistakes.
string getRawFile(@locatable el) { | ||
exists(@location loc, @file file | | ||
hasLocation(el, loc) and | ||
(hasLocation(el, loc) or xmllocations(el, loc)) and | ||
locations_default(loc, file, _, _, _, _) and | ||
files(file, result) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Please add a short doc comment clarifying that getRawFile now handles XML locatables via xmllocations in addition to standard locations, since this broadens the function’s scope and improves discoverability.
Copilot uses AI. Check for mistakes.
/** Gets the raw file for a configLocatable. */ | ||
overlay[local] | ||
private predicate baseConfigLocatable(@configLocatable l) { not isOverlay() and exists(l) } | ||
|
||
overlay[local] | ||
private predicate overlayHasConfigLocatables() { | ||
isOverlay() and | ||
exists(@configLocatable el) | ||
} | ||
|
||
overlay[discard_entity] | ||
private predicate discardBaseConfigLocatable(@configLocatable el) { | ||
// The properties extractor is currently not incremental, so if | ||
// the overlay contains any config locatables, the overlay should | ||
// contain a full extraction and all config locatables from base | ||
// should be discarded. | ||
baseConfigLocatable(el) and overlayHasConfigLocatables() | ||
} | ||
|
||
overlay[local] | ||
private predicate baseXmlLocatable(@xmllocatable l) { | ||
not isOverlay() and not files(l, _) and not xmlNs(l, _, _, _) | ||
private string getRawFileForConfig(@configLocatable el) { | ||
exists(@location loc, @file file | | ||
configLocations(el, loc) and | ||
locations_default(loc, file, _, _, _, _) and | ||
files(file, result) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider making the doc a bit more explicit, e.g., “Returns the raw file path for a config locatable using configLocations; used to drive overlayChangedFiles-based discarding.” This clarifies how the function is used and why configLocations is the source.
Copilot uses AI. Check for mistakes.
Java property extraction and XML extraction is currently non-incremental, meaning that all property and XML files are extracted in the overlay. The current Java config and XML discard predicates makes use of this fact and discard all config locatables from base if any config locatables have been extracted in the overlay and likewise for xml locatables. This is sound, but will require an update of the discard predicate in lock-step with an extractor change if we later make Java property or XML extraction incremental. This PR updates the Java config and XML discard predicates to discard based on
overlayChangedFiles
instead. This is future-proof if we later make Java property or XML extraction incremental.Commit 1 future-proofs Java config discarding. All Java config entities use
*
TRAP ids and can therefore be discarded from base if they appear in files extracted in the overlay.Commit 2 future-proofs Java XML discarding. All XML entities except namespaces use
*
TRAP ids and can therefore be discarded from base if they appear in files extracted in the overlay. XML namespace entities use TRAP keys of the form@"$URI;xmlnamespace"
that are not file-local and cannot safely be discarded from base.With the current non-incremental extraction, unchanged Java property and XML files will be extracted in both base and overlay and will not be discarded from either. This causes apparent accuracy regressions in DCA experiments (variant vs baseline for
java/maven/non-https-url
,java/maven/dependency-upon-bintray
andjava/spring-boot-exposed-actuators-config
. I've inspected thejava/maven/non-https-url
regressions locally and they occur because the query selectsDeclaredRepository
which extend@xmlelement
entities that are extracted in both base and overlay with different ids. This causes an apparent regression at the bqrs-level, but at the SARIF level the duplicate results collapse to a single alert. The DCA alert-comparison which compares SARIF alerts shows no regressions (variant vs. baseline).