-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Actions: Fix bad performance in getTargetPath
#19186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Seen on `github/codeql`, some queries had very poor performance: ``` [2/24 eval 36m4s] Evaluation done; writing results to codeql/actions-queries/Security/CWE-312/ExcessiveSecretsExposure.bqrs ``` Investigating further lead to the following worrying sequence of joins (after I ran out of patience and cancelled the query): ``` [2025-04-01 12:31:03] Tuple counts for Yaml::YamlInclude.getTargetPath/0#dispred#32565107#fb#reorder_1_0/2@i6#9f4b2jw1 after 8m40s: ... 559418 ~33% {1} r5 = SCAN `Yaml::YamlNode.getLocation/0#dispred#24555c57#prev_delta` OUTPUT In.1 ... 909345525 ~821% {3} r7 = JOIN r5 WITH `Yaml::YamlNode.getLocation/0#dispred#24555c57#prev` CARTESIAN PRODUCT OUTPUT Rhs.1, Lhs.0 'result', Rhs.0 909342139 ~779% {3} | JOIN WITH `Locations::Location.getFile/0#dispred#dcf38c8d#prev` ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'result', Lhs.2 909338753 ~794% {3} | JOIN WITH containerparent_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'result', Lhs.2 909335367 ~824% {3} | JOIN WITH `FileSystem::Container.getAbsolutePath/0#dispred#d234e6fa` ON FIRST 1 OUTPUT Lhs.2, Lhs.1 'result', Rhs.1 883246724 ~812% {3} | JOIN WITH `Yaml::YamlNode.getDocument/0#dispred#ee1eb3bf#bf_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1 'this', Lhs.1 'result', Lhs.2 760047185 ~838% {5} | JOIN WITH yaml_scalars ON FIRST 1 OUTPUT Lhs.1 'result', Lhs.0 'this', Rhs.2, _, Lhs.2 0 ~0% {4} | REWRITE WITH Tmp.3 := "/", Out.3 := (In.4 ++ Tmp.3 ++ InOut.2), TEST Out.3 = InOut.0 KEEPING 4 {4} | REWRITE WITH NOT [TEST InOut.2 startsWith "/"] ... ``` The culprit turned out to be the following method on class `YamlInclude` ```ql private string getTargetPath() { exists(string path | path = this.getValue() | if path.matches("/%") then result = path else result = this.getDocument().getLocation().getFile().getParentContainer().getAbsolutePath() + "/" + path ) } ``` Basically, in the `else` branch, the evaluator was producing all possible values of `result` before filtering out the ones where the `path` component started with a forward slash. To fix this, I opted to factor out the logic into two helper predicates, each accounting for whether `this.getValue()` does or does not start with a `/`. With this, evaluating the original query from a clean cache takes roughly 3.3s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot wasn't able to review any files in this pull request.
Files not reviewed (1)
- shared/yaml/codeql/yaml/Yaml.qll: Language not supported
Tip: If you use Visual Studio Code, you can request a review from Copilot before you push from the "Source Control" tab. Learn more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, and makes sense. Just one question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering a change note, but I think I'd rather attach one to the extractor change.
Seen on
github/codeql
, some queries had very poor performance:Investigating further lead to the following worrying sequence of joins (after I ran out of patience and cancelled the query):
The culprit turned out to be the following method on class
YamlInclude
Basically, in the
else
branch, the evaluator was producing all possible values ofresult
before filtering out the ones where thepath
component started with a forward slash.To fix this, I opted to factor out the logic into two helper predicates, each accounting for whether
this.getValue()
does or does not start with a/
. With this, evaluating the original query from a clean cache takes roughly 3.3s.