ci: Drop codeql workflow#41277
Conversation
Claude review of PR #41277 (a72f32f)<!-- claude-pr-review --> Clean, well-justified removal of the CodeQL workflow and all associated configuration/query files. The commit message is thorough with concrete data backing the decision. Two documentation files still reference CodeQL and the deleted files, and should be updated to match. Suggestions
|
After analyzing all 218 CodeQL alerts across the project's history, the workflow has not justified its CI cost: - The most impactful query (PotentiallyDangerousFunction) was a custom systemd-specific query that has already been replaced by clang-tidy's bugprone-unsafe-functions check (6fb5ec3). - Of the remaining C++ queries, 6 never triggered at all (bad-strncpy-size, unsafe-strcat, unsafe-strncat, suspicious-pointer-scaling, suspicious-pointer-scaling-void, inconsistent-null-check). - Several high-value-sounding queries had extreme false positive rates: toctou-race-condition (95% FP), use-after-free (88% FP), cleartext-transmission (100% FP). - Many queries that did trigger are already covered by compiler warnings (-Wshadow, -Wformat, -Wunused-variable, -Wreturn-type, -Wtautological-compare) or existing clang-tidy checks (bugprone-sizeof-expression). - Across all alerts, only 3 genuinely useful C++ fixes can be attributed to CodeQL: 1 tainted-format-string, 2 incorrectly-checked-scanf. The rest were either false positives or incidental fixes during refactoring that weren't prompted by CodeQL. - The Python queries are largely superseded by ruff (already in CI) and had an 89% false positive rate on the security-focused checks. The workflow consumed significant CI resources (40+ minutes per run) and the ongoing maintenance burden of triaging false positives outweighs the marginal value of the 2-3 real findings it produced across its entire lifetime.
|
Not a huge fan of dropping CI checks - we need more of these, rather than less... |
This thing has basically only been producing false positives. Let's give ourselves back the CI capacity for more useful workflows rather than spending compute capacity on something that only produces false positives. I'll replace it with a workflow that runs |
|
I guess that this is in response to my comments about CodeQL in #40880, where it filed dozens of identical and completely useless comments. I think we should merge this. CI fatigue is a real problem, and CI that has so high false positive rate contributes heavily to people not looking at CI. CodeQL has the additional problem that it files top-level comments instead of inline comments and they are annoying to hide. |
|
I talked about this with Luca and we agreed to merge this. |
| pushed to `main`. The list of active alerts can be found | ||
| [here](https://github.com/systemd/systemd/security/code-scanning). | ||
|
|
||
| 15. Each PR is automatically tested with [Address Sanitizer](https://clang.llvm.org/docs/AddressSanitizer.html) |
After analyzing all 218 CodeQL alerts across the project's history, the workflow has not justified its CI cost:
The most impactful query (PotentiallyDangerousFunction) was a custom systemd-specific query that has already been replaced by clang-tidy's bugprone-unsafe-functions check (6fb5ec3).
Of the remaining C++ queries, 6 never triggered at all (bad-strncpy-size, unsafe-strcat, unsafe-strncat, suspicious-pointer-scaling, suspicious-pointer-scaling-void, inconsistent-null-check).
Several high-value-sounding queries had extreme false positive rates: toctou-race-condition (95% FP), use-after-free (88% FP), cleartext-transmission (100% FP).
Many queries that did trigger are already covered by compiler warnings (-Wshadow, -Wformat, -Wunused-variable, -Wreturn-type, -Wtautological-compare) or existing clang-tidy checks (bugprone-sizeof-expression).
Across all alerts, only 3 genuinely useful C++ fixes can be attributed to CodeQL: 1 tainted-format-string, 2 incorrectly-checked-scanf. The rest were either false positives or incidental fixes during refactoring that weren't prompted by CodeQL.
The Python queries are largely superseded by ruff (already in CI) and had an 89% false positive rate on the security-focused checks.
The workflow consumed significant CI resources (40+ minutes per run) and the ongoing maintenance burden of triaging false positives outweighs the marginal value of the 2-3 real findings it produced across its entire lifetime.