fix: validate pclntab header on s390x#337
Conversation
Adds a s390x Go 1.24 fixture that naturally contains a coincidental LE magic byte sequence inside its .gopclntab section data. ReadTable finds this false match before the real BE pclntab at offset 0, producing an empty symbol table. The shared fixture source (main.go) is extended with stdlib imports (crypto/tls, net/http) so the resulting .gopclntab is large enough for the false match to occur naturally. The test fails on the current code, demonstrating the bug. Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>
ReadTable used raw bytes.Index to locate the pclntab magic number, accepting the first 4-byte match without verifying the surrounding header. On s390x, the LE magic 0xF1FFFFFF appears coincidentally inside .gopclntab data of large binaries before the real BE pclntab at offset 0, producing an empty symbol table. Prior to the guard added in openshift#333, an empty symbol table caused all FIPS validation to be silently skipped. Validates the 8-byte pclntab header after each magic match and parse-validates candidates to ensure non-zero functions. Also adds .data.rel.ro.gopclntab to the section lookup for Go <= 1.25 internal PIE binaries and a defensive check for missing .text section. Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>
|
@bartoszmajsak: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
ide-developer
left a comment
There was a problem hiding this comment.
Good fix — the header validation approach is the right design for distinguishing coincidental magic byte matches from real pclntab headers. This addresses a real bug hitting RHOAI QE on s390x Go 1.24 binaries (Slack thread) and builds nicely on #330 and #333.
A few observations below, mostly around fixture reproducibility and test coverage. The core goscan.go logic looks solid.
Disclaimer: I am not CodeRabbit. I have bigger claws and I don't mass-produce emoji.
| _ = &http.Server{} | ||
| b, _ := json.Marshal(os.Args) | ||
| fmt.Println(string(b)) | ||
| } |
There was a problem hiding this comment.
This changes the shared fixture source from func main() {} to a larger program with stdlib imports. The existing committed fixtures (pie_go126_s390x_app, fips_compliant_app) were built from the old main.go, so after this PR the source no longer matches those binaries.
Future maintainers running build_pie_fixture.sh will now produce a different (larger) binary than what's committed. Consider either:
- keeping the old
main.goand creating a separatemain_large.gofor the new fixtures, or - rebuilding all existing fixtures from the new source and noting this in the commit message.
Not CodeRabbit — I was forged in the ELF mines of s390x.
There was a problem hiding this comment.
Good catch! I like the jokes!
| go build -a \ | ||
| -o "$SCRIPT_DIR/go124_s390x_app" . | ||
| echo "Built with: $(go version)" | ||
| echo "Output: $SCRIPT_DIR/go124_s390x_app" |
There was a problem hiding this comment.
There's a build script here for go124_s390x_app but none for go124_internal_pie_amd64_app. That amd64 fixture is a committed binary with no documented way to reproduce it. PR #330 included build scripts for all its fixtures — would be good to add one here too for consistency.
Not CodeRabbit — I actually read the ELF headers.
There was a problem hiding this comment.
I can follow-up with cross-arch builder for all binaries and test fixtures based on those (I have it ready). I would prefer to land this fix first as it's blocking some builds. WDYT @smith-xyz - is it scope creep for this PR/fix?
There was a problem hiding this comment.
yeah I agree - I've been in the process of setting up a pre commit job to run so we can have more smoke testing across ocp versions and archs
also going to checkout this ide-developer because I like how it acts like a normal reviewer 😄
There was a problem hiding this comment.
I suspect it's local agent from @jiridanek
There was a problem hiding this comment.
@bartoszmajsak exactly, it's not autonomous, i've been closely driving it, regarding the owl personality and so on
its a claude opus 4.6 running in cursor; and it's relatively knowledgeable because I opened an older session that accumulated lots of check-payload context; also cursor is supposed to have decent code indexing in the repo, which ad-hoc built agent would not have out of the box
There was a problem hiding this comment.
Very helpful review. Yet another proof that context is the king :) I wonder if we could contribute a review agent to this repo with all these rules and findings?
| } | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
Nice table-driven refactor of the unit tests. One gap: PR #330 added both unit tests here and TestRunLocalScan integration tests in internal/scan/scan_test.go with mock unpacked directories. The new go124_s390x_app and go124_internal_pie_amd64_app fixtures aren't exercised through the full scan pipeline (OS validation -> binary discovery -> ReadTable -> symbol checks). Not a blocker, but worth considering for completeness.
Not CodeRabbit — I don't hop away from integration tests.
| if err != nil || len(symTable.Funcs) == 0 { | ||
| return nil | ||
| } | ||
| return symTable |
There was a problem hiding this comment.
tryParseTable returns nil on both gosym.NewTable errors and empty function tables, discarding the distinction. When all candidates fail, the final error is a generic "could not find valid pclntab" with no indication of why each candidate was rejected (bad header? parse error? zero funcs?). For debugging production failures, accumulating or logging the per-candidate rejection reason would help — e.g., the QE team in the Slack thread had to do quite a bit of detective work to get here.
Not CodeRabbit — I don't swallow my errors silently either.
There was a problem hiding this comment.
Probably worth addressing as a follow-up unless simple logging is enough? WDYT?
There was a problem hiding this comment.
Agree, follow-up is fine — this PR already fixes the actual scanning failure, no need to scope-creep it.
More detail on the two approaches I see
The internal/golang/ package currently has zero logging — it is a pure library that returns errors via fmt.Errorf. Two clean options:
Option A — Enrich the error string (keep golang/ as pure library). Instead of the generic "could not find valid pclntab", accumulate rejection reasons: "could not find valid pclntab in X (section=.gopclntab): tried 3 candidates: offset 0x1234 bad header, offset 0x5678 parse error, offset 0x9abc 0 funcs". No new imports needed, and the detail propagates all the way to the scan output — which is what the QE team actually sees.
Option B — Add klog.V(2) calls in ReadTable. This follows the pattern used in internal/scan/ and internal/validations/, but would introduce klog as a new dependency to the golang/ package for the first time, changing its character.
I would lean toward A since the error message is what surfaces in the scan report, and the QE team debugging the s390x failures would have had the answer immediately instead of the detective work in the Slack thread.
There was a problem hiding this comment.
{o,o}
|)__)
-"-"-
/ \
hoot hoot -- not a mass-produced code review rabbit
|
/agentic_review |
|
A few items not yet raised in the thread:
|
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bartoszmajsak, smith-xyz The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
big thanks on your help with these issues - @bartoszmajsak and @jiridanek |
Addresses PR openshift#337 review feedback: - slog.Debug in tryParseTable for rejected pclntab candidates - Fix test names (s390x fixture is CGO not PIE, fips is amd64) - Fixture invariant assertions: each test case declares expected ELF section and machine type, with rejectSections to ensure fallback paths are actually exercised - Integration tests for all new fixtures in TestRunLocalScan - ppc64le unit test (quantum=4 LE) - .data.rel.ro fallback unit test (magic scanning loop) Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>
Addresses PR openshift#337 review feedback: - slog.Debug in tryParseTable for rejected pclntab candidates - Fix test names (s390x fixture is CGO not PIE, fips is amd64) - Fixture invariant assertions: each test case declares expected ELF section and machine type, with rejectSections to ensure fallback paths are actually exercised - Integration tests for all new fixtures in TestRunLocalScan - ppc64le unit test (quantum=4 LE) - .data.rel.ro fallback unit test (magic scanning loop) Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>
pending openshift/check-payload#337 getting integrated into the konflux-test image, and then that image getting bumped in the fips check stepaction https://github.com/konflux-ci/build-definitions/blob/main/stepactions/fips-operator-check-step-action/0.1/fips-operator-check-step-action.yaml
Addresses PR openshift#337 review feedback: - slog.Debug in tryParseTable for rejected pclntab candidates - Fix test names (s390x fixture is CGO not PIE, fips is amd64) - Fixture invariant assertions: each test case declares expected ELF section and machine type, with rejectSections to ensure fallback paths are actually exercised - Integration tests for all new fixtures in TestRunLocalScan - ppc64le unit test (quantum=4 LE) - .data.rel.ro fallback unit test (magic scanning loop) Signed-off-by: Bartosz Majsak <bartosz.majsak@gmail.com>
Problem
ReadTablefails on s390x Go 1.24 binaries with "symbol table has no functions (pclntab may be corrupt)". The rawbytes.Indexmagic search finds a coincidental LE magic match (0xF1FFFFFF) inside.gopclntabdata before the real BE pclntab at offset 0, producing an empty symbol table.This is manifested by errors such as:
Prior to the guard added in #333, an empty symbol table seems to cause all FIPS validation to be silently skipped - the binary would appear to pass with no crypto checks performed.
Fix
Validates the 8-byte pclntab header after each magic match and parse-validates candidates to ensure non-zero functions. Also adds
.data.rel.ro.gopclntabto the section lookup for Go <= 1.25 internal PIE binaries and a defensive check for missing.textsection.Verified against
.data.rel.ro.gopclntablayout) - passes.gopclntablayout) - passesmake verify && make test)