
feat: add SBOM scanner sidecar for memory-isolated SBOM generation #753

Merged
slashben merged 5 commits into main from feature/sbom-scanner-sidecar
Mar 23, 2026
Conversation


@slashben slashben commented Mar 18, 2026

Summary

  • SBOM scanner sidecar: Moves Syft-based SBOM generation into a separate gRPC sidecar container within the node-agent pod, preventing OOM kills from crashing the main node-agent process
  • Shared helpers refactor: Extracts NodeSource, NodeResolver, and toSyftDocument into a syftutil subpackage to avoid import cycles between the main agent and the sidecar binary
  • Full fallback support: When the sidecar is not configured, the existing in-process Syft scanning continues to work; when the sidecar is configured but not ready, scans are deferred

Key changes

  • Protobuf service — pkg/sbomscanner/v1/proto/scanner.proto — CreateSBOM and Health RPCs
  • gRPC server — pkg/sbomscanner/v1/server.go — sequential processing, timeout, image-too-large handling
  • gRPC client — pkg/sbomscanner/v1/client.go — reconnection backoff, health checks, crash detection
  • Scanner binary — cmd/sbom-scanner/main.go — Unix socket listener, graceful shutdown
  • SbomManager — pkg/sbommanager/v1/sbom_manager.go — sidecar/not-ready/fallback branching, retry tracking, TooLarge-with-memory-limit annotations
  • Observability — pkg/sbommanager/v1/metrics.go — sbom_scan_total, sbom_scan_duration_seconds, sbom_scanner_restarts_total, sbom_scanner_ready
  • Shared helpers — pkg/sbommanager/v1/syftutil/ — extracted source.go, resolver.go, document.go, helpers.go
  • Dockerfile — build/Dockerfile — builds and copies both node-agent and sbom-scanner binaries
  • Wiring — cmd/main.go — creates scanner client from the SBOM_SCANNER_SOCKET env var

Test plan

  • Unit tests for gRPC server (health, invalid input, image-too-large, cancelled context)
  • Integration tests (full scan lifecycle, simulated OOM/crash, image-too-large via sidecar, ready check)
  • Existing sbommanager tests pass (source_test, sbom_manager_test)
  • Syftutil package tests pass (toLayers test moved from sbommanager)
  • Full go build ./... passes
  • CI pipeline
  • Staging deployment with Helm chart changes (companion PR)

Made with Cursor

Summary by CodeRabbit

  • New Features

    • SBOM generation can be offloaded to a dedicated scanner sidecar with health/readiness checks, timeouts, and in-process fallback.
    • Runtime image now includes the sbom-scanner executable.
    • Prometheus metrics for scan counts, durations, readiness, and scanner restarts.
    • Configurable scanner memory limits, image/SBOM size limits, automatic crash detection and retry logic.
    • SBOM output conversion improved for downstream storage/consumption.
  • Tests

    • End-to-end and integration tests covering scanner lifecycle, error scenarios, resource limits, and readiness.

Move Syft-based SBOM scanning into a separate gRPC sidecar container
to prevent OOM kills from crashing the main node-agent process. The
sidecar communicates via Unix domain socket and follows the existing
ClamAV sidecar pattern.

- Define protobuf service (CreateSBOM, Health RPCs)
- Extract shared Syft helpers into syftutil subpackage
- Implement gRPC server with sequential scan processing
- Create sbom-scanner binary entrypoint
- Implement gRPC client with crash detection and backoff
- Add sidecar/not-ready/in-process fallback branching in SbomManager
- Add retry tracking with TooLarge-with-memory-limit annotations
- Add Prometheus metrics (scan_total, duration, restarts, ready)
- Update Dockerfile to build both node-agent and sbom-scanner binaries
- Add unit and integration tests

Made-with: Cursor
Signed-off-by: Ben <ben@armosec.io>

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Adds an SBOM scanner sidecar (gRPC server + Unix-socket client), integrates it into the node-agent with a Syft in-process fallback, introduces protobufs, Prometheus metrics, syft->SyftDocument conversion utilities, tests, and updates the Docker build to include the sbom-scanner binary.

Changes

Cohort / File(s) Summary
Docker Build Configuration
build/Dockerfile
Adds build step to compile sbom-scanner and copies it into the final image.
Node Agent Entry
cmd/main.go
Initializes SBOM scanner client from SBOM_SCANNER_SOCKET env, logs fallback, passes client into CreateSbomManager (signature changed).
Sidecar Binary
cmd/sbom-scanner/main.go
New executable: starts gRPC server on UNIX socket, registers scanner server, handles SIGTERM/SIGINT for graceful shutdown.
SBOM Manager Core
pkg/sbommanager/v1/sbom_manager.go, pkg/sbommanager/v1/metrics.go
Adds scannerClient support, memory-limit annotation, retry logic, crash handling, and Prometheus metrics; delegates scans to sidecar when ready, falls back to Syft otherwise.
Syft Conversion Utilities
pkg/sbommanager/v1/syftutil/document.go, .../helpers.go, .../resolver.go, .../source.go, .../source_test.go
New syftutil package with ToSyftDocument and conversion helpers; package renamed from v1 to syftutil for these files; tests adjusted.
Scanner Protobufs & Gen Config
pkg/sbomscanner/v1/proto/scanner.proto, pkg/sbomscanner/v1/proto/buf.gen.yaml
Adds SBOMScanner protobuf service/messages and Buf generation config for Go codegen.
Scanner Types & API
pkg/sbomscanner/v1/types.go
Defines ScanRequest, ScanResult, SBOMScannerClient interface, and errors ErrScannerCrashed, ErrScannerNotReady.
Scanner Client
pkg/sbomscanner/v1/client.go
gRPC Unix-socket client with backoff health checks, CreateSBOM, Ready, and Close; maps certain gRPC errors to ErrScannerCrashed.
Scanner Server
pkg/sbomscanner/v1/server.go
Server implementation: CreateSBOM with image-status parsing, size checks, Syft invocation, timeout handling, and Health RPC; maps errors to gRPC codes.
Scanner Tests
pkg/sbomscanner/v1/server_test.go, pkg/sbomscanner/v1/integration_test.go
Unit and integration tests for health, lifecycle, simulated crash/OOM, image-too-large, context cancellation, and end-to-end CreateSBOM flows.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Node Agent
    participant ScannerClient as SBOMScannerClient
    participant GRPC as gRPC (Unix Socket)
    participant Scanner as SBOM Scanner Server
    participant Storage as Storage

    Agent->>ScannerClient: Ready()?
    alt scanner ready
        Agent->>ScannerClient: CreateSBOM(ScanRequest)
        ScannerClient->>GRPC: CreateSBOM RPC
        GRPC->>Scanner: Forward request
        Scanner->>Scanner: Build source & run Syft
        Scanner-->>GRPC: Response (sbom JSON + size)
        GRPC-->>ScannerClient: Response
        ScannerClient->>Agent: ScanResult (SyftDocument, size)
        Agent->>Storage: Persist SBOM
    else scanner not ready
        Agent->>Agent: Run in-process Syft generation (fallback)
        Agent->>Storage: Persist SBOM
    else scanner crashed
        Agent->>Agent: handleScannerCrash() -> retry or mark oversized
    end
sequenceDiagram
    participant Main as cmd/main.go
    participant ClientCtor as NewSBOMScannerClient
    participant GRPC as gRPC (Unix Socket)
    participant Server as Scanner Server

    Main->>ClientCtor: NewSBOMScannerClient(socketPath)
    ClientCtor->>GRPC: Dial unix socket
    ClientCtor->>Server: Health() with backoff retries
    alt Health OK
        Server-->>ClientCtor: Ready=true
        ClientCtor-->>Main: client ready
    else Health Failed
        ClientCtor-->>Main: error
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

release

Suggested reviewers

  • matthyx

Poem

🐰 I hopped to a socket, soft and bright,
gRPC hummed while Syft took a bite,
Sidecar and agent twined in a scan,
Bytes and metadata shelved by hand,
Hooray — SBOMs safe through the night!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning — Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed — Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed — The title 'feat: add SBOM scanner sidecar for memory-isolated SBOM generation' directly and accurately summarizes the main change: introducing a separate gRPC sidecar for SBOM generation to isolate memory usage.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (8)
pkg/sbomscanner/v1/proto/scanner.proto (1)

1-4: Consider fixing the Buf lint directory mismatch.

The static analysis tool reports that files with package sbomscanner.v1 should be in a sbomscanner/v1 directory relative to root, but this file is in pkg/sbomscanner/v1/proto.

Options to resolve:

  1. Add a buf.yaml with lint.ignore for PACKAGE_DIRECTORY_MATCH
  2. Move proto to proto/sbomscanner/v1/scanner.proto and adjust go_package

This won't affect functionality but will cause Buf lint failures in CI if enforced.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbomscanner/v1/proto/scanner.proto` around lines 1 - 4, The Buf lint
complaint arises because the proto package name "sbomscanner.v1" in
scanner.proto does not match its directory layout; fix by either adding a
buf.yaml at repo root that ignores PACKAGE_DIRECTORY_MATCH (add lint.ignore: -
PACKAGE_DIRECTORY_MATCH) or move the proto into a directory matching the package
(create proto/sbomscanner/v1/scanner.proto) and update the option go_package
string ("github.com/kubescape/node-agent/pkg/sbomscanner/v1/proto") if
relocating; reference the package declaration "package sbomscanner.v1" and the
option "option go_package" when making the change so Buf lint and go codegen
remain consistent.
pkg/sbommanager/v1/syftutil/helpers.go (1)

9-22: Consider replacing fmt.Println with structured logging.

The diskUsage function uses fmt.Println for error output (lines 13, 20), which bypasses structured logging and may not be captured in container logs. Since this code is being extracted into a shared utility, consider using a logger or returning errors.

♻️ Suggested improvement
-func diskUsage(path string) int64 {
+func diskUsage(path string) (int64, error) {
 	var s int64
 	dir, err := os.Open(path)
 	if err != nil {
-		fmt.Println(err)
-		return s
+		return 0, fmt.Errorf("open directory %s: %w", path, err)
 	}
 	defer dir.Close()

 	files, err := dir.Readdir(-1)
 	if err != nil {
-		fmt.Println(err)
-		return s
+		return 0, fmt.Errorf("read directory %s: %w", path, err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/syftutil/helpers.go` around lines 9 - 22, The diskUsage
function currently prints errors with fmt.Println; change it to return errors
and let callers handle logging: update func diskUsage(path string) int64 to func
diskUsage(path string) (int64, error) (or add a logger parameter if preferred),
replace the fmt.Println(err) calls with early returns returning the zero size
and the err, and update any callers (e.g., callers of diskUsage) to handle/log
the returned error using the project's structured logger; ensure to update
references to diskUsage accordingly.
pkg/sbommanager/v1/sbom_manager.go (1)

99-102: Silent failure on invalid SCANNER_MEMORY_LIMIT value.

If the environment variable contains non-numeric data, ParseInt returns 0 and the error is silently ignored. Consider logging a warning so operators know their configuration is invalid.

♻️ Optional: Log warning on parse failure
 	var scannerMemLimit int64
 	if memStr, ok := os.LookupEnv("SCANNER_MEMORY_LIMIT"); ok {
-		scannerMemLimit, _ = strconv.ParseInt(memStr, 10, 64)
+		var parseErr error
+		scannerMemLimit, parseErr = strconv.ParseInt(memStr, 10, 64)
+		if parseErr != nil {
+			logger.L().Warning("invalid SCANNER_MEMORY_LIMIT value, using 0",
+				helpers.String("value", memStr),
+				helpers.Error(parseErr))
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/sbom_manager.go` around lines 99 - 102, The code silently
ignores strconv.ParseInt errors when reading SCANNER_MEMORY_LIMIT into
scannerMemLimit; modify the os.LookupEnv/ParseInt block to check the ParseInt
error and log a warning (including the invalid value and the parse error)
instead of discarding it, falling back to the existing zero/default behavior;
reference the scannerMemLimit variable, the SCANNER_MEMORY_LIMIT env var, and
strconv.ParseInt so you update that specific parsing branch to emit a warning
via the package logger (or log.Printf) when parsing fails.
pkg/sbomscanner/v1/server_test.go (1)

74-89: Duplicate test helper: consider extracting to shared test utilities.

makeImageStatusJSON is nearly identical to makeTestImageStatus in integration_test.go. Consider extracting to a shared testutil_test.go file to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbomscanner/v1/server_test.go` around lines 74 - 89, Duplicate test
helper functions makeImageStatusJSON and makeTestImageStatus should be
consolidated: create a new shared test file (e.g., testutil_test.go) and move
the common helper into it (export or keep unexported as needed), then replace
calls in pkg/sbomscanner/v1/server_test.go and integration_test.go to use that
single helper; ensure the helper signature and returned value remain the same
(function name, parameters, and json marshal behavior) and update imports/test
files so tests compile.
cmd/main.go (1)

385-392: Consider creating scanner client only when SBOM generation is enabled.

The scanner client is created unconditionally when SBOM_SCANNER_SOCKET is set, even if cfg.EnableSbomGeneration is false. While this doesn't cause issues (the client simply won't be used), it's slightly wasteful to establish a gRPC connection and perform health checks when SBOM generation is disabled.

♻️ Optional: Move client creation inside the SBOM-enabled block
-	// Create the SBOM scanner sidecar client (if configured)
-	var scannerClient sbomscanner.SBOMScannerClient
-	if socket, ok := os.LookupEnv("SBOM_SCANNER_SOCKET"); ok {
-		scannerClient, err = sbomscanner.NewSBOMScannerClient(socket)
-		if err != nil {
-			logger.L().Ctx(ctx).Warning("SBOM scanner sidecar not available, falling back to in-process scanning", helpers.Error(err))
-		}
-	}
-
 	// Create the SBOM manager
 	var sbomManager sbommanager.SbomManagerClient
 	if cfg.EnableSbomGeneration {
+		var scannerClient sbomscanner.SBOMScannerClient
+		if socket, ok := os.LookupEnv("SBOM_SCANNER_SOCKET"); ok {
+			scannerClient, err = sbomscanner.NewSBOMScannerClient(socket)
+			if err != nil {
+				logger.L().Ctx(ctx).Warning("SBOM scanner sidecar not available, falling back to in-process scanning", helpers.Error(err))
+			}
+		}
 		sbomManager, err = sbommanagerv1.CreateSbomManager(ctx, cfg, igK8sClient.RuntimeConfig.SocketPath, storageClient, k8sObjectCache, scannerClient)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/main.go` around lines 385 - 392, Only create the SBOM scanner sidecar
client when SBOM generation is enabled: guard the
sbomscanner.NewSBOMScannerClient call (and the SBOM_SCANNER_SOCKET env check)
with cfg.EnableSbomGeneration so you don't open the gRPC connection or run
health checks when disabled; keep scannerClient in the same scope but move the
socket lookup, NewSBOMScannerClient call, and the existing
logger.L().Ctx(ctx).Warning(...) error handling inside the
cfg.EnableSbomGeneration block to preserve behavior when enabled.
pkg/sbomscanner/v1/client.go (1)

43-59: Consider adding a maximum elapsed time to the startup health check backoff.

The exponential backoff uses default settings which allow retries for up to ~15 minutes. If the scanner sidecar fails to start, this could significantly delay node-agent startup before falling back to in-process scanning.

♻️ Suggested: Configure bounded backoff for faster failure detection
-	_, err = backoff.Retry(context.Background(), func() (struct{}, error) {
+	bo := backoff.NewExponentialBackOff()
+	bo.MaxElapsedTime = 30 * time.Second // Fail faster if scanner won't start
+	_, err = backoff.Retry(context.Background(), func() (struct{}, error) {
 		ctx, cancel := context.WithTimeout(context.Background(), healthCheckTimeout)
 		defer cancel()
 		resp, err := c.client.Health(ctx, &pb.HealthRequest{})
 		if err != nil {
 			return struct{}{}, fmt.Errorf("health check failed: %w", err)
 		}
 		if !resp.Ready {
 			return struct{}{}, fmt.Errorf("scanner not ready")
 		}
 		return struct{}{}, nil
-	}, backoff.WithBackOff(backoff.NewExponentialBackOff()))
+	}, backoff.WithBackOff(bo))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbomscanner/v1/client.go` around lines 43 - 59, The exponential backoff
used in the startup health check (backoff.Retry with
backoff.NewExponentialBackOff()) is unbounded and can take ~15 minutes; create
and configure an ExponentialBackOff instance with a bounded MaxElapsedTime
(e.g., a short configurable duration constant) and pass that instance into
backoff.WithBackOff instead of calling backoff.NewExponentialBackOff() inline;
update the retry call around backoff.Retry(...) and use the existing
healthCheckTimeout/timeout context as before so the health check fails fast when
the sidecar doesn't start.
pkg/sbomscanner/v1/integration_test.go (1)

39-62: Consider using t.Cleanup() for automatic test cleanup.

The test helper returns cleanup responsibilities to callers. Using t.Cleanup() would ensure resources are cleaned up even if a test panics or forgets to defer cleanup.

♻️ Optional: Use t.Cleanup for automatic resource management
 func startIntegrationServer(t *testing.T) (SBOMScannerClient, *grpc.Server, string) {
 	t.Helper()
 	dir := t.TempDir()
 	sock := filepath.Join(dir, "scanner.sock")

 	lis, err := net.Listen("unix", sock)
 	require.NoError(t, err)

 	srv := grpc.NewServer()
 	pb.RegisterSBOMScannerServer(srv, NewScannerServer())
 	go srv.Serve(lis)

 	conn, err := grpc.NewClient("unix://"+sock,
 		grpc.WithTransportCredentials(insecure.NewCredentials()),
 	)
 	require.NoError(t, err)

 	client := &sbomScannerClient{
 		conn:   conn,
 		client: pb.NewSBOMScannerClient(conn),
 	}

+	t.Cleanup(func() {
+		client.Close()
+		srv.Stop()
+		os.Remove(sock)
+	})
+
-	return client, srv, sock
+	return client, srv, sock // callers can still use srv for specific test scenarios
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbomscanner/v1/integration_test.go` around lines 39 - 62, The helper
startIntegrationServer currently returns resources for callers to clean up;
instead register automatic cleanup with t.Cleanup inside startIntegrationServer
so resources are always released: after creating lis, register t.Cleanup to
close lis (lis.Close()), after creating srv and starting it, register t.Cleanup
to stop the server (srv.Stop()), and after creating conn register t.Cleanup to
close the client connection (conn.Close()); update startIntegrationServer to
perform these t.Cleanup registrations so callers no longer need to remember to
defer cleanup.
pkg/sbommanager/v1/syftutil/document.go (1)

18-19: Silently ignoring JSON marshal errors may hide issues.

Lines 18-19 discard marshal errors for configuration and metadata. While these fields are likely always valid JSON, silently ignoring errors could make debugging difficult if malformed data ever appears.

Consider logging at debug level if marshaling fails, or at minimum document why errors are safely ignored.
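A minimal sketch of the suggested handling, assuming a plain stdlib logger rather than the project's structured logger, and a `marshalOrEmpty` helper name that is purely illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// marshalOrEmpty captures the json.Marshal error, logs it with the field
// name for debuggability, and falls back to an empty JSON object instead of
// silently discarding the failure.
func marshalOrEmpty(field string, v any) []byte {
	b, err := json.Marshal(v)
	if err != nil {
		log.Printf("syft document: failed to marshal %s: %v", field, err)
		return []byte("{}")
	}
	return b
}

func main() {
	fmt.Println(string(marshalOrEmpty("Descriptor.Configuration", map[string]int{"layers": 3})))
	// channels cannot be marshaled, so this exercises the fallback path
	fmt.Println(string(marshalOrEmpty("Source.Metadata", make(chan int))))
}
```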

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/syftutil/document.go` around lines 18 - 19, The code
currently discards errors from json.Marshal for configuration and metadata
(configuration, _ := json.Marshal(doc.Descriptor.Configuration); metadata, _ :=
json.Marshal(doc.Source.Metadata)); update this to capture the error variables,
check them, and log a debug-level message (including which field and the error)
instead of ignoring (e.g., configuration, err := json.Marshal(...); if err !=
nil { logger.Debugf("syft document: failed to marshal Descriptor.Configuration:
%v", err); configuration = []byte("{}") } ), or otherwise handle the error
consistently (return it or set a safe default) so malformed data is observable;
reference doc.Descriptor.Configuration and doc.Source.Metadata when implementing
the check and log.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sbommanager/v1/sbom_manager.go`:
- Around line 72-74: The scanRetries map is accessed concurrently (e.g., in
handleScannerCrash and processContainer) and may race when worker pool
concurrency increases; protect it by replacing scanRetries map[string]int with a
concurrency-safe structure or guarding accesses with a mutex: either switch to
sync.Map (update uses Load, Store, Delete) or add a sync.Mutex (e.g.,
scanRetriesMu) and wrap all reads/writes in handleScannerCrash,
processContainer, and any other funcs that touch scanRetries to hold the lock;
update all references to use the chosen synchronization so map accesses are
safe.
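The mutex-guarded alternative suggested above might look like the following sketch; `retryTracker`, `bump`, and `clear` are illustrative names, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// retryTracker wraps the retry-count map with a mutex so concurrent workers
// can update it safely, per the review suggestion.
type retryTracker struct {
	mu      sync.Mutex
	retries map[string]int
}

func newRetryTracker() *retryTracker {
	return &retryTracker{retries: make(map[string]int)}
}

// bump increments the retry count for sbomName and returns the new value.
func (t *retryTracker) bump(sbomName string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.retries[sbomName]++
	return t.retries[sbomName]
}

// clear removes the entry once a scan reaches a terminal state.
func (t *retryTracker) clear(sbomName string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.retries, sbomName)
}

func main() {
	rt := newRetryTracker()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ { // concurrent workers bumping the same key safely
		wg.Add(1)
		go func() { defer wg.Done(); rt.bump("nginx") }()
	}
	wg.Wait()
	fmt.Println(rt.retries["nginx"]) // 100
}
```

sync.Map would also work, but a mutex around a plain map keeps the increment-and-read atomic in one critical section.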
- Around line 477-497: scanRetries entries are never removed on successful scans
or other terminal non-crash failures, causing a memory leak; remove the entry
when a scan reaches any terminal state. Specifically, in the success path where
sbomScanTotal.WithLabelValues("success").Inc() /
sbomScanDuration.WithLabelValues("success").Observe(...) and syftDoc =
result.SyftDocument are set, add delete(s.scanRetries, sbomName) to clear the
counter; likewise, ensure you delete(s.scanRetries, sbomName) in other
terminal/error branches (any code that marks the SBOM as finished/failed) so
only the handleScannerCrash path relies on its existing deletion when retryCount
>= maxScanRetries.

In `@pkg/sbomscanner/v1/server.go`:
- Around line 67-70: The error handling in the RPC handler currently maps any
ctx.Err() to DeadlineExceeded; update the post-error branch in the handler (the
function containing the shown block, e.g., CreateSBOM/Scan handler) to inspect
ctx.Err() and return status.Error(codes.Canceled, ...) when ctx.Err() ==
context.Canceled and status.Error(codes.DeadlineExceeded, ...) when ctx.Err() ==
context.DeadlineExceeded; use context.Canceled and context.DeadlineExceeded
comparisons to distinguish the two cases and keep the existing error
messages/context in the returned status.Error calls.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2487321b-d149-4b24-a51c-d9f527930e73

📥 Commits

Reviewing files that changed from the base of the PR and between f673d45 and 6b3bf16.

⛔ Files ignored due to path filters (2)
  • pkg/sbomscanner/v1/proto/scanner.pb.go is excluded by !**/*.pb.go
  • pkg/sbomscanner/v1/proto/scanner_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (17)
  • build/Dockerfile
  • cmd/main.go
  • cmd/sbom-scanner/main.go
  • pkg/sbommanager/v1/metrics.go
  • pkg/sbommanager/v1/sbom_manager.go
  • pkg/sbommanager/v1/syftutil/document.go
  • pkg/sbommanager/v1/syftutil/helpers.go
  • pkg/sbommanager/v1/syftutil/resolver.go
  • pkg/sbommanager/v1/syftutil/source.go
  • pkg/sbommanager/v1/syftutil/source_test.go
  • pkg/sbomscanner/v1/client.go
  • pkg/sbomscanner/v1/integration_test.go
  • pkg/sbomscanner/v1/proto/buf.gen.yaml
  • pkg/sbomscanner/v1/proto/scanner.proto
  • pkg/sbomscanner/v1/server.go
  • pkg/sbomscanner/v1/server_test.go
  • pkg/sbomscanner/v1/types.go

Signed-off-by: Ben <ben@armosec.io>
…or handling for canceled and timed-out scans

Signed-off-by: Ben <ben@armosec.io>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
pkg/sbommanager/v1/sbom_manager.go (2)

371-379: ⚠️ Potential issue | 🟡 Minor

Clear stale crash-retry state on non-crash error returns.

After prior crashes, Lines 371-379 return on non-crash errors without cleaning scanRetries[sbomName], so entries can linger indefinitely for images that never reach success/max-retries.

🧹 Proposed fix
 		sbomScanTotal.WithLabelValues("error").Inc()
 		sbomScanDuration.WithLabelValues("error").Observe(scanDuration)
+		delete(s.scanRetries, sbomName)
 		logger.L().Ctx(s.ctx).Error("SbomManager - sidecar scan failed",
 			helpers.Error(scanErr),
 			helpers.String("namespace", notif.Container.K8s.Namespace),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/sbom_manager.go` around lines 371 - 379, The error-return
path in SbomManager leaves stale entries in the scanRetries map for sbomName
when a non-crash error occurs; before returning in the non-crash error block
(the branch that logs "SbomManager - sidecar scan failed"), remove the retry
state by deleting scanRetries[sbomName] (or resetting its counter) so retries do
not linger for that image, and ensure you perform the deletion under the same
synchronization used for scanRetries (the same lock/guard used elsewhere in the
SbomManager methods).

99-102: ⚠️ Potential issue | 🟡 Minor

Handle invalid SCANNER_MEMORY_LIMIT explicitly instead of silently coercing to 0.

Lines 99-102 ignore strconv.ParseInt errors. A malformed env var silently disables memory-limit-based behavior and makes retries/annotations harder to reason about.

🔧 Proposed fix
 	var scannerMemLimit int64
 	if memStr, ok := os.LookupEnv("SCANNER_MEMORY_LIMIT"); ok {
-		scannerMemLimit, _ = strconv.ParseInt(memStr, 10, 64)
+		parsed, parseErr := strconv.ParseInt(memStr, 10, 64)
+		if parseErr != nil || parsed <= 0 {
+			logger.L().Ctx(ctx).Warning("SbomManager - invalid SCANNER_MEMORY_LIMIT, ignoring",
+				helpers.String("value", memStr),
+				helpers.Error(parseErr))
+		} else {
+			scannerMemLimit = parsed
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/sbom_manager.go` around lines 99 - 102, The code silently
ignores strconv.ParseInt errors when reading SCANNER_MEMORY_LIMIT into
scannerMemLimit; change the os.LookupEnv/strconv.ParseInt block to check the
parse error and handle it explicitly: call strconv.ParseInt into a temp value
and if err != nil then return (or propagate) a descriptive error (or at minimum
log it via the package logger and fail fast) instead of assigning zero,
otherwise assign scannerMemLimit = parsed value; reference the scannerMemLimit
variable, os.LookupEnv and strconv.ParseInt in your change so reviewers can find
the fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sbommanager/v1/sbom_manager.go`:
- Around line 491-498: The current branch finalizes a TooLarge terminal status
but ignores errors from s.storageClient.ReplaceSBOM and unconditionally deletes
retry state (s.scanRetries[sbomName]); change this so you capture the error
returned by ReplaceSBOM when updating wipSbom (setting
Annotations[helpersv1.StatusMetadataKey]=helpersv1.TooLarge,
ScannerMemoryLimitAnnotation, and clearing Spec to v1beta1.SBOMSyftSpec{}), and
only remove the retry entry from s.scanRetries if ReplaceSBOM succeeds; if
ReplaceSBOM returns an error, log/propagate the error and leave s.scanRetries
intact so retry state persists.

In `@pkg/sbomscanner/v1/server.go`:
- Around line 24-28: The CreateSBOM handler currently calls s.mu.Lock() before
creating a context with timeout, so mutex wait is unbounded; replace the plain
mutex with a context-aware semaphore and acquire it after creating the request
context: change scannerServer.mu from sync.Mutex to a semaphore channel (e.g.,
sem chan struct{} initialized with one token), in CreateSBOM call ctx, cancel :=
context.WithTimeout(...) before trying to acquire the gate, then use a select to
either receive the token from s.sem (acquire) or return on ctx.Done()
(timeout/cancel), and defer returning the token (release); apply the same
pattern for other methods that used s.mu.Lock()/Unlock() so lock acquisition is
protected by the request context and respects timeouts.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 257b1544-5f22-402e-aa0e-23195a640dc5

📥 Commits

Reviewing files that changed from the base of the PR and between 6b3bf16 and c3f3a41.

📒 Files selected for processing (3)
  • cmd/sbom-scanner/main.go
  • pkg/sbommanager/v1/sbom_manager.go
  • pkg/sbomscanner/v1/server.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/sbom-scanner/main.go


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/sbommanager/v1/sbom_manager.go (1)

491-497: ⚠️ Potential issue | 🟠 Major

Preserve retry state when persisting TooLarge fails.

Line 496 ignores ReplaceSBOM errors and Line 497 unconditionally deletes retry state; if persistence fails, the retry context is dropped even though the terminal TooLarge status was never stored.

🔧 Proposed fix
 	if retryCount >= maxScanRetries {
 		delete(wipSbom.Annotations, NodeNameMetadataKey)
 		wipSbom.Annotations[helpersv1.StatusMetadataKey] = helpersv1.TooLarge
 		wipSbom.Annotations[ScannerMemoryLimitAnnotation] = fmt.Sprintf("%d", s.scannerMemLimit)
 		wipSbom.Spec = v1beta1.SBOMSyftSpec{}
-		_, _ = s.storageClient.ReplaceSBOM(wipSbom)
+		if _, err := s.storageClient.ReplaceSBOM(wipSbom); err != nil {
+			logger.L().Error("SbomManager - failed to persist TooLarge SBOM after scanner crashes",
+				helpers.Error(err),
+				helpers.String("namespace", notif.Container.K8s.Namespace),
+				helpers.String("pod", notif.Container.K8s.PodName),
+				helpers.String("container", notif.Container.K8s.ContainerName),
+				helpers.String("sbomName", sbomName))
+			return
+		}
 		delete(s.scanRetries, sbomName)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbommanager/v1/sbom_manager.go` around lines 491 - 497, The current block
ignores the error returned by s.storageClient.ReplaceSBOM and unconditionally
removes retry state via delete(s.scanRetries, sbomName), which can lose recovery
context if persistence fails; update the code in the retry-finalization branch
(references: retryCount, maxScanRetries, wipSbom, s.storageClient.ReplaceSBOM,
delete(s.scanRetries, sbomName)) to check and handle the ReplaceSBOM error —
attempt the ReplaceSBOM call, if it returns an error then log/return the error
and do NOT delete s.scanRetries[sbomName] (preserve retry state); only remove
the retry entry after ReplaceSBOM succeeds. Ensure any logging includes context
(sbomName and the error) so operators can act.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sbommanager/v1/sbom_manager.go`:
- Around line 351-360: The CreateSBOM RPC is called with s.ctx (a background
context) which has no deadline; wrap s.ctx with a client-side timeout using
context.WithTimeout before calling s.scannerClient.CreateSBOM (and defer cancel)
and pass that derived ctx into CreateSBOM instead of s.ctx so transport-level
stalls are bounded; keep the existing Timeout field in sbomscanner.ScanRequest
but enforce an overall network/transport deadline in the client.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 693f55df-2d30-4ab3-b1d7-01ed92347f75

📥 Commits

Reviewing files that changed from the base of the PR and between c3f3a41 and 4fbfe64.

📒 Files selected for processing (1)
  • pkg/sbommanager/v1/sbom_manager.go

… handling

- Introduced a context timeout for SBOM creation to ensure it adheres to server-side limits.
- Added error handling for SBOM replacement to log failures when marking SBOMs as TooLarge after scanner crashes.

Signed-off-by: Ben <ben@armosec.io>

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/sbomscanner/v1/server.go (1)

36-93: req.MaxSbomSize is defined in the proto but never enforced.

The CreateSBOMRequest includes a MaxSbomSize field (per proto definition), but the server doesn't validate the generated SBOM size against this limit before returning. If callers expect server-side enforcement of SBOM size limits, this could lead to unexpected behavior.

Consider adding a check after serialization:

♻️ Proposed fix to enforce max SBOM size
 	docBytes, err := json.Marshal(doc)
 	if err != nil {
 		return nil, status.Errorf(codes.Internal, "failed to serialize SBOM: %v", err)
 	}
+
+	if req.MaxSbomSize > 0 && int64(len(docBytes)) > int64(req.MaxSbomSize) {
+		return nil, status.Errorf(codes.FailedPrecondition, "SBOM size %d exceeds maximum %d", len(docBytes), req.MaxSbomSize)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sbomscanner/v1/server.go` around lines 36 - 93, CreateSBOM currently
ignores req.MaxSbomSize; after serializing the SBOM to docBytes in CreateSBOM,
enforce the limit by checking if req.MaxSbomSize > 0 and if int64(len(docBytes))
exceeds req.MaxSbomSize then return an appropriate gRPC error (e.g.,
status.Error(codes.FailedPrecondition, "sbom size exceeds maximum allowed
size")). Place this check immediately after the json.Marshal(doc) error handling
and before logger.L().Info and the successful pb.CreateSBOMResponse return so
the response never returns an oversized SBOM.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sbomscanner/v1/server.go`:
- Around line 77-78: The call to syftutil.ToSyftDocument after v1beta1.StripSBOM
can silently drop json.Marshal errors; update syftutil.ToSyftDocument to surface
failures (change signature to return (Document, error)) or at minimum log
warnings when internal marshals fail, and then update this call site
(syftutil.ToSyftDocument in server.go) to handle the returned error (log and
return a 500 / fail SBOM generation) so partial/zero-value documents aren't
propagated; reference the ToSyftDocument function and v1beta1.StripSBOM usage
when making the change.
- Around line 62-64: When req.EnableEmbeddedSboms is true the call to
cfg.WithCatalogers(...) uses the builder pattern and returns an updated config
that must be assigned back to cfg; change the call that uses
pkgcataloging.NewCatalogerReference(sbomcataloger.NewCataloger(),
[]string{pkgcataloging.ImageTag}) so the result is stored in cfg (i.e., cfg =
cfg.WithCatalogers(...)) to ensure the embedded SBOM cataloger is actually added
and used.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f6ee7ce3-ce57-4767-8bc7-2ef540e4c6cc

📥 Commits

Reviewing files that changed from the base of the PR and between 4fbfe64 and 4f97a14.

📒 Files selected for processing (2)
  • pkg/sbommanager/v1/sbom_manager.go
  • pkg/sbomscanner/v1/server.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/sbommanager/v1/sbom_manager.go

@matthyx matthyx added the release Create release label Mar 23, 2026
@slashben slashben merged commit 1a85870 into main Mar 23, 2026
27 checks passed
@slashben slashben deleted the feature/sbom-scanner-sidecar branch March 23, 2026 12:33

Labels

release Create release

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants