@leodido leodido commented Nov 18, 2025

Overview

Makes SBOM generation deterministic by normalizing timestamps and UUIDs after Syft generates them. This enables reproducible builds where the same source produces the same artifact SHA256.

Part of https://linear.app/ona-team/issue/CLC-2096/prevent-attestation-overwrites-in-concurrent-builds
Part of https://linear.app/ona-team/issue/CLC-2097/improve-builds-determinism

Changes

Implementation (pkg/leeway/sbom.go)

  1. getGitCommitTimestamp(): Retrieves deterministic timestamp from git commit

    • Respects SOURCE_DATE_EPOCH environment variable for reproducible builds
    • Falls back to git commit timestamp
    • Logs warning if SOURCE_DATE_EPOCH is invalid
  2. generateDeterministicUUID(): Generates UUIDv5 from content

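    • Uses the RFC 4122 DNS namespace UUID as the UUIDv5 namespace
    • UUIDv5 hashes the namespace and the content with SHA-1, so identical content always yields the same UUID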
  3. normalizeCycloneDX(): Normalizes CycloneDX SBOM

    • Replaces timestamp with git commit time
    • Generates deterministic UUID from content
  4. normalizeSPDX(): Normalizes SPDX SBOM

    • Replaces timestamp with git commit time
    • Generates deterministic UUID in documentNamespace
  5. Updated writeSBOM(): Calls normalizers after SBOM generation

    • Fails fast if deterministic timestamp cannot be obtained
    • Provides clear error message with troubleshooting hints
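
The timestamp precedence described in item 1 can be sketched as follows. This is a minimal illustration, not the code in pkg/leeway/sbom.go: `resolveTimestamp` and `fixedFallback` are hypothetical names, and the git lookup is replaced by a stub so that the sketch runs without a repository.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// resolveTimestamp prefers a valid SOURCE_DATE_EPOCH value (Unix seconds).
// If the value is empty or invalid it calls fallback, which stands in for
// the git commit lookup. An invalid value is reported on stderr.
func resolveTimestamp(epochEnv string, fallback func() (time.Time, error)) (time.Time, error) {
	if epochEnv != "" {
		secs, err := strconv.ParseInt(epochEnv, 10, 64)
		if err == nil {
			return time.Unix(secs, 0).UTC(), nil
		}
		fmt.Fprintf(os.Stderr, "warning: invalid SOURCE_DATE_EPOCH %q, falling back: %v\n", epochEnv, err)
	}
	return fallback()
}

// fixedFallback is a stub for the git lookup (assumption for this sketch).
func fixedFallback(secs int64) func() (time.Time, error) {
	return func() (time.Time, error) {
		return time.Unix(secs, 0).UTC(), nil
	}
}

func main() {
	ts, err := resolveTimestamp(os.Getenv("SOURCE_DATE_EPOCH"), fixedFallback(1763370671))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts.Format(time.RFC3339))
}
```

Passing the fallback as a function keeps the environment handling testable without invoking git.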

Tests (pkg/leeway/sbom_normalize_test.go)

Comprehensive test coverage including:

  • UUID generation (determinism, format validation)
  • Git commit timestamp (SOURCE_DATE_EPOCH, git fallback)
  • CycloneDX normalization (happy path, malformed input, file errors)
  • SPDX normalization (happy path, malformed input, file errors)

Why These Changes

Problem

SBOMs contained non-deterministic fields:

  • Current timestamp (changes on every build)
  • Random UUIDs (different on every build)

This prevented:

  • Reproducible builds (same source → different artifact hash)
  • Reliable caching
  • Attestation verification

Solution

Post-process SBOMs after Syft generates them to replace non-deterministic values with deterministic ones based on git commit metadata.
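
As a rough sketch of this post-processing step for CycloneDX JSON, assuming the SBOM is handled as a generic JSON object: the function name `normalizeCycloneDXJSON` and its parameters are hypothetical, and the caller supplies both the timestamp (Unix seconds) and the replacement UUID. The actual normalizer in this PR may differ in how it reads, derives, and writes these values.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// normalizeCycloneDXJSON overwrites the two volatile CycloneDX fields,
// metadata.timestamp and serialNumber, with caller-supplied values.
func normalizeCycloneDXJSON(raw []byte, epochSecs int64, uuid string) ([]byte, error) {
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, fmt.Errorf("failed to parse CycloneDX SBOM: %w", err)
	}
	meta, ok := doc["metadata"].(map[string]any)
	if !ok {
		return nil, errors.New("CycloneDX SBOM has no metadata object")
	}
	meta["timestamp"] = time.Unix(epochSecs, 0).UTC().Format(time.RFC3339)
	doc["serialNumber"] = "urn:uuid:" + uuid
	// encoding/json emits map keys in sorted order, so the output is stable.
	return json.MarshalIndent(doc, "", "  ")
}

func main() {
	raw := []byte(`{"bomFormat":"CycloneDX","serialNumber":"urn:uuid:11111111-2222-4333-8444-555555555555","metadata":{"timestamp":"2025-11-18T12:00:00Z"}}`)
	out, err := normalizeCycloneDXJSON(raw, 1763370671, "7cfdf9a9-1f08-55b0-9fbb-23cee00912fe")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Two inputs that differ only in the volatile fields serialize to the same bytes after this step, which is what makes the artifact hash reproducible.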

Why Not Wait for Upstream?

  • Syft has an open PR for this feature, but it's not merged yet
  • Our post-processing approach works with any Syft version
  • Both approaches are compatible (we can remove normalization once Syft supports it natively)

Format-Specific Behavior

  • CycloneDX: Normalized (has timestamps and UUIDs)
  • SPDX: Normalized (has timestamps and UUIDs)
  • Syft JSON: No normalization needed (naturally deterministic - no timestamp field, no random UUIDs)

See anchore/syft#3931 for upstream discussion.

Testing

Unit Tests

go test -v ./pkg/leeway -run "TestGenerateDeterministicUUID|TestGetGitCommitTimestamp|TestNormalizeCycloneDX|TestNormalizeSPDX"

All tests pass with coverage for normal operation and error conditions.

Manual Verification

# Build twice from scratch
rm -rf /tmp/leeway-cache
leeway build //:helloworld -Dversion=test1 --save /tmp/build1.tar.gz

rm -rf /tmp/leeway-cache
leeway build //:helloworld -Dversion=test1 --save /tmp/build2.tar.gz

# Verify SBOMs are identical
tar -xzf /tmp/build1.tar.gz ./sbom.cdx.json && mv sbom.cdx.json sbom1.cdx.json
tar -xzf /tmp/build2.tar.gz ./sbom.cdx.json && mv sbom.cdx.json sbom2.cdx.json
diff sbom1.cdx.json sbom2.cdx.json  # No differences

# Verify timestamp uses git commit time
jq '.metadata.timestamp' sbom1.cdx.json
# Output: "2025-11-17T09:11:11Z" (matches git commit timestamp)

# Verify UUIDs are deterministic
jq '.serialNumber' sbom1.cdx.json
jq '.serialNumber' sbom2.cdx.json
# Both output: "urn:uuid:7cfdf9a9-1f08-55b0-9fbb-23cee00912fe"
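
The identical serial numbers above are what a name-based UUID (version 5) gives: the version digit in the third group is 5. A minimal standard-library sketch of UUIDv5 under the RFC 4122 DNS namespace follows. The name `deterministicUUID` is hypothetical, and which bytes the PR actually feeds in as content is not stated here, so the input in `main` is only a placeholder.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// dnsNamespace is the RFC 4122 DNS namespace UUID
// 6ba7b810-9dad-11d1-80b4-00c04fd430c8.
var dnsNamespace = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// deterministicUUID returns the UUIDv5 of content under the DNS namespace.
func deterministicUUID(content []byte) string {
	h := sha1.New()
	h.Write(dnsNamespace[:])
	h.Write(content)
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set RFC 4122 variant

	s := hex.EncodeToString(u[:])
	return s[0:8] + "-" + s[8:12] + "-" + s[12:16] + "-" + s[16:20] + "-" + s[20:32]
}

func main() {
	// Placeholder input; the same bytes always produce the same UUID.
	fmt.Println(deterministicUUID([]byte("example sbom content")))
	fmt.Println(deterministicUUID([]byte("example sbom content")))
}
```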

Error Handling

Fails fast with a clear error message if a deterministic timestamp cannot be obtained:

failed to get deterministic timestamp for SBOM normalization (commit: abc123): failed to get commit timestamp: ...
Ensure git is available and the repository is not a shallow clone, or set SOURCE_DATE_EPOCH environment variable
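
One way to produce a message of this shape while keeping the underlying cause inspectable is error wrapping with `%w`. The sketch below is an assumption about structure, not the PR's code: `timestampError` and `errPlaceholder` are hypothetical, and the placeholder cause stands in for whatever error the git lookup returns.

```go
package main

import (
	"errors"
	"fmt"
)

// errPlaceholder stands in for the real error returned by the git lookup.
var errPlaceholder = errors.New("failed to get commit timestamp: placeholder cause")

// timestampError wraps cause with the commit and a troubleshooting hint.
func timestampError(commit string, cause error) error {
	return fmt.Errorf("failed to get deterministic timestamp for SBOM normalization (commit: %s): %w\n"+
		"Ensure git is available and the repository is not a shallow clone, or set SOURCE_DATE_EPOCH environment variable",
		commit, cause)
}

func main() {
	err := timestampError("abc123", errPlaceholder)
	fmt.Println(err)
	// The wrapped cause stays reachable for callers that need to branch on it.
	fmt.Println("cause preserved:", errors.Is(err, errPlaceholder))
}
```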

Breaking Changes

None. This is a non-breaking enhancement that makes builds more deterministic.

Co-authored-by: Ona <no-reply@ona.com>

- Add getGitCommitTimestamp() to retrieve deterministic timestamp from git
  commit or SOURCE_DATE_EPOCH environment variable
- Add generateDeterministicUUID() to create UUIDv5 from content
- Add normalizeCycloneDX() to normalize CycloneDX SBOM timestamps and UUIDs
- Add normalizeSPDX() to normalize SPDX SBOM timestamps and UUIDs
- Update writeSBOM() to call normalizers after SBOM generation
- Add comprehensive unit tests for all normalization functions
- Fix sbomSPDXFileExtension and sbomSyftFileExtension constants

This enables reproducible builds where the same source produces the same
artifact SHA256, supporting reliable caching and attestation verification.

Co-authored-by: Ona <no-reply@ona.com>
@leodido leodido self-assigned this Nov 18, 2025
Member

@geropl geropl left a comment

Changes and tests LGTM! ✔️

Contributor

@corneliusludmann corneliusludmann left a comment

Added 2 comments with improvements (nothing major). Approve to unblock.

leodido and others added 3 commits November 18, 2025 17:33
Add context parameter to getGitCommitTimestamp() to support cancellation:
- Use exec.CommandContext instead of exec.Command
- Allows git command to be terminated if build is cancelled
- Prevents orphaned git processes
- Respects build timeouts
- Follows standard Go pattern for cancellable operations

Also add test for context cancellation behavior.

Addresses review feedback from Cornelius on PR #281.

Co-authored-by: Ona <no-reply@ona.com>
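
The cancellation pattern from the commit above can be sketched like this. It is an illustration under assumptions: the names `gitCommitTime` and `commitTimeWithTimeout` are hypothetical, and `git show -s --format=%ct` is one way to read a commit's Unix timestamp, not necessarily the command the PR uses.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// gitCommitTime asks git for the committer timestamp of commit.
// exec.CommandContext kills the git process once ctx is done.
func gitCommitTime(ctx context.Context, commit string) (time.Time, error) {
	out, err := exec.CommandContext(ctx, "git", "show", "-s", "--format=%ct", commit).Output()
	if err != nil {
		return time.Time{}, fmt.Errorf("failed to get commit timestamp: %w", err)
	}
	secs, err := strconv.ParseInt(strings.TrimSpace(string(out)), 10, 64)
	if err != nil {
		return time.Time{}, fmt.Errorf("failed to parse commit timestamp: %w", err)
	}
	return time.Unix(secs, 0).UTC(), nil
}

// commitTimeWithTimeout bounds the git call with a deadline.
func commitTimeWithTimeout(commit string, timeout time.Duration) (time.Time, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return gitCommitTime(ctx, commit)
}

func main() {
	ts, err := commitTimeWithTimeout("HEAD", 5*time.Second)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts.Format(time.RFC3339))
}
```

With a context that is already expired or cancelled, the command is never started and the call returns an error straight away.
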
Replace manual string manipulation with regex-based UUID matching:
- Use regex pattern to find and replace UUIDs
- Validate that a UUID exists before attempting replacement
- Handle edge cases: UUID at end, in middle, multiple UUIDs, no UUID
- Log warning if no UUID found in namespace (unexpected format)
- Add comprehensive test coverage for all edge cases

Benefits:
- More robust: validates UUID format before replacement
- Handles any UUID position in the namespace
- Fails gracefully with a warning instead of failing silently
- Better test coverage for edge cases

Addresses review feedback from Cornelius on PR #281.

Co-authored-by: Ona <no-reply@ona.com>
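
A minimal sketch of the regex-based replacement described in the commit above. The pattern, the function name `replaceUUIDs`, and the example namespaces are assumptions for illustration; the real documentNamespace layout produced by Syft is not shown in this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidPattern matches the canonical 8-4-4-4-12 hexadecimal UUID form.
var uuidPattern = regexp.MustCompile(
	`[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`)

// replaceUUIDs swaps every UUID in namespace for newUUID and reports how
// many were found. With zero matches the namespace is returned unchanged.
func replaceUUIDs(namespace, newUUID string) (string, int) {
	found := len(uuidPattern.FindAllStringIndex(namespace, -1))
	if found == 0 {
		return namespace, 0
	}
	return uuidPattern.ReplaceAllLiteralString(namespace, newUUID), found
}

func main() {
	const fixed = "00000000-0000-5000-8000-000000000000"
	examples := []string{
		"https://example.com/sbom/app-3f2b8c1e-9d4a-4c6b-8e1f-0a1b2c3d4e5f", // UUID at end
		"https://example.com/3f2b8c1e-9d4a-4c6b-8e1f-0a1b2c3d4e5f/app",      // UUID in middle
		"https://example.com/sbom/app",                                      // no UUID
	}
	for _, ns := range examples {
		out, n := replaceUUIDs(ns, fixed)
		fmt.Printf("%d match(es): %s\n", n, out)
	}
}
```

Returning the match count lets the caller decide whether zero or multiple UUIDs should be a warning or an error.
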
Critical improvements to SPDX UUID replacement:

1. **Type validation**: Check that documentNamespace is a string
   - Prevents silent corruption when field has wrong type
   - Returns clear error message with actual type

2. **Empty validation**: Check that documentNamespace is not empty
   - Prevents invalid SBOM generation
   - Fails fast instead of silently continuing

3. **UUID validation**: Fail if no UUID found in namespace
   - Previously logged warning and continued (non-deterministic!)
   - Now returns error with helpful message
   - Alerts to potential Syft format changes

4. **Multiple UUID handling**: Log warning when multiple UUIDs found
   - Documents intentional behavior (replace all with same UUID)
   - Helps debugging unexpected formats

5. **Comprehensive edge case tests**:
   - documentNamespace is not a string
   - documentNamespace is empty
   - documentNamespace has no UUID
   - All cases now properly fail with clear errors

Benefits:
- Fails fast instead of silently producing non-deterministic builds
- Better error messages for debugging
- Catches unexpected SBOM format changes
- Prevents SBOM corruption

Addresses critical issues identified in PR review.

Co-authored-by: Ona <no-reply@ona.com>
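
The three validations listed in the commit above (type, empty, UUID present) might look roughly like this. The function name `checkedNamespace`, the error wording, and the example namespace are hypothetical; only the field name documentNamespace and the fail-fast behaviour come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"regexp"
)

var uuidRe = regexp.MustCompile(
	`[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`)

// checkedNamespace returns documentNamespace only if it is a non-empty
// string containing at least one UUID. Every other case is an error.
func checkedNamespace(doc map[string]any) (string, error) {
	ns, ok := doc["documentNamespace"].(string)
	if !ok {
		return "", fmt.Errorf("documentNamespace is not a string (got %T)", doc["documentNamespace"])
	}
	if ns == "" {
		return "", errors.New("documentNamespace is empty")
	}
	matches := uuidRe.FindAllString(ns, -1)
	if len(matches) == 0 {
		return "", fmt.Errorf("no UUID found in documentNamespace %q; the SBOM format may have changed", ns)
	}
	if len(matches) > 1 {
		// Several UUIDs are tolerated but worth surfacing for debugging.
		fmt.Fprintf(os.Stderr, "warning: %d UUIDs found in documentNamespace %q\n", len(matches), ns)
	}
	return ns, nil
}

func main() {
	docs := []map[string]any{
		{"documentNamespace": 42},
		{"documentNamespace": ""},
		{"documentNamespace": "https://example.com/sbom/app"},
		{"documentNamespace": "https://example.com/sbom/app-3f2b8c1e-9d4a-4c6b-8e1f-0a1b2c3d4e5f"},
	}
	for _, doc := range docs {
		ns, err := checkedNamespace(doc)
		fmt.Printf("ns=%q err=%v\n", ns, err)
	}
}
```
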
@leodido leodido merged commit 3362511 into main Nov 18, 2025
6 checks passed
leodido added a commit that referenced this pull request Nov 18, 2025
leodido added a commit that referenced this pull request Nov 18, 2025