javacruft/sbomfetch

chainguard-source

A Go program that fetches the source code referenced in Chainguard SBOMs. It parses SPDX-formatted SBOM (Software Bill of Materials) files, downloads (and optionally extracts) the source archives referenced in their downloadLocation fields, and fetches the melange configuration file for each package. Input can be either a local SBOM file or a container image reference, in which case the SBOM is retrieved from the image's Sigstore attestations.

Project Structure

chainguard-source/
├── cmd/
│   └── chainguard-source/     # Main entrypoint
│       └── main.go
├── internal/
│   ├── cache/                 # Cache management
│   ├── sbom/                  # SBOM parsing and attestations
│   ├── download/              # Download functionality
│   ├── melange/               # Melange config handling
│   └── archive/               # Archive extraction
├── go.mod
├── go.sum
└── README.md

Features

  • SPDX SBOM Parsing: Reads SPDX 2.3 JSON files
  • Container Image Support: Fetches SBOMs directly from container images via Sigstore attestations
  • Melange Config Fetching: Downloads melange configuration files from several repositories (wolfi-dev/os, chainguard-dev/enterprise-packages, chainguard-dev/extra-packages)
  • Relationship Analysis: Uses GENERATED_FROM relationships to map source packages to APK packages
  • Concurrent Downloads: Downloads in parallel with configurable concurrency (default: 4)
  • Multi-format Archive Support: Handles .tar.gz/.tgz, .tar.xz, and .tar.bz2/.tbz2/.tar.bz files
  • Optional Extraction: Extracts archives into a sources/ subdirectory when -x is given
  • APK Traceability: Shows which APK package each source archive belongs to
  • Dry Run Mode: Previews what would be downloaded without downloading anything
  • Resource Warnings: Warns about disk and network usage before downloading (bypass with -y)
  • Security: Protects against path traversal during extraction
  • Organized Output: Creates a structured directory layout with a separate subdirectory for each content type
  • On-disk Cache: Caches downloaded archives and melange configs in ~/.cache/chainguard-source to avoid re-downloading

Prerequisites

  • Go 1.21 or later
  • Internet connection for downloading archives

Installation

Build from source:

go build -o chainguard-source ./cmd/chainguard-source

Usage

chainguard-source [OPTIONS] [ -i|--image IMAGE[:TAG] | -s|--sbom SBOM.spdx.json ]

Options

  • -i, --image IMAGE[:TAG]: Container image whose sources should be fetched
  • -s, --sbom SBOM.spdx.json: Local SBOM file to process
  • -a, --arch [amd64|arm64]: Architecture to use for multi-arch images (default: amd64)
  • -d, --dry-run: Show what would be downloaded without downloading anything
  • -x, --extract: Extract downloaded archives into the sources/ subdirectory
  • -y, --yes: Automatically answer yes to resource warnings
  • --github-token TOKEN: GitHub personal access token for fetching melange configs from private repositories
  • --concurrency N: Number of concurrent downloads (default: 4)
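The README does not say which flag library the tool uses. As a hypothetical sketch, Go's standard `flag` package can register each setting under both its short and long name, and it accepts both the `-name` and `--name` spellings. The `options` struct and `parseOptions` function are illustrative, and the rule that exactly one of `-i` or `-s` must be given is an assumption read from the usage line.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options holds the settings listed under Options (illustrative struct).
type options struct {
	image, sbom, arch, githubToken string
	dryRun, extract, yes           bool
	concurrency                    int
}

// parseOptions registers each setting under every documented name.
// Go's flag package accepts both -name and --name for any flag.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("chainguard-source", flag.ContinueOnError)

	str := func(p *string, def, usage string, names ...string) {
		for _, n := range names {
			fs.StringVar(p, n, def, usage)
		}
	}
	boolean := func(p *bool, usage string, names ...string) {
		for _, n := range names {
			fs.BoolVar(p, n, false, usage)
		}
	}

	str(&o.image, "", "target image to fetch sources for", "i", "image")
	str(&o.sbom, "", "target SBOM file to process", "s", "sbom")
	str(&o.arch, "amd64", "architecture: amd64 or arm64", "a", "arch")
	str(&o.githubToken, "", "GitHub personal access token", "github-token")
	boolean(&o.dryRun, "skip actual downloads", "d", "dry-run")
	boolean(&o.extract, "extract archives to sources/", "x", "extract")
	boolean(&o.yes, "answer yes to resource warnings", "y", "yes")
	fs.IntVar(&o.concurrency, "concurrency", 4, "number of concurrent downloads")

	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Assumption: exactly one input source must be given.
	if (o.image == "") == (o.sbom == "") {
		return o, errors.New("exactly one of -i/--image or -s/--sbom is required")
	}
	if o.arch != "amd64" && o.arch != "arm64" {
		return o, fmt.Errorf("unsupported architecture %q", o.arch)
	}
	if o.concurrency < 1 {
		return o, errors.New("--concurrency must be at least 1")
	}
	return o, nil
}

func main() {
	o, err := parseOptions([]string{"-i", "cgr.dev/chainguard/python:latest", "-x"})
	if err != nil {
		fmt.Println("error:", err)
	} else {
		fmt.Printf("%+v\n", o)
	}
}
```

One difference from GNU-style parsing: Go's flag package does not accept bundled short flags such as -xy, so each flag must be passed separately.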

Environment Variables

  • GITHUB_TOKEN: GitHub personal access token (used if --github-token is not provided)

Examples

# Fetch sources from a container image
chainguard-source -i cgr.dev/chainguard/python:latest

# Process a local SBOM file
chainguard-source -s sbom.spdx.json

# Dry run to see what would be downloaded
chainguard-source --image cgr.dev/chainguard/unbound:latest --dry-run

# Skip resource warning prompt
chainguard-source -i cgr.dev/chainguard/nginx:latest -y

# Specify architecture for multi-arch images
chainguard-source -i cgr.dev/chainguard/nginx:latest -a arm64

# Use more concurrent downloads
chainguard-source -i cgr.dev/chainguard/haproxy:latest --concurrency 8

# Download and extract archives
chainguard-source -i cgr.dev/chainguard/python:latest -x

# Fetch sources including private melange configs with GitHub token
chainguard-source -i cgr.dev/chainguard-private/myapp:latest --github-token ghp_xxxxx

# Or use environment variable
export GITHUB_TOKEN=ghp_xxxxx
chainguard-source -i cgr.dev/chainguard-private/myapp:latest

Output Structure

The program creates a hierarchical sources/ directory structure that mirrors the image registry path:

sources/
└── cgr.dev/                     # Registry domain
    └── chainguard/               # Organization
        └── static:latest/        # Image:tag
            ├── sboms/            # SBOM files
            │   └── static:latest.sbom.spdx.json
            ├── melange-configs/  # Melange configuration files
            │   ├── package1.yaml
            │   ├── package2.yaml
            │   └── ...
            ├── artifacts/        # Downloaded source archives
            │   ├── source1.tar.gz
            │   ├── source2.tar.xz
            │   └── ...
            └── sources/          # Extracted source code (only with -x flag)
                ├── source1-extracted/
                ├── source2-extracted/
                └── ...

For local SBOM files, the output directory is named after the SBOM file instead of mirroring a registry path:

sources/
└── <sbom-filename>/             # Named after the SBOM file
    ├── sboms/
    ├── melange-configs/
    ├── artifacts/
    └── ...

Sample Output

🔍 Retrieving SBOM from container image: cgr.dev/chainguard/unbound:latest
💾 SBOM saved to: sources/cgr.dev/chainguard/unbound:latest/sboms/unbound:latest.sbom.spdx.json
🔍 Found 8 populated downloadLocation URLs
🔍 Found 12 packages with melange configs to fetch
🚀 Starting download (1/20): gcc-15.1.0.tar.xz [libgcc]
🚀 Fetching melange config (9/20): unbound
📦 Downloaded (1/20): gcc-15.1.0.tar.xz [libgcc]
📄 Saved melange config: unbound.yaml
...
🎁 Download complete. Files saved to: sources/cgr.dev/chainguard/unbound:latest

📊 === EXECUTION SUMMARY ===
🚀 Downloads: 20 successful, 0 failed (total: 20)
💡 Tip: Use -x or --extract flag to extract archives

# With extraction enabled:
🗄️ Extracting 8 archives to sources/ subdirectory...
📤 Extracting (1/8): gcc-15.1.0.tar.xz [libgcc]
🎉 Extracted: gcc-15.1.0.tar.xz [libgcc]
...
🎆 Extraction complete. Archives extracted to: sources/cgr.dev/chainguard/unbound:latest/sources

GitHub Authentication

To fetch melange configs from private repositories (chainguard-dev/enterprise-packages, chainguard-dev/extra-packages), you need a GitHub personal access token:

  1. Create a token: Go to GitHub Settings → Developer settings → Personal access tokens
  2. Required scopes: Select the repo scope, which grants full access to private repositories
  3. Usage: Provide via --github-token flag or GITHUB_TOKEN environment variable

The tool fetches melange configurations through the GitHub API. This approach:

  • Supports authentication for private repositories
  • Provides helpful error messages when authentication is needed
  • Respects API rate limits (authenticated requests get higher limits)

How It Works

  1. Input Processing: Accepts either local SBOM files or container image references
  2. Resource Warning: Displays available disk space and warns about network and disk usage (skipped with -y)
  3. SBOM Retrieval: For container images, fetches SBOM from Sigstore attestations
  4. SBOM Parsing: Reads the SPDX JSON file and parses packages, relationships, and external references
  5. Package Analysis:
    • Maps source packages to APK packages via GENERATED_FROM relationships
    • Extracts package build configuration references from external refs
  6. Download Planning: Identifies downloadable content:
    • Source archives from downloadLocation fields
    • Melange configs from referenced GitHub repositories at specific commits
  7. Cache Check: Before downloading, checks ~/.cache/chainguard-source for cached archives (indexed by SHA256 of URL)
  8. Concurrent Download: Downloads archives and configs in parallel using configurable worker pools
  9. Artifact Organization: Stores downloaded files in appropriate subdirectories
  10. Extraction (optional): With -x, extracts all archives into the sources/ subdirectory
  11. Summary Reporting: Prints an execution summary with successful, failed, and total download counts
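Steps 4 to 6 can be sketched in Go with encoding/json. The field names (SPDXID, downloadLocation, spdxElementId, relationshipType, relatedSpdxElement) come from the SPDX 2.3 JSON format; the relationship direction `<apk> GENERATED_FROM <source>`, skipping the SPDX placeholder values NOASSERTION and NONE, and the sample SBOM fragment are assumptions for illustration, not taken from a real Chainguard SBOM.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Only the SPDX 2.3 JSON fields this sketch needs are decoded.
type spdxPackage struct {
	SPDXID           string `json:"SPDXID"`
	Name             string `json:"name"`
	DownloadLocation string `json:"downloadLocation"`
}

type spdxRelationship struct {
	SPDXElementID      string `json:"spdxElementId"`
	RelationshipType   string `json:"relationshipType"`
	RelatedSPDXElement string `json:"relatedSpdxElement"`
}

type spdxDocument struct {
	Packages      []spdxPackage      `json:"packages"`
	Relationships []spdxRelationship `json:"relationships"`
}

// sourceDownload pairs a source archive URL with the APK built from it.
type sourceDownload struct {
	URL string
	APK string
}

// planDownloads returns one entry per package with a usable downloadLocation.
// Assumption: relationships read "<apk> GENERATED_FROM <source>"; if several
// APKs share a source, only the last one seen is kept.
func planDownloads(data []byte) ([]sourceDownload, error) {
	var doc spdxDocument
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	names := make(map[string]string, len(doc.Packages))
	for _, p := range doc.Packages {
		names[p.SPDXID] = p.Name
	}
	apkFor := make(map[string]string) // source SPDXID -> APK package name
	for _, r := range doc.Relationships {
		if r.RelationshipType == "GENERATED_FROM" {
			apkFor[r.RelatedSPDXElement] = names[r.SPDXElementID]
		}
	}
	var out []sourceDownload
	for _, p := range doc.Packages {
		switch p.DownloadLocation {
		case "", "NOASSERTION", "NONE": // SPDX placeholders, not URLs
			continue
		}
		out = append(out, sourceDownload{URL: p.DownloadLocation, APK: apkFor[p.SPDXID]})
	}
	return out, nil
}

func main() {
	// Hypothetical SBOM fragment for illustration only.
	sbom := []byte(`{
	  "packages": [
	    {"SPDXID": "SPDXRef-Package-libgcc", "name": "libgcc", "downloadLocation": "NOASSERTION"},
	    {"SPDXID": "SPDXRef-Source-gcc", "name": "gcc", "downloadLocation": "https://ftp.gnu.org/gnu/gcc/gcc-15.1.0/gcc-15.1.0.tar.xz"}
	  ],
	  "relationships": [
	    {"spdxElementId": "SPDXRef-Package-libgcc", "relationshipType": "GENERATED_FROM", "relatedSpdxElement": "SPDXRef-Source-gcc"}
	  ]
	}`)
	plan, err := planDownloads(sbom)
	if err != nil {
		fmt.Println("error:", err)
	}
	for _, d := range plan {
		fmt.Printf("%s [%s]\n", d.URL, d.APK)
	}
}
```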

Caching

The tool maintains an on-disk cache at ~/.cache/chainguard-source to avoid re-downloading identical files:

  • Both source archives and melange configs are cached, using the SHA256 hash of their URLs as the cache key
  • When a URL is requested, the cache is checked first
  • If found in cache, the file is copied from cache instead of downloading
  • If not in cache, the file is downloaded and saved to both the destination and the cache
  • The cache persists across runs, so repeated fetches of the same images reuse files that were already downloaded

Differences from Original chainguard-source Script

This tool is designed to replace the original Bash-based chainguard-source script, with the following improvements:

  • Uses downloadLocation: Directly uses SBOM downloadLocation fields instead of parsing git PURLs
  • Melange Config Support: Fetches melange configuration files from exact commits referenced in SBOM
  • Multi-Repository Support: Handles configs from wolfi-dev/os, chainguard-dev/enterprise-packages, and chainguard-dev/extra-packages
  • Go Implementation: More maintainable and portable than the shell script
  • Better Performance: Concurrent downloads with configurable parallelism
  • Cleaner Output: Organized directory structure with separate subdirectories
  • No Git Cloning: Avoids heavy git clone operations, using direct archive downloads instead

Supported Archive Formats

  • .tar.gz and .tgz (gzip compressed)
  • .tar.xz (XZ compressed)
  • .tar.bz2, .tbz2, .tar.bz (bzip2 compressed)
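A hypothetical sketch of choosing a decompressor by file extension. In Go, gzip and bzip2 streams can be read with the standard compress/gzip and compress/bzip2 packages, and XZ with github.com/ulikunitz/xz (listed under Dependencies); the decompressed stream then feeds archive/tar. The names here are illustrative, and case-insensitive matching is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// compression names the decompressor an archive needs before tar reading.
type compression string

const (
	gzipComp  compression = "gzip"  // compress/gzip
	xzComp    compression = "xz"    // github.com/ulikunitz/xz
	bzip2Comp compression = "bzip2" // compress/bzip2
)

// suffixes lists the extensions from Supported Archive Formats.
var suffixes = []struct {
	ext  string
	kind compression
}{
	{".tar.gz", gzipComp},
	{".tgz", gzipComp},
	{".tar.xz", xzComp},
	{".tar.bz2", bzip2Comp},
	{".tbz2", bzip2Comp},
	{".tar.bz", bzip2Comp},
}

// detectCompression matches name against the supported extensions.
// Assumption: matching is case-insensitive.
func detectCompression(name string) (compression, bool) {
	lower := strings.ToLower(name)
	for _, s := range suffixes {
		if strings.HasSuffix(lower, s.ext) {
			return s.kind, true
		}
	}
	return "", false
}

func main() {
	for _, name := range []string{"gcc-15.1.0.tar.xz", "source1.tar.gz", "notes.zip"} {
		if kind, ok := detectCompression(name); ok {
			fmt.Printf("%s: %s\n", name, kind)
		} else {
			fmt.Printf("%s: unsupported, skipped\n", name)
		}
	}
}
```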

Error Handling

  • Continues downloading other files if individual downloads fail
  • Continues extracting other archives if individual extractions fail
  • Provides detailed error messages for troubleshooting
  • Validates file paths during extraction for security
  • Reports validation warnings for missing download locations

Dependencies

  • github.com/ulikunitz/xz - XZ decompression support
  • github.com/google/go-containerregistry - Container registry operations
  • github.com/sigstore/cosign/v2 - Sigstore attestation verification

License

Licensed under the Apache License, Version 2.0. See the LICENSE file for details.

About

SBOM Downloads Fetcher
