chainguard-source is a Go program that fetches the source code referenced in Chainguard SBOMs. It parses SPDX-formatted SBOM (Software Bill of Materials) files, downloads (and optionally extracts) the source archives referenced in their `downloadLocation` fields, and fetches the melange configuration file for each package. It accepts either a local SBOM file or a container image reference; for an image, the SBOM is retrieved from Sigstore attestations.
chainguard-source/
├── cmd/
│ └── chainguard-source/ # Main entrypoint
│ └── main.go
├── internal/
│ ├── cache/ # Cache management
│ ├── sbom/ # SBOM parsing and attestations
│ ├── download/ # Download functionality
│ ├── melange/ # Melange config handling
│ └── archive/ # Archive extraction
├── go.mod
├── go.sum
└── README.md
- SPDX SBOM Parsing: Reads SPDX 2.3 formatted JSON files
- Container Image Support: Directly fetch SBOMs from container images via Sigstore attestations
- Melange Config Fetching: Downloads melange configuration files from various repositories (wolfi-dev/os, chainguard-dev/enterprise-packages, chainguard-dev/extra-packages)
- Relationship Analysis: Uses `GENERATED_FROM` relationships to map source packages to APK packages
- Concurrent Downloads: Configurable parallel downloading (default: 4 concurrent)
- Multi-format Archive Support: Handles `.tar.gz`, `.tar.xz`, `.tar.bz2`, and `.tgz` files
- Optional Extraction: Extract archives with the `-x` flag to a `sources/` subdirectory
- APK Traceability: Shows which APK package each source archive belongs to
- Dry Run Mode: Preview what would be downloaded without actually downloading
- Resource Warnings: Warns about disk and network usage with bypass option
- Security: Includes path traversal protection during extraction
- Organized Output: Creates structured directory layout with subdirectories for different content types
- On-disk Cache: Caches downloaded archives in `~/.cache/chainguard-source` to avoid re-downloading
- Go 1.21 or later
- Internet connection for downloading archives
Build from source:
go build -o chainguard-source ./cmd/chainguard-source
chainguard-source [OPTIONS] [ -i|--image IMAGE[:TAG] | -s|--sbom SBOM.spdx.json ]
- `-i, --image IMAGE[:TAG]`: Target image to fetch sources
- `-s, --sbom SBOM.spdx.json`: Target SBOM file to process
- `-a, --arch [amd64|arm64]`: Architecture (default: amd64)
- `-d, --dry-run`: Dry run - skip actual downloads
- `-x, --extract`: Extract downloaded archives to `sources/` subdirectory
- `-y, --yes`: Automatically answer yes to resource warnings
- `--github-token TOKEN`: GitHub personal access token for private repositories
- `--concurrency N`: Number of concurrent downloads (default: 4)
- `GITHUB_TOKEN`: GitHub personal access token (used if `--github-token` is not provided)
# Fetch sources from a container image
chainguard-source -i cgr.dev/chainguard/python:latest
# Process a local SBOM file
chainguard-source -s sbom.spdx.json
# Dry run to see what would be downloaded
chainguard-source --image cgr.dev/chainguard/unbound:latest --dry-run
# Skip resource warning prompt
chainguard-source -i cgr.dev/chainguard/nginx:latest -y
# Specify architecture for multi-arch images
chainguard-source -i cgr.dev/chainguard/nginx:latest -a arm64
# Use more concurrent downloads
chainguard-source -i cgr.dev/chainguard/haproxy:latest --concurrency 8
# Download and extract archives
chainguard-source -i cgr.dev/chainguard/python:latest -x
# Fetch sources including private melange configs with GitHub token
chainguard-source -i cgr.dev/chainguard-private/myapp:latest --github-token ghp_xxxxx
# Or use environment variable
export GITHUB_TOKEN=ghp_xxxxx
chainguard-source -i cgr.dev/chainguard-private/myapp:latest
The program creates a hierarchical `sources/` directory structure that mirrors the image registry path:
sources/
└── cgr.dev/ # Registry domain
└── chainguard/ # Organization
└── static:latest/ # Image:tag
├── sboms/ # SBOM files
│ └── static:latest.sbom.spdx.json
├── melange-configs/ # Melange configuration files
│ ├── package1.yaml
│ ├── package2.yaml
│ └── ...
├── artifacts/ # Downloaded source archives
│ ├── source1.tar.gz
│ ├── source2.tar.xz
│ └── ...
└── sources/ # Extracted source code (only with -x flag)
├── source1-extracted/
├── source2-extracted/
└── ...
For local SBOM files, a flat structure is used:
sources/
└── <sbom-filename>/ # Named after the SBOM file
├── sboms/
├── melange-configs/
├── artifacts/
└── ...
🔍 Retrieving SBOM from container image: cgr.dev/chainguard/unbound:latest
💾 SBOM saved to: sources/cgr.dev/chainguard/unbound:latest/sboms/unbound:latest.sbom.spdx.json
🔍 Found 8 populated downloadLocation URLs
🔍 Found 12 packages with melange configs to fetch
🚀 Starting download (1/20): gcc-15.1.0.tar.xz [libgcc]
🚀 Fetching melange config (9/20): unbound
📦 Downloaded (1/20): gcc-15.1.0.tar.xz [libgcc]
📄 Saved melange config: unbound.yaml
...
🎁 Download complete. Files saved to: sources/cgr.dev/chainguard/unbound:latest
📊 === EXECUTION SUMMARY ===
🚀 Downloads: 20 successful, 0 failed (total: 20)
💡 Tip: Use -x or --extract flag to extract archives
# With extraction enabled:
🗄️ Extracting 8 archives to sources/ subdirectory...
📤 Extracting (1/8): gcc-15.1.0.tar.xz [libgcc]
🎉 Extracted: gcc-15.1.0.tar.xz [libgcc]
...
🎆 Extraction complete. Archives extracted to: sources/cgr.dev/chainguard/unbound:latest/sources
For accessing private repositories (chainguard-dev/enterprise-packages, chainguard-dev/extra-packages), you'll need a GitHub personal access token:
- Create a token: Go to GitHub Settings → Developer settings → Personal access tokens
- Required scopes: Select the `repo` scope for full access to private repositories
- Usage: Provide via the `--github-token` flag or the `GITHUB_TOKEN` environment variable
The tool uses the GitHub API to fetch melange configurations, which:
- Supports authentication for private repositories
- Provides helpful error messages when authentication is needed
- Respects API rate limits (higher limits with authentication)
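
As an illustration of the authenticated fetch described above, the following sketch builds a request for one melange config at a specific commit. It assumes the GitHub REST "repository contents" endpoint with the raw media type; the helper names `resolveToken` and `newConfigRequest`, the file path `unbound.yaml`, and the commit value are hypothetical, and the tool itself may construct its requests differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
)

// resolveToken prefers the --github-token flag value and falls back to the
// GITHUB_TOKEN environment variable, as described in the options above.
func resolveToken(flagValue string) string {
	if flagValue != "" {
		return flagValue
	}
	return os.Getenv("GITHUB_TOKEN")
}

// newConfigRequest builds a GET request for a file at a given commit.
// Assumption: repo is "owner/name" and filePath is the config's path in it.
func newConfigRequest(repo, filePath, commit, token string) (*http.Request, error) {
	u := url.URL{
		Scheme: "https",
		Host:   "api.github.com",
		Path:   "/repos/" + repo + "/contents/" + filePath,
	}
	q := url.Values{}
	q.Set("ref", commit) // pin the fetch to the commit referenced in the SBOM
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Ask for the raw file body instead of the JSON metadata wrapper.
	req.Header.Set("Accept", "application/vnd.github.raw+json")
	if token != "" {
		// Only private repositories need this; it also raises rate limits.
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	// "abc123" stands in for a real commit SHA taken from the SBOM.
	req, err := newConfigRequest("wolfi-dev/os", "unbound.yaml", "abc123", resolveToken(""))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Leaving the `Authorization` header off when no token is set keeps public repositories such as wolfi-dev/os reachable without credentials.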
- Input Processing: Accepts either local SBOM files or container image references
- Resource Warning: Displays disk space and warns about network/disk usage (unless the `-y` flag is used)
- SBOM Retrieval: For container images, fetches SBOM from Sigstore attestations
- SBOM Parsing: Reads the SPDX JSON file and parses packages, relationships, and external references
- Package Analysis:
  - Maps source packages to APK packages via `GENERATED_FROM` relationships
  - Extracts package build configuration references from external refs
- Download Planning: Identifies downloadable content:
  - Source archives from `downloadLocation` fields
  - Melange configs from referenced GitHub repositories at specific commits
- Cache Check: Before downloading, checks `~/.cache/chainguard-source` for cached archives (indexed by SHA256 of URL)
- Concurrent Download: Downloads archives and configs in parallel using configurable worker pools
- Artifact Organization: Stores downloaded files in appropriate subdirectories
- Automatic Extraction: Optionally extracts all archives to a sources subdirectory (with the `-x` flag)
- Summary Reporting: Provides detailed execution statistics
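
The parsing and package-analysis steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's parser: the struct covers only the SPDX 2.3 JSON fields it needs, the function name `sourceDownloads` and the sample document (including its example.com URL) are hypothetical, and it assumes that in a `GENERATED_FROM` relationship the `spdxElementId` is the APK package and the `relatedSpdxElement` is the source package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spdxDoc holds only the SPDX 2.3 JSON fields this sketch reads.
type spdxDoc struct {
	Packages []struct {
		SPDXID           string `json:"SPDXID"`
		Name             string `json:"name"`
		DownloadLocation string `json:"downloadLocation"`
	} `json:"packages"`
	Relationships []struct {
		Element string `json:"spdxElementId"`
		Related string `json:"relatedSpdxElement"`
		Type    string `json:"relationshipType"`
	} `json:"relationships"`
}

// sourceDownloads maps each populated downloadLocation to the name of the
// APK package generated from that source ("" if no relationship exists).
func sourceDownloads(raw []byte) (map[string]string, error) {
	var doc spdxDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	names := map[string]string{} // SPDXID -> package name
	for _, p := range doc.Packages {
		names[p.SPDXID] = p.Name
	}
	apkFor := map[string]string{} // source SPDXID -> APK package name
	for _, r := range doc.Relationships {
		if r.Type == "GENERATED_FROM" {
			apkFor[r.Related] = names[r.Element]
		}
	}
	out := map[string]string{}
	for _, p := range doc.Packages {
		loc := p.DownloadLocation
		// SPDX uses NOASSERTION and NONE for unpopulated locations.
		if loc == "" || loc == "NOASSERTION" || loc == "NONE" {
			continue
		}
		out[loc] = apkFor[p.SPDXID]
	}
	return out, nil
}

// sampleSBOM is a made-up, heavily trimmed document for demonstration.
const sampleSBOM = `{
  "packages": [
    {"SPDXID": "SPDXRef-apk", "name": "libgcc", "downloadLocation": "NOASSERTION"},
    {"SPDXID": "SPDXRef-src", "name": "gcc", "downloadLocation": "https://example.com/gcc-15.1.0.tar.xz"}
  ],
  "relationships": [
    {"spdxElementId": "SPDXRef-apk", "relatedSpdxElement": "SPDXRef-src", "relationshipType": "GENERATED_FROM"}
  ]
}`

func main() {
	downloads, err := sourceDownloads([]byte(sampleSBOM))
	if err != nil {
		panic(err)
	}
	for url, apk := range downloads {
		fmt.Printf("%s [%s]\n", url, apk)
	}
}
```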
The tool maintains an on-disk cache at `~/.cache/chainguard-source` to avoid re-downloading identical files:
- Both source archives and melange configs are cached using the SHA256 checksum of their URLs as the cache key
- When a URL is requested, the cache is checked first
- If found in cache, the file is copied from cache instead of downloading
- If not in cache, the file is downloaded and saved to both the destination and cache
- The cache persists across runs, significantly speeding up repeated fetches of the same images
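
The lookup described above could look something like the sketch below. It assumes the cache key is the hex-encoded SHA256 of the URL string used directly as a file name inside the cache directory; the function names `cachePath`, `copyFile`, and `fetchCached`, and the injected `download` callback, are hypothetical, and the real on-disk layout may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// cachePath returns the cache file for a URL: <cacheDir>/<sha256(url) in hex>.
func cachePath(cacheDir, url string) string {
	sum := sha256.Sum256([]byte(url))
	return filepath.Join(cacheDir, hex.EncodeToString(sum[:]))
}

// copyFile copies src to dst, creating or truncating dst.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// fetchCached copies from the cache on a hit; on a miss it downloads to dest
// and then stores a copy in the cache. It reports whether the cache was hit.
func fetchCached(cacheDir, url, dest string, download func(url, dest string) error) (bool, error) {
	cached := cachePath(cacheDir, url)
	if _, err := os.Stat(cached); err == nil {
		return true, copyFile(cached, dest)
	}
	if err := download(url, dest); err != nil {
		return false, err
	}
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		return false, err
	}
	return false, copyFile(dest, cached)
}

func main() {
	tmp, err := os.MkdirTemp("", "cache-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(tmp)

	// A stand-in downloader so the example runs without network access.
	fake := func(url, dest string) error {
		return os.WriteFile(dest, []byte("archive bytes"), 0o644)
	}
	cacheDir := filepath.Join(tmp, "cache")
	dest := filepath.Join(tmp, "src.tar.gz")
	for run := 1; run <= 2; run++ {
		hit, err := fetchCached(cacheDir, "https://example.com/src.tar.gz", dest, fake)
		fmt.Println("run", run, "cache hit:", hit, "error:", err)
	}
}
```

Keying on the URL rather than on file contents means a lookup needs no network access, at the cost of not noticing when the content behind a URL changes.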
This tool is designed to replace the original bash-based chainguard-source script with improvements:
- Uses downloadLocation: Directly uses SBOM `downloadLocation` fields instead of parsing git PURLs
- Melange Config Support: Fetches melange configuration files from exact commits referenced in SBOM
- Multi-Repository Support: Handles configs from wolfi-dev/os, chainguard-dev/enterprise-packages, and chainguard-dev/extra-packages
- Go Implementation: More maintainable and portable than shell script
- Better Performance: Concurrent downloads with configurable parallelism
- Cleaner Output: Organized directory structure with separate subdirectories
- No Git Cloning: Avoids heavy git clone operations, using direct archive downloads instead
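
To make the "concurrent downloads with configurable parallelism" point concrete, here is one way a bounded worker pool might be written. It is a sketch under assumptions: `downloadAll`, `result`, and `errDemo` are hypothetical names, the `fetch` callback stands in for the real HTTP download, and a failed item is recorded rather than aborting the run.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// result records the outcome for one URL.
type result struct {
	URL string
	Err error
}

// errDemo is a simulated failure used by the example in main.
var errDemo = errors.New("simulated download failure")

// downloadAll runs fetch for every URL using at most `concurrency` workers.
// A failure is stored in the matching result; the other items still run.
func downloadAll(urls []string, concurrency int, fetch func(url string) error) []result {
	if concurrency < 1 {
		concurrency = 1
	}
	results := make([]result, len(urls))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < concurrency; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so writes to
			// results never touch the same element twice.
			for i := range jobs {
				results[i] = result{URL: urls[i], Err: fetch(urls[i])}
			}
		}()
	}
	for i := range urls {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	urls := []string{"https://example.com/a.tar.gz", "https://example.com/broken", "https://example.com/c.tar.xz"}
	fetch := func(u string) error {
		if u == "https://example.com/broken" {
			return errDemo
		}
		return nil
	}
	failed := 0
	for _, r := range downloadAll(urls, 4, fetch) {
		if r.Err != nil {
			failed++
		}
	}
	fmt.Printf("Downloads: %d successful, %d failed (total: %d)\n", len(urls)-failed, failed, len(urls))
}
```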
- `.tar.gz` and `.tgz` (gzip compressed)
- `.tar.xz` (XZ compressed)
- `.tar.bz2`, `.tbz2`, `.tar.bz` (bzip2 compressed)
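
Choosing a decompressor for the formats listed above can be done by file-name suffix, for example as in this sketch. The function name `compressionFor` is hypothetical, and matching case-insensitively is an assumption; the tool may detect formats differently.

```go
package main

import (
	"fmt"
	"strings"
)

// compressionFor returns the compression implied by an archive's file name,
// or false if the name matches none of the supported suffixes.
func compressionFor(name string) (string, bool) {
	lower := strings.ToLower(name) // assumption: suffixes match case-insensitively
	switch {
	case strings.HasSuffix(lower, ".tar.gz"), strings.HasSuffix(lower, ".tgz"):
		return "gzip", true
	case strings.HasSuffix(lower, ".tar.xz"):
		return "xz", true
	case strings.HasSuffix(lower, ".tar.bz2"),
		strings.HasSuffix(lower, ".tbz2"),
		strings.HasSuffix(lower, ".tar.bz"):
		return "bzip2", true
	}
	return "", false
}

func main() {
	for _, name := range []string{"gcc-15.1.0.tar.xz", "source1.tar.gz", "notes.zip"} {
		kind, ok := compressionFor(name)
		fmt.Println(name, kind, ok)
	}
}
```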
- Continues downloading other files if individual downloads fail
- Continues extracting other archives if individual extractions fail
- Provides detailed error messages for troubleshooting
- Validates file paths during extraction for security
- Reports validation warnings for missing download locations
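
The path validation mentioned above is typically a check that an archive entry cannot resolve outside the extraction directory. A minimal sketch, assuming a hypothetical helper named `safeJoin` (the tool's actual check may be stricter or structured differently):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins an archive entry name onto destDir and rejects any entry
// whose cleaned path would land outside destDir (for example "../evil").
func safeJoin(destDir, entryName string) (string, error) {
	target := filepath.Join(destDir, entryName) // Join also cleans ".." segments
	rel, err := filepath.Rel(destDir, target)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("entry %q escapes extraction directory", entryName)
	}
	return target, nil
}

func main() {
	for _, name := range []string{"gcc-15.1.0/README", "../../etc/passwd"} {
		target, err := safeJoin("sources", name)
		if err != nil {
			// Skip the offending entry and keep extracting the rest.
			fmt.Println("skipped:", err)
			continue
		}
		fmt.Println("would extract to:", target)
	}
}
```

Symlink entries inside an archive need separate handling, which this sketch does not attempt.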
- `github.com/ulikunitz/xz` - XZ decompression support
- `github.com/google/go-containerregistry` - Container registry operations
- `github.com/sigstore/cosign/v2` - Sigstore attestation verification
Licensed under the Apache License, Version 2.0. See the LICENSE file for details.