A command-line tool for processing Go code coverage collected from running applications across an entire OpenShift/Kubernetes cluster. It downloads coverage data from S3, compiles it into a SQLite database, clones source repositories, and generates interactive HTML reports with annotated source code.
A "collection" can span multiple cluster lifecycles and is not tied to a single cluster instance.
go build -o coverage-collector ./cmd/coverage-collector
Or install directly:
go install github.com/jupierce/cluster-code-coverage-analysis/cmd/coverage-collector@latest
Go binaries built with -cover can be linked with a small HTTP coverage server
that serves binary coverage data on a well-known port (default 53700+). A
coverage producer running on the cluster collects this data and uploads it to
S3. This tool downloads that data and processes it into interactive HTML reports.
Coverage data is cumulative: Go's in-process counters record every code path
executed since the process started. Multiple collections accumulate additional
covcounters.* files, so no data is lost. When the data is compiled and
rendered, all counter files are merged to produce combined reports.
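The merge described above can be illustrated on the text format that go tool covdata textfmt emits. This is a minimal sketch, not the tool's implementation: the function name mergeProfiles is hypothetical, and it assumes that merging means summing the execution counts of identical blocks (for "set" mode profiles the sum is only meaningful as zero versus non-zero).

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// mergeProfiles (hypothetical helper) sums execution counts for identical
// blocks across several text-format coverage profiles. Each block line has
// the form "file.go:start,end numStmts count".
func mergeProfiles(profiles ...string) (string, error) {
	counts := map[string]uint64{} // key: "file:range numStmts"
	mode := ""
	for _, p := range profiles {
		for _, line := range strings.Split(p, "\n") {
			line = strings.TrimSpace(line)
			if line == "" {
				continue
			}
			if strings.HasPrefix(line, "mode:") {
				if mode == "" {
					mode = line // keep the first mode header seen
				}
				continue
			}
			i := strings.LastIndex(line, " ")
			if i < 0 {
				return "", fmt.Errorf("malformed line %q", line)
			}
			n, err := strconv.ParseUint(line[i+1:], 10, 64)
			if err != nil {
				return "", fmt.Errorf("bad count in %q: %w", line, err)
			}
			counts[line[:i]] += n
		}
	}
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	var out strings.Builder
	if mode != "" {
		out.WriteString(mode + "\n")
	}
	for _, k := range keys {
		out.WriteString(k + " " + strconv.FormatUint(counts[k], 10) + "\n")
	}
	return out.String(), nil
}

func main() {
	first := "mode: count\nexample.com/app/a.go:1.1,2.2 1 1\n"
	second := "mode: count\nexample.com/app/a.go:1.1,2.2 1 2\nexample.com/app/b.go:3.1,4.2 2 0\n"
	merged, err := mergeProfiles(first, second)
	if err != nil {
		panic(err)
	}
	fmt.Print(merged)
}
```

Because counts only ever grow, running the merge again after a later collection adds to the earlier result rather than replacing it.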
Lines originating from coverage_server.go (the embedded HTTP server) are
automatically filtered out of all reports.
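As an illustration of that filtering, the sketch below drops every block belonging to coverage_server.go from a text-format profile. The function name filterProfile and the choice to match on the file's base name are assumptions made for the example; the tool may apply the filter at a different stage.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// filterProfile (hypothetical helper) removes all blocks for files whose
// base name equals excludedFile from a text-format coverage profile.
func filterProfile(profile, excludedFile string) string {
	var out strings.Builder
	sc := bufio.NewScanner(strings.NewReader(profile))
	for sc.Scan() {
		line := sc.Text()
		// Block lines look like "pkg/file.go:12.2,14.3 2 1"; the file path
		// is everything before the last colon. The "mode:" header is kept.
		if i := strings.LastIndex(line, ":"); i >= 0 && !strings.HasPrefix(line, "mode:") {
			file := line[:i]
			if file == excludedFile || strings.HasSuffix(file, "/"+excludedFile) {
				continue
			}
		}
		out.WriteString(line)
		out.WriteByte('\n')
	}
	return out.String()
}

func main() {
	profile := "mode: count\n" +
		"example.com/app/main.go:3.1,5.2 2 1\n" +
		"example.com/app/coverage_server.go:10.1,12.2 3 7\n"
	fmt.Print(filterProfile(profile, "coverage_server.go"))
}
```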
The primary workflow uses the cluster command group. Run these subcommands in
order:
coverage-collector cluster download --collection <name> # Download from S3
coverage-collector cluster compile --collection <name> # Build SQLite DB
coverage-collector cluster clone-sources --collection <name> # Clone source repos
coverage-collector cluster render --collection <name> # Generate HTML
Download coverage data (covmeta and covcounters files) from an S3 bucket.
Generates metadata.json for each coverage entry from S3 path components.
coverage-collector cluster download --collection my-collection \
--bucket art-ocp-code-coverage \
--prefix openshift-ci/coverage \
--profile saml \
--region us-east-1
# Skip already-downloaded entries
coverage-collector cluster download --collection my-collection \
--bucket art-ocp-code-coverage \
--prefix openshift-ci/coverage \
--skip-existing
Flags:
| Flag | Default | Description |
|---|---|---|
| --bucket | (required) | S3 bucket name |
| --prefix | (required) | S3 path prefix |
| --profile | | AWS CLI profile |
| --region | | AWS region |
| --skip-existing | false | Skip entries that already have local data |
Process raw coverage data into an SQLite database. This step:
- Converts binary coverage to text format via go tool covdata textfmt
- Filters out coverage_server.go lines
- Groups reports by owner (Deployment, DaemonSet, StatefulSet, Job, Host, Pod)
- Merges coverage from multiple pods of the same owner/binary
- Resolves source repository URLs from info.json files and image labels/env vars
- Computes per-file coverage statistics
Change detection uses MD5 hashes; only changed reports are reprocessed.
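The idea behind hash-based change detection can be sketched as follows. Which inputs the tool actually hashes is not stated here, so the input set (file names plus contents) and the names fingerprint and needsRecompile are assumptions for illustration only.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"sort"
)

// fingerprint (hypothetical helper) hashes a set of named inputs in a
// stable order so that the same files always yield the same MD5 digest.
func fingerprint(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for n := range files {
		names = append(names, n)
	}
	sort.Strings(names)
	h := md5.New()
	for _, n := range names {
		h.Write([]byte(n))
		h.Write([]byte{0}) // separator so name and content cannot blur together
		h.Write(files[n])
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// needsRecompile reports whether the inputs changed since the stored digest
// was recorded.
func needsRecompile(stored string, files map[string][]byte) bool {
	return stored != fingerprint(files)
}

func main() {
	files := map[string][]byte{
		"covmeta.aaa":     []byte("meta"),
		"covcounters.bbb": []byte("counters-1"),
	}
	stored := fingerprint(files)
	fmt.Println(needsRecompile(stored, files)) // unchanged inputs

	// A later collection adds another counters file, so the digest changes.
	files["covcounters.ccc"] = []byte("counters-2")
	fmt.Println(needsRecompile(stored, files))
}
```

MD5 is adequate for this purpose because the digest only detects accidental change; it is not used as a security boundary.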
# Incremental compile
coverage-collector cluster compile --collection my-collection
# Force full recompilation
coverage-collector cluster compile --collection my-collection --update '*'
# Force recompilation for a specific namespace
coverage-collector cluster compile --collection my-collection \
--update 'namespace=openshift-apiserver'
Flags:
| Flag | Default | Description |
|---|---|---|
| --update | | Force recomputation (repeatable, AND logic). Use '*' for all, or field=glob for namespace, node, container, image |
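To make the AND semantics of repeated --update values concrete, here is a minimal sketch. The function name matchesUpdate and the use of path.Match for globbing are assumptions; in particular, path.Match does not let * cross a / character, which may differ from how the tool treats image names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesUpdate (hypothetical helper) reports whether a report, described by
// its field values, is selected by every --update filter (AND logic).
func matchesUpdate(fields map[string]string, filters []string) (bool, error) {
	if len(filters) == 0 {
		return false, nil // nothing forced; normal change detection applies
	}
	for _, f := range filters {
		if f == "*" {
			continue // matches every report
		}
		name, glob, ok := strings.Cut(f, "=")
		if !ok {
			return false, fmt.Errorf("filter %q is not '*' or field=glob", f)
		}
		value, known := fields[name]
		if !known {
			return false, fmt.Errorf("unknown field %q", name)
		}
		matched, err := path.Match(glob, value)
		if err != nil {
			return false, fmt.Errorf("bad glob in %q: %w", f, err)
		}
		if !matched {
			return false, nil // one failing filter rejects the report
		}
	}
	return true, nil
}

func main() {
	report := map[string]string{
		"namespace": "openshift-apiserver",
		"node":      "master-0",
		"container": "apiserver",
		"image":     "example-image",
	}
	ok, err := matchesUpdate(report, []string{"namespace=openshift-*", "container=api*"})
	fmt.Println(ok, err)
}
```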
Clone source repositories identified during compile. Uses source_url and
source_commit from the image_sources table to clone at the exact commit.
coverage-collector cluster clone-sources --collection my-collection
Flags:
| Flag | Default | Description |
|---|---|---|
| --skip-existing | true | Skip already-cloned repositories |
Generate HTML reports from the compiled database. Produces one HTML report per
unique binary (identified by covmeta hash), plus an interactive index.html.
Multiple owners running the same binary share a single HTML report. The index shows all owners with their individual metadata, linking to the shared report.
# Render to the default location (<collection>/html/)
coverage-collector cluster render --collection my-collection
# Render to a custom directory
coverage-collector cluster render --collection my-collection \
--output-dir my-collection/html-post-e2e
# Only generate the index (skip per-binary HTML)
coverage-collector cluster render --collection my-collection \
--skip-component-html
Flags:
| Flag | Default | Description |
|---|---|---|
| --output-dir | <collection>/html | Output directory for HTML reports |
| --skip-component-html | false | Only generate the index |
These flags apply to all cluster subcommands:
| Flag | Default | Description |
|---|---|---|
| --collection | (required) | Collection name; also used as the working directory |
| --verbosity | info | Log verbosity: error, info, debug, trace |
| --max-concurrency | 8 | Maximum concurrent operations |
After a full run, the collection directory looks like:
<collection>/
  coverage/                                  # Raw coverage data from S3
    <ns>-<pod>-<container>/
      metadata.json                          # Pod/container/binary info
      info.json                              # Source URL/commit (from S3 producer)
      covmeta.<hash>                         # Coverage metadata (deterministic per binary)
      covcounters.<hash>                     # Coverage counters (unique per collection)
  coverage.db                                # SQLite database (~7GB for a full cluster)
  repos/                                     # Cloned source repositories
    github.com/<org>/<repo>/<commit-prefix>/
  html/                                      # Generated HTML reports
    index.html                               # Interactive dashboard
    <hash>.html                              # Per-binary coverage reports (named by covmeta hash)
  logs/                                      # Timestamped log files
The generated index.html provides:
- Search by namespace, owner name, container, or binary
- Filter by namespace, owner type, or coverage level
- Sort by any column (namespace, owner, container/binary, coverage %, statements)
- Color-coded coverage: Excellent (>=70%), Good (>=50%), Moderate (>=30%), Poor (>=15%), Critical (<15%)
- Click-through to per-binary HTML reports with annotated source code
- Expandable rows showing pods, hosts, and image details
- Checkbox filters to hide e2e-* and openshift-must-gather-* namespaces (checked by default)
- Deduplicated stats: Overall coverage percentages are computed by unique binary hash, so the same binary running in multiple owners is only counted once
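The coverage thresholds and the deduplicated overall percentage listed above can be sketched in a few lines. The type ownerRow and the function names are hypothetical; only the thresholds and the "count each binary hash once" rule come from the description.

```go
package main

import "fmt"

// coverageLevel maps a percentage to the color-coded levels of the index.
func coverageLevel(pct float64) string {
	switch {
	case pct >= 70:
		return "Excellent"
	case pct >= 50:
		return "Good"
	case pct >= 30:
		return "Moderate"
	case pct >= 15:
		return "Poor"
	default:
		return "Critical"
	}
}

// ownerRow is a hypothetical index row: one owner running one binary.
type ownerRow struct {
	Owner      string
	BinaryHash string
	Covered    int // covered statements
	Total      int // total statements
}

// overallCoverage counts each unique binary hash once, so a binary that runs
// under several owners does not inflate the totals.
func overallCoverage(rows []ownerRow) float64 {
	seen := map[string]bool{}
	covered, total := 0, 0
	for _, r := range rows {
		if seen[r.BinaryHash] {
			continue
		}
		seen[r.BinaryHash] = true
		covered += r.Covered
		total += r.Total
	}
	if total == 0 {
		return 0
	}
	return 100 * float64(covered) / float64(total)
}

func main() {
	rows := []ownerRow{
		{Owner: "owner-a", BinaryHash: "h1", Covered: 50, Total: 100},
		{Owner: "owner-b", BinaryHash: "h1", Covered: 50, Total: 100}, // same binary, skipped
		{Owner: "owner-c", BinaryHash: "h2", Covered: 10, Total: 100},
	}
	pct := overallCoverage(rows)
	fmt.Printf("%.1f%% %s\n", pct, coverageLevel(pct))
}
```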
Each <hash>.html report includes:
- Collapsible header listing all owner groups that run this binary (namespace, type, owner, containers, pod count, hosts)
- Stat cards with source file count, overall coverage %, total and covered statements
- File table with search, coverage level filter, and sortable columns
- Source code viewer with line numbers, green/red coverage highlighting, and per-line execution counts
- Split view mode for viewing the file list alongside source code
- Deep linking via URL hash (#file0, #file1, etc.)
- Unresolved files: Files without cloned source show "No source code resolved for this file" with their coverage stats still computed
Reports are grouped by owner type, inferred from pod name patterns:
| Pattern | Owner Type |
|---|---|
| name-<hash>-<5char> | Deployment |
| name-<number> | StatefulSet |
| name-<5char> | DaemonSet |
| installer-*, pruner-* | Job |
| Host-level processes | Host |
| Unrecognized pods | Pod (No Owner) |
Owners with the same binary (same covmeta hash) share a single HTML report. This prevents inflated statement counts when the same binary runs in multiple pods with different names (e.g., static pods with per-node names).
The tool uses a 3-strategy cascade to find source code for annotated reports:
- Image labels/env vars (fast): Looks up io.openshift.build.source-location / io.openshift.build.commit.id from container image labels, and __doozer_group / __doozer_key from image environment variables. Also checks info.json files from the coverage producer. Validates that the repo's Go module matches the coverage package path.
- Package path matching: Walks cloned repos and scores by Go module prefix match.
- Owner name fallback: Matches owner name to repository directory names.
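The package path matching strategy can be pictured as a longest-prefix search over the module paths of the cloned repositories. The sketch below is one plausible scoring rule, not necessarily the tool's: the function name bestRepo, the map of module path to repository directory, and "longest module prefix wins" are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// bestRepo (hypothetical helper) returns the repository directory whose Go
// module path is the longest prefix of pkgPath. modules maps a module path
// (from each repo's go.mod) to the directory it was cloned into.
func bestRepo(pkgPath string, modules map[string]string) (repoDir string, ok bool) {
	bestLen := -1
	for mod, dir := range modules {
		// Match only on path boundaries so "example.com/app" does not
		// claim packages from "example.com/application".
		if pkgPath != mod && !strings.HasPrefix(pkgPath, mod+"/") {
			continue
		}
		if len(mod) > bestLen {
			bestLen = len(mod)
			repoDir = dir
		}
	}
	return repoDir, bestLen >= 0
}

func main() {
	modules := map[string]string{
		"github.com/example/app":       "repos/github.com/example/app/abc123",
		"github.com/example/app/tools": "repos/github.com/example/tools/def456",
	}
	dir, ok := bestRepo("github.com/example/app/tools/cmd/x", modules)
	fmt.Println(dir, ok)
}
```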
For host binaries (no container image), source info comes from info.json files
using synthetic host:<binary_name> keys.
The bigquery command group persists coverage data from the SQLite database
into BigQuery for cross-collection analysis and querying.
# Ingest all coverage data
coverage-collector bigquery --project my-gcp-project --dataset my_dataset \
ingest --collection my-collection
# Ingest only specific namespaces
coverage-collector bigquery --project my-gcp-project --dataset my_dataset \
ingest --collection my-collection \
--namespace 'openshift-apiserver' --namespace 'openshift-etcd'
# Filter by owner name
coverage-collector bigquery --project my-gcp-project --dataset my_dataset \
ingest --collection my-collection \
--owner 'kube-apiserver' --owner 'etcd'
Flags (bigquery group):
| Flag | Default | Description |
|---|---|---|
| --project | (required) | GCP project ID |
| --dataset | (required) | BigQuery dataset name |
Flags (ingest subcommand):
| Flag | Default | Description |
|---|---|---|
| --collection | (required) | Collection name (same as cluster subcommands) |
| --namespace | * | Namespace glob filter (repeatable, OR logic) |
| --owner | * | Owner name glob filter (repeatable, OR logic) |
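Unlike --update, repeated --namespace or --owner values are alternatives. A minimal sketch of that OR matching follows; the function name matchAny and the use of path.Match are assumptions, and how the two flags combine with each other is not covered here.

```go
package main

import (
	"fmt"
	"path"
)

// matchAny (hypothetical helper) reports whether value matches at least one
// glob (OR logic). With no globs given it behaves like the default "*".
func matchAny(globs []string, value string) (bool, error) {
	if len(globs) == 0 {
		globs = []string{"*"}
	}
	for _, g := range globs {
		ok, err := path.Match(g, value)
		if err != nil {
			return false, fmt.Errorf("bad glob %q: %w", g, err)
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	namespaces := []string{"openshift-apiserver", "openshift-etcd"}
	ok, err := matchAny(namespaces, "openshift-etcd")
	fmt.Println(ok, err)
}
```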
Two tables are created automatically:
- coverage_data: One row per source line per binary. Includes source code text, line number, and execution count. Partitioned by ingestion_time, clustered by (binary_hash, collection_id).
- coverage_generators: One row per unique binary hash. Lists all owners (namespace, owner, container, binary) as a repeated record. Includes software_group and software_key metadata from the coverage producer. Partitioned by ingestion_time, clustered by (software_group, binary_hash, collection_id, source_url).
Uses GCP Application Default Credentials. Run gcloud auth application-default login before using, or set the GOOGLE_APPLICATION_CREDENTIALS environment
variable.
- Coverage data in S3: The coverage producer must have uploaded covmeta and covcounters files to the configured S3 bucket.
- Go toolchain: Required for go tool covdata textfmt (converting binary coverage to text format).
- AWS CLI: Required by download to fetch data from S3.
- Git: Required by clone-sources to clone repositories.
- oc CLI (optional): Used during compile to inspect container image labels and environment variables for source repository and software group/key info. Falls back to info.json if unavailable.
- GCP credentials (for BigQuery ingest): gcloud auth application-default login or GOOGLE_APPLICATION_CREDENTIALS environment variable.
The coverage HTTP client was derived from psturc/go-coverage-http.