DiffKeeper: The Kubernetes Time Machine

DiffKeeper is a Black Box Flight Recorder for your containers.

It watches your application's filesystem in real time and records every change. When a container crashes or a CI test flakes, you can rewind the state to a specific moment and see exactly what happened.

Note: The earlier "stateful containers" design is archived. See Genesis & Pivot for the story.

The Problem: "Why did that test fail?"

You have a flaky test in CI. It fails 1 out of 50 times. You re-run the job, and it passes. You have no idea why.

Logs only show you what the application printed.
They don't show you that a config file was corrupted, a temp file was locked, or a binary was overwritten.

The Solution: Instant Replay

DiffKeeper uses eBPF to capture filesystem writes in the kernel as they happen and stores them in Pebble, an embedded key-value store. It then gives you a timeline of those writes, so you never have to guess timestamps again.

1) Record a Session

Wrap your flaky test (or any command) with diffkeeper record. Recording is designed to add minimal overhead.

diffkeeper record --state-dir=/tmp/trace -- go test ./...

2) See the Timeline (no blindfolds)

List every write in chronological order, then pick the exact second to rewind to:

diffkeeper timeline --state-dir=/tmp/trace
[00m:01s] WRITE    status.log (13B)
[00m:05s] WRITE    db.lock (6B)
[02m:14s] WRITE    status.log (22B)   <-- the failure
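
If you want to script the hand-off from timeline to export, the bracketed offset can be converted into a value suitable for --time. The sketch below assumes the line format shown in the sample output above ([MMm:SSs] OP path (NB)) and paths without spaces. Event and parseTimelineLine are hypothetical names for illustration, not part of DiffKeeper.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// Event is a hypothetical in-memory form of one timeline line.
type Event struct {
	Offset time.Duration // time since the recording started
	Op     string        // operation name, e.g. "WRITE"
	Path   string        // file path as printed by the timeline
	Size   int           // size in bytes
}

// lineRE matches lines such as "[02m:14s] WRITE    status.log (22B)".
// The format is inferred from the sample output, not from a specification.
var lineRE = regexp.MustCompile(`^\[(\d+)m:(\d+)s\]\s+(\S+)\s+(\S+)\s+\((\d+)B\)`)

// parseTimelineLine turns one line of timeline output into an Event.
// Anything after the size, such as a trailing annotation, is ignored.
func parseTimelineLine(line string) (Event, error) {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return Event{}, fmt.Errorf("unrecognized timeline line: %q", line)
	}
	mins, err := strconv.Atoi(m[1])
	if err != nil {
		return Event{}, err
	}
	secs, err := strconv.Atoi(m[2])
	if err != nil {
		return Event{}, err
	}
	size, err := strconv.Atoi(m[5])
	if err != nil {
		return Event{}, err
	}
	offset := time.Duration(mins)*time.Minute + time.Duration(secs)*time.Second
	return Event{Offset: offset, Op: m[3], Path: m[4], Size: size}, nil
}

func main() {
	ev, err := parseTimelineLine("[02m:14s] WRITE    status.log (22B)")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// time.Duration formats 134 seconds as "2m14s".
	fmt.Printf("--time=%q\n", ev.Offset.String())
}
```

The duration string produced here has the same shape as the --time value used in the export step of this README.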

3) Export the Crash Site

Restore the filesystem to the moment of failure:

diffkeeper export --state-dir=/tmp/trace --out=./debug_fs --time="2m14s"

Then cd into ./debug_fs and inspect the files exactly as they existed at that moment.

Drop-in GitHub Action

No curl | sh snippets needed—use the composite action directly:

steps:
  - uses: actions/checkout@v4
  - name: Record flaky test
    uses: saworbit/diffkeeper@v1
    with:
      command: go test ./...
      state-dir: diffkeeper-trace

On failure, the trace is uploaded as an artifact. Download it, run diffkeeper timeline to find the culprit write, then run diffkeeper export to reconstruct the filesystem locally.

The "Flaky CI" Demo

Run the built-in demo to see the loop end-to-end:

diffkeeper record --state-dir=./trace -- go run ./demo/flaky-ci-test
diffkeeper timeline --state-dir=./trace
diffkeeper export --state-dir=./trace --out=./restored --time="2s"
cat ./restored/status.log  # ERROR: Connection Lost

Architecture

Engine: Pure Go + eBPF (CO-RE, "Compile Once, Run Everywhere").
Storage: Pebble (LSM) for high-speed ingestion.
Diffing: bsdiff (binary patches) for efficient storage.

CI / Dogfooding

GitHub Actions (.github/workflows/ci.yml) runs unit/race tests, cross-platform builds, and a functional time-machine test that records a flaky script and verifies exports.
BoltDB-era workflows remain archived under docs/archive/v1-legacy/workflows/.

Requirements & Compatibility

Build Process: The recommended way to build the project is with Docker, which needs no local toolchain: run make build-dockerized. For local builds, you will need Go, clang, and bpftool.
Runtime Privileges: The core recording feature requires root privileges on Linux (for example, via sudo) to attach its eBPF probes to the kernel. The application reports a clear error if run without them.
Cross-Platform Support: The high-performance eBPF monitoring is Linux-specific. The tool provides a fallback for macOS and Windows, but its behavior and performance will differ.
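
The privilege and platform rules above amount to a small decision at startup. The sketch below is one way to express it. chooseBackend and the backend names "ebpf" and "fallback" are hypothetical, and checking for effective UID 0 is a simplification that ignores finer-grained Linux capabilities.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"runtime"
)

// errNeedRoot is a hypothetical error message for a non-root Linux run.
var errNeedRoot = errors.New("eBPF recording requires root privileges; re-run with sudo")

// chooseBackend picks a recording backend from the OS name and the
// effective user ID. Both are passed in so the logic is easy to test.
func chooseBackend(goos string, euid int) (string, error) {
	if goos != "linux" {
		// macOS and Windows use the slower, non-eBPF fallback.
		return "fallback", nil
	}
	if euid != 0 {
		return "", errNeedRoot
	}
	return "ebpf", nil
}

func main() {
	// os.Geteuid returns -1 on Windows, which is fine here because
	// the non-Linux branch is taken before the UID is inspected.
	backend, err := chooseBackend(runtime.GOOS, os.Geteuid())
	if err != nil {
		fmt.Println("cannot record:", err)
		return
	}
	fmt.Println("using backend:", backend)
}
```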

Getting Started

See the Quickstart to record, view the timeline, and export your first trace.
