A simple tool that checks whether your Kubernetes cluster (observed state) matches what’s in Git (desired state), and shows what changed when they go out of sync.
The tool reads Kubernetes manifests, compares them with the current cluster state, and reports any drift. It also includes an optional remediation mode to re-apply the desired state.
This project is not intended to replace tools like ArgoCD or Flux. Those systems handle full GitOps workflows: sync orchestration, rollbacks, multi-cluster management, and integration with tools like Helm or Kustomize.
Keeping the scope small makes the behavior predictable and allows it to sit alongside an existing GitOps setup rather than replace one.
The overall architecture, reconciliation loop, diff behavior, and remediation approach are documented in DESIGN.md.
The focus is on keeping the reconciliation loop straightforward while correctly handling Kubernetes edge cases: defaulted fields and list reordering from sidecar injection.
RUNBOOK.md provides a step-by-step walkthrough to run the project locally using kind, including expected output and remediation flow.
An end-to-end drift detection flow is available via:
- scripts/e2e-kind.sh for automated testing
- Manual drift simulation using kubectl set image and kubectl patch (see Quick Start below)
- scripts/docker-demo.sh as an optional shortcut covering steps 1-4 without a local Python environment (see RUNBOOK.md)
This project intentionally focuses on a small subset of resources (Deployment, Service, ConfigMap, Namespace) and operates on a single cluster using plain YAML manifests.
It does not aim to be a full GitOps platform, alerting system, or history store. See Assumptions and descoped areas for details.
# 1. Install dependencies (create a venv first to avoid touching system Python)
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
# 2. Start a local cluster (requires kind)
./scripts/setup-kind.sh
# 3. Apply the example desired manifests
kubectl apply -f examples/desired/
# 4. Run dry-run detection (nothing drifted yet)
python3 -m gitops_drift.main --manifests ./examples/desired --namespace default --dry-run --once
# 5. Simulate drift
./scripts/inject-drift.sh
# 6. Detect drift
python3 -m gitops_drift.main --manifests ./examples/desired --namespace default --dry-run --once
# 7. Remediate
# Service uses a full replace; revert the drifted label first to avoid HTTP 422 on clusterIP
kubectl patch service frontend --patch '{"metadata":{"labels":{"app":"frontend"}}}'
python3 -m gitops_drift.main --manifests ./examples/desired --namespace default --remediate --once

See RUNBOOK.md for a more detailed walkthrough, including cluster setup and expected output.
python3 -m gitops_drift.main [options]

  --manifests PATH        Directory of desired-state YAML manifests (required)
  --namespace NAME        Default namespace when a manifest omits one (default: default)
  --dry-run               Report drift without modifying the cluster (default: on)
  --no-dry-run            Disable dry-run; report drift without applying changes
  --remediate             Re-apply desired state when drift is found (disables dry-run)
  --once                  Run one reconciliation cycle and exit
  --interval SECONDS      Loop interval in seconds (default: 60)
  --kubeconfig PATH       Path to kubeconfig; defaults to ~/.kube/config
  --ignore-fields PATHS   Comma-separated global field paths to ignore
  --output FORMAT         Report output format: text | json (default: text)
  --log-level LEVEL       DEBUG | INFO | WARNING | ERROR (default: INFO)
  --fail-on-drift         Exit with status 1 if drift is detected (for CI pipelines)
The controller reconciles all resources in the manifest directory in one cycle:
Drift Report
Git revision : 3b0406bf9c1a
============================================================
ConfigMap/app-config (ns: default)
Action : drift-detected (dry-run)
Fields : 2 drifted
data.LOG_LEVEL
desired : info
live : debug
data.MAX_CONNECTIONS
desired : 100
live : 200
Deployment/frontend (ns: default)
Action : drift-detected (dry-run)
Fields : 3 drifted
spec.template.spec.containers[name=frontend].image
desired : nginx:1.25
live : nginx:1.19
spec.template.spec.containers[name=frontend].resources.limits.cpu
desired : 250m
live : 500m
spec.template.spec.containers[name=frontend].resources.limits.memory
desired : 256Mi
live : 512Mi
Deployment/redis (ns: default)
Action : drift-detected (dry-run)
Fields : 1 drifted
spec.template.spec.containers[name=redis].image
desired : redis:7.2
live : redis:7.0
Namespace/demo (ns: )
Action : drift-detected (dry-run)
Fields : 1 drifted
metadata.labels.env
desired : dev
live : staging
Service/frontend (ns: default)
Action : drift-detected (dry-run)
Fields : 1 drifted
metadata.labels.app
desired : frontend
live : frontend-drift
============================================================
Total: 5 resource(s) drifted, 8 field(s) changed
Container paths use [name=<container-name>] notation. Containers are matched by name rather than position, so sidecar injection or reordering does not produce false positives.
Deployment/api does not appear in the report because it has drift.gitops.io/skip: "true". Deployment/redis is detected in dry-run mode; during remediation it reports remediation-blocked and requires operator review.
Three annotations and a global CLI flag control what the controller checks and corrects.
metadata:
  annotations:
    drift.gitops.io/skip: "true"

The resource is not fetched and does not appear in the drift report. Common cases: a ConfigMap managed by an external operator, a Deployment diverged intentionally during a canary rollout, or a resource mid-migration.
metadata:
  annotations:
    drift.gitops.io/ignore-fields: "spec.replicas,metadata.labels.env"

The listed fields are excluded from the diff. During remediation, the controller reads the current live value for each ignored field and injects it into the replace body, so externally managed fields such as HPA-controlled spec.replicas are not reset.
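The preservation step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (`parse_ignore_fields`, `build_replace_body`) are hypothetical, and it assumes ignored paths are simple dotted paths through nested mappings (no list indexing).

```python
import copy
from typing import Any, Dict, List

_MISSING = object()  # sentinel: path not present in the live object


def parse_ignore_fields(annotation_value: str) -> List[str]:
    """Split a comma-separated annotation value into clean field paths."""
    return [p.strip() for p in annotation_value.split(",") if p.strip()]


def _get_path(obj: Any, parts: List[str]) -> Any:
    """Walk nested dicts; return _MISSING if any segment is absent."""
    for part in parts:
        if not isinstance(obj, dict) or part not in obj:
            return _MISSING
        obj = obj[part]
    return obj


def _set_path(obj: Dict[str, Any], parts: List[str], value: Any) -> None:
    """Set a nested value, creating intermediate dicts as needed."""
    for part in parts[:-1]:
        nxt = obj.get(part)
        if not isinstance(nxt, dict):
            nxt = {}
            obj[part] = nxt
        obj = nxt
    obj[parts[-1]] = value


def build_replace_body(desired: Dict[str, Any], live: Dict[str, Any],
                       ignore_fields: List[str]) -> Dict[str, Any]:
    """Copy the desired manifest, then carry over live values of ignored fields."""
    body = copy.deepcopy(desired)  # never mutate the caller's manifest
    for field in ignore_fields:
        parts = field.split(".")
        live_value = _get_path(live, parts)
        if live_value is not _MISSING:
            _set_path(body, parts, copy.deepcopy(live_value))
    return body


desired = {"metadata": {"labels": {"env": "dev"}}, "spec": {"replicas": 2}}
live = {"metadata": {"labels": {"env": "staging"}}, "spec": {"replicas": 5}}
fields = parse_ignore_fields("spec.replicas,metadata.labels.env")
print(build_replace_body(desired, live, fields))
```

With this shape, an HPA-scaled `spec.replicas` survives the replace because the body sent to the API server already carries the live value.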
--ignore-fields applies the same exclusion to every resource in the run:
gitops-drift --manifests ./manifests --ignore-fields "spec.replicas"

metadata:
  annotations:
    drift.gitops.io/remediation-policy: "alert"

The resource is fetched, diffed, and reported normally on every cycle. Even when --remediate is active, the controller never re-applies it. The action string shows drift-detected (remediation-blocked).
Use this for stateful resources, such as databases, caches, and queues, where an operator must review and approve any state change before it is applied. The drift remains visible in the report until an operator either corrects it manually or updates Git to match reality.
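How the skip and remediation-policy annotations interact with --remediate can be summarized in a small decision function. This is a sketch under assumptions: `plan_action` is a hypothetical name, the strings "in-sync" and "remediated" are placeholders invented for illustration, and only the two drift-detected action strings come from the report format shown in this README.

```python
from typing import Dict, Optional

SKIP_ANNOTATION = "drift.gitops.io/skip"
POLICY_ANNOTATION = "drift.gitops.io/remediation-policy"


def plan_action(annotations: Dict[str, str], drifted: bool,
                remediate: bool) -> Optional[str]:
    """Return the action to report for one resource, or None if it is skipped."""
    if annotations.get(SKIP_ANNOTATION) == "true":
        return None  # skipped resources are not fetched or reported
    if not drifted:
        return "in-sync"  # placeholder string, not from the README
    if not remediate:
        return "drift-detected (dry-run)"
    if annotations.get(POLICY_ANNOTATION) == "alert":
        # Drift stays visible, but the controller never re-applies it.
        return "drift-detected (remediation-blocked)"
    return "remediated"  # placeholder string, not from the README


print(plan_action({POLICY_ANNOTATION: "alert"}, drifted=True, remediate=True))
```

The ordering matters: skip is checked before anything else, and the alert policy only changes the outcome once remediation has been requested.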
When running inside Kubernetes, replace the kubeconfig approach with a ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: drift-controller
  namespace: drift-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: drift-controller
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "create", "update"]
  - apiGroups: [""]
    resources: ["services", "configmaps", "namespaces"]
    verbs: ["get", "list", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: drift-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: drift-controller
subjects:
  - kind: ServiceAccount
    name: drift-controller
    namespace: drift-system

The Python client tries the configured kubeconfig first. If kubeconfig loading fails, it falls back to in-cluster ServiceAccount credentials via load_incluster_config().
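The kubeconfig-then-in-cluster fallback can be sketched like this. It is an illustration rather than the project's code: `load_cluster_config` is a hypothetical name, the loader parameters exist only so the fallback can be exercised without a cluster, and by default it calls `load_kube_config` and `load_incluster_config` from the official `kubernetes` Python client.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("drift")


def load_cluster_config(
    kubeconfig: Optional[str] = None,
    load_kubeconfig: Optional[Callable[..., None]] = None,
    load_incluster: Optional[Callable[[], None]] = None,
) -> str:
    """Try kubeconfig first, then in-cluster credentials; return the source used."""
    if load_kubeconfig is None or load_incluster is None:
        # Imported lazily so the function can be tested with fake loaders.
        from kubernetes import config
        load_kubeconfig = load_kubeconfig or config.load_kube_config
        load_incluster = load_incluster or config.load_incluster_config
    try:
        # config_file=None lets the client use its default (~/.kube/config).
        load_kubeconfig(config_file=kubeconfig)
        return "kubeconfig"
    except Exception as exc:  # broad on purpose in this sketch
        log.info("kubeconfig unavailable (%s); trying in-cluster config", exc)
        # Raises if not running in a pod with a mounted ServiceAccount token.
        load_incluster()
        return "in-cluster"
```

Inside a pod bound to the ServiceAccount above, no kubeconfig exists, so the first loader fails and the mounted token is used instead.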
Use --once --fail-on-drift together to return a non-zero exit code when drift is detected, which makes this useful as a CI gate:
gitops-drift --manifests ./manifests --dry-run --once --fail-on-drift --output json

Exit code 0 means no drift was detected. Exit code 1 indicates drift. The JSON output can be parsed with jq for structured reporting.
Activate the venv first, then run:
source .venv/bin/activate
./scripts/e2e-kind.sh # drift detection only
REMEDIATE=true ./scripts/e2e-kind.sh # also test remediation

See scripts/e2e-kind.sh for full details. Requires kind, kubectl, and jq.
Only Deployment, Service, ConfigMap, and Namespace are supported. StatefulSet, DaemonSet, CronJob, Ingress, and CRDs are not included. Adding support is mechanical, but each resource type has its own update semantics and edge cases (for example, StatefulSet update strategies or CRD validation). Keeping the scope narrow means each type can be handled deliberately.
Lists of objects with name keys are matched by name to avoid false positives from container reordering or sidecar injection. Lists without stable identifiers fall back to positional comparison.
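A recursive diff with name-keyed list matching might look like the sketch below. It is not the project's implementation: the `diff` function and its return shape are hypothetical. It also assumes that only fields present in the desired manifest are compared, which is one way to keep server-defaulted fields and injected sidecars out of the report.

```python
from typing import Any, List, Tuple

Drift = Tuple[str, Any, Any]  # (field path, desired value, live value)


def _keyed_by_name(items: List[Any]) -> bool:
    """True when every element is a mapping carrying a 'name' key."""
    return bool(items) and all(isinstance(i, dict) and "name" in i for i in items)


def diff(desired: Any, live: Any, path: str = "") -> List[Drift]:
    """Report fields where live differs from desired, walking desired only."""
    if isinstance(desired, dict) and isinstance(live, dict):
        out: List[Drift] = []
        for key, want in desired.items():
            child = f"{path}.{key}" if path else key
            if key not in live:
                out.append((child, want, None))
            else:
                out.extend(diff(want, live[key], child))
        return out  # keys present only in live (defaults) are ignored
    if isinstance(desired, list) and isinstance(live, list):
        out = []
        if _keyed_by_name(desired) and _keyed_by_name(live):
            live_by_name = {item["name"]: item for item in live}
            for want in desired:
                child = f"{path}[name={want['name']}]"
                have = live_by_name.get(want["name"])
                if have is None:
                    out.append((child, want, None))
                else:
                    out.extend(diff(want, have, child))
            return out  # extra live items (e.g. sidecars) are ignored
        # No stable identifier: fall back to positional comparison.
        if len(desired) != len(live):
            return [(path, desired, live)]
        for i, (want, have) in enumerate(zip(desired, live)):
            out.extend(diff(want, have, f"{path}[{i}]"))
        return out
    return [] if desired == live else [(path, desired, live)]


desired = {"spec": {"containers": [{"name": "frontend", "image": "nginx:1.25"}]}}
live = {"spec": {"containers": [
    {"name": "istio-proxy", "image": "proxy:1"},  # injected sidecar
    {"name": "frontend", "image": "nginx:1.19", "imagePullPolicy": "IfNotPresent"},
]}}
print(diff(desired, live))
```

In the example, the injected sidecar and the defaulted `imagePullPolicy` produce no entries; only the image change on the `frontend` container is reported, using the same `[name=...]` path notation as the drift report.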
The tool operates on a single kubeconfig context. Running across multiple clusters requires running separate instances.
The tool reports to stdout and logs. Integrating with systems like Prometheus, PagerDuty, or Slack is out of scope. The JSON output can be piped to those systems.
Manifests are expected to be plain YAML. Template rendering is treated as a separate concern.
A small recursive diff keeps the implementation easy to follow and avoids introducing an external dependency. deepdiff is more feature-rich, but adds complexity that isn’t necessary for this scope.
Remediation uses a full replace because it is predictable: the body you send becomes the new state. The tradeoff is that it can overwrite fields managed by other controllers, which is why ignored fields are preserved. Server-side apply is the safer production path.
The current implementation fetches each configured resource by name, which works fine for a small manifest set. At scale, this would move to informers and a work queue.
To support a new kind such as StatefulSet, add it to SUPPORTED_KINDS, implement _get, _create, and _replace for it in kubernetes_client.py, and add test coverage. The diff and normalization logic remains unchanged.
Dry-run is the default because the cost of an unintended apply is higher than the cost of requiring one extra flag. Any tool that modifies cluster state should require explicit opt-in.