This solution enables application teams to perform advanced network debugging (e.g., tcpdump, ncat, ip, ifconfig) on OpenShift nodes and pods without requiring cluster-admin or privileged SCC access. It uses a secure, auditable workflow with tightly scoped RBAC and a privileged service account in a dedicated namespace.
- App users (e.g., the app1-admin ServiceAccount) can launch debug jobs via a wrapper script.
- Privileged operations are performed by a dedicated debugger-sa ServiceAccount in the debugger namespace, which is bound to a privileged SCC.
- RBAC ensures app users have only the minimum permissions required to launch and monitor debug jobs.
- Gatekeeper Policy: A Gatekeeper policy is enforced in the debugger namespace to prevent app users from deploying any container images other than the approved debug image. This ensures only trusted debug workloads can run with privileged access.
- App user runs the wrapper script:
  - run-debugger-job.sh is executed by the app user (e.g., the app1-admin SA).
  - The script takes parameters for node, pod, namespace, command, and arguments (including capture duration for tcpdump).
- A Kubernetes Job is created in the debugger namespace:
  - The Job uses the debugger-sa ServiceAccount, which is bound to a privileged SCC (debugger-privileged-scc).
  - The Job mounts host paths and runs a script from a ConfigMap (execute-command-configmap.yaml).
- The debug script executes the requested command:
  - Supports tcpdump, ncat, ip, and ifconfig.
  - Handles network namespace entry and output file management, and logs/audits all actions.
- App user can fetch logs and results:
  - The app user can get logs and results (e.g., pcap files) as permitted by RBAC.
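The workflow above can be pictured with a minimal sketch of how a wrapper might render the Job manifest. This is not the actual run-debugger-job.sh: the image name, the ConfigMap name, the /scripts mount path, the script file name, the environment variable names, and the use of hostPID/hostNetwork are all assumptions made for illustration.

```shell
# Hypothetical sketch: render a debug Job manifest for the debugger namespace.
# Assumed (not taken from the real script): DEBUG_IMAGE, the ConfigMap name
# "execute-command", the /scripts mount path, and all env var names.
DEBUG_IMAGE="${DEBUG_IMAGE:-registry.example.com/debug/debugger:latest}"

render_debug_job() {
  local node="$1" pod="$2" pod_ns="$3" cmd="$4"
  shift 4
  local args="$*"
  local job_name
  job_name="debugger-${cmd}-$(date +%s)"

  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: ${job_name}
  namespace: debugger
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: debugger-sa
      nodeName: ${node}
      hostNetwork: true
      hostPID: true
      restartPolicy: Never
      containers:
      - name: debugger
        image: ${DEBUG_IMAGE}
        command: ["/bin/sh", "/scripts/execute-command.sh"]
        securityContext:
          privileged: true
        env:
        - {name: TARGET_POD, value: "${pod}"}
        - {name: TARGET_NAMESPACE, value: "${pod_ns}"}
        - {name: DEBUG_COMMAND, value: "${cmd}"}
        - {name: DEBUG_ARGS, value: "${args}"}
        volumeMounts:
        - {name: script, mountPath: /scripts}
      volumes:
      - name: script
        configMap:
          name: execute-command
EOF
}

# Print the manifest; it could be piped to "oc apply -f -" to create the Job.
render_debug_job worker-node-1 my-app-pod app1 tcpdump 60
```

Pinning the Job to the target node with nodeName keeps the privileged pod on the same host as the pod being debugged, which is what makes network namespace entry possible.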
- OpenShift 4.x cluster
- oc CLI access with cluster-admin
oc create namespace debugger
oc create sa debugger-sa -n debugger
The redhat-debugger-secret.yml file can be found in Quay.
oc create -f redhat-debugger-secret.yml --namespace=debugger
oc adm policy add-scc-to-user privileged -z debugger-sa -n debugger
oc apply -f k8s/execute-command-configmap.yaml
6. Apply RBAC for the Debugger Service Account. Check the RoleBinding to make sure it targets the namespace in which the debugger-sa account should receive the role.
For example, the RoleBinding below grants the debugger-sa account the admin role in the altiplano namespace. For the fttc-ancillary namespace, update it to namespace: fttc-ancillary
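---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: debugger-sa-binding-altiplanoadmin
  namespace: altiplano
subjects:
- kind: ServiceAccount
  name: debugger-sa
  namespace: debugger
roleRef:
  # Note: the built-in admin role is a ClusterRole. Keep kind: Role only if a
  # namespaced Role called admin exists in the target namespace.
  kind: Role
  name: admin
  apiGroup: rbac.authorization.k8s.io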
oc apply -f k8s/debugger-sa-rbac.yaml
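Because the RoleBinding has to be repeated per target namespace, a small helper along these lines might reduce copy-and-edit mistakes. It is only a sketch: the function name and the binding naming pattern are assumptions, and the roleRef simply mirrors the example above.

```shell
# Hypothetical helper: render the debugger-sa RoleBinding for a given namespace.
# The name pattern "debugger-sa-binding-<namespace>admin" is an assumption
# extrapolated from the altiplano example; roleRef is copied from that example.
render_debugger_binding() {
  local target_ns="$1"
  cat <<EOF
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: debugger-sa-binding-${target_ns}admin
  namespace: ${target_ns}
subjects:
- kind: ServiceAccount
  name: debugger-sa
  namespace: debugger
roleRef:
  kind: Role
  name: admin
  apiGroup: rbac.authorization.k8s.io
EOF
}

# Print the binding for fttc-ancillary; pipe to "oc apply -f -" to apply it.
render_debugger_binding fttc-ancillary
```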
---- tested till here ----
This role grants application users the necessary permissions to trigger the debug script and perform allowed actions. Update the file with the application team specific namespace.
- A Role named debugger-role-for-appteams is created in the debugger namespace.
- A RoleBinding assigns the ad-app-altiplano-operators group in the fttc-ancillary namespace to the debugger-role-for-appteams Role.
In production, assign this role to the actual application team users that require debug access.
oc apply -f k8s/appsteam-admin-debugger-rbac.yaml
- Ensure the app user (e.g., ad-app-altiplano-operators) is granted the roles defined in appsteam-admin-debugger-rbac.yaml.
- Apply the Gatekeeper policy to restrict which images can be used in the debugger namespace:
oc apply -f k8s/gatekeeper-debugger-image-policy.yaml
- This policy ensures only the approved debug image(s) can be used for jobs in the debugger namespace, blocking any attempt by app users to run arbitrary images.
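For orientation, an image restriction of this kind is often expressed as a Gatekeeper constraint similar to the sketch below. It assumes the K8sAllowedRepos ConstraintTemplate from the Gatekeeper policy library is installed; the constraint name and the repository prefix are placeholders, and the real k8s/gatekeeper-debugger-image-policy.yaml may be written differently.

```shell
# Hypothetical sketch: render a K8sAllowedRepos constraint limited to the
# debugger namespace. Assumes the K8sAllowedRepos ConstraintTemplate from the
# Gatekeeper library is already installed; names and repo are placeholders.
render_image_constraint() {
  local allowed_repo="$1"
  cat <<EOF
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: debugger-allowed-image
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    namespaces: ["debugger"]
  parameters:
    repos:
    - "${allowed_repo}"
EOF
}

# Print a constraint that only admits images under the given prefix.
render_image_constraint "registry.example.com/debug/"
```

K8sAllowedRepos matches on image name prefixes, so ending the prefix with a slash avoids accidentally admitting look-alike repositories that merely share the same leading characters.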
This solution provides built-in monitoring and alerting for debug job activity and security events using Prometheus and Alertmanager.
Apply the provided ServiceMonitor and PrometheusRule:
oc apply -f monitoring/prometheus-rules.yaml
- ServiceMonitor: Scrapes metrics from the debugger daemon (or debug jobs) exposing /metrics on the metrics port.
- PrometheusRule: Defines alerts for privilege violations, unauthorized command attempts, daemon downtime, and high job failure rates.
- DebuggerPrivilegeViolation: Triggered if a user attempts a privileged operation they are not authorized for.
- DebuggerUnauthorizedCommand: Triggered if a blocked/unauthorized command is attempted.
- DebuggerDaemonDown: Triggered if the debugger daemon is not up for more than 2 minutes.
- DebuggerHighJobFailureRate: Triggered if the job failure rate exceeds a threshold.
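As an illustration of how one of these alerts might be defined, the sketch below renders a PrometheusRule for DebuggerDaemonDown with the 2-minute window described above. The rule object name, the group name, the severity label, and the scrape job label are assumptions; the shipped monitoring/prometheus-rules.yaml is the authoritative definition.

```shell
# Hypothetical sketch: render a PrometheusRule containing DebuggerDaemonDown.
# The scrape job label passed in, the rule/group names, and the severity are
# assumptions; only the alert name and the 2m window come from the text above.
render_daemon_down_rule() {
  local scrape_job="$1"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: debugger-daemon-down-example
  namespace: debugger
spec:
  groups:
  - name: debugger.rules
    rules:
    - alert: DebuggerDaemonDown
      expr: up{job="${scrape_job}"} == 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "Debugger daemon has been down for more than 2 minutes"
EOF
}

# Print the rule for a scrape job assumed to be named "debugger".
render_daemon_down_rule debugger
```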
- Metrics are exposed on the /metrics endpoint of the debugger daemon or debug job pods.
- You can query metrics such as debugger_privilege_violations_total, debugger_unauthorized_commands_total, and debugger_job_failures_total in Prometheus.
- Alerts will appear in Alertmanager and can be routed to email, Slack, or other notification systems as configured in your cluster.
The debugger tool integrates with Prometheus Pushgateway to enable real-time monitoring and alerting of debugging operations.
Prometheus Pushgateway acts as an intermediary that allows ephemeral jobs (like our debugger jobs) to expose their metrics to Prometheus. Since debugging jobs are short-lived, Pushgateway stores these metrics until Prometheus scrapes them.
Deploy Pushgateway in the same namespace as the debugger tool:
# Apply the Pushgateway deployment manifest
oc apply -f k8s/pushgateway.yaml
The pushgateway.yaml file includes:
- Pushgateway Deployment
- Service to expose Pushgateway
- ServiceMonitor for Prometheus integration
- PrometheusRules with alert definitions specific to debugging operations
┌───────────────────┐    Push Metrics     ┌───────────────────┐     Scrape      ┌───────────────────┐
│                   │                     │                   │                 │                   │
│   Debugger Job    │────────────────────▶│    Pushgateway    │◀────────────────│    Prometheus     │
│ run-debugger-job  │                     │                   │                 │                   │
│                   │                     │                   │                 │                   │
└─────────┬─────────┘                     └───────────────────┘                 └─────────┬─────────┘
          │                                                                               │
          │                                                                               │
          │                                                                               │ Fire Alerts
          │                                                                               │
          │         ┌───────────────────┐                                       ┌─────────▼─────────┐
          │         │                   │            Create Tickets             │                   │
          │         │    ServiceNow     │◀──────────────────────────────────────│   AlertManager    │
          └────────▶│    (Incidents)    │                                       │                   │
    Job status      │                   │                                       │                   │
    information     └───────────────────┘                                       └───────────────────┘
The debug jobs automatically push these metrics to Pushgateway:
- debugger_job_started_total: Counter of started debugging jobs
- debugger_job_completed_total: Counter of successfully completed jobs
- debugger_job_failed_total: Counter of failed debugging jobs
- debugger_job_status: Status indicator (1=success, 0=running, -1=failed)
- debugger_job_duration_seconds: Duration of job execution
- debugger_pcap_files_generated_total: Number of PCAP files generated by tcpdump jobs
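A push of one of these metrics could look roughly like the sketch below. The grouping-key layout (job/debugger-job/user/.../command/...) follows the URL shown in the sample log output at the end of this document; the function names and the default Pushgateway service address are assumptions, not the actual implementation.

```shell
# Hypothetical sketch of pushing a metric to Pushgateway from a debug job.
# PUSHGATEWAY_URL's default is an assumed in-cluster service address.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://pushgateway.debugger.svc:9091}"

# Build the push URL; user and command become grouping labels.
pushgateway_path() {
  local user="$1" cmd="$2"
  printf '%s/metrics/job/debugger-job/user/%s/command/%s\n' \
    "$PUSHGATEWAY_URL" "$user" "$cmd"
}

# Build the request body in the Prometheus text format (newline-terminated).
metric_payload() {
  local name="$1" value="$2"
  printf '%s %s\n' "$name" "$value"
}

# POST the metric; --fail makes curl return non-zero on HTTP errors.
push_metric() {
  local name="$1" value="$2" user="$3" cmd="$4"
  metric_payload "$name" "$value" |
    curl --silent --show-error --fail --data-binary @- \
      "$(pushgateway_path "$user" "$cmd")"
}

# Show the URL that a tcpdump job run by "example-user" would push to.
pushgateway_path example-user tcpdump
```

A call such as push_metric debugger_job_started_total 1 example-user tcpdump would then send the sample. Because the user and command are part of the grouping key, each user/command pair keeps its own metric group in Pushgateway until it is overwritten or deleted.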
The Pushgateway deployment includes several alert rules that notify administrators about debugging activities:
- DebuggerJobCreated: Notifies when a debugging job starts
- DebuggerJobCompleted: Notifies when a job completes successfully
- DebuggerJobFailed: Notifies when a job fails
- DebuggerJobLongRunning: Alerts on jobs running longer than expected (>10 minutes)
- DebuggerPcapGenerated: Tracks PCAP file generation
- DebuggerTcpdumpNoPcap: Alerts when tcpdump jobs don't generate expected files
Alerts generated by Prometheus are sent to ServiceNow via AlertManager, creating tickets that can be tracked and managed through the ServiceNow interface.
# Check Pushgateway logs
oc logs deployment/pushgateway -n debugger
# Check if metrics exist in Pushgateway
oc exec $(oc get pod -l app=pushgateway -n debugger -o name | head -1) -n debugger -- curl http://localhost:9091/metrics | grep debugger_
# Test direct push to Pushgateway
oc exec $(oc get pod -l app=pushgateway -n debugger -o name | head -1) -n debugger -- sh -c "echo 'test_metric 1' | curl --data-binary @- http://localhost:9091/metrics/job/test"
For more detailed information, refer to the Pushgateway Integration Guide.
./run-debugger-job.sh <node-name> <pod-name> <pod-namespace> tcpdump 60
# Example:
./run-debugger-job.sh worker-node-1 my-app-pod app1 tcpdump 60
- This will capture traffic for 60 seconds in the pod's network namespace.
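Conceptually, a time-boxed capture inside a pod's network namespace comes down to combining timeout, nsenter, and tcpdump on the node. The sketch below only builds and prints such a command line; it is an assumption about the general technique, not the contents of the real debug script, and the PID of the target container is assumed to have been looked up beforehand through the container runtime.

```shell
# Hypothetical sketch: compose a time-limited tcpdump that runs inside the
# network namespace of a container process. The pid, duration, and output
# path are inputs; how the real script derives them is not shown here.
build_capture_command() {
  local pid="$1" duration="$2" outfile="$3"
  printf 'timeout %s nsenter --target %s --net tcpdump -i any -w %s\n' \
    "$duration" "$pid" "$outfile"
}

# Print the command for a 60-second capture of the process with PID 1234.
build_capture_command 1234 60 /tmp/pcap-dump-example.pcap
```

Entering only the network namespace (--net) keeps the tcpdump binary and the output file on the debug container's own filesystem while still seeing the target pod's interfaces.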
tcpdump [args] <duration>
ncat [args]
ip [args]
ifconfig [args]
netstat [args]
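Restricting jobs to this command list is typically done with an allowlist check before anything is executed. The following is a minimal sketch of that idea; the function and variable names are hypothetical and the real debug script may validate input differently.

```shell
# Hypothetical sketch: accept only the commands listed above.
ALLOWED_COMMANDS="tcpdump ncat ip ifconfig netstat"

# Return 0 if the requested command exactly matches an allowlisted name.
is_allowed_command() {
  local requested="$1" allowed
  for allowed in $ALLOWED_COMMANDS; do
    [ "$requested" = "$allowed" ] && return 0
  done
  return 1
}

# Demonstrate the check with one permitted and one blocked command.
for candidate in tcpdump bash; do
  if is_allowed_command "$candidate"; then
    echo "$candidate: allowed"
  else
    echo "$candidate: blocked"
  fi
done
```

Comparing the whole string against each entry, rather than matching a prefix or pattern, means an input such as "ip; rm -rf /" is rejected instead of being partially matched.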
# Get job logs
kubectl logs <debugger-job-pod> -n debugger
# Download pcap files (if generated)
ls ./pcap-dump-*
- App users have only the minimum RBAC required to launch and monitor debug jobs in the debugger namespace.
- Privileged operations (host access, network namespace entry) are performed only by debugger-sa with a privileged SCC.
- Gatekeeper policy prevents app users from deploying unapproved images in the debugger namespace.
- All actions are auditable via Kubernetes events and logs.
Below is a demo of the solution in action:
- run-debugger-job.sh (wrapper script)
- k8s/execute-command-configmap.yaml (debug script ConfigMap)
- k8s/rbac.yaml (RBAC for debugger-sa)
- k8s/app1-admin-sa.yaml (RBAC for app user)
- k8s/scc.yaml (privileged SCC)
- The binary builder is on the tool host svcas3000010np.nbndc.local.
- Log in to this host; shc is located in /usr/local/bin/.
- Run shc to wrap the run-debugger-job.sh script into a binary:
shc -r -T -f run-debugger-job.sh
This creates a binary file named run-debugger-job.sh.x.
- Rename the binary and upload it to the application repo for use. The application repo is https://github.com/nbnco/fttx-securedebugger.git
mv run-debugger-job.sh.x run-debugger-job
Alerting and monitoring have been added in this release using Pushgateway and Alertmanager. Pushgateway is described in more detail above. To deploy the latest release, follow the steps below.
oc apply -f k8s/pushgateway.yaml
oc apply -f k8s/execute-command-configmap.yaml
oc apply -f k8s/debugger-sa-rbac.yaml
./run-debugger-job.sh <node-name> <pod-name> <pod-namespace> tcpdump 60
# Example:
./run-debugger-job.sh worker-node-1 my-app-pod app1 tcpdump 60
If the solution is working, the logs should show output similar to the following:
DEBUG: Pushing metric debugger_job_started_total with value 1
DEBUG: URL: http://172.30.106.126:9091/metrics/job/debugger-job/user/rakeshkumarmallam/command/tcpdump
SUCCESS: Pushed metric debugger_job_started_total to Pushgateway