## **Kubernetes Error Codes & Exit Codes: The Interview Cheat Sheet**

For your Google Kubernetes Technical Solutions Engineer interview, here are the **must-know** error codes categorized by what they tell you. I've focused on the ones that actually appear in debugging conversations.

---

## **📊 EXIT CODES (Container/Process Termination)**

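These come from `kubectl describe pod` or `kubectl get pod -o yaml`. The pattern for a process killed by a signal: **128 + signal number** = exit code. Lower codes such as 0, 1, 126, and 127 are set by the application or shell itself.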
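
The 128 + signal rule is easy to check mechanically. Below is a minimal Python sketch, not part of any Kubernetes tooling: the `SIGNALS` table and the `decode_exit_code` name are illustrative, and the table only covers the signals discussed in this cheat sheet.

```python
# Illustrative helper: maps a container exit code to the signal that
# (by the 128 + N convention) terminated the process.
# Only the signals covered in this cheat sheet are listed.
SIGNALS = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def decode_exit_code(exit_code):
    """Return the signal name for codes above 128, else None."""
    if exit_code > 128:
        return SIGNALS.get(exit_code - 128)
    return None


for code in (137, 143, 1):
    print(code, decode_exit_code(code))
```

A `None` result means the code was not produced by one of the listed signals, so the application (or shell) chose it and the logs are the next stop.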

### **🔴 Memory & Resource Related**

| Exit Code | Signal | Common Meaning | Kubernetes Status | Your Interview Soundbite |
|-----------|--------|----------------|-------------------|-------------------------|
| **137** | SIGKILL (9) | **OOMKilled** - Container exceeded memory limit (the most common cause) | `OOMKilled` | "Exit code 137 means 128+9: the process received SIGKILL, usually from the OOM killer. First check: are memory limits too low, or does the app have a leak? If the reason isn't OOMKilled, something else sent the SIGKILL, such as the kubelet after the termination grace period expired." |
| **1** | - | General application error | `Error` or `CrashLoopBackOff` | "Code 1 is the catch-all 'something went wrong'; app logs are your friend here." |

### **🔴 Signal-Related (Kernel/System Kills)**

| Exit Code | Signal | Common Meaning | Debugging Clue |
|-----------|--------|----------------|----------------|
| **134** | SIGABRT (6) | **Assertion failure** - Process aborted itself | Usually application bug, glibc detecting heap corruption  |
| **139** | SIGSEGV (11) | **Segmentation fault** - Invalid memory access | Pointer bugs, library mismatches, native code issues  |
| **143** | SIGTERM (15) | **Graceful termination requested** | Pod being shut down normally (128+15=143) |

### **🔴 Configuration & Startup**

| Exit Code | Meaning | Scenario | Fix |
|-----------|---------|----------|-----|
| **127** | Command not found | Entrypoint or command in pod spec doesn't exist in image | Check your image's PATH or the command spelling  |
| **126** | Permission denied | Command exists but can't execute | File permissions, missing execute bit |
| **0** | Success | Container exited normally | Jobs/CronJobs expected behavior |

---

## **📊 HTTP CODES (Service Exposure & Probes)**

These appear in Ingress, Service, and Probe debugging.

### **🔵 Probe-Related (Liveness/Readiness)**

| HTTP Code | Context | What It Means | Interview Gold |
|-----------|---------|---------------|----------------|
| **503** | Readiness probe fails | App returned "Service Unavailable" during health check | "A failing readiness probe with 503 tells me the app is alive but not ready for traffic; dependencies might be down." |
| **5xx** (or any code outside 200-399) | Liveness probe fails | App in bad state, needs restart | A failed liveness probe restarts the container; a failed readiness probe removes the pod from Service endpoints |
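
As a rough illustration of how an HTTP probe result is judged: the kubelet treats any status from 200 through 399 as success and everything else as failure. The function names below are hypothetical, and the consequences are simplified (real probes also apply `failureThreshold` before acting).

```python
def probe_succeeded(status_code):
    """HTTP probes pass on any status in the 200-399 range."""
    return 200 <= status_code < 400


def probe_consequence(probe_type, status_code):
    """Simplified outcome of a probe result (hypothetical helper)."""
    if probe_succeeded(status_code):
        return "healthy"
    if probe_type == "liveness":
        return "container restarted"
    if probe_type == "readiness":
        return "pod removed from Service endpoints"
    raise ValueError(f"unknown probe type: {probe_type}")


print(probe_consequence("readiness", 503))
print(probe_consequence("liveness", 500))
```

The same 503 therefore has very different effects depending on which probe saw it, which is why mixing up the two probe types is a common source of restart loops.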

### **🔵 Auth & Access (The 401/403 Distinction)**

| HTTP Code | Kubernetes Context | Root Cause | Diagnostic Move |
|-----------|-------------------|------------|-----------------|
| **401 Unauthorized** | API Server requests | **Authentication failed** - Who are you? | Token expired, wrong credentials, missing auth header  |
| **403 Forbidden** | API Server requests | **Authorization failed** - I know who you are, but you can't do that | RBAC missing, insufficient permissions  |
| **401 vs 403** | **The interview soundbite** | *"401 is 'who are you?', 403 is 'I know you, but NO.'"* | This distinction alone shows depth |

**Real-world nuance**: A 403 from the API server on port 6443 could also mean a proxy is blocking you. Check for `Via` or `X-Cache` headers in the response.
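
The 401/403 split, including the proxy caveat, can be sketched as a small triage function. This is an illustrative Python sketch only: `triage_api_response` and its return strings are made up for this cheat sheet, and the header check is a heuristic, not a guarantee that a proxy is involved.

```python
def triage_api_response(status_code, headers):
    """Suggest a next debugging step for an API server response (illustrative)."""
    # Header names are case-insensitive, so normalize before checking.
    names = {name.lower() for name in headers}
    if status_code == 401:
        return "authentication: check token expiry, kubeconfig credentials, auth header"
    if status_code == 403:
        if names & {"via", "x-cache"}:
            return "possible proxy block: inspect the intermediary before RBAC"
        return "authorization: check RBAC roles and bindings"
    return "not an auth failure"


print(triage_api_response(401, {}))
print(triage_api_response(403, {"Via": "1.1 corp-proxy"}))
```

For the RBAC branch, `kubectl auth can-i <verb> <resource>` is a quick way to confirm whether the current identity is allowed to perform the action.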

### **🔵 Other Important HTTP Codes**

| Code | Context | Meaning |
|------|---------|---------|
| **400** | GKE operations | `Node pool requires recreation` error (seen during credential rotation) |
| **404** | Ingress/Service | No route or endpoint found. Check the Ingress host/path rules and whether the Service selector matches the pod labels |
| **504** | Ingress/Proxy | Upstream timeout: the application is slow or unresponsive |

---

## **📊 KUBERNETES STATUS REASONS (kubectl describe)**

These are the **human-readable** reasons you see in pod status.

| Status Reason | What It Means | Common Exit Code |
|---------------|---------------|------------------|
| **OOMKilled** | Killed for memory overrun | 137  |
| **Error** | Container terminated with non-zero exit | Varies (1, 127, etc.) |
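| **CrashLoopBackOff** | Container keeps crashing after start; the kubelet restarts it with an increasing back-off delay | Varies |
| **ImagePullBackOff** | Can't pull image (wrong name or tag, missing registry credentials) | N/A |
| **CreateContainerConfigError** | Referenced ConfigMap or Secret (or a key in it) is missing | N/A |
| **Preempted** | Evicted by the scheduler to make room for a higher-priority pod (typically surfaces as an event rather than a container state) | N/A |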
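
To pull these reasons out programmatically, one option is to read the `containerStatuses` section of `kubectl get pod <name> -o json`. The sketch below is a minimal Python example under stated assumptions: `SAMPLE_POD` is a trimmed, made-up document, and `summarize_containers` is a hypothetical helper, not a kubectl feature.

```python
import json

# Trimmed, made-up sample of `kubectl get pod <name> -o json` output.
SAMPLE_POD = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "app",
        "restartCount": 4,
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
      }
    ]
  }
}
""")


def summarize_containers(pod):
    """Yield (name, current reason, last termination reason, last exit code)."""
    for cs in pod.get("status", {}).get("containerStatuses", []):
        state = cs.get("state", {})
        # A container is in exactly one of: waiting, running, terminated.
        if "waiting" in state:
            current = state["waiting"].get("reason")
        elif "terminated" in state:
            current = state["terminated"].get("reason")
        else:
            current = "Running"
        last = cs.get("lastState", {}).get("terminated", {})
        yield cs["name"], current, last.get("reason"), last.get("exitCode")


for row in summarize_containers(SAMPLE_POD):
    print(row)
```

The useful detail here is that `CrashLoopBackOff` is the current waiting reason, while the actual cause of death (here `OOMKilled`, exit code 137) sits in `lastState.terminated`.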

---

## **🎯 THE INTERVIEW QUICK REFERENCE CARD**

### **When you see...**
```
Exit Code 137 → "OOMKilled. Memory limit or leak."
Exit Code 143 → "Graceful termination. Probably scaling down."
Exit Code 1   → "App error. Check logs."
Exit Code 127 → "Command not found in image."
```

```
503 on readiness → "App not ready. Dependencies?"
401 on API call  → "Authentication failed. Token expired?"
403 on API call  → "Authorization failed. RBAC missing."
```

### **The Diagnostic Flow**

> **"I look at three things: exit code tells me HOW it died, pod status tells me WHAT Kubernetes thinks, and logs tell me WHY."**

### **The Google-Grade Soundbite**

For a 401 vs 403 question:
> *"401 is an authentication failure: the request lacks valid credentials. 403 is an authorization failure: the credentials are valid, but the user doesn't have permission. In Kubernetes, I debug 401 by checking service account tokens or kubeconfig; 403 sends me straight to RBAC roles and bindings."*

For OOMKilled (137):
> *"Exit code 137 = 128 + SIGKILL(9). With the reason OOMKilled, the Linux OOM killer terminated the process. I immediately check if memory limits are too restrictive or if the application has a leak. If the node itself is under memory pressure, the pod's QoS class tells me how it ranks for termination: BestEffort pods go first and Guaranteed pods are the last to go."*
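
Since the soundbite leans on QoS classes, here is a simplified Python sketch of how a pod's class follows from its containers' requests and limits. The `qos_class` function is illustrative and makes two loud assumptions: quantities are compared as plain strings (so `"1"` and `"1000m"` would not match), and init containers are ignored.

```python
def qos_class(containers):
    """Classify a pod from a list of container `resources` dicts (simplified)."""
    any_set = False
    guaranteed = True
    for res in containers:
        limits = res.get("limits", {})
        # When a request is omitted, Kubernetes defaults it to the limit.
        requests = {**limits, **res.get("requests", {})}
        if limits or requests:
            any_set = True
        for name in ("cpu", "memory"):
            # Guaranteed needs a limit for both resources, equal to the request.
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))
print(qos_class([{"limits": {"cpu": "500m", "memory": "256Mi"}}]))
print(qos_class([{"requests": {"memory": "128Mi"}, "limits": {"memory": "256Mi"}}]))
```

In practice the computed class is already available in the pod's `status.qosClass` field, so a sketch like this is mainly useful for reasoning about a manifest before it is applied.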

---

## **📝 PRO TIP FOR THE INTERVIEW**

When they ask about a failing pod:
1. **Name the exit code** first (shows you know where to look)
2. **Translate it** (137 → OOMKilled)
3. **Explain the mechanism** (OOM killer, signal, etc.)
4. **State your next action** (check limits, logs, etc.)

This structure alone signals senior-level troubleshooting.