This is a classic Kubernetes troubleshooting interview question. Let me break down both concepts from diagnosis to resolution.

## OOMKilled (Out of Memory Killed)

**What it is**: When a container exceeds its memory limit, the Linux kernel's OOM killer terminates the process. Kubernetes reports this as `OOMKilled`.

### Detection Commands:
```bash
# Check pod status and reason
kubectl get pods
kubectl describe pod <pod-name>

# Look for "reason: OOMKilled" under lastState.terminated
kubectl get pod <pod-name> -o yaml | grep -A 10 "lastState"

# Check pod events
kubectl get events --field-selector involvedObject.name=<pod-name>

# Snapshot of current memory usage (requires metrics-server); rerun to see a trend
kubectl top pod <pod-name>

# Logs from the terminated container
kubectl logs <pod-name> --previous
```
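
Because `kubectl top` reports only a point-in-time value, spotting a leak means sampling it repeatedly. The following is a minimal sketch of such a sampler; `sample` is a hypothetical helper name (not a kubectl feature), and the count and interval in the usage line are arbitrary.

```bash
# sample COUNT INTERVAL COMMAND...
# Runs COMMAND COUNT times, INTERVAL seconds apart, and prefixes each
# result with a timestamp. The body runs in a subshell so its variables
# do not leak into the calling shell.
sample() (
  count=$1
  interval=$2
  shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s %s\n' "$(date +%H:%M:%S)" "$("$@")"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
)

# Usage (needs metrics-server): six samples, 10 seconds apart
# sample 6 10 kubectl top pod <pod-name> --no-headers
```

Memory that climbs steadily across samples under constant load points to a leak rather than an undersized limit.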

### Common Causes & Solutions:

1. **Application Memory Leak**
   ```bash
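   # Check memory trend
   kubectl exec -it <pod-name> -- top
   # Current usage: cgroup v1 path first, then the cgroup v2 equivalent
   kubectl exec -it <pod-name> -- cat /sys/fs/cgroup/memory/memory.usage_in_bytes
   kubectl exec -it <pod-name> -- cat /sys/fs/cgroup/memory.current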
   ```

2. **Insufficient Limits**
   ```yaml
   # Fix: Increase memory limit
   resources:
     requests:
       memory: "256Mi"
     limits:
       memory: "512Mi"  # Increase this
   ```

3. **Heap Size Misconfiguration**
   - Java apps: Adjust `-Xmx` flag
   - Node.js: Set `--max-old-space-size`

## CrashLoopBackOff

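**What it is**: A restart loop. A container in the pod starts, crashes, and is restarted by the kubelet, with an exponentially increasing delay between attempts. `CrashLoopBackOff` is the status shown while the kubelet waits out that delay.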
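
To make the backoff concrete, this sketch prints the wait before each restart. It assumes the upstream kubelet defaults (10s initial delay, doubling each time, capped at 300s); clusters that tune these values will behave differently, and `backoff_schedule` is a hypothetical helper name.

```bash
# backoff_schedule N: print the wait before each of the first N restarts.
# Assumed kubelet defaults: 10s initial delay, doubling, capped at 300s.
# The body runs in a subshell so its variables stay local.
backoff_schedule() (
  delay=10
  i=1
  while [ "$i" -le "$1" ]; do
    echo "restart ${i}: wait ${delay}s"
    delay=$((delay * 2))
    if [ "$delay" -gt 300 ]; then delay=300; fi
    i=$((i + 1))
  done
)

backoff_schedule 7
```

This is why a pod that has been crashing for a while seems slow to react to a fix: it may be up to five minutes from its next attempt. The kubelet resets the delay once a container has run for 10 minutes without problems.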

### Detection Commands:
```bash
# Quick status check
kubectl get pods -o wide
kubectl describe pod <pod-name> | grep -A 10 "State:"

# Check restart count
kubectl get pods | grep CrashLoopBackOff

# View logs from previous crashed instance
kubectl logs <pod-name> --previous

# Watch pod status in real-time
kubectl get pod <pod-name> -w
```

### Debugging Workflow:

1. **Check Application Logs**
   ```bash
   # Get logs from the crashed container
   kubectl logs <pod-name> --previous --tail=50
   
   # Stream logs from current attempt
   kubectl logs -f <pod-name>
   ```

2. **Verify Configuration**
   ```bash
   # Check if configmaps/secrets are mounted correctly
   kubectl describe pod <pod-name>
   
   # Validate environment variables (exec only works while the container is running)
   kubectl exec -it <pod-name> -- env
   ```

3. **Test Readiness/Liveness Probes**
   ```bash
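   # Check probe configuration (a failing liveness probe restarts the container;
   # a failing readiness probe only removes the pod from Service endpoints)
   kubectl get pod <pod-name> -o yaml | grep -A 15 "livenessProbe"
   
   # Test endpoint manually (port-forward blocks, so run curl from a second terminal;
   # port 8080 and /health are examples, use the values from your probe)
   kubectl port-forward <pod-name> 8080:8080
   curl http://localhost:8080/health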
   ```

### Common CrashLoopBackOff Causes:

**Application Error on Startup**
```bash
# Application exits immediately due to error
# Check entrypoint script (the path is an example; use your image's ENTRYPOINT)
kubectl exec -it <pod-name> -- cat /entrypoint.sh
```

**Missing Dependencies**
```bash
# Database not reachable
kubectl exec -it <pod-name> -- nc -zv database-service 5432

# ConfigMap/Secret missing
kubectl get configmap,secret
```

**Resource Constraints**
```bash
# Pod gets OOMKilled and restarts (mix of both issues)
# Check previous container's termination reason and exit code
kubectl describe pod <pod-name> | grep -iE "reason|exit code"
# Exit code 137 with Reason: OOMKilled = memory limit exceeded
# Exit code 1 = application error
```

## Systematic Troubleshooting Approach:

```bash
# Step 1: Gather initial info
kubectl describe pod <pod-name> > pod_info.txt

# Step 2: Check logs from crashed instance
kubectl logs <pod-name> --previous > crash_logs.txt

# Step 3: Verify node resources
kubectl describe node <node-name> | grep -A 5 "Allocated resources"

# Step 4: Check if it's application or infrastructure
# Deploy a debug container alongside
kubectl debug -it <pod-name> --image=busybox --target=<container-name>

# Step 5: Check events at cluster level
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
```

## Key Exit Codes Reference:
- **137** (SIGKILL) - OOMKilled or manually killed
- **143** (SIGTERM) - Graceful termination
- **1** - Application error
- **127** - Command not found (often entrypoint issue)
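
Codes above 128 follow the shell convention of 128 plus the signal number, which is how 137 maps to SIGKILL (9) and 143 to SIGTERM (15). A small sketch of that decoding follows; `describe_exit_code` is a hypothetical helper, and the wording of each message is illustrative.

```bash
# describe_exit_code CODE: explain a container exit code.
# Codes above 128 are decoded as 128 + signal number.
describe_exit_code() {
  case "$1" in
    0)   echo "success" ;;
    1)   echo "application error" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    137) echo "SIGKILL (signal 9): OOM kill or forced kill" ;;
    143) echo "SIGTERM (signal 15): termination requested" ;;
    *)
      if [ "$1" -gt 128 ]; then
        echo "terminated by signal $(( $1 - 128 ))"
      else
        echo "application-defined error"
      fi
      ;;
  esac
}

describe_exit_code 137
describe_exit_code 139
```

An exit code of 137 alone does not prove an OOM kill; confirm it against the `Reason` field in `kubectl describe pod`.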

The critical difference: `OOMKilled` is a termination reason tied to memory, while `CrashLoopBackOff` is a restart state that any repeated container failure can cause, at startup or later. They intersect when a container is OOMKilled repeatedly and the pod enters CrashLoopBackOff.