Here's a focused cheatsheet for your Kubernetes interview preparation. I've organized it by category with the **must-know** topics for each.

---

# Kubernetes Interview Cheatsheet

---

## 1. Auto Scaling

### Core Concepts
| Component | What it does | Key Interview Points |
|-----------|--------------|----------------------|
| **HPA (Horizontal Pod Autoscaler)** | Scales pods in/out based on metrics | • CPU/memory utilization targets are a % of **requests**, not limits<br>• Metrics Server must be installed for CPU/memory metrics<br>• Supports resource, custom, and external metrics |
| **VPA (Vertical Pod Autoscaler)** | Recommends or sets CPU/memory requests for pods | • Applying new requests evicts/recreates pods<br>• Update modes: Off, Initial, Recreate, Auto<br>• Installed separately (not part of core Kubernetes) |
| **Cluster Autoscaler** | Adds/removes worker nodes | • Scales up when pods are Pending for lack of resources<br>• Works with cloud provider APIs<br>• Respects PodDisruptionBudgets<br>• Won't scale down nodes whose pods can't be moved |
| **Metrics Server** | Provides resource usage data | • In-memory, no persistent storage<br>• Required for HPA on CPU/memory (custom metrics need an adapter) |
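
A minimal VerticalPodAutoscaler manifest, as a sketch. It assumes the VPA components from the kubernetes/autoscaler project are installed, and the target Deployment `my-app` and the min/max bounds are hypothetical values for illustration.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # hypothetical Deployment
  updatePolicy:
    updateMode: "Off"         # recommend only; "Initial", "Recreate", "Auto" apply changes
  resourcePolicy:
    containerPolicies:
      - containerName: "*"    # apply bounds to every container
        minAllowed:           # hypothetical floor
          cpu: 100m
          memory: 128Mi
        maxAllowed:           # hypothetical ceiling
          cpu: "2"
          memory: 2Gi
```

With `updateMode: "Off"` you can read the recommendations with `kubectl describe vpa my-app-vpa` without any pod restarts. Avoid pairing VPA with an HPA that scales on the same CPU/memory metric; the two controllers work against each other.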

### Key Formulas
```
Desired Replicas = ceil(current_replicas × (current_value / desired_value))

Example: 2 replicas at 80% CPU, target 50% → ceil(2 × 80/50) = ceil(3.2) = 4 replicas
No scaling if the ratio is within the default 10% tolerance of 1.0
```

### Must-Know
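- **HPA can't scale to zero** by default (`minReplicas` ≥ 1); use KEDA for scale-to-zero
- **Stabilization windows** (`behavior.scaleDown.stabilizationWindowSeconds`, default 300s) prevent thrashing
- **Custom metrics** require a metrics adapter (e.g., Prometheus Adapter)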
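
A minimal `autoscaling/v2` HPA tying these points together, as a sketch; the Deployment name `my-app` and the replica bounds are hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                        # hypothetical Deployment
  minReplicas: 2                        # must be >= 1 without alpha features
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50        # 50% of the pods' CPU *requests*
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # default; damps scale-down flapping
```

If the pods have no CPU request set, the HPA cannot compute utilization for this metric.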

---

## 2. Service Discovery

### Core Concepts
| Component | What it does | Key Interview Points |
|-----------|--------------|----------------------|
| **Service** | Stable endpoint for pods | • Selector-based pod discovery<br>• Gets a stable ClusterIP and DNS name |
| **kube-proxy** | Implements Services on each node | • Modes: iptables (default on Linux), IPVS, nftables (newer releases); userspace mode was removed in v1.26<br>• Watches the API server for Service and EndpointSlice changes |
| **CoreDNS** | Internal DNS server | • Pods resolve Services as `svc-name.namespace.svc.cluster.local` (with the default cluster domain)<br>• Records are created automatically for every Service |
| **Endpoints / EndpointSlices** | Track ready pod IPs for a Service | • Updated automatically as pods become ready or go away<br>• EndpointSlices scale better for large Services |
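
A minimal ClusterIP Service, as a sketch; the pod label `app: my-app` and the ports are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: default
spec:
  type: ClusterIP        # the default type; may be omitted
  selector:
    app: my-app          # hypothetical pod label; matching ready pods become endpoints
  ports:
    - name: http
      port: 80           # port clients use on the ClusterIP / DNS name
      targetPort: 8080   # container port on the selected pods
```

Assuming the default cluster domain, clients reach it at `my-service.default.svc.cluster.local:80`.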

### DNS Resolution Patterns
```
my-service                            → same namespace
my-service.default                    → specific namespace
my-service.default.svc.cluster.local  → fully qualified
```

### Must-Know
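- **Headless Services** (`clusterIP: None`) → DNS returns pod IPs directly (no ClusterIP, no kube-proxy load balancing)
- **Environment variables** (legacy; only injected for Services that exist before the pod starts) → DNS is preferred
- **Service without selector** → You manage its EndpointSlices manually (e.g., to point at an external database)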
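
A headless Service sketch; the name `my-db`, its label, and the port are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-db
spec:
  clusterIP: None      # headless: no virtual IP, DNS returns pod IPs
  selector:
    app: my-db         # hypothetical pod label
  ports:
    - port: 5432
```

A DNS lookup of `my-db` returns one A record per ready pod. When a StatefulSet names this Service in `serviceName`, each pod also gets a stable name such as `my-db-0.my-db.<namespace>.svc.cluster.local`.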

---

## 3. Load Balancer

### Core Concepts
| Component | Type | Use Case |
|-----------|------|----------|
| **ClusterIP** | Internal | Default, only inside cluster |
| **NodePort** | External | Opens a static port (30000-32767 by default) on every node's IP |
| **LoadBalancer** | External | Provisions a cloud provider load balancer |
| **Ingress** | External L7 | HTTP/HTTPS routing by host/path (a separate API object, not a Service type) |

### Traffic Flow
```
External → Cloud LB → NodePort (any node) → Service (kube-proxy rules) → Pods
External → Ingress controller (usually behind a LoadBalancer Service) → Service → Pods
```

### Must-Know
| Concept | Explanation |
|---------|-------------|
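| **externalTrafficPolicy** | `Cluster` (default) spreads load across all nodes but SNATs, hiding the client IP; `Local` preserves the client IP but only routes to pods on the receiving node |
| **Session Affinity** | Sticky sessions by client IP via `spec.sessionAffinity: ClientIP` |
| **Ingress Controller** | Not deployed by default (NGINX, Traefik, AWS Load Balancer Controller, etc.) |
| **Service type layering** | Each type builds on the next: by default a LoadBalancer Service also gets a NodePort, and a NodePort Service also gets a ClusterIP |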

### Port Ranges
- **NodePort:** 30000-32767 (default; set by the API server's `--service-node-port-range`)
- **Ingress:** Usually 80/443
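
A sketch of both external entry points. The names `web` and `api`, the host `app.example.com`, the labels, and the `nginx` IngressClass are hypothetical; the Ingress assumes a controller serving that class is installed and that `api` is an existing ClusterIP Service.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # keep client source IP; only nodes with ready pods get traffic
  sessionAffinity: ClientIP      # optional sticky sessions by client IP
  selector:
    app: web                     # hypothetical pod label
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  ingressClassName: nginx        # assumes an installed controller with this class
  rules:
    - host: app.example.com      # hypothetical host
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api        # hypothetical ClusterIP Service
                port:
                  number: 80
```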

---

## 4. Self Healing

### Core Concepts
| Component | Heals what? | How? |
|-----------|-------------|------|
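| **kubelet** | Containers on its node | Restarts crashed containers (per `restartPolicy`) and containers that fail liveness probes |
| **ReplicaSet controller** | Pod count | Creates new pods if any are missing |
| **Deployment controller** | Rollout state | Manages ReplicaSets to match the Deployment spec; does **not** roll back automatically (use `kubectl rollout undo`) |
| **Node controller** | Nodes | Marks nodes NotReady/Unknown after missed heartbeats and taints them so their pods get evicted |
| **Control Plane** | Itself | kubelet restarts static-pod components (e.g., kubeadm clusters), or systemd restarts the services |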

### Must-Know Checks

**Liveness Probe:**
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```
→ After `failureThreshold` consecutive failures (default 3), **kubelet restarts the container**

**Readiness Probe:**
```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
```
→ If it fails, the **pod is removed from Service endpoints** (the container is not restarted)

**Startup Probe:**
→ For slow-starting apps: liveness and readiness probes are held off until the startup probe succeeds
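
A startup probe sketch that reuses the `/health` endpoint from the liveness example; the 30 × 10s budget is a hypothetical choice.

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # allow up to 30 × 10s = 300s to finish booting
  periodSeconds: 10
```

Once it succeeds, the liveness probe takes over with its own, tighter timing.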

### Failure Scenarios
| Failure | What heals it? |
|---------|----------------|
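| Container crashes | kubelet restarts it (with exponential back-off) |
| Node dies | Controller-managed pods are recreated on other nodes after eviction (not DaemonSet pods; bare pods are lost) |
| Pod deleted / replica count drifts | ReplicaSet controller recreates pods to match `replicas` |
| Bad rollout (e.g., crashing new image) | Not self-healed: the rollout stalls and must be rolled back manually |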
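
When a node fails, the node controller taints it with `node.kubernetes.io/not-ready` or `node.kubernetes.io/unreachable` (`NoExecute`), and pods are evicted once their toleration for that taint expires; the DefaultTolerationSeconds admission plugin gives pods 300s by default. As a sketch, a pod spec fragment that shortens the window to a hypothetical 60s for faster failover:

```yaml
tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 60   # evict 60s after the node goes NotReady (default 300)
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 60   # same for unreachable nodes
```

Shorter values speed up failover but risk churn when a node only blips briefly.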

---

## 5. Zero Downtime Deployments

### Core Concepts
| Strategy | How it works | Downtime? |
|----------|--------------|-----------|
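| **RollingUpdate** | Replaces pods incrementally, paced by `maxSurge`/`maxUnavailable` (the default strategy) | None if done right |
| **Blue/Green** | New version fully deployed, then traffic switched (not built in; e.g., flip the Service selector) | None |
| **Canary** | Small % of traffic to new version (not built in; via Ingress/service mesh weights or replica ratios) | None |
| **Recreate** | Kills all old pods, then creates new ones | Yes |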
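
A blue/green sketch using only a Service selector. It assumes two hypothetical Deployments whose pods carry `app: my-app` plus `version: blue` or `version: green`.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: blue    # flip to "green" once the green Deployment is fully ready
  ports:
    - port: 80
      targetPort: 8080
```

The switch can be done with `kubectl patch service my-app -p '{"spec":{"selector":{"app":"my-app","version":"green"}}}'`; rolling back is the same patch with `blue`.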

### RollingUpdate Configuration
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%         # Extra pods allowed above desired count (default 25%)
    maxUnavailable: 25%   # Pods allowed below desired count (default 25%)
```

### Must-Have for Zero Downtime

| Requirement | Why |
|-------------|-----|
| **Readiness Probes** | Don't send traffic to starting pods |
| **PodDisruptionBudget** | Prevent voluntary evictions from taking all pods |
| **Graceful Shutdown** | App handles SIGTERM, finishes requests |
| **Multiple replicas** | At least 2-3 to survive pod loss |
| **Anti-affinity** | Spread pods across nodes |
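
A pod template `spec` fragment sketching graceful shutdown and anti-affinity together. The label, image, 60s grace period, and 10s sleep are hypothetical, and the `preStop` command assumes the image ships `sh` and `sleep`. The sleep gives endpoint removal time to propagate before the app receives SIGTERM.

```yaml
spec:
  terminationGracePeriodSeconds: 60   # total budget for preStop + shutdown before SIGKILL (default 30)
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname   # prefer one replica per node
            labelSelector:
              matchLabels:
                app: my-app                       # hypothetical label
  containers:
    - name: app
      image: my-app:1.2.3                         # hypothetical image
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]     # runs before SIGTERM is sent
```

The app itself still needs to handle SIGTERM by finishing in-flight requests within the remaining grace period.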

### PodDisruptionBudget Example
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2        # Or maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
```

### Common Pitfalls
| Mistake | Consequence |
|---------|-------------|
| No readiness probe | Traffic sent to starting/broken pods |
| `terminationGracePeriodSeconds` too short | Requests cut mid-process |
| Database schema change not backward-compatible | Old and new pods run side by side during the rollout, so one version breaks |
| No PodDisruptionBudget | A node drain can evict every replica at once |

---

## Bonus: Quick Comparison Table

| Category | Key Components | Most Important Concept |
|----------|----------------|------------------------|
| **Auto Scaling** | HPA, VPA, Cluster Autoscaler | HPA utilization targets are a % of **requests**, not limits |
| **Service Discovery** | Service, CoreDNS, Endpoints | DNS: `<service>.<namespace>.svc.cluster.local` |
| **Load Balancer** | Service (NodePort/LB), Ingress | Ingress is L7, Service is L4 |
| **Self Healing** | kubelet, controllers, probes | Liveness vs Readiness probes |
| **Zero Downtime** | RollingUpdate, PDB, probes | `maxSurge`/`maxUnavailable` pace the rollout; readiness probes + PDB keep it safe |

---

## Final Interview Tips

1. **Always mention "requests vs limits"** when discussing scheduling/scaling (the scheduler and HPA use requests; limits cap runtime usage)
2. **Know the difference:** Liveness (restart) vs Readiness (traffic)
3. **For zero downtime:** Always mention readiness probes + PDB
4. **For service discovery:** Mention that DNS is the modern way (not env vars)
5. **For load balancers:** Know that Ingress requires a controller

Good luck with your interview!