Here’s a **highest-ROI, interview-ready cheat sheet** pulled from *Kubernetes Patterns*. I’m listing the **“Problem → Solution → Interview soundbite / gotcha”** items most likely to help in a Google Kubernetes TSE conversation.

---

## 1) Health Probe (Liveness vs Readiness) — **automation + troubleshooting gold**

**Problem:** “Process is running” is not the same as “app is healthy.” Apps can hang, deadlock, OOM inside the runtime, etc. 
**Solution:**

* **Liveness** detects “must restart” failures (Kubelet restarts container). 
* **Readiness** detects “don’t send traffic yet/for now” (remove from Service endpoints; no restart). 
* Readiness gates rolling updates: a new Pod counts as available only once its readiness probe passes, so the Deployment does not continue the rollout past Pods that never become ready.
  **Interview soundbite:** “Readiness prevents bad rollouts and protects users; liveness self-heals stuck processes.”

  In a rolling deployment, the readiness probe keeps traffic away from new Pods until they can actually handle requests, which is what makes zero-downtime updates possible.
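
As a rough illustration of the two probes side by side, here is a minimal Pod sketch. The image name, the port, and the `/healthz` and `/ready` paths are assumptions for the example, not something the book or a specific app prescribes.

```yaml
# Minimal sketch; image, port, and probe paths are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: app
    image: example.com/myapp:1.0   # hypothetical image
    ports:
    - containerPort: 8080
    livenessProbe:                 # failing => kubelet restarts the container
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:                # failing => Pod leaves Service endpoints, no restart
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
```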

---

## 2) Predictable Demands (requests/limits, QoS) — **scheduling + outages**

**Problem:** If you don’t size containers, the scheduler can’t place workloads well; and runtime behavior under pressure becomes unpredictable. 
**Solution:**

* Understand **compressible vs incompressible** resources: CPU throttles, memory kills. 
* Set **requests** (minimum) and **limits** (max). Scheduler uses **requests (not limits)** for placement. 
* QoS classes matter during node pressure: **BestEffort / Burstable / Guaranteed**.
  **Interview soundbite:** “Most ‘random evictions/OOMs’ start with bad requests/limits.”

Key best practices for QoS:

* **Guaranteed QoS:** set CPU and memory requests equal to limits for critical services so their resources are reserved.
* **Burstable QoS:** set limits higher than requests so apps can absorb spikes, but keep the limit a reasonable multiple of the request.
* **Resource monitoring:** use tools like Prometheus, Grafana, or `kubectl top` to compare actual usage with what is allocated, and adjust accordingly.
* **Avoid BestEffort in production:** these Pods are the first candidates for eviction under resource contention.
* **Memory vs. CPU:** memory is incompressible (exceeding the limit leads to OOMKilled), so memory limits need care. CPU is compressible (throttling), which allows more flexible, burstable configurations.
* **Namespace quotas:** define resource quotas per namespace to manage aggregate resource usage.
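
To make the QoS classes concrete, here is a small sketch of two Pods. The names, image, and resource figures are made up for illustration; only the requests/limits relationship matters.

```yaml
# Guaranteed: every container has CPU and memory requests equal to limits.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed
spec:
  containers:
  - name: app
    image: example.com/myapp:1.0   # hypothetical image
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
---
# Burstable: requests are set, limits are higher than requests.
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable
spec:
  containers:
  - name: app
    image: example.com/myapp:1.0   # hypothetical image
    resources:
      requests:
        cpu: "250m"
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
```

The assigned class can be read back from the Pod status field `status.qosClass`.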


---

## 3) Declarative Deployment (Rolling, Canary, Blue-Green) — **zero-downtime thinking**

**Problem:** Imperative rollouts are brittle (client-side orchestration, hard to repeat, drifts over time). 
**Solution:**

* Use **Deployment** + **RollingUpdate**, shape rollout with **maxSurge/maxUnavailable**, and **don’t forget readiness**. 
* Canary: run a small ReplicaSet of the new version, direct some traffic, then scale new up/old down. 
  **Interview soundbite:** “Readiness is what turns ‘rolling update’ into ‘safe rolling update’.”
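
A sketch of what "shape the rollout" can look like in a manifest. The surge/unavailable values, port, and `/ready` path are illustrative assumptions, not recommended defaults.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 extra Pod above the desired count
      maxUnavailable: 0    # never drop below the desired count
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:1.1
        readinessProbe:    # without this, a started container counts as ready immediately
          httpGet:
            path: /ready   # hypothetical endpoint
            port: 8080     # hypothetical port
```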

```yaml
# Primary Deployment (your existing stable version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-primary
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
      version: primary
  template:
    metadata:
      labels:
        app: myapp
        version: primary
    spec:
      containers:
      - name: myapp
        image: myapp:1.0
---
# Canary Deployment (new version, rolled out with only 1 replica)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      version: canary
  template:
    metadata:
      labels:
        app: myapp
        version: canary
    spec:
      containers:
      - name: myapp
        image: myapp:1.1
```


```shell
# Stable version: 9 replicas behind a Service
kubectl create deployment web-stable --image=nginx:1.26 --replicas=9
kubectl expose deployment web-stable --port=80 --target-port=80 --name=web-service

# Canary: 1 replica of the new version
kubectl create deployment web-canary --image=nginx:1.27 --replicas=1

# Caveat: web-service selects app=web-stable, so canary Pods (app=web-canary)
# get no traffic until both Deployments share a label the Service selects on.

# Scale canary up to take more traffic
kubectl scale deployment web-canary --replicas=2
# Scale stable down to reduce traffic
kubectl scale deployment web-stable --replicas=8
```

The imperative commands above only illustrate the idea. They are not a production practice: they are not repeatable and drift easily.

Canary works better with an Ingress, where traffic can be routed explicitly (for example by region) instead of by replica ratio.

Main point: readiness probes and rollouts are best friends. Ship one without the other and you put production at risk.
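
For the `myapp-primary` / `myapp-canary` manifests, a Service that selects only the shared `app` label is one way to spread traffic over both versions. This is a sketch; the Service name and the target port are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp          # no version label, so primary and canary Pods both match
  ports:
  - port: 80
    targetPort: 8080    # assumed container port
```

With 5 primary replicas and 1 canary replica, the canary receives roughly one request in six, since traffic is spread across ready endpoints.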

---

## 4) Elastic Scale — HPA/VPA/CA + the **HPA+VPA conflict insight**

### 4A) HPA: picking the right metric

**Problem:** Autoscaling is easy to misconfigure; some metrics don’t correlate with replica count. 
**Solution:** Pick metrics that drop when you add replicas (CPU or QPS often works; memory often doesn’t unless the app truly redistributes memory). 

The Horizontal Pod Autoscaler (HPA) should not be the primary tool for memory spikes, because memory usage often does not drop as Pods are added. Memory spikes are better handled through right-sizing (Vertical Pod Autoscaler recommendations, appropriate requests and limits) and by fixing the underlying cause of the spike.
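
A minimal CPU-based HPA might look like the following sketch. The target name, replica bounds, and the 60% utilization target are illustrative assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp              # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # percent of the CPU request, averaged over Pods
```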

### 4B) HPA: the formula + “CPU% is based on requests”

HPA computes desired replicas from current vs desired metric value. 
For resource metrics, the percentage is based on **container requests, not limits**. 

$$DesiredReplicas = \lceil CurrentReplicas \times \frac{CurrentMetricValue}{TargetMetricValue} \rceil$$

In Kubernetes (unlike a generic Linux CPU%), the percentage is measured against the resource request configured on the container, so 100% means "using the whole request", not "using a whole core".
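
A worked example with made-up numbers: 4 replicas, each requesting 500m CPU and using 450m on average, with a 60% utilization target.

$$CurrentMetricValue = \frac{450\text{m}}{500\text{m}} = 90\%$$

$$DesiredReplicas = \lceil 4 \times \frac{90}{60} \rceil = \lceil 6 \rceil = 6$$

The same 450m measured against a 1000m request would read as 45%, which is below the target and would not trigger a scale-up.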

### 4C) Delays and thrash control (why it reacts “late”)

Scaling is a pipeline (cAdvisor → kubelet → metrics server → HPA loop), and deliberate smoothing adds delay; tuning is tradeoffs. 

**Three important "rules" the HPA follows.** To prevent your cluster from constantly "jittering" (scaling up and down every few seconds), Kubernetes adds some safety logic:

* **The 10% tolerance:** by default, the HPA does not scale while the ratio ($Current / Target$) is between 0.9 and 1.1. This keeps tiny fluctuations from triggering a scale event.
* **Missing metrics:** a Pod that is not reporting CPU (for example, it is still starting up) is set aside in the first calculation so it does not skew the average. The HPA then recomputes conservatively, assuming such Pods use 100% of the target when scaling down and 0% when scaling up.
* **Cooldown/stabilization:** for scale-down, the HPA looks back over a stabilization window (5 minutes by default) and acts on the highest recommendation in it, so Pods are removed only once the load has stayed low.
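
The stabilization behaviour can be tuned per HPA. The fragment below is a sketch of the `behavior` field in an `autoscaling/v2` HPA; the policy values are illustrative assumptions.

```yaml
# Fragment of a HorizontalPodAutoscaler (autoscaling/v2) spec.
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # matches the 5-minute default
      policies:
      - type: Pods
        value: 1                        # remove at most 1 Pod...
        periodSeconds: 60               # ...per 60-second period
    scaleUp:
      stabilizationWindowSeconds: 0     # react to increases immediately
```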

### 4D) VPA: disruption + update modes

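VPA can recommend or apply request changes; **Auto** can evict Pods to apply them (disruptive).

VPA has three components:

* **Recommender:** keeps watching usage metrics (from the metrics server) and computes recommended requests.
* **Updater:** evicts Pods whose requests are out of line with the recommendation.
* **Admission controller:** intercepts the creation of the replacement Pods and rewrites their requests to the recommended values.

Both the updater and the admission controller act on the recommender's output.

Update modes: `Off`, `Initial`, `Recreate`, `Auto` (the default).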
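
A recommendation-only VPA could be declared like this. It is a sketch that assumes the VPA components and CRDs are installed in the cluster, and the target name is hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp            # hypothetical Deployment
  updatePolicy:
    updateMode: "Off"      # compute recommendations only; never evict Pods
```

In this mode the recommendations appear in the object's status, where they can be reviewed before requests are changed by hand.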

### 4E) **Why HPA + VPA together (same resource metrics) is a problem**

**Core issue:** they’re not aware of each other, so you can get “double scaling” and weird feedback loops. 
**Interview-ready explanation (tie it to the math):**

* If HPA uses CPU% and CPU% is computed vs **requests** 
* …and VPA changes requests, then the **same real CPU usage suddenly looks like a different percentage**, so HPA may scale up/down unexpectedly.
  **Practical guidance (what to say you’d do):**
* Use VPA **Off/Initial** for right-sizing signals, then set stable requests; run HPA on CPU/QPS/custom metrics after. 

| Scaler | Metric | Mode | Best Use Case |
| --- | --- | --- | --- |
| **HPA** | CPU | Active | Handling sudden traffic spikes. |
| **VPA** | CPU/Mem | **Off** | Long-term "Right-sizing" advice. |
| **HPA** | QPS | Active | Scaling before the CPU even gets hot. |
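
To tie the HPA+VPA conflict to the math, take made-up numbers: 6 replicas, each really using 300m CPU, with an HPA target of 60%. With a 500m request the HPA is satisfied:

$$\lceil 6 \times \frac{300/500}{0.60} \rceil = \lceil 6 \times \frac{60}{60} \rceil = 6$$

If VPA raises the request to 1000m, the same 300m now reads as 30%:

$$\lceil 6 \times \frac{300/1000}{0.60} \rceil = \lceil 6 \times \frac{30}{60} \rceil = 3$$

Real load has not changed, yet the HPA now wants half the replicas.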

---

## 5) Managed Lifecycle (SIGTERM, grace period, hooks) — **reliability during restarts**

**Problem:** Pods get killed/restarted for many reasons; if the app doesn’t shut down cleanly you drop in-flight work. 
**Solution:**

* Handle **SIGTERM** quickly and gracefully; Kubernetes waits a default grace period (~30s) before SIGKILL. 
* Hooks exist (postStart/preStop), but don’t put critical logic there without caution. 
  **Interview soundbite:** “Graceful shutdown is part of SLOs: readiness + SIGTERM handling go together.”

**What is SIGTERM?** SIGTERM (signal terminate) is a signal sent to a process to request its termination. In Kubernetes:

* When Kubernetes decides to stop a Pod, it sends SIGTERM to the main process (PID 1) in each container.
* It is a polite request to shut down, not a forceful kill.
* The process can catch the signal and perform cleanup.

**Best practices.** Set an appropriate grace period:

* Too short: processes get killed abruptly.
* Too long: deployment updates are slow.

**Common patterns:**

* Rolling updates: new Pods start, and old Pods run a preStop hook to drain connections.
* Stateful applications: save data during preStop and restore it during postStart (with care: postStart is not guaranteed to run before the container's entrypoint).
* Service mesh: deregister from the mesh before termination.
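
A sketch of how the grace period and a preStop hook fit together in a Pod spec. The 45-second budget and the 5-second sleep are arbitrary example values, and the hook assumes the image ships a shell.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo
spec:
  terminationGracePeriodSeconds: 45   # total budget: preStop hook plus SIGTERM handling
  containers:
  - name: app
    image: example.com/myapp:1.0      # hypothetical image
    lifecycle:
      preStop:
        exec:
          # Short pause so endpoint removal can propagate before SIGTERM arrives.
          command: ["sh", "-c", "sleep 5"]
```

The time spent in preStop counts against the grace period, so a long hook leaves less time for the app's own SIGTERM handling.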

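**Debugging tips.** Check Pod events for hook failures, and the logs of the terminated container:

* `kubectl describe pod <pod-name>` shows events such as failed hooks.
* `kubectl logs <pod-name> --previous` shows the logs of the previous (terminated) container instance.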
---

## 6) Service Discovery (ClusterIP, Headless, NodePort, LB, Ingress) — **network path clarity**

**Problem:** Pods are ephemeral; you need stable discovery for consumers.
**Solution:** Use the Service abstraction; common mechanisms summarized: ClusterIP, Headless, NodePort, LoadBalancer, Ingress, plus manual options. 
**Interview soundbite:** “When debugging ‘service broken’, I think: Service selector → Endpoints → Pod readiness.”

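Headless: a DNS lookup for a headless Service returns the IPs of all the Pods behind it instead of a single virtual IP. StatefulSets are one of the main uses.

Also worth knowing: Endpoints, kube-proxy, and the iptables rules it programs, along with the commands to inspect them (for example `kubectl get endpoints <service>`). Endpoints are tied directly to readiness probes: only Pods that pass readiness are listed as ready endpoints.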
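
For reference, the only thing that makes a Service headless is `clusterIP: None`. The names and ports in this sketch are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-headless
spec:
  clusterIP: None       # headless: DNS resolves to the Pod IPs directly
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080    # assumed container port
```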

---

## 7) Automated Placement (affinity, taints/tolerations) — **multi-tenant realism**

**Problem:** Default scheduling isn’t enough when you need topology, isolation, or hardware constraints.
**Solution:** Progress from simple to complex: nodeSelector → affinities → taints/tolerations → custom scheduler; keep intervention minimal.  
**Interview soundbite:** “Prefer labels + policies over pinning pods to nodes (nodeName is last resort).” 

* **nodeSelector:** the simplest constraint, an exact match on node labels. (Hard-pinning with `nodeName` is the option to keep for troubleshooting.)
* **Node affinity:** more expressive, with the operators `In`, `NotIn`, `Exists`, `DoesNotExist`, `Gt`, `Lt`.
* **Pod affinity and anti-affinity:** topology-aware; use them when Pods should be co-located or spread apart.
* **Taints and tolerations:** only Pods with a matching toleration can be scheduled onto a tainted node.
* **Taint effects:**
  * `NoSchedule`: don't schedule new Pods.
  * `PreferNoSchedule`: try to avoid scheduling.
  * `NoExecute`: also evict running Pods that lack the toleration.
* **Custom scheduler:** run an additional scheduler and opt Pods into it with `spec.schedulerName`.
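
A sketch combining node affinity with a toleration. The `disktype` label and the `dedicated=batch` taint are hypothetical; the matching taint would be set with something like `kubectl taint nodes <node> dedicated=batch:NoSchedule`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: placement-demo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype            # hypothetical node label
            operator: In
            values: ["ssd"]
  tolerations:
  - key: dedicated                   # hypothetical taint key
    operator: Equal
    value: batch
    effect: NoSchedule               # allows scheduling onto the tainted node; does not force it
  containers:
  - name: app
    image: example.com/myapp:1.0     # hypothetical image
```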
---

## 8) Pod Disruption Budget — **non-disruptive maintenance**

**Problem:** Drains/cluster scale-down can evict too many replicas at once and break quorum or capacity.
**Solution:** PDB limits voluntary evictions (e.g., drain/maintenance), ensuring minAvailable (or maxUnavailable). 
**Interview soundbite:** “PDB is how you make node maintenance ‘boring’ for critical services.” 

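Manual node maintenance: `kubectl cordon <nodename>`, then `kubectl drain <nodename>`, then `kubectl uncordon <nodename>` (drain cordons the node itself, so the explicit cordon is optional).

Create a PDB imperatively: `kubectl create poddisruptionbudget <name> --selector=<labels> --min-available=<n>` (or `--max-unavailable=<n>`).

etcd quorum = floor(n/2) + 1. The shortcut (n + 1) // 2 gives the same result only for odd n.

* 3-node cluster: needs 2 of 3 to agree.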
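
The declarative equivalent is short. In this sketch the label and the `minAvailable` value are assumptions; for a 3-member quorum-based system, 2 mirrors the quorum size.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2        # voluntary evictions are refused if they would go below this
  selector:
    matchLabels:
      app: myapp         # hypothetical label
```

A PDB sets either `minAvailable` or `maxUnavailable`, not both.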

---

## 9) Configuration Resource (ConfigMap/Secret) — **debugging config issues**

**Problem:** Env vars don’t scale well for complex config and become hard to track/override. 
**Solution:**

* Use **ConfigMap/Secret** as key-value stores; consume as **env vars** or **mounted files**. 
* Mounted-file configs **can update on ConfigMap change**, but env-var configs **won’t update without restart**. 
  **Interview soundbite:** “If config changed but pods didn’t, I ask: env var or mounted file?” 

  Config not updating in pods?
│
├─→ Is it consumed as ENV VAR?
│   └─→ Must restart pods (rollout restart)
│
├─→ Is it consumed as MOUNTED FILE?
│   ├─→ Does app watch for file changes?
│   │   ├─→ Yes → Should auto-update
│   │   └─→ No → Still needs restart
│   └─→ Check: mount path correct? subPath used? (subPath mounts never receive updates)
│
└─→ Is it a Secret? → Same rules as ConfigMaps: mounted Secrets refresh after the kubelet sync period plus cache delay; env vars need a restart
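
The two consumption styles can sit side by side in one Pod, which makes the difference easy to see. This is a sketch; the keys, names, and mount path are made up.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
  app.properties: |
    greeting=hello
---
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
  - name: app
    image: example.com/myapp:1.0   # hypothetical image
    env:
    - name: LOG_LEVEL              # env var: fixed at container start
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: LOG_LEVEL
    volumeMounts:
    - name: config                 # mounted file: refreshed after the ConfigMap changes
      mountPath: /etc/app
  volumes:
  - name: config
    configMap:
      name: app-config
```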

---

## 10) Self Awareness (Downward API) — **production-grade “introspection”**

**Problem:** Apps sometimes need runtime facts (pod IP/name, namespace, resource limits) without hardcoding or calling the API. 
**Solution:** Downward API injects metadata via env vars/files; app stays Kubernetes-agnostic. 
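
A sketch of the env-var flavour of the Downward API. The variable names and the memory limit are arbitrary choices for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-demo
spec:
  containers:
  - name: app
    image: example.com/myapp:1.0   # hypothetical image
    resources:
      limits:
        memory: "256Mi"
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: MEMORY_LIMIT           # the container's own memory limit
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: limits.memory
```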

"The Downward API gives pods their own metadata without coupling apps to Kubernetes APIs. It's like handing someone their ID card—they know who they are without having to ask the DMV. Your app just reads files or env vars, staying Kubernetes-agnostic while being cloud-native."

---

## 11) Configuration Template / Config size limits — **one-liner that impresses**

If you push giant configs into ConfigMaps/Secrets, you hit size/complexity limits; the book notes an **etcd-backed 1 MB limit** on total values and warns about embedding complex files. 
**Interview soundbite:** “For large configs, template at startup or use another pattern—don’t fight ConfigMap limits.”

etcd is NOT a file system!
├─→ Designed for consistent, fast key-value storage
├─→ Not for blob storage or large files
├─→ 1MB limit prevents performance degradation
└─→ Protects the entire cluster from one bad config

Need to store config?
│
├─→ < 1MB, static? → ConfigMap is perfect
│
├─→ > 1MB but can split? → Multiple ConfigMaps
│
├─→ > 1MB, dynamic, templates? → Template at startup
│
├─→ > 1MB, rarely changes? → Init container + object storage
│
└─→ > 1MB, changes often? → Dedicated config service

ConfigMaps are for configuration, not file storage. If you need more than 1 MB, the data probably belongs in a different system: template it, split it, or serve it from elsewhere. Helm can handle the templating, and CI/CD checks can keep an eye on config size.
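
One possible shape for "template at startup" is an init container that renders a template into a shared volume. Everything named here is hypothetical: the `app-templates` ConfigMap, the `app.conf.tpl` file, the `__ENV__` placeholder, and the use of `sed` as the templating tool.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: template-demo
spec:
  initContainers:
  - name: render-config
    image: busybox:1.36
    command:
    - sh
    - -c
    # Replace the placeholder and write the rendered file to the shared volume.
    - sed "s/__ENV__/${APP_ENV}/" /templates/app.conf.tpl > /config/app.conf
    env:
    - name: APP_ENV
      value: "prod"
    volumeMounts:
    - name: templates
      mountPath: /templates
    - name: config
      mountPath: /config
  containers:
  - name: app
    image: example.com/myapp:1.0   # hypothetical image
    volumeMounts:
    - name: config
      mountPath: /etc/app          # the app reads the rendered app.conf here
  volumes:
  - name: templates
    configMap:
      name: app-templates          # hypothetical ConfigMap holding app.conf.tpl
  - name: config
    emptyDir: {}
```

Only the small template lives in the ConfigMap; the rendered result lives in the `emptyDir` and never goes through etcd.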

---


## 12) Priority + Preemption (bonus: multi-tenant + safety)

Pod priority affects scheduling order and can preempt lower-priority pods when there isn’t capacity; this is powerful but risky and can violate assumptions like quorum. 
**Interview soundbite:** “Priority is an ops tool; misuse can cascade failures in shared clusters.”
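
A sketch of how priority is declared and used. The class name and value are arbitrary examples.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-service
value: 100000                            # higher value = higher priority
globalDefault: false
preemptionPolicy: PreemptLowerPriority   # "Never" would queue ahead without evicting others
description: "Hypothetical class for latency-critical services."
---
apiVersion: v1
kind: Pod
metadata:
  name: priority-demo
spec:
  priorityClassName: critical-service
  containers:
  - name: app
    image: example.com/myapp:1.0         # hypothetical image
```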

---

### If you only memorize **one “smart insight”** for tomorrow (your HPA+VPA point)

> “HPA on CPU uses CPU% relative to requests, not limits. If VPA changes those requests while HPA is scaling on them, they can fight and ‘double scale’ because they’re not aware of each other.”  

If you want, paste **one interview scenario** you expect (e.g., “latency spike + HPA not scaling” or “rollout caused outage”), and I’ll map it to the exact patterns above in a tight answer outline.
