
---

# ☸️ **11. Docker with Kubernetes (K8s)**

---

## 🧠 Mental Model — Pod vs Container

| 🧩          | Docker Container | K8s Pod                       |
| ----------- | ---------------- | ----------------------------- |
| **Unit**    | One container    | 1+ containers (smallest unit) |
| **IP**      | Own IP           | One IP per Pod, shared by its containers |
| **Storage** | Per container    | Shared volumes                |
| **Usage**   | Standalone       | Managed by **controllers**    |

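🔑 Deploy Pods **through a controller** (usually a Deployment), not as bare Pods: a bare Pod is not recreated if it is deleted or its node fails.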
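
A minimal sketch of the table above: one Deployment whose Pod runs two containers that share the Pod's IP (they can reach each other on `localhost`) and an `emptyDir` volume. The sidecar, its image, the log path, and the volume name are illustrative assumptions, not part of the original setup.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels: { app: myapp }
  template:
    metadata:
      labels: { app: myapp }          # must match spec.selector
    spec:
      containers:
      - name: myapp
        image: repo/myapp:v1
        volumeMounts:
        - { name: scratch, mountPath: /scratch }
      - name: log-shipper             # hypothetical sidecar
        image: busybox:1.36
        command: ["sh", "-c", "tail -F /scratch/app.log"]   # assumes the app logs here
        volumeMounts:
        - { name: scratch, mountPath: /scratch }
      volumes:
      - name: scratch
        emptyDir: {}                  # lives as long as the Pod, shared by both containers
```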

---

## 🔄 Build → Push → Deploy

```bash
docker build -t repo/myapp:v1 .
docker push repo/myapp:v1
```

Reference the pushed image in the Pod template (`spec.template.spec` of a Deployment):

```yaml
containers:
- name: myapp
  image: repo/myapp:v1
```

✅ Pin images by **immutable digest** (`repo/myapp@sha256:…`) instead of a mutable tag, and use `imagePullSecrets` for private registries.
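
A sketch of both tips in a Pod template. The digest is a placeholder (`docker push` prints the real one), and `regcred` is a hypothetical Secret of type `kubernetes.io/dockerconfigjson` that would have to exist in the same namespace.

```yaml
spec:
  template:
    spec:
      imagePullSecrets:
      - name: regcred                        # hypothetical registry-credentials Secret
      containers:
      - name: myapp
        # Placeholder: substitute the sha256 digest reported by `docker push`.
        image: repo/myapp@sha256:<digest>
```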

---

## 🏗️ Core Blocks (LLM mapping)

* **Deployment** → stateless API (LLM gateway).
* **StatefulSet** → vector DB (Milvus/Qdrant).
* **DaemonSet** → node-wide agents (metrics).
* **Job/CronJob** → batch embedding / retrain.
* **Service** → stable IP/DNS.
* **Ingress** → HTTPS routes.
* **PVC** → model weights / HF cache.
* **ConfigMap/Secret** → env + API keys.
* **HPA/KEDA** → autoscale (CPU/RAM/requests).
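
As one example from the list, a batch embedding run could be sketched as a CronJob. The name, image, schedule, and arguments are assumptions for illustration only.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: embed-nightly                 # hypothetical name
spec:
  schedule: "0 2 * * *"               # every day at 02:00
  concurrencyPolicy: Forbid           # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 2                 # retry a failed Job at most twice
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: embed
            image: repo/embed-job:v1  # hypothetical image
            args: ["--since", "24h"]  # hypothetical flag of the job's own CLI
```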

---

## 🌐 Networking (cheat sheet)

* **ClusterIP** → default internal.
* **NodePort** → opens a port (30000-32767 by default) on every node; mostly for dev/testing.
* **LoadBalancer** → cloud external IP.
* **Ingress** → domain + TLS.

🔎 DNS: `<service>.<namespace>.svc.cluster.local` (within the same namespace, just `<service>` resolves).
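
A minimal ClusterIP Service sketch. It assumes the Pods carry the label `app: myapp` and listen on port 8080 (both assumptions); with this manifest the Service would resolve as `myapp-svc.app.svc.cluster.local`.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: app
spec:
  type: ClusterIP            # the default; reachable only inside the cluster
  selector: { app: myapp }   # assumed Pod label
  ports:
  - port: 80                 # port exposed by the Service
    targetPort: 8080         # assumed container port
```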

---

## 💾 Storage (quick)

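* **RWO PVC** (`ReadWriteOnce`) → read-write from a single **node**; use `ReadWriteOncePod` to restrict it to a single Pod.
* **RWX PVC** (`ReadWriteMany`) → shared read-write across Pods on many nodes (handy for shared model weights); needs a backend that supports it, such as NFS or CephFS.
* Don't persist data in the container filesystem (it is lost when the container restarts) → mount volumes.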
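
A sketch of a shared model cache: a PVC plus the Pod-template fragment that mounts it. The claim name, size, and storage class are assumptions; `HF_HOME` is the Hugging Face cache-location variable, pointed here at the mounted path.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache                  # hypothetical name
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-client       # hypothetical RWX-capable class
  resources:
    requests:
      storage: 50Gi                  # assumed size
---
# Pod template fragment (goes under spec.template.spec of the Deployment)
volumes:
- name: models
  persistentVolumeClaim: { claimName: model-cache }
containers:
- name: myapp
  image: repo/myapp:v1
  env:
  - { name: HF_HOME, value: /models/hf }
  volumeMounts:
  - { name: models, mountPath: /models }
```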

---

## 🔐 Config & Secrets

```yaml
envFrom:
- configMapRef: { name: app-cfg }
- secretRef:    { name: app-secret }
```

Keep tokens and keys in **Secrets**, non-sensitive settings in **ConfigMaps**.
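
The two objects referenced by `envFrom` could look like this. The keys and values are made-up placeholders; each key becomes an environment variable in the container. Note that Secret values are only base64-encoded in the API by default, so avoid committing real values to Git.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-cfg
data:
  MODEL_NAME: "my-model"     # hypothetical setting
  LOG_LEVEL: "info"          # hypothetical setting
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
stringData:                  # plain text here; stored base64-encoded under .data
  API_KEY: "replace-me"      # placeholder, never a real key
```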

---

## ❤️ Probes & Lifecycle

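* **readinessProbe** → Pod receives Service traffic only while the probe passes
* **livenessProbe** → container is restarted when the probe keeps failing
* **startupProbe** → holds off the other probes until it succeeds (grace for slow model loads)
* **terminationGracePeriodSeconds** → time between SIGTERM and SIGKILL (default 30s) to drain gracefully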
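
A sketch combining all four settings in a Pod template. The endpoint paths, port, and timings are assumptions; tune them to the actual model load time.

```yaml
spec:
  terminationGracePeriodSeconds: 60      # Pod-level: time to finish in-flight requests
  containers:
  - name: myapp
    image: repo/myapp:v1
    ports:
    - containerPort: 8080                # assumed port
    startupProbe:                        # allows up to 30 x 10s = 300s for model load
      httpGet: { path: /healthz, port: 8080 }
      periodSeconds: 10
      failureThreshold: 30
    readinessProbe:                      # gates Service traffic
      httpGet: { path: /ready, port: 8080 }
      periodSeconds: 5
    livenessProbe:                       # restarts the container after 3 failures
      httpGet: { path: /healthz, port: 8080 }
      periodSeconds: 10
      failureThreshold: 3
```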

---

## 📊 Resources & Scaling

```yaml
resources:
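  requests: { cpu: "200m", memory: "256Mi" }  # what the scheduler reserves on a node
  limits:   { cpu: "500m", memory: "512Mi" }  # CPU is throttled above this; exceeding memory gets the container OOM-killed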
```

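* **HPA** → autoscale on CPU/memory utilization (measured against `requests`; needs a metrics source such as metrics-server)
* **KEDA** → scale on event metrics such as tokens/sec or queue length
* **PDB** → keep at least N pods available during voluntary disruptions (node drains, evictions); rolling updates are governed by the Deployment's `maxUnavailable`/`maxSurge`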
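
A sketch of an HPA and a PDB for a Deployment named `myapp` with Pods labeled `app: myapp` (both names assumed). The replica bounds and the 70% target are illustrative values.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # percent of the CPU request, averaged over Pods
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp
spec:
  minAvailable: 1                  # evictions are refused if they would go below this
  selector:
    matchLabels: { app: myapp }
```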

---

## 🔒 Security Basics

```yaml
securityContext:            # container-level: containers[].securityContext
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities: { drop: ["ALL"] }
```

* Add **RBAC** & **NetworkPolicies**
* Pin image digests, avoid `:latest`
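
A NetworkPolicy sketch that lets only the ingress controller's namespace reach the app. The namespace name `ingress-nginx`, the Pod label, and the port are assumptions, and the policy is enforced only if the cluster's CNI plugin supports NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: myapp-allow-ingress
  namespace: app
spec:
  podSelector:
    matchLabels: { app: myapp }          # Pods this policy protects
  policyTypes: ["Ingress"]               # all other inbound traffic is denied
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
    ports:
    - { protocol: TCP, port: 8080 }      # assumed container port
```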

---

## 🧩 Minimal Prod Stack

* Namespace + ConfigMap + Secret
* Deployment (2 replicas)
* ClusterIP Service
* Probes + resources + securityContext
* Optional Ingress for HTTPS
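
The optional Ingress could be sketched as follows. The hostname, the `nginx` ingress class, and the TLS Secret `myapp-tls` are assumptions; an ingress controller must be installed for the object to take effect.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: app
spec:
  ingressClassName: nginx            # assumed controller class
  tls:
  - hosts: ["api.example.com"]       # placeholder domain
    secretName: myapp-tls            # hypothetical kubernetes.io/tls Secret
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-svc
            port: { number: 80 }
```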

---

## ⚡ Quick Ops

```bash
kubectl get pods,svc,deploy -n app
kubectl logs -f deploy/myapp -n app
kubectl exec -it deploy/myapp -n app -- sh
kubectl port-forward svc/myapp-svc 8080:80 -n app
kubectl rollout status deploy/myapp -n app
```

---

## 🔁 Compose → K8s Map

| Compose      | Kubernetes                 |
| ------------ | -------------------------- |
| `services`   | Deployment + Service       |
| `volumes`    | PV + PVC                   |
| `depends_on` | initContainers + readiness |
| `.env`       | ConfigMap/Secret           |
| `networks`   | Service + DNS              |
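
For the `depends_on` row, a sketch of an init container that blocks app startup until a dependency's Service name resolves. The Service `vectordb` and namespace `app` are hypothetical; this only waits for DNS, so the app should still retry its connections and rely on readiness probes.

```yaml
spec:
  initContainers:
  - name: wait-for-vectordb
    image: busybox:1.28
    # Loop until the (hypothetical) Service name resolves in cluster DNS.
    command:
    - sh
    - -c
    - until nslookup vectordb.app.svc.cluster.local; do echo waiting; sleep 2; done
  containers:
  - name: myapp
    image: repo/myapp:v1
```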

---

👉 That’s the **K8s essentials for LLM/GenAI apps**: Pods = inference unit, Deployments scale them, Services/Ingress expose them, PVCs hold models, Config/Secrets wire tokens, Probes keep them healthy.

---
