
---

# ⚙️ **ConfigMap · Secret · GPU/CPU/RAM**

---

## 🗂️ Config (non-secret) → ConfigMap

* Store non-sensitive settings (env vars, timeouts, limits) as string key/value pairs; numbers must be quoted.

```yaml
data:
  MAX_TOKENS: "512"
  GATEWAY_TIMEOUT: "60"
```
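
A fuller sketch of the manifest this `data` block would sit in. The name `app-cfg` is taken from the `configMapRef` in the Deployment section, and the `llm` namespace from the ops commands; both are assumptions about how the pieces fit together.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-cfg          # assumed: matches the configMapRef used by the Deployment
  namespace: llm         # assumed namespace
data:
  # ConfigMap values are strings, so numeric settings are quoted
  MAX_TOKENS: "512"
  GATEWAY_TIMEOUT: "60"
```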

---

## 🔑 Secrets (tokens/keys) → Secret

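* Store sensitive values (API keys, HF tokens). `stringData` takes plain text; the API server stores it base64-encoded under `data`, which is encoding, not encryption.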

```yaml
stringData:
  HF_TOKEN: "xxx"
  API_KEY: "yyy"
```

👉 Mount as **env** or **files**.

👉 For private images → use `imagePullSecrets`.
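
A minimal sketch of the full Secret manifest, assuming the name `app-secret` (from the `secretRef` in the Deployment section) and the `llm` namespace. The values are placeholders, not real credentials.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-secret       # assumed: matches the secretRef used by the Deployment
  namespace: llm         # assumed namespace
type: Opaque
stringData:
  # plain-text input; stored base64-encoded under .data after apply
  HF_TOKEN: "xxx"        # placeholder
  API_KEY: "yyy"         # placeholder
```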

---

## 📦 Use in Deployment

```yaml
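# container-level fields (spec.template.spec.containers[])
envFrom:
  - configMapRef: { name: app-cfg }     # every key becomes an env var
  - secretRef:    { name: app-secret }
volumeMounts:
  - { name: secret-files, mountPath: /var/run/secret, readOnly: true }
# pod-level field (spec.template.spec): the volume the mount refers to
volumes:
  - name: secret-files
    secret: { secretName: app-secret }  # assumed: same Secret as above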
```
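
For context, here is one way the fragment could sit inside a complete Deployment. The Deployment name `vllm` and namespace `llm` come from the ops commands; the image, the `app: vllm` label, and the `regcred` pull secret are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
  namespace: llm
spec:
  replicas: 1
  selector:
    matchLabels: { app: vllm }
  template:
    metadata:
      labels: { app: vllm }
    spec:
      imagePullSecrets:
        - name: regcred                                  # hypothetical registry credential Secret
      containers:
        - name: vllm
          image: registry.example.com/llm/vllm:latest    # hypothetical private image
          envFrom:
            - configMapRef: { name: app-cfg }            # config keys as env vars
            - secretRef:    { name: app-secret }         # secret keys as env vars
          volumeMounts:
            - { name: secret-files, mountPath: /var/run/secret, readOnly: true }
      volumes:
        - name: secret-files                             # each Secret key becomes a file
          secret:
            secretName: app-secret
```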

---

## 📊 Resources (requests/limits)

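* Define **baseline + max usage** per container: `requests` is what the scheduler reserves, `limits` is the hard cap (CPU is throttled at the limit; exceeding the memory limit gets the container OOM-killed).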

```yaml
resources:
  requests: { cpu: "2", memory: "16Gi", nvidia.com/gpu: 1 }
  limits:   { cpu: "4", memory: "20Gi", nvidia.com/gpu: 1 }
```

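👉 GPUs go under `limits`; if you also set `requests`, it must **equal** the limit. GPUs cannot be overcommitted or requested in fractions.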
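
An equivalent sketch in block style. It relies on Kubernetes defaulting a missing extended-resource request to its limit, so `nvidia.com/gpu` is declared only once.

```yaml
resources:
  requests:
    cpu: "2"
    memory: "16Gi"
  limits:
    cpu: "4"
    memory: "20Gi"
    nvidia.com/gpu: 1    # whole GPUs only; the request defaults to this value
```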

---

## 🎮 GPU node scheduling

```yaml
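tolerations:
  - key: "nvidia.com/gpu"      # matches the taint commonly placed on GPU nodes
    operator: "Exists"
    effect: "NoSchedule"
nodeSelector:
  accelerator: nvidia          # label is cluster-specific; check `kubectl get nodes --show-labels`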
```
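
For the toleration and selector above to have any effect, the GPU node needs a matching taint and label. A sketch of what the relevant part of such a Node object might look like; the node name and taint value are hypothetical, and many clusters use different labels.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1             # hypothetical node name
  labels:
    accelerator: nvidia        # what the nodeSelector matches
spec:
  taints:
    - key: nvidia.com/gpu      # what the toleration matches
      value: "present"         # hypothetical; operator "Exists" ignores the value
      effect: NoSchedule       # keeps pods without the toleration off this node
```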

---

## 🛡️ Security basics

```yaml
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities: { drop: ["ALL"] }
```
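
A sketch of where these fields live in a pod template. `allowPrivilegeEscalation` and `capabilities` exist only on the container-level `securityContext`, while `runAsNonRoot` can also be set once at pod level. The container name `vllm` is an assumption.

```yaml
spec:
  securityContext:             # pod-level: applies to every container
    runAsNonRoot: true
  containers:
    - name: vllm               # assumed container name
      securityContext:         # container-level only
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```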

---

## 📐 Sizing hints (LLM pods)

* **CPU gateway** → `0.5–2 CPU`, `512Mi–2Gi RAM`
* **GPU inference** → `2–4 CPU`, `16–32Gi RAM`, `1 GPU`
* Tune throughput via → vLLM `--max-num-seqs`, TGI `--max-batch-size`.
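
A sketch of how these hints might translate into container specs. The container names and the value `64` are illustrative, and passing the flag through `args` assumes the image's entrypoint is the vLLM server. The two containers are shown together for comparison; they would normally belong to separate Deployments.

```yaml
containers:
  - name: gateway                        # hypothetical CPU-only gateway
    resources:
      requests: { cpu: "500m", memory: "512Mi" }
      limits:   { cpu: "2",    memory: "2Gi" }
  - name: vllm                           # GPU inference container
    args: ["--max-num-seqs", "64"]       # illustrative cap on concurrent sequences
    resources:
      requests: { cpu: "2", memory: "16Gi" }
      limits:   { cpu: "4", memory: "32Gi", nvidia.com/gpu: 1 }
```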

---

## 🔍 Quick ops

```bash
kubectl -n llm get cm,secret
kubectl -n llm describe deploy vllm
kubectl -n llm get pods -o 'custom-columns=NAME:.metadata.name,CPU:.spec.containers[*].resources.requests.cpu,MEM:.spec.containers[*].resources.requests.memory'
```

---

✅ Rule: **Configs → ConfigMap**, **Secrets → Secret**, **GPU via nvidia.com/gpu**, always add **securityContext**.

---
