
---

# 🌐 **Services · Ingress · gRPC**

---

## 🛠 Service types

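* **📦 ClusterIP** → in-cluster virtual IP only (default).
* **🌍 LoadBalancer** → external IP provisioned by the cloud provider.
* **🔌 NodePort** → static port on every node (default range 30000-32767); fine for dev, rarely right for prod.
* **🚪 Ingress** → not a Service type but a separate resource in front of Services: HTTP(S) routing + TLS (needs an ingress controller).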

---

## 🧪 Minimal Service (local test)

```yaml
apiVersion: v1
kind: Service
metadata: { name: vllm-svc, namespace: llm }
spec:
  selector: { app: vllm }
  ports: [{ port: 80, targetPort: 8000 }]
  type: ClusterIP
```

👉 Test:

```bash
kubectl -n llm port-forward svc/vllm-svc 8080:80
curl http://localhost:8080/health
```

---

## ☁️ Public exposure

* **LoadBalancer** → simple external IP in cloud.
* **Ingress** → route by hostname + TLS.

```yaml
# goes under metadata.annotations of the Ingress (ingress-nginx)
nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"   # seconds; covers long generations
```
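
For context, here is a minimal sketch of a full Ingress carrying that annotation. It assumes the ingress-nginx controller with an IngressClass named `nginx`, reuses `vllm-svc` and `llm.example.com` from this page, and invents the Ingress name `vllm-ingress` and the TLS Secret `llm-tls` (both hypothetical; the Secret must already exist in the namespace).

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vllm-ingress            # hypothetical name
  namespace: llm
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"   # seconds
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"   # seconds
spec:
  ingressClassName: nginx       # assumes an IngressClass called "nginx"
  tls:
    - hosts: [llm.example.com]
      secretName: llm-tls       # hypothetical, pre-created TLS Secret
  rules:
    - host: llm.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vllm-svc
                port:
                  number: 80    # Service port, not the container's 8000
```

The backend points at the Service port (80), which the Service then maps to `targetPort: 8000` on the pods.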

---

## 🔗 gRPC (for vLLM/TGI)

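* Needs **HTTP/2 end-to-end** (client → Ingress → pod).
* In NGINX Ingress, set the backend protocol and serve the host over TLS, since clients negotiate HTTP/2 with the controller on the TLS listener: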

```yaml
nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
```
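
As a rough sketch, a gRPC Ingress could look like the following. It assumes ingress-nginx, a host that serves only gRPC, and a hypothetical Service `llm-grpc-svc` exposing a plaintext gRPC port 50051; the Ingress name and the Secret `llm-tls` are also placeholders, so adjust all of them to your deployment.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-grpc-ingress        # hypothetical name
  namespace: llm
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"   # plaintext gRPC to the pod
spec:
  ingressClassName: nginx       # assumes an IngressClass called "nginx"
  tls:
    - hosts: [llm.example.com]
      secretName: llm-tls       # hypothetical, pre-created TLS Secret
  rules:
    - host: llm.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-grpc-svc   # hypothetical gRPC Service
                port:
                  number: 50051      # assumed gRPC port
```

If the backend itself speaks TLS, the annotation value would be `"GRPCS"` instead.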

👉 Test:

```bash
# Port 443 is TLS, so no -plaintext; `list` needs server reflection enabled
grpcurl llm.example.com:443 list
```

---

## ⚡ LLM specifics

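* **Streaming** → set **long proxy timeouts** (read and send) so long token streams are not cut off mid-response.
* **CORS** → enable in the gateway (or via Ingress annotations) when browsers call the API directly.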
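
Both points can be sketched as ingress-nginx annotations. This is a fragment to merge into an Ingress's `metadata.annotations`, assuming the ingress-nginx controller; the allowed origin `https://app.example.com` is a hypothetical frontend, and the method and header lists are examples to adapt.

```yaml
metadata:
  annotations:
    # Streaming: keep the proxied connection open for long generations
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"   # seconds
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"   # seconds
    # CORS: allow a browser frontend to call the API
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://app.example.com"   # hypothetical origin
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-headers: "Authorization, Content-Type"
```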

---

## 🔍 Quick debug

```bash
kubectl -n llm get svc,ep
kubectl -n llm describe svc vllm-svc
kubectl -n llm run -it netcheck --image=curlimages/curl --rm --restart=Never -- curl -sS vllm-svc.llm.svc.cluster.local/health
kubectl -n llm logs deploy/vllm --tail=50 -f
```

---

✅ **Rule of thumb:** Start with **ClusterIP + port-forward** → use **LoadBalancer/Ingress** in cloud → add **gRPC** only if needed.

---

