<img src="https://opea.dev/wp-content/uploads/sites/9/2024/04/opea-horizontal-color.svg" alt="OPEA Logo">

# Deploy and Learn ChatQnA using OPEA on Intel Tiber AI Cloud
## Replace the LLM in Your Deployment

## 📦 What Is a Helm Chart?

A **Helm chart** is a collection of YAML templates that describe Kubernetes resources. Helm makes it easy to package, configure, and deploy applications to Kubernetes clusters. It is the de facto standard for Kubernetes application deployment.

Helm allows developers to:
- Define application components using templates
- Configure deployments using values files (`values.yaml`)
- Deploy applications with a single command

---




## 📦 OPEA Helm Chart Structure (OPEA GenAIInfra)

The OPEA Helm charts are part of the [GenAIInfra](https://github.com/opea-project/GenAIInfra) repository, which provides infrastructure templates for deploying multiple OPEA blueprints on Kubernetes.

For this example, we will explore the ChatQnA blueprint. The `chatqna` chart defines the resources required to deploy an end-to-end RAG-based question-answering application built from modular microservices (LLM backend, retriever service, data prep, etc.).

```text
helm-charts/
└── chatqna/
    ├── Chart.yaml                  # Chart metadata (e.g., name, version, description)
    ├── values.yaml                 # Default configuration values for the chart
    ├── templates/                  # Kubernetes resource templates
    │   ├── deployment.yaml         # Defines Pods, containers, and other resources for LLM, retriever, etc.
    │   ├── service.yaml            # Defines the networking setup and service exposure
    ├── cpu-values.yaml             # ✅ Optional config file for CPU-only environments
    └── ...                         # ✅ Additional optional values files for other configurations
```


## ⚙️ How to Customize a Helm Chart

### ⚙️ cpu-values.yaml — CPU-Optimized Configuration

The `cpu-values.yaml` file is a custom override used to configure the `chatqna` Helm chart to run efficiently on CPU-only infrastructure. This is particularly useful for developers working on local machines, CI/CD pipelines, or edge environments.

This file overrides default values found in `values.yaml`.

### 🔍 Key Overrides in `cpu-values.yaml`

```yaml
vllm:
  LLM_MODEL_ID: meta-llama/Meta-Llama-3-8B-Instruct
  # Uncomment for DeepSeek models:
  # VLLM_CPU_KVCACHE_SPACE: 40
  # resources:
  #   requests:
  #     memory: 60Gi
```

## Custom Configuration: `cpu-values.yaml`

The `cpu-values.yaml` file ships **in the Helm chart directory** itself ([`helm-charts/chatqna`](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/chatqna)) alongside `values.yaml`, but Helm does not load it automatically. It is an **override configuration** that you pass explicitly with `-f`.

### Benefits of Using Specific Configuration Files

- Keeps the chart generic and reusable.
- Allows multiple profiles like:
  - `cpu-values.yaml` – for CPU-based deployments.
  - `cpu-ollama-values.yaml` – for CPU-based deployments using Ollama as the LLM backend.
  - `cpu-milvus-values.yaml` – for CPU-based deployments using Milvus as the vector database.
- Makes upgrades and version tracking easier since your overrides are managed separately.

To deploy with `cpu-values.yaml` and switch the `chatqna` application to a different Hugging Face model (e.g., `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`), run the following command:

```python
%cd /home/devcloud/GenAI-Workshops/GenAIExamples/ChatQnA/kubernetes/helm

!helm upgrade --install chatqna oci://ghcr.io/opea-project/charts/chatqna \
  --set global.HUGGINGFACEHUB_API_TOKEN="your_huggingface_token" \
  --set vllm.LLM_MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
  --set vllm.VLLM_CPU_KVCACHE_SPACE=20 \
  --set vllm.resources.requests.memory=60Gi \
  -f cpu-values.yaml
```
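
If you prefer to assemble the same command from Python, for example to avoid pasting the token into the cell, a minimal sketch could look like the one below. The helper `build_helm_command` and the `HF_TOKEN` environment variable are assumptions made for illustration; they are not part of OPEA or Helm.

```python
import os
import shlex


def build_helm_command(release, chart, values_file, overrides):
    """Return the argument list for `helm upgrade --install` with --set overrides."""
    cmd = ["helm", "upgrade", "--install", release, chart]
    for key, value in overrides.items():
        cmd += ["--set", f"{key}={value}"]
    cmd += ["-f", values_file]
    return cmd


overrides = {
    # Assumption: the token is exported as HF_TOKEN; otherwise keep the placeholder.
    "global.HUGGINGFACEHUB_API_TOKEN": os.environ.get("HF_TOKEN", "your_huggingface_token"),
    "vllm.LLM_MODEL_ID": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    "vllm.VLLM_CPU_KVCACHE_SPACE": 20,
    "vllm.resources.requests.memory": "60Gi",
}

command = build_helm_command(
    "chatqna",
    "oci://ghcr.io/opea-project/charts/chatqna",
    "cpu-values.yaml",
    overrides,
)

# Print the command for review; nothing is executed here.
print(shlex.join(command))
```

Once the printed command looks right, it can be executed with `subprocess.run(command, check=True)` from the directory that contains `cpu-values.yaml`.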

## 🚀 What Happens When You Run This Helm Command

This Helm command installs (or upgrades) the `chatqna` application using the OPEA Helm chart hosted in GitHub Container Registry. Here's a breakdown of what happens step-by-step:

### 1. 🧠 Loads the Helm Chart
The command pulls the `chatqna` Helm chart from: `oci://ghcr.io/opea-project/charts/chatqna`


This chart includes Kubernetes manifests for deploying all required components such as:
- LLM (vLLM-based model serving)
- Retriever and vector database (e.g., Redis, Qdrant)
- Frontend and backend services
- ConfigMaps, Services, Deployments

### 2. ⚙️ Applies Your CPU-Specific Overrides
It loads custom settings from your `cpu-values.yaml` file to optimize the deployment for CPU-based environments.

The `--set` flags then override or add individual values. They take precedence over anything in the values file:

| Setting | Purpose |
|--------|---------|
| `vllm.LLM_MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` | Sets the LLM to serve. This distilled 7B-parameter model is small enough for CPU inference. |
| `vllm.VLLM_CPU_KVCACHE_SPACE=20` | Reserves 20 GiB for vLLM's KV cache, which stores attention keys and values so they are not recomputed for every generated token. |
| `vllm.resources.requests.memory=60Gi` | Requests 60 GiB of RAM for the pod running vLLM, so that Kubernetes schedules it on a node with enough memory. |
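
The layering described above (chart defaults, then the `-f` file, then `--set`) can be pictured as a series of nested dictionary merges. The sketch below only illustrates that precedence rule and is not Helm's implementation: the dictionaries are stand-ins rather than the real file contents, and unlike Helm it keeps every `--set` value as a string.

```python
import copy


def deep_merge(base, override):
    """Return a copy of `base` with `override` merged in, key by key for nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def parse_set(expression):
    """Turn 'a.b.c=value' into {'a': {'b': {'c': 'value'}}} (value kept as a string)."""
    path, value = expression.split("=", 1)
    nested = value
    for part in reversed(path.split(".")):
        nested = {part: nested}
    return nested


# Illustrative stand-ins, not the real contents of values.yaml or cpu-values.yaml.
chart_defaults = {"vllm": {"LLM_MODEL_ID": "some-default-model"}}
cpu_values = {"vllm": {"LLM_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct"}}
set_flags = [
    "vllm.LLM_MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    "vllm.VLLM_CPU_KVCACHE_SPACE=20",
    "vllm.resources.requests.memory=60Gi",
]

# The -f layer goes on top of the chart defaults ...
effective = deep_merge(chart_defaults, cpu_values)
# ... and each --set flag goes on top of both.
for flag in set_flags:
    effective = deep_merge(effective, parse_set(flag))

print(effective["vllm"])
```

The printed dictionary shows the DeepSeek model ID winning over both lower layers, while the keys that only appear in `--set` are simply added.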

### 3. 🚢 Installs or Updates the Release
Helm will:
- Install the chart as a new release named `chatqna` (if not installed yet)
- Or **upgrade** the existing `chatqna` release with the new settings

### 4. 📡 Starts the Pods
Kubernetes will:
- Pull the necessary container images (e.g. vLLM, retriever, UI)
- Allocate CPU and memory as specified
- Launch the containers
- Monitor readiness/liveness probes
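
To check from the notebook whether every pod has become ready, one option is to parse the JSON output of `kubectl get pods`. The following is a minimal sketch: the helper names are made up for this example, the sample data is hand-written in the shape of a Kubernetes `PodList`, and the pod names in it are invented.

```python
import json
import subprocess


def pods_not_ready(pod_list):
    """Return the names of pods with no container statuses or any container not ready."""
    waiting = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        if not statuses or not all(c.get("ready", False) for c in statuses):
            waiting.append(pod["metadata"]["name"])
    return waiting


def fetch_pods(namespace="default"):
    """Run `kubectl get pods -o json` and return the parsed PodList."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Hand-written sample in the PodList shape; the pod names are invented.
sample = {
    "items": [
        {"metadata": {"name": "chatqna-vllm-example"},
         "status": {"containerStatuses": [{"ready": False}]}},
        {"metadata": {"name": "chatqna-ui-example"},
         "status": {"containerStatuses": [{"ready": True}]}},
    ]
}
print(pods_not_ready(sample))
```

On a live cluster, `pods_not_ready(fetch_pods())` returns an empty list once all containers report ready. Large models can take several minutes to download and load, so the vLLM pod is usually the last one to get there.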

---

## ✅ Result: ChatQnA is Live

Once complete, you'll have the `chatqna` service running on your cluster with:

- A **DeepSeek-based LLM** running via vLLM on CPU
- A retriever connected to your vector database (Redis/Qdrant)
- Optional frontend and backend endpoints exposed through Kubernetes Services

You can now access the application (e.g., via port-forward) or query the model via API.
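
As a sketch of the API route: vLLM serves an OpenAI-compatible API, and its `/v1/models` endpoint lists the models it is serving. The helpers below are illustrative names, and the sample response is hand-written in the OpenAI list format rather than captured from a real deployment.

```python
import json
import urllib.request


def served_model_ids(models_response):
    """Extract the model IDs from an OpenAI-style `/v1/models` response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_models(base_url):
    """GET {base_url}/v1/models and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as response:
        return json.load(response)


# Hand-written sample in the OpenAI list format, used here instead of a live call.
sample = {
    "object": "list",
    "data": [{"id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", "object": "model"}],
}
print(served_model_ids(sample))
```

Against a running deployment, first forward a local port to the vLLM service, then call `served_model_ids(fetch_models("http://localhost:2080"))`. The local port `2080` is an arbitrary choice for this example; look up the actual service name and port with `kubectl get svc` before setting up the port-forward.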

---

## 🔍 Useful Follow-up Commands

Check pod status:
```bash
kubectl get pods
```

## ✅ Verifying That the Correct Model Is Running

After deploying your Helm chart, you’ll want to ensure that the expected model is actually being served by the application. Here's how to confirm this:

### 1. 📄 Check the Logs using `k9s` in your console
