59 changes: 49 additions & 10 deletions CodeGen/README.md
@@ -106,19 +106,58 @@ flowchart LR

This CodeGen example can be deployed manually on various hardware platforms using Docker Compose or Kubernetes. Select the appropriate guide based on your target environment:

| Hardware | Deployment Mode | Guide Link |
| :-------------- | :----------------------------------- | :--------------------------------------------------------------------------------------- |
| Intel Xeon CPU | Single Node (Docker) | [Xeon Docker Compose Guide](./docker_compose/intel/cpu/xeon/README.md) |
| Intel Xeon CPU | Single Node (Docker) with Monitoring | [Xeon Docker Compose with Monitoring Guide](./docker_compose/intel/cpu/xeon/README.md) |
| Intel Gaudi HPU | Single Node (Docker) | [Gaudi Docker Compose Guide](./docker_compose/intel/hpu/gaudi/README.md) |
| Intel Gaudi HPU | Single Node (Docker) with Monitoring | [Gaudi Docker Compose with Monitoring Guide](./docker_compose/intel/hpu/gaudi/README.md) |
| AMD EPYC CPU | Single Node (Docker) | [EPYC Docker Compose Guide](./docker_compose/amd/cpu/epyc/README.md) |
| AMD ROCm GPU | Single Node (Docker) | [ROCm Docker Compose Guide](./docker_compose/amd/gpu/rocm/README.md) |
| Intel Xeon CPU | Kubernetes (Helm) | [Kubernetes Helm Guide](./kubernetes/helm/README.md) |
| Intel Gaudi HPU | Kubernetes (Helm) | [Kubernetes Helm Guide](./kubernetes/helm/README.md) |
| Intel Xeon CPU | Kubernetes (GMC) | [Kubernetes GMC Guide](./kubernetes/gmc/README.md) |
| Intel Gaudi HPU | Kubernetes (GMC) | [Kubernetes GMC Guide](./kubernetes/gmc/README.md) |

_Note: Building custom microservice images can be done using the resources in [GenAIComps](https://github.com/opea-project/GenAIComps)._

## Monitoring

The CodeGen example supports monitoring capabilities for Intel Xeon and Intel Gaudi platforms. Monitoring includes:

- **Prometheus**: For metrics collection and querying
- **Grafana**: For visualization and dashboards
- **Node Exporter**: For system metrics collection

### Monitoring Features

- Real-time metrics collection from all CodeGen microservices
- Pre-configured dashboards for:
  - vLLM/TGI performance metrics
  - CodeGen MegaService metrics
  - System resource utilization
  - Node-level metrics

### Enabling Monitoring

Monitoring can be enabled by using the `compose.monitoring.yaml` file along with the main compose file:

```bash
# For Intel Xeon (run from docker_compose/intel/cpu/xeon)
docker compose -f compose.yaml -f compose.monitoring.yaml up -d

# For Intel Gaudi (run from docker_compose/intel/hpu/gaudi)
docker compose -f compose.yaml -f compose.monitoring.yaml up -d
```

### Accessing Monitoring Services

Once deployed with monitoring, you can access:

- **Prometheus**: `http://${HOST_IP}:9090`
- **Grafana**: `http://${HOST_IP}:3000` (username: `admin`, password: `admin`)
- **Node Exporter**: `http://${HOST_IP}:9100`
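A quick way to sanity-check these endpoints once the stack is up is a few `curl` probes. This is a minimal sketch: `HOST_IP` falls back to `localhost` if unset, and the health paths shown are the standard ones for each tool.

```shell
# Endpoints from the list above; HOST_IP falls back to localhost if unset.
HOST_IP="${HOST_IP:-localhost}"
PROMETHEUS_URL="http://${HOST_IP}:9090"
GRAFANA_URL="http://${HOST_IP}:3000"
NODE_EXPORTER_URL="http://${HOST_IP}:9100"

# Uncomment once the monitoring stack is running:
# curl -sf "${PROMETHEUS_URL}/-/ready"        # exits 0 when Prometheus is ready
# curl -sf "${GRAFANA_URL}/api/health"        # JSON health report from Grafana
# curl -sf "${NODE_EXPORTER_URL}/metrics" | head -n 5

echo "Prometheus: ${PROMETHEUS_URL}  Grafana: ${GRAFANA_URL}  Node Exporter: ${NODE_EXPORTER_URL}"
```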

## Benchmarking

Guides for evaluating the performance and accuracy of this CodeGen deployment are available:
63 changes: 60 additions & 3 deletions CodeGen/docker_compose/intel/cpu/xeon/README.md
@@ -49,7 +49,8 @@ This uses the default vLLM-based deployment using `compose.yaml`.
# export https_proxy="your_https_proxy"
# export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
source intel/set_env.sh
cd intel/cpu/xeon
bash grafana/dashboards/download_opea_dashboard.sh
```

_Note: The compose file might read additional variables from set_env.sh. Ensure all required variables like ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.) are set if not using defaults from the compose file._
@@ -146,7 +147,7 @@ Key parameters are configured via environment variables set before running `dock
Most of these parameters are in `set_env.sh`, you can either modify this file or overwrite the env variables by setting them.

```shell
source CodeGen/docker_compose/intel/set_env.sh
```

#### Compose Files
@@ -252,7 +253,63 @@ Users can interact with the backend service using the `Neural Copilot` VS Code e
- **"Container name is in use"**: Stop existing containers (`docker compose down`) or change `container_name` in the compose file.
- **Resource Issues:** CodeGen models can be memory-intensive. Monitor host RAM usage. Increase Docker resources if needed.

## Monitoring Deployment

To enable monitoring for the CodeGen application, you can use the monitoring Docker Compose file along with the main deployment.

### Option #1: Default Deployment (without monitoring)

To deploy the CodeGen services without monitoring, execute:

```bash
docker compose up -d
```

### Option #2: Deployment with Monitoring

> NOTE: To enable monitoring, the `compose.monitoring.yaml` file needs to be applied together with the default `compose.yaml` file.

To deploy with monitoring:

```bash
bash grafana/dashboards/download_opea_dashboard.sh
docker compose -f compose.yaml -f compose.monitoring.yaml up -d
```
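Once `up -d` returns, the monitoring containers can be verified by name. This is a sketch: container names are taken from `compose.monitoring.yaml`, and the command degrades gracefully where Docker is unavailable.

```shell
# Check that the monitoring containers are up (names from compose.monitoring.yaml).
STATUS=$(command -v docker > /dev/null 2>&1 \
  && docker ps --filter "name=opea_prometheus" --filter "name=grafana" --filter "name=node-exporter" \
       --format '{{.Names}}: {{.Status}}' \
  || echo "docker not available")
# Fall back to a message if docker ran but found no matching containers.
STATUS="${STATUS:-no monitoring containers found}"
echo "$STATUS"
```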

### Accessing Monitoring Services

Once deployed with monitoring, you can access:

- **Prometheus**: `http://${HOST_IP}:9090`
- **Grafana**: `http://${HOST_IP}:3000` (username: `admin`, password: `admin`)
- **Node Exporter**: `http://${HOST_IP}:9100`

### Monitoring Components

The monitoring stack includes:

- **Prometheus**: For metrics collection and querying
- **Grafana**: For visualization and dashboards
- **Node Exporter**: For system metrics collection

### Monitoring Dashboards

The following dashboards are automatically downloaded and configured:

- vLLM Dashboard
- TGI Dashboard
- CodeGen MegaService Dashboard
- Node Exporter Dashboard

### Stopping the Application

If monitoring is enabled, execute the following command:

```bash
docker compose -f compose.yaml -f compose.monitoring.yaml down
```

If monitoring is not enabled, execute:

```bash
docker compose down # for vLLM (compose.yaml)
58 changes: 58 additions & 0 deletions CodeGen/docker_compose/intel/cpu/xeon/compose.monitoring.yaml
@@ -0,0 +1,58 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:
  prometheus:
    image: prom/prometheus:v2.52.0
    container_name: opea_prometheus
    user: root
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yaml
      - ./prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yaml'
    ports:
      - '9090:9090'
    ipc: host
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.0.0
    container_name: grafana
    volumes:
      - ./grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/var/lib/grafana/dashboards
      - ./grafana/provisioning:/etc/grafana/provisioning
    user: root
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
      GF_RENDERING_CALLBACK_URL: http://grafana:3000/
      GF_LOG_FILTERS: rendering:debug
      no_proxy: ${no_proxy}
      host_ip: ${host_ip}
    depends_on:
      - prometheus
    ports:
      - '3000:3000'
    ipc: host
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.ignored-mount-points'
      - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
    environment:
      no_proxy: ${no_proxy}
    ports:
      - 9100:9100
    restart: always
    deploy:
      mode: global
@@ -0,0 +1,13 @@
#!/bin/bash
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
if ls *.json 1> /dev/null 2>&1; then
  rm *.json
fi

wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/vllm_grafana.json
wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/tgi_grafana.json
wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/codegen_megaservice_grafana.json
wget https://raw.githubusercontent.com/opea-project/GenAIEval/refs/heads/main/evals/benchmark/grafana/node_grafana.json
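Assuming the script above has been run, the four dashboard JSON files it fetches should sit next to it. A small check sketch (the filenames mirror the `wget` calls):

```shell
# Dashboard files expected after running download_opea_dashboard.sh
DASHBOARDS="vllm_grafana.json tgi_grafana.json codegen_megaservice_grafana.json node_grafana.json"
for f in $DASHBOARDS; do
  if [ -f "$f" ]; then echo "$f: present"; else echo "$f: missing"; fi
done
```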
@@ -0,0 +1,14 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10 # how often Grafana will scan for changed dashboards
    options:
      path: /var/lib/grafana/dashboards
@@ -0,0 +1,54 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# config file version
apiVersion: 1

# list of datasources that should be deleted from the database
deleteDatasources:
  - name: Prometheus
    orgId: 1

# list of datasources to insert/update depending on
# what's available in the database
datasources:
  # <string, required> name of the datasource. Required
  - name: Prometheus
    # <string, required> datasource type. Required
    type: prometheus
    # <string, required> access mode. direct or proxy. Required
    access: proxy
    # <int> org id. will default to orgId 1 if not specified
    orgId: 1
    # <string> url
    url: http://$host_ip:9090
    # <string> database password, if used
    password:
    # <string> database user, if used
    user:
    # <string> database name, if used
    database:
    # <bool> enable/disable basic auth
    basicAuth: false
    # <string> basic auth username, if used
    basicAuthUser:
    # <string> basic auth password, if used
    basicAuthPassword:
    # <bool> enable/disable with credentials headers
    withCredentials:
    # <bool> mark as default datasource. Max one per org
    isDefault: true
    # <map> fields that will be converted to json and stored in json_data
    jsonData:
      httpMethod: GET
      graphiteVersion: "1.1"
      tlsAuth: false
      tlsAuthWithCACert: false
    # <string> json object of data that will be encrypted.
    secureJsonData:
      tlsCACert: "..."
      tlsClientCert: "..."
      tlsClientKey: "..."
    version: 1
    # <bool> allow users to edit datasources from the UI.
    editable: true
27 changes: 27 additions & 0 deletions CodeGen/docker_compose/intel/cpu/xeon/prometheus.yaml
@@ -0,0 +1,27 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# [IP_ADDR]:{PORT_OUTSIDE_CONTAINER} -> {PORT_INSIDE_CONTAINER} / {PROTOCOL}
global:
  scrape_interval: 5s
  external_labels:
    monitor: "my-monitor"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["opea_prometheus:9090"]
  - job_name: "vllm"
    metrics_path: /metrics
    static_configs:
      - targets: ["vllm-server:80"]
  - job_name: "tgi"
    metrics_path: /metrics
    static_configs:
      - targets: ["tgi-service:80"]
  - job_name: "codegen-backend-server"
    metrics_path: /metrics
    static_configs:
      - targets: ["codegen-xeon-backend-server:7778"]
  - job_name: "prometheus-node-exporter"
    metrics_path: /metrics
    static_configs:
      - targets: ["node-exporter:9100"]
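Each `job_name` in this file should appear as an active target once Prometheus is scraping. A hedged sketch of checking this via the Prometheus HTTP API (requires the stack to be running; `jq` is optional, for readable output):

```shell
# Prometheus targets endpoint; each job in prometheus.yaml should report health "up".
TARGETS_URL="http://${HOST_IP:-localhost}:9090/api/v1/targets"

# Uncomment with the stack running:
# curl -s "$TARGETS_URL" | jq -r '.data.activeTargets[] | "\(.labels.job): \(.health)"'

echo "$TARGETS_URL"
```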
62 changes: 60 additions & 2 deletions CodeGen/docker_compose/intel/hpu/gaudi/README.md
@@ -49,7 +49,10 @@ This uses the default vLLM-based deployment using `compose.yaml`.
# export https_proxy="your_https_proxy"
# export no_proxy="localhost,127.0.0.1,${HOST_IP}" # Add other hosts if necessary
source intel/set_env.sh
cd intel/hpu/gaudi
cd grafana/dashboards
bash download_opea_dashboard.sh
cd ../..
```

_Note: The compose file might read additional variables from set_env.sh. Ensure all required variables like ports (`LLM_SERVICE_PORT`, `MEGA_SERVICE_PORT`, etc.) are set if not using defaults from the compose file._
@@ -228,7 +231,62 @@ Use the `Neural Copilot` extension configured with the CodeGen backend URL: `htt
- **Model Download Issues:** Check `HF_TOKEN`, internet access, proxy settings. Check LLM service logs.
- **Connection Errors:** Verify `HOST_IP`, ports, and proxy settings. Use `docker ps` and check service logs.

## Monitoring Deployment

To enable monitoring for the CodeGen application on Gaudi, you can use the monitoring Docker Compose file along with the main deployment.

### Option #1: Default Deployment (without monitoring)

To deploy the CodeGen services without monitoring, execute:

```bash
docker compose up -d
```

### Option #2: Deployment with Monitoring

> NOTE: To enable monitoring, the `compose.monitoring.yaml` file needs to be applied together with the default `compose.yaml` file.

To deploy with monitoring:

```bash
docker compose -f compose.yaml -f compose.monitoring.yaml up -d
```

### Accessing Monitoring Services

Once deployed with monitoring, you can access:

- **Prometheus**: `http://${HOST_IP}:9090`
- **Grafana**: `http://${HOST_IP}:3000` (username: `admin`, password: `admin`)
- **Node Exporter**: `http://${HOST_IP}:9100`

### Monitoring Components

The monitoring stack includes:

- **Prometheus**: For metrics collection and querying
- **Grafana**: For visualization and dashboards
- **Node Exporter**: For system metrics collection

### Monitoring Dashboards

The following dashboards are automatically downloaded and configured:

- vLLM Dashboard
- TGI Dashboard
- CodeGen MegaService Dashboard
- Node Exporter Dashboard

### Stopping the Application

If monitoring is enabled, execute the following command:

```bash
docker compose -f compose.yaml -f compose.monitoring.yaml down
```

If monitoring is not enabled, execute:

```bash
docker compose down # for vLLM (compose.yaml)