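Deploy Prometheus and Grafana to Monitor Fluid Applications

Note: Prometheus must be deployed in-cluster.

1. Deploy or configure Prometheus

If your cluster does not have Prometheus yet, follow the installation guide to set it up correctly in your production environment.

If the cluster already runs Prometheus, add the following configuration to the Prometheus configuration file: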

scrape_configs:
  - job_name: 'alluxio runtime'
    metrics_path: /metrics/prometheus
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_monitor]
      regex: alluxio_runtime_metrics
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: web
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_label_release]
      target_label: fluid_runtime
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_endpoint_address_target_name]
      target_label: pod
      replacement: $1
      action: replace
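
To make the relabeling rules above concrete, here is a minimal shell sketch that mimics what they do to a single discovered endpoint. The function name `relabel_target` and its argument order are hypothetical; it is not part of Prometheus or Fluid. It relies on two Prometheus behaviours: relabel regexes are anchored, so the two `keep` rules act as exact matches, and the default regex combined with `replacement: $1` copies the source value unchanged.

```shell
# Hypothetical helper, for illustration only: emulates the scrape job's
# relabel_configs for one discovered endpoint.
# Arguments: <service label "monitor"> <endpoint port name> <namespace>
#            <service label "release"> <endpoint target name>
relabel_target() {
  # keep rule 1: the Service must carry monitor=alluxio_runtime_metrics
  [ "$1" = "alluxio_runtime_metrics" ] || return 1
  # keep rule 2: only the port named "web" is scraped
  [ "$2" = "web" ] || return 1
  # replace rules: copy the values into namespace, fluid_runtime and pod
  printf 'namespace="%s" fluid_runtime="%s" pod="%s"\n' "$3" "$4" "$5"
}
```

Called with illustrative values such as `relabel_target alluxio_runtime_metrics web default spark spark-master-0`, it prints the label set the target would end up with. With any other `monitor` value or port name it prints nothing and returns a non-zero status, which corresponds to the target being dropped.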

2. Deploy Grafana

# Deploy with Docker
$ docker run -d \
  -p 3000:3000 \
  --name=grafana \
  --restart=always \
  grafana/grafana

To deploy Grafana in Kubernetes instead, refer to the documentation.

3. Configure Grafana

  1. Log in to Grafana. If Grafana was deployed with Docker, visit http://$grafana-node-ip:3000; if it was deployed in-cluster, visit http://$grafana-node-ip:NodePort. The default username and password are admin:admin.
# Look up the NodePort
$ kubectl describe svc monitoring-grafana -n kube-system
  2. Check the port of the Prometheus service
$ kubectl get svc -n kube-system | grep prometheus-svc
prometheus-svc             NodePort    10.100.0.144   <none>        9090:31245/TCP           22h
$ kubectl describe svc prometheus-svc -n kube-system
Name:                     prometheus-svc
Namespace:                kube-system
Labels:                   kubernetes.io/name=Prometheus
                          name=prometheus-svc
Annotations:              kubectl.kubernetes.io/last-applied-configuration:
                            {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"kubernetes.io/name":"Prometheus","name":"prometheus-svc"},"nam...
Selector:                 app=prometheus
Type:                     NodePort
IP:                       10.100.0.144
Port:                     prometheus  9090/TCP
TargetPort:               9090/TCP
NodePort:                 prometheus  31245/TCP
Endpoints:                10.99.224.138:9090
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
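  3. Configure the Prometheus data source

Note: If Grafana is deployed in-cluster, set the URL to the Service Endpoints; if it is deployed with Docker, set the URL to <Prometheus node IP>:NodePort. After entering the URL, click Save & Test; the message "Data source is working" indicates success.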
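
For the Docker case, the data source URL combines a node IP with the NodePort that appears in the `kubectl describe svc` output. The sketch below shows one way to pull that number out and assemble the URL. Both function names are hypothetical, and the node IP is a placeholder you have to supply yourself.

```shell
# Hypothetical helpers, for illustration only.

# Reads `kubectl describe svc` output on stdin and prints the first NodePort number.
nodeport_from_describe() {
  # The relevant line looks like: "NodePort:   prometheus  31245/TCP"
  awk '$1 == "NodePort:" { split($NF, parts, "/"); print parts[1]; exit }'
}

# Builds the data source URL from a node IP and a NodePort.
prometheus_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Example usage (requires cluster access; NODE_IP is a placeholder you must set):
# port="$(kubectl describe svc prometheus-svc -n kube-system | nodeport_from_describe)"
# prometheus_url "$NODE_IP" "$port"
```

As an alternative to parsing text, kubectl can print the value directly with `-o jsonpath='{.spec.ports[0].nodePort}'`.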

  4. Import the template file. In Grafana, choose to import the JSON template file fluid-prometheus-grafana-monitor.json, which is located at integration/prometheus/fluid-prometheus-grafana-monitor.json.

  5. Start a Fluid job

$ cat<<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: spark
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 1Gi
        high: "0.95"
        low: "0.7"
  # Since v0.5.0 the Alluxio runtime exposes Prometheus metrics by default; to turn this off, explicitly set disablePrometheus: true
  # disablePrometheus: false
EOF

Note: Prometheus metrics are enabled by default. To disable them, set disablePrometheus: true (the default is false).
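
As a sketch of how the switch could be flipped in the generated file, the hypothetical helper below rewrites the commented-out `# disablePrometheus: false` line into an active `disablePrometheus: true` setting. It assumes the line is indented by two spaces as in the dataset.yaml above, which places it directly under `spec` of the AlluxioRuntime.

```shell
# Hypothetical helper, for illustration only: prints the given manifest with
# the commented-out disablePrometheus line turned into "disablePrometheus: true".
# Assumes the two-space indentation used in dataset.yaml above.
disable_prometheus() {
  sed 's/^  # disablePrometheus: false[[:space:]]*$/  disablePrometheus: true/' "$1"
}

# Example usage:
# disable_prometheus dataset.yaml > dataset-no-metrics.yaml
```

The helper writes to standard output and leaves the input file untouched, so the original manifest stays available for comparison.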

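  6. View the monitoring dashboard. On the Grafana home page, find the dashboard named Fluid-Prometheus-Grafana-Monitor.

Note: User of runtime corresponds to the Fluid Alluxio runtime user; fluid_runtime corresponds to the Fluid runtime name; namespace corresponds to the Fluid runtime namespace.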
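
One way to check that these labels are actually attached to the scraped targets is to query Prometheus for its built-in `up` series, filtered by the job name and labels from the scrape configuration in step 1. The helper name `runtime_up_query` is hypothetical; `up`, the `job` label, and the `/api/v1/query` endpoint are standard Prometheus.

```shell
# Hypothetical helper, for illustration only: prints a PromQL expression that
# selects the scrape targets of one Fluid runtime.
# Arguments: <runtime name> <namespace>
runtime_up_query() {
  printf 'up{job="alluxio runtime",fluid_runtime="%s",namespace="%s"}\n' "$1" "$2"
}

# Example usage (PROM_URL is a placeholder such as http://<node-ip>:<NodePort>):
# curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$(runtime_up_query spark default)"
```

If the query returns no series for a runtime that is running, the `keep` rules in the scrape configuration are the first place to look.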