update docs for new plugins
qmhu committed Jun 13, 2023
1 parent ba75922 commit 94f37f4
Showing 7 changed files with 325 additions and 14 deletions.
37 changes: 37 additions & 0 deletions site/content/en/docs/Tutorials/Recommendation/pv-recommendation.md
@@ -0,0 +1,37 @@
---
title: "PV Recommendation"
description: "Introduction to the PV recommendation feature"
weight: 15
---

PV recommendation scans the state of the PVs in the cluster to help users find idle Kubernetes PVs.

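## Motivation

In Kubernetes, PVs are typically used together with workloads to automatically create and manage storage volumes and mount them into applications. In day-to-day operations, idle or unused volumes inevitably accumulate and waste significant cost; PV recommendation helps users find these PVs so they can optimize costs.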

## Recommendation Example

```yaml

```

In this example:

- The recommendation's TargetRef points to the PV:
- The recommendation type is PV recommendation
- The action is Delete; this is only a suggestion

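## How It Works

PV recommendation completes one recommendation pass in the following steps:

1. Scan all PVs in the cluster and find the list of Pods that use each PV.
2. If a PV has no corresponding PVC, it is judged to be an idle PV.
3. If no Pod is associated with the PV and its PVC, it is judged to be an idle PVC.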
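The three steps above amount to a simple classification. The following is a minimal Go sketch of that decision, not Crane's code: `volumeInfo` and `classifyPV` are hypothetical names, and the struct stands in for the PV, PVC, and Pod objects that a real recommender would read from the Kubernetes API.

```go
package main

import "fmt"

// volumeInfo is a hypothetical, simplified view of a PV and what references it.
type volumeInfo struct {
	Name     string
	ClaimRef string   // name of the bound PVC; empty if the PV has no PVC
	Pods     []string // Pods that mount the PVC bound to this PV
}

// classifyPV follows the documented steps:
// no PVC -> idle PV; a PVC that no Pod uses -> idle PVC; otherwise in use.
func classifyPV(v volumeInfo) string {
	if v.ClaimRef == "" {
		return "idle PV"
	}
	if len(v.Pods) == 0 {
		return "idle PVC"
	}
	return "in use"
}

func main() {
	// Hypothetical volumes covering each branch.
	vols := []volumeInfo{
		{Name: "pv-a"},
		{Name: "pv-b", ClaimRef: "data-b"},
		{Name: "pv-c", ClaimRef: "data-c", Pods: []string{"web-0"}},
	}
	for _, v := range vols {
		fmt.Printf("%s: %s\n", v.Name, classifyPV(v))
	}
}
```

A recommender built on this idea would emit a Delete suggestion only for the first two verdicts.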

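## Configuration

PV recommendation currently has no configurable parameters.

For how to update the recommendation configuration, see [**Recommendation Framework**](/zh-cn/docs/tutorials/recommendation/recommendation-framework)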
@@ -128,6 +128,8 @@ Currently, Crane supports these Recommenders:
- [**Replicas Recommendation**](/docs/tutorials/recommendation/replicas-recommendation): Use the HPA algorithm to analyze the actual usage of applications and recommend more appropriate replica configurations.
- [**HPA Recommendation**](/docs/tutorials/recommendation/hpa-recommendation): Scan the Workloads in a cluster and recommend HPA configurations for Workloads that are suitable for horizontal autoscaling.
- [**IdleNode Recommendation**](/docs/tutorials/recommendation/idlenode-recommendation): Find the idle nodes in the cluster.
- [**Service Recommendation**](/zh-cn/docs/tutorials/recommendation/service-recommendation): Find idle LoadBalancer Services in the cluster.
- [**PV Recommendation**](/zh-cn/docs/tutorials/recommendation/pv-recommendation): Find idle persistent volumes in the cluster.

### Recommender Framework

@@ -0,0 +1,97 @@
---
title: "Service Recommendation"
description: "Introduction to the Service recommendation feature"
weight: 15
---

Service recommendation scans the state of the Services in the cluster to help users find idle Kubernetes Services.

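## Motivation

In Kubernetes, a Service is usually combined with a workload to automatically create and manage a load balancer and attach it to the application. In day-to-day operations, idle and underutilized load balancers inevitably appear and waste significant cost; Service recommendation helps users find these Services so they can optimize costs.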

## Recommendation Example

```yaml
apiVersion: analysis.crane.io/v1alpha1
kind: Recommendation
metadata:
  annotations:
    analysis.crane.io/last-start-time: "2023-06-12 11:52:23"
    analysis.crane.io/message: Success
    analysis.crane.io/run-number: "7823"
  creationTimestamp: "2023-06-12T09:44:23Z"
  labels:
    analysis.crane.io/recommendation-rule-name: service-rule
    analysis.crane.io/recommendation-rule-recommender: Service
    analysis.crane.io/recommendation-rule-uid: 67807cd9-b4c9-4d63-8493-d330ccace364
    analysis.crane.io/recommendation-target-kind: Service
    analysis.crane.io/recommendation-target-name: nginx
    analysis.crane.io/recommendation-target-namespace: crane-system
    analysis.crane.io/recommendation-target-version: v1
  name: service-rule-service-cnwt5
  namespace: crane-system
  ownerReferences:
  - apiVersion: analysis.crane.io/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: RecommendationRule
    name: service-rule
    uid: 67807cd9-b4c9-4d63-8493-d330ccace364
spec:
  adoptionType: StatusAndAnnotation
  completionStrategy:
    completionStrategyType: Once
  targetRef:
    apiVersion: v1
    kind: Service
    name: nginx
    namespace: crane-system
  type: Service
status:
  action: Delete
  description: It is a Orphan Service, Pod count is 0
  lastUpdateTime: "2023-06-12T11:52:23Z"
```

In this example:

- The recommendation's TargetRef points to the Service nginx
- The recommendation type is Service recommendation
- The action is Delete; this is only a suggestion

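## How It Works

Service recommendation completes one recommendation pass in the following steps:

1. Scan all Services of type LoadBalancer in the cluster.
2. If the Service's Endpoints contain any Addresses or NotReadyAddresses, the Service is not idle.
3. Using the traffic-related metrics configured for Service recommendation, check whether the Service's traffic is below the thresholds; if it is, the Service is judged to be idle.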
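One way to read the three steps is as successive filters that a Service must pass to be reported as idle. The Go sketch below encodes that reading; it is an illustration under that assumption, not Crane's implementation. `serviceSnapshot`, `thresholds`, and `isIdleService` are hypothetical, and a zero threshold is treated as "do not check", matching the documented defaults.

```go
package main

import "fmt"

// serviceSnapshot is a hypothetical, simplified view of one Service.
type serviceSnapshot struct {
	Type          string  // spec.type, e.g. "LoadBalancer" or "ClusterIP"
	Addresses     int     // total Addresses across the Endpoints subsets
	NotReady      int     // total NotReadyAddresses across the Endpoints subsets
	RecvBytesRate float64 // observed receive rate, already reduced to one value
	SendBytesRate float64 // observed transmit rate, already reduced to one value
}

// thresholds mirrors net-receive-bytes / net-transfer-bytes; 0 disables a check.
type thresholds struct {
	RecvBytes float64
	SendBytes float64
}

// belowThreshold reports whether value passes an optional upper bound.
func belowThreshold(value, limit float64) bool {
	return limit == 0 || value < limit
}

// isIdleService applies the documented steps as successive filters.
func isIdleService(s serviceSnapshot, t thresholds) bool {
	if s.Type != "LoadBalancer" { // step 1: only LoadBalancer Services are analyzed
		return false
	}
	if s.Addresses > 0 || s.NotReady > 0 { // step 2: any endpoint address means not idle
		return false
	}
	// step 3: traffic must be below every enabled threshold
	return belowThreshold(s.RecvBytesRate, t.RecvBytes) &&
		belowThreshold(s.SendBytesRate, t.SendBytes)
}

func main() {
	// An orphan LoadBalancer Service like the nginx example above.
	orphan := serviceSnapshot{Type: "LoadBalancer"}
	fmt.Println(isIdleService(orphan, thresholds{})) // true
}
```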

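## How to Verify the Recommendation

The following PromQL query templates are used to check a Service's traffic against the thresholds. When verifying, substitute the actual namespace and a regular expression matching the Service's Pods for the two `%s` placeholders.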

```go
const (
	// Rate of the cumulative container counter of bytes received, summed over the Service's Pods (args: namespace, pod regex)
	queryFmtNetReceiveBytes = `sum(rate(container_network_receive_bytes_total{namespace="%s",pod=~"%s",container!=""}[3m]))`
	// Rate of the cumulative container counter of bytes transmitted, summed over the Service's Pods (args: namespace, pod regex)
	queryFmtNetTransferBytes = `sum(rate(container_network_transmit_bytes_total{namespace="%s",pod=~"%s",container!=""}[3m]))`
)
```

## Supported Resource Types

Only the Service type is supported; currently only Services of type LoadBalancer are analyzed.

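## Configuration

| Parameter | Default | Description |
|----------|-----|---------------------------------|
| net-receive-bytes | 0 | Threshold for network bytes received by the Service's Pods; 0 (the default) disables the check |
| net-receive-percentile | 0.99 | Percentile used when computing received network traffic |
| net-transfer-bytes | 0 | Threshold for network bytes transmitted by the Service's Pods; 0 (the default) disables the check |
| net-transfer-percentile | 0.99 | Percentile used when computing transmitted network traffic |

Note: when a Pod has liveness/readiness probes configured, kubelet probing generates some container traffic, so the traffic thresholds should be set somewhat higher; tune them against your actual monitoring data.

For how to update the recommendation configuration, see [**Recommendation Framework**](/zh-cn/docs/tutorials/recommendation/recommendation-framework)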
@@ -13,39 +13,46 @@ weight: 15
## Recommendation Example

```yaml
apiVersion: analysis.crane.io/v1alpha1
kind: Recommendation
metadata:
  annotations:
    analysis.crane.io/last-start-time: "2023-06-09 09:46:33"
    analysis.crane.io/message: Success
    analysis.crane.io/run-number: "111"
  creationTimestamp: "2023-05-31T11:06:10Z"
  generateName: idlenodes-rule-idlenode-
  generation: 111
  labels:
    analysis.crane.io/recommendation-rule-name: idlenodes-rule
    analysis.crane.io/recommendation-rule-recommender: IdleNode
    analysis.crane.io/recommendation-rule-uid: 25bf5a49-e78f-4f42-8e67-36c0b1b9bb5b
    analysis.crane.io/recommendation-target-kind: Node
    analysis.crane.io/recommendation-target-name: worker-node-1
    analysis.crane.io/recommendation-target-namespace: ""
    analysis.crane.io/recommendation-target-version: v1
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: bareMetal
    beta.kubernetes.io/os: linux
  name: idlenodes-rule-idlenode-px2ck
  namespace: crane-system
  ownerReferences:
  - apiVersion: analysis.crane.io/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: RecommendationRule
    name: idlenodes-rule
    uid: 25bf5a49-e78f-4f42-8e67-36c0b1b9bb5b
spec:
  adoptionType: StatusAndAnnotation
  completionStrategy:
    completionStrategyType: Once
  targetRef:
    apiVersion: v1
    kind: Node
    name: worker-node-1
  type: IdleNode
status:
  action: Delete
  description: Node is owned by DaemonSet
  lastUpdateTime: "2023-06-09T09:46:33Z"
```

In this example:
@@ -60,4 +67,36 @@ status:

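1. Scan all nodes in the cluster and the Pods on each node.
2. If every Pod on a node belongs to a DaemonSet, the node is judged to be idle.
3. Check, according to the other IdleNode settings, whether the node is below the thresholds; if it is, the node is judged to be idle.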
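Step 2 can be sketched in Go as a check on each Pod's controller owner. This is an illustration, not Crane's code: `podInfo` and `onlyDaemonSetPods` are hypothetical, the owner kind stands in for the Pod's controller `ownerReferences` entry, and the threshold check of step 3 is left out.

```go
package main

import "fmt"

// podInfo is a hypothetical, simplified Pod: its name and the kind of its controller.
type podInfo struct {
	Name      string
	OwnerKind string // e.g. "DaemonSet", "ReplicaSet"; empty if the Pod has no controller
}

// onlyDaemonSetPods reports whether every Pod on a node is owned by a DaemonSet.
// A node with no Pods at all also qualifies (the condition holds vacuously).
func onlyDaemonSetPods(pods []podInfo) bool {
	for _, p := range pods {
		if p.OwnerKind != "DaemonSet" {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical Pods on worker-node-1.
	pods := []podInfo{
		{Name: "kube-proxy-x7k2p", OwnerKind: "DaemonSet"},
		{Name: "node-exporter-4q9zt", OwnerKind: "DaemonSet"},
	}
	fmt.Println(onlyDaemonSetPods(pods)) // true
}
```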

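## How to Verify the Recommendation

The following PromQL query templates are used to compute a node's resource utilization for the threshold checks. When verifying, replace the node placeholders (`%s`) with the actual node name.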

```go
const (
	// NodeCpuRequestUtilizationExprTemplate queries a node's CPU request utilization (CPU requests / CPU capacity); both %s take the node name as scraped by Prometheus
	NodeCpuRequestUtilizationExprTemplate = `sum(kube_pod_container_resource_requests{node="%s", resource="cpu", unit="core"} * on (node) group_left() max(kube_node_labels{label_beta_kubernetes_io_instance_type!~"eklet", label_node_kubernetes_io_instance_type!~"eklet"}) by (node)) by (node) / sum(kube_node_status_capacity{node="%s", resource="cpu", unit="core"} * on (node) group_left() max(kube_node_labels{label_beta_kubernetes_io_instance_type!~"eklet", label_node_kubernetes_io_instance_type!~"eklet"}) by (node)) by (node) `
	// NodeMemRequestUtilizationExprTemplate queries a node's memory request utilization (memory requests / memory capacity); both %s take the node name as scraped by Prometheus
	NodeMemRequestUtilizationExprTemplate = `sum(kube_pod_container_resource_requests{node="%s", resource="memory", unit="byte", namespace!=""} * on (node) group_left() max(kube_node_labels{label_beta_kubernetes_io_instance_type!~"eklet", label_node_kubernetes_io_instance_type!~"eklet"}) by (node)) by (node) / sum(kube_node_status_capacity{node="%s", resource="memory", unit="byte"} * on (node) group_left() max(kube_node_labels{label_beta_kubernetes_io_instance_type!~"eklet", label_node_kubernetes_io_instance_type!~"eklet"}) by (node)) by (node) `
	// NodeCpuUsageUtilizationExprTemplate queries a node's CPU usage utilization (CPU usage / CPU capacity); both %s take the node name, the first matched against the instance label
	NodeCpuUsageUtilizationExprTemplate = `sum(label_replace(irate(container_cpu_usage_seconds_total{instance="%s", container!="POD", container!="",image!=""}[1h]), "node", "$1", "instance", "(^[^:]+)") * on (node) group_left() max(kube_node_labels{label_beta_kubernetes_io_instance_type!~"eklet", label_node_kubernetes_io_instance_type!~"eklet"}) by (node)) by (node) / sum(kube_node_status_capacity{node="%s", resource="cpu", unit="core"} * on (node) group_left() max(kube_node_labels{label_beta_kubernetes_io_instance_type!~"eklet", label_node_kubernetes_io_instance_type!~"eklet"}) by (node)) by (node) `
	// NodeMemUsageUtilizationExprTemplate queries a node's memory usage utilization (memory usage / memory capacity); both %s take the node name, the first matched against the instance label
	NodeMemUsageUtilizationExprTemplate = `sum(label_replace(container_memory_usage_bytes{instance="%s", namespace!="",container!="POD", container!="",image!=""}, "node", "$1", "instance", "(^[^:]+)") * on (node) group_left() max(kube_node_labels{label_beta_kubernetes_io_instance_type!~"eklet", label_node_kubernetes_io_instance_type!~"eklet"}) by (node)) by (node) / sum(kube_node_status_capacity{node="%s", resource="memory", unit="byte"} * on (node) group_left() max(kube_node_labels{label_beta_kubernetes_io_instance_type!~"eklet", label_node_kubernetes_io_instance_type!~"eklet"}) by (node)) by (node) `
)
```

## Supported Resource Types

Only Node is supported. Because Node is a cluster-scoped resource, all IdleNode Recommendations are created in the crane-system namespace.

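## Configuration

| Parameter | Default | Description |
|----------|------|------------------------------------------|
| cpu-request-utilization | 0 | Nodes whose CPU request utilization is above this value are not idle; 0.5 means 50%. 0 (the default) disables the check |
| cpu-usage-utilization | 0 | Nodes whose CPU usage utilization is above this value are not idle; 0.5 means 50%. 0 (the default) disables the check |
| cpu-percentile | 0.99 | Percentile used when computing CPU load |
| memory-request-utilization | 0 | Nodes whose memory request utilization is above this value are not idle; 0.5 means 50%. 0 (the default) disables the check |
| memory-usage-utilization | 0 | Nodes whose memory usage utilization is above this value are not idle; 0.5 means 50%. 0 (the default) disables the check |
| memory-percentile | 0.99 | Percentile used when computing memory load |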

For how to update the recommendation configuration, see [**Recommendation Framework**](/zh-cn/docs/tutorials/recommendation/recommendation-framework)
37 changes: 37 additions & 0 deletions site/content/zh/docs/Tutorials/Recommendation/pv-recommendation.md
@@ -0,0 +1,37 @@
---
title: "PV Recommendation"
description: "Introduction to the PV recommendation feature"
weight: 15
---

PV recommendation scans the state of the PVs in the cluster to help users find idle Kubernetes PVs.

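## Motivation

In Kubernetes, PVs are typically used together with workloads to automatically create and manage storage volumes and mount them into applications. In day-to-day operations, idle or unused volumes inevitably accumulate and waste significant cost; PV recommendation helps users find these PVs so they can optimize costs.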

## Recommendation Example

```yaml

```

In this example:

- The recommendation's TargetRef points to the PV:
- The recommendation type is PV recommendation
- The action is Delete; this is only a suggestion

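## How It Works

PV recommendation completes one recommendation pass in the following steps:

1. Scan all PVs in the cluster and find the list of Pods that use each PV.
2. If a PV has no corresponding PVC, it is judged to be an idle PV.
3. If no Pod is associated with the PV and its PVC, it is judged to be an idle PVC.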

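## Configuration

PV recommendation currently has no configurable parameters.

For how to update the recommendation configuration, see [**Recommendation Framework**](/zh-cn/docs/tutorials/recommendation/recommendation-framework)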
@@ -128,6 +128,8 @@ patchData=`kubectl get recommend workloads-rule-replicas-rckvb -n default -o jso
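- [**Replicas Recommendation**](/zh-cn/docs/tutorials/recommendation/replicas-recommendation): Use the HPA algorithm to analyze the actual usage of applications and recommend a more appropriate number of replicas
- [**HPA Recommendation**](/zh-cn/docs/tutorials/recommendation/hpa-recommendation): Scan the Workloads in the cluster and recommend HPA configurations for Workloads suited to horizontal autoscaling
- [**IdleNode Recommendation**](/zh-cn/docs/tutorials/recommendation/idlenode-recommendation): Scan the cluster for idle nodes
- [**Service Recommendation**](/zh-cn/docs/tutorials/recommendation/service-recommendation): Scan the cluster for idle Services
- [**PV Recommendation**](/zh-cn/docs/tutorials/recommendation/pv-recommendation): Scan the cluster for idle PVs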

### Recommender Framework

@@ -0,0 +1,97 @@
---
title: "Service Recommendation"
description: "Introduction to the Service recommendation feature"
weight: 15
---

Service recommendation scans the state of the Services in the cluster to help users find idle Kubernetes Services.

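## Motivation

In Kubernetes, a Service is usually combined with a workload to automatically create and manage a load balancer and attach it to the application. In day-to-day operations, idle and underutilized load balancers inevitably appear and waste significant cost; Service recommendation helps users find these Services so they can optimize costs.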

## Recommendation Example

```yaml
apiVersion: analysis.crane.io/v1alpha1
kind: Recommendation
metadata:
  annotations:
    analysis.crane.io/last-start-time: "2023-06-12 11:52:23"
    analysis.crane.io/message: Success
    analysis.crane.io/run-number: "7823"
  creationTimestamp: "2023-06-12T09:44:23Z"
  labels:
    analysis.crane.io/recommendation-rule-name: service-rule
    analysis.crane.io/recommendation-rule-recommender: Service
    analysis.crane.io/recommendation-rule-uid: 67807cd9-b4c9-4d63-8493-d330ccace364
    analysis.crane.io/recommendation-target-kind: Service
    analysis.crane.io/recommendation-target-name: nginx
    analysis.crane.io/recommendation-target-namespace: crane-system
    analysis.crane.io/recommendation-target-version: v1
  name: service-rule-service-cnwt5
  namespace: crane-system
  ownerReferences:
  - apiVersion: analysis.crane.io/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: RecommendationRule
    name: service-rule
    uid: 67807cd9-b4c9-4d63-8493-d330ccace364
spec:
  adoptionType: StatusAndAnnotation
  completionStrategy:
    completionStrategyType: Once
  targetRef:
    apiVersion: v1
    kind: Service
    name: nginx
    namespace: crane-system
  type: Service
status:
  action: Delete
  description: It is a Orphan Service, Pod count is 0
  lastUpdateTime: "2023-06-12T11:52:23Z"
```

In this example:

- The recommendation's TargetRef points to the Service nginx
- The recommendation type is Service recommendation
- The action is Delete; this is only a suggestion

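## How It Works

Service recommendation completes one recommendation pass in the following steps:

1. Scan all Services of type LoadBalancer in the cluster.
2. If the Service's Endpoints contain any Addresses or NotReadyAddresses, the Service is not idle.
3. Using the traffic-related metrics configured for Service recommendation, check whether the Service's traffic is below the thresholds; if it is, the Service is judged to be idle.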

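## How to Verify the Recommendation

The following PromQL query templates are used to check a Service's traffic against the thresholds. When verifying, substitute the actual namespace and a regular expression matching the Service's Pods for the two `%s` placeholders.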

```go
const (
	// Rate of the cumulative container counter of bytes received, summed over the Service's Pods (args: namespace, pod regex)
	queryFmtNetReceiveBytes = `sum(rate(container_network_receive_bytes_total{namespace="%s",pod=~"%s",container!=""}[3m]))`
	// Rate of the cumulative container counter of bytes transmitted, summed over the Service's Pods (args: namespace, pod regex)
	queryFmtNetTransferBytes = `sum(rate(container_network_transmit_bytes_total{namespace="%s",pod=~"%s",container!=""}[3m]))`
)
```

## Supported Resource Types

Only the Service type is supported; currently only Services of type LoadBalancer are analyzed.

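## Configuration

| Parameter | Default | Description |
|----------|-----|---------------------------------|
| net-receive-bytes | 0 | Threshold for network bytes received by the Service's Pods; 0 (the default) disables the check |
| net-receive-percentile | 0.99 | Percentile used when computing received network traffic |
| net-transfer-bytes | 0 | Threshold for network bytes transmitted by the Service's Pods; 0 (the default) disables the check |
| net-transfer-percentile | 0.99 | Percentile used when computing transmitted network traffic |

Note: when a Pod has liveness/readiness probes configured, kubelet probing generates some container traffic, so the traffic thresholds should be set somewhat higher; tune them against your actual monitoring data.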
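The `net-*-percentile` and `net-*-bytes` parameters in the table suggest that a percentile of the traffic samples is compared with the byte threshold. The Go sketch below assumes a nearest-rank percentile over a slice of sampled rates; Crane's actual aggregation may differ, and `percentile`, `underThreshold`, and the sample values are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank percentile (p in (0,1]) of samples.
// It sorts a copy so the caller's slice is not reordered.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	rank := int(math.Ceil(p*float64(len(s)))) - 1 // 0-based nearest rank
	if rank < 0 {
		rank = 0
	}
	return s[rank]
}

// underThreshold treats a zero threshold as "do not check" (the documented default).
func underThreshold(samples []float64, p, threshold float64) bool {
	if threshold == 0 {
		return true
	}
	return percentile(samples, p) < threshold
}

func main() {
	// Hypothetical receive rates, mostly small probe traffic plus one spike.
	recv := []float64{120, 95, 130, 110, 2048}
	fmt.Println(percentile(recv, 0.99))           // 2048
	fmt.Println(underThreshold(recv, 0.99, 4096)) // true
}
```

With a high percentile such as 0.99, short spikes dominate the result, which is one reason to leave headroom above steady probe traffic when choosing the byte thresholds.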

For how to update the recommendation configuration, see [**Recommendation Framework**](/zh-cn/docs/tutorials/recommendation/recommendation-framework)
