Merge pull request #1104 from sun7927/doc
added the documents for audit and failover
Showing 4 changed files with 208 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
--- | ||
sidebar_position: 6 | ||
sidebar_label: "Fast Failover" | ||
--- | ||
|
||
# Application Fast Failover | ||
|
||
When a stateful application (i.e. a Pod with a HwameiStor volume) runs into a problem, especially one caused by a node issue, | ||
it's important to reschedule the Pod to another healthy node so that it keeps running. | ||
|
||
However, due to the design of Kubernetes StatefulSets and Deployments, | ||
Kubernetes waits a long time (e.g. 5 minutes) before rescheduling the Pod. | ||
In particular, it never reschedules a StatefulSet Pod automatically. | ||
This interrupts the application and can even cause a significant business loss. | ||
|
||
HwameiStor provides a fast failover feature to solve this problem. When an application issue is identified, | ||
it reschedules the Pod immediately instead of waiting for a long time. | ||
HwameiStor fails the Pod over to another healthy node and ensures that the required data volumes are also located on that node, | ||
so the application can continue to work. | ||
|
||
# How to use | ||
|
||
HwameiStor provides fast failover for two cases: | ||
|
||
* Node Failure | ||
|
||
When a node fails, none of the Pods on this node can work any more. A Pod that uses a HwameiStor volume | ||
must be rescheduled to another healthy node that holds a replica of the associated data volume. | ||
The user can trigger fast failover for this node by: | ||
``` | ||
# Add a label to this node: | ||
kubectl label node <nodeName> hwameistor.io/failover=start | ||
# When the fast failover completes, the label will be changed to: | ||
# hwameistor.io/failover=completed | ||
``` | ||
|
||
* Pod Failure | ||
|
||
When a Pod fails, the user can trigger fast failover for it by: | ||
``` | ||
# Add a label to this Pod: | ||
kubectl label pod <podName> hwameistor.io/failover=start | ||
# When the fast failover completes, the old Pod is deleted and a new one is created on a new node. | ||
``` |
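
The node-failure procedure described above can be scripted. The following is a minimal sketch, not part of HwameiStor itself: the label key and its `start` and `completed` values come from the text above, while the function name `failover_node`, the polling interval, and the timeout are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: trigger fast failover for a node, then poll the label until it
# reads "completed". Function name, timeout, and interval are assumptions.
failover_node() (
  node="$1"
  timeout="${2:-300}"   # seconds to wait before giving up (assumed default)
  interval="${3:-5}"    # seconds between polls (assumed default)

  # --overwrite lets the command succeed if the label is left over from an earlier run.
  kubectl label node "$node" hwameistor.io/failover=start --overwrite || exit 1

  waited=0
  while [ "$waited" -le "$timeout" ]; do
    # Dots inside a label key must be escaped in kubectl's jsonpath syntax.
    state=$(kubectl get node "$node" \
      -o jsonpath='{.metadata.labels.hwameistor\.io/failover}')
    if [ "$state" = "completed" ]; then
      echo "failover completed for node $node"
      exit 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "timed out waiting for failover on node $node" >&2
  exit 1
)
```

Calling `failover_node <nodeName>` returns 0 once the label reads `completed` and 1 on timeout, so it can be chained in a larger recovery script.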
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
--- | ||
sidebar_position: 7 | ||
sidebar_label: "System Audit" | ||
--- | ||
|
||
# System Audit | ||
|
||
It's important to record the history of system operations. HwameiStor provides an audit feature that records the operations on all system resources, including Cluster, Node, StoragePool, Volume, etc. | ||
|
||
The audit information is easy for users to understand and to parse for various purposes. | ||
|
||
# How to use | ||
|
||
HwameiStor defines a new CRD that records the operation history of every resource, as below: | ||
|
||
```yaml | ||
apiVersion: hwameistor.io/v1alpha1 | ||
kind: Event | ||
metadata: | ||
  name: | ||
spec: | ||
  resourceType: <Cluster | Node | StoragePool | Volume> | ||
  resourceName: | ||
  records: | ||
  - action: | ||
    actionContent: # in JSON format | ||
    time: | ||
    state: | ||
    stateContent: # in JSON format | ||
``` | ||
|
||
For instance, here is the audit information of a volume: | ||
|
||
```yaml | ||
apiVersion: hwameistor.io/v1alpha1 | ||
kind: Event | ||
metadata: | ||
  creationTimestamp: "2023-08-08T15:52:55Z" | ||
  generation: 5 | ||
  name: volume-pvc-34e3b086-2d95-4980-beb6-e175fd79a847 | ||
  resourceVersion: "10221888" | ||
  uid: d3ebaffb-eddb-4c84-93be-efff350688af | ||
spec: | ||
  resourceType: Volume | ||
  resourceName: pvc-34e3b086-2d95-4980-beb6-e175fd79a847 | ||
  records: | ||
  - action: Create | ||
    actionContent: '{"requiredCapacityBytes":5368709120,"volumeQoS":{},"poolName":"LocalStorage_PoolHDD","replicaNumber":2,"convertible":true,"accessibility":{"nodes":["k8s-node1","k8s-master"],"zones":["default"],"regions":["default"]},"pvcNamespace":"default","pvcName":"mysql-data-volume","volumegroup":"db890e34-a092-49ac-872b-f2a422439c81"}' | ||
    time: "2023-08-08T15:52:55Z" | ||
  - action: Mount | ||
    actionContent: '{"allocatedCapacityBytes":5368709120,"replicas":["pvc-34e3b086-2d95-4980-beb6-e175fd79a847-krp927","pvc-34e3b086-2d95-4980-beb6-e175fd79a847-wm7p56"],"state":"Ready","publishedNode":"k8s-node1","fsType":"xfs","rawblock":false}' | ||
    time: "2023-08-08T15:53:07Z" | ||
  - action: Unmount | ||
    actionContent: '{"allocatedCapacityBytes":5368709120,"usedCapacityBytes":33783808,"totalInode":2621120,"usedInode":3,"replicas":["pvc-34e3b086-2d95-4980-beb6-e175fd79a847-krp927","pvc-34e3b086-2d95-4980-beb6-e175fd79a847-wm7p56"],"state":"Ready","publishedNode":"k8s-node1","fsType":"xfs","rawblock":false}' | ||
    time: "2023-08-08T16:03:03Z" | ||
  - action: Delete | ||
    actionContent: '{"requiredCapacityBytes":5368709120,"volumeQoS":{},"poolName":"LocalStorage_PoolHDD","replicaNumber":2,"convertible":true,"accessibility":{"nodes":["k8s-node1","k8s-master"],"zones":["default"],"regions":["default"]},"pvcNamespace":"default","pvcName":"mysql-data-volume","volumegroup":"db890e34-a092-49ac-872b-f2a422439c81","config":{"version":1,"volumeName":"pvc-34e3b086-2d95-4980-beb6-e175fd79a847","requiredCapacityBytes":5368709120,"convertible":true,"resourceID":2,"readyToInitialize":true,"initialized":true,"replicas":[{"id":1,"hostname":"k8s-node1","ip":"10.6.113.101","primary":true},{"id":2,"hostname":"k8s-master","ip":"10.6.113.100","primary":false}]},"delete":true}' | ||
    time: "2023-08-08T16:03:38Z" | ||
``` |
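
To read such a record as a timeline, the `time` and `action` fields can be pulled out with kubectl's built-in JSONPath output. This is a sketch under assumptions: the resource name `events.hwameistor.io` is inferred from the `kind` and `apiVersion` shown above and is not confirmed by this document (it can be checked with `kubectl api-resources --api-group=hwameistor.io`), and the function name `audit_timeline` is hypothetical.

```shell
#!/bin/sh
# Sketch: print one "time<TAB>action" line per audit record of a resource.
# Assumption: the Event CRD is served as "events.hwameistor.io".
audit_timeline() {
  # $1: name of the Event object, e.g. volume-pvc-<uuid>
  kubectl get events.hwameistor.io "$1" \
    -o jsonpath='{range .spec.records[*]}{.time}{"\t"}{.action}{"\n"}{end}'
}
```

For the volume shown above, `audit_timeline volume-pvc-34e3b086-2d95-4980-beb6-e175fd79a847` would be expected to list the Create, Mount, Unmount, and Delete records in order, one per line.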
45 changes: 45 additions & 0 deletions
...urus-plugin-content-docs/current/quick_start/advanced_features/fast_failover.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
--- | ||
sidebar_position: 6 | ||
sidebar_label: "Application Failover" | ||
--- | ||
|
||
# Fast Failover | ||
|
||
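For a stateful application in Kubernetes (a Pod that mounts a HwameiStor PVC), when the Pod or the PVC runs into a problem, especially when the Kubernetes node fails, | ||
the problem must be detected promptly and the Pod rescheduled to another healthy node where it can mount the PVC successfully. | ||
Because of limitations in the Kubernetes scheduling mechanism, there is a fairly long wait (e.g. 5 minutes) before Kubernetes decides that the Pod can be rescheduled. | ||
In addition, because the Pod mounts a PVC, there is a further long wait (e.g. 6 minutes). | ||
Kubernetes does not reschedule a StatefulSet Pod at all, although it does reschedule a Deployment Pod. | ||
As a result, the application is interrupted for a long time and cannot keep serving the business. | ||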
|
||
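To address this kind of failure, HwameiStor provides fast application failover. | ||
When an application failure is detected, HwameiStor reschedules the application to another healthy node within a short time and ensures that the new node holds a replica of the data volume the application needs, so the business application keeps running normally. | ||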
|
||
# How to use | ||
|
||
HwameiStor provides fast application failover for two cases: | ||
|
||
* Node failure | ||
|
||
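In this case, none of the applications on the node can run normally. An application that uses a HwameiStor data volume needs its Pod rescheduled promptly to a new healthy node. | ||
The user can perform the failover as follows: | ||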
``` | ||
# Add a label to this node: | ||
kubectl label node <nodeName> hwameistor.io/failover=start | ||
# When the failover completes, the label above changes to: | ||
# hwameistor.io/failover=completed | ||
``` | ||
|
||
* Application Pod failure | ||
|
||
In this case, the user can perform the failover for the Pod as follows: | ||
``` | ||
# Add a label to this Pod: | ||
kubectl label pod <podName> hwameistor.io/failover=start | ||
# When the failover completes, the old Pod is deleted and a new Pod starts and runs normally on a new node. | ||
``` |
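
For the Pod case, the text above says only that the old Pod is deleted and a new one is created, so a script cannot rely on a `completed` label. One possible way to detect the replacement, sketched here as an assumption rather than documented HwameiStor behavior, is to watch the Pod's UID: it changes when a StatefulSet recreates a Pod under the same name, and the lookup comes back empty when the old Pod is gone. The function name `failover_pod` and its timing defaults are hypothetical.

```shell
#!/bin/sh
# Sketch: trigger fast failover for one Pod and wait until the original Pod
# object has been replaced or removed. Names and defaults are assumptions.
failover_pod() (
  ns="$1"
  pod="$2"
  timeout="${3:-300}"   # seconds to wait before giving up (assumed default)
  interval="${4:-5}"    # seconds between polls (assumed default)

  # Remember the identity of the Pod that is about to be failed over.
  old_uid=$(kubectl -n "$ns" get pod "$pod" -o jsonpath='{.metadata.uid}') || exit 1
  kubectl -n "$ns" label pod "$pod" hwameistor.io/failover=start --overwrite || exit 1

  waited=0
  while [ "$waited" -le "$timeout" ]; do
    # Empty output: the Pod no longer exists. Different UID: a new Pod reuses the name.
    new_uid=$(kubectl -n "$ns" get pod "$pod" --ignore-not-found \
      -o jsonpath='{.metadata.uid}')
    if [ "$new_uid" != "$old_uid" ]; then
      echo "old pod $ns/$pod has been replaced"
      exit 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "timed out waiting for pod $ns/$pod to be replaced" >&2
  exit 1
)
```

After it returns, `kubectl -n <namespace> get pods -o wide` shows which node the replacement Pod landed on.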
56 changes: 56 additions & 0 deletions
...aurus-plugin-content-docs/current/quick_start/advanced_features/system_audit.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
--- | ||
sidebar_position: 7 | ||
sidebar_label: "System Audit Log" | ||
--- | ||
|
||
# Audit Log | ||
|
||
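To record the usage and operation history of a HwameiStor cluster, HwameiStor provides a system audit log. The audit log carries HwameiStor system semantics, which makes it easy for users to read and parse. | ||
The audit log records the usage and operation information of every type of resource in the HwameiStor system, including Cluster, Node, StoragePool, Volume, and so on. | ||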
|
||
# How to use | ||
|
||
The audit log is stored in the system through a CRD: one CR is created for each resource to record its operation history. The CRD is as follows: | ||
|
||
```yaml | ||
apiVersion: hwameistor.io/v1alpha1 | ||
kind: Event | ||
metadata: | ||
  name: | ||
spec: | ||
  resourceType: <Cluster | Node | StoragePool | Volume> | ||
  resourceName: | ||
  records: | ||
  - action: | ||
    actionContent: # in JSON format | ||
    time: | ||
    state: | ||
    stateContent: # in JSON format | ||
|
||
``` | ||
|
||
```yaml | ||
apiVersion: hwameistor.io/v1alpha1 | ||
kind: Event | ||
metadata: | ||
  creationTimestamp: "2023-08-08T15:52:55Z" | ||
  generation: 5 | ||
  name: volume-pvc-34e3b086-2d95-4980-beb6-e175fd79a847 | ||
  resourceVersion: "10221888" | ||
  uid: d3ebaffb-eddb-4c84-93be-efff350688af | ||
spec: | ||
  resourceType: Volume | ||
  resourceName: pvc-34e3b086-2d95-4980-beb6-e175fd79a847 | ||
  records: | ||
  - action: Create | ||
    actionContent: '{"requiredCapacityBytes":5368709120,"volumeQoS":{},"poolName":"LocalStorage_PoolHDD","replicaNumber":2,"convertible":true,"accessibility":{"nodes":["k8s-node1","k8s-master"],"zones":["default"],"regions":["default"]},"pvcNamespace":"default","pvcName":"mysql-data-volume","volumegroup":"db890e34-a092-49ac-872b-f2a422439c81"}' | ||
    time: "2023-08-08T15:52:55Z" | ||
  - action: Mount | ||
    actionContent: '{"allocatedCapacityBytes":5368709120,"replicas":["pvc-34e3b086-2d95-4980-beb6-e175fd79a847-krp927","pvc-34e3b086-2d95-4980-beb6-e175fd79a847-wm7p56"],"state":"Ready","publishedNode":"k8s-node1","fsType":"xfs","rawblock":false}' | ||
    time: "2023-08-08T15:53:07Z" | ||
  - action: Unmount | ||
    actionContent: '{"allocatedCapacityBytes":5368709120,"usedCapacityBytes":33783808,"totalInode":2621120,"usedInode":3,"replicas":["pvc-34e3b086-2d95-4980-beb6-e175fd79a847-krp927","pvc-34e3b086-2d95-4980-beb6-e175fd79a847-wm7p56"],"state":"Ready","publishedNode":"k8s-node1","fsType":"xfs","rawblock":false}' | ||
    time: "2023-08-08T16:03:03Z" | ||
  - action: Delete | ||
    actionContent: '{"requiredCapacityBytes":5368709120,"volumeQoS":{},"poolName":"LocalStorage_PoolHDD","replicaNumber":2,"convertible":true,"accessibility":{"nodes":["k8s-node1","k8s-master"],"zones":["default"],"regions":["default"]},"pvcNamespace":"default","pvcName":"mysql-data-volume","volumegroup":"db890e34-a092-49ac-872b-f2a422439c81","config":{"version":1,"volumeName":"pvc-34e3b086-2d95-4980-beb6-e175fd79a847","requiredCapacityBytes":5368709120,"convertible":true,"resourceID":2,"readyToInitialize":true,"initialized":true,"replicas":[{"id":1,"hostname":"k8s-node1","ip":"10.6.113.101","primary":true},{"id":2,"hostname":"k8s-master","ip":"10.6.113.100","primary":false}]},"delete":true}' | ||
    time: "2023-08-08T16:03:38Z" | ||
``` |
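
Because each `actionContent` is a JSON string, a single record can be selected and handed to any JSON tool. The sketch below uses a JSONPath filter expression in kubectl to pick the record for one action. As an assumption not confirmed by this document, it addresses the CRD as `events.hwameistor.io`; the function name `audit_action_content` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the raw actionContent JSON of the record(s) matching one action.
# Assumption: the Event CRD is served as "events.hwameistor.io".
audit_action_content() {
  # $1: name of the Event object
  # $2: action to select, e.g. Create, Mount, Unmount, or Delete
  kubectl get events.hwameistor.io "$1" \
    -o jsonpath="{.spec.records[?(@.action==\"$2\")].actionContent}"
}
```

For example, `audit_action_content volume-pvc-34e3b086-2d95-4980-beb6-e175fd79a847 Mount` would be expected to print the JSON recorded for the Mount action, which includes fields such as `publishedNode` and `fsType`.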