Merged
2 changes: 2 additions & 0 deletions apps/images/portworx.svg
15 changes: 15 additions & 0 deletions apps/portworx.yaml
@@ -0,0 +1,15 @@
---
apiVersion: v1
kind: App
name: "Portworx"
keywords:
  - Storage
  - Available
availableVersions:
  - '2.9'
shortDescription: "Portworx is an end-to-end storage and data management solution for Kubernetes"
description: |
  Portworx provides a fully integrated solution for persistent storage, data protection, disaster recovery, data security, cross-cloud and data migrations, and automated capacity management for applications running on Kubernetes.
icon: https://raw.githubusercontent.com/sysdiglabs/promcat-resources/master/apps/images/portworx.svg
website: https://portworx.com/
available: true
61 changes: 61 additions & 0 deletions resources/portworx/ALERTS.md
@@ -0,0 +1,61 @@
# Alerts
## No Quorum
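The Portworx cluster has been out of quorum (`px_cluster_status_quorum` is not 1) for 5 minutes.

## Node Status Not OK
A Portworx node has reported a status other than OK (`px_node_status_node_status` is not 2) for 5 minutes.

## Offline Nodes
One or more Portworx nodes have been offline for 5 minutes.

## Nodes Storage Full or Down
One or more Portworx nodes have had their storage full or down for 5 minutes.

## Offline Storage Nodes
One or more Portworx storage nodes have been offline for 5 minutes.

## Unhealthy Node KVDB
A node has reported its KVDB as unhealthy (`px_kvdb_health_state_node_view` equals 2) for 5 minutes.

## Cache read hit rate is low
The Portworx cache read hit ratio has been below 80% for 5 minutes.

## Cache write hit rate is low
The Portworx cache write hit ratio has been below 80% for 5 minutes.

## High Read Latency In Disk
Read latency on a Portworx disk has been above 100 ms for 5 minutes.

## High Write Latency In Disk
Write latency on a Portworx disk has been above 250 ms for 5 minutes.

## Low Cluster Capacity
Less than 10% of the total Portworx cluster disk capacity has been available for 5 minutes.

## Disk Full In 48H
Based on the trend over the last 48 hours, the available cluster disk space is predicted to run out within 48 hours.

## Disk Full In 12H
Based on the trend over the last 12 hours, the available cluster disk space is predicted to run out within 12 hours.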

## Pool Status Not Online
A Portworx storage pool has reported a status other than online (`px_pool_stats_status` is not 1) for 5 minutes.

## High Write Latency In Pool
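Write latency on a Portworx storage pool has been above 250 ms for 5 minutes.

## Pool Full In 48H
Based on the trend over the last 48 hours, the available space in a storage pool is predicted to run out within 48 hours.

## Pool Full In 12H
Based on the trend over the last 12 hours, the available space in a storage pool is predicted to run out within 12 hours.

## High Write Latency In Volume
Write latency on a Portworx volume has been above 250 ms for 5 minutes.

## High Read Latency In Volume
Read latency on a Portworx volume has been above 100 ms for 5 minutes.

## License Expiry
The lowest `px_node_status_license_expiry` value reported across nodes has been below 30 for 5 minutes.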
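
## Testing the alerts
As a way to sanity-check these alerts before relying on them, the sketch below uses the `promtool test rules` unit-test format against the No Quorum rule. It assumes the `groups:` block from `alerts.yaml` has been saved on its own as `portworx_rules.yaml` (a hypothetical file name) and that the metric carries a `cluster` label (also an assumption, used only for illustration).

```yaml
# Hypothetical unit test; run with: promtool test rules portworx_test.yaml
rule_files:
  - portworx_rules.yaml          # assumed: the "groups:" block extracted from alerts.yaml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Quorum metric stuck at 0 for 10 minutes (the "cluster" label is illustrative).
      - series: 'px_cluster_status_quorum{cluster="demo"}'
        values: '0+0x10'
    alert_rule_test:
      # The rule uses "for: 5m", so it should be firing by minute 6.
      - eval_time: 6m
        alertname: '[Portworx] No Quorum'
        exp_alerts:
          - exp_labels:
              severity: critical
              cluster: demo
            exp_annotations:
              description: Portworx No Quorum.
```

The same pattern should carry over to the other rules by swapping the input series and the expected alert name.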

2 changes: 2 additions & 0 deletions resources/portworx/INSTALL.md
@@ -0,0 +1,2 @@
# Prerequisites
Portworx exposes Prometheus metrics natively and annotates its pods with Prometheus annotations, so no additional exporter is needed.
12 changes: 12 additions & 0 deletions resources/portworx/README.md
@@ -0,0 +1,12 @@
# Portworx
Portworx provides a fully integrated solution for persistent storage, data protection, disaster recovery, data security, cross-cloud and data migrations, and automated capacity management for applications running on Kubernetes.

# Prometheus and exporters
Portworx exposes all of its metrics through a built-in Prometheus endpoint on port 9001 (port 17001 when deployed on OpenShift). On Kubernetes the pods are already annotated, so the Sysdig agent can scrape the endpoint right away.
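
For reference, Prometheus-style scrape annotations on a pod typically look like the sketch below. The exact annotation set that Portworx applies is not listed here, so treat the keys and the `/metrics` path as assumptions that follow the common `prometheus.io/*` convention; only the port numbers come from the text above.

```yaml
# Illustrative pod metadata only; not copied from a Portworx manifest.
metadata:
  annotations:
    prometheus.io/scrape: "true"    # opt the pod in to scraping
    prometheus.io/port: "9001"      # use "17001" on OpenShift
    prometheus.io/path: "/metrics"  # assumed default metrics path
```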

# Metrics
- Portworx cluster statistics
- Portworx volumes statistics
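
As an illustration of how these statistics can be combined, the sketch below defines two Prometheus recording rules from metrics that the bundled alerts already use. The group name and the `record` names are hypothetical, chosen only for this example.

```yaml
groups:
  - name: portworx-examples   # hypothetical group name
    rules:
      # Fraction of cluster disk capacity still available (0 to 1).
      - record: portworx:cluster_disk_available:ratio
        expr: sum(px_cluster_disk_available_bytes) / sum(px_cluster_disk_total_bytes)
      # Cache read hit ratio, the same quantity the cache read alert evaluates.
      - record: portworx:cache_read_hit:ratio
        expr: px_px_cache_read_hits / (px_px_cache_read_hits + px_px_cache_read_miss)
```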

# Attributions
Configuration files, dashboards and alerts are maintained by the [Sysdig team](https://sysdig.com/).
164 changes: 164 additions & 0 deletions resources/portworx/alerts.yaml
@@ -0,0 +1,164 @@
apiVersion: v1
kind: Alert
app: Portworx
version: 1.0.0
appVersion:
  - '2.9'
descriptionFile: ALERTS.md
configurations:
  - kind: Prometheus
    data: |-
      groups:
      - name: Portworx
        rules:
        - alert: '[Portworx] No Quorum'
          expr: "px_cluster_status_quorum != 1"
          for: 5m
          labels:
            severity: critical
          annotations:
            description: Portworx No Quorum.
        - alert: '[Portworx] Node Status Not OK'
          expr: "px_node_status_node_status != 2"
          for: 5m
          labels:
            severity: critical
          annotations:
            description: Portworx Node Status Not OK.
        - alert: '[Portworx] Offline Nodes'
          expr: "px_cluster_status_nodes_offline > 0"
          for: 5m
          labels:
            severity: critical
          annotations:
            description: Portworx Offline Nodes.
        - alert: '[Portworx] Nodes Storage Full or Down'
          expr: "px_cluster_status_nodes_storage_down > 0"
          for: 5m
          labels:
            severity: critical
          annotations:
            description: Portworx Nodes Storage Full or Down.
        - alert: '[Portworx] Offline Storage Nodes'
          expr: "px_cluster_status_storage_nodes_offline > 0"
          for: 5m
          labels:
            severity: critical
          annotations:
            description: Portworx Offline Storage Nodes.
        - alert: '[Portworx] Unhealthy Node KVDB'
          expr: "px_kvdb_health_state_node_view == 2"
          for: 5m
          labels:
            severity: critical
          annotations:
            description: Portworx Unhealthy Node KVDB.
        - alert: '[Portworx] Cache read hit rate is low'
          expr: "px_px_cache_read_hits / (px_px_cache_read_hits + px_px_cache_read_miss) < 0.80"
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx Cache read hit rate is low.
        - alert: '[Portworx] Cache write hit rate is low'
          expr: "px_px_cache_write_hits / (px_px_cache_write_hits + px_px_cache_write_miss) < 0.80"
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx Cache write hit rate is low.
        - alert: '[Portworx] High Read Latency In Disk'
          expr: |
            px_disk_stats_read_latency_seconds > 0.100
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx High Read Latency In Disk.
        - alert: '[Portworx] High Write Latency In Disk'
          expr: |
            px_disk_stats_write_latency_seconds > 0.250
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx High Write Latency In Disk.
        - alert: '[Portworx] Low Cluster Capacity'
          expr: |
            sum(px_cluster_disk_available_bytes) / sum(px_cluster_disk_total_bytes) < 0.10
          for: 5m
          labels:
            severity: critical
          annotations:
            description: Portworx Low Cluster Capacity.
        - alert: '[Portworx] Disk Full In 48H'
          expr: |
            predict_linear(px_cluster_disk_available_bytes[48h], 48 * 3600) < 0
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx Disk Full In 48H.
        - alert: '[Portworx] Disk Full In 12H'
          expr: |
            predict_linear(px_cluster_disk_available_bytes[12h], 12 * 3600) < 0
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx Disk Full In 12H.
        - alert: '[Portworx] Pool Status Not Online'
          expr: "px_pool_stats_status != 1"
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx Pool Status Not Online.
        - alert: '[Portworx] High Write Latency In Pool'
          expr: |
            px_pool_stats_write_latency_seconds > 0.250
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx High Write Latency In Pool.
        - alert: '[Portworx] Pool Full In 48H'
          expr: |
            predict_linear(px_pool_stats_available_bytes[48h], 48 * 3600) < 0
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx Pool Full In 48H.
        - alert: '[Portworx] Pool Full In 12H'
          expr: |
            predict_linear(px_pool_stats_available_bytes[12h], 12 * 3600) < 0
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx Pool Full In 12H.
        - alert: '[Portworx] High Write Latency In Volume'
          expr: |
            px_volume_write_latency_seconds > 0.250
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx High Write Latency In Volume.
        - alert: '[Portworx] High Read Latency In Volume'
          expr: |
            px_volume_read_latency_seconds > 0.100
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx High Read Latency In Volume.
        - alert: '[Portworx] License Expiry'
          expr: |
            min(px_node_status_license_expiry) < 30
          for: 5m
          labels:
            severity: warning
          annotations:
            description: Portworx License Expiry.
28 changes: 28 additions & 0 deletions resources/portworx/dashboards.yaml
@@ -0,0 +1,28 @@
apiVersion: v1
kind: Dashboard
app: Portworx
version: 1.0.0
appVersion:
  - '2.9'
configurations:
  - name: Cluster
    kind: Sysdig
    image: portworx/images/portworx_cluster_sysdig.png
    description: |
      This dashboard offers information on:
      * Health
      * Network
      * Disk Stats
      * Pool Stats
      * Cache
    file: include/Portworx_Cluster.json
  - name: Volumes
    kind: Sysdig
    image: portworx/images/portworx_volumes_sysdig.png
    description: |
      This dashboard offers information on:
      * Volume Status
      * Volume Capacity & Usage
      * Volume Replication
      * Volume IOPS
    file: include/Portworx_Volumes.json
7 changes: 7 additions & 0 deletions resources/portworx/description.yaml
@@ -0,0 +1,7 @@
apiVersion: v1
kind: Description
app: Portworx
version: 1.0.0
appVersion:
  - '2.9'
descriptionFile: README.md