Define initial metrics to export to Prometheus as MVP #5

lohanspies · 2020-07-13T09:26:16Z

Prometheus MVP Metrics

The Sovrin Network name being monitored
Should be able to get this from the pool being connected to
Node alias name
Detect when a node is inaccessible and produce standard output for that situation.

Should generate a timeout when trying to pull validator_info from inaccessible nodes.
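
A minimal sketch of how the timeout case could yield a uniform record per node, assuming an async fetch. `fetch_validator_info`, the `delay` argument, and the result field names are hypothetical stand-ins for illustration, not the monitor's actual API.

```python
import asyncio

# Hypothetical stand-in for the real validator_info request.
async def fetch_validator_info(node: str, delay: float) -> dict:
    await asyncio.sleep(delay)  # simulate network latency
    return {"Node_info": {"Name": node}}

async def probe(node: str, delay: float, timeout: float = 0.05) -> dict:
    """Return the same record shape whether or not the node answered in time."""
    try:
        info = await asyncio.wait_for(fetch_validator_info(node, delay), timeout)
        return {"node": node, "accessible": True, "info": info}
    except asyncio.TimeoutError:
        # Inaccessible node: emit a standard record instead of raising.
        return {"node": node, "accessible": False, "error": "timeout"}

async def probe_all(delays: dict) -> list:
    # Probe every node concurrently so one slow node does not block the rest.
    return await asyncio.gather(*(probe(n, d) for n, d in delays.items()))

# Node2 is simulated as slower than the timeout.
results = asyncio.run(probe_all({"Node1": 0.0, "Node2": 1.0}))
```

Returning a record rather than raising keeps the output shape identical for healthy and inaccessible nodes, which makes later conversion to a numeric metric straightforward.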

Detect any nodes that are accessible but that are "unreachable" to some or all of the other Indy nodes.
- That indicates that the internal port to the node is not accessible, even though the public port is accessible.
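
A rough sketch of how such nodes could be flagged, assuming each node's report exposes an `Unreachable_nodes` list of `[alias, value]` pairs as in the sample in this comment. The flattened `pool_view` shape (node alias mapped directly to that list) is a simplifying assumption, not the real response layout.

```python
def unreachable_report(pool_view: dict) -> dict:
    """Map each unreachable node alias to the sorted peers that cannot reach it."""
    report = {}
    for observer, info in pool_view.items():
        for entry in info.get("Unreachable_nodes", []):
            target = entry[0]  # entries look like ["Node2", null]
            report.setdefault(target, []).append(observer)
    return {node: sorted(peers) for node, peers in report.items()}

# Hypothetical, simplified view: Node2 answered our query, so it appears as a
# key, but two of its peers list it as unreachable.
pool_view = {
    "Node1": {"Unreachable_nodes": [["Node2", None]]},
    "Node2": {"Unreachable_nodes": []},
    "Node3": {"Unreachable_nodes": [["Node2", None]]},
    "Node4": {"Unreachable_nodes": []},
}
report = unreachable_report(pool_view)

# Nodes that responded to us yet are unreachable for at least one peer.
accessible_but_unreachable = sorted(n for n in report if n in pool_view)
```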

"Reachable_nodes": [
            [
              "Node1",
              0
            ],
            [
              "Node3",
              null
            ],
            [
              "Node4",
              null
            ]
          ],
          "Unreachable_nodes": [
            [
              "Node2",
              null
            ]
          ],
          "Reachable_nodes_count": 3,
          "Unreachable_nodes_count": 1,

The number of transactions per Indy ledger, especially the domain ledger.

"transaction-count": {
  "ledger": 21,
  "pool": 4,
  "config": 0,
  "audit": 1042
},

The average read and write times for the node.

"throughput": {
  "0": 0.0017547843
},
"master throughput": 0.0017547843,
"total requests": 16,
"avg backup throughput": null,
"master throughput ratio": null,
"average-per-second": {
  "read-transactions": 0.0338584473,
  "write-transactions": 0.0001539895
},

The average throughput time for the node.

"throughput": {
  "0": 0.0017547843
},
"master throughput": 0.0017547843,
"total requests": 16,
"avg backup throughput": null,
"master throughput ratio": null,
"average-per-second": {
  "read-transactions": 0.0338584473,
  "write-transactions": 0.0001539895
},

The uptime of the node (time since last restart).

"transaction-count": {
  "ledger": 21,
  "pool": 4,
  "config": 0,
  "audit": 1042
},
"uptime": 103903
},

The time since last freshness check (should be less than 5 minutes).

"Freshness_status": {
  "1": {
    "Last_updated_time": "2020-07-06 23:55:07+00:00",
    "Has_write_consensus": true
  },
  "0": {
    "Last_updated_time": "2020-07-06 23:57:33+00:00",
    "Has_write_consensus": true
  },
  "2": {
    "Last_updated_time": "2020-07-06 23:57:33+00:00",
    "Has_write_consensus": true
  }
}

Node IP address information

"Node_info": {
  "Name": "Node4",
  "Mode": "participating",
  "Client_port": 9708,
  "Client_ip": "0.0.0.0",
  "Client_protocol": "tcp",
  "Node_port": 9707,
  "Node_ip": "0.0.0.0",

Total nodes in pool information

"Pool_info": {
  "Read_only": false,
  "Total_nodes_count": 4,

kiview · 2020-07-13T11:11:22Z

Since Prometheus will only gather numeric metrics, there are some things to consider when modeling the metrics.

The Sovrin Network name being monitored

Should be a label attached to all metrics.

Node alias name

Should also be a label, one per node we get (the nodes being the top-level objects in the dict we are currently fetching).

Detect when a node is inaccessible and produce standard output for that situation.

This would happen outside of the exporter, either in Prometheus through Alertmanager, or in Grafana.

The number of transactions per Indy ledger, especially the domain ledger.

Should work as a Gauge named transactions_total, with one label value per ledger.
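
As an illustration, a small sketch that renders the "transaction-count" object from the issue as samples in the Prometheus text exposition format, written with the standard library only. The label names (`network`, `node`, `ledger`) and the network name "sandbox" are assumptions made for the example.

```python
def render_transaction_counts(network: str, node: str, counts: dict) -> str:
    """Render one gauge sample per ledger in Prometheus text exposition format."""
    lines = ["# TYPE transactions_total gauge"]
    for ledger, value in sorted(counts.items()):
        # Label names are illustrative; network and node apply to every sample.
        labels = f'network="{network}",node="{node}",ledger="{ledger}"'
        lines.append(f"transactions_total{{{labels}}} {value}")
    return "\n".join(lines)

# Values taken from the "transaction-count" sample in the issue description.
counts = {"ledger": 21, "pool": 4, "config": 0, "audit": 1042}
text = render_transaction_counts("sandbox", "Node4", counts)
```

Keeping the ledger as a label rather than part of the metric name means a single query can sum or compare ledgers.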

The average read and write times for the node.

Here I wonder how the values are measured. Ideally, we could just record the total requests in a Gauge and let Prometheus infer the other metrics. Otherwise, histograms for throughput might be fine; we just have to be careful about statistically invalid double aggregations (for example, averaging values that are already averages).
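
To make the "let Prometheus infer the other metrics" idea concrete, here is a simplified sketch of deriving a per-second rate from two samples of a running total. It only illustrates the principle; Prometheus's own rate calculation is more involved, and the sample values other than the initial total of 16 are made up.

```python
def per_second_rate(earlier: tuple, later: tuple) -> float:
    """Approximate requests per second between two (timestamp, total) samples."""
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if v1 < v0:
        # The total went down: assume the node restarted and counted from zero.
        return v1 / (t1 - t0)
    return (v1 - v0) / (t1 - t0)

# Two hypothetical scrapes 60 seconds apart: 16 total requests, then 46.
rate = per_second_rate((0.0, 16), (60.0, 46))

# A restart between scrapes: the total fell from 46 to 6.
rate_after_restart = per_second_rate((60.0, 46), (120.0, 6))
```

Exporting only the raw total avoids the double-aggregation problem, since averages are computed once, at query time.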

The uptime of the node (time since last restart).

Clearly a gauge with a label per node.

The time since last freshness check (should be less than 5 minutes).

Diff against the current time and record as a Gauge?
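
A sketch of that diff, assuming the "Last_updated_time" strings keep the format shown in the issue. The function name and the fixed `now` used for the example are hypothetical; the 300 second threshold comes from the "less than 5 minutes" requirement.

```python
from datetime import datetime, timezone

def freshness_age_seconds(freshness_status: dict, now: datetime) -> dict:
    """Return seconds elapsed since Last_updated_time for each freshness key."""
    ages = {}
    for key, status in freshness_status.items():
        # fromisoformat accepts "2020-07-06 23:55:07+00:00" and keeps the offset.
        updated = datetime.fromisoformat(status["Last_updated_time"])
        ages[key] = (now - updated).total_seconds()
    return ages

# Sample taken from the issue; "now" is fixed so the result is reproducible.
freshness_status = {
    "1": {"Last_updated_time": "2020-07-06 23:55:07+00:00", "Has_write_consensus": True},
    "0": {"Last_updated_time": "2020-07-06 23:57:33+00:00", "Has_write_consensus": True},
    "2": {"Last_updated_time": "2020-07-06 23:57:33+00:00", "Has_write_consensus": True},
}
now = datetime(2020, 7, 6, 23, 58, 33, tzinfo=timezone.utc)
ages = freshness_age_seconds(freshness_status, now)

# Keys whose last update is older than the 5 minute expectation.
stale = sorted(key for key, age in ages.items() if age > 300)
```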

Node IP address information

This could be a label, same as the node name.

Total nodes in pool information

Gauge with pool name as label.

kiview · 2020-07-13T18:32:55Z

One question regarding freshness status:

When I have a test network with 4 nodes, I get 3 freshness values, as you have posted above:

"Freshness_status": {
  "1": {
    "Last_updated_time": "2020-07-06 23:55:07+00:00",
    "Has_write_consensus": true
  },
  "0": {
    "Last_updated_time": "2020-07-06 23:57:33+00:00",
    "Has_write_consensus": true
  },
  "2": {
    "Last_updated_time": "2020-07-06 23:57:33+00:00",
    "Has_write_consensus": true
  }
}

What do these numeric keys (0, 1, 2) represent, and how should we interpret them?

WadeBarnes · 2022-11-10T15:00:05Z

These metrics should be available on the auto-provisioned dashboards supplied with the monitoring stack. If anything else is needed or anything is missing, a separate issue can be opened.

kiview mentioned this issue Jul 13, 2020

Add Prometheus exporter #6

Closed

11 tasks

WadeBarnes pushed a commit to WadeBarnes/indy-node-monitor that referenced this issue Jun 27, 2021

Merge pull request hyperledger#5 from WadeBarnes/feature/fastapi

03c5167

Refactor PoolCollection

WadeBarnes closed this as completed Nov 10, 2022
