Health Workstream Metrics & Monitoring #1

NickyHickman · 2020-09-14T11:24:00Z

Metrics and Monitoring for Indy Networks

Purpose of this document:

- Develop core metrics for Indy node and network operators, measured against the standards, policies, principles and contracts within the governance frameworks
- Provide business and governance requirements to the health workstream
- Support process improvement of network operations
  - Fault monitoring and alerting
  - Steward support services
- Support security: identify vulnerabilities and anomalies, and alert on specific types of attack
- Identify metrics to support the development roadmap of a coherent and interoperable network of networks

**Background and Rationale**

The Sovrin Steward Council Health Workstream has been exploring data available via Indy Scan and Indy Node Monitor in order to measure individual node and network health and improve network operations.
Many new Indy nodes and networks are under development, each with its own requirements for monitoring individual nodes and the network as a whole.
Measuring in a consistent way helps network users at the upper layers of the ToIP stack choose the right networks to fit their business model, ecosystem or jurisdiction.
Sovrin, Trust over IP and the Hyperledger communities have been working towards the development of SSI as a utility layer: an interoperable network of networks.
Measuring in a consistent way also enables monitoring and management of the overall network of networks.

Who are these metrics for?

For this document we have only focused on the needs of Sovrin as a community of Stewards (Indy Node Operators) and as an Indy Network Operator and Governance Authority.

However, there are other roles within the ecosystem that may use these metrics or define their own. For example, an Agency at layer 2 might select different networks for specific types of transactions. Or an ecosystem might select a specific sub-set of the network of networks based on measures of decentralization or performance. An ecosystem concerned with IoT, for example, will require very high throughput and capacity, whereas an ecosystem focused on KYC for private banking will require lower capacity and performance but higher consensus and freshness. Equally, we may find other metrics that are useful in the future as we learn more from these data.

We have sub-divided our uses of these metrics into 4 groups

- **Node Operators.** Focuses on node health and performance, fault monitoring, local security and conformance within the network
- **Network Operations.** Focuses on network health, fault monitoring, security, and technical roadmap
- **Public Dashboard.** It's important to demonstrate the health of the network to network users and others, and to be able to measure it against the performance of other SSI networks. This builds confidence, trust and accountability
- **Business.** Whether a for-profit or not-for-profit model, every network operator needs to be sustainable

Categories of Metrics

We have grouped the metrics into 3 categories

- **Network Health**, including Availability, Performance and Quality. Four of these metrics build on the Hyperledger Performance & Scale Working Group's white paper on metrics (October 2018).
- **Business**, primarily focused on uptake and usage. Over time, this category of measures will enable us to build patterns of usage and behaviour, which in turn will help inform security as well as measures of business performance in building market adoption.
- **Governance Framework Compliance**, which measures essential components of SSI such as Diversity and Decentralization. These are relevant to the role of Governance Authorities and attempt to measure some of the qualities of SSI.
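
As one illustration of how a Diversity measure in the Governance Framework Compliance category might be scored, the sketch below computes Shannon entropy over a single attribute of each node. This is an assumed technique chosen for illustration, not the method of any paper referenced in this document, and the attribute values (country codes) are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_diversity_bits(labels: Iterable[str]) -> float:
    """Shannon entropy (in bits) of one attribute across nodes.

    0.0 means every node shares the same value; the score grows as
    nodes spread more evenly over more distinct values.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one label is required")
    # Sum p * log2(1/p) over the share p of each distinct value.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical hosting countries for the nodes of two networks.
concentrated = ["CA", "CA", "CA", "CA"]
spread = ["CA", "DE", "ZA", "JP"]
print(shannon_diversity_bits(concentrated))
print(shannon_diversity_bits(spread))
```

The same function could be applied to any attribute the governance framework decides must be diversified (host type, jurisdiction, organization type), with a minimum score set as the network grows.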

Table of proposed metrics

| Category | Measure | Node Operators | Network Operations | Public Dashboard | Business | How? / Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Network Health: Capacity | Availability % Uptime | 1 | 1 | 1 | 1 | Monitor availability across nodes; alerting; current status (dashboard); Steward response time; trends; correlation with events (upgrades, etc.) |
| Network Health: Capacity | Capacity % utilisation | | 1 | | 1 | Needed to support security controls and enable minimum permissible pricing, as well as dimensioning / planning |
| Network Health: Performance | Read Latency | 1 | 1 | | | Hyperledger metric: time when response received - submit time |
| Network Health: Performance | Read Throughput | 1 | 1 | | | Hyperledger metric: total read operations / total time in seconds |
| Network Health: Performance | Transaction Latency | 1 | 1 | | | Hyperledger metric: (confirmation time @ network threshold) - submit time |
| Network Health: Performance | Transaction Throughput | 1 | 1 | 1 | | Hyperledger metric: total committed transactions / total time in seconds @ #committed nodes |
| Network Health: Quality | Consensus | 1 | 1 | 1 | | Monitor consensus across nodes; possibly monitor view change events |
| Network Health: Quality | Freshness | 1 | 1 | 1 | | Freshness timestamp reported by each validator node |
| Network Health: Quality | Reputation | | 1 | | 1 | Future: score nodes and (when in a network of networks) score the network using Open Reputation, where the node is the entity. Could also apply to transaction endorsers etc. https://github.com/SmithSamuelM/Papers/blob/master/whitepapers/open-reputation-low-level-whitepaper.pdf |
| Business | Usage # Writes | | | | 1 | Indy Monitor: track writes across ledgers |
| Business | Usage # Transaction Authors & Endorsers | | | | 1 | Track TA and TE DIDs |
| Business | Uptake: # new TAs & TEs | | | | 1 | Track TA and TE DIDs |
| Governance Framework Compliance | Diversity: Geo-location of Stewards and Nodes | | | 1 | 1 | Diversity can be measured (https://link.springer.com/chapter/10.1007/978-3-642-45030-3_13 ), but only against attributes. We therefore need to identify the elements that must be diversified at both a technical and an organizational level, assign attributes (claims) to them, and then use these types of metrics to measure diversity. This can be set at different levels as the network grows |
| Governance Framework Compliance | Diversity: Server / host type for Nodes | | | 1 | 1 | See above: Geo IP lookup |
| Governance Framework Compliance | Sustainability: % Churn rate of Stewards | | 1 | | 1 | Monitor Steward lifecycle in HubSpot |
| Governance Framework Compliance | Sustainability: Av. Cost / year to run a node | 1 | 1 | | 1 | Qualitative annual survey |
| Governance Framework Compliance | Decentralization: level of hierarchy or influence | | | 1 | 1 | Measure hierarchy in networks: https://arxiv.org/abs/1202.0191 Level of influence: there is a mathematical formula for measuring levels of influence in networks, and a good deal of research in this domain, e.g. http://dss.in.tum.de/files/bichler-research/2008_kiss_identification_of_influencers.pdf |
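
The latency, throughput and availability measures in the table reduce to simple arithmetic once timestamps and probe results are collected. The following is a minimal sketch of those formulas only. The `Sample` record, the function names and the probe list are hypothetical and are not part of indy-node-monitor or any other Indy tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """One observed request (hypothetical record layout).

    Times are seconds on a shared clock. done_time is when the response
    was received (reads) or when the write was confirmed at the network
    threshold; None means the request never completed.
    """
    submit_time: float
    done_time: Optional[float]


def latency(sample: Sample) -> float:
    """Latency = completion time - submit time."""
    if sample.done_time is None:
        raise ValueError("request never completed")
    return sample.done_time - sample.submit_time


def throughput(samples: Sequence[Sample], window_seconds: float) -> float:
    """Throughput = completed operations / total time in seconds."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    completed = sum(1 for s in samples if s.done_time is not None)
    return completed / window_seconds


def uptime_percent(probe_results: Sequence[bool]) -> float:
    """Availability as the percentage of successful health probes."""
    if not probe_results:
        raise ValueError("at least one probe result is required")
    return 100.0 * sum(probe_results) / len(probe_results)


reads = [Sample(0.0, 0.25), Sample(1.0, 1.5), Sample(2.0, None)]
print(latency(reads[0]))                           # 0.25
print(throughput(reads, 4.0))                      # 0.5
print(uptime_percent([True, True, True, False]))   # 75.0
```

Treating availability as the share of successful probes is one possible definition; a governance framework might instead define it over wall-clock downtime.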

Future metrics to consider:

In future, several factors suggest that further metrics may be required:

- High volumes of usage on individual networks may require a % capacity metric. This could also support networking within a network of networks, pricing and security measures.
- Operation of a network of networks (or grid) to build a global public utility layer may require a governance authority to apply many of the above metrics at the 'grid' level (horizontally at layer 1 of the ToIP stack, as opposed to a vertical slice from layer 4 down).

NickyHickman · 2020-09-14T11:27:08Z

@lohanspies @kiview - here is my starting document for what we measure and why

lohanspies · 2020-09-16T12:06:31Z

@NickyHickman moved the document into the indy-health folder. Suggest we track changes towards a v1 release document on health and metrics for Indy Networks.

swcurran · 2020-09-16T15:21:16Z

That's a great list and the majority of those metrics are available today -- we could be a couple of dev weeks away from having all of this...

- most of the data is available today via indy-node-monitor's fetch-validator-status
- others are available in indy-vdr
- the business ones are available (worst case) from indy-scan or BC's indy ledger browser

Once retrieved, all can be passed to a log collector for visualization.

NickyHickman · 2020-09-16T17:45:37Z

thanks for feedback - thinking it would be good to start with a core 3-5 max 7 key metrics for the public dashboard. ultimately I would love us to have a 'Net Trust Score' for the network - setting that standard etc
