Health Workstream Metrics & Monitoring #1

Open
NickyHickman opened this issue Sep 14, 2020 · 4 comments


@NickyHickman

Metrics and Monitoring for Indy Networks

Purpose of this document:

  • Develop core metrics for Indy node and network operators against standards, policies, principles and contracts within the governance frameworks
  • Provide business and governance requirements to health workstream
  • Support process improvement of network operations
    • Fault Monitoring and alerting
    • Steward Support Services
  • Support security - identify vulnerabilities, anomalies and alert for specific types of attack
  • Identify metrics to support development roadmap of a coherent and interoperable network of networks

**Background and Rationale**

  • The Sovrin Steward Council Health Workstream has been exploring data available via Indy Scan and Indy Node Monitor in order to measure individual node and network health and improve network operations.
  • Many new Indy nodes and networks are under development, each with its own requirements to monitor nodes and the network as a whole.
  • Measuring in a consistent way helps network users at the upper layers of the ToIP stack choose the right networks to fit their business model, ecosystem or jurisdiction.
  • Sovrin, Trust over IP and the Hyperledger communities have been working towards the development of SSI as a utility layer: an interoperable network of networks.
  • Measuring in a consistent way enables monitoring and management of the overall network of networks.

Who are these metrics for?

For this document we have only focused on the needs of Sovrin as a community of Stewards (Indy Node Operators) and as an Indy Network Operator and Governance Authority.

However, there are other roles within the ecosystem that may use these metrics or define their own. For example, an Agency at layer 2 might select different networks for specific types of transactions. Or an ecosystem might select a specific subset of the network of networks based on measures of decentralization or performance. An ecosystem concerned with IoT, for example, will require very high throughput and capacity, whereas an ecosystem focused on KYC for private banking will require lower capacity and performance but higher consensus and freshness. Equally, we may find other metrics useful in the future as we learn more from these data.

We have subdivided our uses of these metrics into 4 groups:

  1. Node Operators. Focuses on node health and performance, fault monitoring, local security and conformance within the network.
  2. Network Operations. Focuses on network health, fault monitoring, security, and technical roadmap.
  3. **Public Dashboard.** It is important to demonstrate the health of the network to network users and others, and to be able to measure it against the performance of other SSI networks. This builds confidence, trust and accountability.
  4. **Business.** Whether it follows a for-profit or not-for-profit model, every network operator needs to be sustainable.

Categories of Metrics

We have grouped the metrics into 3 categories:

  1. Network Health, including Availability, Performance and Quality. Four of these metrics build on the Hyperledger Performance & Scale Working Group's white paper on metrics (October 2018).
  2. **Business.** Primarily focused on uptake and usage. Over time, this category of measures will enable us to build patterns of usage and behaviours, which in turn will help inform security as well as measures of business performance in building market adoption.
  3. Governance Framework Compliance. This measures essential components of SSI such as Diversity and Decentralization. These metrics are relevant to the role of Governance Authorities and attempt to measure some of the qualities of SSI.
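As an illustration of how a Diversity measure could be quantified, here is a minimal sketch using Shannon entropy over a single attribute (for example, the country hosting each node). This is one common diversity index and is an assumption for illustration; it is not necessarily the method intended by the workstream or by the references cited in the table. The function name `shannon_diversity` and the sample country codes are hypothetical.

```python
import math
from collections import Counter


def shannon_diversity(labels):
    """Return (entropy, evenness) for a list of categorical labels.

    entropy  : Shannon entropy in nats; 0.0 when all labels are identical.
    evenness : entropy divided by its maximum for the observed number of
               distinct labels; 1.0 means a perfectly even spread.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    max_entropy = math.log(len(counts)) if len(counts) > 1 else 0.0
    evenness = entropy / max_entropy if max_entropy > 0 else 0.0
    return entropy, evenness


# Hypothetical attribute values: one country code per node.
node_countries = ["CA", "CA", "DE", "US", "US", "US", "ZA", "GB"]
entropy, evenness = shannon_diversity(node_countries)
print(f"entropy={entropy:.3f} evenness={evenness:.3f}")
```

The same function could be applied to any attribute chosen for diversification (hosting provider, jurisdiction, organization type), with a governance framework setting a minimum evenness threshold per attribute.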

Table of proposed metrics

| Category | Measure | Node Operators / Network Operations / Public Dashboard / Business | How? / Notes |
|---|---|---|---|
| Network Health: Capacity | Availability % Uptime | 1 1 1 1 | Monitor availability across nodes; alerting; current status (dashboard); Steward response time; trends; correlate with events (upgrades, etc.) |
| Network Health: Capacity | Capacity % utilisation | 1 1 | Needed to support security controls and enable minimum permissible pricing, as well as dimensioning / planning |
| Network Health: Performance | Read Latency | 1 1 | Hyperledger metric: time when response received - submit time |
| Network Health: Performance | Read Throughput | 1 1 | Hyperledger metric: total read operations / total time in seconds |
| Network Health: Performance | Transaction Latency | 1 1 | Hyperledger metric: (confirmation time @ network threshold) - submit time |
| Network Health: Performance | Transaction Throughput | 1 1 1 | Hyperledger metric: total committed transactions / total time in seconds @ # committed nodes |
| Network Health: Quality | Consensus | 1 1 1 | Monitor consensus across nodes; possibly monitor view change events |
| Network Health: Quality | Freshness | 1 1 1 | Freshness timestamp reported by each validator node |
| Network Health: Quality | Reputation | 1 1 | Future: score nodes and (when in a network of networks) score the network using Open Reputation, where the node is the entity. Could also apply to transaction endorsers etc. https://github.com/SmithSamuelM/Papers/blob/master/whitepapers/open-reputation-low-level-whitepaper.pdf |
| Business | Usage: # Writes | 1 | Indy Monitor: track writes across ledgers |
| Business | Usage: # Transaction Authors & Endorsers | 1 | Track TA and TE DIDs |
| Business | Uptake: # new TAs & TEs | 1 | Track TA and TE DIDs |
| Governance Framework Compliance | Diversity: Geo-location of Stewards and Nodes | 1 1 | Diversity can be measured (https://link.springer.com/chapter/10.1007/978-3-642-45030-3_13), but it is measured against attributes. We therefore need to identify the elements that must be diversified at both a technical and an organizational level, assign attributes (claims) to them, and then use these types of metrics to measure diversity. This can be set at different levels as the network grows. |
| Governance Framework Compliance | Diversity: Server / host type for Nodes | 1 1 | See above: Geo IP lookup |
| Governance Framework Compliance | Sustainability: % Churn rate of Stewards | 1 1 | Monitor Steward lifecycle in HubSpot |
| Governance Framework Compliance | Sustainability: Av. cost / year to run a node | 1 1 1 | Qualitative annual survey |
| Governance Framework Compliance | Decentralization: level of hierarchy or influence | 1 1 | Measure hierarchy in networks: https://arxiv.org/abs/1202.0191. Level of influence: there is a mathematical formula for measuring levels of influence in networks, and a good deal of research in this domain, e.g. http://dss.in.tum.de/files/bichler-research/2008_kiss_identification_of_influencers.pdf |
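The latency and throughput formulas quoted in the table can be sketched as follows. This is a minimal illustration that assumes timestamps are already collected in seconds; the `Sample` type and the function names are hypothetical and are not part of any Indy tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    submit_time: float    # seconds; when the request was submitted
    complete_time: float  # seconds; response received (reads) or
                          # confirmation at the network threshold (writes)


def latency(sample):
    """Latency = completion time - submit time, in seconds."""
    return sample.complete_time - sample.submit_time


def mean_latency(samples):
    """Average latency over a non-empty list of samples."""
    return mean(latency(s) for s in samples)


def throughput(operation_count, total_seconds):
    """Throughput = total operations / total time in seconds."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return operation_count / total_seconds


# Hypothetical read samples observed during a 10-second window.
reads = [Sample(0.0, 0.25), Sample(1.0, 1.5), Sample(2.0, 2.25)]
print("mean read latency (s):", mean_latency(reads))
print("read throughput (ops/s):", throughput(len(reads), 10.0))
```

For the transaction metrics, `complete_time` would be the time at which the chosen threshold of nodes has committed the transaction, and the throughput count would include only committed transactions.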

Future metrics to consider:

In future, several factors suggest that further metrics may be required:

  1. High volumes of usage on individual networks may require a % capacity metric. This could also support networking within a network of networks, pricing and security measures.
  2. Operation of a network of networks (or grid) to build a global public utility layer may require the application of many of the above metrics by a governance authority at the ‘grid’ level (horizontally at layer 1 in the ToIP Stack, rather than as a vertical slice from layer 4 down).
@NickyHickman
Author

@lohanspies @kiview - here is my starting document for what we measure and why

@lohanspies
Contributor

@NickyHickman moved the document into the indy-health folder. Suggest we track changes towards a v1 release document on health and metrics for Indy Networks.

@swcurran
Collaborator

That's a great list and the majority of those metrics are available today -- we could be a couple of dev weeks away from having all of this...

  • most of the data is available today via indy-node-monitor's fetch-validator-status
  • others are available in indy-vdr
  • the business ones are available (worst case) from indy-scan or BC's indy ledger browser

Once retrieved, all can be passed to a log collector for visualization.
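As a rough sketch of the step between retrieval and visualization, the snippet below turns repeated node polls into an Availability % Uptime figure per node and prints one JSON line per node, a shape most log collectors can ingest. The input format (a list of `(node_alias, reachable)` tuples) and the node names are assumptions for illustration; this is not the actual output format of fetch-validator-status.

```python
import json
from collections import defaultdict


def uptime_percent(polls):
    """Compute % uptime per node.

    polls: iterable of (node_alias, reachable) tuples, one entry per node
           per polling round. This shape is hypothetical.
    """
    seen = defaultdict(int)
    up = defaultdict(int)
    for alias, reachable in polls:
        seen[alias] += 1
        if reachable:
            up[alias] += 1
    return {alias: 100.0 * up[alias] / seen[alias] for alias in seen}


# Hypothetical results from two polling rounds over two nodes.
polls = [
    ("node-a", True), ("node-b", True),
    ("node-a", True), ("node-b", False),
]
for alias, pct in sorted(uptime_percent(polls).items()):
    # One JSON object per line, ready to ship to a log collector.
    print(json.dumps({"node": alias, "uptime_percent": pct}))
```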

@NickyHickman
Author

thanks for feedback - thinking it would be good to start with a core 3-5 max 7 key metrics for the public dashboard. ultimately I would love us to have a 'Net Trust Score' for the network - setting that standard etc
