Overview
Summary
This Product Requirement Document outlines a proposal for the setup and integration of Consul, Prometheus, and Grafana on AWS for real-time monitoring of the Masa Protocol, using Docker for deployment and Terraform for managing AWS infrastructure.
Goal
The rapid creation of a resilient, extensible, real-time monitoring system.
Audience
Masa Protocol Team
Background and Context
Problem Statement
At Masa we’re looking to build an event-driven data architecture as a means to gather data from our nodes. This approach provides resilience, flexibility, and scalability. However, it comes with some challenges in the short term:
Events are granular and need to be processed after ingestion into coherent datasets.
Datasets need to be visualized and made available to relevant parties.
This process needs low enough latency to enable quick responses to novel issues.
In essence, the proposed stack allows Masa to get access to critical protocol information while our more general event system is still maturing.
In-Scope
Features and Functionality
Integration of Prometheus for metrics collection and alerting.
Integration of Consul for node discoverability.
Real-time monitoring dashboard using Grafana.
Docker containers for deploying Consul, Prometheus, and Grafana.
Infrastructure setup using Terraform.
Secure communication between Prometheus and Protocol nodes using TLS certificates.
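To make the Consul, Prometheus, and TLS pieces above more concrete, here is a minimal sketch of the scrape configuration they imply, built as a plain Python dict. The keys (`scrape_configs`, `consul_sd_configs`, `tls_config`, `relabel_configs`) and the `__meta_consul_node` label come from the Prometheus configuration format; the job name, Consul address, and certificate paths are assumptions for illustration only, and the real file would be written as YAML.

```python
import json


def build_scrape_config(consul_addr, ca_file, cert_file, key_file):
    """Return a Prometheus config fragment as a plain dict (to be serialized to YAML)."""
    return {
        "scrape_configs": [
            {
                "job_name": "masa-protocol-nodes",  # hypothetical job name
                "scheme": "https",  # scrape nodes over TLS
                "tls_config": {
                    "ca_file": ca_file,      # CA used to verify node certificates
                    "cert_file": cert_file,  # client certificate presented by Prometheus
                    "key_file": key_file,
                },
                # Targets are discovered through Consul instead of a static list.
                "consul_sd_configs": [{"server": consul_addr}],
                # Copy the Consul node name onto every scraped series.
                "relabel_configs": [
                    {"source_labels": ["__meta_consul_node"], "target_label": "node"}
                ],
            }
        ]
    }


# All addresses and paths below are placeholders, not values from this proposal.
config = build_scrape_config(
    "consul:8500",
    "/etc/prometheus/tls/ca.pem",
    "/etc/prometheus/tls/prometheus.pem",
    "/etc/prometheus/tls/prometheus-key.pem",
)
print(json.dumps(config, indent=2))
```

Supplying `cert_file` and `key_file` only matters if nodes require client certificates; with server-side TLS alone, `ca_file` would be enough.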
Deliverables
A fully functional monitoring stack deployed on AWS.
Terraform scripts for automating infrastructure setup.
Docker configurations for Consul, Prometheus, and Grafana.
Documentation for setup, configuration, and usage of the monitoring stack.
Out-of-Scope
Excluded Features
Any advanced analytics on the collected metrics.
Integration with third-party monitoring tools not mentioned in this document.
Support for any Masa services outside of the Masa Protocol.
Testing and Validation
Testing Strategy
Perform unit tests for individual components.
Conduct integration tests to ensure proper communication between Consul, Prometheus, and Grafana.
Execute end-to-end tests to validate the entire monitoring stack.
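One possible end-to-end check is to ask Prometheus for its built-in `up` metric and fail the test if any discovered target is not being scraped. The sketch below only parses a response in the shape returned by the Prometheus HTTP API (`/api/v1/query?query=up`); the instance names in the sample are made up, and fetching the response over HTTP is left out.

```python
from typing import List


def down_targets(query_response: dict) -> List[str]:
    """Return the instances whose `up` sample is not 1, sorted by name."""
    if query_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for sample in query_response["data"]["result"]:
        # Each instant-vector sample is [unix_timestamp, "value as string"].
        _timestamp, value = sample["value"]
        if float(value) != 1.0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(down)


# Hand-written sample in the API's response shape; instances are hypothetical.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "node-a:8080"}, "value": [1700000000, "1"]},
            {"metric": {"__name__": "up", "instance": "node-b:8080"}, "value": [1700000000, "0"]},
        ],
    },
}
print(down_targets(sample_response))
```

An end-to-end test could then assert that the returned list is empty once all nodes have registered.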
Validation Criteria
All tests pass without errors.
Metrics are accurately collected and displayed in Grafana dashboards.
Secure communication is verified with mTLS (optional if the team doesn't think it's necessary).
User Stories
Protocol Monitoring
Title: Utilize Prometheus for Node Monitoring
As an: Oracle Developer
I want: to integrate Prometheus to collect and store metrics from all services
So that: I can monitor system performance, identify issues in real time, and ensure system reliability
Acceptance Criteria:
Prometheus is deployed and configured to scrape metrics from all services.
Metrics are accessible via a centralized dashboard.
Alerts are configured for key performance indicators (TBD).
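For a service to be scrapable, it has to expose its metrics in the Prometheus text exposition format. As a rough illustration of what a node would serve, the sketch below renders a single gauge by hand; the metric name `masa_node_peers` and its label are invented for this example, and a real service would more likely use an official Prometheus client library.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Label values are not
    escaped here, so they must not contain quotes, backslashes, or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric name and label, used only for illustration.
body = render_gauge("masa_node_peers", "Number of connected peers.", [({"node": "a"}, 12)])
print(body, end="")
```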
Node Discovery
Title: Implement Consul for Node Discovery
As an: Oracle Developer
I want: to use Consul for dynamic node discovery and health checks
So that: services can automatically be discovered and relayed to Prometheus
Acceptance Criteria:
Consul is deployed and configured in the production environment.
Services register with Consul upon startup through the Oracle Analytics SDK.
Health checks are configured and working, with failing services automatically deregistered.
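The registration and health-check criteria above map onto Consul's agent API, where a service registers itself with `PUT /v1/agent/service/register`. The following sketch builds that request without sending it. The payload field names are Consul's; the service name, the `/health` path, the intervals, and the agent address are assumptions rather than decisions made in this document.

```python
import json
import urllib.request


def build_registration(node_id, address, port):
    """Build a Consul service definition with an HTTP health check."""
    return {
        "ID": node_id,
        "Name": "masa-oracle",  # hypothetical service name
        "Address": address,
        "Port": port,
        "Check": {
            "HTTP": f"http://{address}:{port}/health",  # hypothetical health endpoint
            "Interval": "10s",
            "Timeout": "2s",
            # Consul removes the service after it has stayed critical this long.
            "DeregisterCriticalServiceAfter": "1m",
        },
    }


def registration_request(agent_url, payload):
    """Prepare (but do not send) the registration call to a Consul agent."""
    return urllib.request.Request(
        url=f"{agent_url}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


payload = build_registration("oracle-node-1", "10.0.0.5", 8080)
request = registration_request("http://127.0.0.1:8500", payload)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the registration against a running agent. `DeregisterCriticalServiceAfter` is what gives the "failing services automatically deregistered" behaviour.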
Separation of Concerns
Title: Consolidated/Abstracted Node Analytics
As a: Data Lead
I want: to have analytics separated from general oracle function
So that: modification of oracle functionality does not break information services
Acceptance Criteria:
Consul and Prometheus are initialized and maintained within the Analytics SDK
Oracles reference the SDK for analytics data delivery.
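One way to read this separation is that oracle code depends only on a small SDK surface, while the SDK owns everything that touches Consul or Prometheus. The sketch below illustrates that shape; every class, method, and metric name in it is hypothetical, and the in-memory sink stands in for a real Prometheus-backed one.

```python
from typing import Dict, List, Protocol


class MetricsSink(Protocol):
    """Anything that can store a gauge value (for example, a Prometheus-backed sink)."""

    def set_gauge(self, name: str, value: float) -> None: ...


class InMemorySink:
    """Stand-in sink used here so the sketch runs without external services."""

    def __init__(self) -> None:
        self.gauges: Dict[str, float] = {}

    def set_gauge(self, name: str, value: float) -> None:
        self.gauges[name] = value


class AnalyticsSDK:
    """Hypothetical SDK facade: the only analytics API that oracle code sees."""

    def __init__(self, sink: MetricsSink) -> None:
        self._sink = sink

    def report_peer_count(self, count: int) -> None:
        self._sink.set_gauge("masa_node_peers", float(count))


class Oracle:
    """Oracle logic knows about the SDK, not about Consul or Prometheus."""

    def __init__(self, analytics: AnalyticsSDK) -> None:
        self._analytics = analytics

    def on_peers_changed(self, peers: List[str]) -> None:
        self._analytics.report_peer_count(len(peers))


sink = InMemorySink()
Oracle(AnalyticsSDK(sink)).on_peers_changed(["peer-1", "peer-2"])
print(sink.gauges)
```

With this layout, swapping the sink or changing how metrics are exported would not require touching `Oracle`.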
Further Notes
Future improvements include:
Integration and refactoring of Prometheus data for more complex analysis, using either Thanos with S3 or an external time series data store to persist Prometheus data.
Extended health monitoring of individual nodes.
Increased flexibility of node configuration using Ansible (or an alternative).