feat: smartnode notification functionality for bounty BA022308 (PR 1 of 3) #449

Merged on Mar 14, 2024 (18 commits)

Contributor

@activescott activescott commented Feb 22, 2024

This is an implementation of Smartnode Notification Functionality bounty BA022308.
The approach is based on an ongoing discord discussion thread and @jshufro was kind enough to provide some implementation guidance.

This PR is related to:

Below is a screenshot of the TUI:
Screenshot 2024-03-04 at 16 34 13

Below is a screenshot of what some alerts look like with a configured Discord webhook URL:
Screenshot 2024-02-21 at 15 35 46

Below is a screenshot of rocketpool node status output:
Screenshot 2024-02-23 at 12 47 31

Todo

This is a draft to get some initial feedback. What's included and remaining todos are (across both of these PRs):

  • Alertmanager container managed by rocketpool-cli

  • Support for Discord Notifications (Configuration and docs in TUI)

  • Support for notifications to all Alertmanager receivers (email, msteams, opsgenie, pagerduty, pushover, slack, sns, telegram, victorops, webhook, wechat, webex) using yml configuration.

  • Alertmanager client implementation to implement remaining ephemeral alerts. NOTE: The Alertmanager client will handle things such as notification rate limiting (if an alert is sent by the node, resolves, and is resent rapidly) and auto-resolving (e.g. once the node stops sending an alert to Alertmanager, it is automatically resolved based on a configurable time-based threshold, which can be set per alert and globally).

  • Configured alerts from bounty:

    • 1. Client sync complete
    • 2. Client(s) lost sync (ClientSyncStatusBeacon, ClientSyncStatusExecution, implemented via new prometheus metrics named rocketpool_node_sync_progress using the same sync status calculation that rocketpool node sync uses).
    • 3. Disk free space is running low (say, 15% free remaining) and the user should consider pruning if using Geth or Nethermind (LowDiskSpaceWarning and LowDiskSpaceCritical provide a warning level and a critical level, each with its own threshold)
    • 4. node automatically staked a minipool, or attempted to (success or failure)
    • 5. node automatically promoted a vacant minipool, or attempted to (success or failure)
    • 6. node automatically reduced a minipool’s bond, or attempted to (success or failure)
    • 7. node detected a fee recipient change
    • 8. node automatically distributed a minipool’s balance
    • 9. You have a block proposal this epoch (UpcomingProposal using the same metric shown on the grafana dashboard)
    • 10. You submitted a block proposal
    • 11. You are scheduled for the next sync committee (UpcomingSyncCommittee using the same metric shown on the grafana dashboard)
    • 12. You have entered a sync committee (ActiveSyncCommittee using the same metric shown on the grafana dashboard)
    • 13. There is a new Smartnode update available (RPUpdatesAvailable)
  • Additional features beyond what is mentioned in bounty:

    • support for user-defined rules in /alerting/rules/*.yml using Prometheus rule configuration.
    • Alert: Rocket Pool OS Updates Available (OSUpdatesAvailable)
    • rocketpool node status shows current actively firing alerts
    • (optional) Alert: Staked RPL collateral falls below minimum level to claim RPL rewards (10% of borrowed ETH)
  • Basic TUI: Ability to configure alertmanager port & a discord webhook receiver/notification

  • Advanced TUI: Ability to enable/disable specific receivers and groups of alerts for specific receivers + new TUI page

    • Add the 7 custom alerts that are fired from shared/services/alerting/alerting.go to the TUI configuration for enablement.
  • Native Mode users support:

    • [-] Instructions on creating and managing a systemd service for Native Mode users. I'd like to recommend dropping this. The implementation is closely tied to setting up the Prometheus/Grafana stack, and the current Native Mode docs direct the user to the Docker section to set those up. Largely this comes down to installing and configuring Prometheus and Alertmanager. If someone is advanced enough to do that, I doubt they'll bother with our docs.
    • Anything else for Native Mode users? Does something need to be disabled in the Docker config?
  • a pull request to our documentation guides repository (GitHub - rocket-pool/docs.rocketpool.net: Rocket Pool Documentation & Guide Hub) with complete and thorough documentation describing its configuration and usage for Docker, Hybrid, and Native Mode users alike. – see feat: docs for smartnode notification functionality for bounty BA022308 (PR 3 of 3) docs.rocketpool.net#73
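
The auto-resolve behaviour described for ephemeral alerts can be sketched against Alertmanager's v2 HTTP API, where each posted alert carries an `endsAt` timestamp and counts as resolved once that time passes unless it is re-sent with a later one. This is a minimal sketch using only the standard library, not the generated go-openapi client the PR uses; the helper names, the `FeeRecipientChanged` alert name, and the five-minute TTL are assumptions for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// postableAlert mirrors the JSON shape accepted by Alertmanager's
// POST /api/v2/alerts endpoint (only the fields used here).
type postableAlert struct {
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations,omitempty"`
	StartsAt    time.Time         `json:"startsAt"`
	EndsAt      time.Time         `json:"endsAt"`
}

// newEphemeralAlert builds an alert that expires ttl after now.
// If the node keeps re-sending it with a fresh endsAt it stays firing;
// once the node stops, Alertmanager resolves it when endsAt passes.
// Name and severity values are hypothetical.
func newEphemeralAlert(name, severity, summary string, now time.Time, ttl time.Duration) postableAlert {
	return postableAlert{
		Labels:      map[string]string{"alertname": name, "severity": severity},
		Annotations: map[string]string{"summary": summary},
		StartsAt:    now,
		EndsAt:      now.Add(ttl),
	}
}

// sendAlerts posts the alerts to an Alertmanager base URL,
// for example http://localhost:9093 (assumed, not taken from the PR).
func sendAlerts(baseURL string, alerts []postableAlert) error {
	body, err := json.Marshal(alerts)
	if err != nil {
		return err
	}
	resp, err := http.Post(baseURL+"/api/v2/alerts", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("alertmanager returned %s", resp.Status)
	}
	return nil
}

func main() {
	a := newEphemeralAlert("FeeRecipientChanged", "info", "fee recipient changed", time.Now(), 5*time.Minute)
	out, _ := json.MarshalIndent([]postableAlert{a}, "", "  ")
	// Print the payload; pass the same slice to sendAlerts against a running Alertmanager.
	fmt.Println(string(out))
}
```

Because resolution is driven by `endsAt`, the node never has to send an explicit "resolved" message; it only has to stop refreshing the alert.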

Open Questions

  • There is "first-class UI" configuration in the TUI for the basics of Alertmanager, including a Discord webhook URL for notifications and ports, similar to other containers. Currently each alert is configured in an alerting rules config file with configurable thresholds at ~/.rocketpool/alerting/rules/default.yml. I can add each alert to the TUI, but IMHO this is of questionable value considering the documentation and flexibility of the rules file. Does the GMC want each notification in the TUI, or is the rules yml acceptable?
    • NOTE: The bounty mentions Notifications must be configurable, meaning they need to have first-class support as a dedicated page in the service config TUI with parameters to enable and disable each notification, and for adjustable thresholds where appropriate.
    • As discussed below we will implement TUI for this. Task exists above.

@activescott
Contributor Author

@jshufro Will appreciate your review of this before I move onto the additional "harder" alerts. Specifically looking for any feedback on technical direction/factoring and the remaining Todos above which I believe, once completed, will complete the bounty.

Feel free to tag others from GMC or dev team that might be appropriate to weigh in here.

@jshufro
Contributor

jshufro commented Feb 22, 2024

> @jshufro Will appreciate your review of this before I move onto the additional "harder" alerts. Specifically looking for any feedback on technical direction/factoring and the remaining Todos above which I believe, once completed, will complete the bounty.
>
> Feel free to tag others from GMC or dev team that might be appropriate to weigh in here.

Will give this a read over tomorrow!

Contributor

@jshufro jshufro left a comment

very strong start.

I do think we want more flexibility in the TUI, at a minimum the ability to enable/disable specific receivers and groups of alerts for specific receivers.

You don't want node operators editing yaml themselves... it's not pretty.

rocketpool-cli/service/config/settings-metrics.go (outdated)
rocketpool/node/collectors/node-collector.go
rocketpool/node/collectors/node-collector.go (outdated)
shared/services/config/alertmanager-config.go
@activescott
Contributor Author

> I do think we want more flexibility in the TUI, at a minimum the ability to enable/disable specific receivers and groups of alerts for specific receivers.
>
> You don't want node operators editing yaml themselves... it's not pretty.

Okay how about I add a TUI entry for each "built in" alert with the choices "Do not alert" and "Discord" for now. I can use that to template the default rules file. Then other receivers can be added to the choices later (as well as an "All" I think?). Ok?
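
One way the "template the default rules file" idea could look is a `text/template` that emits a Prometheus rule group and skips any alert set to "Do not alert". This is only a sketch under assumptions: the `builtInAlert` type, its fields, the group name, and the PromQL expressions are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// builtInAlert is a hypothetical per-alert setting as it might come from the TUI.
type builtInAlert struct {
	Name     string // Prometheus alert name, e.g. LowDiskSpaceWarning
	Expr     string // PromQL expression (illustrative only)
	Severity string // value for the severity label
	Enabled  bool   // false corresponds to the "Do not alert" choice
}

// rulesTemplate renders a Prometheus rule group containing only enabled alerts.
const rulesTemplate = `groups:
  - name: default
    rules:
{{- range .}}{{if .Enabled}}
      - alert: {{.Name}}
        expr: {{.Expr}}
        labels:
          severity: {{.Severity}}
{{- end}}{{end}}
`

// renderRules executes the template and returns the resulting YAML text.
func renderRules(alerts []builtInAlert) (string, error) {
	tmpl, err := template.New("rules").Parse(rulesTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, alerts); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderRules([]builtInAlert{
		{Name: "LowDiskSpaceWarning", Expr: "node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.15", Severity: "warning", Enabled: true},
		{Name: "OSUpdatesAvailable", Expr: "hypothetical_os_updates > 0", Severity: "info", Enabled: false},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Adding more receivers later would then mean widening the per-alert setting from a boolean to a receiver choice, while the template structure stays the same.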

@jshufro
Contributor

jshufro commented Feb 23, 2024

> I do think we want more flexibility in the TUI, at a minimum the ability to enable/disable specific receivers and groups of alerts for specific receivers.
> You don't want node operators editing yaml themselves... it's not pretty.
>
> Okay how about I add a TUI entry for each "built in" alert with the choices "Do not alert" and "Discord" for now. I can use that to template the default rules file. Then other receivers can be added to the choices later (as well as an "All" I think?). Ok?

Sounds like a good approach. You might want to make a new TUI page sooner rather than later. It sounds like it's going to get 'busy' and having an alerts/notifications dedicated page will buy you more real-estate.

I'm kind of envisioning a checkbox style grid:

+-----------------+----------+----------+-----------+
|                 | discord  | email    | pagerduty |
| proposal alerts |    x     |     x    |           |
| hardware alerts |          |    x     |    x      |
| slashing alerts |          |          |      x    |
| ...             |          |          |           |
+-----------------+----------+----------+-----------+

something like this. Ideally hiding receivers that aren't configured. It's a big ask, but anything you can do directionally towards this would be helpful.
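
The grid idea, including hiding receivers that aren't configured, can be sketched as a small data model plus a text renderer. This is an illustration only: the `receiver` and `routingGrid` types and the function names are hypothetical, and a real TUI page would use the project's own form widgets rather than plain strings.

```go
package main

import (
	"fmt"
	"strings"
)

// receiver is a hypothetical notification target shown as a grid column.
type receiver struct {
	Name       string
	Configured bool // unconfigured receivers are hidden from the grid
}

// routingGrid maps alert group -> receiver name -> enabled.
type routingGrid map[string]map[string]bool

// visibleReceivers returns the names of configured receivers, in order.
func visibleReceivers(all []receiver) []string {
	var cols []string
	for _, r := range all {
		if r.Configured {
			cols = append(cols, r.Name)
		}
	}
	return cols
}

// renderGrid draws one row per alert group and one column per visible receiver.
func renderGrid(groups []string, receivers []receiver, grid routingGrid) string {
	cols := visibleReceivers(receivers)
	var sb strings.Builder
	fmt.Fprintf(&sb, "%-16s", "")
	for _, c := range cols {
		fmt.Fprintf(&sb, "| %-10s", c)
	}
	sb.WriteString("\n")
	for _, g := range groups {
		fmt.Fprintf(&sb, "%-16s", g)
		for _, c := range cols {
			mark := " "
			if grid[g][c] { // missing entries read as false
				mark = "x"
			}
			fmt.Fprintf(&sb, "| %-10s", mark)
		}
		sb.WriteString("\n")
	}
	return sb.String()
}

func main() {
	receivers := []receiver{
		{Name: "discord", Configured: true},
		{Name: "email", Configured: false},
		{Name: "pagerduty", Configured: true},
	}
	grid := routingGrid{
		"proposal alerts": {"discord": true},
		"hardware alerts": {"pagerduty": true},
	}
	fmt.Print(renderGrid([]string{"proposal alerts", "hardware alerts"}, receivers, grid))
}
```

Keeping the routing as a map keyed by group and receiver means a newly configured receiver simply appears as an extra, initially empty column.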

Contributor

@jshufro jshufro left a comment

a few comments

shared/types/api/node.go
rocketpool/node/collectors/node-collector.go
rocketpool/node/distribute-minipools.go (outdated)
rocketpool/node/manage-fee-recipient.go (outdated)
rocketpool/node/node.go
rocketpool/node/promote-minipools.go (outdated)
rocketpool/node/reduce-bonds.go (outdated)
rocketpool/node/stake-prelaunch-minipools.go (outdated)
shared/services/alerting/alerting.go (outdated)
shared/services/requirements.go (outdated)
@@ -205,3 +205,23 @@ func (layout *standardLayout) mapParameterizedFormItems(params ...*parameterized
		layout.parameters[param.item] = param
	}
}

// Sets up a handler to return to the specified homePage when the user presses escape on the layout.
func (layout *standardLayout) setupEscapeReturnHomeHandler(md *mainDisplay, homePage *page) {
Contributor Author

FYI: I found 10 copies of this code throughout the codebase and replaced them all with this function.

activescott and others added 2 commits March 3, 2024 17:00
@activescott activescott marked this pull request as ready for review March 4, 2024 01:09
@activescott
Contributor Author

@jshufro When y'all are back, this PR and the corresponding PR, rocket-pool/smartnode-install#119, are ready for a "final" review and some testing. I believe this is now feature complete; I just need to get a PR together for docs.

activescott added a commit to activescott/docs.rocketpool.net that referenced this pull request Mar 4, 2024
@activescott activescott changed the title feat: smartnode notification functionality for bounty BA022308 (PR 1 of 2) feat: smartnode notification functionality for bounty BA022308 (PR 1 of 3) Mar 4, 2024
@activescott
Contributor Author

I believe that the combination of the 3 PRs mentioned above now meets the requirements of the bounty. While I think there are things that could be added (e.g. more delivery channels, more alerts, more TUI configuration options), this has become a significant body of work and a meaningful improvement for node operators, so I'd like to follow up with additional functionality in subsequent milestones.

In the interest of transparency, I do have some notes above pointing out things that went beyond the bounty's requirements and some "grey areas" (e.g. native mode). Happy to discuss anything at all.

What else can I do to move this forward?

@@ -14,8 +14,13 @@ require (
	github.com/ferranbt/fastssz v0.1.3
	github.com/gdamore/tcell/v2 v2.6.0
	github.com/glendc/go-external-ip v0.1.0
	github.com/go-openapi/errors v0.21.0
Contributor Author

FYI: I needed the openapi dependencies for the client to call alertmanager. The other version bumps were done by go. I didn't notice any problems with those bumps but they were really just done by the go mod... command (get or tidy, I forget which).
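
For readers unfamiliar with what such a client call involves, here is a rough sketch of reading active alerts from Alertmanager's `GET /api/v2/alerts` endpoint, the kind of data that node status output could display. It deliberately uses plain `net/http` instead of the go-openapi client the PR depends on so that it stays self-contained; the type and function names are assumptions, and a local `httptest` server stands in for a real Alertmanager.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// gettableAlert holds the subset of Alertmanager's alert JSON used here.
type gettableAlert struct {
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	Status      struct {
		State string `json:"state"`
	} `json:"status"`
}

// fetchActiveAlerts asks Alertmanager for alerts that are active and
// neither silenced nor inhibited.
func fetchActiveAlerts(client *http.Client, baseURL string) ([]gettableAlert, error) {
	resp, err := client.Get(baseURL + "/api/v2/alerts?active=true&silenced=false&inhibited=false")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("alertmanager returned %s", resp.Status)
	}
	var alerts []gettableAlert
	if err := json.NewDecoder(resp.Body).Decode(&alerts); err != nil {
		return nil, err
	}
	return alerts, nil
}

// fakeAlertmanager serves one canned alert so the sketch runs without a real instance.
func fakeAlertmanager() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/v2/alerts" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `[{"labels":{"alertname":"LowDiskSpaceWarning","severity":"warning"},"annotations":{"summary":"disk space is low"},"status":{"state":"active"}}]`)
	}))
}

func main() {
	srv := fakeAlertmanager()
	defer srv.Close()

	alerts, err := fetchActiveAlerts(srv.Client(), srv.URL)
	if err != nil {
		panic(err)
	}
	for _, a := range alerts {
		fmt.Printf("%s [%s]: %s\n", a.Labels["alertname"], a.Labels["severity"], a.Annotations["summary"])
	}
}
```

A generated client adds typed models and validation on top of the same HTTP calls, which is the trade-off behind pulling in the extra dependencies.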

@jshufro
Contributor

jshufro commented Mar 7, 2024

> What else can I do to move this forward?

I hope/think it's only stalled because of conference travel. @0xfornax should be able to approve and merge. @jclapis is the final verdict on whether the bounty is fulfilled but he's on paternity leave so we may have to ask the GMC if they're willing to take fornax/my word instead.

I just got back from Denver so I'll test a bit and do another review pass. @shfryn fyi

@0xfornax
Member

0xfornax commented Mar 8, 2024

Greetings @activescott and @jshufro! I've started testing/reviewing the 3 PRs.

Comment on lines +319 to +325
	const maxItems = 3
	for i, alert := range status.Alerts {
		fmt.Println(alert.ColorString())
		if i == maxItems-1 {
			break
		}
	}
Contributor

Suggested change
-	const maxItems = 3
-	for i, alert := range status.Alerts {
-		fmt.Println(alert.ColorString())
-		if i == maxItems-1 {
-			break
-		}
-	}
+	const maxItems = 3
+	for i, alert := range status.Alerts {
+		if i == maxItems {
+			break
+		}
+		fmt.Println(alert.ColorString())
+	}

upside: no need to reason about off-by-one errors
downside: one extra iteration if the compiler doesn't catch and optimize it.

this is the softest suggestion i think i've ever given though, so feel free to ignore
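
A third option, shown here purely as a sketch, sidesteps the index check by slicing before the loop. The `firstN` helper is hypothetical and plain strings stand in for the alert type.

```go
package main

import "fmt"

// firstN returns at most the first n elements of items without copying.
func firstN(items []string, n int) []string {
	if n < 0 {
		n = 0
	}
	if len(items) > n {
		return items[:n]
	}
	return items
}

func main() {
	const maxItems = 3
	alerts := []string{"alert-1", "alert-2", "alert-3", "alert-4", "alert-5"}
	// The loop body no longer needs a break or an index comparison.
	for _, alert := range firstN(alerts, maxItems) {
		fmt.Println(alert)
	}
}
```

The cap then lives in one place, and the loop reads the same whether there are fewer or more alerts than the limit.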

@0xfornax 0xfornax merged commit ef9ec7e into rocket-pool:master Mar 14, 2024