sdm_health_exporter.py - v0.0.1 #10

Merged
camposer merged 2 commits into strongdm:main on Feb 24, 2022

cigoldstein
Contributor

This script serves as an example exporter that can monitor the
health of resources ("Infrastructure") and nodes ("Gateways/Relays").

The script uses the following workflow:

  • Make an API call to strongDM's API to retrieve information about resources and
    nodes. The frequency of the API call is configurable by updating the "update_interval"
    variable in "main()".

  • Collect data about any resource or node that is tagged with <alert_tag> in strongDM.
    This tag is configurable by updating the "alert_tag" variable in "main()"

  • Export metrics to a prometheus endpoint as a "Gauge" (0 for healthy, 1 for unhealthy)
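
The workflow above can be sketched roughly as follows. This is an illustration only, not the code from sdm_health_exporter.py: the `Item` record, `collect_gauges`, and `poll` are hypothetical names, and it assumes that "tagged with <alert_tag>" means the tag key is present on the item. The real script fetches items through the strongDM API and publishes the values through a Prometheus endpoint; here the fetch step is passed in as a plain function so the sketch stays self-contained.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a strongDM resource or node record."""
    name: str
    healthy: bool
    tags: dict = field(default_factory=dict)


def collect_gauges(items, alert_tag):
    """Return {name: gauge value} for items that carry alert_tag.

    Gauge convention from the description: 0 = healthy, 1 = unhealthy.
    Assumption: an item counts as tagged when alert_tag is a key in its tags.
    """
    return {
        item.name: 0 if item.healthy else 1
        for item in items
        if alert_tag in item.tags
    }


def poll(fetch_items, alert_tag, update_interval, cycles, sleep=time.sleep):
    """Fetch and evaluate items `cycles` times, pausing update_interval seconds
    after each round. Returns one gauge snapshot per round."""
    snapshots = []
    for _ in range(cycles):
        snapshots.append(collect_gauges(fetch_items(), alert_tag))
        sleep(update_interval)
    return snapshots
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets the loop be exercised without actually waiting, while the default keeps the normal behaviour of pausing between API calls.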

Contributor

@camposer camposer left a comment

Thank you very much @cigoldstein

This is really good stuff. Many other customers have asked for the same functionality, so it is definitely very useful!

We left some comments there for your consideration. Once you make the changes, we'll be notified and merge the code.

Thanks again for your contrib

Comment on lines 54 to 55
api_id = os.environ['SDM_API_ID']
api_secret = os.environ['SDM_API_SECRET']
Contributor

Rename:

  • SDM_API_ID to SDM_API_ACCESS_KEY
  • SDM_API_SECRET to SDM_API_SECRET_KEY
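
With the rename applied, the credential lookup could look something like the sketch below. The helper name `load_api_credentials` and the explicit missing-variable check are assumptions added for illustration; the snippet under review simply indexes `os.environ` directly.

```python
import os


def load_api_credentials(environ=os.environ):
    """Return (access_key, secret_key) read from the renamed variables.

    Hypothetical helper: raises RuntimeError naming every variable that is
    unset or empty, instead of failing with a bare KeyError.
    """
    names = ("SDM_API_ACCESS_KEY", "SDM_API_SECRET_KEY")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return environ["SDM_API_ACCESS_KEY"], environ["SDM_API_SECRET_KEY"]
```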


# node health is returned as "started" or "stopped"
# make sure that the "state" value we received is expected
if node.state not in ("started", "stopped"):
Contributor

The only value we really care about is started, so I'm not sure we want to keep this validation. If you do want to keep it, it needs to include new as well:

if node.state not in ("started", "stopped", "new"):
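
A minimal sketch of the two options discussed in this comment, assuming the gauge convention from the PR description (0 for healthy, 1 for unhealthy). The function names are hypothetical, and the state list contains only the three values mentioned in this thread; it is not claimed to be the full set the API can return.

```python
# States named in this review thread (assumption: there may be others).
KNOWN_NODE_STATES = ("started", "stopped", "new")


def node_gauge_value(state):
    """Option 1: no validation. Only "started" counts as healthy (0);
    every other state is reported as unhealthy (1)."""
    return 0 if state == "started" else 1


def is_known_node_state(state):
    """Option 2: keep a validation step, extended to include "new"."""
    return state in KNOWN_NODE_STATES
```

With the first option an unexpected state simply shows up as unhealthy, so the exporter does not need updating whenever a new state value appears.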

- Renamed API key/secret environment variables
- Removed validation check for states being "started" or "stopped"
- Added the "health" and "state" prometheus labels for resources and
nodes respectively to make it easy to see the current health in prometheus
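
To illustrate what the extra labels look like to a scraper, here is a small sketch that renders one sample line in the Prometheus text exposition format. The metric name `sdm_node_health` and the `name` label are assumptions for the example; only the "health" and "state" label names come from the commit message. The actual script would normally let a Prometheus client library produce this output.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline, as the Prometheus
    text exposition format requires inside label values."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(metric, labels, value):
    """Render a single line of the form: metric{label="value",...} value"""
    label_text = ",".join(
        f'{key}="{escape_label_value(str(val))}"' for key, val in labels.items()
    )
    return f"{metric}{{{label_text}}} {value}"


# Hypothetical node that is currently stopped, reported as unhealthy (1).
print(render_sample("sdm_node_health", {"name": "gw-1", "state": "stopped"}, 1))
# prints: sdm_node_health{name="gw-1",state="stopped"} 1
```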
@cigoldstein
Contributor Author

I've incorporated the comments above and also added some additional Prometheus labels. Thank you for accepting this into the contrib repo!

@camposer
Contributor

Nice job @cigoldstein !

Thanks for the contribution, we're pretty sure it's going to be highly appreciated by the strongDM community.

@camposer camposer merged commit 4839040 into strongdm:main Feb 24, 2022
hunter-stradley pushed a commit that referenced this pull request Apr 23, 2024