This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

docs: improve monitoring documentation #384

Merged
merged 3 commits into from
Apr 20, 2020

Conversation

hiddeco
Member

@hiddeco hiddeco commented Apr 20, 2020

Fixes #381.
Fixes #62.
Documents #383.

@hiddeco hiddeco added the docs Issue or PR related to documentation label Apr 20, 2020
@stefanprodan
Member

I think we should include the list:

	// Unknown is mapped to 0
	v1.HelmReleasePhaseChartFetchFailed: -4,
	v1.HelmReleasePhaseFailed:           -3,
	v1.HelmReleasePhaseRollbackFailed:   -2,
	v1.HelmReleasePhaseRolledBack:       -1,
	v1.HelmReleasePhaseRollingBack:      1,
	v1.HelmReleasePhaseInstalling:       2,
	v1.HelmReleasePhaseUpgrading:        3,
	v1.HelmReleasePhaseChartFetched:     4,
	v1.HelmReleasePhaseSucceeded:        5,

@stefanprodan
Member

We could also include an example of how to alert on failed releases, similar to flux docs: https://docs.fluxcd.io/en/1.19.0/references/monitoring/

@hiddeco
Member Author

hiddeco commented Apr 20, 2020

@stefanprodan yes, that was also my initial thought, but I was having a hard time finding the right formatting for all those values, given that the document is simple and table-based at the moment.

@onedr0p

onedr0p commented Apr 20, 2020

@hiddeco You can include a "See phases below" link and list them outside the table.

@sa-spag
Contributor

sa-spag commented Apr 20, 2020

If you want to mention Prometheus alert rules, I suggest the following:

  • the Helm Operator struggles to process its release queue in a timely manner:
	alert: HelmOperatorLowThroughput
	expr: flux_helm_operator_release_queue_length_count > 0
	for: 30m
  • a HelmRelease was automatically rolled back:
	alert: HelmReleaseRolledBack
	expr: flux_helm_operator_release_phase_info == -1
  • a HelmRelease is subject to an error:
	alert: HelmReleaseError
	expr: flux_helm_operator_release_phase_info < -1
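
For reference, the three rules above could be assembled into a single Prometheus rule file along these lines. This is only a sketch: the group name, the severity labels, and the annotation text are assumptions added for illustration and are not part of the suggestion; the alert names, expressions, and the 30m duration are taken from the rules as written.

```yaml
# Sketch of a Prometheus rule file combining the suggested alerts.
# The group name, severity labels, and summaries are hypothetical.
groups:
  - name: helm-operator            # assumed group name
    rules:
      # The release queue has not drained for 30 minutes.
      - alert: HelmOperatorLowThroughput
        expr: flux_helm_operator_release_queue_length_count > 0
        for: 30m
        labels:
          severity: warning        # assumed severity
        annotations:
          summary: Helm Operator release queue is not draining
      # Phase value -1 corresponds to HelmReleasePhaseRolledBack.
      - alert: HelmReleaseRolledBack
        expr: flux_helm_operator_release_phase_info == -1
        labels:
          severity: warning        # assumed severity
        annotations:
          summary: A HelmRelease was automatically rolled back
      # Values below -1 cover RollbackFailed (-2), Failed (-3),
      # and ChartFetchFailed (-4) from the phase list.
      - alert: HelmReleaseError
        expr: flux_helm_operator_release_phase_info < -1
        labels:
          severity: critical       # assumed severity
        annotations:
          summary: A HelmRelease is in a failed phase
```

If Prometheus tooling is available, a file like this can be validated with `promtool check rules <file>` before loading it.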

Note that these do not cover Helm release statuses as exposed by sstarcher/helm-exporter (you might want to include that notice in the docs).

Member

@stefanprodan stefanprodan left a comment

LGTM 💯

@hiddeco hiddeco merged commit 6134f6f into master Apr 20, 2020
@hiddeco hiddeco deleted the docs/metrics branch April 20, 2020 18:54
@hiddeco hiddeco added this to the 1.1.0 milestone Apr 20, 2020
Successfully merging this pull request may close these issues.

Clarify meaning of metrics on monitoring.md
Recommendations for alerting on failed releases
4 participants