Discussion: Action trigger APIs #5

seaneagan · 2020-07-07T15:33:51Z

The OnCondition fields currently in #2 seem error-prone, as there are many nonsensical condition states that can be used as triggers for certain actions, e.g.:

install failure triggering a rollback
test failure (even after install) triggering a rollback
install/upgrade/test success triggering an uninstall/rollback
reconciliation success/failure triggering anything

To stoke discussion, here is a sketch of an API for specifying triggers for actions which I think matches use cases more closely, with default values shown:

spec:
  install:
    # remediation strategy for install is always uninstall
    maxRetries: 5 # each retry implies remediation
    remediateLastRetry: false # set to false to leave last retry in place for debugging purposes
  upgrade:
    remediationStrategy: rollback # uninstall also supported
    maxRetries: 0
    remediateLastRetry: false
  test:
    enable: true
    interval: null
    # takes into account preceding install / upgrade remediation config
    remediateFailures: true 
    remediateDelayedFailures: false # failures due to interval or e.g. `helm.fluxcd.io/testAt` annotation
  rollback: # how, not when
    timeout: 300
  uninstall: # how, not when
    timeout: 300
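
One possible reading of the `maxRetries` / `remediateLastRetry` fields above, as a minimal Python sketch. `RetryPolicy` and `should_remediate` are hypothetical names, not part of any controller. The counting rule is also an assumption: one initial attempt plus `maxRetries` retries, with the final failed attempt treated as the "last retry".

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical mirror of the install/upgrade fields in the sketch."""
    max_retries: int = 0
    remediate_last_retry: bool = False


def attempts_allowed(policy: RetryPolicy) -> int:
    # Assumption: one initial attempt plus the configured number of retries.
    return 1 + policy.max_retries


def should_remediate(policy: RetryPolicy, failed_attempts: int) -> bool:
    """Decide whether to remediate after the given number of failed attempts."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    is_last = failed_attempts >= attempts_allowed(policy)
    if not is_last:
        # "each retry implies remediation": clean up before trying again.
        return True
    # Out of retries: leave the release in place unless told otherwise.
    return policy.remediate_last_retry


if __name__ == "__main__":
    install = RetryPolicy(max_retries=5)
    for n in range(1, attempts_allowed(install) + 1):
        print(n, should_remediate(install, n))
```

Under these assumptions, the upgrade defaults (`maxRetries: 0`, `remediateLastRetry: false`) would leave a failed upgrade in place for debugging.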

hiddeco · 2020-07-07T16:06:15Z

> The OnCondition fields currently in 3 seem error-prone, as there are many nonsensical condition states that can be used as triggers for certain actions, e.g.

In my opinion this is a user configuration problem, as we provide sane defaults. Cluster operators can guard their cluster users against those human errors by, e.g., putting OPA rules in place to enforce validation on the HelmRelease resources.

I do understand that your design may be a tad more friendly to newcomers, but I do not think this is worth the limitations (in both configuration, and extension possibilities) it introduces.

seaneagan · 2020-07-07T17:32:22Z

> In my opinion this is a user configuration problem, as we provide sane defaults. Cluster operators can guard their cluster users against those human errors by, e.g., putting OPA rules in place to enforce validation on the HelmRelease resources.

But if an API design can enable the same use cases without putting this burden on cluster operators, and on the users who will encounter these rule failures and have to work out for themselves how to fix them, that seems preferable.

> I do understand that your design may be a tad more friendly to newcomers

I think it's important to optimize APIs for the most common use cases, which includes not just sane defaults, but also making common non-default configurations easy to discover/read/write/compose.

> but I do not think this is worth the limitations (in both configuration, and extension possibilities) it introduces.

One thing to consider is that adding features (extension) is much easier than removing features. More importantly, I'm not sure I agree that the OnCondition API is more extensible than a use-case-targeted API such as the above API sketch. With a use-case-targeted API, new use cases can be accounted for by carefully adding new fields over time as they are revealed. With the OnCondition API, however, only conditions can be taken into account, so we may find ourselves needing to pollute the condition space solely for consumption by the OnCondition APIs. The needs of the OnCondition API may also conflict with other use cases for conditions, and with standards for conditions driven by the community (see kubernetes/community#4521 and kubernetes/enhancements#1624). For example, here are a couple of use cases which the current OnCondition API doesn't support and which the above API sketch does:

Rollback / uninstall on test failure. Since the OnCondition API only supports an OR of the conditions, one can only specify a rollback on test failure, not on upgrade success AND test failure. The latter is what would be needed to stop the controller from attempting to roll back an install after a test failure, which won't work (there is nothing to roll back to). Similarly, one cannot specify an uninstall on install success AND test failure. Using just test failure would also uninstall upgrades, where a rollback may be preferred, and a user may even accidentally configure both a rollback and an uninstall.
Leaving failed releases alone after the last retry for debugging purposes. There is no condition which represents how many retries are left, so one can only specify to always or never rollback/uninstall after an upgrade/install failure, with no way to distinguish whether any retries are left.
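
The first limitation can be illustrated with a small Python sketch. Everything here is hypothetical: the condition names (`install-success`, `test-failure`), the trigger map, and both functions are made up for illustration and do not reflect the actual OnCondition implementation.

```python
def or_triggered_actions(triggers: dict, active_conditions: set) -> list:
    """OR semantics: an action fires if ANY of its trigger conditions is active."""
    return sorted(
        action
        for action, conditions in triggers.items()
        if conditions & active_conditions
    )


def remediation_for_test_failure(preceding_action: str,
                                 upgrade_strategy: str = "rollback") -> str:
    """Use-case-targeted alternative: pick the remediation from the action
    that preceded the failed test, as in the API sketch above."""
    if preceding_action == "install":
        # Remediation strategy for install is always uninstall.
        return "uninstall"
    if preceding_action == "upgrade":
        return upgrade_strategy
    raise ValueError(f"unexpected preceding action: {preceding_action!r}")


if __name__ == "__main__":
    # A user wires both actions to test failure, with no way to express AND.
    triggers = {"rollback": {"test-failure"}, "uninstall": {"test-failure"}}
    after_install = {"install-success", "test-failure"}
    print(or_triggered_actions(triggers, after_install))
    print(remediation_for_test_failure("install"))
    print(remediation_for_test_failure("upgrade"))
```

With OR-only triggers, a test failure after a fresh install selects both `rollback` and `uninstall`, even though there is nothing to roll back to. Keying the decision on the preceding action yields exactly one remediation.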

Are there any use cases you have in mind which the OnCondition API supports but which, for example, the above API sketch doesn't?

hiddeco · 2020-07-17T08:52:13Z

Please continue the conversation in fluxcd/flux2#102.

hiddeco added the enhancement, area/helm, and area/ux labels Jul 13, 2020

hiddeco mentioned this issue Jul 23, 2020

Conditional remediation on Helm actions #41

Closed

hiddeco closed this as completed Jul 23, 2020
