Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Action trigger APIs #5

Closed
seaneagan opened this issue Jul 7, 2020 · 3 comments
Closed

Discussion: Action trigger APIs #5

seaneagan opened this issue Jul 7, 2020 · 3 comments
Labels
area/helm Helm related issues and pull requests area/ux In pursuit of a delightful user experience enhancement New feature or request

Comments

@seaneagan
Copy link
Contributor

seaneagan commented Jul 7, 2020

The OnCondition fields currently in #2 seem error prone as there are many non-sensical condition states to use as triggers for certain actions e.g.:

  • install failure triggering a rollback
  • test failure (even after install) triggering a rollback
  • install/upgrade/test success triggering an uninstall/rollback.
  • reconciliation success/failure triggering anything

To stoke discussion, here is a sketch of an API for specifying triggers for actions which I think matches use cases more closely, with default values shown:

spec:
  install:
    # remediation strategy for install is always uninstall
    maxRetries: 5 # each retry implies remediation
    remediateLastRetry: false # set to false to leave last retry in place for debugging purposes
  upgrade:
    remediationStrategy: rollback # uninstall also supported
    maxRetries: 0
    remediateLastRetry: false
  test:
    enable: true
    interval: null
    # takes into account preceding install / upgrade remediation config
    remediateFailures: true 
    remediateDelayedFailures: false # failures due to interval or e.g. `helm.fluxcd.io/testAt` annotation
  rollback: # how, not when
    timeout: 300
  uninstall: # how, not when
    timeout: 300
@hiddeco
Copy link
Member

hiddeco commented Jul 7, 2020

The OnCondition fields currently in 3 seem error prone as there are many non-sensical condition states to use as triggers for certain actions e.g

In my opinion this is a user configuration problem as we provide sane defaults, cluster operators can guard (their cluster users) against those human errors by e.g. putting OPA rules in place to enforce validation on the HelmRelease resources.

I do understand that your design may be a tad more friendly to newcomers, but I do not think this is worth the limitations (in both configuration, and extension possibilities) it introduces.

@seaneagan
Copy link
Contributor Author

seaneagan commented Jul 7, 2020

In my opinion this is a user configuration problem as we provide sane defaults, cluster operators can guard (their cluster users) against those human errors by e.g. putting OPA rules in place to enforce validation on the HelmRelease resources.

But if there is an API design which enables use cases while avoiding putting this burden on cluster operators and users who will encounter these rule failures and have to determine themselves how to fix, that seems preferable.

I do understand that your design may be a tad more friendly to newcomers

I think it's important to optimize APIs for the most common use cases, which includes not just sane defaults, but also making common non-default configurations easy to discover/read/write/compose.

but I do not think this is worth the limitations (in both configuration, and extension possibilities) it introduces.

One thing to consider is that adding features (extension) is much easier than removing features. More importantly, I'm not sure I agree that the OnCondition API is more extensible than a use case targeted API such as the above API sketch. With a use case targeted API, new use cases can be accounted for by carefully adding new fields over time as new use cases are revealed. However with the OnCondition API, only conditions can be taken into account, so we may find ourselves needing to pollute the condition space solely for consumption by the OnCondtion APIs. Also the needs of the OnCondition API may be in conflict with other use cases for conditions and standards for conditions driven by the community (see kubernetes/community#4521 and kubernetes/enhancements#1624 ). For example, here are a couple use cases which the current OnCondition API doesn't support which the above API sketch does:

  1. rollback / uninstall of test failures. Since the OnCondition API only supports an OR of the conditions, one can only specify to do rollbacks on test failure, not upgrade success AND test failure, which is what would be needed to avoid the controller attempting to rollback an install after a test failure, which won't work (nothing to rollback to). Similarly one cannot specify to do uninstalls on install success AND test failure, if they use just test failure, they would uninstall upgrades whereas they may prefer a rollback for that, and may even accidentally configure both a rollback and an uninstall.
  2. leaving failed releases alone after last retry for debugging purposes. There is no condition which represents how many retries are left, so one can only specify to always or never rollback/uninstall after upgrade/install failure, without distinguishing between whether there are retries left.

Are there any use cases you have in mind which the OnCondition API supports which the above API sketch for example doesn't?

@hiddeco hiddeco added enhancement New feature or request area/helm Helm related issues and pull requests area/ux In pursuit of a delightful user experience labels Jul 13, 2020
@hiddeco
Copy link
Member

hiddeco commented Jul 17, 2020

Please continue the conversation in fluxcd/flux2#102.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/helm Helm related issues and pull requests area/ux In pursuit of a delightful user experience enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants