[incubator/sparkoperator] introduce imagePullSecret at sparkoperator deployments #18162
Conversation
Signed-off-by: Roland Johann <roland.johann@phenetic.io>
Hi @rolandjohann. Thanks for your PR. I'm waiting for a helm member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: liyinan926, rolandjohann. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/ok-to-test
/retest
@rolandjohann
@zanhsieh saw that, too. There is already a hook-triggered CRD cleaner, and I didn't introduce any additional CRD. The log indicates to me that the chart install errors because of a failure in the previous run: the chart delete was forced and the CRDs are still there. I faced similar issues when we enabled PSPs and the CRD cleanup job pod couldn't be scheduled because of policy violations. The log is not verbose about the failed apply of the previous run, and I don't have access to the k8s cluster to see what's actually going wrong there. Please correct me if I'm wrong: the changes I introduced recently can't lead to failing tests here.
I'm not sure how to proceed from here. Maybe someone with sufficient permissions can inspect the cluster to determine the root cause of the failing tests. Can someone point me to the repository of the test container code, so we can execute the tests manually?
@rolandjohann https://github.com/helm/charts/blob/master/incubator/sparkoperator/templates/webhook-cleanup-job.yaml All three above do not implement CRD cleanup at all.
/retest
@vsliouniaev
@zanhsieh from my experience with the tests in this repo, the cluster used for all PRs is the same. If the clean-up job failed to execute, then CRDs may already exist in the cluster. To solve this you will need to do the following two separate things:

1. Set up the chart

In order to solve this problem in the stable/prometheus-operator chart I had to do the following:
Here is how this is done for one of the CRDs in the prometheus-operator chart:
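As a rough sketch of the pattern being described (a CRD template guarded by a chart value and marked with Helm's crd-install hook, so CI can skip creating it when it already exists in the shared cluster) — note the value name and CRD details below are illustrative assumptions, not copied from the actual prometheus-operator chart:

```yaml
# templates/crd-sparkapplication.yaml (illustrative sketch)
# .Values.createCustomResource is an assumed value name; set it to false
# in a CI-specific values file when the CRD already exists in the cluster.
{{- if .Values.createCustomResource }}
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: sparkapplications.sparkoperator.k8s.io
  annotations:
    "helm.sh/hook": crd-install
spec:
  group: sparkoperator.k8s.io
  names:
    kind: SparkApplication
    plural: sparkapplications
  scope: Namespaced
  version: v1beta2
{{- end }}
```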
And then we have a separate values yaml we use in CI.

2. Set up a hack PR in cases where Helm breaks

Helm does not properly support CRDs, despite arguments that it does and all the hooks that are available for them. It may work 95% of the time, but will still fail in those 5% of cases. There should at least be an escape hatch provided in the chart to install it by provisioning the CRDs first, then the rest of the components. This will become more obvious when the chart begins to include resources for which it creates CRDs. Since this is likely still going to bite you in this repo, you should also create a hack PR that simply executes a post-install / pre-install job to delete the CRDs in the cluster. If you ever get stuck in a loop like you're in now, you can re-run tests on that PR and get it to clean up the test cluster for you. It has been suggested to me that it may be easier to move the charts out of this repo and maintain them externally, adding the repo to Helm Hub.
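A pre-install job of the kind mentioned above might look roughly like this (a sketch, not taken from any existing chart; the image, service account, and CRD names are assumptions):

```yaml
# A throwaway "hack PR" manifest: a pre-install hook job that deletes
# leftover CRDs from the shared CI cluster before the chart installs.
apiVersion: batch/v1
kind: Job
metadata:
  name: crd-cleanup
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      serviceAccountName: crd-cleanup  # assumed to have RBAC to delete CRDs
      restartPolicy: Never
      containers:
        - name: kubectl
          image: bitnami/kubectl:latest
          command:
            - kubectl
            - delete
            - crd
            - sparkapplications.sparkoperator.k8s.io
            - scheduledsparkapplications.sparkoperator.k8s.io
            - --ignore-not-found
```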
@vsliouniaev thanks for the insights. Is there a chance we can inspect logs of the k8s resources to get the root cause of the failing test? According to the log, the helm chart should be uninstalled, so the CRDs should be deleted as well. Is it possible to manually fix the test cluster's broken state so we can run the tests for this PR? Maybe we should create a separate issue for that to align on how to fix the tests.
I don't really understand what you mean. The error in the test run is:
There is absolutely nothing to guarantee that helm will run everything end-to-end. In an ideal scenario a delete will trigger the hooks, which will most likely perform the operation you intend. But that cannot be guaranteed to occur 100% of the time.
Yes, I believe I outlined how to get this done in part 2 here: #18162 (comment)
@vsliouniaev @rolandjohann
The stable/prometheus-operator chart has started suffering from this bug during CI #9241
* upstream/master: (134 commits)
  * [stable/external-dns] Enable RBAC resources by default (helm#18398)
  * [incubator/sparkoperator] introduce imagePullSecret at sparkoperator deployments (helm#18162)
  * [stable/prometheus-operator] Remove CRD hooks from CI entirely (helm#18271)
  * [stable/mcrouter] upgrade requirements (helm#18313)
  * [stable/fluent-bit] Upgrade fluentbit to 1.3.2 (helm#18328)
  * Update deployment.yaml (helm#18376)
  * [stable/sysdig] Fix 1.16 compatibility errors introduced in v1.4.19 (helm#18230)
  * [stable/bookstack] Fix mariadb values in README (helm#18385)
  * [stable/memcached] - Add servicemonitor for Prometheus-operator (helm#18386)
  * [stable/prometheus-cloudwatch-exporter] Update cloudwatch-exporter version. (helm#18003)
  * [stable/prometheus-cloudwatch-exporter] Add myself to owners/maintainers. (helm#18093)
  * [incubator/vault] Api version backwards compatibility (helm#18378)
  * [stable/memcached] upgrade metrics exporter, fix PDB (helm#18387)
  * [stable/home-assistant] Fix wildcard home-assistant ingress (helm#18384)
  * [stable/karma] Upgrade to 0.48 (helm#17715)
  * [stable/joomla] Release 6.1.11 updating components versions (helm#18364)
  * [stable/prometheus-node-exporter] update NOTES.txt to show correct port (helm#17931)
  * [stable/apm-server] Use a non-root user (helm#18366)
  * [stable/airflow] Fix template directive value of nodePort for web service (helm#18059)
  * [stable/spinnaker] Support serviceConfigs for local offline installation. (helm#18365)
  * ...
…deployments (helm#18162)

* introduce imagePullSecret at sparkoperator deployments
  Signed-off-by: Roland Johann <roland.johann@phenetic.io>
* bump chart version
  Signed-off-by: Roland Johann <roland.johann@phenetic.io>
* document additional var at README.md
  Signed-off-by: Roland Johann <roland.johann@phenetic.io>
Which issue this PR fixes: introduce imagePullSecrets at deployments so we can use sparkoperator images from private docker repos.

Special notes for your reviewer:
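The wiring for such a change might look roughly like this (a sketch only; the value name and template layout are assumptions, not copied from the PR diff):

```yaml
# values.yaml (assumed key name)
# imagePullSecrets:
#   - name: my-registry-credentials

# templates/deployment.yaml (sketch): forward the value into the pod spec
spec:
  template:
    spec:
      {{- if .Values.imagePullSecrets }}
      imagePullSecrets:
{{ toYaml .Values.imagePullSecrets | indent 8 }}
      {{- end }}
```

The `if` guard keeps the rendered manifest unchanged for users who don't set the value, so the change is backwards compatible.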
Checklist