
Cooperative Pod handling - Prevent restart loops and increase user friendliness #733

Open
thecooldrop opened this issue Mar 25, 2022 · 1 comment
Labels: enhancement (New feature or request), not-stale

Comments

@thecooldrop

Describe the solution you'd like
Currently there seems to be an issue in this operator which makes it very hard to deploy into clusters that also run other mutating operators. This operator does not tolerate Pod changes made by other operators, and it sends the Jenkins master Pod into a restart loop if there is even the slightest mismatch between the provisioned Pod and the containers described in the Jenkins CRD.

An example of this behavior occurs when other operators or controllers inject additional environment variables into the provisioned Pods. As far as I am aware this happens when the New Relic operators are used, but it also happens in many other scenarios where outside systems modify the provisioned Pods.

My concrete problem example is that we are using AWS EKS and assigning AWS roles to the ServiceAccount bound to the Jenkins master Pod. When a role is assumed via this mechanism, AWS injects additional environment variables and volume mounts into the Pod that assumed the role. The Jenkins Operator then observes a change in volume mounts and environment variables and terminates the Jenkins master Pod.
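For concreteness, the injected entries look roughly like the following sketch, written with the Kubernetes Go API types. The exact variable names, token path, and volume name are assumptions based on what I see in my cluster (they come from the EKS webhook, not from this operator), so treat them as illustrative only:

```go
package irsa

import corev1 "k8s.io/api/core/v1"

// Environment variables the EKS webhook adds to the Jenkins master container
// when its ServiceAccount carries an IAM role annotation (values are examples).
var injectedEnv = []corev1.EnvVar{
	{Name: "AWS_ROLE_ARN", Value: "arn:aws:iam::123456789012:role/jenkins-master"},
	{Name: "AWS_WEB_IDENTITY_TOKEN_FILE", Value: "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"},
}

// Projected token volume mount added alongside the environment variables.
var injectedMount = corev1.VolumeMount{
	Name:      "aws-iam-token",
	ReadOnly:  true,
	MountPath: "/var/run/secrets/eks.amazonaws.com/serviceaccount",
}
```

None of these entries exist in the Jenkins CRD, so the operator sees them as a difference between expected and actual state.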

This is a highly confusing situation because users are usually not aware of the additional environment variables or volume mounts introduced by other controllers deployed in their clusters. Since the user does not know there are differences, it is practically impossible to debug without digging deeply into the source code of this operator and then adding the extra variables to the Jenkins CRD so that the operator no longer detects any changes.

My proposal here is that the comparison rules between the actual and expected Jenkins master Pod be relaxed. I think the following proposals may help:

  • Currently, when there are differences, only the expected state is logged, not the actual state. This makes it hard to find the difference, since users have to pull their own resource definitions and compare by hand. When there are differences, both the expected and actual states should be logged to make debugging easier.
  • If an environment variable ENV_VAR is present in the Jenkins master Pod but is not included in the Jenkins CRD, it should be excluded from the comparison. That way, if some other operator injects environment variables into Pods, we do not care about them as long as they do not overwrite any variables managed by the Jenkins Operator.
  • The principle outlined above for environment variables should be applied to all other resources: as long as every configuration desired in the Jenkins CRD is present in the Jenkins Pod, everything is okay and no restart should be done. A sketch of such a relaxed comparison is given after this list.
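Here is a minimal sketch of what such a relaxed comparison could look like for environment variables, assuming the Kubernetes Go API types and controller-runtime logging. The function name and log message are mine, not the operator's current code, and ValueFrom references are ignored for brevity:

```go
package reconcile

import (
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/log"
)

// envVarsSatisfied reports whether every environment variable requested in the
// Jenkins CRD (expected) is present with the same value in the running
// container (actual). Variables that exist only in the running container,
// e.g. injected by other controllers or webhooks, are ignored.
func envVarsSatisfied(expected, actual []corev1.EnvVar) bool {
	actualByName := make(map[string]corev1.EnvVar, len(actual))
	for _, env := range actual {
		actualByName[env.Name] = env
	}
	for _, want := range expected {
		got, found := actualByName[want.Name]
		if !found || got.Value != want.Value {
			// Log both expected and actual so the user can see the exact
			// difference without dumping the live Pod themselves.
			log.Log.Info("Jenkins master Pod env var differs from CRD",
				"name", want.Name, "expected", want.Value, "actual", got.Value)
			return false
		}
	}
	return true
}
```

The same subset check could be applied to volume mounts and volumes, which would cover the EKS role-assumption case described above.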

The changes above would make this operator work better in conjunction with other operators, of which there are ever more.

While thinking about this I realized that these issues might be resolved by switching from running Jenkins as a bare Pod to running it as a Deployment. If that is the case, I would appreciate a list of open tasks related to migrating the operator from Pod-based to Deployment-based.

@thecooldrop added the enhancement label on Mar 25, 2022
@stale

stale bot commented Apr 27, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this issue is still affecting you, just comment with any updates and we'll keep it open. Thank you for your contributions.

stale bot added the stale label on Apr 27, 2022
github-actions bot closed this as not planned on May 8, 2023
brokenpip3 added the not-stale label and removed the stale label on May 9, 2023
brokenpip3 reopened this on May 9, 2023