
[Bug] Operator continuously updates resources in AKS #4035

Closed
vutkin opened this issue Dec 2, 2020 · 7 comments · Fixed by #4076

vutkin commented Dec 2, 2020

Describe the bug
In our case, we noticed that when we try to create an internal LB in AKS managed by Rancher (this is important) using these annotations:

```yaml
externalBootstrapService:
  metadata:
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
perPodService:
  metadata:
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
```

the time needed to create the LB is ~15 min.
The root cause is that the Strimzi operator and Rancher (which adds custom annotations such as /metadata/annotations/field.cattle.io~1publicEndpoints, i.e. the field.cattle.io/publicEndpoints annotation) simultaneously try to change the same Services of type LoadBalancer. Rancher adds labels and annotations to the Services, and the Strimzi operator removes them again. This causes the Azure load balancer configuration to be updated continuously. As a result, the Azure load balancer stays in an "updating" state with a long processing queue.
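The feedback loop described above can be illustrated with a toy model. The Java sketch below is not Strimzi or Rancher code: it assumes that every change to the Service annotations triggers one load balancer update, and the class and method names (`AnnotationTugOfWar`, `countUpdates`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (assumption): each change to the live annotations is counted as one
// Azure load balancer update. Neither controller's real logic is modeled.
class AnnotationTugOfWar {

    // Simulates `rounds` iterations in which another controller (standing in for
    // Rancher) adds its annotation and the operator then applies its full desired
    // annotation map, dropping any key it does not know about.
    static int countUpdates(Map<String, String> desired, String extraKey, String extraValue, int rounds) {
        Map<String, String> live = new HashMap<>(desired);
        int updates = 0;
        for (int i = 0; i < rounds; i++) {
            // "Rancher" step: add its annotation if it is missing or different.
            if (!extraValue.equals(live.get(extraKey))) {
                live.put(extraKey, extraValue);
                updates++;
            }
            // "Operator" step: replace the annotations with the desired version.
            if (!live.equals(desired)) {
                live = new HashMap<>(desired);
                updates++;
            }
        }
        return updates;
    }
}
```

When the desired map lacks the foreign key, every round produces two updates (one add, one removal), so the system never settles; if the desired map already carried that annotation, the count would be zero.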

To Reproduce
Steps to reproduce the behavior:

  1. Create an internal LB for the external listener
  2. Create an AKS cluster managed by Rancher (https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/hosted-kubernetes-clusters/aks/)
  3. Create the internal LB by adding the annotations: creation takes more than 15 min

Expected behavior
LB creation takes less than 2 min.

Environment:

  • Strimzi version: 0.18
  • Installation method: Helm chart
  • Kubernetes cluster: Kubernetes 1.18
  • Infrastructure: Azure AKS/Rancher

YAML files and logs


Additional context
Suggestion: add the ability to ignore changes to annotations/labels on resources watched by the operator, for example:
```hcl
ignore_changes = [
  metadata[0].annotations["cattle.io/*"],
  metadata[0].annotations["field.cattle.io~1publicEndpoints"]
]
```
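The suggested `ignore_changes` list resembles Terraform's `lifecycle.ignore_changes`. In its second entry, `~1` is the JSON Pointer (RFC 6901) escape for `/`, so the annotation key is `field.cattle.io/publicEndpoints`. A minimal Java sketch of such a matcher, assuming that a trailing `*` means a prefix match and any other pattern must equal the key exactly; the class `IgnorePatterns` is hypothetical and not part of Strimzi:

```java
import java.util.List;

// Hypothetical helper: decides whether an annotation key matches an
// "ignore changes" pattern. A pattern ending in '*' matches by prefix;
// any other pattern must match the key exactly.
class IgnorePatterns {

    static boolean isIgnored(String key, List<String> patterns) {
        for (String pattern : patterns) {
            if (pattern.endsWith("*")) {
                // Strip the trailing '*' and compare the remaining prefix.
                String prefix = pattern.substring(0, pattern.length() - 1);
                if (key.startsWith(prefix)) {
                    return true;
                }
            } else if (key.equals(pattern)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these rules `cattle.io/*` does not match keys such as `field.cattle.io/publicEndpoints`, because they do not start with `cattle.io/`; that is one reason to list such keys explicitly.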
vutkin added the bug label Dec 2, 2020
vutkin (Author) commented Dec 2, 2020

I saw the same behavior in this issue: #2558

scholzj (Member) commented Dec 2, 2020

Rancher seems to be the only distro I have met so far that does all kinds of weird stuff like this which nobody else does :-o.

I'm not sure we can just ignore them that easily. What we do is simply patch the service, and the patch removes these unexpected annotations. If needed, we can detect when these annotations are the only difference and skip the patch in that case. But we still need to patch the service when something changes and it needs to be updated. That should make this happen less often, but it will not eliminate it completely. So I wonder what these annotations mean, how important they are, and whether this would still cause issues. But I guess we can give it a try and see if it helps.
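The idea of skipping the patch when the only differences are annotations owned by another tool can be sketched as follows. This is a hypothetical illustration on plain annotation maps, not the operator's code; the class `PatchDecision` and the prefix list passed to it are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: decide whether a patch is needed, ignoring annotation
// keys that start with any of the given prefixes (e.g. "field.cattle.io/").
class PatchDecision {

    // Returns a copy of the annotations without keys matching an ignored prefix.
    static Map<String, String> withoutIgnored(Map<String, String> annotations, List<String> ignoredPrefixes) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : annotations.entrySet()) {
            boolean ignored = false;
            for (String prefix : ignoredPrefixes) {
                if (e.getKey().startsWith(prefix)) {
                    ignored = true;
                    break;
                }
            }
            if (!ignored) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // True if current and desired differ in anything other than ignored keys.
    static boolean needsPatch(Map<String, String> current, Map<String, String> desired, List<String> ignoredPrefixes) {
        return !withoutIgnored(current, ignoredPrefixes).equals(withoutIgnored(desired, ignoredPrefixes));
    }
}
```

As noted in the comment above, whenever `needsPatch` returns true, a plain patch would still drop the foreign annotations, so this check only reduces how often the conflict occurs.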

vutkin (Author) commented Dec 2, 2020

> Rancher seems to be the only distro I have met so far that does all kinds of weird stuff like this which nobody else does :-o.

I think tools like FluxCD/ArgoCD do the same: they add internal annotations.

Sounds good. How could we do this?

scholzj (Member) commented Dec 2, 2020

If you are interested in looking into it and contributing it, I can try to guide you. If not, I will try to find some time for it and fix it myself.

vutkin (Author) commented Dec 3, 2020

Hi @scholzj, unfortunately I am not strong in Java...

mluiten (Contributor) commented Dec 7, 2020

@scholzj if you can point me in the right direction, I would be eager to take a look and try to make an improvement.

Why does Strimzi want to remove annotations that have nothing to do with Strimzi? Is there a whitelist of annotations it should care about?

scholzj (Member) commented Dec 7, 2020

@mluiten Strimzi does not remove the annotations per se. It just reconciles the resource, i.e. it applies our version of it, which removes these annotations because our version does not have them. We do not have any whitelists or blacklists because, until this case, they were never needed. But for this particular use case I would assume one can add it here:

```java
protected Future<ReconcileResult<Service>> internalPatch(String namespace, String name, Service current, Service desired) {
    // ...
}
```

There we already handle similar things in the service spec section, such as assigned node ports or ipFamily. So I assume we can have an allow-list of annotations here, which would be copied over from the original service so that they do not get deleted by the patch. That is at least where I planned to start... but of course contributions are always welcome, so if you want to look into it, you are more than welcome.
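A minimal sketch of the allow-list idea, assuming annotations are handled as plain `Map<String, String>` values rather than the fabric8 `Service` objects that `internalPatch` receives. The class `AnnotationBackport` and the example prefixes are hypothetical; the prefix check uses `startsWith`, in line with the later commit "Replace regex with startsWith for better performance".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: before patching, copy annotations whose keys start with an
// allowed prefix from the current (live) object into the desired object, so that
// the patch does not delete them.
class AnnotationBackport {

    static Map<String, String> backport(Map<String, String> current, Map<String, String> desired, List<String> allowedPrefixes) {
        Map<String, String> merged = new HashMap<>();
        if (current != null) {
            for (Map.Entry<String, String> e : current.entrySet()) {
                for (String prefix : allowedPrefixes) {
                    if (e.getKey().startsWith(prefix)) {
                        merged.put(e.getKey(), e.getValue());
                        break;
                    }
                }
            }
        }
        // Values from the desired object always win over copied ones.
        if (desired != null) {
            merged.putAll(desired);
        }
        return merged;
    }
}
```

Because desired values override copied ones, the operator still controls the annotations it owns, while allow-listed annotations added by other tools survive the patch.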

mluiten added a commit to mluiten/strimzi-kafka-operator that referenced this issue Dec 10, 2020

Signed-off-by: Menno Luiten <menno.luiten@ah.nl>
mluiten added a commit to mluiten/strimzi-kafka-operator that referenced this issue Dec 10, 2020
…ations (strimzi#4035)

Signed-off-by: Menno Luiten <menno.luiten@ah.nl>
scholzj added a commit that referenced this issue Dec 12, 2020

* Patch Rancher Cattle annotations when reconciling services (#4035)

Signed-off-by: Menno Luiten <menno.luiten@ah.nl>

* Moved configuration of load balancer annotation to whitelist in Annotations (#4035)

Signed-off-by: Menno Luiten <menno.luiten@ah.nl>

* Replace regex with startsWith for better performance

Signed-off-by: Menno Luiten <menno.luiten@ah.nl>

* Fix checkstyles

Signed-off-by: Jakub Scholz <www@scholzj.com>

Co-authored-by: Jakub Scholz <www@scholzj.com>
liutao365 pushed a commit to liutao365/strimzi-kafka-operator that referenced this issue Dec 23, 2020
* Connect default logging not expanded (strimzi#4057)

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* Add documentation for CC CORS (strimzi#4030)

* Add documentation for CC CORS

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Capitalize CORS references

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Update CHANGELOG.MD

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Remove note irrelevant to Cruise Control config

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Addressing comments

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Addressing more comments

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Address another typo

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Add check for inter.broker.protocol.version and warning to status conditions (strimzi#4058)

* Add check for inter.broker.protocol.version and warning to status conditions

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix spotbugs

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix the version matching

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Add comment and rename the pattern to make its purpose more clear

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Address some fixes to prevent tests from deleting the CO in STs (strimzi#4068)

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Make it possible to roll Kafka or ZooKeeper pods individually (strimzi#4070)

* Make it possible to roll Kafka or ZooKeeper pods individually

Signed-off-by: Jakub Scholz <www@scholzj.com>

* CHANGELOG.md

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Remove commented out code

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Avoid changing custom resource status because of HashSet ordering (strimzi#4069)

Signed-off-by: Jakub Scholz <www@scholzj.com>

* [systemtest] Tests for deploying Kafka and KafkaConnect without CRBs and enabled/disabled RackAware (strimzi#4045)

* add tests for missing CRBs and some improvements to KafkaConnectResource and utils

Signed-off-by: Lukas Kral <lukywill16@gmail.com>

* fixup! add tests for missing CRBs and some improvements to KafkaConnectResource and utils

Signed-off-by: Lukas Kral <lukywill16@gmail.com>

* change name of ST, do some cleanup in KafkaConnectResource (addressing Sam's comments)

Signed-off-by: Lukas Kral <lukywill16@gmail.com>

* fixes for failing tests after my changes

Signed-off-by: Lukas Kral <lukywill16@gmail.com>

* fixup! fixes for failing tests after my changes

Signed-off-by: Lukas Kral <lukywill16@gmail.com>

* Do not use ownerReference in UO and TO bindings into a different namespace (strimzi#4080)

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Add test for changes in strimzi#3987 (strimzi#4087)

Signed-off-by: Tom Bentley <tbentley@redhat.com>

* Remove owner references from ClusterRoleBindings (strimzi#4077)

* Remove ownerReferences from ClusterRoleBindings

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Review comments

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Review comments II

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix log message search in STs

Signed-off-by: Jakub Scholz <www@scholzj.com>

* [DOC] new rack awareness image (strimzi#4074)

Signed-off-by: prmellor <pmellor@redhat.com>

* [DOC] Add the guide for running multiple Connect instances also to the deploying guide (strimzi#4079)

* Add the guide for running multiple Connect instances also to the deploying guide

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix comment

Signed-off-by: Jakub Scholz <www@scholzj.com>

* [systemtest] Tests for NetworkPolicy enhancements (strimzi#4085)

* add tests for np enhancements, create ST for NPs etc.

Signed-off-by: Lukas Kral <lukywill16@gmail.com>

* Jakub's comment

Signed-off-by: Lukas Kral <lukywill16@gmail.com>

* Jakub's comment vol.2

Signed-off-by: Lukas Kral <lukywill16@gmail.com>

* Add forbidden prefix exceptions to CC docs (strimzi#4095)

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* strimzi#4035: Patch Rancher Cattle annotations when reconciling services (strimzi#4076)

* Patch Rancher Cattle annotations when reconciling services (strimzi#4035)

Signed-off-by: Menno Luiten <menno.luiten@ah.nl>

* Moved configuration of load balancer annotation to whitelist in Annotations (strimzi#4035)

Signed-off-by: Menno Luiten <menno.luiten@ah.nl>

* Replace regex with startsWith for better performance

Signed-off-by: Menno Luiten <menno.luiten@ah.nl>

* Fix checkstyles

Signed-off-by: Jakub Scholz <www@scholzj.com>

Co-authored-by: Jakub Scholz <www@scholzj.com>

* CO should be more verbose when resource name is missing (strimzi#4075)

* CO should be more verbose when resource name is missing

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* comments

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* comments

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* priority

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* comments

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* test

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* comments

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* Move Question issues to Discussions (strimzi#4100)

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Update Helm Chart index.yaml file with 0.20.1 release (strimzi#4106)

Signed-off-by: Jakub Scholz <www@scholzj.com>

* [DOC] Fix version field in the sample Kafka YAML (strimzi#4107)

Signed-off-by: Jakub Scholz <www@scholzj.com>

* feat: Update Jackson version to 2.11.0 (strimzi#4102)

* feat: Update Jackson version to 2.11.0

Update Jackson version from 2.10.2 to 2.11.0
to resolve security vulnerability CVE-2020-25649

Signed-off-by: Salma Saeed <salma.saeed@ibm.com>

* refactor: Change jackson to version 2.10.5.1

Signed-off-by: Salma Saeed <salma.saeed@ibm.com>

* refactor: Update jackson-databind to 2.10.5.1

Update jackson-databind to 2.10.5.1 for security fix.

Signed-off-by: Salma Saeed <salma.saeed@ibm.com>

* Make fetching CMs async (strimzi#4084)

* Make fetching CMs async

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* comments

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* avoid using composite fut

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* refactor

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* fix

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* Replace oc cluster up with minikube for PR jenkins jobs (strimzi#4098)

* Replace oc cluster up with minikube

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* fixup! Replace oc cluster up with minikube

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* fixup! fixup! Replace oc cluster up with minikube

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* fixup! fixup! fixup! Replace oc cluster up with minikube

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* fixup! fixup! fixup! fixup! Replace oc cluster up with minikube

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Set RBAC for all kube versions

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Allow network policy on minikube

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Revert changes for NP

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Set default version properly

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Set default version for result reporting from PR job to GitHub

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Prometheus operator does not parse configuration (strimzi#4104)

* Prometheus operator does not parse configuration

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* comments

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* [DOC] Updates to schema reference for Kafka config (strimzi#4063)

* [DOC] Updates to schema reference

Signed-off-by: prmellor <pmellor@redhat.com>

* review edits PP

Signed-off-by: prmellor <pmellor@redhat.com>

* [DOC] Clarify Kafka Connect dependencies when configuring MM2 (strimzi#4078)

* [DOC] Clarify Kafka Connect dependencies when configuring MM2

Signed-off-by: prmellor <pmellor@redhat.com>

* review edits JS

Signed-off-by: prmellor <pmellor@redhat.com>

* ZookeeperUpgradeST -> KafkaUpgradeDowngradeST (strimzi#4125)

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Reflect rename of upgrade tests in azp (strimzi#4127)

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Disable PrometheusST to see if it fixes regression bundle VII (strimzi#4129)

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Bump up jupiter version (strimzi#4126)

* Bump up jupiter version

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* this takes a little faith, guys

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* revert vertx

Signed-off-by: Stanislav Knot <sknot@redhat.com>

* Remove a few tests from acceptance tag (strimzi#4132)

* Remove a few tests from acceptance profile

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Remove redundant cc test from acceptance

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>

* Remove the PrometheusST system test class (strimzi#4131)

* Try to find out what broke the PrometheusST tests

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Try newer version of the Prometheus Operator

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Remove the PrometheusST class which does not work properly on Minikube on Azure and has little value anyway

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Remove the remains of the -server JVM option (strimzi#4134)

* Remove the remains of the -server JVM option

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix system tests

Signed-off-by: Jakub Scholz <www@scholzj.com>

* [DOC] kafka config assembly update for Using Guide (strimzi#4064)

* [DOC] New config procedure for Kafka

Signed-off-by: prmellor <pmellor@redhat.com>

* review edits DL

Signed-off-by: prmellor <pmellor@redhat.com>

* [DOC] link/reference updates for update Kafka config section (strimzi#4066)

Signed-off-by: prmellor <pmellor@redhat.com>

* [DOC] Minor edits - link fix and typos (strimzi#4130)

* [DOC] Minor edits - link fix and typos

Signed-off-by: prmellor <pmellor@redhat.com>

* more typos

Signed-off-by: prmellor <pmellor@redhat.com>

* doc gen for updated java file

Signed-off-by: prmellor <pmellor@redhat.com>

* Add support for Kafka 2.7.0 (strimzi#4115)

* Add support for Kafka 2.7.0-RC5

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix failing STs

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix ZookeeperUpgrade tests

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix imports

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix some more STs

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Move to final version

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Use the right Scala version

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Fix YQ version in the GitHub Actions / CodeQL workflow (strimzi#4145)

* Fix YQ version in the GitHub Actions / CodeQL workflow

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Another try to fix the version

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Try to speed up the build with Maven cache

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Update Helm2 to handle changed chart repository addresses (strimzi#4146)

Signed-off-by: Jakub Scholz <www@scholzj.com>

* [DOC] Files removed from Kafka config section (strimzi#4065)

Signed-off-by: prmellor <pmellor@redhat.com>

* Add missing updates to the generated docu files with Kafka versions (strimzi#4142)

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Include some more useful targets in the make all command (strimzi#4144)

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Add PDF build for our documentation (strimzi#4143)

* Add PDF build for our documentation

Signed-off-by: Jakub Scholz <www@scholzj.com>

* Add docu_pdfclean to docu_clean target

Signed-off-by: Jakub Scholz <www@scholzj.com>

Co-authored-by: Stanislav Knot <sknot@redhat.com>
Co-authored-by: Kyle Liberti <kliberti@redhat.com>
Co-authored-by: Jakub Scholz <www@scholzj.com>
Co-authored-by: Jakub Stejskal <xstejs24@gmail.com>
Co-authored-by: Lukáš Král <53821852+im-konge@users.noreply.github.com>
Co-authored-by: Tom Bentley <tombentley@users.noreply.github.com>
Co-authored-by: PaulRMellor <47596553+PaulRMellor@users.noreply.github.com>
Co-authored-by: Menno Luiten <mluiten@artifix.net>
Co-authored-by: salmasaeed1 <41479845+salmasaeed1@users.noreply.github.com>

3 participants