Bug 1925524: Migrate to shared informers and watchers removed #386



@akram akram commented Jun 2, 2021

Major refactoring to fix a concurrency issue that appeared after the introduction of the Priority and Fairness (P&F) API in Kubernetes.
This new API made the plugin subject to performance degradation when it uses a large number of connections (> 100). Because the OpenShift Sync plugin uses 5 connections per namespace that it watches, Jenkins instances watching more than 20 namespaces were affected.
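
The 20-namespace figure follows from simple arithmetic, which a short sketch can check. The two constants are taken from the description above; the class and method names are hypothetical and are not part of the plugin.

```java
// Hypothetical helper: estimates API connections in per-namespace mode.
class ConnectionEstimate {
    // From the PR description: 5 connections per watched namespace.
    static final int CONNECTIONS_PER_NAMESPACE = 5;
    // From the PR description: degradation shows up above roughly 100 connections.
    static final int DEGRADATION_THRESHOLD = 100;

    // Total connections opened when each namespace is watched separately.
    static int connectionsFor(int namespaces) {
        return namespaces * CONNECTIONS_PER_NAMESPACE;
    }

    // True when the estimated connection count is above the threshold.
    static boolean exceedsThreshold(int namespaces) {
        return connectionsFor(namespaces) > DEGRADATION_THRESHOLD;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 20, 21, 40}) {
            System.out.println(n + " namespaces -> " + connectionsFor(n)
                    + " connections, above threshold: " + exceedsThreshold(n));
        }
    }
}
```

With these numbers, 20 namespaces sit exactly at the threshold and 21 are the first to cross it, which matches the "more than 20 namespaces" statement.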

Before the introduction of the P&F API, connections were silently dropped, so queued connections had a chance to get their resources synced. After its introduction, this connection recycling no longer happened, because a large number of connections were queued and throttled.

This PR introduces major changes:

  • Migration to kubernetes-client 5.4.0, which requires a kubernetes-client-api plugin matching this version.
  • Uses SharedInformer implementations instead of the watch API. This introduces an automatic caching system with list refreshes and a cleaner programming style, as we receive events already filtered by type in their respective methods (onAdd, onUpdate, onDelete).
  • Introduction of checkboxes to enable or disable synchronisation per resource type (Builds+BuildConfigs, Secrets, ConfigMaps and ImageStreams).
  • A cluster mode that uses 5 connections regardless of the number of watched namespaces, but requires a special permission set.
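
To illustrate the informer style described in the list above, here is a toy, self-contained model of a cache that is refreshed from full listings and dispatches one callback per event type. It does not use kubernetes-client: `EventHandler`, `ToyInformer` and `RecordingHandler` are hypothetical names chosen for this sketch, and a real SharedInformer does considerably more (watches, resync periods, threading).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToyInformerDemo {
    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler();
        ToyInformer<String> informer = new ToyInformer<>(handler);

        Map<String, String> listing = new LinkedHashMap<>();
        listing.put("build-config-a", "v1");
        informer.resync(listing);              // first listing: everything is new

        listing = new LinkedHashMap<>();
        listing.put("build-config-a", "v2");
        informer.resync(listing);              // same name, changed content

        informer.resync(new LinkedHashMap<>()); // object disappeared from the listing

        handler.events.forEach(System.out::println);
    }
}

// Hypothetical handler shape: one callback per event type.
interface EventHandler<T> {
    void onAdd(T obj);
    void onUpdate(T oldObj, T newObj);
    void onDelete(T obj);
}

// Toy informer: keeps a local cache and turns each fresh listing into typed events.
class ToyInformer<T> {
    private final Map<String, T> cache = new LinkedHashMap<>();
    private final EventHandler<T> handler;

    ToyInformer(EventHandler<T> handler) {
        this.handler = handler;
    }

    // Compares a full listing (name -> object) with the cache and dispatches the differences.
    void resync(Map<String, T> listed) {
        for (Map.Entry<String, T> entry : listed.entrySet()) {
            T previous = cache.get(entry.getKey());
            if (previous == null) {
                handler.onAdd(entry.getValue());
            } else if (!previous.equals(entry.getValue())) {
                handler.onUpdate(previous, entry.getValue());
            }
        }
        for (Map.Entry<String, T> entry : cache.entrySet()) {
            if (!listed.containsKey(entry.getKey())) {
                handler.onDelete(entry.getValue());
            }
        }
        // The listing becomes the new cached state.
        cache.clear();
        cache.putAll(listed);
    }

    int cachedCount() {
        return cache.size();
    }
}

// Records events as strings so the demo can print them.
class RecordingHandler implements EventHandler<String> {
    final List<String> events = new ArrayList<>();

    public void onAdd(String obj) { events.add("ADD " + obj); }
    public void onUpdate(String oldObj, String newObj) { events.add("UPDATE " + oldObj + " -> " + newObj); }
    public void onDelete(String obj) { events.add("DELETE " + obj); }
}
```

The point of the pattern is that the handler never has to inspect a generic event and switch on its type; each callback already knows what happened to the object.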

@openshift-ci openshift-ci bot requested review from sbose78 and waveywaves June 2, 2021 12:12
@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. labels Jun 2, 2021
@openshift-ci

openshift-ci bot commented Jun 2, 2021

@akram: This pull request references Bugzilla bug 1925524, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jitendar-singh

In response to this:

Bug 1925524: Migrate to shared informers and watchers removed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jun 2, 2021
@akram
Author

akram commented Jun 2, 2021

This should fail to build, as it requires a version of kubernetes-client-api-plugin that is not yet released.
Waiting for the release of jenkinsci/kubernetes-client-api-plugin cc @Vlatombe

@akram akram force-pushed the migrate-to-shared-informers-and-watchers-removed branch from 35c64d4 to d6a986f Compare June 2, 2021 17:16
@akram
Author

akram commented Jun 14, 2021

/retest

1 similar comment
@akram
Author

akram commented Jun 14, 2021

/retest

@akram
Author

akram commented Jun 17, 2021

/test e2e-aws-jenkins

@akram
Author

akram commented Jun 17, 2021

ERRO[2021-06-17T17:49:48Z]
  * could not run steps: step e2e-aws-jenkins failed: "e2e-aws-jenkins" pre steps failed: "e2e-aws-jenkins" pod "e2e-aws-jenkins-ipi-install-install" failed: the pod ci-op-kyc6s48m/e2e-aws-jenkins-ipi-install-install failed after 2h10m4s (failed containers: test): ContainerFailed one or more containers exited

Container test exited with code 1, reason Error
---
9424147]: "Reflector ListAndWatch" name:k8s.io/client-go/tools/watch/informerwatcher.go:146 (17-Jun-2021 17:15:55.745) (total time: 30000ms):
Trace[1139424147]: [30.000892568s] [30.000892568s] END
E0617 17:16:25.746783      73 reflector.go:138] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.ci-op-kyc6s48m-50467.origin-ci-int-aws.dev.rhcloud.com:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&limit=500&resourceVersion=0": dial tcp 54.196.242.18:6443: i/o timeout
{"component":"entrypoint","file":"prow/entrypoint/run.go:165","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h0m0s 

/retest

@akram
Author

akram commented Jun 21, 2021

/retest

@adambkaplan

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. and removed bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jun 21, 2021
@openshift-ci

openshift-ci bot commented Jun 21, 2021

@adambkaplan: This pull request references Bugzilla bug 1925524, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "4.8.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh


@adambkaplan

/bugzilla refresh

@openshift-ci

openshift-ci bot commented Jun 21, 2021

@adambkaplan: This pull request references Bugzilla bug 1925524, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jitendar-singh

In response to this:

/bugzilla refresh


@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 21, 2021
@akram
Author

akram commented Jun 21, 2021

/retest

@akram
Author

akram commented Jun 21, 2021

The built image still references plugins from the stock image version instead of the ones specified by the sync plugin currently under test:

java.io.IOException: Failed to load: OpenShift Sync (1.0.47-SNAPSHOT (private-4e5dee76-root))
 - Update required: Jackson 2 API Plugin (2.12.1) to be updated to 2.12.3 or higher
 - Update required: Kubernetes Client API Plugin (4.13.3-1) to be updated to 5.4.1 or higher
 - Update required: Kubernetes plugin (1.29.7) to be updated to 1.30.0 or higher

Retesting that to see if it fixes the issue.

@akram
Author

akram commented Jun 24, 2021

needs openshift/jenkins#1297

@akram
Author

akram commented Jun 24, 2021

openshift/jenkins#1297 will not pass e2e tests because openshift-sync 1.0.45 cannot start with the Kubernetes plugin 1.30, as it requires an older version of kubernetes-client-api-plugin.

I will have to /override this one, having done the tests manually, and then we will have to update openshift/jenkins#1297 to use openshift-sync 1.0.46 after it is released.

That's a net loss, but because of the build dependency, we are forced to do it.
cc @adambkaplan

@akram
Author

akram commented Jun 24, 2021

/override ci/prow/e2e-aws-jenkins

@openshift-ci

openshift-ci bot commented Jun 24, 2021

@akram: Overrode contexts on behalf of akram: ci/prow/e2e-aws-jenkins

In response to this:

/override ci/prow/e2e-aws-jenkins


@jkhelil

jkhelil commented Jun 24, 2021

/lgtm

@openshift-ci

openshift-ci bot commented Jun 24, 2021

@jkhelil: changing LGTM is restricted to collaborators

In response to this:

/lgtm


@jkhelil

jkhelil commented Jun 25, 2021

/lgtm

@openshift-ci

openshift-ci bot commented Jun 25, 2021

@jkhelil: changing LGTM is restricted to collaborators

In response to this:

/lgtm



@adambkaplan adambkaplan left a comment

/lgtm

Adding lgtm on behalf of @jkhelil

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 29, 2021
@openshift-ci

openshift-ci bot commented Jun 29, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan, akram, jkhelil

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit b8948a9 into openshift:master Jun 29, 2021
@openshift-ci

openshift-ci bot commented Jun 29, 2021

@akram: All pull requests linked via external trackers have merged:

Bugzilla bug 1925524 has been moved to the MODIFIED state.

In response to this:

Bug 1925524: Migrate to shared informers and watchers removed


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
4 participants