Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug 1885358: protect openshift traffic by using dedicated flowschema #966

Merged
merged 1 commit into from Oct 5, 2020

Conversation

tkashem
Copy link
Contributor

@tkashem tkashem commented Oct 4, 2020

Define dedicated flowschema and priority configuration that will protect openshift specific traffic.

  • SAR and tokenreviews from both oas and oauth are very important
  • kcm, other oas and oauth requests, /metrics requests from openshift-monitoring is pretty important
  • openshift controller manager is as important as kcm.
  • control plane operators are important (kas-o, oas-o, auth operator, etcd operator)
  • workloads-low goes below the traffic defined above.

Question:

  • do we want to include cvo in the control plane operators, or are there traffic from any other critical operators that we should protect?

New configuration:
image

After applying the above configuration, we can see that the number of requests from workload-low drops. This implies that traffic from oas, /metrics call from openshift monitoring, traffic from control plane operators are not being throttled anymore.
image

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 4, 2020
@tkashem
Copy link
Contributor Author

tkashem commented Oct 4, 2020

/retest

@tkashem tkashem changed the title [WIP] flowschema for openshift traffic protect openshift traffic by using dedicated floschema Oct 5, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 5, 2020
@tkashem tkashem changed the title protect openshift traffic by using dedicated floschema protect openshift traffic by using dedicated flowschema Oct 5, 2020
- '*'
verbs:
- '*'
subjects:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

etcd, kas, kcm, ks, auth, oas

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k

  • kcm (kube controller manager) and ks (scheduler) are already covered by the default set of flowschema shipped.
  • by auth you mean openshift-authentication-operator? This is included here.
  • oas (openshift apiserver) can't be here, it has a higher priority and will be assigned to aggregated-api-delegated-auth priority configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
kind: PriorityLevelConfiguration
metadata:
name: openshift-system
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be allowed more requests than openshift-control-plane-operators. Even though it's one server, really don't want to run out.

Also, rename aggregated-api-delegated-auth I think. We'll want to tie the oauth-apiserver to it too, right?

Copy link
Contributor Author

@tkashem tkashem Oct 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, rename aggregated-api-delegated-auth

sounds good, I am adding a prefix openshift to it.

We'll want to tie the oauth-apiserver to it too, right?

I have added oauth apiserver.

This should be allowed more requests than openshift-control-plane-operators. Even though it's one server, really don't want to run out.

Yes, makes sense. we expect the operators to have a steady call rate (unless there is a bug that causes a hot loop or something) and openshift api servers are tied to the user traffic. I doubled the concurrency share to 20 and reduced for the operators to 10.

type: Queue
type: Limited
---
apiVersion: flowcontrol.apiserver.k8s.io/v1alpha1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep rules like this local to operators that use them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k Yes, I thought about it. the only concern I have is the matchingPrecedence field. This defines the matching order, so we need to keep in mind all the other values to make sure the right order is maintained.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k Yes, I thought about it. the only concern I have is the matchingPrecedence field. This defines the matching order, so we need to keep in mind all the other values to make sure the right order is maintained.

We did this with conventions in the CVO. Our payloads are local. I'm ok moving these in 4.7 instead of 4.6.

@tkashem tkashem force-pushed the flowschema branch 2 times, most recently from 0e4eea7 to 8dfa337 Compare October 5, 2020 15:04
@tkashem tkashem changed the title protect openshift traffic by using dedicated flowschema BUG 1885358: protect openshift traffic by using dedicated flowschema Oct 5, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Oct 5, 2020
@openshift-ci-robot
Copy link

@tkashem: This pull request references Bugzilla bug 1885358, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

BUG 1885358: protect openshift traffic by using dedicated flowschema

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tkashem tkashem changed the title BUG 1885358: protect openshift traffic by using dedicated flowschema protect openshift traffic by using dedicated flowschema Oct 5, 2020
@openshift-ci-robot openshift-ci-robot removed the bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. label Oct 5, 2020
@openshift-ci-robot
Copy link

@tkashem: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

protect openshift traffic by using dedicated flowschema

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Oct 5, 2020
name: openshift-aggregated-api-delegated-auth
spec:
limited:
assuredConcurrencyShares: 20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump this to 100 for me? It should only matter when we start hitting pressure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump this to 100 for me? It should only matter when we start hitting pressure

actually, when looking at it, this is an ok starting point.

@deads2k
Copy link
Contributor

deads2k commented Oct 5, 2020

/lgtm

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, tkashem

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 5, 2020
@openshift-merge-robot openshift-merge-robot merged commit aa190be into openshift:master Oct 5, 2020
@mfojtik
Copy link
Member

mfojtik commented Oct 6, 2020

/bugzilla refresh

@mfojtik mfojtik changed the title protect openshift traffic by using dedicated flowschema bug 1885358: protect openshift traffic by using dedicated flowschema Oct 6, 2020
@openshift-ci-robot
Copy link

@tkashem: All pull requests linked via external trackers have merged:

Bugzilla bug 1885358 has been moved to the MODIFIED state.

In response to this:

bug 1885358: protect openshift traffic by using dedicated flowschema

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants