
Incorrect resource quota set for namespaces where Project quota is enabled #35647

Closed

SheilaghM opened this issue Nov 23, 2021 · 25 comments

Labels: internal, priority/2, QA/S, release-note (Note this issue in the milestone's release notes), team/area1
Milestone: v2.6.x
@SheilaghM commented Nov 23, 2021

SURE-3695

Rancher Cluster:
Rancher version: v2.5.10 & v2.6.2
Number of nodes: 3
Node OS version: Ubuntu 20.04.3 LTS

Downstream Cluster:
Number of Downstream clusters: 1
Node OS: Ubuntu 20.04.3 LTS
RKE/RKE2/K3S version: NA
Kubernetes version: NA
CNI: NA

Longhorn:
Longhorn version: NA
CPU per node: NA
Memory per node: NA
Disk type: HDD/SSD/NVMe
Network bandwidth between the nodes: NA

Other:
Underlying Infrastructure: NA
Any 3rd party software installed on the nodes: NA

Issue description:
When a namespace is created after the project's quota has been fully allocated, the resource quota assigned to the new namespace includes limits for resource types that were never specified in the project quota.

Business impact:
Users are unable to create a config map in a namespace even though the project quota sets no limit for config maps.

Repro steps:

Create a Project with the below resource limits

  containerDefaultResourceLimit:
    limitsCpu: 200m
    limitsMemory: 128Mi
    requestsCpu: 100m
    requestsMemory: 64Mi
  description: ""
  displayName: quotatest
  enableProjectMonitoring: false
  namespaceDefaultResourceQuota:
    limit:
      limitsCpu: "1000"
      limitsMemory: "2000"
  resourceQuota:
    limit:
      limitsCpu: "1000"
      limitsMemory: "2000"
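The fields above are a fragment of the Rancher Project spec. For reference, a minimal complete manifest would look roughly like the following sketch, assuming the management.cattle.io/v3 Project CRD and placeholder cluster/project identifiers:

apiVersion: management.cattle.io/v3
kind: Project
metadata:
  generateName: p-          # Rancher generates project IDs like p-xxxxx
  namespace: c-xxxxx        # placeholder: the cluster ID
spec:
  clusterName: c-xxxxx      # placeholder: the cluster ID
  displayName: quotatest
  description: ""
  enableProjectMonitoring: false
  containerDefaultResourceLimit:
    limitsCpu: 200m
    limitsMemory: 128Mi
    requestsCpu: 100m
    requestsMemory: 64Mi
  namespaceDefaultResourceQuota:
    limit:
      limitsCpu: "1000"
      limitsMemory: "2000"
  resourceQuota:
    limit:
      limitsCpu: "1000"
      limitsMemory: "2000"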

Create a namespace inside the project
The namespace gets a quota with the below settings

    hard:
      limits.cpu: 1k
      limits.memory: 2k
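You can confirm this by inspecting the ResourceQuota object Rancher creates inside the namespace; the object name is generated, so listing all quota objects in the namespace is easiest (namespace name is a placeholder):

kubectl get resourcequota -n <your-namespace> -o yaml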

Create another namespace; this time the new namespace gets the quota below

    hard:
      configmaps: "0"
      limits.cpu: "0"
      limits.memory: "0"
      persistentvolumeclaims: "0"
      pods: "0"
      replicationcontrollers: "0"
      requests.cpu: "0"
      requests.memory: "0"
      requests.storage: "0"
      secrets: "0"
      services: "0"
      services.loadbalancers: "0"
      services.nodeports: "0"

Create a new config map in the second namespace; it fails with the following error:
configmaps "test" is forbidden: exceeded quota: default-995sh, requested: configmaps=1, used: configmaps=0, limited: configmaps=0

Actual behavior:
Config maps cannot be created in a namespace even though no project-level config map quota is set.

Expected behavior:
Config map creation should succeed, since the project quota limits only CPU and memory.

@a-blender a-blender self-assigned this Nov 24, 2021
@a-blender a-blender added this to the v2.6.4 milestone Nov 24, 2021
@zube zube bot removed this from the v2.6.4 milestone Nov 24, 2021
@deniseschannon deniseschannon added this to the v2.6.4 milestone Dec 5, 2021
@sowmyav27 (Contributor)
@annablender In our initial checks we were not able to reproduce this on 2.6.2. We have to try a few more use cases and additionally test on 2.6-head, but will get to it after the 2.6.3 release.

@zube zube bot closed this as completed Dec 8, 2021
@zube zube bot removed the [zube]: Working label Dec 8, 2021
@a-blender a-blender assigned gaktive and unassigned gaktive and a-blender Dec 8, 2021
@a-blender a-blender self-assigned this Dec 8, 2021
@zube zube bot reopened this Dec 8, 2021
@a-blender a-blender removed their assignment Dec 8, 2021
@cbron cbron modified the milestones: v2.6.4, v2.6.x Dec 17, 2021
@maxsokolovsky (Contributor)
Waiting on rancher/dashboard#5016

@maxsokolovsky (Contributor) commented Feb 18, 2022

@cbron @cmurphy @MbolotSuse @catherineluse @samjustus
There are a few edge cases that trigger behaviors whose intent we need to understand. Many behaviors are not documented, so we don't know if they are expected, correct, and intuitive to the user.

I believe we would need to evaluate those (possibly include someone from the UI team as well, for good measure) and decide what to do. In addition, the backend controller responsible for handling new namespaces and resource quotas, as well as container limits, needs to be significantly refactored and cleaned up to be correct, more readable and performant.

Here are some behaviors and questions about them:

  • If a new namespace exceeds the quota for any one resource, should we reject the quota creation? Or should we create a zeros-for-all quota? Or should we create a quota with zeros only for requested resources that are filled up?
  • When should we ever create a zeros-for-all quota?
  • How do we deal with negative quota values passed in?
  • Should the backend try to emulate what the UI does by calculating requested quota based on already used quota in the project? This is for when the request does not go through the UI.
  • Should we ever reject new namespaces?
  • We also need to establish an exact list of resources to which a quota applies. The backend currently does not fully handle a few of them.
  • Merges of container default limits (project vs namespace) - see Container Default Limits are not properly merged for project and namespace overriding them #36485
  • More questions to be added...

About the crux of this particular issue - I believe we create a zeros-for-all quota when a new namespace would not fit - simply because this is our way of rejecting the namespace. The namespace itself is created immediately on request. Our logic handles creating a Resource Quota in it. Instead of rejecting the quota and deleting the namespace, we just make a fully restrictive quota. Not sure if this is the most intuitive behavior.

@catherineluse
I think the zeros-for-all quota is confusing. Unless I'm mistaken, it basically just bricks the new namespace so you can't do anything with it. It would be easier to understand if creating the namespace would fail with an error along the lines of "Can't create a new namespace because the project resource quota is already at capacity. Increase the project quota limit or decrease the limit of other namespaces in the project to free up more capacity."

@cbron (Contributor) commented Feb 22, 2022

If a new namespace exceeds the quota for any one resource, should we reject the quota creation? Or should we create a zeros-for-all quota? Or should we create a quota with zeros only for requested resources that are filled up?

Going with the last option here: we should only block things that are over quota. If you are over CPU, you should still be able to make a NS with only configMaps. But ideally we alert the user and tell them they are over the CPU quota, so they aren't confused if they did actually want to allocate CPU.

UI behavior:

  • When adding a namespace while a resource is over quota, the UI zeroes out only those resources that are over the limit; this is the correct behavior.

UI todo:

  • Need to add a warning label when you are over quota; the slider going to zero doesn't tell you why it's at zero.
  • Stop rebalancing quotas on existing namespaces, stay in line with docs - need to repro first
  • Bug around removing quota on a project not actually removing it - need to repro first
  • Bug around negative values - dashboard 5153

Non-UI behavior:

  • Adding a namespace with the proper project annotation: when you are under quota, you get the default quota, which is the proper behavior.
  • Adding a namespace with the proper project annotation: when you are over quota, or request resources that go over quota, you get a zeros-for-all (blocking) quota. We want to unify this with the docs and UI behavior of blocking only the over-allocated resources.

Non-UI todo:

  • Last bullet above.
  • We also need to establish an exact list of resources to which a quota applies; the backend currently does not fully handle a few of them. This may be longer term.

@maxsokolovsky (Contributor)
For this particular task, we'll have backend changes to achieve the following behavior:

For scenarios where namespaces are created via kubectl:

  • If requested quota fits the project, then it is granted and associated with the namespace.
  • If requested quota does not fit, then zero quota only for overfilled resources is granted.

It should not matter whether requested quota comes from project defaults or annotation.
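As a sketch of the intended result (object names here are illustrative, not taken from the issue): if a namespace requests the project defaults while CPU is already overfilled but memory still fits, the ResourceQuota the controller creates would look roughly like:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: default-xxxxx    # placeholder; Rancher generates this name
  namespace: k1
spec:
  hard:
    limits.cpu: "0"      # zeroed: this resource is overfilled in the project
    limits.memory: 64Mi  # still fits, so the requested amount is granted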

@maxsokolovsky (Contributor)
QA Testing

Root cause

We were creating an all-restrictive quota (zero for all resources) for namespaces that do not
fit the project quota for at least one resource. This was happening only with namespaces created via kubectl.

What was fixed, or what changes have occurred

We now never create the all-restrictive quota limit.
If a new namespace does not fit the project limit, then we create a zero quota only for those resources that are overfilled,
regardless of whether the namespace is created via UI or kubectl.

This is true even if you use kubectl and explicitly specify resource quota limits with a special annotation, like this:
field.cattle.io/resourceQuota: '{"limit":{"limitsCpu":"100m", "limitsMemory":"100Mi", "configMaps": "50"}}'

Areas or cases that should be tested

  • Running the scenarios on the RC.
  • Creating namespaces according to the old behaviors from v2.6.3-patch1, then upgrading to the RC. Existing namespaces and their quota limits must not be affected. But the new
    behaviors are in effect for new namespaces.

What areas could experience regressions?

Please ignore the following until you run through the main scenarios. This is mostly information to keep in mind but no action is needed at this moment.

One issue that will soon disappear is a lack of full synchronization between frontend and backend.
Here is one scenario:

A project has a limit of 500 total and 300 per namespace default (of some resource).
I make a namespace ns1 in the UI that wants and gets 300.
Then I make a namespace ns2 via kubectl that wants 300, but gets 0, because there is only 200 remaining.
Then I make a namespace ns3 in the UI that wants 300 and should get the remaining 200 (this behavior is intentional when a request is made via UI). But the UI calculates the remaining as -100.
The backend does not let -100 through and assigns 0. But there is still 200 available in the project, technically.
This happens because the backend always sets the requested-quota annotation on the namespace, whether or not the full quota is granted and even when an error has occurred.
The UI, however, uses that annotation to calculate the actual balances.

Steps

Scenario 1

Project resource limits:
CPU: 500m, Memory: 512Mi

Default limits for new namespace:
CPU: 300m, Memory: 64Mi

Make two namespaces, ns1 and ns2, in the UI using the defaults. Inspect them with kubectl describe ns [ns-name]

ns1 has quota limit
CPU: 300m, Memory: 64Mi

ns2 has quota limit - notice that because you created the namespace in the UI, it gets the remainder of the resource, not zero.
CPU: 200m, Memory: 64Mi

Now the CPU resource is completely occupied in the project.
Create a new namespace in the project using kubectl.

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    field.cattle.io/projectId: [your-cluster-ID]:[your-project-ID]
  name: k1

k1 has quota limit
CPU: 0m, Memory: 64Mi

Create another namespace via UI called ns3.
ns3 has quota limit
CPU: 0m, Memory: 64Mi
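For either of these, kubectl describe (for example, kubectl describe ns k1) should show a quota section roughly like the following; the quota name is generated and the output is abridged:

Name:          k1
Status:        Active

Resource Quotas
  Name:          default-xxxxx
  Resource       Used  Hard
  --------       ---   ---
  limits.cpu     0     0
  limits.memory  0     64Mi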

Scenario 2

Project resource limits:
CPU: 500m, Memory: 512Mi

Default limits for new namespace:
CPU: 250m, Memory: 64Mi

Make two namespaces, ns1 and ns2, in the UI using the defaults. Inspect them with kubectl describe ns [ns-name]

ns1 has quota limit
CPU: 250m, Memory: 64Mi

ns2 has quota limit
CPU: 250m, Memory: 64Mi

Now the CPU resource is completely filled in the project.
Create a new namespace in the project using kubectl.

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    field.cattle.io/projectId: [your-cluster-ID]:[your-project-ID]
  name: k1

k1 has quota limit
CPU: 0m, Memory: 64Mi

Create another namespace via UI called ns3.
ns3 has quota limit
CPU: 0m, Memory: 64Mi

Scenario 3

Do the same test, but for the namespace created with kubectl, specify some custom limits in an annotation.

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    field.cattle.io/projectId: [your-cluster-ID]:[your-project-ID]
    field.cattle.io/resourceQuota: '{"limit":{"limitsCpu":"100m", "limitsMemory":"100Mi", "configMaps": "50"}}'
  name: k1

namespace k1 has quota limit
CPU: 0m, Memory: 100Mi, configMaps: 50

Scenario 4

Create a project and do not set any resource limits.
Then create a namespace in the project using the UI.
Then create another namespace in the project using kubectl, but specify a resource quota limit:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    field.cattle.io/projectId: [your-cluster-ID]:[your-project-ID]
    field.cattle.io/resourceQuota: '{"limit":{"limitsCpu":"100m", "limitsMemory":"100Mi", "configMaps": "50"}}'
  name: k1

Observe that both namespaces end up having NO limits.
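A quick way to verify is to list quota objects in the kubectl-created namespace; since the project has no limits, none should exist:

kubectl get resourcequota -n k1
# expected output: No resources found in k1 namespace.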

Scenario 5

Project resource limits:
CPU: 500m, Memory: 512Mi

Default limits for new namespace:
CPU: 300m, Memory: 64Mi

Create a namespace outside of any project. Run kubectl create ns my-ns.
Inspect it and observe that it has no resource quota limits.
Now move it to the project. Observe that the default quota limits are applied to it.

my-ns has quota limit
CPU: 300m, Memory: 64Mi
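Moving the namespace into the project is normally done through the UI; under the hood it amounts to setting the same projectId annotation used in the manifests above, which you could also apply directly (IDs are placeholders):

kubectl annotate namespace my-ns field.cattle.io/projectId=[your-cluster-ID]:[your-project-ID]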

@brudnak (Member) commented Mar 7, 2022

My checks PASSED

Validation Environment

Rancher version: v2.6-3f9109c3f3a1ea482f66f7e3700e495d36517e10-head
Rancher cluster type: Docker
Docker version: 20.10.7, build 20.10.7-0ubuntu5~20.04.2
Helm version: v2.16.8-rancher1
Downstream cluster type: RKE Linode
Downstream K8s version: v1.21.9-rancher1-1
Browser type: Google Chrome
Browser version: 99.0.4844.51 (Official Build) (x86_64)

Validation steps

  1. Create a project
    • Name: mytest-project
    • Resource Quotas:
      • CPU Limit / Project Limit: 500m
      • Memory Limit / Project Limit: 512Mi
      • CPU Limit / Namespace Default Limit: 300m
      • Memory Limit / Namespace Default Limit: 64Mi
  2. Create two namespaces in the UI
    • Name: ns1
    • Name: ns2
  3. Inspect them with
     kubectl describe namespace $nameOfNamespace
  4. Create a new namespace in the project using kubectl
    • Launch the Kubectl Shell from the UI
    • Run the following commands
      touch new-namespace-spec.yaml
      vi new-namespace-spec.yaml
  5. Paste in the following yaml file
     apiVersion: v1
     kind: Namespace
     metadata:
       annotations:
         field.cattle.io/projectId: [your-cluster-ID]:[your-project-ID]
       name: k1
  6. Run the following
     kubectl create -f new-namespace-spec.yaml
     kubectl describe namespace k1

Results

Expected

For the namespace created via kubectl to abide by the resource quota set on the project

Actual

✅ PASS : The namespace created via kubectl did abide by the resource quota set against the project

Additional Tests

Test: Additional checks with different values
Steps:
  1. Create a project
  2. Name: mytest-project2
  3. Resource Quotas
  4. CPU Limit / Project Limit: 500m
  5. Memory Limit / Project Limit: 512Mi
  6. CPU Limit / Namespace Default Limit: 250m
  7. Memory Limit / Namespace Default Limit: 64Mi
  8. Create two namespaces in the UI
  9. anotherns1
  10. anotherns2
  11. Create a new namespace in the project using kubectl
  12. Check that the new namespace has the expected resource quotas
  13. kubectl describe namespace YOUR_NS
Expected: Namespace created with kubectl SHOULD have CPU Limit: 0m & Memory Limit: 64Mi
Actual: Namespace created with kubectl DOES have CPU Limit: 0m & Memory Limit: 64Mi

Test: Do the same test, but for the namespace created with kubectl, specify some custom limits in an annotation
Steps:
  1. Create resource quotas for CPU and memory limits
  2. Max out the CPU limit with namespaces from the UI
  3. Try to get a namespace with CPU using kubectl with custom limits in an annotation
Expected: Not able to bypass project resource quotas
Actual: Unable to bypass the project resource quota using a custom limit in an annotation

Test: Create a namespace outside of any project
Steps:
  1. kubectl create ns my-ns
Expected: No project resource quota should be assigned to a namespace outside of a project
Actual: No resource quota was assigned to the namespace outside of the project

@maxsokolovsky (Contributor) commented Mar 16, 2022

Release note

For namespaces created inside projects via kubectl, the controller no longer assigns an all-restrictive quota limit when a resource limit exceeds the remaining amount in the project. Instead, a zero quota is granted only for the resource that exceeds the remainder.

@zube zube bot removed the [zube]: Done label Jun 7, 2022