
[FEATURE] Full or partial Quota usage prevents Live Migration of VM when additional VM exceeds Quota #3124

Closed
rebeccazzzz opened this issue Nov 8, 2022 · 4 comments
Assignees
Labels
  • area/rancher (issues dependes to the upstream rancher)
  • highlight (Highlight issues/features)
  • kind/bug (Issues that are defects reported by users or that we know have reached a real release)
  • kind/enhancement (Issues that improve or augment existing functionality)
  • priority/0 (Must be fixed in this release)
  • reproduce/always (Reproducible 100% of the time)
  • require/doc (Improvements or additions to documentation)
  • require/HEP (Require Harvester Enhancement Proposal PR)
  • severity/2 (Function working but has a major issue w/o workaround (a major incident with significant impact))
Milestone

Comments


Describe the bug

Fully or partially utilizing the Quota set on the Project and/or Namespace prevents the Live Migration of VMs.

The example below illustrates a configuration in which the Live Migration of a VM is prevented.

  • Quota: 64GB memory
  • VMs: 3 x 16GB memory (48GB in total)

The expectation is that Live Migration would succeed, but instead the following error was observed.

///

(combined from similar events): Error creating pod: pods "virt-launcher-awhs-int-pool1-0c06dbc6-c7wnq-7rdpv" is forbidden: exceeded quota: default-gmp6p, requested: limits.memory=17405661185, used: limits.memory=52216983555, limited: limits.memory=64Gi

///
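
For reference, the figures in the event are consistent with the standard Kubernetes ResourceQuota admission check, which rejects a new pod when used + requested would exceed the hard limit. A minimal sketch of that arithmetic, using the values from the event above:

```go
package main

import "fmt"

func main() {
	// Values taken from the quota event above, in bytes.
	const (
		limitBytes     int64 = 64 << 30    // limits.memory hard limit: 64Gi
		usedBytes      int64 = 52216983555 // limits.memory already charged to the namespace
		requestedBytes int64 = 17405661185 // limits.memory requested by the target virt-launcher pod
	)

	// ResourceQuota admission: the migration target pod is rejected when
	// used + requested exceeds the hard limit.
	total := usedBytes + requestedBytes
	fmt.Printf("used+requested = %d, limit = %d, exceeds quota: %v\n",
		total, limitBytes, total > limitBytes)
	// Prints: used+requested = 69622644740, limit = 68719476736, exceeds quota: true
}
```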
 
Also, even when a user fully utilizes their quota, the expectation is that existing VMs can still be live migrated to another node.

Expected behavior

  • Actual behavior: Live migration is not possible when a quota is set
  • Expected behavior: Live migration should be possible with partial or full quota utilization
@rebeccazzzz rebeccazzzz added kind/bug Issues that are defects reported by users or that we know have reached a real release priority/1 Highly recommended to fix in this release reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one labels Nov 8, 2022
@rebeccazzzz rebeccazzzz added this to the v1.2.0 milestone Nov 8, 2022
@guangbochen guangbochen added priority/0 Must be fixed in this release area/rancher issues dependes to the upstream rancher severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact) reproduce/always Reproducible 100% of the time require/HEP Require Harvester Enhancement Proposal PR highlight Highlight issues/features and removed reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one labels Jan 5, 2023
@guangbochen guangbochen changed the title [BUG] Full or partial Quota usage prevents Live Migration of VM when additional VM exceeds Quota [FEATURE] Full or partial Quota usage prevents Live Migration of VM when additional VM exceeds Quota Jan 16, 2023
@guangbochen guangbochen added the kind/enhancement Issues that improve or augment existing functionality label Jan 16, 2023
@chrisho chrisho removed the priority/1 Highly recommended to fix in this release label Jan 16, 2023
@guangbochen guangbochen added the blocker blocker of major functionality label Feb 22, 2023
@WuJun2016 WuJun2016 added the require-ui/medium estimate 3-5 working days label Mar 2, 2023

chrisho commented Mar 8, 2023

After a VM is created, the virt components in its pod require a certain memory overhead, so the pod's actual memory usage charged against the quota is higher than the VM configuration suggests. As a result, the number of VMs the user can create, or the number of simultaneous migrations, may fall short of expectations.

Example:
The user configures a ResourceQuota of 10C and 10Gi, with a maintenance quota of 50% for each resource, i.e. the VM Available Resource and the Maintenance Available Resource are 5C and 5Gi each. Although the user can create 5 VMs in this configuration, the 5 VMs cannot all be migrated at the same time, because the total memory used by the 5 VMs (including the per-pod overhead) is more than 5Gi; only 4 VMs can be migrated concurrently.
In general, the maintenance resources are only overused in this way under normal usage.
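
To make the arithmetic concrete: assuming, purely for illustration, that each virt-launcher pod adds roughly 200 MiB of overhead on top of 1 Gi of configured guest memory (the real overhead is computed by KubeVirt and varies with the VM configuration), a rough sketch of why only 4 of the 5 VMs fit into the 5 Gi maintenance pool at migration time:

```go
package main

import "fmt"

func main() {
	// Illustrative figures only: the 0.2 Gi per-pod overhead is an assumption,
	// not the value KubeVirt actually computes.
	const (
		maintenancePoolGi = 5.0 // 50% of the 10Gi ResourceQuota reserved for migration
		guestMemoryGi     = 1.0 // configured memory per VM
		overheadGi        = 0.2 // assumed virt-launcher overhead per pod
		vmCount           = 5
	)

	perTargetPodGi := guestMemoryGi + overheadGi // what one migration target pod charges to the quota
	for n := vmCount; n >= 1; n-- {
		needGi := float64(n) * perTargetPodGi
		fmt.Printf("%d concurrent migrations need %.1f Gi of the %.1f Gi maintenance pool: fits=%v\n",
			n, needGi, maintenancePoolGi, needGi <= maintenancePoolGi)
	}
	// With these figures, 5 migrations need 6.0 Gi (> 5 Gi) while 4 need 4.8 Gi,
	// matching the "only 4 VMs can be migrated at the same time" observation.
}
```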

solution: https://github.com/chrisho/harvester/blob/maintenance-resource-hep/enhancements/20230228-resource-quota-enhancement.md#vm-overhead-resource


harvesterhci-io-github-bot commented Mar 9, 2023

Pre Ready-For-Testing Checklist

* [ ] Is there a workaround for the issue? If so, where is it documented?
The workaround is at:

* [ ] Has the backend code been merged (harvester, harvester-installer, etc.) (including backport-needed/*)?
The PR is at:

* [ ] Does the PR include the explanation for the fix or the feature?

* [ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
The PR for the YAML change is at:
The PR for the chart change is at:

* [ ] If labeled: area/ui Has the UI issue filed or ready to be merged?
The UI issue/PR is at:

  • If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged?
    The documentation/KB PR is at: Add ResourceQuota doc docs#346

* [ ] If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?
- The automation skeleton PR is at:
- The automation test case PR is at:

* [ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
The compatibility issue is filed at:

@harvesterhci-io-github-bot

Automation e2e test issue: harvester/tests#741

@chrisho chrisho added the require/doc Improvements or additions to documentation label Mar 13, 2023
@guangbochen guangbochen removed the blocker blocker of major functionality label Apr 6, 2023

lanfon72 commented May 25, 2023

Verified this feature has been implemented.

Test Information

  • Environment: Baremetal DL360 3 nodes
  • Harvester Version: v1.2.0-rc1
  • ui-source Option: Auto
  • Rancher Version:
    • v2.6-a4b924d21cd5d90b28ab8009edc18d96542952ac-head, Docker Image: a1d1e43988d2
    • v2.7-fbca7c34e5aeae36b85d8b0e9af12a2f6f13e0ea-head, Docker Image: 73f3b3456e62

Verify Steps

  1. Install Harvester with at least 2 nodes
  2. Import Harvester into Rancher
  3. Access Harvester via Rancher's Virtualization Management
  4. Navigate to Advanced/Settings, update overcommit-config to cpu:100, memory:100
  5. Navigate to Projects/Namespaces, Create Project proj-a with Resource Quotas
    • CPU Limit: 5000 mCPUs/Project Limit, 3000 mCPUs/Namespace Default Limit
    • Memory Limit: 5000 MiB/Project Limit, 4000 MiB/Namespace Default Limit
  6. Create Namespace ns-1 under Project proj-a
  7. Create an image for VM creation
  8. Create 2 VMs belonging to namespace ns-1
    • Named vm1 with CPU: 1C, Memory 1GiB
    • Named vm2 with CPU: 1C, Memory 2GiB
  9. Migrate vm1 and vm2 to another host concurrently
  10. VMs should be migrated successfully (see the rough quota arithmetic below)
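
During a live migration KubeVirt starts a second virt-launcher pod on the target node, so the source and target pods of each VM are charged to the namespace quota at the same time. A back-of-the-envelope check, again assuming an illustrative ~200 MiB of virt-launcher overhead per pod, of why step 9 would exceed a plain 4000 MiB ResourceQuota and therefore exercises the enhancement:

```go
package main

import "fmt"

func main() {
	// Namespace default memory limit from the verify steps above (MiB).
	const namespaceLimitMiB = 4000.0

	// Guest memory of vm1 (1 GiB) and vm2 (2 GiB) in MiB, plus an assumed
	// ~200 MiB virt-launcher overhead per pod (illustrative figure only).
	vmMiB := []float64{1024, 2048}
	const overheadMiB = 200.0

	var steadyStateMiB, duringMigrationMiB float64
	for _, m := range vmMiB {
		steadyStateMiB += m + overheadMiB
		duringMigrationMiB += 2 * (m + overheadMiB) // source + target pod per VM
	}
	fmt.Printf("steady state: %.0f MiB (limit %.0f MiB)\n", steadyStateMiB, namespaceLimitMiB)
	fmt.Printf("during concurrent migration: %.0f MiB (limit %.0f MiB)\n",
		duringMigrationMiB, namespaceLimitMiB)
	// Without the enhancement, the migration-time figure would exceed the quota
	// and the target pods would be rejected, as in the original report.
}
```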
