
feat: add KubevirtMigrationAware evictor plugin #591

Open
tiraboschi wants to merge 1 commit into openshift:main from tiraboschi:KubevirtMigrationAware

@tiraboschi

Description

Adds a new EvictorPlugin that makes the descheduler aware of KubeVirt live-migration state when deciding whether to evict virt-launcher pods.

  • Filter (hard block): prevents eviction of pods whose VMI has a migration in progress (startTimestamp set, endTimestamp absent in migrationState). KubeVirt's own admission webhook provides a complementary safety net at the API layer; this plugin acts upstream of it to avoid the round-trip.

  • PreEvictionFilter (soft block): defers eviction of pods whose VMI completed a migration recently, using a three-layer adaptive cooldown:

    1. Base: max(migrationCooldown, migrationDuration) — heavier VMs (longer migrations) automatically receive longer protection.
    2. Backoff: base × 2^(count−1) where count is the number of migration completions recorded in a configurable sliding history window (default 24h). Each successive migration within the window doubles the cooldown, making repeated churn progressively harder.
    3. Cap: the result is bounded by maxMigrationCooldown (default 6h) to prevent pathological cases from locking a VM indefinitely.
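
The three layers above can be sketched as a single pure function. This is a minimal illustration under stated assumptions, not the plugin's actual code: the name `effectiveCooldown` and its parameter order are hypothetical, and only the formula and the defaults come from the description.

```go
package main

import (
	"fmt"
	"time"
)

// Defaults quoted in the PR description.
const (
	defaultMigrationCooldown    = 15 * time.Minute
	defaultMaxMigrationCooldown = 6 * time.Hour
)

// effectiveCooldown is a hypothetical helper (not taken from the PR).
// count is the number of migration completions inside the history window.
func effectiveCooldown(cooldown, migrationDuration, maxCooldown time.Duration, count int) time.Duration {
	// Layer 1: base = max(migrationCooldown, migrationDuration).
	base := cooldown
	if migrationDuration > base {
		base = migrationDuration
	}
	// Layer 3 applied early: a base at or above the cap needs no backoff.
	if base >= maxCooldown {
		return maxCooldown
	}
	// Layer 2: base * 2^(count-1), doubling once per additional completion.
	// Checking against maxCooldown/2 before doubling keeps the result at or
	// below the cap and avoids integer overflow for large counts.
	result := base
	for i := 1; i < count; i++ {
		if result > maxCooldown/2 {
			return maxCooldown
		}
		result *= 2
	}
	return result
}

func main() {
	for count := 1; count <= 6; count++ {
		d := effectiveCooldown(defaultMigrationCooldown, 0, defaultMaxMigrationCooldown, count)
		fmt.Printf("completions=%d cooldown=%s\n", count, d)
	}
}
```

With the defaults, the sixth completion inside the window would compute 8h and is therefore clamped to the 6h cap.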

Defaults:

  • migrationCooldown=15m
  • maxMigrationCooldown=6h
  • migrationHistoryWindow=24h

All three are operator-configurable.
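
As a rough sketch of how defaulting and validation for these three settings could look: the struct name `migrationAwareArgs`, its field types, and the specific validation rules (no negative durations, cap not below the base) are assumptions made for illustration; only the default values come from the description.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Default values quoted in the PR description.
const (
	defaultMigrationCooldown      = 15 * time.Minute
	defaultMaxMigrationCooldown   = 6 * time.Hour
	defaultMigrationHistoryWindow = 24 * time.Hour
)

// migrationAwareArgs is a hypothetical shape for the plugin arguments.
type migrationAwareArgs struct {
	MigrationCooldown      time.Duration
	MaxMigrationCooldown   time.Duration
	MigrationHistoryWindow time.Duration
}

// setDefaults fills in any field the operator left at its zero value.
func setDefaults(a *migrationAwareArgs) {
	if a.MigrationCooldown == 0 {
		a.MigrationCooldown = defaultMigrationCooldown
	}
	if a.MaxMigrationCooldown == 0 {
		a.MaxMigrationCooldown = defaultMaxMigrationCooldown
	}
	if a.MigrationHistoryWindow == 0 {
		a.MigrationHistoryWindow = defaultMigrationHistoryWindow
	}
}

// validate applies assumed sanity rules; the real plugin may check more or less.
func validate(a migrationAwareArgs) error {
	if a.MigrationCooldown < 0 || a.MaxMigrationCooldown < 0 || a.MigrationHistoryWindow < 0 {
		return errors.New("durations must not be negative")
	}
	if a.MaxMigrationCooldown < a.MigrationCooldown {
		return fmt.Errorf("maxMigrationCooldown (%s) must not be below migrationCooldown (%s)",
			a.MaxMigrationCooldown, a.MigrationCooldown)
	}
	return nil
}

func main() {
	// Only the base cooldown is overridden; the other two fall back to defaults.
	args := migrationAwareArgs{MigrationCooldown: 30 * time.Minute}
	setDefaults(&args)
	fmt.Println(args.MigrationCooldown, args.MaxMigrationCooldown, args.MigrationHistoryWindow)
	fmt.Println("validation error:", validate(args))
}
```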

Both extension points read from a dedicated dynamic VMI informer cache (kubevirt.io/v1 VirtualMachineInstances), avoiding API-server calls in the hot eviction path. An UpdateFunc event handler on the same informer records migration completions by VMI UID to drive the backoff history.
The cache warms up at startup with a 30s timeout; failure to sync is a hard error so the descheduler does not start with stale or empty state.
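
The history that drives the backoff can be sketched with the standard library alone. Everything named here (`migrationHistory`, `onUpdate`, `count`) is hypothetical. The sketch assumes that a completion shows up as a transition from an empty to a set endTimestamp between the old and new object, and it takes the two timestamps as plain strings instead of full informer objects.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultHistoryWindow matches the migrationHistoryWindow default (24h).
const defaultHistoryWindow = 24 * time.Hour

// migrationHistory is a hypothetical store of completion times keyed by VMI UID.
type migrationHistory struct {
	mu      sync.Mutex
	window  time.Duration
	entries map[string][]time.Time
}

func newMigrationHistory(window time.Duration) *migrationHistory {
	return &migrationHistory{window: window, entries: map[string][]time.Time{}}
}

// onUpdate plays the role of the informer UpdateFunc. It records a completion
// only on the transition from "no endTimestamp" to "endTimestamp set", so
// repeated updates of an already completed migration are not counted twice.
// It returns true when a completion was recorded.
func (h *migrationHistory) onUpdate(uid, oldEnd, newEnd string) bool {
	if oldEnd != "" || newEnd == "" {
		return false
	}
	completedAt, err := time.Parse(time.RFC3339, newEnd)
	if err != nil {
		return false // unparsable timestamp: skip rather than guess
	}
	h.mu.Lock()
	defer h.mu.Unlock()
	h.entries[uid] = append(h.entries[uid], completedAt)
	return true
}

// count prunes completions older than the window and returns how many remain.
func (h *migrationHistory) count(uid string, now time.Time) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	cutoff := now.Add(-h.window)
	kept := h.entries[uid][:0]
	for _, t := range h.entries[uid] {
		if !t.Before(cutoff) {
			kept = append(kept, t)
		}
	}
	if len(kept) == 0 {
		delete(h.entries, uid)
		return 0
	}
	h.entries[uid] = kept
	return len(kept)
}

// mustParse is a small helper for the usage example.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	h := newMigrationHistory(defaultHistoryWindow)
	h.onUpdate("vmi-uid-1", "", "2026-05-04T10:00:00Z")
	h.onUpdate("vmi-uid-1", "", "2026-05-04T20:00:00Z")
	fmt.Println(h.count("vmi-uid-1", mustParse("2026-05-05T09:00:00Z")))
}
```

Keying by UID rather than by name means a VMI that is deleted and recreated under the same name starts with an empty history.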

Two Prometheus metrics are registered on first use:

  • descheduler_kubevirt_eviction_blocks_total{reason,node,namespace}
    counter — tracks eviction blocks for alerting and per-node diagnosis.
  • descheduler_kubevirt_effective_cooldown_seconds histogram — shows the
    distribution of applied cooldown durations across backoff buckets
    (15m, 30m, 1h, 2h, 4h, 6h) so operators can tell whether the backoff
    is engaging or VMs are piling up at the cap.
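
The bucket boundaries listed above line up with the backoff steps between the default base and the default cap. One way to derive them, shown as a sketch only: the helper `cooldownBuckets` is hypothetical, and whether the plugin computes the buckets or hard-codes them is not stated in the description.

```go
package main

import (
	"fmt"
	"time"
)

// Defaults quoted in the PR description.
const (
	defaultMigrationCooldown    = 15 * time.Minute
	defaultMaxMigrationCooldown = 6 * time.Hour
)

// cooldownBuckets returns histogram upper bounds in seconds: the base cooldown,
// then each doubling below the cap, then the cap itself as the last bucket.
func cooldownBuckets(base, maxCooldown time.Duration) []float64 {
	if base <= 0 || maxCooldown < base {
		return nil
	}
	var buckets []float64
	for d := base; d < maxCooldown; d *= 2 {
		buckets = append(buckets, d.Seconds())
	}
	return append(buckets, maxCooldown.Seconds())
}

func main() {
	fmt.Println(cooldownBuckets(defaultMigrationCooldown, defaultMaxMigrationCooldown))
}
```

A slice of this shape is what a Prometheus histogram would take as its bucket list. Observations that land in the last bucket indicate VMs sitting at the cap.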

All code paths that cannot retrieve or parse VMI state fail open (allow eviction) so the plugin never blocks unrelated workloads.
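
A sketch of the hard-block check combined with the fail-open rule. It represents the VMI as a plain `map[string]interface{}`, which is the shape of the `Object` field of `*unstructured.Unstructured`. The function names, the `lookup` stand-in for the informer cache, and the convention that `true` means eviction may proceed are assumptions for illustration.

```go
package main

import "fmt"

// migrationInProgress reports whether status.migrationState has a
// startTimestamp set and no endTimestamp. Any missing or malformed field
// yields false, so the caller fails open.
func migrationInProgress(vmi map[string]interface{}) bool {
	status, ok := vmi["status"].(map[string]interface{})
	if !ok {
		return false
	}
	state, ok := status["migrationState"].(map[string]interface{})
	if !ok {
		return false
	}
	start, ok := state["startTimestamp"].(string)
	if !ok || start == "" {
		return false
	}
	endRaw, present := state["endTimestamp"]
	if !present || endRaw == nil {
		return true // started and not finished
	}
	end, ok := endRaw.(string)
	if !ok {
		return false // malformed endTimestamp: fail open
	}
	return end == ""
}

// staticLookup is a hypothetical stand-in for the informer cache, keyed by
// "namespace/name".
func staticLookup(vmis map[string]map[string]interface{}) func(namespace, name string) (map[string]interface{}, bool) {
	return func(namespace, name string) (map[string]interface{}, bool) {
		vmi, found := vmis[namespace+"/"+name]
		return vmi, found
	}
}

// allowEviction returns true when eviction may proceed. A VMI that cannot be
// found is treated as "no information" and never blocks the pod.
func allowEviction(lookup func(namespace, name string) (map[string]interface{}, bool), namespace, vmiName string) bool {
	vmi, found := lookup(namespace, vmiName)
	if !found {
		return true
	}
	return !migrationInProgress(vmi)
}

func main() {
	migrating := map[string]interface{}{
		"status": map[string]interface{}{
			"migrationState": map[string]interface{}{
				"startTimestamp": "2026-05-04T10:00:00Z",
			},
		},
	}
	lookup := staticLookup(map[string]map[string]interface{}{"default/vm-a": migrating})
	fmt.Println("vm-a evictable:", allowEviction(lookup, "default", "vm-a"))
	fmt.Println("vm-b evictable:", allowEviction(lookup, "default", "vm-b"))
}
```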

Unit tests cover Filter, PreEvictionFilter (base cooldown, adaptive duration, exponential backoff, maxMigrationCooldown cap), migration history recording and pruning, informer event handler, defaults, and validation. No kubevirt imports are required: VMI state is expressed as plain *unstructured.Unstructured objects, exactly as the dynamic informer delivers them at runtime.

Checklist

Please ensure your pull request meets the following criteria before submitting
it for review. Reviewers will use these items to assess the quality and
completeness of your changes:

  • Code Readability: Is the code easy to understand, well-structured, and consistent with project conventions?
  • Naming Conventions: Are variable, function, and struct names descriptive and consistent?
  • Code Duplication: Is there any repeated code that should be refactored?
  • Function/Method Size: Are functions/methods short and focused on a single task?
  • Comments & Documentation: Are comments clear, useful, and not excessive? Were comments updated where necessary?
  • Error Handling: Are errors handled appropriately?
  • Testing: Are there sufficient unit/integration tests?
  • Performance: Are there any obvious performance issues or unnecessary computations?
  • Dependencies: Are new dependencies justified?
  • Logging & Monitoring: Is logging used appropriately (not too verbose, not too silent)?
  • Backward Compatibility: Does this change break any existing functionality or APIs?
  • Resource Management: Are resources (files, connections, memory) managed and released properly?
  • PR Description: Is the PR description clear, providing enough context and explaining the motivation for the change?
  • Documentation & Changelog: Are README and docs updated if necessary?

@openshift-ci

openshift-ci Bot commented Apr 30, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tiraboschi
Once this PR has been reviewed and has the lgtm label, please assign ricardomaraschini for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tiraboschi changed the title from "Kubevirt migration aware" to "feat: add KubevirtMigrationAware evictor plugin" on Apr 30, 2026
tiraboschi force-pushed the KubevirtMigrationAware branch 3 times, most recently from b76f84a to b97edf7, on May 4, 2026 16:02
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
tiraboschi force-pushed the KubevirtMigrationAware branch from b97edf7 to a07081e on May 5, 2026 07:41
@openshift-ci

openshift-ci Bot commented May 5, 2026

@tiraboschi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/security | a07081e | link | false | /test security |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

tiraboschi added a commit to tiraboschi/cluster-kube-descheduler-operator that referenced this pull request May 6, 2026
Enable the KubevirtMigrationAware plugin as filter and pre-eviction
filter in the KubeVirtRelieveAndMigrate profile. The filter hard-blocks
eviction of VMs with an in-progress migration; the pre-eviction filter
applies an adaptive cooldown after migration completes to prevent
cascading re-evictions.

Introduce three new ProfileCustomizations fields to tune the plugin:
- devMigrationCooldown: base cooldown after a migration completes
- devMaxMigrationCooldown: upper bound on the exponential backoff
- devMigrationHistoryWindow: sliding window for counting past migrations

Requires: openshift/descheduler#591

Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
tiraboschi added a commit to tiraboschi/cluster-kube-descheduler-operator that referenced this pull request May 7, 2026
tiraboschi added a commit to tiraboschi/cluster-kube-descheduler-operator that referenced this pull request May 13, 2026