Add logs indicating when global migration limits are hit #6950

Merged

davidvossel
Member

Right now it looks like a migration is silently being ignored when global migration limits are hit. This makes it difficult to debug issues that occur in production when these limits are hit.

This PR simply adds visibility so we at least get log messages indicating when a migration limit is encountered by the migration controller.

Release note: NONE

Signed-off-by: David Vossel <davidvossel@gmail.com>
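
For readers who want the behaviour described above in isolation, here is a minimal sketch of the check and the log message it produces. It is not the controller code: `migrationLimitMessage` and its parameters are hypothetical names introduced for illustration, and the real change logs through KubeVirt's own logger rather than returning a string.

```go
package main

import "fmt"

// migrationLimitMessage is a hypothetical helper (not part of KubeVirt).
// It reports whether a new migration has to wait because the number of
// running migrations has reached the cluster-wide limit, and returns the
// message that would be logged in that case.
func migrationLimitMessage(namespace, name string, running, limit int) (string, bool) {
	if running < limit {
		// Below the limit: the target pod can be scheduled, nothing to log.
		return "", false
	}
	msg := fmt.Sprintf(
		"Waiting to schedule target pod for vmi [%s/%s] migration because total running parallel migration count [%d] is currently at the global cluster limit.",
		namespace, name, running,
	)
	return msg, true
}
```

Returning the message instead of logging it directly keeps the sketch easy to test; a caller would pass the string to whatever logger it uses.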
@kubevirt-bot added the release-note-none, dco-signoff: yes, and size/XS labels Dec 14, 2021
Member

@stu-gott stu-gott left a comment

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stu-gott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot added the approved label Dec 14, 2021
@vladikr
Member

vladikr commented Dec 15, 2021

/retest
/lgtm

@kubevirt-bot added the lgtm label Dec 15, 2021
@kubevirt-commenter-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@@ -746,6 +746,7 @@ func (c *MigrationController) handleTargetPodCreation(key string, migration *vir

     // XXX: Make this configurable, think about limit per node, bandwidth per migration, and so on.
     if len(runningMigrations) >= int(*c.clusterConfig.GetMigrationConfiguration().ParallelMigrationsPerCluster) {
+        log.Log.Object(migration).Infof("Waiting to schedule target pod for vmi [%s/%s] migration because total running parallel migration count [%d] is currently at the global cluster limit.", vmi.Namespace, vmi.Name, len(runningMigrations))
Member

How do you feel about adding events too? Of course, it would not be ideal to trigger it every 5 seconds.

Consider a case where a user can't know how many migrations are in flight.

Member

I would say that the user should have very little interest in migrations at all. The workload is not interrupted during a migration. Therefore, the alert should be redundant.

> Consider a case where a user can't know how many migrations are in flight.

Migration is an administrator's task. The admin should be able to know.

Member

How about associating the events with the migration objects? Then the admins get the info and not the users, since the migration object is for admins.

Member

> How do you feel about adding events too? Of course, it would not be ideal to trigger it every 5 seconds.

I think the default inhibition in the event client and the merging of events at the cluster level may be sufficient.

Member

Fair enough, my example case was not the best :) The user is the admin in this case, but I admit that the admin should have access to all migrations and can therefore see how many are in flight. The event would then act as additional information on individual migration objects.

Member

What I don't understand is what action we expect the admin to take in the case of such an event. Or is it just for statistics? In that case, perhaps we need a metric so admins can plan ahead.
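
As a rough illustration of the metric idea, the sketch below counts how often the cluster-wide limit was hit. It is an assumption-laden example, not a proposal for the actual implementation: `migrationLimitStats` and its methods are hypothetical names, and a real controller would more likely export such a value through its existing metrics library than keep a bare in-process counter.

```go
package main

import "sync/atomic"

// migrationLimitStats is a hypothetical in-process counter of how many
// times a migration had to wait because the cluster-wide limit was reached.
type migrationLimitStats struct {
	limitHits int64
}

// RecordLimitHit increments the counter; it is safe for concurrent use.
func (s *migrationLimitStats) RecordLimitHit() {
	atomic.AddInt64(&s.limitHits, 1)
}

// LimitHits returns the number of recorded limit hits so far.
func (s *migrationLimitStats) LimitHits() int64 {
	return atomic.LoadInt64(&s.limitHits)
}
```

A monotonically increasing count like this lets an admin see how often the limit is reached over time, which is the kind of signal that could inform raising or lowering the limit.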

@kubevirt-bot kubevirt-bot merged commit cd67738 into kubevirt:main Dec 17, 2021
@enp0s3
Contributor

enp0s3 commented Jan 3, 2022

/cherry-pick release-0.44

@kubevirt-bot
Contributor

@enp0s3: new pull request created: #7021

In response to this:

/cherry-pick release-0.44

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
