Add logs indicating when global migration limits are hit #6950
Signed-off-by: David Vossel <davidvossel@gmail.com>
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: stu-gott
/retest
/retest
@@ -746,6 +746,7 @@ func (c *MigrationController) handleTargetPodCreation(key string, migration *vir
 	// XXX: Make this configurable, think about limit per node, bandwidth per migration, and so on.
 	if len(runningMigrations) >= int(*c.clusterConfig.GetMigrationConfiguration().ParallelMigrationsPerCluster) {
+		log.Log.Object(migration).Infof("Waiting to schedule target pod for vmi [%s/%s] migration because total running parallel migration count [%d] is currently at the global cluster limit.", vmi.Namespace, vmi.Name, len(runningMigrations))
How do you feel about adding events too? Of course, it would not be ideal to trigger it every 5 seconds.
Consider a case where a user can't know how many migrations are in flight.
I would say that the user should have very little interest in migrations at all. The workload is not interrupted during a migration. Therefore, the alert should be redundant.
> Consider a case where a user can't know how many migrations are in flight.

Migration is an administrator's task. The admin should be able to know.
How about associating the events with the migration objects? Then the admins get the info and not the users, since the migration object is for admins.
> How do you feel about adding events too? Of course, it would not be ideal to trigger it every 5 seconds.

I think the default inhibition in the event client and the merging of events at the cluster level may be sufficient.
Fair enough, my example case was not the best :) The user is the admin in this case, but I admit that the admin should have access to all migrations and can therefore see how many are in flight. The event would then act as additional information on individual migration objects.
What I don't understand is what action we expect the admin to take when such an event occurs. Or is it just for statistics? In that case, perhaps we need a metric so admins can plan ahead.
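As a rough illustration of the metric idea, the sketch below counts how often a migration had to wait because the limit was reached. It uses only the standard library; `migrationStats`, `observe`, and `deferred` are hypothetical names, and a real controller would more likely export such a counter through its existing metrics library rather than a hand-rolled struct.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// migrationStats is a hypothetical holder for a "deferred at limit" counter.
type migrationStats struct {
	deferredTotal int64 // number of times scheduling waited at the limit
}

// observe records one scheduling attempt. It returns true when the
// migration may proceed and false when the running count has reached
// the limit, in which case the counter is incremented.
func (s *migrationStats) observe(running, limit int) bool {
	if running >= limit {
		atomic.AddInt64(&s.deferredTotal, 1)
		return false
	}
	return true
}

// deferred returns the current counter value.
func (s *migrationStats) deferred() int64 {
	return atomic.LoadInt64(&s.deferredTotal)
}

func main() {
	stats := &migrationStats{}
	limit := 2 // illustrative value, not a documented default

	// Three attempts: one below the limit, two at or above it.
	for _, running := range []int{1, 2, 3} {
		stats.observe(running, limit)
	}
	fmt.Println("deferred attempts:", stats.deferred())
}
```

A counter like this answers the planning question directly: a steadily rising value suggests the cluster-wide limit is too low for the migration load.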
/retest
/retest
/retest
/cherry-pick release-0.44
@enp0s3: new pull request created: #7021
Right now it looks like a migration is silently being ignored when global migration limits are hit. This makes it difficult to debug issues that occur in production when these limits are hit.
This PR simply adds visibility so we at least get log messages indicating when a migration limit is encountered by the migration controller.
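The behavior described above can be sketched in isolation as follows. This is a simplified stand-in, not the controller code: `migrationGate` and `admit` are hypothetical, `parallelLimit` plays the role of `ParallelMigrationsPerCluster`, and the standard `log` package replaces the project's own logger. The message text is taken from the diff.

```go
package main

import (
	"fmt"
	"log"
)

// migrationGate is a hypothetical stand-in for the controller's limit check.
type migrationGate struct {
	parallelLimit int // plays the role of ParallelMigrationsPerCluster
}

// admit reports whether a target pod may be scheduled given the number of
// running migrations. When it refuses, it also returns the message that
// should be logged so the wait is no longer silent.
func (g migrationGate) admit(namespace, name string, running int) (bool, string) {
	if running >= g.parallelLimit {
		return false, fmt.Sprintf(
			"Waiting to schedule target pod for vmi [%s/%s] migration because total running parallel migration count [%d] is currently at the global cluster limit.",
			namespace, name, running)
	}
	return true, ""
}

func main() {
	gate := migrationGate{parallelLimit: 5} // illustrative value

	for _, running := range []int{4, 5} {
		ok, reason := gate.admit("default", "testvmi", running)
		if !ok {
			log.Print(reason)
			continue
		}
		log.Printf("scheduling target pod with %d migrations running", running)
	}
}
```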