
Using the same flavors in different resources might lead to unschedulable pods #167

Closed · Tracked by #222
alculquicondor opened this issue Mar 30, 2022 · 20 comments · Fixed by #296
Labels: kind/bug, kind/productionization, priority/important-soon

Comments

@alculquicondor (Contributor)

What happened:

When using the same flavors in multiple resources:

  requestableResources:
  - name: "cpu"
    flavors:
    - resourceFlavor: x86
      quota:
        guaranteed: 9
    - resourceFlavor: arm
      quota:
        guaranteed: 12
  - name: "memory"
    flavors:
    - resourceFlavor: x86
      quota:
        guaranteed: 36Gi
    - resourceFlavor: arm
      quota:
        guaranteed: 48Gi

a workload could get admitted with different flavors for cpu and memory. Since each flavor injects its own node labels into the pods' node selector, the pods may end up requiring labels from two different flavors at once and become unschedulable.

What you expected to happen:

workloads to get admitted with the same flavor across resources.
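
To make the failure concrete: under the quotas above, consider a pod requesting 8 CPUs and 40Gi of memory. A sketch of the flavor assignment the scheduler could produce (the admission schema here is approximated, not the exact API):

  admission:
    podSets:
    - name: main
      flavors:
        cpu: x86      # 8 <= 9, so cpu fits the x86 flavor first
        memory: arm   # 40Gi > 36Gi on x86, so memory falls through to arm
  # The pods now carry node labels from both the x86 and arm flavors,
  # which no single node can satisfy, so they are unschedulable.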

@alculquicondor added the kind/bug label on Mar 30, 2022
@ahg-g added the kind/productionization and priority/important-soon labels on Apr 13, 2022
@ahg-g (Contributor) commented May 1, 2022

This is going to be very tricky to solve.

I think the best solution is to validate that a label key is used by one resource only.

@ahg-g (Contributor) commented May 1, 2022

The example in the description doesn't make sense in practice and we should make sure that users can't create such setups.

@ahg-g (Contributor) commented May 1, 2022

> I think the best solution is to validate that a label key is used by one resource only.

One issue here is if the ResourceFlavor gets updated. But I think we need to make ResourceFlavor immutable anyway (to ensure that admitted workloads reference the exact same flavor version assumed by the scheduler), and also prevent deleting a ResourceFlavor if any ClusterQueue references it.

Updating a ResourceFlavor should be done by creating a new object with a different name, then changing the ClusterQueues to reference the new ResourceFlavor, and then the old ResourceFlavor can be deleted.
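
For example (a sketch; the apiVersion and label values are illustrative):

  # Replacement for the x86 flavor; ClusterQueues are then updated to
  # reference it, after which the old x86 object can be deleted.
  apiVersion: kueue.x-k8s.io/v1alpha1
  kind: ResourceFlavor
  metadata:
    name: x86-v2
  labels:
    cpu.arch: x86
    node.pool: pool-b   # illustrative: the updated label that motivated the change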

The idea is to make it super hard to place the system in a nonsensical state.

@alculquicondor (Contributor, Author)

> I think the best solution is to validate that a label key is used by one resource only.

That's not enough to prevent a flavor from being used for different resources.

A potential solution is to include the resource name in the ResourceFlavor object and validate that ClusterQueues only reference the flavor for the corresponding resource.
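
Sketched, that could look like the following (the resourceName field is hypothetical, not an existing API field):

  apiVersion: kueue.x-k8s.io/v1alpha1
  kind: ResourceFlavor
  metadata:
    name: x86
  resourceName: cpu   # hypothetical: this flavor may only back the cpu resource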

> The example in the description doesn't make sense in practice and we should make sure that users can't create such setups.

Why not? Your nodes might have a certain cpu/memory ratio for the flavor and you might want to set the limits accordingly.

@ahg-g (Contributor) commented May 2, 2022

> That's not enough to prevent a flavor from being used for different resources.

Preventing a flavor from being used by different resources doesn't address the issue reported here, because different flavors may use the same label key and so cause the same conflict. Enforcing a label key to be used by a single resource ensures that it never gets assigned different values because of conflicting assignments across resources.
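
For example, two flavors with different names can still collide on a label key (a sketch; the label layout is illustrative):

  kind: ResourceFlavor
  metadata:
    name: x86-cpu
  labels:
    cpu.arch: x86   # same key as below...
  ---
  kind: ResourceFlavor
  metadata:
    name: arm-mem
  labels:
    cpu.arch: arm   # ...with a different value
  # If cpu resolves to x86-cpu and memory to arm-mem, the workload's pods
  # would need cpu.arch to be both x86 and arm at once: unschedulable.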

> Why not? Your nodes might have a certain cpu/memory ratio for the flavor and you might want to set the limits accordingly.

That is fair, but to address this case I think it is better to have an explicit API where the user can express that there is a dependency between two resources, in which case they must have the exact same set of flavors, so that the scheduler checks those dependent resources together.

@alculquicondor (Contributor, Author)

> Enforcing a label key to be used by a single resource ensures that it never gets assigned different values because of conflicting assignments across resources.

How do you plan to implement this? When do you validate? ResourceFlavor creation or ClusterQueue creation?

> That is fair, but to address this case I think it is better to have an explicit API where the user can express that there is a dependency between two resources

I like this. Something like

  memory:
    flavorsFrom: cpu

@ahg-g (Contributor) commented May 7, 2022

> How do you plan to implement this? When do you validate? ResourceFlavor creation or ClusterQueue creation?

If we make ResourceFlavor immutable and disallow deleting it until no CQ references it, then we can validate on CQ creation/update.

@xiaoxubeii

Currently, the appropriate flavor should match the CQ labels / node selector / affinity. Different resources can use different flavors, but the selected flavors are in line with the filtering criteria. So is it in line with expectations even if the workload cannot be scheduled?

> One issue here is if the ResourceFlavor gets updated. But I think we need to make ResourceFlavor immutable anyway (to ensure that admitted workloads reference the exact same flavor version assumed by the scheduler), and also prevent deleting a ResourceFlavor if any ClusterQueue references it.

I think we need to determine whether the ResourceFlavor is referenced during updates and deletions, and if so, disallow the operation.

@xiaoxubeii

I can help with that.
/assign

@alculquicondor (Contributor, Author)

Can you please hold? I don't think we have agreed on the high-level solution for this problem.

@xiaoxubeii

> Can you please hold? I don't think we have agreed on the high-level solution for this problem.

OK, I will hold the PR until we reach agreement.

@ahg-g (Contributor) commented Jul 11, 2022

For this one, perhaps the ultimate solution is to have an explicit API to express dependencies.

For the time being, I think we can validate that a CQ doesn't have more than one resource using a label key. For this to work, we need to:

  1. Disallow ResourceFlavor updates if it is being used by a CQ
  2. Validate on CQ create/update that the ResourceFlavors used across resources don't share a label key. This will require looking up the ResourceFlavors the CQ uses on admission, which I think should be fine; CQs are not created or updated frequently.
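
A minimal sketch of check 2, assuming we can resolve each referenced flavor to its node labels (the function and type names are illustrative, not the actual Kueue code):

  package validation

  import "fmt"

  // validateLabelKeys rejects a ClusterQueue in which two different resources
  // use flavors that share a node-label key. resourceFlavors maps a resource
  // name to the flavor names it references; flavorLabels maps a flavor name
  // to that ResourceFlavor's node labels.
  func validateLabelKeys(resourceFlavors map[string][]string,
      flavorLabels map[string]map[string]string) error {
      keyOwner := map[string]string{} // label key -> first resource using it
      for res, flavors := range resourceFlavors {
          for _, fl := range flavors {
              for key := range flavorLabels[fl] {
                  if owner, ok := keyOwner[key]; ok && owner != res {
                      return fmt.Errorf("label key %q is used by resources %q and %q", key, owner, res)
                  }
                  keyOwner[key] = res
              }
          }
      }
      return nil
  }

(Go map iteration order is random, so the error may name the two resources in either order, but the accept/reject decision itself is deterministic.)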

@alculquicondor (Contributor, Author) commented Jul 11, 2022

> Disallow ResourceFlavor updates if it is being used by a CQ

I'm worried this might cause usability problems. It might be hard for an administrator to identify which CQs are using a ResourceFlavor so they can remove the references. And then they would have to wait for running workloads to finish.

But even if that wasn't a concern, I'm not convinced that validating that a label key is not used across resources is a necessary step to fix the issue reported.

> For this one, perhaps the ultimate solution is to have an explicit API to express dependencies.

Probably, but what would that look like? I'm not sure my suggestion above is enough

  memory:
    flavorsFrom: cpu

because we still need to specify the quota for each flavor.

I think what we need to express is that certain resources have the same flavors and they must be verified together. What if this is just enforced through validation? Two resources can either have completely different flavors or they must share all flavors. And we can just look at flavor names for this.

Then, when the scheduler identifies that some resources are grouped due to having the same resource flavors, it iterates flavors->resources to verify if a workload fits, instead of the current resources->flavors.
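
A rough sketch of that reversed iteration for one group of resources that share flavors (types simplified and names illustrative, not Kueue's actual scheduler code):

  package scheduler

  // pickFlavor iterates flavors first and checks every resource in the group
  // against that single flavor, so e.g. cpu and memory always land on the
  // same one. requests holds the workload's request per resource; free holds
  // the remaining quota per flavor and resource.
  func pickFlavor(flavors []string, group []string,
      requests map[string]int64, free map[string]map[string]int64) (string, bool) {
      for _, fl := range flavors { // flavors -> resources
          fits := true
          for _, res := range group { // e.g. {"cpu", "memory"}
              if requests[res] > free[fl][res] {
                  fits = false
                  break
              }
          }
          if fits {
              return fl, true // every resource in the group fits this flavor
          }
      }
      return "", false // no single flavor fits; the workload stays queued
  }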

@ahg-g (Contributor) commented Jul 11, 2022

I don't think there are usability concerns; with an API like Kueue's, where we have multiple CRDs with dependencies, I would lean towards being more strict to avoid placing the system in a nonsensical state.

> Two resources can either have completely different flavors or they must share all flavors. And we can just look at flavor names for this.

I am ok with an implicit API; perhaps the order should also be verified to be the same, just so it is easier to do the flavors->resources iteration.

If you think that the scheduling loop can be reversed, then we can proceed with this solution.

@ahg-g (Contributor) commented Jul 12, 2022

Although checking only the flavor names isn't enough because one could create flavors of different names but use the same label key...

@alculquicondor (Contributor, Author)

> Although checking only the flavor names isn't enough because one could create flavors of different names but use the same label key...

I'm willing to accept that as a user error that we can document. This is in the hands of the administrator, who should be a power user.

> I am ok with an implicit API; perhaps the order should also be verified to be the same, just so it is easier to do the flavors->resources iteration.

> If you think that the scheduling loop can be reversed, then we can proceed with this solution.

I will tinker with the code for a bit.

@ahg-g (Contributor) commented Jul 12, 2022

> I'm willing to accept that as a user error that we can document. This is in the hands of the administrator, who should be a power user.

We do have enough context and information to prevent it, and I don't think there is a use case where we would want to allow it.

@alculquicondor (Contributor, Author)

We do... but it requires a lot of dealing with finalizers, which might be risky. If anything, I would leave it as a follow-up.

@ahg-g (Contributor) commented Jul 12, 2022

Leaving it as a follow-up is fine; it may not require more finalizer work than what we do now if we make flavors immutable (though then the question is whether that is too restrictive). In any case, we can proceed without it for now.

@alculquicondor (Contributor, Author)

/assign
For now
