Update GetTopologyHints() for TopologyManager Hint Providers to return a map #80569

Merged
2 changes: 1 addition & 1 deletion pkg/kubelet/cm/cpumanager/cpu_manager.go
@@ -68,7 +68,7 @@ type Manager interface {

 	// GetTopologyHints implements the Topology Manager Interface and is
 	// consulted to make Topology aware resource alignments
-	GetTopologyHints(pod v1.Pod, container v1.Container) []topologymanager.TopologyHint
+	GetTopologyHints(pod v1.Pod, container v1.Container) map[string][]topologymanager.TopologyHint
 }

 type manager struct {
4 changes: 2 additions & 2 deletions pkg/kubelet/cm/cpumanager/fake_cpu_manager.go
@@ -47,9 +47,9 @@ func (m *fakeManager) RemoveContainer(containerID string) error {
 	return nil
 }

-func (m *fakeManager) GetTopologyHints(pod v1.Pod, container v1.Container) []topologymanager.TopologyHint {
+func (m *fakeManager) GetTopologyHints(pod v1.Pod, container v1.Container) map[string][]topologymanager.TopologyHint {
 	klog.Infof("[fake cpumanager] Get Topology Hints")
-	return []topologymanager.TopologyHint{}
+	return map[string][]topologymanager.TopologyHint{}
 }

 func (m *fakeManager) State() state.Reader {
6 changes: 4 additions & 2 deletions pkg/kubelet/cm/cpumanager/topology_hints.go
@@ -26,7 +26,7 @@ import (
 	"k8s.io/kubernetes/pkg/kubelet/cm/topologymanager/socketmask"
 )

-func (m *manager) GetTopologyHints(pod v1.Pod, container v1.Container) []topologymanager.TopologyHint {
+func (m *manager) GetTopologyHints(pod v1.Pod, container v1.Container) map[string][]topologymanager.TopologyHint {
 	// The 'none' policy does not generate topology hints.
 	if m.policy.Name() == string(PolicyNone) {
 		return nil
@@ -60,7 +60,9 @@ func (m *manager) GetTopologyHints(pod v1.Pod, container v1.Container) []topolog
 	cpuHints := m.generateCPUTopologyHints(available, requested)
 	klog.Infof("[cpumanager] TopologyHints generated for pod '%v', container '%v': %v", pod.Name, container.Name, cpuHints)

-	return cpuHints
+	return map[string][]topologymanager.TopologyHint{
+		string(v1.ResourceCPU): cpuHints,
+	}
 }

 // generateCPUtopologyHints generates a set of TopologyHints given the set of
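The provider-side shape introduced above can be sketched in isolation. This is a simplified, self-contained approximation, not the kubelet code: `SocketMask` is reduced to a plain `uint64` bitmask, the resource key is the literal `"cpu"` rather than `string(v1.ResourceCPU)`, and the two-socket hint-generation rule is a toy stand-in for `generateCPUTopologyHints`.

```go
package main

import "fmt"

// TopologyHint is a simplified stand-in for topologymanager.TopologyHint:
// the socket mask is reduced to a plain uint64 bitmask.
type TopologyHint struct {
	SocketAffinity uint64 // bit i set => socket i participates in the hint
	Preferred      bool
}

// getCPUTopologyHints mimics the new map-returning contract: hints are
// keyed by resource name so the TopologyManager knows which resource
// each hint list belongs to. The generation rule is a toy two-socket
// approximation for illustration only.
func getCPUTopologyHints(requested, freeSocket0, freeSocket1 int) map[string][]TopologyHint {
	var hints []TopologyHint
	if requested <= freeSocket0 {
		hints = append(hints, TopologyHint{SocketAffinity: 0b01, Preferred: true})
	}
	if requested <= freeSocket1 {
		hints = append(hints, TopologyHint{SocketAffinity: 0b10, Preferred: true})
	}
	if requested <= freeSocket0+freeSocket1 {
		// Spanning both sockets satisfies the request but is not preferred.
		hints = append(hints, TopologyHint{SocketAffinity: 0b11, Preferred: false})
	}
	return map[string][]TopologyHint{"cpu": hints}
}

func main() {
	// Satisfiable on either socket: two preferred single-socket hints
	// plus one non-preferred cross-socket hint.
	fmt.Printf("%+v\n", getCPUTopologyHints(4, 4, 6)["cpu"])
	// An unsatisfiable request yields a nil hint slice, the "no hints"
	// case debated in the review thread below.
	fmt.Println(getCPUTopologyHints(11, 4, 6)["cpu"] == nil)
}
```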
5 changes: 2 additions & 3 deletions pkg/kubelet/cm/cpumanager/topology_hints_test.go
@@ -130,11 +130,11 @@ func TestGetTopologyHints(t *testing.T) {
 			name:      "Request 11 CPUs, 4 available on Socket 0, 6 available on Socket 1",
 			pod:       *testPod4,
 			container: *testContainer4,
-			expectedHints: []topologymanager.TopologyHint{},
+			expectedHints: nil,
 		},
 	}
 	for _, tc := range tcases {
-		hints := m.GetTopologyHints(tc.pod, tc.container)
+		hints := m.GetTopologyHints(tc.pod, tc.container)[string(v1.ResourceCPU)]
 		if len(tc.expectedHints) == 0 && len(hints) == 0 {
 			continue
 		}
@@ -147,6 +147,5 @@ func TestGetTopologyHints(t *testing.T) {
 		if !reflect.DeepEqual(tc.expectedHints, hints) {
 			t.Errorf("Expected in result to be %v , got %v", tc.expectedHints, hints)
 		}
-
 	}
 }
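The test change above indexes the returned map with `string(v1.ResourceCPU)`. One Go property makes that pattern safe even on the `'none'`-policy path, which returns a nil map: indexing a nil map yields the zero value. A small sketch with local stand-in types (not the kubelet code):

```go
package main

import "fmt"

// TopologyHint is a local stand-in for topologymanager.TopologyHint.
type TopologyHint struct {
	SocketAffinity uint64
	Preferred      bool
}

// getTopologyHints mirrors the manager method's two result shapes:
// the 'none' policy returns a nil map; otherwise hints come back
// keyed by resource name.
func getTopologyHints(policyNone bool) map[string][]TopologyHint {
	if policyNone {
		return nil
	}
	return map[string][]TopologyHint{
		"cpu": {{SocketAffinity: 0b01, Preferred: true}},
	}
}

func main() {
	// Indexing a nil map returns the zero value (a nil slice), so the
	// test's `GetTopologyHints(...)[string(v1.ResourceCPU)]` pattern
	// cannot panic even when the 'none' policy returns nil.
	fmt.Println(getTopologyHints(true)["cpu"] == nil) // true
	fmt.Println(len(getTopologyHints(false)["cpu"]))  // 1
}
```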
31 changes: 25 additions & 6 deletions pkg/kubelet/cm/topologymanager/topology_manager.go
@@ -54,7 +54,7 @@ type manager struct {

 //HintProvider interface is to be implemented by Hint Providers
 type HintProvider interface {
-	GetTopologyHints(pod v1.Pod, container v1.Container) []TopologyHint
+	GetTopologyHints(pod v1.Pod, container v1.Container) map[string][]TopologyHint
 }

 //Store interface is to allow Hint Providers to retrieve pod affinity
@@ -164,16 +164,35 @@ func (m *manager) calculateAffinity(pod v1.Pod, container v1.Container) Topology
 		// Get the TopologyHints from a provider.
 		hints := provider.GetTopologyHints(pod, container)

-		// If hints is nil, overwrite 'hints' with a preferred any-socket affinity.
+		// If hints is nil, insert a single, preferred any-socket hint into allProviderHints.
Contributor:
Why is this needed? An alternative would be to not include it in allProviderHints at all. For example, if I have a CPU provider and a device plugin provider and only the CPU provider has hints, I would add only those to allProviderHints. Likewise, in the per-resource loop below, a resource that has no hints should not be included in allProviderHints. This is also related to my comment further down that it would be better to keep allProviderHints as a map[string][]TopologyHint, so we keep a map of only the resources that have hints.

Contributor Author:
If we keep it as a map below, then it's probably fine to remove this. As of now, having a {1,1: true} is the same as not having a hint at all (at least for the best-effort policy). I can see how it might cause problems, though, when integrating your Merge() abstraction. At what level in this logic do you pass things down to Merge()?

Contributor Author:
I pictured you passing the entire hints map into merge and having it pop out the single TopologyHint, with the current logic becoming the implementation of the best-effort merge policy.

Contributor:
My intention was as in the POC code [1]: I pass a permutation of map[string]TopologyHint, and from each permutation I create the merged hint. I assumed I would only have resources with hints, but as you mention above:
nil - don't care
{} - care, but cannot align
So I wonder: in what situation, under the best-effort policy, would we have a resource with empty hints?

[1] https://github.com/moshe010/kubernetes/blob/numa_toplogy_new/pkg/kubelet/cm/topologymanager/topology_manager.go#L182-L183

Contributor Author:
Right now this happens in the device manager when the set of available devices is smaller than the set of requested devices for a specific resource type:
1a2b03b#diff-02911c3198a5635aba854ec2d6844cfdR47

This can happen if some devices become unhealthy after the scheduler has admitted the pod, but before the device manager has done its allocation (which happens after the call to GetTopologyHints()).

Thinking about this more, though, this should really return an error and not an empty list. Maybe we need to modify the interface again to allow us to return errors here.

That said, in the future I see {} becoming important once we introduce the GetPreferredAllocations() call discussed in kubernetes/enhancements#1121. Once that is in place, we will need a way of encoding that there are no preferred allocations, and that we would prefer to block admission of the pod until a preferred allocation can be calculated.

Contributor:
Regarding devices becoming unhealthy after the scheduler has admitted the pod: I agree with you, it should be an error and not an empty list. We probably need to change the interface.

Regarding GetPreferredAllocations(): as I understand it, you want to use {} to indicate that there are no preferred allocations. Instead of returning {}, I think we should generate hints for all allocations and mark the good ones as preferred (if there are no preferred allocations, all the hints will have preferred = false). It is then up to the policy to decide whether to admit or not. We could use the best-effort policy to pick a non-preferred one, and the strict policy to fail. We could even add another policy that is strict about the GPU's preferred allocation but best-effort for other resources like CPUs and NICs (like best-effort, but checking that the GPU hint is preferred). This can be done once we pass all the resources' hints as a map and implement the merge abstraction. What do you think?

Contributor Author:
Yes, of course. You are right. That is actually how it is designed, and exactly what I outlined in the proposal / feedback document. Not sure what I was thinking when I wrote this this morning.

So yeah, not sure exactly when we would have a "care but no hints" case. Probably need to rethink this.

Contributor:
Yeah, I agree that we should always return hints, and look separately at the case where resources have become unavailable and we no longer have enough to satisfy the request.

Should the Topology Manager error out and fail the pod? Or should it delegate that to the relevant providers, i.e. when the device manager comes to do the actual allocation, it fails at that stage?

Contributor:
I have created a PR to extend GetTopologyHints() to return an error, see #81687.
 		if hints == nil || len(hints) == 0 {
-			klog.Infof("[topologymanager] Hint Provider has no preference for socket affinity")
+			klog.Infof("[topologymanager] Hint Provider has no preference for socket affinity with any resource")
 			affinity, _ := socketmask.NewSocketMask()
 			affinity.Fill()
-			hints = []TopologyHint{{affinity, true}}
+			allProviderHints = append(allProviderHints, []TopologyHint{{affinity, true}})
+			continue
 		}

-		// Accumulate the sorted hints into a [][]TopologyHint slice
-		allProviderHints = append(allProviderHints, hints)
+		// Otherwise, accumulate the hints for each resource type into allProviderHints.
+		for resource := range hints {
+			if hints[resource] == nil {
+				klog.Infof("[topologymanager] Hint Provider has no preference for socket affinity with resource '%s'", resource)
+				affinity, _ := socketmask.NewSocketMask()
+				affinity.Fill()
+				allProviderHints = append(allProviderHints, []TopologyHint{{affinity, true}})
+				continue
+			}
+
+			if len(hints[resource]) == 0 {
Contributor:
In what case would we end up with no hints at all for a resource? Wouldn't that be blocked at the scheduler level?

Contributor Author:
The semantics we agreed to enforce are:

  1. nil == don't care --> (1,1: true)
  2. {} == care but has no alignment --> (1,1: false)

Contributor Author (@klueska, Jul 28, 2019):
I don't want to hardcode the (1,1: true) or the (1,1: false) inside the hint provider, because different policies might want to encode "don't care" and "care but has no alignment" differently.

The current semantics are admittedly "best-effort" specific, and I'd imagine they would be pushed down into the logic of your Merge() abstraction.

Contributor:
Agreed, this logic should be pushed down into the merge abstraction. I am investigating the best way to do that. For "nil == don't care" we can just remove the resource from allProviderHints (a better name would be allResourcesHints). For "{} == care but has no alignment", I don't understand in what case we would have a resource with empty hints. Can you elaborate on this?

Contributor:
It sounds to me like the hints shouldn't be nil if the plugin cares about topology, as there has to be some hint even if it is not preferred.

A nil hint would indicate to me that there were not enough resources to calculate the hint, and there should have been an error.

+				klog.Infof("[topologymanager] Hint Provider has no possible socket affinities for resource '%s'", resource)
+				affinity, _ := socketmask.NewSocketMask()
+				affinity.Fill()
+				allProviderHints = append(allProviderHints, []TopologyHint{{affinity, false}})
+				continue
+			}
+
+			allProviderHints = append(allProviderHints, hints[resource])
Contributor:
So you are iterating over all the resources just to change them from a map[string][]TopologyHint into a [][]TopologyHint. Why not keep it as a map[string][]TopologyHint? In my merge abstraction I was expecting each permutation to be a map[string]TopologyHint. I was thinking it might be useful in the future to know which resource a hint belongs to. One case for that is GPUDirect: I would then be able to write a policy that aligns a GPU's PCI address with a NIC's PCI address on the same NUMA node as the CPUs.

Contributor Author:
I am fine keeping it as a map if that makes your Merge() abstraction better. I was just trying to change the code as little as possible from its current state.

Contributor:
Is this due to time constraints for the release? I'm not sure when Kubernetes 1.16 is released, and I doubt the merge abstraction will land in this release (there are other patches that should be merged first). If that is the case, we can keep the code as is, but we will need to revisit this later.

Contributor Author (@klueska, Jul 29, 2019):
It's more for ease of review. The smaller the change, the easier it is to review and verify its correctness relative to the existing code.

Contributor:
@moshe010 You could introduce the change in the Topology Manager as part of your PR and explain the reasoning.

Contributor:
Yes, sure, I will do the changes in my PR.

Contributor:
On second thought, for the current policies (best-effort and strict) I don't need the merge signature to be a map; only more advanced policies in the future will. Because we are under time constraints for 1.16, I will make the merge signature a slice and we can revisit it later.

+		}
 	}

 	// Iterate over all permutations of hints in 'allProviderHints'. Merge the
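The accumulation loop above and the permutation merge it feeds can be sketched end to end. This is a best-effort-style approximation under stated assumptions, not the kubelet implementation: a `uint64` bitmask replaces `socketmask.SocketMask`, the "filled" mask covers a two-socket machine, the names `accumulate` and `merge` are illustrative, and the recursive permutation walk stands in for the eventual Merge() abstraction discussed in the thread.

```go
package main

import (
	"fmt"
	"math/bits"
)

// TopologyHint is a simplified stand-in for topologymanager.TopologyHint.
type TopologyHint struct {
	SocketAffinity uint64
	Preferred      bool
}

const allSockets = 0b11 // "filled" mask for a two-socket machine

// accumulate flattens per-provider, per-resource hints into a list of
// hint lists, applying the semantics debated in the review thread:
//   no hints at all  => "don't care"         => one preferred any-socket hint
//   nil per resource => "don't care"         => one preferred any-socket hint
//   []  per resource => "care, can't align"  => one non-preferred any-socket hint
func accumulate(providerHints []map[string][]TopologyHint) [][]TopologyHint {
	var all [][]TopologyHint
	for _, hints := range providerHints {
		if len(hints) == 0 {
			all = append(all, []TopologyHint{{allSockets, true}})
			continue
		}
		for _, h := range hints {
			switch {
			case h == nil:
				all = append(all, []TopologyHint{{allSockets, true}})
			case len(h) == 0:
				all = append(all, []TopologyHint{{allSockets, false}})
			default:
				all = append(all, h)
			}
		}
	}
	return all
}

// merge walks every permutation that picks one hint per list, combining
// masks with bitwise AND and Preferred flags with logical AND, and keeps
// the best surviving result: preferred beats non-preferred, then fewer
// sockets beats more. The bool reports whether any permutation survived.
func merge(all [][]TopologyHint) (TopologyHint, bool) {
	var best TopologyHint
	found := false
	var walk func(i int, acc TopologyHint)
	walk = func(i int, acc TopologyHint) {
		if i == len(all) {
			if acc.SocketAffinity == 0 {
				return // the chosen hints share no socket
			}
			better := !found ||
				(acc.Preferred && !best.Preferred) ||
				(acc.Preferred == best.Preferred &&
					bits.OnesCount64(acc.SocketAffinity) < bits.OnesCount64(best.SocketAffinity))
			if better {
				best, found = acc, true
			}
			return
		}
		for _, h := range all[i] {
			walk(i+1, TopologyHint{acc.SocketAffinity & h.SocketAffinity, acc.Preferred && h.Preferred})
		}
	}
	walk(0, TopologyHint{allSockets, true})
	return best, found
}

func main() {
	all := accumulate([]map[string][]TopologyHint{
		{"cpu": {{0b01, true}, {0b11, false}}},
		{"example.com/device": {{0b01, true}}},
	})
	// Both resources align on socket 0, so the preferred narrow hint wins.
	fmt.Println(merge(all)) // {1 true} true
}
```

Keeping `accumulate` separate from `merge` mirrors the split the reviewers converge on: the nil/{}-encoding policy lives where hints are collected, while the permutation logic becomes the pluggable merge policy.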