add servergroup for workerpools #170
Conversation
@kon-angelo Label area/control-lane does not exist.
Awesome. Thanks. Will review it soon :)
Can we move the PR to draft until the MCM change it depends on is integrated? gardener/machine-controller-manager@0094e63
Force-pushed from 7839765 to d71e2c0
Main changes in the 2nd revision:
Force-pushed from a0edcf2 to 987f12e
docs/usage-as-end-user.md
Outdated
Currently, the only configuration available is the `serverGroupConfig`. If this option is set, then a new server group will be created with the configured policy and all machines managed by this worker pool will be assigned as members of the group.

Please note that the available options for `serverGroupConfig` are defined in the provider-specific section of `CloudProfile`.
When you specify `serverGroupConfig` in your worker configuration, a new server group will be created with the configured policy and all machines managed by this worker will be assigned as members of the created server group.
As an end-user I would be interested in whether this happens in-place, or whether my existing machines will be terminated/replaced by new machines (i.e., a rolling update)?
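For illustration, a worker entry in the shoot spec carrying the provider config discussed above might look roughly like this (a hypothetical sketch: the pool name and policy value are made up, and the exact embedding follows the usual gardener `providerConfig` pattern):

```yaml
spec:
  provider:
    workers:
    - name: worker-pool-1               # hypothetical pool name
      providerConfig:
        apiVersion: openstack.provider.extensions.gardener.cloud/v1alpha1
        kind: WorkerPoolProviderConfig
        serverGroupConfig:
          policy: soft-anti-affinity    # must be a policy allowed in the CloudProfile
```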
docs/usage-as-end-user.md
Outdated
+ The `serverGroupConfig` section is optional, but if it is included in the shoot spec, it must contain a valid policy value.
+ The available `policy` values are defined by your operator in the provider-specific section of `CloudProfile`.
+ Certain policy values may impose further constraints. Using the `affinity` value is only allowed when the workers utilize a single zone.
+ The `serverGroupConfig` and `serverGroupConfig.policy` fields are immutable upon creation. Users are not allowed to change the policy value, nor to add or remove a `serverGroupConfig` section from a worker after it has been created.
Suggested change: "from a worker" → "from a worker pool"
docs/usage-as-end-user.md
Outdated
+ The available `policy` values are defined by your operator in the provider-specific section of `CloudProfile`.
+ Certain policy values may impose further constraints. Using the `affinity` value is only allowed when the workers utilize a single zone.
+ The `serverGroupConfig` and `serverGroupConfig.policy` fields are immutable upon creation. Users are not allowed to change the policy value, nor to add or remove a `serverGroupConfig` section from a worker after it has been created.
Users can, on the other hand, add new workers managed by server groups to an existing shoot and migrate their workloads to the new workers.
OK, now reading this, the first sentence (with my comment about in-place) is clarified. Maybe this can be made clearer from the beginning, i.e., that all of this only applies to newly created worker pools while existing pools are immutable.
docs/usage-as-end-user.md
Outdated
```yaml
apiVersion: openstack.provider.extensions.gardener.cloud/v1alpha1
kind: WorkerPoolProviderConfig
serverGroupConfig:
  policy: soft-anti-affinity
```
Can we call this just `serverGroup`?
docs/usage-as-end-user.md
Outdated
+ The `serverGroupConfig` section is optional, but if it is included in the shoot spec, it must contain a valid policy value.
+ The available `policy` values are defined by your operator in the provider-specific section of `CloudProfile`.
+ Certain policy values may impose further constraints. Using the `affinity` value is only allowed when the workers utilize a single zone.
+ The `serverGroupConfig` and `serverGroupConfig.policy` fields are immutable upon creation. Users are not allowed to change the policy value, nor to add or remove a `serverGroupConfig` section from a worker after it has been created.
That would mean that if a user wants to change a worker pool to use a different server group policy, they would be required to create a new worker pool and tear down the old one?
Technically that is valid, as the server group policies are only evaluated at creation time of a machine, and the placement of the machine is done based on this. So maybe we should mention this?
allErrs = append(allErrs, field.Invalid(parent.Child("serverGroupConfig", "policy"), workerConfig.ServerGroupConfig.Policy, "no matching server group policy in cloudprofile"))
return
Only when we do not use named return values.
Suggested change:
return append(allErrs, field.Invalid(parent.Child("serverGroupConfig", "policy"), workerConfig.ServerGroupConfig.Policy, "no matching server group policy in cloudprofile"))
} | ||
} | ||
} | ||
return allErrs | ||
} | ||
|
||
func validateWorkerConfig(parent *field.Path, worker *core.Worker, workerConfig *api.WorkerConfig, cloudProfileConfig *api.CloudProfileConfig) (allErrs field.ErrorList) {
Just a minor thing: we do not really use named return values much. Would you consider declaring and initialising `allErrs` explicitly and returning it?
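As a side note, the difference between the two styles under discussion can be sketched like this (a minimal standalone sketch with made-up names; `errorList` stands in for `field.ErrorList`):

```go
package main

import "fmt"

// errorList stands in for field.ErrorList from k8s.io/apimachinery.
type errorList []string

// withNamedReturn uses a named return value: a bare `return`
// returns whatever allErrs currently holds.
func withNamedReturn(policy string) (allErrs errorList) {
	if policy == "" {
		allErrs = append(allErrs, "policy field cannot be empty")
		return // bare return is only legal with named return values
	}
	return
}

// withExplicitReturn declares the slice itself and returns it
// explicitly, allowing the shorter `return append(...)` form
// suggested in the review.
func withExplicitReturn(policy string) errorList {
	allErrs := errorList{}
	if policy == "" {
		return append(allErrs, "policy field cannot be empty")
	}
	return allErrs
}

func main() {
	fmt.Println(len(withNamedReturn("")), len(withExplicitReturn("affinity")))
}
```

Both forms behave identically; the choice is purely a matter of codebase convention.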
allErrs = append(allErrs, field.Invalid(parent.Child("serverGroupConfig", "policy"), workerConfig.ServerGroupConfig.Policy, "policy field cannot be empty"))
return
Only when we do not use named return values.
Suggested change:
return append(allErrs, field.Invalid(parent.Child("serverGroupConfig", "policy"), workerConfig.ServerGroupConfig.Policy, "policy field cannot be empty"))
var (
	newWorkerCfg, oldWorkerCfg *api.WorkerConfig
	err                        error
)
Do we need to declare those variables explicitly? I mean, we can also declare and initialise them via `:=` later.
// a) worker is terminating and all server groups have to be deleted
// b) worker pool is deleted
// c) worker pool's server group configuration (e.g. policy) changed
// d) worker pool no longer requires use of server groups
Is case d) actually possible? I thought the server group + policy are immutable and can't be changed. Anyway, the handling is the same.
The controller fully supports cases c) and d), although these are prohibited in the validation for now. To support them, the internal implementation needs to do a rolling node update by generating new MachineClasses. But generally we can disable the immutability checks at any point in the future and the deletion will work.
pkg/controller/worker/machines.go
Outdated
// Include the given worker pool dependencies into the hash.
if serverGroupDependency != nil {
	additionalHashData = append(additionalHashData, serverGroupDependency.Name, serverGroupDependency.ID)
Out of curiosity: wouldn't it be sufficient to just add the server group dependency `ID` to the hash? I mean, this one should be unique.
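The hashing approach under discussion can be sketched as follows (a standalone sketch with hypothetical names and a simplified hash scheme; the real code folds the dependency data into the full machine class hash):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// serverGroupDependency mirrors the dependency discussed above:
// both Name and ID are currently folded into the machine class hash.
type serverGroupDependency struct {
	Name string
	ID   string
}

// workerPoolHash folds base machine class data plus any additional
// dependency data into a single hex digest. A changed server group
// (new Name or ID) therefore yields a new hash and triggers a
// rolling update of the machines in the pool.
func workerPoolHash(base string, dep *serverGroupDependency) string {
	additional := []string{base}
	if dep != nil {
		// The reviewer's point: the ID alone should already be unique,
		// so hashing Name as well is defensive rather than necessary.
		additional = append(additional, dep.Name, dep.ID)
	}
	sum := sha256.Sum256([]byte(strings.Join(additional, "/")))
	return fmt.Sprintf("%x", sum[:5])
}

func main() {
	dep := &serverGroupDependency{Name: "shoot-pool-a1b2c3d4e5", ID: "uuid-1234"}
	fmt.Println(workerPoolHash("pool-spec", nil) != workerPoolHash("pool-spec", dep))
}
```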
// Source: github.com/gardener/gardener-extension-provider-openstack/pkg/openstack/client (interfaces: Factory,Compute)

// Package mocks is a generated GoMock package.
package mocks
I'm wondering if we should move all mock packages to pkg/mock with subdirectories for the client or any other mock. g/g is doing it this way as well: https://github.com/gardener/gardener/tree/master/pkg/mock
What I don't like about this:
a) If we need changes to the provider and need to use a new OpenStack service not mentioned here, we would need a g/g release.
b) I do not consider these interfaces stable enough to move them into the central repo. In comparison, the mcm-provider-openstack uses a slightly different structure. If we consolidate them we could think about it, but it's premature now.
My intention was not to move the mocks of the openstack extension to the g/g repo. I wanted to suggest moving all the mocks of the openstack extension to a central place like pkg/mocks in the openstack extension repo. Then the structure would be similar to the g/g repo. :)
I misunderstood. This is possible, yes. We can open an issue and give it a try.
@@ -12,22 +12,51 @@
// See the License for the specific language governing permissions and
// limitations under the License.

//go:generate mockgen -destination=mocks/client_mocks.go -package=mocks . Factory,Compute
Same question as in pkg/openstack/client/mocks/client_mocks.go
@rfranzke can you maybe check the documentation one more time? I used the term …
After a pair review with @kon-angelo
/lgtm
Force-pushed from f74ba11 to 3c05bb2
How to categorize this PR?
/area control-plane
/kind enhancement
/priority normal
/platform openstack
What this PR does / why we need it:
Allows creating worker pool machines in server groups. Server groups can have a defined affinity policy that will protect the workers from being scheduled on the same hypervisor, thus improving HA setups.
Which issue(s) this PR fixes:
Fixes #163
Special notes for your reviewer:
As part of the reconciliation loop we add the capability to create server groups in the `DeployMachineDependencies` function. Background and discussion topics for some of the design choices follow:

- `soft-anti-affinity` and `anti-affinity` policies.
- `microversion`, which changes the API behavior. The client uses the lowest supported microversion by default, but `soft-*` affinity variants may be introduced in a later microversion, like in ConvergedCloud's case. Our implementation parses and uses the highest supported MV, but because this may not be supported in all OS environments, we opted to allow admins to set the allowed policies in `CloudProfile`.
- `WorkerPoolProviderConfig` is immutable, disabling any update to the server group itself.
- `machineclasses` will be created with a new hash. After the rolling node update, the old server group will eventually be disposed of. The update operation is only disabled by validation at the moment.
- `Name` is used to create a labeling scheme. Server groups are thus named as `clusterName-workerPoolName-<random-10char>`. The maximum allowed name length is 255 chars, so there is plenty of room to increase the random component. But as part of the naming scheme, any server group in the OS project that has `clusterName-workerPoolName` as a prefix is considered to be managed by gardener and subject to deletion.

Merge is pending an update in MCM to include the server group scheduler changes.
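The naming and prefix-ownership scheme described in the notes above can be sketched as follows (a standalone sketch with made-up inputs; the actual random source and name generation in the PR may differ):

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

const randomSuffixLen = 10

// serverGroupName builds a name following the described scheme:
// clusterName-workerPoolName-<random-10char>. OpenStack allows names
// up to 255 chars, leaving room to grow the random component.
func serverGroupName(clusterName, poolName string) string {
	const chars = "abcdefghijklmnopqrstuvwxyz0123456789"
	b := make([]byte, randomSuffixLen)
	for i := range b {
		b[i] = chars[rand.Intn(len(chars))]
	}
	return fmt.Sprintf("%s-%s-%s", clusterName, poolName, string(b))
}

// isManagedByGardener implements the prefix rule: any server group in
// the OpenStack project whose name starts with "clusterName-poolName-"
// is considered managed by gardener and subject to deletion.
func isManagedByGardener(name, clusterName, poolName string) bool {
	return strings.HasPrefix(name, clusterName+"-"+poolName+"-")
}

func main() {
	name := serverGroupName("shoot--foo", "worker-a")
	fmt.Println(isManagedByGardener(name, "shoot--foo", "worker-a"))
}
```

Note the prefix rule means pool names must not themselves be prefixes of one another, or one pool's cleanup could claim another pool's groups; the random suffix does not protect against that.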
Release note: