VirtualMachineReplicaSet to scale VMs #453

Merged
merged 36 commits into kubevirt:master from rmohr:replicaset on Sep 27, 2017

Conversation

Member

rmohr commented Sep 19, 2017

Initial simple implementation of a VirtualMachineReplicaSet.

Usage:

```yaml
apiVersion: kubevirt.io/v1alpha1
kind: VirtualMachineReplicaSet
metadata:
  name: myrs
spec:
  replicas: 3
  selector:
    matchLabels:
      myselector: myselector
  template:
    metadata:
      name: test
      labels:
        myselector: myselector
    spec:
      domain:
        devices:
          consoles:
          - type: pty
        memory:
          unit: MB
          value: 64
        os:
          type:
            os: hvm
        type: qemu
```

TODO:

  • proposal
  • logging
  • events
  • graceful delete
@rmohr rmohr requested review from davidvossel and stu-gott Sep 19, 2017
@rmohr rmohr force-pushed the rmohr:replicaset branch 3 times, most recently from 68e4af9 to 11a4e87 Sep 19, 2017
Member

stu-gott left a comment

Woot! I can't wait to see this land!

```yaml
apiVersion: kubevirt.io/v1alpha1
kind: VirtualMachineReplicaSet
metadata:
  name: replicasetn9z3w
```


stu-gott Sep 19, 2017

Member

I'm confused, is this example the template or the manifest? If the former, this should be a generateName, right?


rmohr Sep 19, 2017

Author Member

It is the name of the VirtualMachineReplicaSet. I can give it a nicer name.

```go
		}()
	}
}
wg.Wait()
```


stu-gott Sep 19, 2017

Member

So this blocks until the changes to VM count have been scheduled?


rmohr Sep 19, 2017

Author Member

It could (I think the k8s ReplicaSet does). Here I just issue concurrent requests against the apiserver, so that request delays don't add up and we can release the controller worker faster. I kind of borrowed that from the k8s ReplicaSet controller.
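A minimal sketch of that fan-out pattern, assuming a generic `createVM` closure in place of the real client call used in this PR:

```go
package replicaset

import (
	"log"
	"sync"
)

// spawnReplicas issues `diff` create requests concurrently so that apiserver
// round-trip latencies do not add up; it only blocks until every request has
// returned, not until the VMs are actually running.
func spawnReplicas(diff int, createVM func() error) {
	var wg sync.WaitGroup
	wg.Add(diff)
	for i := 0; i < diff; i++ {
		go func() {
			defer wg.Done()
			if err := createVM(); err != nil {
				log.Printf("failed to create vm: %v", err)
			}
		}()
	}
	wg.Wait()
}
```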

```go
	Do().
	Into(replicasetList)
for _, replicaset := range replicasetList.Items {
	replicaset.SetGroupVersionKind(v1.VMReplicaSetGroupVersionKind)
```


stu-gott Sep 19, 2017

Member

gross :)

I assume you wouldn't have done this if there was a better way. Are we somehow requesting generic replicasets so they're not properly typed?


rmohr Sep 19, 2017

Author Member

That is actually a strange thing: the k8s clients don't fill that in. I think the proper solution would be to fill it in when we update or create a resource. I have to look into that again.


```go
func (v *rc) Get(name string, options k8smetav1.GetOptions) (replicaset *v1.VirtualMachineReplicaSet, err error) {
	replicaset = &v1.VirtualMachineReplicaSet{}
	err = v.restClient.Get().
```


stu-gott Sep 19, 2017

Member

Implicit assignment needs `:=`.

This occurs a couple of times here.


rmohr Sep 19, 2017

Author Member

Oh, the function has named return values, so there is no need to use `:=`. The only reason the code failed is the client-go update.
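As a toy illustration (not code from this PR) of why named return values make `:=` unnecessary:

```go
package example

import "strconv"

// parsePort shows named return values: result and err are declared by the
// signature itself, so the body assigns with "=" and a bare return hands back
// whatever they currently hold.
func parsePort(s string) (result int, err error) {
	result, err = strconv.Atoi(s)
	return
}
```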

```go
	Do().
	Into(replicaset)
replicaset.SetGroupVersionKind(v1.VMReplicaSetGroupVersionKind)
return
```


stu-gott Sep 19, 2017

Member

I'm assuming these returns that don't match the function signature don't compile.


rmohr Sep 19, 2017

Author Member

I had to adjust some functions to work with latest master, now that we updated client-go. Done.

@rmohr rmohr force-pushed the rmohr:replicaset branch from 11a4e87 to a860f3c Sep 19, 2017
Member

fabiand left a comment

Nice approach.
I'm just wondering how this implementation compares to a solution which could be achieved if #276 was implemented.

I.e. if VMs could be "attached" to pods. In such a case I was just wondering if we could get around implementing a custom workload type.

@rmohr did you consider such an approach?

[...]
```

`spec.template` is similar to a `VirtualMachineSpec`. `spec.replicas` specifies


fabiand Sep 19, 2017

Member

How similar is it? Worth specifying what it can/cannot contain?


rmohr Sep 19, 2017

Author Member

Actually it is not just similar, it is the same. Will fix the docs.


rmohr Sep 19, 2017

Author Member

done


```yaml
status:
conditions:


fabiand Sep 19, 2017

Member

Conditions are great!

Member Author

rmohr commented Sep 19, 2017

@fabiand yes, I considered reusing it. It is not yet fully off the table, so this can also be seen as a POC for the completely layered approach. Regarding not having our own cluster-level object: I think we need that no matter whether we reuse ReplicaSets or not.

```go
for i := 0; i < diff; i++ {
	go func(idx int) {
		defer wg.Done()
		deleteCandidate := vms[idx]
```


davidvossel Sep 19, 2017

Member

In the future we may want to introduce delete strategies here, things like "delete oldest first"


rmohr Sep 20, 2017

Author Member

Yep, that, or we can come up with a "best" sorting for that use case. The k8s ReplicaSet does that on the pod level. I did not think about it in detail, though.
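For illustration, a hedged sketch of what a "delete oldest first" ordering could look like; `vmMeta` and `oldestFirst` are stand-ins, not code from this PR:

```go
package replicaset

import (
	"sort"
	"time"
)

// vmMeta is a stand-in for the VM object's metadata; only the creation
// timestamp matters for this ordering.
type vmMeta struct {
	Name    string
	Created time.Time
}

// oldestFirst returns the `diff` oldest VMs as delete candidates, one possible
// well-defined scale-down order.
func oldestFirst(vms []vmMeta, diff int) []vmMeta {
	sorted := append([]vmMeta(nil), vms...) // don't mutate the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Created.Before(sorted[j].Created)
	})
	if diff > len(sorted) {
		diff = len(sorted)
	}
	return sorted[:diff]
}
```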


```go
// vmChangeFunc checks if the supplied VM matches a replica set controller in its namespace
// and wakes the first controller which matches the VM labels.
func (c *VMReplicaSet) vmChangeFunc(obj interface{}) {
```


davidvossel Sep 19, 2017

Member

One common pattern people use when replicating resources, whether it be containers or VMs, is the concept of isolating a resource from the group controller.

If someone manually changes the labels on a VM to isolate it from the replication controller, and then decides to terminate the resource, I don't believe the replication controller would ever get triggered to resize.

Basically, if someone changes a label on a VM and then deletes the VM, the VM replica set would be permanently at n-1 VMs until something else triggered the replica set to get processed.


rmohr Sep 20, 2017

Author Member

The ReplicaSet in k8s reacts to relabeling of Pods. If the labels do not match anymore, it either gets adopted by another matching ReplicaSet, or it stays in an orphaned state. The original ReplicaSet is woken up and will immediately create a new Pod.


rmohr Sep 20, 2017

Author Member

Actually, it looks like I was wrong.

That comment indicates that it would wake up both controllers. It might be an artifact of the pre-controller-ref era. However, the actual code in that function only reacts to label changes if the controller reference changes, or if the controller behind the reference was deleted. To detach a resource from the controller for sure (and also prevent re-adoption by another controller), it looks like one has to change the labels and delete the controller ref. If only the labels are changed, it will still stick with the creating controller. If the controller reference is removed but the labels stay the same, it will be adopted again.

I will take that into account when adding the controller-ref. 👍


rmohr Sep 20, 2017

Author Member

Ok, my original comment was right. The flow with controller-refs is as follows:

  1. If the labels changed, inform the owner only.
  2. The owner will try to release the object (remove the owner ref).
  3. This causes another update on the resource. Now there is no owner, so all remaining matching controllers are informed.
  4. They race to adopt that resource.

So changing a label on a Pod in the case of ReplicaSets leads to the object being released and reclaimed by another controller if one matches. Further, the original ReplicaSet creates a new Pod as a replacement for the old one.
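Rough pseudo-Go of that dispatch rule with stand-in types; the real logic lives in the upstream ReplicaSet controller, not in this PR:

```go
package replicaset

// ownerRef points at the controller that currently owns an object.
type ownerRef struct{ controller string }

// object carries only the pieces relevant to the wake-up decision.
type object struct {
	labels map[string]string
	owner  *ownerRef // nil once the owner has released the object
}

// controllersToWake decides which replica sets get enqueued for an updated
// object: only the owner while an owner ref exists (it may then release the
// object), otherwise every controller whose selector still matches the labels,
// which then race to adopt it.
func controllersToWake(obj object, matching func(labels map[string]string) []string) []string {
	if obj.owner != nil {
		return []string{obj.owner.controller}
	}
	return matching(obj.labels)
}
```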

Member

davidvossel commented Sep 19, 2017

@rmohr I agree that having a VM-specific cluster object to represent the replica set makes sense. I also like the idea of calling it a VirtualMachineReplicaSet. That naming convention makes it pretty obvious what the object does within the context of k8s.

I'm still contemplating the approach of implementing our own replication logic instead of leveraging the k8s controllers to do this for us.

For the sake of argument, how would it work if we re-used the k8s replica set behind the scenes? Below are my thoughts on how I'd approach it:

  • Creating a VirtualMachineReplicaSet (VMRS) would generate an internal k8s replica set
  • The k8s replica set would start the VM pods.
  • The VMRS controller would watch for pods related to itself and dynamically introduce VMs for each pod using the spec in the VMRS. These VMs should have a flag on them that prevents the VM controller from ever generating a POD for them.
  • The VM controller would then see that a VM got created and that the corresponding POD already exists, and then proceed with assigning the VM to the node.
  • When a POD related to a VMRS is completed/terminated/failed/whatever, the VMRS controller handles cleaning up the corresponding VM definition associated with it.

Basically, with this design the VMRS controller would just do these things.

  • create/update/delete replica sets for corresponding VMRS objects.
  • dynamically generate VM objects as VMRS-related pods appear
  • remove VM objects as VMRS-related pods disappear.

Thoughts on the pros/cons of something like what I've outlined above vs. rolling our own replication logic?

Member Author

rmohr commented Sep 20, 2017

retest this please

1 similar comment
Member Author

rmohr commented Sep 20, 2017

retest this please

@rmohr rmohr force-pushed the rmohr:replicaset branch from fe1fcb3 to 51299b2 Sep 20, 2017
@rmohr rmohr changed the title from "[WIP] VirtualMachineReplicaSet to scale VMs" to "VirtualMachineReplicaSet to scale VMs" Sep 20, 2017
@rmohr rmohr force-pushed the rmohr:replicaset branch from aaf86d7 to 3919239 Sep 20, 2017
Member Author

rmohr commented Sep 20, 2017

retest this please

2 similar comments
Member Author

rmohr commented Sep 20, 2017

retest this please

Member Author

rmohr commented Sep 20, 2017

retest this please

Member

davidvossel commented Sep 21, 2017

I've seen the cloud-init tests time out occasionally. I've increased the timeouts to match those of the console functional tests over here in PR #444.

rmohr added 10 commits Sep 18, 2017
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Updating resources with KubeVirtClient failed because of the missing
resource name.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Make sure the VirtualMachineReplicaSetList implements the List interface
from k8s, to allow listing inside the k8s code.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
VirtualMachineReplicaSet will calculate the number of missing VMs and
will invoke parallel create and delete calls to the apiserver.
Start with a replica count of zero, scale a few times up and down and
finally return to zero replicas
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
rmohr added 4 commits Sep 25, 2017
Right now the event update handlers for the VirtualMachineReplicaSet consist
almost completely of identical code. Add a helper and reuse it in the
handlers.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
In order to avoid difficulties when running the functional tests, use VMs
with ephemeral, independent disks for the created VM sets.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Once a VM is in running or migrating state, count it as ready, and
update the VirtualMachineReplicaSet readyReplicas status field.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
@rmohr rmohr force-pushed the rmohr:replicaset branch from 77d6665 to fee8cb1 Sep 25, 2017
Member Author

rmohr commented Sep 25, 2017

Everything except copying the unit tests and updating the proposal itself should be done now. I will update the proposal asap, including a justification for not just wrapping around the replication controller.


```go
// Check if only 10 are created
controller.Execute()
Expect(invocations).To(Equal(10))
```


davidvossel Sep 25, 2017

Member

Are the mock callbacks somehow executed asynchronously? This regression test failed during one of its runs, but passes at other times.


rmohr Sep 26, 2017

Author Member

Seems so. I am now just specifying the expected number of invocations. That works.
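Something along these lines, assuming a gomock-style mock client; the mock and method names are placeholders rather than the PR's actual identifiers:

```go
// Expect exactly ten Create calls instead of counting them in a callback;
// the mock controller's Finish() in the test teardown fails the test if the
// count does not match.
vmInterface.EXPECT().Create(gomock.Any()).Times(10).Return(nil, nil)

controller.Execute()
```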

* Support controller references [2]
* Support label changes
* Support adopting orphaned Pods [2]
* Define a well known scale-down order


davidvossel Sep 25, 2017

Member

Should 'Live Migration' of VMs in a replica set be a milestone?


rmohr Sep 26, 2017

Author Member

Just that it is mentioned that it should support migrations? Like the controller and migrations are implemented right now, live migration works with the controller.

Member

davidvossel commented Sep 25, 2017

Live migrations are an area where the VM replica set use case diverges from the container use case.

I don't think we have to have live migration completely solved within the context of replica sets before this work can be merged, but it would be great if we had a general idea of how we plan to approach that problem. @rmohr Maybe you could put your thoughts on this topic in the proposal?

rmohr added 3 commits Sep 25, 2017
Don't set the replica count after successful CRUD operations. Instead
wait for the cache to provide the right values, to avoid value
inconsistencies.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Add a mock workqueue, which allows waiting for an expected number of
enqueues. This allows synchronous testing of the controller. The typical
pattern is:

  mockQueue.ExpectAdd(3)
  vmSource.Add(vm)
  vmSource.Add(vm1)
  vmSource.Add(vm2)
  mockQueue.Wait()

This ensures that the informer callbacks listening on vmSource enqueued an
object three times. Since enqueuing is typically the last action in the
listener callbacks, we can assume that the wanted scenario for a controller
is set up, and an execution will process this scenario.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
The VirtualMachineReplicaSet tests contain a mock controller which did not
check whether all invocations looked right. Add the final check in TearDown
and adjust the tests.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
@rmohr rmohr force-pushed the rmohr:replicaset branch from fee8cb1 to a035846 Sep 26, 2017
rmohr added 3 commits Sep 26, 2017
Clarify that the VirtualMachineReplicaSet does not guarantee that there will
only ever be the specified number of replicas in the cluster.

Add description of the readyReplicas status field.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Update the proposal with justifications for why a reimplementation was
chosen over wrapping around the Kubernetes ReplicaSet.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Instead of counting the number of invocations of a mock method in a
callback, just specify the expected number of invocations. This removes
a race condition error in these tests.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Member Author

rmohr commented Sep 26, 2017

@smarterclayton added some justifications for why this implementation was chosen. Looking forward to hearing your thoughts.

Member Author

rmohr commented Sep 26, 2017

@davidvossel added to the proposal the explanation of how I plan for us to support live migrations for all types of controllers. If we follow the currently laid out architecture (independent of the new discussion on the mailing list https://groups.google.com/forum/#!topic/kubevirt-dev/4XBlPPEcIq4), the live migration story is in my opinion solid and solved, or am I missing something?

rmohr added 2 commits Sep 26, 2017
Remove testing against golang tip and make sure we test against the
latest 1.8 and 1.9 releases.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
@@ -98,6 +98,47 @@ unknown VirtualMachine states and graceful deletes, it might decide to already
create new replicas in advance, to make sure that the amount of ready replicas
stays close to the expected replica count.

### Chosen Implementation


smarterclayton Sep 26, 2017

Naming wise, this does not behave like a ReplicaSet. I would recommend that you spend some time looking for an alternate naming, because you provide guarantees that far exceed a Kubernetes ReplicaSet. It will be likely confusing for users and admins: "A VMRS is like a ReplicaSet for pods except it's kind of like a StatefulSet except when it isn't".


### Guarantees

The VirtualMachineReplicaSet does **not** guarantee that there will never be


rmohr Sep 26, 2017

Author Member

> Naming wise, this does not behave like a ReplicaSet. I would recommend that you spend some time looking for an alternate naming, because you provide guarantees that far exceed a Kubernetes ReplicaSet. It will be likely confusing for users and admins: "A VMRS is like a ReplicaSet for pods except it's kind of like a StatefulSet except when it isn't".

@smarterclayton Isn't the guarantees section now pretty close to the guarantees a ReplicaSet gives? Maybe I just expressed it the wrong way. I wanted to weaken the guarantee to that of a ReplicaSet.


davidvossel Sep 26, 2017

Member

The VMRS implementation behaves mostly like an RS now, but I see the potential for that to change. It seems possible we'll encounter use cases that require us to strengthen these guarantees in the future. I'm doubtful we'll have to account for the same kind of strict ordering/persistence that StatefulSets aim to provide, but it's possible we'll have to diverge from the loose guarantees the k8s RS provides.

That said, at what point would our "enhanced" VMRS feature set confuse people that are already familiar with the k8s RS? For me, as long as we plan to keep the VMRS focused on ephemeral VMs, I don't have a problem with drawing the comparison between VMRS and the k8s RS. The feature set between the two may diverge some, but conceptually they're both working with replicating ephemeral instances of immutable objects.

Also copy over the unit tests of the copied expectations manager.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Member Author

rmohr commented Sep 26, 2017

@davidvossel added the tests for the expectations. Apart from the discussion about whether we should keep the name for the controller, I think everything which should be here right now is there.

Make sure that the VirtualMachineTemplate labels are matched by the
VirtualMachineReplicaSet. If they don't match, log it and ignore the
ReplicaSet.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Member

davidvossel commented Sep 26, 2017

> @davidvossel added the tests for the expectations. Apart from the discussion about whether we should keep the name for the controller, I think everything which should be here right now is there.

Really great work Roman! This is solid.

Code and testing wise, everything looks in order to me at this point. I'm going to wait until we conclude the controller naming discussion before marking "approved"

Member

davidvossel left a comment

@rmohr you rock man!

As for the naming discussion. I caught up with Roman a bit out of band. We both agree that VMRS may diverge slightly from k8s RS, but the key here is VMRS will always (for the foreseeable future) work with ephemeral VMs.

So, as long as VMRSs are managing ephemeral VMs, I feel comfortable drawing the comparison to the k8s RS.

@davidvossel davidvossel merged commit 4b43f33 into kubevirt:master Sep 27, 2017
3 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls Coverage increased (+2.6%) to 55.211%
kubevirt-functional-tests/jenkins/pr All is well