
Conversation

Contributor

@Zyqsempai Zyqsempai commented Dec 5, 2019

Signed-off-by: bpopovschi zyqsempai@mail.ru

Issue #343

Description of changes:
Added the possibility to increase the number of VMs through Buildkite for TestMultipleVMs_Isolated.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Zyqsempai Zyqsempai changed the title Added possibility to change number of vm's for MultipleVMSTEST Added possibility to change number of vm's for MultipleVmsTest Dec 5, 2019
@Zyqsempai
Contributor Author

@sipsma PTAL

Contributor

@xibz xibz left a comment


Hey @Zyqsempai, thank you for taking the time to take on this issue. I had a few comments, but overall the change looks good. There's one that I think requires a little more discussion, so feel free to chime in with your thoughts on the question of how many VMs we are actually testing in parallel.

caseTypeNumber := 1 % 2
c := cases[caseTypeNumber]
vmWg.Add(1)
go func(vmID int, containerCount int32, jailerConfig *proto.JailerConfig) {

@xibz xibz Dec 5, 2019


I am also curious how many we are actually running in parallel, given the StopVM call. We may need to call StopVM at a later time and have the VMs run some process to ensure we are getting an accurate measure of concurrency. And since 100 is a pretty large number, I'm worried that a lot of the VMs are exiting before we can test the true bottleneck of what is parallel. My guess, based on the timing, is that we are only running 15-20 or so VMs in parallel. Thoughts on this, @sipsma?

Contributor


I don't think this is meant to be a true scale test, so we don't really need to max out the host capacity. AFAICT, 100 was chosen pretty much arbitrarily, and it is an upper bound on the number of VMs that will run concurrently, not necessarily a goal. Even if we're only running ~20 VMs at any one time, that's still more than we're running today. If we assume that more VMs means more likelihood of catching concurrency-related bugs, then a 3-4x increase in the number of running VMs sounds good to me.

At some point in the future we will need to be performing actual scale tests, but this PR doesn't need to be that.

Contributor


I'm worried that a lot of the VMs are exiting before we can test the true bottleneck of what is parallel

This very well may be the case, but I agree with Noah that it's fine if so. We are primarily looking to find problems that occur as we actually perform actions on/in the VM (spin up/down containers, copy io, etc.). So even if we modified the test case to wait to call StopVM at the end, we would likely just end up with a bunch of VMs sitting around doing nothing for a while, which doesn't really improve coverage IMO. I agree though that if we want scale tests in the future we'd need to be more careful about issues like this.

Contributor


Ah, I was under the impression we wanted to actually test load. In that case, this looks fine.

@Zyqsempai
Contributor Author

@sipsma @xibz @nmeyerhans I made the number of VMs equal to 25; I think this is more than enough for this test.

@sipsma
Contributor

sipsma commented Dec 5, 2019

@sipsma @xibz @nmeyerhans I made the number of VMs equal to 25; I think this is more than enough for this test.

@Zyqsempai there may have been a misunderstanding as to the suggestion around the number of VMs; I think it's good to leave NUMBER_OF_VMS as 100. My understanding of the discussion was that even when we set it to 100, there may in practice only be 25 running at a single time because VMs don't necessarily start/stop in perfect synchrony together.

I think leaving it at 100 is good as the test still only took about a minute to run in the CI system, which is reasonable while still increasing our potential for catching race conditions.
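The point about the launched count being only an upper bound on concurrency can be made concrete with a small, illustrative sketch (not from the PR; `runAndMeasurePeak` is a hypothetical helper): wrap each worker with an atomic in-flight counter and record the high-water mark. If workers exit at different times, the peak can be far below the number launched.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runAndMeasurePeak launches n workers and returns the peak number that
// were running at the same moment, tracked with an atomic counter.
func runAndMeasurePeak(n int, work func(id int)) int64 {
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			cur := atomic.AddInt64(&inFlight, 1)
			// Raise the high-water mark if this worker pushed past it.
			for {
				p := atomic.LoadInt64(&peak)
				if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
					break
				}
			}
			work(id)
			atomic.AddInt64(&inFlight, -1)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// With 100 short-lived workers, the observed peak is typically
	// well under 100 on a loaded machine.
	peak := runAndMeasurePeak(100, func(id int) { time.Sleep(10 * time.Millisecond) })
	fmt.Println("peak concurrency:", peak)
}
```

The peak depends on scheduling, so a real scale test would assert on this measurement rather than on the launch count.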

@Zyqsempai
Contributor Author

@sipsma OK, got it. I tried to find a compromise ;) I'll turn it back.

Contributor

@xibz xibz left a comment


LGTM! Thanks again @Zyqsempai. Really appreciate your efforts on this :)

Contributor

@sipsma sipsma left a comment


Please squash the commits into one, then LGTM!

Signed-off-by: bpopovschi <zyqsempai@mail.ru>
@Zyqsempai Zyqsempai force-pushed the 343-increase-number-of-vms branch from 7e63a27 to c218c45 Compare December 6, 2019 00:00
@Zyqsempai
Contributor Author

@sipsma Done.

Contributor

@sipsma sipsma left a comment


LGTM. There was an ephemeral failure of TestMultipleVMs on your latest push, with some errors I haven't seen before. It passed on re-run, so I think the updates here may already be working as expected and uncovering new bugs :-) #358

nmeyerhans pushed a commit that referenced this pull request Dec 6, 2019
#356

Signed-off-by: Noah Meyerhans <nmeyerha@amazon.com>
@nmeyerhans nmeyerhans merged commit c218c45 into firecracker-microvm:master Dec 6, 2019
fangn2 pushed a commit to fangn2/firecracker-containerd that referenced this pull request Mar 23, 2023
…ependabot/go_modules/github.com/containernetworking/cni-0.8.1

Bump github.com/containernetworking/cni from 0.8.0 to 0.8.1
