
Conversation

Contributor

@Zyqsempai Zyqsempai commented Dec 5, 2019

Signed-off-by: bpopovschi zyqsempai@mail.ru

Issue #343

Description of changes:
Added the possibility to increase the number of VMs through Buildkite for TestMultipleVMs_Isolated.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Zyqsempai Zyqsempai changed the title Added possibility to change number of vm's for MultipleVMSTEST Added possibility to change number of vm's for MultipleVmsTest Dec 5, 2019
@Zyqsempai
Contributor Author

@sipsma PTAL

Contributor

@xibz xibz left a comment


Hey @Zyqsempai, thank you for taking the time to take on this issue. I had a few comments, but overall the change looks good. There's one that I think requires a little more discussion, so feel free to chime in with your thoughts on the question of how many VMs we are actually testing in parallel.

caseTypeNumber := 1 % 2
c := cases[caseTypeNumber]
vmWg.Add(1)
go func(vmID int, containerCount int32, jailerConfig *proto.JailerConfig) {

@xibz xibz Dec 5, 2019


I am also curious how many we are actually running in parallel, given the StopVM call. We may need to call StopVM at a later time and have the VMs run some process to ensure we are getting an accurate measure of concurrency. And since 100 is a pretty large number, I'm worried that a lot of the VMs are exiting before we can test the true bottleneck of what is parallel. My guess, based on the timing, is that we are only running 15-20 or so VMs in parallel. Thoughts on this, @sipsma?

Contributor


I don't think this is meant to be a true scale test, so we don't really need to max out the host capacity. AFAICT, 100 was chosen pretty much arbitrarily, and it is an upper bound on the number of VMs that will run concurrently, not necessarily a goal. Even if we're only running ~20 VMs at any one time, that's still more than we're running today. If we assume that more VMs means more likelihood of catching concurrency-related bugs, then a 3-4x increase in the number of running VMs sounds good to me.

At some point in the future we will need to be performing actual scale tests, but this PR doesn't need to be that.

Contributor


I'm worried that a lot of the VMs are exiting before we can test the true bottleneck of what is parallel

This very well may be the case, but I agree with Noah that it's fine if so. We are primarily looking to find problems that occur as we actually perform actions on/in the VM (spin up/down containers, copy io, etc.). So even if we modified the test case to wait to call StopVM at the end, we would likely just end up with a bunch of VMs sitting around doing nothing for a while, which doesn't really improve coverage IMO. I agree though that if we want scale tests in the future we'd need to be more careful about issues like this.

Contributor


Ah, I was under the impression we wanted to actually test load. In that case, this looks fine.

@Zyqsempai
Contributor Author

@sipsma @xibz @nmeyerhans I made the number of VMs equal to 25; I think this is more than enough for this test.

@sipsma
Contributor

sipsma commented Dec 5, 2019

@sipsma @xibz @nmeyerhans I made the number of VMs equal to 25; I think this is more than enough for this test.

@Zyqsempai there may have been a misunderstanding as to the suggestion around the number of VMs; I think it's good to leave NUMBER_OF_VMS as 100. My understanding of the discussion was that even when we set it to 100, there may in practice only be 25 running at a single time because VMs don't necessarily start/stop in perfect synchrony together.

I think leaving it at 100 is good as the test still only took about a minute to run in the CI system, which is reasonable while still increasing our potential for catching race conditions.
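The point about the launched count being only an upper bound on concurrency can be made concrete with a small, illustrative sketch (not from the PR; `runAndMeasurePeak` is a hypothetical helper): wrap each worker with an atomic in-flight counter and record the high-water mark. If workers exit at different times, the peak can be far below the number launched.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runAndMeasurePeak launches n workers and returns the peak number that
// were running at the same moment, tracked with an atomic counter.
func runAndMeasurePeak(n int, work func(id int)) int64 {
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			cur := atomic.AddInt64(&inFlight, 1)
			// Raise the high-water mark if this worker pushed past it.
			for {
				p := atomic.LoadInt64(&peak)
				if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
					break
				}
			}
			work(id)
			atomic.AddInt64(&inFlight, -1)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// With 100 short-lived workers, the observed peak is typically
	// well under 100 on a loaded machine.
	peak := runAndMeasurePeak(100, func(id int) { time.Sleep(10 * time.Millisecond) })
	fmt.Println("peak concurrency:", peak)
}
```

The peak depends on scheduling, so a real scale test would assert on this measurement rather than on the launch count.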

@Zyqsempai
Contributor Author

@sipsma OK, got it. I tried to find a compromise ;) I'll turn it back.

Contributor

@xibz xibz left a comment


LGTM! Thanks again @Zyqsempai. Really appreciate your efforts on this :)

Contributor

@sipsma sipsma left a comment


Please squash the commits into one, then LGTM!

Signed-off-by: bpopovschi <zyqsempai@mail.ru>
@Zyqsempai Zyqsempai force-pushed the 343-increase-number-of-vms branch from 7e63a27 to c218c45 Compare December 6, 2019 00:00
@Zyqsempai
Contributor Author

@sipsma Done.

Contributor

@sipsma sipsma left a comment


LGTM. There was an ephemeral failure of TestMultipleVMs on your latest push, with some errors I haven't seen before. It passed on re-run, so I think the updates here may already be working as expected and uncovering new bugs :-) #358

nmeyerhans pushed a commit that referenced this pull request Dec 6, 2019
#356

Signed-off-by: Noah Meyerhans <nmeyerha@amazon.com>
@nmeyerhans nmeyerhans merged commit c218c45 into firecracker-microvm:master Dec 6, 2019
fangn2 pushed a commit to fangn2/firecracker-containerd that referenced this pull request Mar 23, 2023
…ependabot/go_modules/github.com/containernetworking/cni-0.8.1

Bump github.com/containernetworking/cni from 0.8.0 to 0.8.1
