Added possibility to change number of VMs for MultipleVmsTest #356
Conversation
@sipsma PTAL
Hey @Zyqsempai, thank you for taking the time to take on this issue. I had a few comments, but overall the change looks good. There's one that I think requires a little more discussion, so feel free to chime in with your thoughts on the question of how many VMs we are actually testing in parallel.
caseTypeNumber := 1 % 2
c := cases[caseTypeNumber]
vmWg.Add(1)
go func(vmID int, containerCount int32, jailerConfig *proto.JailerConfig) {
I am also curious how many we are actually running in parallel, due to the StopVM call. We may need to call StopVM at a later time and have the VMs run some process to ensure we are getting an accurate measure of concurrency. And since 100 is a pretty large number, I'm worried that a lot of the VMs are exiting before we can test the true bottleneck of what is parallel. My guess based on the time is that we are only running 15-20 or so VMs in parallel. Thoughts on this, @sipsma?
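For illustration only, here is a minimal sketch (not the PR's actual test code) of the pattern being suggested above: each goroutine signals once its VM is up, then waits on a shared barrier before tearing the VM down, so every VM is alive at the same moment. The stopVM helper is a hypothetical stand-in for the real fccontrol StopVM call.

```go
package main

import "sync"

// stopVM is a hypothetical stand-in for the real fccontrol StopVM call.
func stopVM(vmID int) {}

// runVMs boots numberOfVMs VMs concurrently and only tears each one down
// after every VM has reported that it is running.
func runVMs(numberOfVMs int) {
	var vmWg sync.WaitGroup           // tracks goroutine completion
	var booted sync.WaitGroup         // tracks how many VMs are up
	allRunning := make(chan struct{}) // closed once every VM is up

	booted.Add(numberOfVMs)
	for i := 0; i < numberOfVMs; i++ {
		vmWg.Add(1)
		go func(vmID int) {
			defer vmWg.Done()
			// ... create the VM and exercise its containers here ...
			booted.Done() // this VM is now running
			<-allRunning  // hold it until all the others are running too
			stopVM(vmID)  // only now shut it down
		}(i)
	}

	booted.Wait()     // wait for every VM to come up
	close(allRunning) // release all goroutines to call StopVM
	vmWg.Wait()
}

func main() {
	runVMs(100)
}
```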
I don't think this is meant to be a true scale test, so we don't really need to max out the host capacity. AFAICT, 100 was chosen pretty much arbitrarily, and it is an upper bound on the number of VMs that will run concurrently, not necessarily a goal. Even if we're only running ~20 VMs at any one time, that's still more than we're running today. If we assume that more VMs means more likelihood of catching concurrency-related bugs, then a 3-4x increase in the number of running VMs sounds good to me.
At some point in the future we will need to perform actual scale tests, but this PR doesn't need to be that.
> I'm worried that a lot of the VMs are exiting before we can test the true bottleneck of what is parallel
This very well may be the case, but I agree with Noah that it's fine if so. We are primarily looking to find problems that occur as we actually perform actions on/in the VM (spin up/down containers, copy io, etc.). So even if we modified the test case to wait to call StopVM at the end, we would likely just end up with a bunch of VMs sitting around doing nothing for a while, which doesn't really improve coverage IMO. I agree though that if we want scale tests in the future we'd need to be more careful about issues like this.
Ah, I was under the impression we wanted to actually test load. In that case, this looks fine.
@sipsma @xibz @nmeyerhans I set the number of VMs to 25; I think this is more than enough for this test.
@Zyqsempai there may have been a misunderstanding as to the suggestion around the number of VMs; I think leaving it at 100 is good, as the test still only took about a minute to run in the CI system, which is reasonable while still increasing our potential for catching race conditions.
@sipsma OK, got it, I was trying to find a compromise ;) I'll turn it back.
LGTM! Thanks again @Zyqsempai. Really appreciate your efforts on this :)
Please squash the commits into 1, then LGTM!
Signed-off-by: bpopovschi <zyqsempai@mail.ru>
7e63a27 to c218c45
@sipsma Done.
LGTM, there was an ephemeral failure of TestMultipleVMs with some errors I haven't seen before on your latest push. It passed on re-run, so I think the updates here may already be working as expected and uncovering new bugs :-) #358
#356 Signed-off-by: Noah Meyerhans <nmeyerha@amazon.com>
Signed-off-by: bpopovschi <zyqsempai@mail.ru>
Issue: #343
Description of changes:
Added the possibility to increase the number of VMs via Buildkite for TestMultipleVMs_Isolated (see the sketch below).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
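As a rough illustration of what the description refers to, a configurable VM count could be read from an environment variable set in the Buildkite pipeline, falling back to a default. The variable name NUMBER_OF_VMS and the default of 100 below are assumptions for this sketch, not necessarily what the PR ships.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultNumberOfVMs is the fallback used when no override is provided.
// Both the default and the NUMBER_OF_VMS variable name are illustrative.
const defaultNumberOfVMs = 100

// numberOfVMs returns the VM count for the test, taken from the
// NUMBER_OF_VMS environment variable when it is set to a positive integer.
func numberOfVMs() int {
	if v := os.Getenv("NUMBER_OF_VMS"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return defaultNumberOfVMs
}

func main() {
	fmt.Println("running with", numberOfVMs(), "VMs")
}
```

In a Buildkite pipeline this kind of override would typically be exposed through an env: mapping on the relevant step, e.g. NUMBER_OF_VMS: "25".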