Merged
1 change: 1 addition & 0 deletions .buildkite/pipeline.yml
@@ -60,6 +60,7 @@ steps:
queue: "${BUILDKITE_AGENT_META_DATA_QUEUE:-default}"
env:
DOCKER_IMAGE_TAG: "$BUILDKITE_BUILD_NUMBER"
NUMBER_OF_VMS: 100
EXTRAGOARGS: "-v -count=1 -race"
artifact_paths:
- "runtime/logs/*"
2 changes: 2 additions & 0 deletions runtime/Makefile
@@ -13,6 +13,7 @@

# Set this to pass additional commandline flags to the go compiler, e.g. "make test EXTRAGOARGS=-v"
EXTRAGOARGS?=
NUMBER_OF_VMS?=

SOURCES:=$(shell find . -name '*.go')
GOMOD := $(shell go env GOMOD)
@@ -63,6 +64,7 @@ integ-test-%: logs
--env FICD_DM_POOL=$(FICD_DM_POOL) \
--env GOPROXY=direct \
--env GOSUMDB=off \
--env NUMBER_OF_VMS=$(NUMBER_OF_VMS) \
--workdir="/src/runtime" \
--init \
$(FIRECRACKER_CONTAINERD_TEST_IMAGE):$(DOCKER_IMAGE_TAG) \
29 changes: 15 additions & 14 deletions runtime/service_integ_test.go
@@ -66,6 +66,9 @@ const (
defaultVMRootfsPath = "/var/lib/firecracker-containerd/runtime/default-rootfs.img"
defaultVMNetDevName = "eth0"
varRunDir = "/run/firecracker-containerd"

numberOfVmsEnvName = "NUMBER_OF_VMS"
defaultNumberOfVms = 5
)

// Images are presumed by the isolated tests to have already been pulled
@@ -220,25 +223,21 @@ func TestMultipleVMs_Isolated(t *testing.T) {
netns, err := ns.GetCurrentNS()
require.NoError(t, err, "failed to get a namespace")

// numberOfVmsEnvName is the NUMBER_OF_VMS environment variable, configurable from Buildkite
numberOfVms, err := strconv.Atoi(os.Getenv(numberOfVmsEnvName))
require.NoError(t, err, "failed to get NUMBER_OF_VMS env")
if numberOfVms == 0 {
numberOfVms = defaultNumberOfVms
}
t.Logf("TestMultipleVMs_Isolated: will run %d vm's", numberOfVms)

cases := []struct {
MaxContainers int32
JailerConfig *proto.JailerConfig
}{
{
MaxContainers: 5,
},
{
MaxContainers: 5,
},
{
MaxContainers: 5,
},
{
MaxContainers: 3,
JailerConfig: &proto.JailerConfig{
NetNS: netns.Path(),
},
},
{
MaxContainers: 3,
JailerConfig: &proto.JailerConfig{
@@ -265,7 +264,9 @@ func TestMultipleVMs_Isolated(t *testing.T) {
// container ends up in the right VM by assigning each VM a network device with a unique mac address and having each container
// print the mac address it sees inside its VM.
var vmWg sync.WaitGroup
for vmID, c := range cases {
for i := 0; i < numberOfVms; i++ {
caseTypeNumber := i % len(cases)
c := cases[caseTypeNumber]
vmWg.Add(1)
go func(vmID int, containerCount int32, jailerConfig *proto.JailerConfig) {
@xibz (Contributor) commented on Dec 5, 2019:

I am also curious how many we are actually running in parallel, due to the StopVM call. We may need to call StopVM at a later time and have the VMs run some process to ensure we are getting an accurate number for concurrency. And since 100 is a pretty large number, I'm worried that a lot of the VMs are exiting before we can test the true bottleneck of what is parallel. My guess based on the timing is that we are only running 15-20 or so VMs in parallel. Thoughts on this, @sipsma?

A contributor replied:

I don't think this is meant to be a true scale test, so we don't really need to max out the host capacity. AFAICT, 100 was chosen pretty much arbitrarily, and it is an upper bound on the number of VMs that will run concurrently, not necessarily a goal. Even if we're only running ~20 VMs at any one time, that's still more than we're running today. If we assume that more VMs means more likelihood of catching concurrency-related bugs, then a 3-4x increase in the number of running VMs sounds good to me.

At some point in the future we will need to perform actual scale tests, but this PR doesn't need to be that.

A further reply from a contributor:

Quoting the earlier concern: "I'm worried that a lot of the VMs are exiting before we can test the true bottleneck of what is parallel"

This very well may be the case, but I agree with Noah that it's fine if so. We are primarily looking to find problems that occur as we actually perform actions on/in the VM (spin up/down containers, copy io, etc.). So even if we modified the test case to wait to call StopVM at the end, we would likely just end up with a bunch of VMs sitting around doing nothing for a while, which doesn't really improve coverage IMO. I agree though that if we want scale tests in the future we'd need to be more careful about issues like this.

A final reply from a contributor:

Ah, I was under the impression we wanted to actually test load. In that case, this looks fine
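
For illustration, here is a minimal, hypothetical sketch of the approach the thread discusses: keeping every VM alive until all of them have started, so that the peak number of concurrently running VMs actually equals the requested count instead of depending on how quickly early VMs finish and call StopVM. The startVM/stopVM callbacks below are placeholders, not the firecracker-containerd client API, and this code is not part of the PR.

```go
// Hypothetical sketch (not part of this PR): a start barrier that keeps every
// VM alive until all of them have booted, so the peak concurrency equals the
// requested VM count. startVM/stopVM are placeholder callbacks, not the
// firecracker-containerd client API.
package main

import (
	"fmt"
	"sync"
)

func runConcurrently(numberOfVMs int, startVM, stopVM func(id int) error) error {
	var (
		started  sync.WaitGroup // counts VMs that finished booting (or failed)
		done     sync.WaitGroup // counts goroutines that finished entirely
		mu       sync.Mutex
		firstErr error
	)
	started.Add(numberOfVMs)
	done.Add(numberOfVMs)

	record := func(err error) {
		mu.Lock()
		defer mu.Unlock()
		if firstErr == nil {
			firstErr = err
		}
	}

	for i := 0; i < numberOfVMs; i++ {
		go func(id int) {
			defer done.Done()
			if err := startVM(id); err != nil {
				record(err)
				started.Done() // still release the barrier on failure
				return
			}
			started.Done()
			// Block here until every VM is up, so none is torn down early.
			started.Wait()
			if err := stopVM(id); err != nil {
				record(err)
			}
		}(i)
	}

	done.Wait()
	return firstErr
}

func main() {
	err := runConcurrently(5,
		func(id int) error { fmt.Println("start VM", id); return nil },
		func(id int) error { fmt.Println("stop VM", id); return nil },
	)
	fmt.Println("first error:", err)
}
```

In the real test this would roughly correspond to inserting a second synchronization point before the fcClient.StopVM call; as the reviewers note above, the PR deliberately does not go that far, since it is not meant to be a scale test.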

defer vmWg.Done()
@@ -338,7 +339,7 @@ func TestMultipleVMs_Isolated(t *testing.T) {

_, err = fcClient.StopVM(ctx, &proto.StopVMRequest{VMID: strconv.Itoa(vmID), TimeoutSeconds: 5})
require.NoError(t, err, "failed to stop VM %d", vmID)
}(vmID, c.MaxContainers, c.JailerConfig)
}(i, c.MaxContainers, c.JailerConfig)
}

vmWg.Wait()
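
One closing note on the new NUMBER_OF_VMS handling in service_integ_test.go: strconv.Atoi(os.Getenv(numberOfVmsEnvName)) returns an error when the variable is unset or empty, so require.NoError appears to fail the test before the numberOfVms == 0 fallback to defaultNumberOfVms can take effect. Buildkite sets NUMBER_OF_VMS=100, but the Makefile default (NUMBER_OF_VMS?=) is empty, so a local make integ-test run without the variable exported would likely hit this path. Below is a hedged sketch of a more defensive variant, offered only as an illustration and not as the merged code.

```go
// Hedged sketch (not the merged code): treat an unset or empty NUMBER_OF_VMS
// as "use the default" instead of failing inside strconv.Atoi.
package main

import (
	"fmt"
	"os"
	"strconv"
)

const (
	numberOfVmsEnvName = "NUMBER_OF_VMS"
	defaultNumberOfVms = 5
)

// numberOfVMs reads NUMBER_OF_VMS, falling back to the default when the
// variable is missing, empty, or non-positive.
func numberOfVMs() (int, error) {
	raw := os.Getenv(numberOfVmsEnvName)
	if raw == "" {
		return defaultNumberOfVms, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid %s value %q: %w", numberOfVmsEnvName, raw, err)
	}
	if n <= 0 {
		return defaultNumberOfVms, nil
	}
	return n, nil
}

func main() {
	n, err := numberOfVMs()
	fmt.Println("number of VMs:", n, "err:", err)
}
```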