Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented snapshot loading capabilities #414

Merged
merged 1 commit into from
Jul 15, 2022

Conversation

sondavidb
Copy link
Contributor

@sondavidb sondavidb commented Jun 22, 2022

Signed-off-by: David Son davbson@amazon.com

Taking most of the code from #348 and modifying it a bit, I was able to implement the ability to load snapshots as an option within the Start function in machine.go via WithSnapshot. This fixes the issue addressed in #348, where machine.go exposing LoadSnapshot function in said PR was deemed suboptimal due to its guaranteed failure outside of the scope of Start.

The actual implemention in opts.go initially felt a bit messy, as I felt the extra added imports made it clunky, but @austinvazquez and I looked over it today and decided it was fine given that it ultimately did similar things compared to other options, but feel free to disagree below if you feel otherwise.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sondavidb sondavidb requested a review from a team as a code owner June 22, 2022 18:12
opts.go Outdated Show resolved Hide resolved
@sondavidb sondavidb force-pushed the snapshot-tester branch 2 times, most recently from 6d4de84 to bfeb612 Compare June 24, 2022 20:38
machine_test.go Outdated Show resolved Hide resolved
machine_test.go Show resolved Hide resolved
opts.go Outdated Show resolved Hide resolved
@sondavidb sondavidb force-pushed the snapshot-tester branch 3 times, most recently from 7662d45 to 0ff1a94 Compare June 27, 2022 23:32
@sondavidb
Copy link
Contributor Author

sondavidb commented Jun 28, 2022

Made changes that included both a config file and an extra handler to load snapshots, so that we can choose when to call it during device setup.

I tried to do @kzys's approach of having LoadSnapshotHandler after AttachDrivesHandler but having either one in front of the other makes Firecracker unhappy and gives me an error saying that whatever handler is called second cannot be called after start config is set (e.g. AttachDrivesHandler -> LoadSnapshotHandler makes Firecracker complain about LoadSnapshotHandler, while LoadSnapshotHandler -> AttachDrivesHandler makes Firecracker complain about LoadSnapshot handler, both for the same reason). Based on this information, I feel like either a. Attaching drives is handled by loading the snapshot, or b. I'm not fully understanding how LoadSnapshot works. Leaning more towards b. as taking a look at line 171 in this file, it seems that snapshots need to have the drives attached before being loaded, but this doesn't make sense based on the testing done, as it does not let you attach drives before the snapshot is loaded (LoadSnapshot errors out), and doing so afterwards is both wrong and does not work (AttachDrives errors out)

In any case, the above seems to run fine, so take that as you will. I'm planning on adding some testing tomorrow to make sure that the proper snapshot is being loaded. This will be done as follows:

  • Create MicroVM
  • ssh into it
  • Run sleep 492 (or some other weird uncommon value)
  • Run CreateSnapshot
  • Close the VM
  • Create another MicroVM loaded from that snapshot
  • ssh into that
  • see if ps -aux | grep "sleep 492" returns more than one line

If said test succeeds, we can assume that said snapshot is properly loaded and forget about it, but I also would like to make sure I fully understand this to ensure we're not skipping over anything crucial.

@sondavidb sondavidb force-pushed the snapshot-tester branch 8 times, most recently from 0ff1a94 to 490d76c Compare June 28, 2022 00:56
machine.go Outdated Show resolved Hide resolved
@sondavidb sondavidb force-pushed the snapshot-tester branch 2 times, most recently from cc89863 to fd17703 Compare July 12, 2022 17:08
machine.go Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
machine_test.go Outdated Show resolved Hide resolved
$(FC_TEST_DATA_PATH)/jailer \
$(FC_TEST_DATA_PATH)/firecracker \
$(FC_TEST_DATA_PATH)/ltag
$(FC_TEST_DATA_PATH)/ltag \
$(FC_TEST_BIN_PATH)/ptp \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We switched from ptp to bridge in firecracker-containerd. Should we do the same here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this and it didn't work out-of-the-box. Mainly I didn't see the bridge plugin creating the IP route on the host like the ptp plugin does. So packets were not being routed to the bridge. More than likely I am miss configuring though. @sondavidb would you like to give it a shot?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, the bridge plugin doesn't actually create any network interfaces, it only bridges two pre-existing connections. So, to refactor this would probably take moderate effort — entirely doable, but I would also like to finish this PR first to get the feature out, and then create a separate PR to refactor this. Given that it's only used in a test, I think it's not crucial to ship this change with the existing PR. What do you think @Kern--?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the bridge plugin and ptp create veth pairs, so it should be just a slightly different config to switch, but I don't feel strongly enough about which we use to block on that. It's fine to ship this as is.

Makefile Show resolved Hide resolved
@sondavidb sondavidb force-pushed the snapshot-tester branch 3 times, most recently from b08b234 to b18fab7 Compare July 12, 2022 23:41
@sondavidb sondavidb force-pushed the snapshot-tester branch 7 times, most recently from 3350417 to c8c1ef8 Compare July 13, 2022 23:05
.buildkite/pipeline.yml Outdated Show resolved Hide resolved
machine_test.go Outdated Show resolved Hide resolved
machine_test.go Show resolved Hide resolved
opts.go Outdated Show resolved Hide resolved
@sondavidb sondavidb force-pushed the snapshot-tester branch 4 times, most recently from d1e0fa2 to 7e28b05 Compare July 14, 2022 23:12
machine_test.go Outdated Show resolved Hide resolved
machine_test.go Outdated Show resolved Hide resolved
@sondavidb sondavidb force-pushed the snapshot-tester branch 2 times, most recently from 9e1b109 to c051d3c Compare July 15, 2022 19:36
@kzys kzys mentioned this pull request Jul 15, 2022
@sondavidb
Copy link
Contributor Author

Worth mentioning the install step seems to be flaky — sometimes it doesn't set up the loop device during mounting (e.g. the install step in this build, line 83), but just restarting the test makes it run fine. Unsure of cause but good to document in my opinion.

Signed-off-by: David Son <davbson@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants