@georgepisaltu georgepisaltu commented May 27, 2021

Reason for This PR

A VM restored from a snapshot may run on a host whose TSC frequency differs from that of the host where the snapshot was taken, which can skew guest timekeeping. We took this as an opportunity to add snapshot compatibility checks, which start with adding the original TSC frequency to the state file and scaling it for resumed VMs.

Description of Changes

Added compatibility checks for snapshot create/restore, starting with TSC scaling.

  • This functionality can be added in rust-vmm.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • All commits in this PR are signed (git commit -s).
  • The reason for this PR is clearly provided (issue no. or explanation).
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any newly added unsafe code is properly documented.
  • Any API changes are reflected in firecracker/swagger.yaml.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

@georgepisaltu georgepisaltu self-assigned this May 27, 2021
@dianpopa
Contributor

Should we close this one?

pub fn set_tsc_khz(&self, state_tsc: Option<u32>, cpuid: &CpuId) -> Result<()> {
    // Check if CPU models are the same.
    if let Ok(curr_cpuid) = cpuid::common::get_cpuid(0x1, 0) {
        if is_same_model(cpuid, curr_cpuid.eax) {
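
For context on the model check in the excerpt above: CPUID leaf 0x1 reports the processor signature in EAX. The following is a minimal sketch of decoding that register and comparing two signatures. `CpuSignature`, `decode_signature`, and `same_family_and_model` are hypothetical names used for illustration only; this is not the PR's `is_same_model`, and ignoring the stepping is an assumption.

```rust
/// Family, model, and stepping decoded from CPUID leaf 0x1, register EAX.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CpuSignature {
    family: u32,
    model: u32,
    stepping: u32,
}

/// Decodes the signature fields from EAX of CPUID leaf 0x1.
/// Hypothetical helper; follows the Intel display family/model convention.
fn decode_signature(eax: u32) -> CpuSignature {
    let stepping = eax & 0xF;
    let base_model = (eax >> 4) & 0xF;
    let base_family = (eax >> 8) & 0xF;
    let ext_model = (eax >> 16) & 0xF;
    let ext_family = (eax >> 20) & 0xFF;

    // The extended family is added only when the base family is 0xF.
    let family = if base_family == 0xF {
        base_family + ext_family
    } else {
        base_family
    };
    // The extended model is used only for base families 0x6 and 0xF.
    let model = if base_family == 0x6 || base_family == 0xF {
        (ext_model << 4) | base_model
    } else {
        base_model
    };

    CpuSignature { family, model, stepping }
}

/// Assumption: two CPUs count as the "same model" when family and model
/// match, regardless of stepping. The PR may define this differently.
fn same_family_and_model(saved_eax: u32, host_eax: u32) -> bool {
    let saved = decode_signature(saved_eax);
    let host = decode_signature(host_eax);
    saved.family == host.family && saved.model == host.model
}

fn main() {
    // Example EAX values chosen for illustration.
    let saved_eax = 0x0005_0654;
    let host_eax = 0x0005_0657;
    println!("saved: {:?}", decode_signature(saved_eax));
    println!("host:  {:?}", decode_signature(host_eax));
    println!("same model: {}", same_family_and_model(saved_eax, host_eax));
}
```

Whether the stepping should take part in the comparison is a design choice this sketch leaves open.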
Contributor

@sandreim sandreim Jun 4, 2021

This allows snapshots to be loaded on the same CPU model, even if there is no constant TSC or TSC scaling support. Are we sure the checks we do here guarantee that the TSC frequency is the same?

Contributor Author

If we as a process never change the TSC frequency outside the scope of a snapshot save/restore, and the CPU model is the same, I expect the TSC frequency to be the same in the clone as in the original VM. If this is not true, then checking the CPU model is actually useless, since we still need to get/set the TSC frequency regardless.

On variant TSC systems, I don't think users even expect high precision from the TSC, but I might have a bad understanding of this whole story.
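
A minimal sketch of the decision discussed in this thread, assuming the restore path knows the saved frequency, the host frequency, whether the CPU models match, and whether the host supports TSC scaling. `TscAction`, `TscError`, and `plan_tsc_restore` are hypothetical names, and the handling of a missing saved frequency is an assumption; this is not the code merged in the PR.

```rust
/// What a restore path could do about the guest TSC frequency.
#[derive(Debug, PartialEq, Eq)]
enum TscAction {
    /// Leave the host's default frequency in place.
    KeepHostFrequency,
    /// Ask the hypervisor to scale the guest TSC to this frequency (kHz).
    ScaleTo(u32),
}

/// Reasons the restore could be rejected.
#[derive(Debug, PartialEq, Eq)]
enum TscError {
    /// The frequencies differ and the host cannot scale the TSC.
    ScalingUnsupported { saved_khz: u32, host_khz: u32 },
}

/// Hypothetical planning function mirroring the reasoning in the thread.
fn plan_tsc_restore(
    saved_khz: Option<u32>,
    host_khz: u32,
    same_cpu_model: bool,
    scaling_supported: bool,
) -> Result<TscAction, TscError> {
    // Same CPU model: the frequency is expected to match, so no scaling
    // is attempted. This is the shortcut the reviewer is questioning.
    if same_cpu_model {
        return Ok(TscAction::KeepHostFrequency);
    }

    match saved_khz {
        // Assumption: a snapshot without a saved frequency has nothing
        // to compare against, so the host default is kept.
        None => Ok(TscAction::KeepHostFrequency),
        // Frequencies already match: nothing to do.
        Some(khz) if khz == host_khz => Ok(TscAction::KeepHostFrequency),
        // Frequencies differ and scaling is available.
        Some(khz) if scaling_supported => Ok(TscAction::ScaleTo(khz)),
        // Frequencies differ and scaling is not available.
        Some(khz) => Err(TscError::ScalingUnsupported {
            saved_khz: khz,
            host_khz,
        }),
    }
}

fn main() {
    // Example frequencies in kHz, chosen for illustration.
    let plan = plan_tsc_restore(Some(2_500_000), 3_000_000, false, true);
    println!("{:?}", plan);
}
```

If the same-model shortcut turns out not to guarantee equal frequencies, dropping the first branch makes every restore go through the frequency comparison.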

@georgepisaltu georgepisaltu force-pushed the tsc_scaling branch 2 times, most recently from 7540cbe to 8f316de Compare June 4, 2021 13:10
@georgepisaltu georgepisaltu added Status: Awaiting review Indicates that a pull request is ready to be reviewed and removed Status: Author labels Jun 4, 2021
Contributor Author

@georgepisaltu georgepisaltu left a comment

@sandreim PTAL!

Contributor Author

@georgepisaltu georgepisaltu left a comment

@sandreim @serban300 I updated the PR and added unit tests.

The next step is to add integration tests using snapshots generated with the final form of the binary after merging this.

PTAL!

@georgepisaltu georgepisaltu marked this pull request as ready for review June 8, 2021 15:27
@georgepisaltu georgepisaltu force-pushed the tsc_scaling branch 5 times, most recently from 2407cad to 85cce2c Compare June 10, 2021 07:45
@georgepisaltu georgepisaltu force-pushed the tsc_scaling branch 3 times, most recently from ad0eeec to 59501b4 Compare June 14, 2021 10:10
# Resume microvm using current build of FC/Jailer.
# The resume should be successful because the CPU model
# in the snapshot state is the same as this host's.
microvm, _ = builder.build_from_snapshot(
Contributor

Please also open a follow-up issue to cover negative testing on different CPU models.

Contributor Author

Will do, after this PR is merged.

Signed-off-by: George Pisaltu <gpl@amazon.com>
@georgepisaltu georgepisaltu merged commit 7edec88 into firecracker-microvm:main Jun 15, 2021
@georgepisaltu georgepisaltu deleted the tsc_scaling branch September 30, 2021 09:18
Labels

Status: Awaiting review Indicates that a pull request is ready to be reviewed

6 participants