- 
                Notifications
    
You must be signed in to change notification settings  - Fork 2.1k
 
Add snapshot compatibility checks for snapshot create/restore #2596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add snapshot compatibility checks for snapshot create/restore #2596
Conversation
| 
           Should we close this one?  | 
    
bdac2af    to
    89eb708      
    Compare
  
            
          
                src/vmm/src/vstate/vcpu/x86_64.rs
              
                Outdated
          
        
      | pub fn set_tsc_khz(&self, state_tsc: Option<u32>, cpuid: &CpuId) -> Result<()> { | ||
| // Check if CPU models are the same. | ||
| if let Ok(curr_cpuid) = cpuid::common::get_cpuid(0x1, 0) { | ||
| if is_same_model(cpuid, curr_cpuid.eax) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows snapshots to be loaded on the same CPU, even if there is no constant TSC or TSC scaling support. Are we sure the checks we do here guarantee that TSC freq is the same ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we as a process never change the TSC frequency outside the scope of a snapshot save/restore, and the CPU model is the same, I expect the TSC frequency to be the same in the clone as in the original VM. If this is not true, then checking the CPU model is actually useless, since we still need to get/set the TSC frequency regardless.
On variant TSC systems, I don't think users even expect high precision from the TSC, but I might have a bad understanding of this whole story.
7540cbe    to
    8f316de      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sandreim PTAL!
8f316de    to
    f07a80a      
    Compare
  
    f07a80a    to
    ced7adf      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sandreim @serban300 I updated the PR, also added unit tests.
The next step is to add integration tests using snapshots generated with the final form of the binary after merging this.
PTAL!
2407cad    to
    85cce2c      
    Compare
  
    ad0eeec    to
    59501b4      
    Compare
  
    | # Resume microvm using current build of FC/Jailer. | ||
| # The resume should be successful because the CPU model | ||
| # in the snapshot state is the same as this host's. | ||
| microvm, _ = builder.build_from_snapshot( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add an additional issue to cover negative testing on different CPU models.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do, after this PR is merged.
59501b4    to
    fef3adf      
    Compare
  
    fef3adf    to
    d9f8344      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@serban300 @sandreim PTAL!
Signed-off-by: George Pisaltu <gpl@amazon.com>
d9f8344    to
    7a8f435      
    Compare
  
    Signed-off-by: George Pisaltu <gpl@amazon.com>
Signed-off-by: George Pisaltu <gpl@amazon.com>
Signed-off-by: George Pisaltu <gpl@amazon.com>
7a8f435    to
    4f53367      
    Compare
  
    
Reason for This PR
VMs restored from a snapshot might have the TSC frequency misaligned with the original, which could lead to problems further down the line. We took this as an opportunity to add snapshot compatibility checks, which start with adding the original TSC in the state file and scaling it for resumed VMs.
Description of Changes
Added compatibility checks for snapshot create/restore, starting with TSC scaling.
rust-vmm.License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.
PR Checklist
[Author TODO: Meet these criteria.][Reviewer TODO: Verify that these criteria are met. Request changes if not]git commit -s).unsafecode is properly documented.firecracker/swagger.yaml.CHANGELOG.md.