lib: migrate overlay manager state explicitly #853
The first PHD run on the most recent commit hit known test flake #811 (really need to go fix that...); I've asked for a re-run.
}

#[test]
fn export_import_round_trip() {
take it or leave it: this type of "round trip" test always feels like kind of a good candidate for a property test where we generate a bunch of random things and assert they look the same on both sides. but, i'm not sure if it's really going to gain that much from it, and writing the proptest code to generate arbitrary migration payloads might be too much work...just a thought!
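To make the suggestion concrete, here is a minimal sketch of a randomized round-trip check. It is not the PR's test: `Kind`, `OverlaySet`, `Exported`, `export`, and `import` are simplified, hypothetical stand-ins for the real types, and a hand-rolled xorshift generator takes the place of the `proptest` crate so the example needs only the standard library.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the PR's types; the real definitions differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    HypercallPage,
    ReferenceTsc,
}

const KINDS: [Kind; 2] = [Kind::HypercallPage, Kind::ReferenceTsc];

#[derive(Clone, Debug, PartialEq)]
struct OverlaySet {
    original_contents: Vec<u8>,
    active: Kind,
    pending: BTreeMap<Kind, Vec<u8>>,
}

// Assumed wire shape: kinds are flattened to integers.
#[derive(Clone, Debug, PartialEq)]
struct Exported {
    original_contents: Vec<u8>,
    active: u8,
    pending: Vec<(u8, Vec<u8>)>,
}

fn kind_from_u8(v: u8) -> Option<Kind> {
    KINDS.get(v as usize).copied()
}

fn export(set: &OverlaySet) -> Exported {
    Exported {
        original_contents: set.original_contents.clone(),
        active: set.active as u8,
        pending: set.pending.iter().map(|(k, v)| (*k as u8, v.clone())).collect(),
    }
}

// Returns None if the payload names a kind this version does not know.
fn import(e: &Exported) -> Option<OverlaySet> {
    let mut pending = BTreeMap::new();
    for (k, v) in &e.pending {
        pending.insert(kind_from_u8(*k)?, v.clone());
    }
    Some(OverlaySet {
        original_contents: e.original_contents.clone(),
        active: kind_from_u8(e.active)?,
        pending,
    })
}

// Tiny xorshift64 generator, standing in for a property-testing library.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    fn bytes(&mut self) -> Vec<u8> {
        let len = (self.next() % 16) as usize;
        (0..len).map(|_| self.next() as u8).collect()
    }
}

// Builds a random set whose pending kinds never include the active kind.
fn arbitrary_set(rng: &mut XorShift) -> OverlaySet {
    let active = KINDS[(rng.next() % 2) as usize];
    let mut pending = BTreeMap::new();
    for kind in KINDS {
        if kind != active && rng.next() % 2 == 0 {
            pending.insert(kind, rng.bytes());
        }
    }
    OverlaySet { original_contents: rng.bytes(), active, pending }
}

// True if every generated set survives export followed by import unchanged.
fn round_trip_holds(seed: u64, cases: usize) -> bool {
    let mut rng = XorShift(seed | 1); // xorshift state must be nonzero
    (0..cases).all(|_| {
        let set = arbitrary_set(&mut rng);
        import(&export(&set)).as_ref() == Some(&set)
    })
}
```

A test body would then reduce to something like `assert!(round_trip_holds(0x5EED, 1000))`. A real property-testing library would add shrinking of failing cases, which this sketch does not attempt.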
.flat_map(|(pfn, set)| {
    std::iter::once(OverlayPage {
        manager: Arc::downgrade(manager),
        kind: set.active,
        pfn: *pfn,
    })
    .chain(set.pending.keys().map(|kind| {
        OverlayPage {
            manager: Arc::downgrade(manager),
            kind: *kind,
            pfn: *pfn,
        }
    }))
})
.collect();
pub struct OverlaySetV1 {
    pub(super) original_contents: Vec<u8>,
    pub(super) active: MigratedOverlayKind,
    pub(super) pending: BTreeMap<MigratedOverlayKind, Vec<u8>>,
}
From a high level, I'm not sure I'm wild about migrating the overlay state like this. We're essentially duplicating state that is already stored with the `msr_hypercall` data in `HyperVEnlightenmentV1` (namely, that a hypercall page is overlaying a given address). There is still the matter of the covered page data, but in theory that could travel with the `msr_hypercall` bits (in terms of which part of the migration payload it should be attached to).
That way, the migration payload encodes less of the `OverlayManager`, which can remain an implementation detail of whatever version is running at the time. (Partially thinking about if/when the kernel vmm makes overlay pages less of a hassle.)
Now, this isn't to say that I'm fundamentally opposed to how things are structured here. If you think this is the right direction for all this, I'll defer. I'm happy to chat about trade-offs as well.
We discussed some of the tradeoffs here offline. A couple of notes from that conversation:
- The primary thing I wanted to avoid here (if we can help it) is reordering overlays across a migration. In today's code, the first overlay applied to a PFN is always made active, irrespective of its kind; subsequent overlays are then promoted in order by their kind. However, there are other ways to tackle this problem that don't require anyone to save/restore what overlay is currently active (e.g., make the overlay ordering solely a function of the kinds of overlays present for a page).
- Once ordering concerns are out of the picture, the Hyper-V stack can simply re-create its `OverlayPage`s during import, knowing that it doesn't have to care about creating them in any particular order. Another advantage of this is that it's not necessary to save/restore pending page contents--the stack just needs to remember the parameters it used to create a particular page so that it can create the correct contents again during import.
- It's almost possible to avoid transferring a GPA's original contents by disabling active overlays when the Hyper-V stack is paused and re-creating them as needed when it's resumed. Then the original page contents will be copied during the RAM transfer phase. The tricky bit with this is that it means `Lifecycle` callers have to pause vCPUs before they pause device components (otherwise a running CPU might read a page whose overlay is held in abeyance by a paused Hyper-V manager).
I put together a small change that explores some of these ideas. I'll post it as a draft PR shortly.
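As a rough illustration of the first note, the sketch below picks the active overlay purely from the set of kinds present on a page, so the result does not depend on the order in which overlays were applied. The `OverlayKind` variants and the `active_kind` helper are hypothetical; the priority order is simply the declaration order that `#[derive(Ord)]` gives the enum, which is an assumption rather than the manager's actual policy.

```rust
use std::collections::BTreeSet;

// Hypothetical overlay kinds; declaration order doubles as priority order
// because derived `Ord` compares variants by their position.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum OverlayKind {
    HypercallPage,
    ReferenceTsc,
}

/// Chooses the active overlay for a page as the lowest-ordered kind present.
/// Because the input is a set, insertion order cannot affect the answer, so
/// nothing about "which overlay is active" needs to be saved or restored.
fn active_kind(present: &BTreeSet<OverlayKind>) -> Option<OverlayKind> {
    present.iter().next().copied()
}
```

With a rule like this, an importer can re-create overlays in any order and still end up with the same active overlay as the source.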
856 is rebased, so closing this.
(N.B. Stacked on #851.)
Explicitly migrate the state of a Hyper-V stack's `OverlayManager` instead of having the stack reapply its own overlays. The manager's `OverlaySet`s are exported and imported more or less as-is, though with some transformations between types for serialization purposes (e.g. `Box<[u8; PAGE_SIZE]>` is serialized as `Vec<u8>`) or for compatibility reasons (`enum MigratedOverlayKind` is distinct from `enum OverlayKind` so that the latter can change without rendering the manager unable to recognize overlay kinds from old Propolis versions).

Importing manager state produces a set of `OverlayPage` registrations that the higher-level Hyper-V import logic can audit for correctness (there should be exactly as many entries as MSRs with active overlays, and they should be associated with the correct PFNs).

Tests: `cargo test` and PHD; examined the migration logs from an ad hoc migration and verified that the expected overlay state was migrated.
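The type transformations described above might look roughly like the following. This is a sketch under assumptions, not the code in this PR: the enum variants, the `export_page`/`import_page` helper names, and the `PAGE_SIZE` value of 4096 are all illustrative.

```rust
// Assumed page size for illustration.
const PAGE_SIZE: usize = 4096;

// Hypothetical in-memory kind; free to change between versions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OverlayKind {
    HypercallPage,
    ReferenceTsc,
}

// Hypothetical wire-format kind; kept stable so old payloads stay readable.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MigratedOverlayKind {
    HypercallPage,
    ReferenceTsc,
}

impl From<OverlayKind> for MigratedOverlayKind {
    fn from(kind: OverlayKind) -> Self {
        match kind {
            OverlayKind::HypercallPage => Self::HypercallPage,
            OverlayKind::ReferenceTsc => Self::ReferenceTsc,
        }
    }
}

impl From<MigratedOverlayKind> for OverlayKind {
    fn from(kind: MigratedOverlayKind) -> Self {
        match kind {
            MigratedOverlayKind::HypercallPage => Self::HypercallPage,
            MigratedOverlayKind::ReferenceTsc => Self::ReferenceTsc,
        }
    }
}

/// Export: a fixed-size page becomes a plain byte vector for serialization.
fn export_page(page: &[u8; PAGE_SIZE]) -> Vec<u8> {
    page.to_vec()
}

/// Import: accept the bytes only if they are exactly one page long;
/// otherwise report the length that was actually received.
fn import_page(bytes: Vec<u8>) -> Result<Box<[u8; PAGE_SIZE]>, usize> {
    let len = bytes.len();
    bytes.into_boxed_slice().try_into().map_err(|_| len)
}
```

Keeping the two enums separate means a renamed or reordered `OverlayKind` variant only requires updating the `From` impls, while the serialized form stays fixed. If a future version dropped a kind entirely, the wire-to-memory direction would more naturally be a `TryFrom`.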
Fixes #850.