Memory dump #7517
Conversation
l.domainModifyLock.Lock()
defer l.domainModifyLock.Unlock()
It seems we needed this lock whenever we perform a get+set on the domain spec, but I'm uncertain we want to hold it for the duration of the core dump. Holding this lock could block the virt-handler's reconcile loop if the same VM is synced again while the dump is taking place.
If you think it's safe to remove it, I will.
The way you restructured this, with the lock not held during the dump, looks good now.
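The locking pattern agreed on above can be sketched roughly as follows. This is a minimal illustration, not the actual KubeVirt code: the `domainManager` type, the string-valued `domainSpec`, and `coreDump` are hypothetical stand-ins for the real domain spec get+set and the libvirt core dump call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// domainManager is an illustrative stand-in for the real manager type.
type domainManager struct {
	domainModifyLock sync.Mutex
	domainSpec       string
}

// memoryDump holds domainModifyLock only around the get+set of the
// domain spec, releasing it before the long-running dump so a
// reconcile loop syncing the same VM is not blocked while the dump runs.
func (m *domainManager) memoryDump(path string) error {
	m.domainModifyLock.Lock()
	spec := m.domainSpec          // get
	m.domainSpec = spec + "+dump" // set
	m.domainModifyLock.Unlock()   // released before the dump itself

	return m.coreDump(path) // runs without the lock held
}

// coreDump stands in for the actual (slow) libvirt memory-only core dump.
func (m *domainManager) coreDump(path string) error {
	time.Sleep(time.Millisecond)
	return nil
}

func main() {
	m := &domainManager{domainSpec: "spec"}
	err := m.memoryDump("/dump/vm.memory.dump")
	fmt.Println(m.domainSpec, err)
}
```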
memoryDumpInfo := &api.DomainMemoryDumpInfo{
	DumpTimestamp: &now,
}
What happens if the memory dump fails? Will that get reported?
I planned to add error handling
case domainUpdate := <-domainUpdateChan:
	guestEvent.domainMemoryDumpInfo = domainUpdate.MemoryDumpInfo
	eventCallback(domainConn, domainCache, libvirtEvent{}, n, deleteNotificationSent, guestEvent, vmi)
I'm unsure about how this works. It doesn't appear that the memory dump info is persisted on the Domain, so if this event doesn't get processed as expected by virt-handler, I think we'd never have a chance to get the result of the memory dump again.
The method we have for storing persistent info on the domainSpec is the domainSpec.Metadata field.
This is a really good point! I'm fixing it so that whether the memory dump completed can be retrieved from the memoryDump call, by keeping the info of the last memory dump on the LibvirtDomainManager struct.
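The fix described above, keeping the last dump's result on the manager struct so a repeated call can report completion, could look roughly like this. Everything here is a hypothetical sketch: the `memoryDumpInfo` fields, `markCompleted`, and the two-value return are illustrative, not the actual PR code.

```go
package main

import (
	"fmt"
	"sync"
)

// memoryDumpInfo records the outcome of the last dump (hypothetical fields).
type memoryDumpInfo struct {
	targetPath string
	completed  bool
	err        error
}

type LibvirtDomainManager struct {
	mu           sync.Mutex
	lastDumpInfo *memoryDumpInfo
}

// MemoryDump triggers a dump on the first call for a path and, on
// later calls for the same path, reports whether that dump finished.
func (l *LibvirtDomainManager) MemoryDump(path string) (bool, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if info := l.lastDumpInfo; info != nil && info.targetPath == path {
		return info.completed, info.err
	}
	l.lastDumpInfo = &memoryDumpInfo{targetPath: path}
	// A real implementation would kick off the libvirt core dump
	// asynchronously here.
	return false, nil
}

// markCompleted is what the dump worker would call when it finishes.
func (l *LibvirtDomainManager) markCompleted(path string, dumpErr error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.lastDumpInfo != nil && l.lastDumpInfo.targetPath == path {
		l.lastDumpInfo.completed = dumpErr == nil
		l.lastDumpInfo.err = dumpErr
	}
}

func main() {
	l := &LibvirtDomainManager{}
	done, _ := l.MemoryDump("/dump/vm.memory.dump")
	fmt.Println("first call completed:", done)
	l.markCompleted("/dump/vm.memory.dump", nil)
	done, _ = l.MemoryDump("/dump/vm.memory.dump")
	fmt.Println("second call completed:", done)
}
```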
// dump to the given pvc
// +nullable
// +optional
MemoryDumpRequest *VirtualMachineMemoryDumpRequest `json:"memoryDumpRequest,omitempty" optional:"true"`
What happens if a dump is requested while one is in progress? Should this be a list like VolumeRequests?
As long as the previous request hasn't completed, a new request is denied; there isn't really a point in doing one memory dump right after another.
// DumpTimestamp is the time when the memory dump occurred
DumpTimestamp *metav1.Time `json:"dumpTimestamp,omitempty"`
// VolumeName is the name of the volume the memory was dumped to
VolumeName string `json:"volumeName,omitempty"`
Should this be claimName?
sure
@davidvossel @mhenriks I updated the PR. This code now works after some manual testing. I'd appreciate another review.
/restest
message MemoryDumpResponse {
	Response response = 1;
	bool completed = 2;
Why not have the more flexible `string memoryDumpResponse = 2;` (like all other responses) instead of a bool?
I don't understand the question. I'm using Response like all the other responses, but I added a bool to indicate whether the memory dump completed. The Response already has a bool for success and a string message, but success is not the same as completed: the request can succeed because the dump was triggered successfully, and only afterwards will the command report completed once the async request finishes.
All I said is that a bool is limited (you will rarely see one here, or in responses in general) and you may later need a response that is more informative than false.
I see. Completion is pretty much a yes-or-no question, and I do update the message in the Response when there was an error, which may explain why completion is false, so I think that covers it.
pkg/hotplug-disk/hotplug-disk.go
		return "", err
	}
}
return directoryPath, err
Why `err` and not `nil`?
This function is pretty much a copy of GetFileSystemDiskTargetPathFromHostView with a small change :) I can beautify it
pkg/hotplug-disk/hotplug-disk.go
	return "", err
}
if !exists && create {
	err = os.Mkdir(directoryPath, 0750)
`if err := os.Mkdir(directoryPath, 0750); err != nil { ... }` with a local `err` is the preferable style for such checks.
tests/storage/memorydump.go
// Verify the content is still on the pvc
verifyMemoryDumpOutput(memoryDumpPVC, previousOutput, true)
},
Entry("calling endpoint directly", memoryDumpVMSubresource, removeMemoryDumpVMSubresource),
Entry("[test_id:8499]calling endpoint directly"
tests/storage/memorydump.go
verifyMemoryDumpOutput(memoryDumpPVC, previousOutput, true)
},
Entry("calling endpoint directly", memoryDumpVMSubresource, removeMemoryDumpVMSubresource),
Entry("using virtctl", memoryDumpVirtctl, removeMemoryDumpVirtctl),
Entry("[test_id:8500]using virtctl"
tests/storage/memorydump.go
verifyMemoryDumpOutput(memoryDumpPVC, previousOutput, true)
})

It("Run memory dump to a pvc, remove and run memory dump to different pvc", func() {
[test_id:8503]
tests/storage/memorydump.go
Entry("using virtctl", memoryDumpVirtctl, removeMemoryDumpVirtctl),
)

It("Run multiple memory dumps", func() {
[test_id:8502]
tests/storage/memorydump.go
verifyMemoryDumpOutput(memoryDumpPVC2, previousOutput, true)
})

It("Run memory dump, stop vm and remove memory dump", func() {
[test_id:8506]
tests/storage/memorydump.go
verifyMemoryDumpOutput(memoryDumpPVC, previousOutput, true)
})

It("Run memory dump, stop vm start vm", func() {
[test_id:8515]
Played around with this for a bit and it seems to be working very well! Great job!
I have a couple of simple questions, mostly wondering whether we want to annotate/label the target PVC at all. It is possible for it to outlive the VM.
// PersistentVolumeClaimVolumeSource represents a reference to a PersistentVolumeClaim in the same namespace.
// Directly attached to the virt launcher
// +optional
PersistentVolumeClaim *PersistentVolumeClaimVolumeSource `json:"persistentVolumeClaim,omitempty"`
I wonder if this struct would be better, since I doubt there will be any memory dump targets other than a PVC:
type MemoryDumpVolumeSource struct {
PersistentVolumeClaimVolumeSource `json:",inline"`
}
Yeah, might be a good suggestion.
@@ -228,6 +228,23 @@ func (v *vm) Migrate(name string, migrateOptions *v1.MigrateOptions) error {
	return v.restClient.Put().RequestURI(uri).Body(optsJson).Do(context.Background()).Error()
}

func (v *vm) MemoryDump(name string, memoryDumpRequest *v1.VirtualMachineMemoryDumpRequest) error {
why is this on VM and not VMI?
Hmm, it is true that the memory dump is basically an action on the VMI, and there is a check that the VMI exists and is running. But the action starts by updating the VM, since eventually the memory dump info will live only on the VM and not on the VMI: the output outlives the VMI. Also, the counteraction of memory dump, remove-memory-dump, is an action solely on the VM and can happen even if the VMI doesn't exist.
Do you think the subresource should be on the VMI, with the called function updating the VM status rather than the VMI?
Maybe both VM and VMI should have memorydump endpoints?
Which would both do the same thing of updating the VM status with the memory dump request?
If you mean doing a memory dump without the whole part of updating the VM, I guess it's possible, but then the volume would still be mounted to the VMI. In any case, maybe I can add it later as an improvement, not in this PR.
// ClaimName is the name of the pvc the memory was dumped to
ClaimName string `json:"claimName,omitempty"`
// TargetFileName is the name of the memory dump output
TargetFileName string `json:"targetFileName,omitempty"`
I wonder if it makes sense to somehow also encode some/all of this information as annotations on the target pvc? It doesn't appear that the pvc is annotated at all now.
That is an option. If it's alright with you, I'll add it in a different PR so this PR can be merged soon :)
Yeah, #7517 (comment) sounds nice. I recently added labels to restore PVCs in #7533 so we could have some queryable metrics about them.
/retest
The command will be called from virt-handler and will trigger a core dump in libvirt with the memory-only flag. There will always be only one memory dump command running at a time. Once the dump is complete it will save the information of the last dump, so the next call for the same file will indicate the completion or error of the last dump. Signed-off-by: Shelly Kagan <skagan@redhat.com>
The command calls a subresource endpoint. The VM status is then patched with the memory dump command. Signed-off-by: Shelly Kagan <skagan@redhat.com>
The PVC of the memory dump should be mounted as a directory to be able to dump the memory to it; it doesn't have a disk to mount. Signed-off-by: Shelly Kagan <skagan@redhat.com>
This volume source doesn't have a matching disk; that needs to be taken into account in the verification checks. Signed-off-by: Shelly Kagan <skagan@redhat.com>
The VM status is updated with a memory dump request in the Binding phase. That triggers an update of the VM and VMI volumes with a memory dump volume source, and the phase changes to InProgress. During that time the memory dump volume is mounted and the dump is performed. Once it is done or has failed, the volume status in the VMI is updated with a timestamp or an error, and the state moves to Unmounting, which updates the VMI volume list to exclude the memory dump volume source and unmounts the PVC. Once it is removed from the VMI volume status list, the phase is updated to Completed. Signed-off-by: Shelly Kagan <skagan@redhat.com>
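The request lifecycle described in this commit message can be sketched as a small state machine. The phase names and the `nextPhase` helper below are hypothetical illustrations of the flow, not the actual constants in the KubeVirt API package.

```go
package main

import "fmt"

// MemoryDumpPhase models the phases named in the commit message
// (hypothetical names for illustration).
type MemoryDumpPhase string

const (
	PhaseBinding    MemoryDumpPhase = "Binding"
	PhaseInProgress MemoryDumpPhase = "InProgress"
	PhaseUnmounting MemoryDumpPhase = "Unmounting"
	PhaseCompleted  MemoryDumpPhase = "Completed"
	PhaseFailed     MemoryDumpPhase = "Failed"
)

// nextPhase advances the request along the path
// Binding -> InProgress -> Unmounting -> Completed,
// branching to Failed when the dump itself fails.
func nextPhase(p MemoryDumpPhase, dumpFailed bool) MemoryDumpPhase {
	switch p {
	case PhaseBinding:
		return PhaseInProgress
	case PhaseInProgress:
		if dumpFailed {
			return PhaseFailed
		}
		return PhaseUnmounting
	case PhaseUnmounting:
		return PhaseCompleted
	}
	return p // terminal phases stay put
}

func main() {
	p := PhaseBinding
	for p != PhaseCompleted && p != PhaseFailed {
		fmt.Println(p)
		p = nextPhase(p, false)
	}
	fmt.Println(p)
}
```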
The VMI gets an add-volume update; adjustments were added to identify a memory dump volume source as a hotplug. The volume is added to the volume status list the same way a hotplug is. Once it is attached to virt-launcher the phase is updated to Mounted, at which point the memory dump is triggered and the phase is updated to InProgress. The virt-handler reconcile loop then keeps checking for memory dump completion; once completed it updates the phase and the timestamp of the memory dump, which is received in the VM watch and triggers unmounting of the memory dump volume source. Signed-off-by: Shelly Kagan <skagan@redhat.com>
The command will dissociate the memory dump: it removes the memory dump from the volumes list and the memory dump request from the VM status. Signed-off-by: Shelly Kagan <skagan@redhat.com>
Signed-off-by: Shelly Kagan <skagan@redhat.com>
Signed-off-by: Shelly Kagan <skagan@redhat.com>
Add a validation that the PVC size is large enough to contain the memory dump. The required size is the VM memory plus a hard-coded overhead of 100Mi, which proved sufficient after some checking and testing examples. Also moved the PVC validation from virtctl to the subresource. Signed-off-by: Shelly Kagan <skagan@redhat.com>
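The size check from this commit can be sketched as below. To stay self-contained the sketch uses plain byte counts instead of Kubernetes `resource.Quantity`; the function name and error text are illustrative, while the 100Mi overhead value comes from the commit message.

```go
package main

import "fmt"

// memoryDumpOverheadBytes is the hard-coded 100Mi overhead added on
// top of the VM memory (value taken from the commit message).
const memoryDumpOverheadBytes = int64(100) * 1024 * 1024

// pvcLargeEnough is a hypothetical sketch of the validation: the PVC
// must hold the VM memory plus the fixed overhead.
func pvcLargeEnough(pvcSizeBytes, vmMemoryBytes int64) error {
	required := vmMemoryBytes + memoryDumpOverheadBytes
	if pvcSizeBytes < required {
		return fmt.Errorf("pvc size %d bytes is smaller than required %d bytes", pvcSizeBytes, required)
	}
	return nil
}

func main() {
	gi := int64(1) << 30 // 1Gi in bytes
	fmt.Println(pvcLargeEnough(2*gi, gi)) // 2Gi PVC for 1Gi memory: fits
	fmt.Println(pvcLargeEnough(gi, gi))   // 1Gi PVC for 1Gi memory: too small
}
```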
…umeSource Signed-off-by: Shelly Kagan <skagan@redhat.com>
Signed-off-by: Shelly Kagan <skagan@redhat.com>
* Unite memory-dump and remove-memory-dump into the same command with an action argument, "get" or "remove"
* Make the claim-name flag optional in case we want to dump to the same PVC already associated with the VM
* Move the phase assignment from virtctl to the subresource
* Make sure a remove can't be triggered while another remove is in progress
Signed-off-by: Shelly Kagan <skagan@redhat.com>
Instead of polling for memory dump completion via repeated memory dump commands, completion or failure is now updated on the domain. Adjusted code and unit tests accordingly. Signed-off-by: Shelly Kagan <skagan@redhat.com>
/lgtm
/approve
excellent work!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: davidvossel. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest-required
1 similar comment
/retest-required
What this PR does / why we need it:
This PR implements virtctl command of getting memory dump to a pvc.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #
Special notes for your reviewer:
Release note: