
Add support for disk cache writeback #6968

Merged · 3 commits merged on Jan 10, 2022

Conversation

@vladikr (Member) commented Dec 17, 2021

What this PR does / why we need it:
This is a follow-up PR to #3144. It allows users to set the disk cache mode to writeback.

Release note:
Added Writeback disk cache support
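For a sense of what this enables, here is a minimal sketch of a VMI disk requesting the new cache mode. The field and constant names (v1.Disk.Cache, v1.CacheWriteBack, the kubevirt.io/api/core/v1 import path) reflect KubeVirt's v1 API as I understand it and should be treated as assumptions, not the PR's exact code:

package main

import (
	"fmt"

	v1 "kubevirt.io/api/core/v1"
)

func main() {
	// Request the new writeback cache mode on a VMI disk.
	// v1.CacheWriteBack ("writeback") joins the existing
	// v1.CacheNone ("none") and v1.CacheWriteThrough ("writethrough").
	disk := v1.Disk{
		Name:  "ephemeral-disk5",
		Cache: v1.CacheWriteBack,
		DiskDevice: v1.DiskDevice{
			Disk: &v1.DiskTarget{Bus: "virtio"},
		},
	}
	fmt.Printf("disk %q uses cache mode %q\n", disk.Name, disk.Cache)
}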

@kubevirt-bot added the release-note (Denotes a PR that will be considered when it comes time to generate release notes.), dco-signoff: yes (Indicates the PR's author has DCO signed all their commits.), and size/S labels on Dec 17, 2021
@vladikr (Member, Author) commented Dec 17, 2021

/cc @cedbossneo

@kubevirt-bot (Contributor):

@vladikr: GitHub didn't allow me to request PR reviews from the following users: cedbossneo.

Note that only kubevirt members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @cedbossneo


@rmohr (Member) left a comment

In general I have no issues with this change; I'm just wondering if we should somehow make it visible to people that this is a setting with a higher-than-usual chance of corrupting your disks.

@davidvossel (Member):

/cc @davidvossel

@vladikr (Member, Author) commented Dec 17, 2021

In general I have no issues with this change; I'm just wondering if we should somehow make it visible to people that this is a setting with a higher-than-usual chance of corrupting your disks.

I suppose there is always a risk, but nothing like the unsafe cache mode, which we don't allow in KubeVirt.
IIRC, writethrough is the safest. My understanding is that writeback caches on the host but still requires the guest OS to flush, which keeps it safe, but I'm not 100% sure :)

I think there are different nuances with each of these cache modes, and we can definitely document that. I'll try to find whether something is written already or reach out to the QEMU maintainers.

@vladikr (Member, Author) commented Dec 20, 2021

/retest

@vladikr (Member, Author) commented Dec 20, 2021

@rmohr Kevin Wolf wrote this to OpenStack back at the time; I'll contact him directly too.
According to this, it's better for us to choose writeback instead of writethrough when direct I/O is not available.

    The thing that makes 'writethrough' so safe against host crashes is
    that it never keeps data in a "write cache", but it calls fsync()
    after _every_ write. This is also what makes it horribly slow. But
    'cache=none' doesn't do this and therefore doesn't provide this kind
    of safety. The guest OS must explicitly flush the cache in the
    right places to make sure data is safe on the disk. And OSes do
    that.

    So if 'cache=none' is safe enough for you, then 'cache=writeback'
    should be safe enough for you, too -- because both of them have the
    boolean 'cache.writeback=on'. The difference is only in
    'cache.direct', but 'cache.direct=on' only bypasses the host kernel
    page cache and data could still sit in other caches that could be
    present between QEMU and the disk (such as commonly a volatile write
    cache on the disk itself).
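To make the quoted explanation concrete: per QEMU's documented semantics, each named cache mode decomposes into two booleans. The sketch below is purely illustrative (the struct and map are mine, not KubeVirt or QEMU code) and encodes the three modes KubeVirt accepts:

package main

import "fmt"

// qemuCacheFlags models the two booleans behind QEMU's named cache modes.
type qemuCacheFlags struct {
	writeback bool // cache.writeback: writes complete once they reach the host page cache
	direct    bool // cache.direct: bypass the host page cache via O_DIRECT
}

// Mapping per QEMU's documented semantics; "unsafe" is omitted because
// KubeVirt does not allow it.
var cacheModes = map[string]qemuCacheFlags{
	"writethrough": {writeback: false, direct: false}, // flush after every write: safe but slow
	"none":         {writeback: true, direct: true},   // skips host page cache; guest flushes still required
	"writeback":    {writeback: true, direct: false},  // uses host page cache; guest flushes still required
}

func main() {
	for mode, f := range cacheModes {
		fmt.Printf("cache=%s -> cache.writeback=%v, cache.direct=%v\n", mode, f.writeback, f.direct)
	}
}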

@maya-r (Contributor) commented Dec 20, 2021

/lgtm

@kubevirt-bot added the lgtm label (Indicates that a PR is ready to be merged.) on Dec 20, 2021
@iholder101 (Contributor):

Out of curiosity - why didn't we have write-back cache mode until now?

@rmohr (Member) commented Dec 22, 2021

@rmohr Kevin Wolf wrote this to OpenStack back at the time; I'll contact him directly too. According to this, it's better for us to choose writeback instead of writethrough when direct I/O is not available. [...]

Sounds like this is something we would want to choose for the user then, if no explicit mode is provided, right?

@vladikr (Member, Author) commented Dec 22, 2021

Sounds like this is something we would want to choose for the user then, if no explicit mode is provided, right?

@rmohr yeah, only on filesystems without O_DIRECT support. Today we default to writethrough:
https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/converter/converter.go#L414-L425
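For illustration, here is a minimal sketch of the kind of probe involved (hypothetical helper names and path, not the converter's actual functions): on Linux, a filesystem that cannot do direct I/O, such as tmpfs, rejects O_DIRECT opens with EINVAL, which is how one can detect that cache=none is unavailable and fall back.

package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// supportsDirectIO reports whether path can be opened with O_DIRECT
// (Linux-only). An EINVAL from open() means the filesystem does not
// support direct I/O.
func supportsDirectIO(path string) (bool, error) {
	f, err := os.OpenFile(path, os.O_RDONLY|syscall.O_DIRECT, 0)
	if err != nil {
		if errors.Is(err, syscall.EINVAL) {
			return false, nil
		}
		return false, err
	}
	f.Close()
	return true, nil
}

// pickDefaultCache mirrors the idea discussed above: prefer "none"
// (which implies O_DIRECT) when the backing filesystem supports it,
// and fall back to "writeback" rather than the slower "writethrough".
func pickDefaultCache(path string) (string, error) {
	direct, err := supportsDirectIO(path)
	if err != nil {
		return "", err
	}
	if direct {
		return "none", nil
	}
	return "writeback", nil
}

func main() {
	mode, err := pickDefaultCache("/var/lib/kubevirt/disk.img") // hypothetical path
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("default cache mode:", mode)
}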

@vladikr (Member, Author) commented Dec 22, 2021

Out of curiosity - why didn't we have write-back cache mode until now?

Previously, the use of the writeback disk cache prevented the VMI from being live migrated, and we decided that the migratability of a workload outweighs the benefits of the writeback cache. That is no longer the case.

@rmohr (Member) left a comment

Looks good to me. I would love to see migrations covered in the tests as well.

@@ -1954,17 +1958,22 @@ var _ = Describe("[sig-compute]Configurations", func() {
Expect(disks[1].Alias.GetName()).To(Equal("ephemeral-disk2"))
Expect(disks[1].Driver.Cache).To(Equal(cacheWritethrough))

By("checking if requested cache 'writeback' has been set")
Expect(disks[2].Alias.GetName()).To(Equal("ephemeral-disk5"))
Expect(disks[2].Driver.Cache).To(Equal(cacheWriteback))
Review comment (Member):

Can you also adjust one migration test to ensure that we can migrate when it is set?

Review comment (Member):

+1 to this. That's the only thing I see missing here.

@vladikr (Member, Author) replied:

Yup, good point. Added.
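For reference, roughly what such a migration test looks like; the helper names below (tests.NewRandomFedoraVMIWithGuestAgent, tests.RunVMIAndExpectLaunch, tests.NewRandomMigration, tests.RunMigrationAndExpectCompletion) are modeled on KubeVirt's test utilities and are assumptions, not the PR's exact code:

// Hypothetical sketch modeled on KubeVirt's migration tests; helper
// names are assumptions, not the PR's exact code.
It("should successfully migrate with a WriteBack disk cache", func() {
	vmi := tests.NewRandomFedoraVMIWithGuestAgent()
	// Request the writeback cache mode on the VMI's first disk.
	vmi.Spec.Domain.Devices.Disks[0].Cache = v1.CacheWriteBack

	vmi = tests.RunVMIAndExpectLaunch(vmi, 240)

	// Migrate and verify completion with the writeback cache set.
	migration := tests.NewRandomMigration(vmi.Name, vmi.Namespace)
	tests.RunMigrationAndExpectCompletion(virtClient, migration, tests.MigrationWaitTime)
})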

cedbossneo and others added 3 commits January 5, 2022 12:12
Signed-off-by: Cedric Hauber <hauber.c@gmail.com>
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
Signed-off-by: Vladik Romanovsky <vromanso@redhat.com>
@kubevirt-bot removed the lgtm label (Indicates that a PR is ready to be merged.) on Jan 5, 2022
@vladikr (Member, Author) commented Jan 5, 2022

/retest

@rmohr (Member) left a comment

/lgtm
/approve

👍

@@ -528,6 +528,39 @@ var _ = Describe("[Serial][rfe_id:393][crit:high][vendor:cnv-qe@redhat.com][leve
By("Waiting for VMI to disappear")
tests.WaitForVirtualMachineToDisappearWithTimeout(vmi, 240)
})
It("should be successfully migrate with a WriteBack disk cache", func() {
Review comment (Member):

s/migrate/migrating/

@kubevirt-bot added the lgtm label (Indicates that a PR is ready to be merged.) on Jan 10, 2022
@kubevirt-bot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: rmohr

Approvers can indicate their approval by writing /approve in a comment, and cancel it by writing /approve cancel.
@kubevirt-bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Jan 10, 2022
@kubevirt-commenter-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.
