Terribly slow virtio write speeds in Windows guest and recommendation of latest Virtio drivers #320
My Setup: WD Velociraptor Drives 10K RPM
There have been lots of conversations about this in IRC; unfortunately I don't know whether the folks who were asking about this in IRC correspond to the folks who have commented on this bug. For whatever reason, we don't see the same data. As I've mentioned in IRC, folks who are seeing this are going to have to drive the investigation and help us understand where the latency is coming from so we can get to the bottom of this.

Specifically, you should start by understanding what the ZFS-layer latency is to QEMU: in other words, what I/O latency QEMU sees from ZFS for a given request. That will immediately help us rule one area of the problem in or out. The best way to gather this data is with DTrace. Once that data is firmly in hand for everyone's unique case, we can go from there. Keep in mind that differences in hardware capabilities, etc. may change things greatly here.

What we're trying to do is break this down into one of several areas to focus our attention on. There could be a problem with how the host is issuing the I/O that QEMU is requesting and how it is syncing it out. Note that the I/O patterns will look different from Windows to other KVM guests to a zone, so it's important to focus specifically on the latency that the Windows guest is seeing.

In addition to the FS I/O, there is the question of how much time the QEMU process in the host spends waiting to service the I/O: e.g. the length of time from when QEMU knows about the I/O to when it begins to service it, and the length of time from the I/O being completed to QEMU notifying the guest. These lengths of time will help us understand what is going on in the process, where we can still easily observe what's happening. After that, the next thing to understand is how Windows is issuing I/O requests and what it is seeing. But before we go and dig into that, we should take a stab at what is much easier to observe.
If someone could post a DTrace script to run, with instructions, it would be helpful. I'd provide SmartOS developers access to the machine if I could, but as it stands it sits behind a corporate firewall.
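As a starting point, something like the following generic DTrace sketch (run as root on the host) aggregates block-I/O latency per device using the stable `io` provider. This is not a script blessed by the SmartOS developers; it measures physical I/O latency underneath the host's storage stack, which is one of the layers the previous comment asks to be ruled in or out.

```d
#!/usr/sbin/dtrace -s
/* Sketch: distribution of physical I/O latency per device.
 * io:::start / io:::done are stable illumos probes; arg0 is the
 * buf pointer, used here as a key to match start to completion. */
#pragma D option quiet

io:::start
{
        ts[arg0] = timestamp;
}

io:::done
/ts[arg0]/
{
        @lat[args[1]->dev_statname] = quantize(timestamp - ts[arg0]);
        ts[arg0] = 0;
}

tick-10s
{
        printa(@lat);
        trunc(@lat);
}
```

Run it while the guest benchmark is going; the quantize() histograms give a quick read on whether multi-millisecond latency is already present at the disk layer.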
I'm pretty sure this is the old lack-of-log problem that has been discussed on the mailing list several times. Just to double check (all my previous testing was with Linux guests), I tried the CrystalDiskMark test in a Win 2008 VM. The pool on my test box is a stripe of 4 mirrors of 2.5" 7200 RPM laptop drives, so it's certainly not flash! Anyway, without the logs the write performance was a blistering 12 MB/s. With two 100 GB Intel 3700s added as logs, this increased to 140 MB/s, more than an order of magnitude improvement. It is worth running "iostat -xtcMn" on the host while running these tests to see how busy the pool disks are. Without a log, I typically see (for one mirror):
When I add the logs:
The two log devices:
One of the mirrors plus the pool totals:
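For anyone wanting to reproduce the comparison above, attaching a mirrored log and watching the disks might look like the following sketch (pool and device names are placeholders, not taken from this thread):

```shell
# Sketch: add a mirrored SLOG to pool "tank"; c1t4d0/c1t5d0 are
# hypothetical device names -- substitute your own SSDs.
zpool add tank log mirror c1t4d0 c1t5d0

# Per-device activity while the guest benchmark runs:
iostat -xtcMn 5
```

Mirroring the log is optional but protects in-flight synchronous data against a single SSD failure.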
Here is a screenshot of the CrystalDiskMark data. The read numbers are pretty meaningless; the data would have been read from RAM due to the small file size. The 4K QD32 random write really shows the gains from log devices:
@IanCollins According to the original OpenZFS documentation, the maximum useful size of a separate ZIL device is half the available RAM. So unless you have 200 GB of RAM on this machine, you cannot sanely throw in a 100 GB log device. I added two 20 GB Samsung 840 Pros and couldn't get past 60 MB/s at sequential writes, which I guess has to do with running more than a few VMs with write-intensive usage on the same machine, since the speed divides between them (maybe not the case in your tests). Which, unfortunately, is also subpar to what Linux VMs give you on the same host.
The ZIL device can be as big as you want. Only a small part will actually be used (cf. the comment in the documentation, "because that is the maximum amount of potential in-play data that can be stored"), but the overall size doesn't really matter. You'll probably get longer life out of a bigger device if its wear levelling works well. I use 200 GB 3700s on production systems simply because they have double the IOPS of the 100 GB part. Samsung 840 Pros aren't really a write-optimised SSD, and I don't think they have power-fail protection. I've found them fine for cache devices, but not for logs. For log devices you need to look for high sustained random-write IOPS, durability and power-fail protection. 3700s push all the right buttons. If they aren't good enough, consider a RAM-based ZIL, such as a ZeusRAM.
Hmm. The problem is, no one these days wants to throw more money at hardware to solve a software problem which doesn't exist in Linux VMs on the same host doing the same synchronous I/O. The problem, needless to say, also does NOT exist in other KVM implementations!
I see a similar increase. Without a log device, my pool, which consists of 2 x RAIDZ2 vdevs with 8 x 15k SAS disks each, can only do 34.72 MB/s. Adding in two SSDs similar to what @IanCollins is using (I have the 100 GB instead of the 200 GB, so not as good) as log devices, that figure jumps to 270 MB/s. @IanCollins I am however curious as to how you're pushing those sorts of read rates. Although they aren't representative of the disks, the highest my CrystalDiskMark has ever shown is 1050 MB/s (even when I had a cache drive).
@bassu The "problem" isn't with SmartOS KVM, it's the way ZFS handles synchronous writes. You would probably observe the same numbers with KVM on Linux if the host used ZFS on Linux. SmartOS KVM could probably be set up to lie and not wait for writes to complete, or to add a RAM cache for the virtual drive. You could get the same effect by disabling the ZIL (DON'T!: http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29). Both would boost performance at the expense of data integrity. Or you could add an appropriate SSD.
@davefinster See my note about reading from RAM!
@IanCollins I didn't see this problem on ZFS on Linux (although there were many other problems).
@IanCollins I did! That's why I knew it wasn't representative of the disks, but rather the ARC. What I'm wondering is what factors would influence the read performance from the ARC?
@davefinster Use a good SATA3 HBA. That's most of it.
@bassu Using an LSI SAS 9207 HBA, but Ian was referring to the ARC in RAM, not the L2ARC. I also ran my tests on an idle machine.
@bassu "The problem is, no one these days wants to throw in more money on hardware to solve a software problem, which, doesn't exist in Linux VMs on the same host doing the same synchronous IO."

@IanCollins "@bassu The "problem" isn't with SmartOS KVM, its the way ZFS handles synchronous writes."

In truth, it's neither. This is not a software problem; it's a design choice. We have chosen to have KVM's writes to zvols treated synchronously. Any implementation making this same design choice will have similar performance characteristics for a given identical backing store. ZFS is not really relevant here; it's just doing what it's told.

The tradeoff here is correctness/durability vs perceived performance. It's no different from having a database call fsync() on transaction commit boundaries: if you do it, your clients see fewer transactions per second; if you don't, there are several failure classes within the system that will allow committed transaction state to be lost. Since the backing store for a guest may be used to store transactional state (we have no way to know what you want to use the guest for), the safe choice here is to do I/O synchronously.

There is a third way, but it requires that the guest observe certain semantics with respect to its virtual block devices, specifically the use of SYNCHRONIZE CACHE or an analogous protocol-appropriate command. This is what ZFS does, and it is what makes the use of (hardware) disk write caches safe (provided the disk provides the documented standard semantics). If you are absolutely sure that your guest's filesystem(s), block layer, SCSI/ATA layer (if applicable), and drivers all flush the write cache on transaction boundaries, then it would be safe to change KVM to provide a device that emulates a disk with the write cache enabled. Of course, you would also have to make sure that everything in KVM required for this to work is in place and working correctly, as well as virtio, assuming you're using that instead of ATA or SCSI.

There is no reason this cannot work; however, the combination of numerous known-bad guest OSs and a lack of testing and verification of the KVM and virtio code in this regard makes enabling this behaviour expensive and risky. You're welcome to do so if you know your guest is bug-free in this area, or to simply disable synchronous writes entirely on the relevant zvol if your guest is entirely stateless (or disposable). But in general, the balance of engineering judgment favours the approach we've taken.

Instead of asking why we're slow, you could as well ask why your GNU/Linux distributor's KVM I/O configuration is unsafe. They're two sides of the same coin -- and unfortunately, the common buggy guests we're accommodating with our design choice here are none other than GNU/Linux.

All that said, I still haven't seen compelling evidence that the problem being reported here has anything to do with this.
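The fsync-on-commit trade-off described above is easy to demonstrate outside of ZFS or KVM entirely. The following is an illustrative Python sketch (not part of anyone's setup in this thread): it writes the same data twice, once letting the OS buffer the writes and once forcing each write to stable storage, which is the same durability-vs-throughput choice a hypervisor makes for its guests' disks.

```python
import os
import tempfile
import time

def write_records(path, n_records, record, sync):
    """Write n_records copies of record; optionally fsync after each one."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        t0 = time.perf_counter()
        for _ in range(n_records):
            os.write(fd, record)
            if sync:
                # Durability barrier: analogous to a transaction commit
                # that must not return until data is on stable storage.
                os.fsync(fd)
        return time.perf_counter() - t0
    finally:
        os.close(fd)

if __name__ == "__main__":
    record = b"x" * 4096
    with tempfile.TemporaryDirectory() as d:
        buffered = write_records(os.path.join(d, "buffered"), 500, record, sync=False)
        durable = write_records(os.path.join(d, "durable"), 500, record, sync=True)
        print(f"buffered: {buffered:.4f}s  fsync-per-write: {durable:.4f}s")
```

On spinning disks the fsync path is typically one to two orders of magnitude slower; both files end up with identical contents, which is exactly why a benchmark alone cannot tell you whether your writes are actually durable.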
@wesolows Your answer is obviously correct and complete, Keith, but the design choice (which I completely agree with) does highlight the issue with ZFS and synchronous writes. I'm sure other filesystems give better numbers by sacrificing data integrity, but the perception will remain that SmartOS KVM is "slow". Maybe education (through the wiki?) is the best solution to this recurring "issue"? I also think the installer should offer more help: offering a choice between capacity (raidz2) and performance (mirrors) pool configurations rather than defaulting to raidz2, which is seldom the best choice. Just to lay the SmartOS KVM is "slow" stuff to rest, here are the numbers for the same system with the ZIL disabled: Not bad for a puny little pool on a pimped-up consumer PC!
Just as a note for others on what @IanCollins did, here is the manpage reference. Since it is a _performance vs safety_ choice, these are the direct implications of disabling synchronous I/O, from the ZFS reference:
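For completeness, the knob being discussed is the per-dataset `sync` property. A sketch of inspecting and changing it (the dataset name is a hypothetical placeholder, not from this thread):

```shell
# Per-dataset and reversible. sync=disabled treats synchronous requests
# as asynchronous: data only reaches stable storage at the next
# transaction group commit, so a power loss can silently discard writes
# the guest believed were durable.
zfs get sync zones/myvm-disk0            # inspect the current value
zfs set sync=disabled zones/myvm-disk0   # performance over safety
zfs inherit sync zones/myvm-disk0        # revert to the inherited default
```

Only consider this for guests whose contents are disposable, per the warnings above.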
What drivers are currently accepted as being the fastest for KVM Windows guests? I tried the stable and the latest from Fedora and they don't really impress. (This is with an all-SSD pool, with and without a ZIL.)
If possible, try bhyve instead of KVM. The performance is much better (as in night and day).
On bhyve I use the latest drivers; on KVM I got the best performance with rather old ones: https://pkg.blackdot.be/extras/virtio-win-0.1-49.iso
~ sjorge
I was under the impression that bhyve is still not quite production ready. The few Windows machines we have all run important business workloads, so they're a bad place to experiment.
Is there a way to convert KVM drives to bhyve to run a few tests?
EDIT: The new wiki https://wiki.smartos.org/bhyve/ states "Bhyve is fully supported and production ready," so I should try it indeed.
@sjorge Well, it seems bhyve still has some issues, at least in the documentation department. I simply get a black screen when connecting with VNC (no password set), bhyve does not yet support providing a DHCP server so there are no assigned IPs, and booting from a CD image is also not documented as far as I can see. Does anybody have any experience booting a Windows KVM virtio boot disk in bhyve?
I've only used it for new VMs, but if you are trying to boot an existing one you probably need to set the bootrom to bios, which means no VNC (it works fine with the UEFI bootrom).
To boot from a CD I usually add it afterwards as a 2nd disk, after copying the ISO to the zone root. model=ahci, media=cdrom works fine for those.
Aside from some of the things mentioned (no VNC with the bios bootrom, no cdrom=... on the command line to boot once from a CD), things just work. I have a mix of Windows 10, FreeBSD and OmniOS running (all UEFI though).
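For anyone trying the same, the model=ahci/media=cdrom disk described above might be attached with a vmadm update payload along these lines. This is a sketch based on that description, not a verified recipe: the ISO filename and the assumption that the path is resolved inside the zone root (hence copying the ISO there first) are mine, and the exact supported disk properties should be checked against the vmadm man page for your platform image.

```json
{
  "add_disks": [
    {
      "path": "/win.iso",
      "model": "ahci",
      "media": "cdrom"
    }
  ]
}
```

Applied with something like `vmadm update <uuid> < add-cdrom.json`, then rebooting the VM to boot from the CD.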
~ sjorge
SmartOS with default ZFS params (either dual mirror or raidz1, with or without a SLOG):
Writes outside the guest, or in a Linux guest, average 150 MB/s.
I have compared this to Joyent Cloud, where I see they are using old Joyent-signed virtio drivers, but sequential writes in Windows instances over there are 160 MB/s.
This is easily reproducible with the latest SmartOS and is independent of what hardware is used. zfs_zone_delay_enable was disabled during all tests.
zpool iostat -v screenshot:
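For anyone reproducing these tests, a sketch of how the zfs_zone_delay_enable tunable can be inspected and toggled on a live illumos/SmartOS kernel with mdb (writing live kernel tunables is at your own risk, and the change does not persist across reboot):

```shell
# Read the current value as a decimal integer:
echo 'zfs_zone_delay_enable/D' | mdb -k

# Set it to 0 for the running kernel (-w enables write mode):
echo 'zfs_zone_delay_enable/W 0' | mdb -kw
```

Recording the value alongside benchmark results makes runs comparable across hosts.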