Poor performance with Mariadb, slower than ext4 #7573
We have been talking about this in IRC. Here are some off-CPU time graphs that he also collected: https://svgshare.com/i/6rA.svg. On ZFS, mysql spends more than 7 times as long off-CPU, which correlates well with the difference in run times.

What is happening here is that ext4 quickly queues things to be processed asynchronously. Our AIO implementation, by contrast, is a compatibility shim that the POSIX standard allows rather than a properly asynchronous implementation. AIO writes in ZFS are handled asynchronously only after the buffer has been copied into the DMU, rather than before the copy, and we do no zero-copy, while ext4 likely can. Worse, AIO reads block until they complete, and this workload is dominated by reads. Compressed ARC likely isn't doing us any favors here either.

At the very least, we need to modify the AIO implementation to handle reads asynchronously, so that the callback is issued by the DMU. That should speed this up nicely.
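For anyone who wants to reproduce these measurements: off-CPU time can be collected with bcc's offcputime tool (from the same toolset used for the latency histograms below), and the compressed ARC theory is easy to test because the module parameter can be changed at runtime. A minimal sketch, assuming mysqld is the process of interest and a 30-second capture window:

# Capture 30 seconds of off-CPU stacks for mysqld (PID lookup is illustrative)
/usr/share/bcc/tools/offcputime -p "$(pgrep -xo mysqld)" 30 > mysqld-offcpu.stacks

# Temporarily disable compressed ARC, rerun the query, then restore the default
echo 0 > /sys/module/zfs/parameters/zfs_compressed_arc_enabled
# ... run the workload and compare timings ...
echo 1 > /sys/module/zfs/parameters/zfs_compressed_arc_enabled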
Rajil,
There was a talk about tuning ZFS for MySQL at the last user conference;
perhaps start with that:
https://vimeo.com/album/5150026/video/266112233
-Alek
On Mon, May 28, 2018, 14:29 rajil ***@***.***> wrote:
System information

Type                  Version/Name
Distribution Name     Gentoo
Distribution Version
Linux Kernel          4.9.95-gentoo
Architecture          x86_64
ZFS Version           0.7.9-r0-gentoo
SPL Version           0.7.9-r0-gentoo

Describe the problem you're observing
I have a MariaDB database (dev-db/mariadb-10.1.31-r1) which performs
poorly with ZFS. However, with ext4 it is pretty quick.

On ext4 the query completes in 18 seconds.
On ZFS it takes 2 minutes 20 seconds.
Describe how to reproduce the problem

Include any warning/errors/backtraces from the system logs

Here are some stats:
ext4
# /usr/share/bcc/tools/ext4dist 500
In file included from /virtual/main.c:3:
/lib/modules/4.9.95-gentoo/build/include/linux/fs.h:2700:9: warning: comparison of unsigned enum expression < 0 is always false [-Wtautological-compare]
if (id < 0 || id >= READING_MAX_ID)
~~ ^ ~
1 warning generated.
Tracing ext4 operation latency... Hit Ctrl-C to end.
^C
23:34:20:
operation = read
usecs : count distribution
0 -> 1 : 533984 |****************************************|
2 -> 3 : 106 | |
4 -> 7 : 103 | |
8 -> 15 : 293 | |
16 -> 31 : 66 | |
32 -> 63 : 3 | |
operation = write
usecs : count distribution
0 -> 1 : 55377 |****************************************|
2 -> 3 : 29 | |
4 -> 7 : 10 | |
8 -> 15 : 66 | |
ZFS
# /usr/share/bcc/tools/zfsdist 500
In file included from /virtual/main.c:3:
/lib/modules/4.9.95-gentoo/build/include/linux/fs.h:2700:9: warning: comparison of unsigned enum expression < 0 is always false [-Wtautological-compare]
if (id < 0 || id >= READING_MAX_ID)
~~ ^ ~
1 warning generated.
Tracing ZFS operation latency... Hit Ctrl-C to end.
^C
23:59:12:
operation = b'open'
usecs : count distribution
0 -> 1 : 1204 |****************************************|
2 -> 3 : 25 | |
4 -> 7 : 5 | |
8 -> 15 : 3 | |
operation = b'read'
usecs : count distribution
0 -> 1 : 275123 |****************************************|
2 -> 3 : 10138 |* |
4 -> 7 : 18033 |** |
8 -> 15 : 1329 | |
16 -> 31 : 158 | |
32 -> 63 : 16 | |
64 -> 127 : 10 | |
128 -> 255 : 4419 | |
256 -> 511 : 171583 |************************ |
512 -> 1023 : 60302 |******** |
1024 -> 2047 : 3719 | |
2048 -> 4095 : 2444 | |
4096 -> 8191 : 22 | |
operation = b'fsync'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 1 |***** |
128 -> 255 : 8 |****************************************|
256 -> 511 : 3 |*************** |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 1 |***** |
operation = b'write'
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 5 | |
4 -> 7 : 287 | |
8 -> 15 : 42922 |****************************** |
16 -> 31 : 55692 |****************************************|
32 -> 63 : 1338 | |
64 -> 127 : 42 | |
128 -> 255 : 1 | |
256 -> 511 : 16 | |
512 -> 1023 : 81 | |
1024 -> 2047 : 294 | |
2048 -> 4095 : 11 | |
4096 -> 8191 : 2 | |
@alek-p He followed the tuning advice that I posted on the openzfs wiki: http://open-zfs.org/wiki/Performance_tuning#MySQL

In this situation, his workload is hitting a genuine deficiency in our ZPL code. The AIO bits are a compatibility shim that does the absolute minimum needed to conform to the POSIX standard. I wrote it that way to avoid having to make riskier changes to the code base at the time. I knew the day would come when we needed to replace it; I just did not expect it to take this long before someone found a workload that suffered because of it. In any case, I think the compatibility shim did more good than harm, so I do not consider it a mistake on my part back then.

Hopefully, I'll be able to make time to implement my vision for what the replacement should be. I still need to finish #7423, but the immediate fire there is out, so I am willing to take my time with it. We'll see which one I finish first.
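For context, the dataset-level advice on that wiki page amounts to something like the following. This is a sketch of the commonly cited InnoDB tuning, not a verbatim copy of the wiki; the pool and dataset names are invented:

# Match recordsize to the InnoDB page size and bias the data toward throughput
zfs create -o recordsize=16k -o primarycache=metadata -o logbias=throughput tank/db/data
# Keep the sequentially written InnoDB log files on a separate dataset
zfs create -o recordsize=128k -o primarycache=metadata tank/db/log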
After a reboot, performance is back to being poor: about 3 minutes 9 seconds.
I guess I was wrong about the shim doing more good than harm, at least in this case. I have updated the openzfs wiki to include this tip: http://open-zfs.org/wiki/Performance_tuning#InnoDB

I regret that this issue eluded us for so long. :/
Based on @rajil's comment above, the key to improving performance was to disable InnoDB's atomic writes.
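A minimal sketch of how that might look in the MariaDB configuration, assuming the variable is innodb_use_atomic_writes (verify the exact name against your MariaDB version's documentation):

# Variable name is assumed; paths and service name are illustrative
cat >> /etc/mysql/my.cnf <<'EOF'
[mysqld]
innodb_use_atomic_writes = 0
EOF
rc-service mysql restart    # OpenRC restart, as on the reporter's Gentoo system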
I have some suspicions here. If the dbdata dataset had logbias=throughput, it will be full of indirect sync writes, and those are awful for random reads: they likely double the IOPs needed for a given amount of data, and they nearly eliminate the ability to aggregate reads of either data or metadata. This is made worse because 16K is so small. If the OP is still around and can supply details such as the logbias setting and some zpool iostat -r output, it would become fairly clear.

ZFS is not slower when properly set up. For a random-read-heavy workload, the instructions on the wiki will result in badly fragmented data and metadata, and a high IOP count to service them. Doing the following would likely work much better:

- 32K recordsize (preserves locality of input data)

Edit: If you want to understand the problem, test the IO and disks in isolation to figure out whether your metadata is badly fragmented and whether the rest of the pipeline is healthy (see the sketch below). Clear caches, run zfs send >/dev/null, and watch zpool iostat -r; this will likely show you many isolated 4K reads that cannot be merged. Then zfs send | zfs receive the dataset into a new one, clear caches again, and zfs send the new copy >/dev/null while you watch: you will likely see much better read IO merging. If zfs send and receive can't get clean IO out of the pool, nothing else will be able to; they make great testing tools for finding performance problems.
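As a concrete sketch of that test procedure (the pool, dataset, and snapshot names are placeholders; exporting and re-importing the pool is one way to clear the ARC):

# Snapshot the dataset so it can be streamed
zfs snapshot tank/db@test

# Clear the ARC by cycling the pool, then stream the data to /dev/null
# while watching the request-size histograms in another terminal:
#   zpool iostat -r tank 5
zpool export tank && zpool import tank
zfs send tank/db@test > /dev/null

# Rewrite the data sequentially via send/receive and repeat the read test
zfs send tank/db@test | zfs receive tank/db_rewritten
zpool export tank && zpool import tank
zfs send tank/db_rewritten@test > /dev/null   # read merging should improve here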
Eliminating RMW from kernel interactions with ZFS will also make a huge difference and allow you to use larger records efficiently.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information

Type                  Version/Name
Distribution Name     Gentoo
Distribution Version
Linux Kernel          4.9.95-gentoo
Architecture          x86_64
ZFS Version           0.7.9-r0-gentoo
SPL Version           0.7.9-r0-gentoo
Describe the problem you're observing

I have a MariaDB database (dev-db/mariadb-10.1.31-r1) which performs poorly with ZFS. However, with ext4 it is pretty quick.

On ext4 the query completes in 18 seconds.
On ZFS it takes 2 minutes 20 seconds.
On ZFS with /sys/module/zfs/parameters/zfs_compressed_arc_enabled=0, it takes 3 minutes 50 seconds.
Describe how to reproduce the problem

I am using Mythtv-0.29 to populate an xmltv database.

Include any warning/errors/backtraces from the system logs
Here are some stats (the full ext4 and ZFS latency distributions are reproduced in the quoted reply above):

- ext4
- ZFS
- ZFS with /sys/module/zfs/parameters/zfs_compressed_arc_enabled=0
- Mariadb config
- ZFS dataset properties