Poor performance with Mariadb, slower than ext4 #7573

Closed

rajil opened this issue May 28, 2018 · 9 comments
Labels: Status: Stale, Type: Performance

rajil commented May 28, 2018

System information

Type                   Version/Name
Distribution Name      Gentoo
Distribution Version
Linux Kernel           4.9.95-gentoo
Architecture           x86_64
ZFS Version            0.7.9-r0-gentoo
SPL Version            0.7.9-r0-gentoo

Describe the problem you're observing

I have a MariaDB database (dev-db/mariadb-10.1.31-r1) that performs poorly on ZFS; on ext4 it is quick.

On ext4 the query takes 18 seconds to complete.
On ZFS it takes 2 minutes 20 seconds.
On ZFS with /sys/module/zfs/parameters/zfs_compressed_arc_enabled=0, it takes 3 minutes 50 seconds.
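(For reference, the tunable was toggled at runtime along these lines; the setting reverts to the default on reboot unless made persistent as a zfs module option:)

echo 0 > /sys/module/zfs/parameters/zfs_compressed_arc_enabled   # disable compressed ARC
cat /sys/module/zfs/parameters/zfs_compressed_arc_enabled        # verify; echo 1 restores the default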

Describe how to reproduce the problem

I am using MythTV 0.29 to populate an XMLTV database:

time /usr/bin/mythfilldatabase  --file --sourceid 1 --xmlfile /tmp/guide/guide.xml

Include any warning/errors/backtraces from the system logs

Here are some stats,

ext4

# /usr/share/bcc/tools/ext4dist 500
In file included from /virtual/main.c:3:
/lib/modules/4.9.95-gentoo/build/include/linux/fs.h:2700:9: warning: comparison of unsigned enum expression < 0 is always false [-Wtautological-compare]
        if (id < 0 || id >= READING_MAX_ID)
            ~~ ^ ~
1 warning generated.
Tracing ext4 operation latency... Hit Ctrl-C to end.
^C
23:34:20:

operation = read
     usecs               : count     distribution
         0 -> 1          : 533984   |****************************************|
         2 -> 3          : 106      |                                        |
         4 -> 7          : 103      |                                        |
         8 -> 15         : 293      |                                        |
        16 -> 31         : 66       |                                        |
        32 -> 63         : 3        |                                        |

operation = write
     usecs               : count     distribution
         0 -> 1          : 55377    |****************************************|
         2 -> 3          : 29       |                                        |
         4 -> 7          : 10       |                                        |
         8 -> 15         : 66       |                                        |

ZFS

# /usr/share/bcc/tools/zfsdist 500
In file included from /virtual/main.c:3:
/lib/modules/4.9.95-gentoo/build/include/linux/fs.h:2700:9: warning: comparison of unsigned enum expression < 0 is always false [-Wtautological-compare]
        if (id < 0 || id >= READING_MAX_ID)
            ~~ ^ ~
1 warning generated.
Tracing ZFS operation latency... Hit Ctrl-C to end.
^C
23:59:12:

operation = b'open'
     usecs               : count     distribution
         0 -> 1          : 1204     |****************************************|
         2 -> 3          : 25       |                                        |
         4 -> 7          : 5        |                                        |
         8 -> 15         : 3        |                                        |

operation = b'read'
     usecs               : count     distribution
         0 -> 1          : 275123   |****************************************|
         2 -> 3          : 10138    |*                                       |
         4 -> 7          : 18033    |**                                      |
         8 -> 15         : 1329     |                                        |
        16 -> 31         : 158      |                                        |
        32 -> 63         : 16       |                                        |
        64 -> 127        : 10       |                                        |
       128 -> 255        : 4419     |                                        |
       256 -> 511        : 171583   |************************                |
       512 -> 1023       : 60302    |********                                |
      1024 -> 2047       : 3719     |                                        |
      2048 -> 4095       : 2444     |                                        |
      4096 -> 8191       : 22       |                                        |

operation = b'fsync'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 1        |*****                                   |
       128 -> 255        : 8        |****************************************|
       256 -> 511        : 3        |***************                         |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 1        |*****                                   |

operation = b'write'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 5        |                                        |
         4 -> 7          : 287      |                                        |
         8 -> 15         : 42922    |******************************          |
        16 -> 31         : 55692    |****************************************|
        32 -> 63         : 1338     |                                        |
        64 -> 127        : 42       |                                        |
       128 -> 255        : 1        |                                        |
       256 -> 511        : 16       |                                        |
       512 -> 1023       : 81       |                                        |
      1024 -> 2047       : 294      |                                        |
      2048 -> 4095       : 11       |                                        |
      4096 -> 8191       : 2        |                                        |

ZFS with /sys/module/zfs/parameters/zfs_compressed_arc_enabled=0

# /usr/share/bcc/tools/zfsdist 500
In file included from /virtual/main.c:3:
/lib/modules/4.9.95-gentoo/build/include/linux/fs.h:2700:9: warning: comparison of unsigned enum expression < 0 is always false [-Wtautological-compare]
        if (id < 0 || id >= READING_MAX_ID)
            ~~ ^ ~
1 warning generated.
Tracing ZFS operation latency... Hit Ctrl-C to end.
^C
00:17:51:

operation = b'open'
     usecs               : count     distribution
         0 -> 1          : 1456     |****************************************|
         2 -> 3          : 19       |                                        |
         4 -> 7          : 5        |                                        |
         8 -> 15         : 1        |                                        |

operation = b'write'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 6        |                                        |
         4 -> 7          : 281      |                                        |
         8 -> 15         : 47095    |*****                                   |
        16 -> 31         : 341626   |****************************************|
        32 -> 63         : 5725     |                                        |
        64 -> 127        : 376      |                                        |
       128 -> 255        : 11       |                                        |
       256 -> 511        : 9        |                                        |
       512 -> 1023       : 54       |                                        |
      1024 -> 2047       : 430      |                                        |
      2048 -> 4095       : 70       |                                        |
      4096 -> 8191       : 29       |                                        |
      8192 -> 16383      : 2        |                                        |

operation = b'read'
     usecs               : count     distribution
         0 -> 1          : 323412   |****************************************|
         2 -> 3          : 12762    |*                                       |
         4 -> 7          : 17341    |**                                      |
         8 -> 15         : 1825     |                                        |
        16 -> 31         : 159      |                                        |
        32 -> 63         : 30       |                                        |
        64 -> 127        : 43       |                                        |
       128 -> 255        : 124      |                                        |
       256 -> 511        : 114149   |**************                          |
       512 -> 1023       : 236825   |*****************************           |
      1024 -> 2047       : 11873    |*                                       |
      2048 -> 4095       : 4039     |                                        |
      4096 -> 8191       : 65       |                                        |

Mariadb config

# cat /etc/mysql/my.cnf | egrep -v "(^#.*|^$)"                  
[client]
port                                            = 3306
socket                                          = /var/run/mysqld/mysqld.sock
[mysql]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=utf8
[mysqladmin]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=utf8
[mysqlcheck]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=utf8
[mysqldump]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=utf8
[mysqlimport]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=utf8
[mysqlshow]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=utf8
[myisamchk]
character-sets-dir=/usr/share/mysql/charsets
[myisampack]
character-sets-dir=/usr/share/mysql/charsets
[mysqld_safe]
err-log                                         = /var/log/mysql/mysql.err
[mysqld]
character-set-server            = utf8
user                                            = mysql
port                                            = 3306
socket                                          = /var/run/mysqld/mysqld.sock
pid-file                                        = /var/run/mysqld/mysqld.pid
log-error                                       = /var/log/mysql/mysqld.err
basedir                                         = /usr
datadir                                         = /var/lib/mysql
skip-external-locking
key_buffer_size                         = 16M
max_allowed_packet                      = 4M
table_open_cache                        = 400
sort_buffer_size                        = 512K
net_buffer_length                       = 16K
read_buffer_size                        = 256K
read_rnd_buffer_size            = 512K
myisam_sort_buffer_size         = 8M
lc_messages_dir                 = /usr/share/mysql
lc_messages                     = en_US
bind-address                            = 127.0.0.1
log-bin
server-id                                       = 1
tmpdir                                          = /tmp/
innodb_buffer_pool_size = 128M
innodb_data_file_path = ibdata1:10M:autoextend:max:128M
innodb_log_file_size = 48M
innodb_log_buffer_size = 8M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_file_per_table
innodb_data_home_dir            = /var/lib/mysql/innodb
innodb_log_group_home_dir       = /var/lib/mysql/logs
innodb_doublewrite = 0
skip-name-resolve
loose-federated
[mysqldump]
quick
max_allowed_packet                      = 16M
[mysql]
[isamchk]
key_buffer_size                         = 20M
sort_buffer_size                        = 20M
read_buffer                             = 2M
write_buffer                            = 2M
[myisamchk]
key_buffer_size                         = 20M
sort_buffer_size                        = 20M
read_buffer_size                        = 2M
write_buffer_size                       = 2M
[mysqlhotcopy]
interactive-timeout
[mariadb]

ZFS dataset properties

# zfs get recordsize rpool/db
NAME      PROPERTY    VALUE    SOURCE
rpool/db  recordsize  128K     default
# zfs get recordsize rpool/db/logs
NAME           PROPERTY    VALUE    SOURCE
rpool/db/logs  recordsize  128K     local
# zfs get recordsize rpool/db/innodb
NAME             PROPERTY    VALUE    SOURCE
rpool/db/innodb  recordsize  16K      local
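(These datasets follow the wiki's MySQL layout. A layout like this is typically created along the following lines, with recordsize matched to InnoDB's 16K page size; names as above:)

zfs create rpool/db
zfs create -o recordsize=16K rpool/db/innodb    # InnoDB data files use 16K pages
zfs create -o recordsize=128K rpool/db/logs     # redo logs, mostly sequential writes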
ryao (Contributor) commented May 28, 2018

We have been talking about this on IRC. Here are some off-CPU time graphs that he also collected:

https://svgshare.com/i/6rA.svg
https://svgshare.com/i/6qm.svg

mysql is spending more than 7 times as much time off-CPU. This correlates well with the difference in run times.

What is happening here is that ext4 quickly queues things to be processed asynchronously. Our AIO implementation, on the other hand, is a compatibility shim that the standard permits rather than a proper asynchronous implementation. AIO writes in ZFS are handled asynchronously only after the buffer has been copied into the DMU, rather than before the copy. We also don't do any zero copy, while ext4 likely can. Reads, meanwhile, block until they have completed, and this workload is dominated by reads. Compressed ARC likely isn't doing us any favors here either.

At the very least, we need to modify the AIO implementation to operate asynchronously on reads so that the callback is done by the DMU. That should speed this up nicely.

alek-p (Contributor) commented May 28, 2018 via email

rajil changed the title from "Poor performance with Mysql, slower than ext4" to "Poor performance with Mariadb, slower than ext4" on May 28, 2018
ryao (Contributor) commented May 28, 2018

@alek-p He followed the tuning advice that I posted on the openzfs wiki:

http://open-zfs.org/wiki/Performance_tuning#MySQL

In this situation, his workload is hitting a genuine deficiency in our ZPL code. The AIO bits are a compatibility shim that does the absolute minimum needed to conform to the POSIX standard. I wrote it that way to avoid having to make more risky changes to the code base at the time.

I knew the day would come when we needed to replace it. I did not expect it to take so long before someone found a workload that suffered because of it. In any case, I think the compatibility shim did more good than harm, so I do not consider it a mistake on my part back then. Hopefully, I'll be able to make time to implement my vision for what the revision should be.

I still need to finish #7423, but the immediate fire there is out, so I am willing to take my time with it. We'll see which one I finish first.

rajil (Author) commented May 28, 2018

After adding innodb_use_atomic_writes = 0 to my.cnf, the query run time on ZFS is 43 seconds (with zfs_compressed_arc_enabled=1). Much better than before.

After a reboot, however, performance is back to poor speeds, ~3 minutes 9 seconds.
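For reference, the change is a single line in the [mysqld] section of the config posted above:

[mysqld]
innodb_use_atomic_writes = 0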

ryao (Contributor) commented May 28, 2018

I guess I was wrong about the shim doing more good than harm, at least in this case. I have updated the openzfs wiki to include this tip:

http://open-zfs.org/wiki/Performance_tuning#InnoDB

I regret that this issue eluded us for so long. :/

shodanshok (Contributor) commented May 29, 2018

Based on @rajil's comment above, the key to improving performance was disabling atomic writes (with innodb_use_atomic_writes=0) rather than disabling AIO (with innodb_use_native_aio=0). Based on the documentation found here, innodb_use_atomic_writes should only have an effect on very specific SSD and filesystem combinations.
So why did disabling atomic writes (and only them) have such a large effect on performance?

bunder2015 added the Type: Performance label on Jan 31, 2019
janetcampbell commented Apr 6, 2019

I have some suspicions here. If the dbdata dataset has logbias=throughput, it will be full of indirect sync writes, and those are awful for random reads: they likely double the IOPS needed for a given amount of space and nearly eliminate the ability to aggregate reads of either data or metadata. This is made worse because 16K is so small.

If the OP is still around and can supply details such as logbias and some zpool iostat -r output, it would be fairly clear. ZFS is not slower when properly set up. The instructions on the wiki will result in badly fragmented data and metadata and a high IOP count to service them, for a random read-heavy workload. Doing the following would likely do much better (a sketch of the corresponding commands follows the list):

32K recordsize (preserves locality of input data)
logbias=latency (preserves locality of data on disk)
primarycache=all (anything other than this usually indicates trouble)
raising zfs_immediate_write_sz to 64K (all data writes will be direct sync) or 256K (all sync writes to both data and log will be direct sync)
OR add an SLOG (even better)
raising txg timeout and dirty_data_sync (reduce RMW from partial writes)
decreasing zfs_sync_taskq_batch_pct (decrease impact of txg commit and rate limit RMW reads)
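A minimal sketch of those settings, assuming the rpool/db/innodb dataset from earlier in the thread; the echoed values are illustrative starting points, not tested recommendations:

zfs set recordsize=32K rpool/db/innodb    # affects newly written blocks only
zfs set logbias=latency rpool/db/innodb
zfs set primarycache=all rpool/db/innodb

echo 65536 > /sys/module/zfs/parameters/zfs_immediate_write_sz      # 64K: data writes become direct sync
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout                # longer txg interval
echo 268435456 > /sys/module/zfs/parameters/zfs_dirty_data_sync     # bytes (zfs_dirty_data_sync_percent on newer releases)
echo 25 > /sys/module/zfs/parameters/zfs_sync_taskq_batch_pct       # reduce txg commit impact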

Edit: If you want to understand the problem, test the IO and disk in isolation to figure out whether your metadata is badly fragmented and whether the rest of the pipeline is healthy. Clear caches, then zfs send >/dev/null while you watch zpool iostat -r. This will likely show you many isolated 4K reads that cannot be merged. Then zfs send | zfs receive the data into another dataset, clear caches, and zfs send the new dataset >/dev/null while you watch again; you will likely see much better read IO merging.

If zfs send and receive can't get clean IO in the pool, nothing else will be able to. They make great testing tools for finding performance problems.
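A rough version of that test, assuming the rpool/db/innodb dataset from this thread; the export/import is a blunt way to empty the ARC between runs, which is impractical if the pool hosts the root filesystem (reboot instead in that case):

zfs snapshot rpool/db/innodb@fragtest
zpool export rpool && zpool import rpool        # empty the ARC
# in a second shell, watch request-size histograms: zpool iostat -r rpool 5
zfs send rpool/db/innodb@fragtest > /dev/null   # expect many unmergeable 4K reads if fragmented

# rewrite the data sequentially, then repeat the read
zfs send rpool/db/innodb@fragtest | zfs receive rpool/db/innodb_copy
zpool export rpool && zpool import rpool
zfs send rpool/db/innodb_copy@fragtest > /dev/null   # read IO should merge much better now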

janetcampbell commented

Eliminating RMW from kernel interactions with ZFS will also make a huge difference and allow you to use larger records efficiently:

#8590

stale bot commented Aug 24, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label on Aug 24, 2020
stale bot closed this as completed on Nov 25, 2020
gedia mentioned this issue on Apr 29, 2023