
ZED constant config_sync #6667

Closed
ryangoode86 opened this issue Sep 22, 2017 · 7 comments
Labels: Component: ZED (ZFS Event Daemon)

Comments


ryangoode86 commented Sep 22, 2017

System information

Type                  Version/Name
Distribution Name     CentOS
Distribution Version  7.4
Linux Kernel          3.10.0-693.2.2.el7.x86_64
Architecture          x86_64
ZFS Version           0.7.1-1
SPL Version           0.7.1-1

Describe the problem you're observing

After upgrading from CentOS 7.3 to 7.4 and removing/reinstalling the required ZFS repositories and packages, ZED continually creates event entries and writes to the pool.

Disabling ZED drops pool activity to zero, and a scrub has yielded no errors so far. This is one pool of three, but the only one showing this "issue".

Describe how to reproduce the problem

I'm unsure how to reproduce it on another system.

Include any warning/errors/backtraces from the system logs

Sep 22 05:31:41 nas01 zed: eid=4482 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:42 nas01 zed: eid=4483 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:42 nas01 zed: eid=4484 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:43 nas01 zed: eid=4485 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:43 nas01 zed: eid=4486 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:44 nas01 zed: eid=4487 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:44 nas01 zed: eid=4488 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:45 nas01 zed: eid=4489 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:45 nas01 zed: eid=4490 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:46 nas01 zed: eid=4491 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:46 nas01 zed: eid=4492 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:46 nas01 zed: eid=4493 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:46 nas01 zed: eid=4494 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:47 nas01 zed: eid=4495 class=config_sync pool_guid=0x2903BDEA141420B6
Sep 22 05:31:47 nas01 zed: eid=4496 class=config_sync pool_guid=0x2903BDEA141420B6
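The rate of the flood above can be quantified directly from syslog. This is a hypothetical pipeline, not an official ZFS tool; it assumes the default rsyslog timestamp format shown in these lines, where the HH:MM:SS timestamp is the third whitespace-separated field:

```shell
# Count config_sync events per second in the system log.
grep 'class=config_sync' /var/log/messages |
  awk '{count[$3]++} END {for (t in count) print t, count[t]}' |
  sort
```

In this report the output would show one to two events every second, which matches the eid counter climbing into the thousands.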
zpool history for the problematic pool:
2017-09-22.01:16:21 zpool export pool1
2017-09-22.01:17:00 zpool import pool1
2017-09-22.01:21:22 zpool upgrade pool1
2017-09-22.01:23:53 zpool clear pool1
2017-09-22.01:35:08 zpool scrub pool1
2017-09-22.01:42:28 zpool scrub -s pool1
2017-09-22.01:49:03 zpool scrub pool1
2017-09-22.01:49:20 zpool scrub -s pool1
2017-09-22.02:46:10 zpool export pool1
2017-09-22.02:47:00 zpool export pool1
2017-09-22.02:48:46 zpool export pool1
2017-09-22.03:11:42 zpool export -f pool1
2017-09-22.03:12:06 zpool import pool1
2017-09-22.03:12:53 zpool export -f pool1
2017-09-22.03:12:58 zpool export -f pool1
2017-09-22.03:15:17 zpool import pool1
2017-09-22.03:32:47 zfs set atime=off pool1
2017-09-22.03:33:09 zpool export -f pool1
2017-09-22.03:33:11 zpool export -f pool1
2017-09-22.03:33:31 zpool import pool1
2017-09-22.03:46:03 zpool export pool1
2017-09-22.03:46:50 zpool import pool1
2017-09-22.03:54:02 zpool export pool1
2017-09-22.03:54:33 zpool import pool1
2017-09-22.03:56:27 zpool scrub pool1
2017-09-22.04:22:28 zpool scrub -p pool1

zpool iostat pool1 -v (while zed active):

                                       capacity     operations     bandwidth
pool                                 alloc   free   read  write   read  write
-----------------------------------  -----  -----  -----  -----  -----  -----
pool1                                22.9T  20.6T    132    822  8.72M  14.1M
  raidz2                             16.1T  5.63T     87    367  5.76M  7.56M
    ata-ST4000DM000-1F2168_Z305MGEK      -      -     14     61   983K  1.27M
    ata-ST4000DM000-1F2168_Z3041PX6      -      -     14     61   983K  1.26M
    ata-ST4000DM000-1F2168_Z303PBJZ      -      -     14     61   983K  1.26M
    ata-ST4000DM000-1F2168_Z304MLBP      -      -     14     61   983K  1.27M
    ata-ST4000DM000-1F2168_S301MCSE      -      -     14     60   983K  1.25M
    ata-ST4000DM000-1F2168_Z304LHB9      -      -     14     60   983K  1.25M
  raidz2                             6.77T  15.0T     45    454  2.96M  6.55M
    ata-ST4000DM000-1F2168_Z304M5SV      -      -      7     76   506K  1.10M
    ata-ST4000DM000-1F2168_Z304LGTW      -      -      7     76   506K  1.09M
    ata-ST4000DM000-1F2168_W301DEPX      -      -      7     77   506K  1.09M
    ata-ST4000DM000-1F2168_S301MCSF      -      -      7     74   506K  1.10M
    ata-ST4000DM000-1F2168_Z307QR9A      -      -      7     74   506K  1.08M
    ata-ST4000DM000-1F2168_Z303F9TK      -      -      7     74   506K  1.08M
-----------------------------------  -----  -----  -----  -----  -----  -----

zpool iostat pool1 -v (while zed inactive):

                                       capacity     operations     bandwidth
pool                                 alloc   free   read  write   read  write
-----------------------------------  -----  -----  -----  -----  -----  -----
pool1                                22.9T  20.6T      6     15   156K   202K
  raidz2                             16.1T  5.63T      3      6  76.8K  89.0K
    ata-ST4000DM000-1F2168_Z305MGEK      -      -      0      1  16.6K  14.7K
    ata-ST4000DM000-1F2168_Z3041PX6      -      -      0      1  11.6K  14.3K
    ata-ST4000DM000-1F2168_Z303PBJZ      -      -      0      1  12.5K  14.3K
    ata-ST4000DM000-1F2168_Z304MLBP      -      -      0      1  11.8K  15.5K
    ata-ST4000DM000-1F2168_S301MCSE      -      -      0      1  11.7K  15.1K
    ata-ST4000DM000-1F2168_Z304LHB9      -      -      0      1  12.8K  15.1K
  raidz2                             6.77T  15.0T      3      9  79.6K   113K
    ata-ST4000DM000-1F2168_Z304M5SV      -      -      0      1  13.4K  18.6K
    ata-ST4000DM000-1F2168_Z304LGTW      -      -      0      1  13.0K  18.6K
    ata-ST4000DM000-1F2168_W301DEPX      -      -      0      1  13.2K  19.6K
    ata-ST4000DM000-1F2168_S301MCSF      -      -      0      1  13.1K  18.5K
    ata-ST4000DM000-1F2168_Z307QR9A      -      -      0      1  12.6K  18.5K
    ata-ST4000DM000-1F2168_Z303F9TK      -      -      0      1  14.3K  19.1K
-----------------------------------  -----  -----  -----  -----  -----  -----
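To compare the active/inactive numbers above over time rather than from single snapshots, the pool-total row can be sampled on an interval. A sketch: `5` is the sampling interval in seconds (standard `zpool iostat` behavior), and the awk filter simply keeps the row whose first column is the pool name:

```shell
# Print only the pool-level totals every 5 seconds.
zpool iostat -v pool1 5 | awk '$1 == "pool1"'
```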

zpool events output after shutting down zed:

Sep 22 2017 05:39:57.677353409 sysevent.fs.zfs.pool_destroy
Sep 22 2017 05:39:57.922353061 sysevent.fs.zfs.config_sync
Sep 22 2017 05:40:31.551305297 sysevent.fs.zfs.history_event
Sep 22 2017 05:40:32.078304549 sysevent.fs.zfs.config_sync
Sep 22 2017 05:40:32.078304549 sysevent.fs.zfs.pool_import
Sep 22 2017 05:40:32.125304482 sysevent.fs.zfs.history_event
Sep 22 2017 05:40:32.623303775 sysevent.fs.zfs.config_sync
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
Sep 22 2017 05:40:37.826296385 sysevent.fs.zfs.vdev_autoexpand
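Repeats like the twelve identical vdev_autoexpand entries above are easier to spot by tallying the event list by class. A sketch assuming the default `zpool events` output with a single header row and the event class as the last field:

```shell
# Count events per class, most frequent first.
zpool events | awk 'NR > 1 {print $NF}' | sort | uniq -c | sort -rn
```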
@ryangoode86
Author

My system has now been up for several days with ZED disabled. Looking at the pool status and iostats, I found that the pool is once again spiking in I/O.

 zpool iostat pool1 -v
                                       capacity     operations     bandwidth
pool                                 alloc   free   read  write   read  write
-----------------------------------  -----  -----  -----  -----  -----  -----
pool1                                22.9T  20.6T  3.64K     16   193M   491K
  raidz2                             16.1T  5.62T  2.60K      7   135M   232K
    ata-ST4000DM000-1F2168_Z305MGEK      -      -    463      1  22.5M  38.8K
    ata-ST4000DM000-1F2168_Z3041PX6      -      -    441      1  22.4M  38.5K
    ata-ST4000DM000-1F2168_Z303PBJZ      -      -    459      1  22.5M  38.5K
    ata-ST4000DM000-1F2168_Z304MLBP      -      -    445      1  22.5M  38.8K
    ata-ST4000DM000-1F2168_S301MCSE      -      -    388      1  22.4M  38.5K
    ata-ST4000DM000-1F2168_Z304LHB9      -      -    461      1  22.5M  38.5K
  raidz2                             6.82T  14.9T  1.04K      8  57.9M   259K
    ata-ST4000DM000-1F2168_Z304M5SV      -      -    189      1  9.67M  43.1K
    ata-ST4000DM000-1F2168_Z304LGTW      -      -    172      1  9.65M  43.2K
    ata-ST4000DM000-1F2168_W301DEPX      -      -    158      1  9.60M  43.5K
    ata-ST4000DM000-1F2168_S301MCSF      -      -    164      1  9.61M  43.1K
    ata-ST4000DM000-1F2168_Z307QR9A      -      -    190      1  9.68M  43.2K
    ata-ST4000DM000-1F2168_Z303F9TK      -      -    192      1  9.68M  43.5K

I'm not finding anything wrong with the pool other than the high I/O, and starting zed immediately resumes the config_sync message spam. I'm unsure what to make of it and can't find anything related by searching.

@ryangoode86
Author

In addition, numerous ZFS worker threads are spawning in conjunction with the disk I/O:

28978 ?        S<     0:00 [z_null_iss]
28979 ?        S<     0:00 [z_null_int]
28980 ?        S<     0:00 [z_rd_iss]
28981 ?        S<     0:00 [z_rd_int_0]
28982 ?        S<     0:00 [z_rd_int_1]
28983 ?        S<     0:00 [z_rd_int_2]
28984 ?        S<     0:00 [z_rd_int_3]
28985 ?        S<     0:00 [z_rd_int_4]
28986 ?        S<     0:00 [z_rd_int_5]
28987 ?        S<     0:00 [z_rd_int_6]
28988 ?        S<     0:00 [z_rd_int_7]
28989 ?        S<     0:03 [z_wr_iss]
28990 ?        S<     0:03 [z_wr_iss]
28991 ?        S<     0:03 [z_wr_iss]
28992 ?        S<     0:03 [z_wr_iss]
28993 ?        S<     0:03 [z_wr_iss]
28994 ?        S<     0:03 [z_wr_iss]
28995 ?        S<     0:00 [z_wr_iss_h]
28996 ?        S<     0:01 [z_wr_int_0]
28997 ?        S<     0:01 [z_wr_int_1]
28998 ?        S<     0:01 [z_wr_int_2]
28999 ?        S<     0:01 [z_wr_int_3]
29000 ?        S<     0:01 [z_wr_int_4]
29001 ?        S<     0:01 [z_wr_int_5]
29002 ?        S<     0:01 [z_wr_int_6]
29003 ?        S<     0:01 [z_wr_int_7]
29011 ?        S<     0:00 [z_fr_iss_6]
29012 ?        S<     0:00 [z_fr_iss_7]
29013 ?        S<     0:00 [z_fr_int]
29014 ?        S<     0:00 [z_cl_iss]
29015 ?        S<     0:00 [z_cl_int]
29016 ?        S<     0:00 [z_ioctl_iss]
29017 ?        S<     0:00 [z_ioctl_int]
29018 ?        S      0:00 [z_zvol]
29019 ?        S      0:00 [z_upgrade]
29020 ?        S<     0:00 [metaslab_group_]
29024 ?        SN     0:00 [dp_sync_taskq]
29025 ?        SN     0:00 [dp_sync_taskq]
29026 ?        SN     0:00 [dp_sync_taskq]

@michaelmrose

I had noticed on my system (Funtoo current) that zed constantly spikes the CPU after upgrading; I will have to collect more data.

@ryangoode86
Copy link
Author

I couldn't figure it out. After some debugging, it looked like udev was constantly reporting the drives as newly ONLINE even though that was already their state.

In the end I nuked the OS, since all of my customizations are in Docker and stored on the pools, and lo and behold, the issue was solved.

No more constant churn on the drives, and the I/O was back to idle.
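If udev is the suspect, the replayed events can be observed directly. This is a sketch, not part of the original debugging session: it assumes `udevadm monitor` lines of the form `UDEV [timestamp] change /devices/... (block)`, with the device path as the fourth field, and prints a running count per device so a disk being "re-discovered" over and over stands out:

```shell
# Watch udev "change" events on block devices and count repeats per device.
udevadm monitor --udev --subsystem-match=block |
  awk '/change/ {count[$4]++; print $4, count[$4]}'
```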

behlendorf added the Component: ZED (ZFS Event Daemon) label on Oct 2, 2017
@behlendorf
Contributor

@firechicken86 thanks for following up and letting us know what was causing the load. Based on your description I'm going to close this issue, since it sounds like there's not much we can do on the ZFS side. When udev generates an event we really do need to handle it.


therevoman commented Nov 28, 2017

Sorry to post to a closed ticket, but I am experiencing this issue. My system was automatically upgraded from RHEL 7.3 to 7.4, and I removed and re-added ZFS according to the docs. Same versions as listed at the top (slightly newer kernel). I really don't want to blow away the system, so I'll try a few things.

zpool export tank
reboot
zpool import tank
reboot

seemed to fix it.

@tonyhutter
Contributor

Possible fix: #13729
