
txg_sync, z_null_iss and txg_quiesce freezes #3409

Closed
ghost opened this issue May 13, 2015 · 55 comments

@ghost

ghost commented May 13, 2015

I have an issue very similar to #3235 where txg_sync, z_null_iss/0 and txg_quiesce are stuck and consuming CPU. Any process trying to access some files inside the filesystem will get stuck as well.

I cannot restart the system but have to turn the power off.

Some information and stack traces:
slabinfo http://pastebin.com/JLyqJKJA
kmem/slab http://pastebin.com/sRY3BWQn
arcstats http://pastebin.com/T6tm8702
meminfo http://pastebin.com/qDrDipkV
SysRq-t http://paste.ubuntu.com/11109410/
SysRq-w http://pastebin.com/qn8qqKiB

Worth mentioning is also that I use dm-crypt on the block devices, in case it makes any difference.
The freeze occurred shortly after firing up rtorrent to start seeding a torrent.

I have no problems with "normal" IO (moving and opening files), however.

@kernelOfTruth
Contributor

Please also post some information on your hard drive setup (mirror? raidz? SSD? even LVM?), which models, and whether NCQ is enabled/working.

Memory is obviously 8 GiB. Intel? AMD?

NO swap

NO L2ARC

Since you're posting on paste.ubuntu.com - is it Ubuntu? (Or, according to your profile, Gentoo =) )

What kernel version?

~amd64? (64-bit?)

@ghost
Author

ghost commented May 13, 2015

I'm running a raidz of 4 TB total (3 x 1 TB + 2 x 0.5 TB). No SSDs or LVM.

NCQ enabled (/sys/block/sde/device/queue_depth = 31) on all drives. Not sure how to tell if it is working or not though.
CPU: model name : Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz

Correct, no swap enabled. According to MemAvailable it doesn't seem to be much of an issue though.

edit: 64-bit Gentoo

I used the Ubuntu pastebin because of a 512 kB limit at pastebin.com.

uname -a: Linux localhost 4.0.1-geek #1 SMP Thu May 7 15:29:41 2015 x86_64 Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz GenuineIntel GNU/Linux

I have also tried Linux 3.17 and the issue is still there. ZFS and SPL are compiled into the kernel, not built as modules.

@ghost
Author

ghost commented May 13, 2015

emerge --info zfs

sys-fs/zfs-0.6.4::gentoo was built with the following: USE="kernel-builtin -bash-completion -custom-cflags -debug -rootfs -static-libs -test-suite" ABI_X86="64" PYTHON_TARGETS="python2_7 python3_3 -python3_4"

@kernelOfTruth
Contributor

How full is the pool?

zpool list -v

That should also show fragmentation.

@angstymeat

Sometime between the April 23rd commits and May 13th, something has happened that has made my issue even worse, and I am now seeing signs like @hallabro is mentioning. I compiled ZFS & SPL from source last night, and when my backups run I end up with txg_quiesce and txg_sync taking up between 50% and 90% of my CPU; performance screeches to a halt with all cores at >60% within a few minutes.

My machine has 16GB of RAM, but it didn't exceed 5.9GB of usage. Running echo 3 > /proc/sys/vm/drop_caches did nothing to alleviate the CPU usage and took hours to release memory.

My ARC graphs are currently showing ~129K cache hits/sec and 18K misses/sec while the ARC is nearly empty.

Once again, the system is a Dell R515 with 16GB of RAM. The pool is a single raidz2 pool made up of 7 2TB SATA drives. Compression is enabled and set to lz4. atime is off.

The OS is Fedora 20 with the 3.17.8-200.fc20.x86_64 kernel.

I've uploaded debugging information as requested in #3303 and #3235 to https://cloud.passcal.nmt.edu/index.php/s/gHacCaxkl8VCI6z.

It is currently still running if you need me to do anything else.

@dweeezil
Contributor

@angstymeat A shot in the dark: check zpool events for any appearance of ereport.fs.zfs.delay.
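
For anyone checking the same thing, a one-liner along these lines should do it (the grep pattern is simply the event class named above):

# count delay events across all pools; 0 means none have been recorded
zpool events -v | grep -c ereport.fs.zfs.delay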

@dweeezil
Contributor

@angstymeat Another idea: echo 100 > /sys/module/zfs/parameters/zfs_txg_history and then watch /proc/spl/kstat/zfs/<pool>/txgs to see how the txgs are being created and disposed of. If both the sync and quiesce threads are spinning, there should be a whole lot of txgs being created, yet I get the impression there's not much IO happening.
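
In concrete terms, something like this should work (replace <pool> with the actual pool name):

# keep history for the last 100 txgs
echo 100 > /sys/module/zfs/parameters/zfs_txg_history
# re-read the txg table every second and show the most recent entries
watch -n 1 "tail -n 20 /proc/spl/kstat/zfs/<pool>/txgs"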

@angstymeat

zpool events doesn't list ereport.fs.zfs.delay. What are the parameters for that command? I'm only showing 9 events and I don't know if it's limiting how much it is showing, or if that's all that's there.

txgs lists this, but I'm not sure how to interpret it. It looks to me like it is creating a lot of txgs.

txg      birth            state ndirty       nread        nwritten     reads    writes   otime        qtime        wtime        stime
5611684927 40266933252447   C     0            0            0            0        0        196140       7169         174616       207501
5611684928 40266933448587   C     0            0            0            0        0        187497       7976         197611       227984
5611684929 40266933636084   C     0            0            0            0        0        214255       7255         215854       225487
5611684930 40266933850339   C     0            0            0            0        0        227971       7177         217244       236663
5611684931 40266934078310   C     0            0            0            0        0        229432       7299         228186       207331
5611684932 40266934307742   C     0            0            0            0        0        241266       7322         197900       206635
5611684933 40266934549008   C     0            0            0            0        0        213878       7284         194957       184563
5611684934 40266934762886   C     0            0            0            0        0        207167       7240         176047       189857
5611684935 40266934970053   C     0            0            0            0        0        188389       7134         181403       154161
5611684936 40266935158442   C     0            0            0            0        0        194289       7320         142922       161459
5611684937 40266935352731   C     0            0            0            0        0        157764       7558         149974       201496
5611684938 40266935510495   C     0            0            0            0        0        162434       7403         192749       189616
5611684939 40266935672929   C     0            0            0            0        0        205098       7337         180890       202037
5611684940 40266935878027   C     0            0            0            0        0        193880       7363         192727       217081
5611684941 40266936071907   C     0            0            0            0        0        208603       7360         204910       183817
5611684942 40266936280510   C     0            0            0            0        0        217173       7208         175544       195831
5611684943 40266936497683   C     0            0            0            0        0        187700       7565         186981       196515
5611684944 40266936685383   C     0            0            0            0        0        200130       7196         187408       199199
5611684945 40266936885513   C     0            0            0            0        0        203025       7258         187202       197784
5611684946 40266937088538   C     0            0            0            0        0        199375       7241         189774       180228
5611684947 40266937287913   C     0            0            0            0        0        207211       11548        160171       198274
5611684948 40266937495124   C     0            0            0            0        0        186958       9221         177649       168092
5611684949 40266937682082   C     0            0            0            0        0        195428       7283         155998       184087
5611684950 40266937877510   C     0            0            0            0        0        168302       7233         173749       150507
5611684951 40266938045812   C     0            0            0            0        0        183828       7410         143953       156328
5611684952 40266938229640   C     0            0            0            0        0        155667       7954         147929       139879
5611684953 40266938385307   C     0            0            0            0        0        164508       7287         125876       139221
5611684955 40266938693560   C     0            0            0            0        0        133756       7555         113426       225776
5611684956 40266938827316   C     0            0            0            0        0        128752       7164         214457       217005
5611684957 40266938956068   C     0            0            0            0        0        230311       7250         204784       206514
5611684958 40266939186379   C     0            0            0            0        0        216926       7046         198469       200343
5611684959 40266939403305   C     0            0            0            0        0        210494       7130         191942       213046
5611684960 40266939613799   C     0            0            0            0        0        204708       7171         204341       210999
5611684961 40266939818507   C     0            0            0            0        0        220062       7138         199093       179817
5611684962 40266940038569   C     0            0            0            0        0        211549       7051         171168       223642
5611684963 40266940250118   C     0            0            0            0        0        183255       7091         215216       222092
5611684964 40266940433373   C     0            0            0            0        0        227858       7070         213274       239212
5611684965 40266940661231   C     0            0            0            0        0        228919       7056         227278       244209
5611684966 40266940890150   C     0            0            0            0        0        239192       7117         235961       224522
5611684967 40266941129342   C     0            0            0            0        0        247909       7299         216092       245538
5611684968 40266941377251   C     0            0            0            0        0        228944       7438         236269       236082
5611684969 40266941606195   C     0            0            0            0        0        252271       7221         224266       239865
5611684970 40266941858466   C     0            0            0            0        0        236437       7258         231351       229256
5611684971 40266942094903   C     0            0            0            0        0        243672       7364         220511       240659
5611684972 40266942338575   C     0            0            0            0        0        233446       7270         231596       236306
5611684973 40266942572021   C     0            0            0            0        0        247430       7268         224321       237710
5611684974 40266942819451   C     0            0            0            0        0        236517       7371         229096       211335
5611684975 40266943055968   C     0            0            0            0        0        241461       7258         202701       211541
5611684976 40266943297429   C     0            0            0            0        0        215676       7287         202123       211804
5611684977 40266943513105   C     0            0            0            0        0        217891       7245         199611       130831
5611684978 40266943730996   C     0            0            0            0        0        211681       7218         122841       109830
5611684979 40266943942677   C     0            0            0            0        0        132767       7876         103440       128564
5611684980 40266944075444   C     0            0            0            0        0        126521       7926         109567       120327
5611684981 40266944201965   C     0            0            0            0        0        129722       7235         103165       160581
5611684982 40266944331687   C     0            0            0            0        0        116865       7368         150438       147786
5611684984 40266944611253   C     0            0            0            0        0        152712       7316         136276       83756
5611684985 40266944763965   C     0            0            0            0        0        152414       7985         73852        113965
5611684986 40266944916379   C     0            0            0            0        0        85565        11809        105979       173027
5611684987 40266945001944   C     0            0            0            0        0        131637       12120        149033       157651
5611684988 40266945133581   C     0            0            0            0        0        164868       7368         154116       174301
5611684989 40266945298449   C     0            0            0            0        0        169012       7469         163002       198102
5611684990 40266945467461   C     0            0            0            0        0        175643       7708         188656       171714
5611684991 40266945643104   C     0            0            0            0        0        207476       9459         154698       178884
5611684992 40266945850580   C     0            0            0            0        0        172948       8595         170382       139631
5611684993 40266946023528   C     0            0            0            0        0        189111       8736         125388       165318
5611684994 40266946212639   C     0            0            0            0        0        140762       7834         154359       175324
5611684995 40266946353401   C     0            0            0            0        0        166923       7286         166955       186276
5611684996 40266946520324   C     0            0            0            0        0        179777       7319         176632       185409
5611684997 40266946700101   C     0            0            0            0        0        192374       7817         172423       190712
5611684998 40266946892475   C     0            0            0            0        0        185104       7287         181784       213414
5611684999 40266947077579   C     0            0            0            0        0        193991       7723         204241       202613
5611685000 40266947271570   C     0            0            0            0        0        217474       7398         192847       224835
5611685001 40266947489044   C     0            0            0            0        0        208571       7437         212218       191172
5611685002 40266947697615   C     0            0            0            0        0        228091       8250         177734       202711
5611685003 40266947925706   C     0            0            0            0        0        194272       8678         189281       201001
5611685004 40266948119978   C     0            0            0            0        0        203633       7358         191110       192053
5611685005 40266948323611   C     0            0            0            0        0        206723       7509         179779       216454
5611685006 40266948530334   C     0            0            0            0        0        192114       7368         207454       216024
5611685007 40266948722448   C     0            0            0            0        0        219621       7577         207059       222190
5611685008 40266948942069   C     0            0            0            0        0        223064       8405         208631       154935
5611685009 40266949165133   C     0            0            0            0        0        226649       8742         139271       164550
5611685010 40266949391782   C     0            0            0            0        0        150628       7784         157742       159328
5611685011 40266949542410   C     0            0            0            0        0        170228       7340         150880       140216
5611685013 40266949876153   C     0            0            0            0        0        144824       7319         173242       188915
5611685014 40266950020977   C     0            0            0            0        0        183403       7407         181712       196074
5611685015 40266950204380   C     0            0            0            0        0        197672       8687         182095       193226
5611685016 40266950402052   C     0            0            0            0        0        199241       8464         178195       183737
5611685017 40266950601293   C     0            0            0            0        0        194235       7586         171799       203591
5611685018 40266950795528   C     0            0            0            0        0        183933       8161         194073       200200
5611685019 40266950979461   C     0            0            0            0        0        206851       7506         191267       78775
5611685020 40266951186312   C     0            0            0            0        0        630512       9296         51957        141484
5611685021 40266951816824   C     0            0            0            0        0        73307        11711        120885       149342
5611685022 40266951890131   C     0            0            0            0        0        137305       7578         140130       192187
5611685023 40266952027436   C     0            0            0            0        0        152400       7694         183078       190694
5611685024 40266952179836   C     0            0            0            0        0        199660       8596         176495       181054
5611685025 40266952379496   C     0            0            0            0        0        194464       8795         166140       136704
5611685026 40266952573960   S     0            0            0            0        0        179716       55692        81422        0
5611685027 40266952753676   W     0            0            0            0        0        150966       8856         0            0
5611685028 40266952904642   O     0            0            0            0        0        0            0            0            0

@dweeezil
Contributor

@angstymeat I figured the events were a long shot. In any case, the txg history output quite clearly shows the problem. The txgs are performing exactly zero IO and are occurring about every 150 microseconds. Ouch! No wonder the system has recorded over 5 billion txgs.

@dweeezil
Contributor

@angstymeat You didn't by chance change the value of zfs_txg_timeout?

EDIT: Maybe cat /sys/module/zfs/parameters/zfs_txg_timeout to check it. It doesn't look like we've got any safeguard against it being set to zero.

@angstymeat

No, all of my module parameters should be the defaults.

zfs_txg_timeout is currently 5.

@dweeezil
Contributor

@angstymeat Just to be sure, could you please post all of them:

for i in /sys/module/zfs/parameters/*; do echo $(basename $i) $(cat $i); done

@behlendorf
Contributor

And cat /proc/spl/kstat/zfs/dmu_tx would be helpful.

@angstymeat

The module parameters are:

l2arc_feed_again 1
l2arc_feed_min_ms 200
l2arc_feed_secs 1
l2arc_headroom 2
l2arc_headroom_boost 200
l2arc_nocompress 0
l2arc_noprefetch 1
l2arc_norw 0
l2arc_write_boost 8388608
l2arc_write_max 8388608
metaslab_bias_enabled 1
metaslab_debug_load 0
metaslab_debug_unload 0
metaslab_fragmentation_factor_enabled 1
metaslab_lba_weighting_enabled 1
metaslab_preload_enabled 1
metaslabs_per_vdev 200
spa_asize_inflation 24
spa_config_path /etc/zfs/zpool.cache
spa_load_verify_data 1
spa_load_verify_maxinflight 10000
spa_load_verify_metadata 1
zfetch_array_rd_sz 1048576
zfetch_block_cap 256
zfetch_max_streams 8
zfetch_min_sec_reap 2
zfs_arc_average_blocksize 8192
zfs_arc_grow_retry 5
zfs_arc_max 0
zfs_arc_memory_throttle_disable 1
zfs_arc_meta_adjust_restarts 4096
zfs_arc_meta_limit 0
zfs_arc_meta_prune 10000
zfs_arc_min 0
zfs_arc_min_prefetch_lifespan 1000
zfs_arc_p_aggressive_disable 1
zfs_arc_p_dampener_disable 1
zfs_arc_shrink_shift 5
zfs_autoimport_disable 1
zfs_dbuf_state_index 0
zfs_deadman_enabled 1
zfs_deadman_synctime_ms 1000000
zfs_dedup_prefetch 0
zfs_delay_min_dirty_percent 60
zfs_delay_scale 500000
zfs_dirty_data_max 1667718348
zfs_dirty_data_max_max 4169295872
zfs_dirty_data_max_max_percent 25
zfs_dirty_data_max_percent 10
zfs_dirty_data_sync 67108864
zfs_disable_dup_eviction 0
zfs_expire_snapshot 300
zfs_flags 0
zfs_free_leak_on_eio 0
zfs_free_max_blocks 100000
zfs_free_min_time_ms 1000
zfs_immediate_write_sz 32768
zfs_max_recordsize 1048576
zfs_mdcomp_disable 0
zfs_metaslab_fragmentation_threshold 70
zfs_mg_fragmentation_threshold 85
zfs_mg_noalloc_threshold 0
zfs_nocacheflush 0
zfs_nopwrite_enabled 1
zfs_no_scrub_io 0
zfs_no_scrub_prefetch 0
zfs_pd_bytes_max 52428800
zfs_prefetch_disable 0
zfs_read_chunk_size 1048576
zfs_read_history 0
zfs_read_history_hits 0
zfs_recover 0
zfs_resilver_delay 2
zfs_resilver_min_time_ms 3000
zfs_scan_idle 50
zfs_scan_min_time_ms 1000
zfs_scrub_delay 4
zfs_send_corrupt_data 0
zfs_sync_pass_deferred_free 2
zfs_sync_pass_dont_compress 5
zfs_sync_pass_rewrite 2
zfs_top_maxinflight 32
zfs_txg_history 100
zfs_txg_timeout 5
zfs_vdev_aggregation_limit 131072
zfs_vdev_async_read_max_active 3
zfs_vdev_async_read_min_active 1
zfs_vdev_async_write_active_max_dirty_percent 60
zfs_vdev_async_write_active_min_dirty_percent 30
zfs_vdev_async_write_max_active 10
zfs_vdev_async_write_min_active 1
zfs_vdev_cache_bshift 16
zfs_vdev_cache_max 16384
zfs_vdev_cache_size 0
zfs_vdev_max_active 1000
zfs_vdev_mirror_switch_us 10000
zfs_vdev_read_gap_limit 32768
zfs_vdev_scheduler noop
zfs_vdev_scrub_max_active 2
zfs_vdev_scrub_min_active 1
zfs_vdev_sync_read_max_active 10
zfs_vdev_sync_read_min_active 10
zfs_vdev_sync_write_max_active 10
zfs_vdev_sync_write_min_active 10
zfs_vdev_write_gap_limit 4096
zfs_zevent_cols 80
zfs_zevent_console 0
zfs_zevent_len_max 1024
zil_replay_disable 0
zil_slog_limit 1048576
zio_delay_max 30000
zio_requeue_io_start_cut_in_line 1
zvol_inhibit_dev 0
zvol_major 230
zvol_max_discard_blocks 16384
zvol_threads 32

and dmu_tx is:

name                            type data
dmu_tx_assigned                 4    11307
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    151129
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    4875683387
dmu_tx_dirty_delay              4    0
dmu_tx_dirty_over_max           4    0
dmu_tx_quota                    4    0

@dweeezil
Contributor

@angstymeat Very interesting. The system appears to have backed itself into a corner rather nicely by beating arc_c down so low. It seems to me that any OpenZFS system could potentially get into this state. I've got to wonder whether our new super-aggressive metadata reclaim is indirectly responsible for this. If this problem is easy for you to reproduce, you might want to try raising zfs_arc_min to maybe 1073741824 and see what happens. You'll likely have to reboot the system in order to un-stick it. I'd suggest setting it at modprobe time so arc_c doesn't have a chance to get too low.
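
A minimal sketch of both ways to set it, using the 1 GiB value suggested above (the modprobe.d file name is just the usual convention):

# at runtime (takes effect immediately, lost on reboot)
echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_min

# at module load time, e.g. in /etc/modprobe.d/zfs.conf
options zfs zfs_arc_min=1073741824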

@behlendorf
Contributor

I've seen this same thing once before, actually. We should probably give some thought to how to tweak that dmu_tx_dirty_throttle case so this can't happen.

@angstymeat

I can reproduce this in about 10 minutes.

I've just changed zfs_arc_min to 1G and I'm rebooting to see what happens.

@behlendorf
Contributor

@dweeezil other OpenZFS platforms might not be able to trigger this as easily because they never allow zfs_arc_min to drop below 1/32 of total memory.

@dweeezil
Contributor

@behlendorf Agreed, it was more of a hypothetical assertion. We should certainly investigate some safeguards in the throttle to prevent this from happening.

@angstymeat

I'm wondering if I might have done something stupid and wasted your time with this.

Last night I played around with applying #3308, and after I started my backup I saw the same thing I reported today, with txg_sync and txg_quiesce freezing the system. I stopped, recompiled ZFS without #3308, rebooted, and had the same thing happen. I assumed it was a problem with the code committed to git and that #3308 wasn't directly responsible.

Now I'm wondering if I made a mistake and missed making a new initramfs after I recompiled, so I was still running the code with #3308 applied.

Currently, everything appears to be working normally, setting aside my issues with #3303. Would changing zfs_arc_min make this much of a difference? When this run is finished I'm going to try setting it back to 0 and see if I can reproduce the problem.

@behlendorf
Contributor

@angstymeat yes, setting zfs_arc_min to say 1G would keep the ARC from ever collapsing below that and would help keep it out of this case.

@dweeezil
Contributor

@angstymeat The latest version of #3308 would effectively be a no-op so it shouldn't be entering into the equation. Raising zfs_arc_min should be a good workaround for you for the time being. It looks like we need to add some safeguards to the throttle to deal with this situation.

@angstymeat

Ok. I've tried it again with zfs_arc_min set back to 0 and I'm not having problems this time, but it could also be because testing with the higher zfs_arc_min allowed it to copy a lot of files into place, so I'm not in the same situation I was in this morning anymore.

I'm going to run it with zfs_arc_min set to 0 for now, and if I see it again tonight or tomorrow (when there should be a bunch of new files to sync) I'll try moving it back up to 1G.

I will say that my memory seems to be filling a lot more slowly with the commits from the last week or so in place.

@angstymeat

After running for about 75 minutes, txg_sync and txg_quiesce all of a sudden exploded to 80%+ CPU with the ARC at 5G in size. I'm going to keep an eye on it. It's processing the email servers, so there are a lot of small files.

@angstymeat

The rsync process that was running the mail backup died with a socket error from hanging for so long, and the other processes dropped back down to normal CPU times.

@dewiniaid

I'm running an unconventional setup, but I think I've just encountered this same issue. I've run through all of the diagnostic steps here with similar results, so I'm now rebooting with zfs_arc_min set to 4G.

I use dedup & compression on one of three zpools, but this issue caused all IO on all pools to stall. (Interestingly, I could ls /pool and see a directory, but ls /pool/folder stalled -- presumably the former was cached). zfs iostat 1 also listed a small amount of writes (but no reads) to one of the pools, at probably an average of 1-2 MB/s -- but nothing on the system should have been causing them.

(The unconventional setup in use is essentially: zpools /local and /remote on separate virtual disks, zvol /local/inner, zpool /storage with compress+dedup on the zvol, periodic zfs send/recv of snapshots between /local and /remote. The exact semantics and reasoning is likely unimportant for this issue, but I can provide it upon request.)

@kernelOfTruth
Contributor

@dewiniaid in the end, @angstymeat's issues were fixed by running a fairly up-to-date ZFS stack and a NUMA-aware per-superblock shrinker ( 90947b2 )

Are you by chance running a NUMA setup?

@behlendorf
Contributor

All the related fixes have been merged to master if you want to try that.

@dewiniaid

It turns out that I was running a NUMA setup within the VM -- which is odd, because the actual physical host only has one of the two sockets filled. Not entirely certain how that happened. That may explain some messaging I saw about possible memory deadlocks, then.

I just reconfigured the VM to not use NUMA (it shouldn't be anyway!), so I'll see if that improves things. (I'm running off the Ubuntu PPA currently -- which looks like it's about 2 months behind in terms of last being updated.)

@devZer0

devZer0 commented Oct 6, 2016

I think we hit the same issue here.

  • xenserver 7 virtual machine
  • one large raidz pool - 4 virtual disks
  • lz4 compression
  • multiple zfs subvolumes
  • zfs-auto-snapshot
  • mostly being written to via rsync --inplace

cat /etc/redhat-release

CentOS Linux release 7.2.1511 (Core)

rpm -q -a|grep -i zfs

zfs-0.6.5.8-1.el7.centos.x86_64
zfs-dkms-0.6.5.8-1.el7.centos.noarch
zfs-release-1-3.el7.centos.noarch
libzfs2-0.6.5.8-1.el7.centos.x86_64

top - 10:52:58 up 7 days, 21:57, 2 users, load average: 12,35, 12,43, 11,60
Tasks: 652 total, 5 running, 647 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0,0 us, 0,6 sy, 0,0 ni, 96,1 id, 0,0 wa, 0,0 hi, 0,0 si, 3,4 st
%Cpu1 : 0,4 us, 31,9 sy, 0,0 ni, 67,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,4 st
%Cpu2 : 0,0 us, 1,5 sy, 0,0 ni, 0,0 id, 94,7 wa, 0,0 hi, 0,0 si, 3,8 st
%Cpu3 : 0,0 us, 24,3 sy, 0,0 ni, 75,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,4 st
KiB Mem : 3618884 total, 1202284 free, 217676 used, 2198924 buff/cache
KiB Swap: 1048572 total, 942880 free, 105692 used. 1231560 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1820 root 20 0 0 0 0 R 47,0 0,0 448:16.06 txg_sync
1430 root 0 -20 0 0 0 R 31,5 0,0 254:08.98 z_null_iss
1819 root 20 0 0 0 0 S 20,2 0,0 133:16.90 txg_quiesce
26554 root 20 0 0 0 0 D 6,6 0,0 2:02.25 backuphost.sh
17713 root 20 0 1236 4 0 D 4,6 0,0 25:09.12 autosnap.sh

zpool list -v

NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zfspool 6,22T 2,05T 4,17T - 20% 32% 1.00x ONLINE -
raidz1 6,22T 2,05T 4,17T - 20% 32%
xvdb1 - - - - - -
xvdc1 - - - - - -
xvdd1 - - - - - -
xvde1 - - - - - -

zpool status -v

pool: zfspool
state: ONLINE
scan: resilvered 5,18G in 307445734561825858h42m with 0 errors on Fri Sep 23 18:31:37 2016
config:

    NAME        STATE     READ WRITE CKSUM
    zfspool     ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        xvdb1   ONLINE       0     0     0
        xvdc1   ONLINE       0     0     0
        xvdd1   ONLINE       0     0     0
        xvde1   ONLINE       0     0     0

errors: No known data errors

@devZer0

devZer0 commented Oct 6, 2016

slabinfo http://pastebin.com/31ye2RUT
kmem/slab http://pastebin.com/ESs1n7hn
arcstats http://pastebin.com/nck266E4
meminfo http://pastebin.com/e3ZWsseG
sysrq-t: http://paste.ubuntu.com/23283821/
sysrq-w: http://pastebin.com/nYvitrDp
zfs module parameters http://pastebin.com/17skkB3g
txgs http://pastebin.com/dKkZBU8p

cat /proc/spl/kstat/zfs/dmu_tx
5 1 0x01 11 528 9962951018 688642839972838
name type data
dmu_tx_assigned 4 103634060
dmu_tx_delay 4 17
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 4
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 466
dmu_tx_dirty_throttle 4 442069800
dmu_tx_dirty_delay 4 44753
dmu_tx_dirty_over_max 4 0
dmu_tx_quota 4 0

@devZer0

devZer0 commented Oct 6, 2016

Don't know if this helps, but in #zfsonlinux I was told to collect some data with "perf record", so I issued "perf record -ags". I can provide the data file if it helps.

@devZer0

devZer0 commented Nov 8, 2016

The issue is starting to appear more and more often here, so I'd call this a real bummer. I think ZFS is not "production ready" with this, as regular reboots are needed to resolve it.

echo 1|3 >/proc/sys/vm/drop_caches does not work, btw - at least not when this issue is already occurring. Echoing to that procfs file on my system hangs and never returns.

I have now set zfs_arc_min to 1GB and added a background task that runs echo 3 >drop_caches every five minutes, to see if this makes a difference.

NOTE: the above tuning seems to have cured the problem.
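
For anyone wanting to replicate that workaround, a rough sketch (the cron file path and the 5-minute schedule simply mirror what is described above):

# raise the ARC floor to 1 GiB at runtime
echo $((1024*1024*1024)) > /sys/module/zfs/parameters/zfs_arc_min

# /etc/cron.d/zfs-drop-caches (illustrative): drop caches every five minutes
*/5 * * * * root echo 3 > /proc/sys/vm/drop_caches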

@ianabc

ianabc commented Feb 17, 2017

I'm pretty sure I just walked into this as well. My system started running low on memory, and txg_sync and txg_quiesce were both pegged at 50-60% (8 CPU system) for 40-50 minutes. I tried drop_caches a couple of times to no noticeable effect, but I noticed my ARC size had dropped to 437M and the target size was down to 138M. Setting zfs_arc_min to 1G brought things back to life almost instantaneously.

@devZer0

devZer0 commented Feb 17, 2017

Does that mean you can interactively "heal" the problem when it's occurring just by setting zfs_arc_min via sysfs?

@ianabc

ianabc commented Feb 17, 2017

I only have one data point at the moment, but in my case yes. My load average was over 100 when I did it and it immediately started dropping back.

The system was serving a lot of interactive users when it started running into problems, so it might be a little bit hard for me to reproduce. Basically, I think new user processes had chewed through all of the available system memory (I confirmed this), so ZFS started squashing down the ARC to very small values. It then ended up working incredibly hard trying to operate within the memory constraint.

I can't reproduce the exact conditions on demand, but I have another system running the same versions of everything; I'll try generating a bit of IO load, then start claiming as much system memory as I can, and see what happens.

@kernelOfTruth
Contributor

Considering the background, why don't we raise the minimum ARC size (zfs_arc_min) to at least 1 GiB when available RAM is 3 GiB or more?

(e.g. @devZer0's RAM size is 3.5 GiB)

Smaller systems might need more careful adjustment of that setting ...

The comments I collected regarding this and similar issues boil down to: set zfs_arc_min to at least 1 GiB for more stability and to prevent massive load and stalls.

@joelzamboni

Thanks for the recommendation, worked for me:

echo 1073741824 >> /sys/module/zfs/parameters/zfs_arc_min

Added to the file /etc/modprobe.d/zfs.conf:

options zfs zfs_arc_max=8589934592
options zfs zfs_arc_min=1073741824
options zfs zfs_prefetch_disable=1

Before you copy these parameters, note that the server has 16GB of RAM.

@behlendorf
Contributor

Ideally we want to make the default minimum as small as possible, but no smaller. This ensures the system can reclaim as much memory as possible when it's absolutely needed. And it makes it possible to use ZFS on small low memory systems.

@ianabc @joelzamboni if you have a test system it would be great if you could try increasing zfs_arc_min to say 64M or 128M and see if that's enough to resolve the issue.
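
If it helps, here is a quick way to test those smaller values at runtime (sizes per the suggestion above; revert the value or reboot to go back to the default):

echo $((64*1024*1024)) > /sys/module/zfs/parameters/zfs_arc_min    # 64M
echo $((128*1024*1024)) > /sys/module/zfs/parameters/zfs_arc_min   # 128M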

@ianabc

ianabc commented Mar 2, 2017 via email

@uzytkownik

uzytkownik commented Jul 28, 2017

I reproduced the problem when I tried to migrate to ZFS and started copying images to a zvol on Gentoo (kernel 4.9, zfs 0.7.0). The server has 16 GiB of memory and it chokes after 22 GiB of data on a 750 GiB partition. Increasing the limits and periodically dropping caches seems to work, but the latter is annoying.

Most annoying is that the system seems idle - no disk/CPU activity beyond background-noise level, swap empty, etc.

FWIW, clearing the pagecache seems to be the most important thing...

@behlendorf
Contributor

@uzytkownik could you try setting the module option zfs_arc_pc_percent=100 and see if that resolves the issue.
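
For reference, a sketch of the two usual ways to set that option (assuming the parameter is runtime-writable on this build; the modprobe.d file name is just the common convention):

echo 100 > /sys/module/zfs/parameters/zfs_arc_pc_percent
# or persistently, in /etc/modprobe.d/zfs.conf:
options zfs zfs_arc_pc_percent=100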

@uzytkownik

@behlendorf No, at least not while the operation is running. I'm rebooting the system to check.

I have an additional data point - I have ZFS on LVM on mdraid. When I tried to pvmove an unrelated partition, it got stuck/hung as well. Is it possible that the Linux scheduler is in some weird state for whatever reason?

@uzytkownik

No. I ran into the same problem after the reboot.

@cytrinox

I could reproduce this issue on my system, too. Setting zfs_arc_min to 1G resolves the problem. Even during a txg_sync freeze and stuck rsync this immediately fixed it.

@stewartadam

I experienced the same issue on a large (multi-TB) zfs send/recv to mirror my pool to a new pool using the new native encryption. Setting the minimum ARC size per the comments above immediately unblocked I/O and CPU.

@behlendorf
Contributor

@cytrinox @stewartadam would it be possible for either of you to experiment a little to determine whether smaller values of zfs_arc_min would also have avoided the issue? We'd rather not increase this minimum value more than we need to. Thanks.

@stewartadam

stewartadam commented Aug 25, 2017

I'm in the middle of a multi-TB send/recv and just changed zfs_arc_min from 1G to 128M; it's been progressing fine for the last ~4 hours... but I'm not sure whether having had it set to 1G beforehand affects the results.

@stewartadam

stewartadam commented Sep 10, 2017

Just experienced the same issue running the ZFS Test Suite on a VM (vanilla zfs 0.7.1 from the F26 repos). It froze up on the tests/functional/rsend/rsend_021_pos test with high CPU usage for 60+ mins and unfroze immediately once I bumped zfs_arc_min up from 0 to 256M.

@alek-p
Contributor

alek-p commented Oct 4, 2017

Looks like the original issue was fixed in 1b8951b3, so in v0.7.0-rc1.

@jkalousek

jkalousek commented Dec 22, 2017

I suspect that I'm experiencing the same or a very similar problem on module v0.7.3-1, ZFS pool version 5000, ZFS filesystem version 5.
I'm running Proxmox with 2x SSD in RAID 1 as storage for VMs and 2x SATA DOM, also in RAID 1, for the system.
The machine has:
Intel Xeon CPU E5-2620 v4
64 GB of ECC RAM - normal usage is around 25 GB

options zfs zfs_arc_max=34359738368
options zfs zfs_arc_min=1073741824
options zfs zfs_prefetch_disable=1

As soon as I start an rsync copy or a VM backup, system load shoots up so high that it freezes the whole machine; I get txg_sync (and more) blocked for more than 120s and copy speed falls down to KB/s. Sometimes I have to do a hard reboot to even get the server back; sometimes it responds to the console in short bursts.

This problem started to appear slowly a few months ago, when I noticed occasional slowdowns during copies, and now (since a week ago) I can't copy anything from the pool without freezing the server.

I already tested copies/backups with all VMs turned off, so it's not caused by load from them.

Also, sometimes the txg_sync hang messages pop up during totally quiet periods when the server is not doing anything, which confuses me most.

# cat /proc/spl/kstat/zfs/dmu_tx
12 1 0x01 11 528 3399225792 904138080608
name                            type data
dmu_tx_assigned                 4    357141
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    252
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    0
dmu_tx_dirty_over_max           4    0
dmu_tx_quota                    4    10

Module parameters:

dbuf_cache_hiwater_pct 10
dbuf_cache_lowater_pct 10
dbuf_cache_max_bytes 104857600
dbuf_cache_max_shift 5
dmu_object_alloc_chunk_shift 7
ignore_hole_birth 1
l2arc_feed_again 1
l2arc_feed_min_ms 200
l2arc_feed_secs 1
l2arc_headroom 2
l2arc_headroom_boost 200
l2arc_noprefetch 1
l2arc_norw 0
l2arc_write_boost 8388608
l2arc_write_max 8388608
metaslab_aliquot 524288
metaslab_bias_enabled 1
metaslab_debug_load 0
metaslab_debug_unload 0
metaslab_fragmentation_factor_enabled 1
metaslab_lba_weighting_enabled 1
metaslab_preload_enabled 1
metaslabs_per_vdev 200
send_holes_without_birth_time 1
spa_asize_inflation 24
spa_config_path /etc/zfs/zpool.cache
spa_load_verify_data 1
spa_load_verify_maxinflight 10000
spa_load_verify_metadata 1
spa_slop_shift 5
zfetch_array_rd_sz 1048576
zfetch_max_distance 8388608
zfetch_max_streams 8
zfetch_min_sec_reap 2
zfs_abd_scatter_enabled 1
zfs_abd_scatter_max_order 10
zfs_admin_snapshot 1
zfs_arc_average_blocksize 8192
zfs_arc_dnode_limit 0
zfs_arc_dnode_limit_percent 10
zfs_arc_dnode_reduce_percent 10
zfs_arc_grow_retry 0
zfs_arc_lotsfree_percent 10
zfs_arc_max 34359738368
zfs_arc_meta_adjust_restarts 4096
zfs_arc_meta_limit 0
zfs_arc_meta_limit_percent 75
zfs_arc_meta_min 0
zfs_arc_meta_prune 10000
zfs_arc_meta_strategy 1
zfs_arc_min 1073741824
zfs_arc_min_prefetch_lifespan 0
zfs_arc_p_aggressive_disable 1
zfs_arc_pc_percent 0
zfs_arc_p_dampener_disable 1
zfs_arc_p_min_shift 0
zfs_arc_shrink_shift 0
zfs_arc_sys_free 0
zfs_autoimport_disable 1
zfs_compressed_arc_enabled 1
zfs_dbgmsg_enable 0
zfs_dbgmsg_maxsize 4194304
zfs_dbuf_state_index 0
zfs_deadman_checktime_ms 5000
zfs_deadman_enabled 1
zfs_deadman_synctime_ms 1000000
zfs_dedup_prefetch 0
zfs_delay_min_dirty_percent 60
zfs_delay_scale 500000
zfs_delete_blocks 20480
zfs_dirty_data_max 4294967296
zfs_dirty_data_max_max 4294967296
zfs_dirty_data_max_max_percent 25
zfs_dirty_data_max_percent 10
zfs_dirty_data_sync 67108864
zfs_dmu_offset_next_sync 0
zfs_expire_snapshot 300
zfs_flags 0
zfs_free_bpobj_enabled 1
zfs_free_leak_on_eio 0
zfs_free_max_blocks 100000
zfs_free_min_time_ms 1000
zfs_immediate_write_sz 32768
zfs_max_recordsize 1048576
zfs_mdcomp_disable 0
zfs_metaslab_fragmentation_threshold 70
zfs_metaslab_segment_weight_enabled 1
zfs_metaslab_switch_threshold 2
zfs_mg_fragmentation_threshold 85
zfs_mg_noalloc_threshold 0
zfs_multihost_fail_intervals 5
zfs_multihost_history 0
zfs_multihost_import_intervals 10
zfs_multihost_interval 1000
zfs_multilist_num_sublists 0
zfs_nocacheflush 0
zfs_nopwrite_enabled 1
zfs_no_scrub_io 0
zfs_no_scrub_prefetch 0
zfs_object_mutex_size 64
zfs_pd_bytes_max 52428800
zfs_per_txg_dirty_frees_percent 30
zfs_prefetch_disable 1
zfs_read_chunk_size 1048576
zfs_read_history 0
zfs_read_history_hits 0
zfs_recover 0
zfs_resilver_delay 2
zfs_resilver_min_time_ms 3000
zfs_scan_idle 50
zfs_scan_min_time_ms 1000
zfs_scrub_delay 4
zfs_send_corrupt_data 0
zfs_sync_pass_deferred_free 2
zfs_sync_pass_dont_compress 5
zfs_sync_pass_rewrite 2
zfs_sync_taskq_batch_pct 75
zfs_top_maxinflight 32
zfs_txg_history 0
zfs_txg_timeout 5
zfs_vdev_aggregation_limit 131072
zfs_vdev_async_read_max_active 3
zfs_vdev_async_read_min_active 1
zfs_vdev_async_write_active_max_dirty_percent 60
zfs_vdev_async_write_active_min_dirty_percent 30
zfs_vdev_async_write_max_active 10
zfs_vdev_async_write_min_active 2
zfs_vdev_cache_bshift 16
zfs_vdev_cache_max 16384
zfs_vdev_cache_size 0
zfs_vdev_max_active 1000
zfs_vdev_mirror_non_rotating_inc 0
zfs_vdev_mirror_non_rotating_seek_inc 1
zfs_vdev_mirror_rotating_inc 0
zfs_vdev_mirror_rotating_seek_inc 5
zfs_vdev_mirror_rotating_seek_offset 1048576
zfs_vdev_queue_depth_pct 1000
zfs_vdev_raidz_impl [fastest] original scalar sse2 ssse3 avx2
zfs_vdev_read_gap_limit 32768
zfs_vdev_scheduler noop
zfs_vdev_scrub_max_active 2
zfs_vdev_scrub_min_active 1
zfs_vdev_sync_read_max_active 10
zfs_vdev_sync_read_min_active 10
zfs_vdev_sync_write_max_active 10
zfs_vdev_sync_write_min_active 10
zfs_vdev_write_gap_limit 4096
zfs_zevent_cols 80
zfs_zevent_console 0
zfs_zevent_len_max 256
zil_replay_disable 0
zil_slog_bulk 786432
zio_delay_max 30000
zio_dva_throttle_enabled 1
zio_requeue_io_start_cut_in_line 1
zio_taskq_batch_pct 75
zvol_inhibit_dev 0
zvol_major 230
zvol_max_discard_blocks 16384
zvol_prefetch_bytes 131072
zvol_request_sync 0
zvol_threads 32
zvol_volmode 1

Thanks to anybody who can try to help, as I'm in a very bad situation with backups right now.

EDIT: task trace:

[  363.169333] INFO: task txg_sync:777 blocked for more than 120 seconds.
[  363.169376]       Tainted: P           O    4.13.13-2-pve #1
[  363.169407] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.169438] txg_sync        D    0   777      2 0x00000000
[  363.169440] Call Trace:
[  363.169447]  __schedule+0x3cc/0x860
[  363.169449]  schedule+0x36/0x80
[  363.169452]  io_schedule+0x16/0x40
[  363.169459]  cv_wait_common+0xb2/0x140 [spl]
[  363.169462]  ? wait_woken+0x80/0x80
[  363.169466]  __cv_wait_io+0x18/0x20 [spl]
[  363.169510]  zio_wait+0xfd/0x1b0 [zfs]
[  363.169537]  dsl_pool_sync+0xb8/0x440 [zfs]
[  363.169568]  spa_sync+0x42d/0xdb0 [zfs]
[  363.169630]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[  363.169657]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[  363.169660]  thread_generic_wrapper+0x72/0x80 [spl]
[  363.169662]  kthread+0x109/0x140
[  363.169664]  ? __thread_exit+0x20/0x20 [spl]
[  363.169666]  ? kthread_create_on_node+0x70/0x70
[  363.169666]  ? kthread_create_on_node+0x70/0x70
[  363.169668]  ret_from_fork+0x25/0x30
[  484.004628] INFO: task txg_sync:777 blocked for more than 120 seconds.
[  484.004682]       Tainted: P           O    4.13.13-2-pve #1
[  484.004717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.004763] txg_sync        D    0   777      2 0x00000000
[  484.004766] Call Trace:
[  484.004775]  __schedule+0x3cc/0x860
[  484.004778]  schedule+0x36/0x80
[  484.004782]  io_schedule+0x16/0x40
[  484.004792]  cv_wait_common+0xb2/0x140 [spl]
[  484.004796]  ? wait_woken+0x80/0x80
[  484.004802]  __cv_wait_io+0x18/0x20 [spl]
[  484.004891]  zio_wait+0xfd/0x1b0 [zfs]
[  484.004944]  dsl_pool_sync+0xb8/0x440 [zfs]
[  484.005016]  spa_sync+0x42d/0xdb0 [zfs]
[  484.005076]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[  484.005120]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[  484.005125]  thread_generic_wrapper+0x72/0x80 [spl]
[  484.005128]  kthread+0x109/0x140
[  484.005133]  ? __thread_exit+0x20/0x20 [spl]
[  484.005134]  ? kthread_create_on_node+0x70/0x70
[  484.005136]  ? kthread_create_on_node+0x70/0x70
[  484.005138]  ret_from_fork+0x25/0x30
[  604.835362] INFO: task txg_sync:777 blocked for more than 120 seconds.
[  604.835391]       Tainted: P           O    4.13.13-2-pve #1
[  604.835410] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  604.835436] txg_sync        D    0   777      2 0x00000000
[  604.835438] Call Trace:
[  604.835445]  __schedule+0x3cc/0x860
[  604.835448]  schedule+0x36/0x80
[  604.835451]  io_schedule+0x16/0x40
[  604.835458]  cv_wait_common+0xb2/0x140 [spl]
[  604.835462]  ? wait_woken+0x80/0x80
[  604.835466]  __cv_wait_io+0x18/0x20 [spl]
[  604.835509]  zio_wait+0xfd/0x1b0 [zfs]
[  604.835539]  dsl_pool_sync+0xb8/0x440 [zfs]
[  604.835571]  spa_sync+0x42d/0xdb0 [zfs]
[  604.835620]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[  604.835669]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[  604.835673]  thread_generic_wrapper+0x72/0x80 [spl]
[  604.835674]  kthread+0x109/0x140
[  604.835677]  ? __thread_exit+0x20/0x20 [spl]
[  604.835678]  ? kthread_create_on_node+0x70/0x70
[  604.835679]  ? kthread_create_on_node+0x70/0x70
[  604.835681]  ret_from_fork+0x25/0x30
[  725.663824] INFO: task txg_sync:777 blocked for more than 120 seconds.
[  725.663850]       Tainted: P           O    4.13.13-2-pve #1
[  725.663885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  725.663911] txg_sync        D    0   777      2 0x00000000
[  725.663914] Call Trace:
[  725.663921]  __schedule+0x3cc/0x860
[  725.663923]  schedule+0x36/0x80
[  725.663926]  io_schedule+0x16/0x40
[  725.663934]  cv_wait_common+0xb2/0x140 [spl]
[  725.663937]  ? wait_woken+0x80/0x80
[  725.663941]  __cv_wait_io+0x18/0x20 [spl]
[  725.663988]  zio_wait+0xfd/0x1b0 [zfs]
[  725.664016]  dsl_pool_sync+0xb8/0x440 [zfs]
[  725.664064]  spa_sync+0x42d/0xdb0 [zfs]
[  725.664116]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[  725.664162]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[  725.664165]  thread_generic_wrapper+0x72/0x80 [spl]
[  725.664167]  kthread+0x109/0x140
[  725.664170]  ? __thread_exit+0x20/0x20 [spl]
[  725.664172]  ? kthread_create_on_node+0x70/0x70
[  725.664172]  ? kthread_create_on_node+0x70/0x70
[  725.664174]  ret_from_fork+0x25/0x30
[  846.494574] INFO: task txg_sync:777 blocked for more than 120 seconds.
[  846.494600]       Tainted: P           O    4.13.13-2-pve #1
[  846.494619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.494644] txg_sync        D    0   777      2 0x00000000
[  846.494646] Call Trace:
[  846.494653]  __schedule+0x3cc/0x860
[  846.494655]  schedule+0x36/0x80
[  846.494658]  io_schedule+0x16/0x40
[  846.494666]  cv_wait_common+0xb2/0x140 [spl]
[  846.494669]  ? wait_woken+0x80/0x80
[  846.494674]  __cv_wait_io+0x18/0x20 [spl]
[  846.494740]  zio_wait+0xfd/0x1b0 [zfs]
[  846.494773]  dsl_pool_sync+0xb8/0x440 [zfs]
[  846.494824]  spa_sync+0x42d/0xdb0 [zfs]
[  846.494875]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[  846.494921]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[  846.494925]  thread_generic_wrapper+0x72/0x80 [spl]
[  846.494926]  kthread+0x109/0x140
[  846.494929]  ? __thread_exit+0x20/0x20 [spl]
[  846.494930]  ? kthread_create_on_node+0x70/0x70
[  846.494931]  ? kthread_create_on_node+0x70/0x70
[  846.494933]  ret_from_fork+0x25/0x30
[  967.324697] INFO: task txg_sync:777 blocked for more than 120 seconds.
[  967.324726]       Tainted: P           O    4.13.13-2-pve #1
[  967.324745] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  967.324770] txg_sync        D    0   777      2 0x00000000
[  967.324772] Call Trace:
[  967.324779]  __schedule+0x3cc/0x860
[  967.324782]  schedule+0x36/0x80
[  967.324785]  io_schedule+0x16/0x40
[  967.324794]  cv_wait_common+0xb2/0x140 [spl]
[  967.324797]  ? wait_woken+0x80/0x80
[  967.324801]  __cv_wait_io+0x18/0x20 [spl]
[  967.324851]  zio_wait+0xfd/0x1b0 [zfs]
[  967.324883]  dsl_pool_sync+0xb8/0x440 [zfs]
[  967.324918]  spa_sync+0x42d/0xdb0 [zfs]
[  967.324986]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[  967.325033]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[  967.325037]  thread_generic_wrapper+0x72/0x80 [spl]
[  967.325039]  kthread+0x109/0x140
[  967.325042]  ? __thread_exit+0x20/0x20 [spl]
[  967.325043]  ? kthread_create_on_node+0x70/0x70
[  967.325043]  ? kthread_create_on_node+0x70/0x70
[  967.325045]  ret_from_fork+0x25/0x30
[  987.641504] CIFS VFS: Free previous auth_key.response = ffff9747dba4c0c0
[ 1088.154668] INFO: task txg_sync:777 blocked for more than 120 seconds.
[ 1088.154720]       Tainted: P           O    4.13.13-2-pve #1
[ 1088.154756] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.154805] txg_sync        D    0   777      2 0x00000000
[ 1088.154809] Call Trace:
[ 1088.154820]  __schedule+0x3cc/0x860
[ 1088.154824]  schedule+0x36/0x80
[ 1088.154829]  io_schedule+0x16/0x40
[ 1088.154842]  cv_wait_common+0xb2/0x140 [spl]
[ 1088.154846]  ? wait_woken+0x80/0x80
[ 1088.154855]  __cv_wait_io+0x18/0x20 [spl]
[ 1088.154976]  zio_wait+0xfd/0x1b0 [zfs]
[ 1088.155000]  dsl_pool_sync+0xb8/0x440 [zfs]
[ 1088.155028]  spa_sync+0x42d/0xdb0 [zfs]
[ 1088.155057]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[ 1088.155099]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[ 1088.155102]  thread_generic_wrapper+0x72/0x80 [spl]
[ 1088.155103]  kthread+0x109/0x140
[ 1088.155106]  ? __thread_exit+0x20/0x20 [spl]
[ 1088.155107]  ? kthread_create_on_node+0x70/0x70
[ 1088.155108]  ? kthread_create_on_node+0x70/0x70
[ 1088.155109]  ret_from_fork+0x25/0x30
[ 1208.988501] INFO: task txg_sync:777 blocked for more than 120 seconds.
[ 1208.988528]       Tainted: P           O    4.13.13-2-pve #1
[ 1208.988544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1208.988566] txg_sync        D    0   777      2 0x00000000
[ 1208.988568] Call Trace:
[ 1208.988592]  __schedule+0x3cc/0x860
[ 1208.988595]  schedule+0x36/0x80
[ 1208.988598]  io_schedule+0x16/0x40
[ 1208.988605]  cv_wait_common+0xb2/0x140 [spl]
[ 1208.988608]  ? wait_woken+0x80/0x80
[ 1208.988612]  __cv_wait_io+0x18/0x20 [spl]
[ 1208.988670]  zio_wait+0xfd/0x1b0 [zfs]
[ 1208.988696]  dsl_pool_sync+0xb8/0x440 [zfs]
[ 1208.988727]  spa_sync+0x42d/0xdb0 [zfs]
[ 1208.988799]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[ 1208.988839]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[ 1208.988844]  thread_generic_wrapper+0x72/0x80 [spl]
[ 1208.988846]  kthread+0x109/0x140
[ 1208.988850]  ? __thread_exit+0x20/0x20 [spl]
[ 1208.988852]  ? kthread_create_on_node+0x70/0x70
[ 1208.988853]  ? kthread_create_on_node+0x70/0x70
[ 1208.988855]  ret_from_fork+0x25/0x30
[ 1329.821044] INFO: task txg_sync:777 blocked for more than 120 seconds.
[ 1329.821089]       Tainted: P           O    4.13.13-2-pve #1
[ 1329.821122] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1329.821167] txg_sync        D    0   777      2 0x00000000
[ 1329.821170] Call Trace:
[ 1329.821179]  __schedule+0x3cc/0x860
[ 1329.821182]  schedule+0x36/0x80
[ 1329.821186]  io_schedule+0x16/0x40
[ 1329.821196]  cv_wait_common+0xb2/0x140 [spl]
[ 1329.821200]  ? wait_woken+0x80/0x80
[ 1329.821224]  __cv_wait_io+0x18/0x20 [spl]
[ 1329.821328]  zio_wait+0xfd/0x1b0 [zfs]
[ 1329.821382]  dsl_pool_sync+0xb8/0x440 [zfs]
[ 1329.821443]  spa_sync+0x42d/0xdb0 [zfs]
[ 1329.821490]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[ 1329.821535]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[ 1329.821541]  thread_generic_wrapper+0x72/0x80 [spl]
[ 1329.821544]  kthread+0x109/0x140
[ 1329.821549]  ? __thread_exit+0x20/0x20 [spl]
[ 1329.821550]  ? kthread_create_on_node+0x70/0x70
[ 1329.821552]  ? kthread_create_on_node+0x70/0x70
[ 1329.821554]  ret_from_fork+0x25/0x30
[ 1450.652725] INFO: task txg_sync:777 blocked for more than 120 seconds.
[ 1450.652754]       Tainted: P           O    4.13.13-2-pve #1
[ 1450.652772] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1450.652797] txg_sync        D    0   777      2 0x00000000
[ 1450.652799] Call Trace:
[ 1450.652807]  __schedule+0x3cc/0x860
[ 1450.652809]  schedule+0x36/0x80
[ 1450.652812]  io_schedule+0x16/0x40
[ 1450.652820]  cv_wait_common+0xb2/0x140 [spl]
[ 1450.652822]  ? wait_woken+0x80/0x80
[ 1450.652827]  __cv_wait_io+0x18/0x20 [spl]
[ 1450.652876]  zio_wait+0xfd/0x1b0 [zfs]
[ 1450.652907]  dsl_pool_sync+0xb8/0x440 [zfs]
[ 1450.652940]  spa_sync+0x42d/0xdb0 [zfs]
[ 1450.653010]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[ 1450.653067]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[ 1450.653070]  thread_generic_wrapper+0x72/0x80 [spl]
[ 1450.653072]  kthread+0x109/0x140
[ 1450.653075]  ? __thread_exit+0x20/0x20 [spl]
[ 1450.653076]  ? kthread_create_on_node+0x70/0x70
[ 1450.653076]  ? kthread_create_on_node+0x70/0x70
[ 1450.653078]  ret_from_fork+0x25/0x30

@thoro

thoro commented Apr 20, 2018

Hey,

Just ran into exactly the same issue. I'm just doing a zfs receive on one side:

mbuffer -s 128k -m 4G -I 1337 | zfs receive -s storage/creamdata-stream01

and on the other side:

zfs send -p -c storage/creamdata-stream02@20180418_2 | mbuffer -s 128k -m 1G -O 10.0.13.8:1337

It happened reproducibly, twice, with the same snapshot after about 39 GB (35 actually on the receiving side).

txg_sync, txg_quiesce and receive_writer are all running at 100%, 60% and 50% CPU load respectively.

The pool was created with ashift=12; it's running on a single SSD because of the compression feature.

It's creating a lot of transactions and nothing is moving forward.

Call Trace:

Apr 20 14:01:33 at-host-05 kernel: INFO: task zfs:5917 blocked for more than 120 seconds.
Apr 20 14:01:33 at-host-05 kernel:      Tainted: P          IOE   4.12.5 #1
Apr 20 14:01:33 at-host-05 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 14:01:33 at-host-05 kernel: zfs             D    0  5917   5600 0x00000086
Apr 20 14:01:33 at-host-05 kernel: Call Trace:
Apr 20 14:01:33 at-host-05 kernel: __schedule+0x28a/0x880
Apr 20 14:01:33 at-host-05 kernel: schedule+0x36/0x80
Apr 20 14:01:33 at-host-05 kernel: cv_wait_common+0x101/0x140 [spl]
Apr 20 14:01:33 at-host-05 kernel: ? remove_wait_queue+0x60/0x60
Apr 20 14:01:33 at-host-05 kernel: __cv_wait+0x15/0x20 [spl]
Apr 20 14:01:33 at-host-05 kernel: bqueue_enqueue+0x62/0xf0 [zfs]
Apr 20 14:01:33 at-host-05 kernel: dmu_recv_stream+0x696/0xe40 [zfs]
Apr 20 14:01:33 at-host-05 kernel: ? zfs_set_prop_nvlist+0x31c/0x4f0 [zfs]
Apr 20 14:01:33 at-host-05 kernel: ? spl_vmem_alloc+0x19/0x20 [spl]
Apr 20 14:01:33 at-host-05 kernel: ? nv_free_spl+0x14/0x20 [znvpair]
Apr 20 14:01:33 at-host-05 kernel: zfs_ioc_recv_impl+0x2c5/0x1090 [zfs]
Apr 20 14:01:33 at-host-05 kernel: ? dbuf_hold_impl+0xb2/0xd0 [zfs]
Apr 20 14:01:33 at-host-05 kernel: ? __kmalloc_node+0x191/0x260
Apr 20 14:01:33 at-host-05 kernel: zfs_ioc_recv_new+0x369/0x3f0 [zfs]
Apr 20 14:01:33 at-host-05 kernel: ? spl_kmem_alloc_impl+0xc9/0x170 [spl]
Apr 20 14:01:33 at-host-05 kernel: ? spl_vmem_alloc+0x19/0x20 [spl]
Apr 20 14:01:33 at-host-05 kernel: ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
Apr 20 14:01:33 at-host-05 kernel: ? nv_mem_zalloc.isra.12+0x2a/0x40 [znvpair]
Apr 20 14:01:33 at-host-05 kernel: ? nvlist_xalloc.part.13+0x5f/0xc0 [znvpair]
Apr 20 14:01:33 at-host-05 kernel: zfsdev_ioctl+0x1d6/0x620 [zfs]
Apr 20 14:01:33 at-host-05 kernel: do_vfs_ioctl+0xa7/0x5e0
Apr 20 14:01:33 at-host-05 kernel: ? getnstimeofday64+0xe/0x20
Apr 20 14:01:33 at-host-05 kernel: SyS_ioctl+0x79/0x90
Apr 20 14:01:33 at-host-05 kernel: do_syscall_64+0x67/0x150
Apr 20 14:01:33 at-host-05 kernel: entry_SYSCALL64_slow_path+0x25/0x25
Apr 20 14:01:33 at-host-05 kernel: RIP: 0033:0x7f7397d74107
Apr 20 14:01:33 at-host-05 kernel: RSP: 002b:00007ffe378de258 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 20 14:01:33 at-host-05 kernel: RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f7397d74107
Apr 20 14:01:33 at-host-05 kernel: RDX: 00007ffe378de290 RSI: 0000000000005a46 RDI: 0000000000000005
Apr 20 14:01:33 at-host-05 kernel: RBP: 00007ffe378de290 R08: ffffffffffffffff R09: 0000000000020000
Apr 20 14:01:33 at-host-05 kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 00007ffe378e18e0
Apr 20 14:01:33 at-host-05 kernel: R13: 0000000000000005 R14: 0000000000005a46 R15: 00000000008b8960
Apr 20 14:03:36 at-host-05 kernel: INFO: task zfs:5917 blocked for more than 120 seconds.
Apr 20 14:03:36 at-host-05 kernel:      Tainted: P          IOE   4.12.5 #1
Apr 20 14:03:36 at-host-05 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 14:03:36 at-host-05 kernel: zfs             D    0  5917   5600 0x00000086
Apr 20 14:03:36 at-host-05 kernel: Call Trace:
Apr 20 14:03:36 at-host-05 kernel: __schedule+0x28a/0x880
Apr 20 14:03:36 at-host-05 kernel: schedule+0x36/0x80
Apr 20 14:03:36 at-host-05 kernel: cv_wait_common+0x101/0x140 [spl]
Apr 20 14:03:36 at-host-05 kernel: ? remove_wait_queue+0x60/0x60
Apr 20 14:03:36 at-host-05 kernel: __cv_wait+0x15/0x20 [spl]
Apr 20 14:03:36 at-host-05 kernel: bqueue_enqueue+0x62/0xf0 [zfs]
Apr 20 14:03:36 at-host-05 kernel: dmu_recv_stream+0x696/0xe40 [zfs]
Apr 20 14:03:36 at-host-05 kernel: ? zfs_set_prop_nvlist+0x31c/0x4f0 [zfs]
Apr 20 14:03:36 at-host-05 kernel: ? spl_vmem_alloc+0x19/0x20 [spl]
Apr 20 14:03:36 at-host-05 kernel: ? nv_free_spl+0x14/0x20 [znvpair]
Apr 20 14:03:36 at-host-05 kernel: zfs_ioc_recv_impl+0x2c5/0x1090 [zfs]
Apr 20 14:03:36 at-host-05 kernel: ? dbuf_hold_impl+0xb2/0xd0 [zfs]
Apr 20 14:03:36 at-host-05 kernel: ? __kmalloc_node+0x191/0x260
Apr 20 14:03:36 at-host-05 kernel: zfs_ioc_recv_new+0x369/0x3f0 [zfs]
Apr 20 14:03:36 at-host-05 kernel: ? spl_kmem_alloc_impl+0xc9/0x170 [spl]
Apr 20 14:03:36 at-host-05 kernel: ? spl_vmem_alloc+0x19/0x20 [spl]
Apr 20 14:03:36 at-host-05 kernel: ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
Apr 20 14:03:36 at-host-05 kernel: ? nv_mem_zalloc.isra.12+0x2a/0x40 [znvpair]
Apr 20 14:03:36 at-host-05 kernel: ? nvlist_xalloc.part.13+0x5f/0xc0 [znvpair]
Apr 20 14:03:36 at-host-05 kernel: zfsdev_ioctl+0x1d6/0x620 [zfs]
Apr 20 14:03:36 at-host-05 kernel: do_vfs_ioctl+0xa7/0x5e0
Apr 20 14:03:36 at-host-05 kernel: ? getnstimeofday64+0xe/0x20
Apr 20 14:03:36 at-host-05 kernel: SyS_ioctl+0x79/0x90
Apr 20 14:03:36 at-host-05 kernel: do_syscall_64+0x67/0x150
Apr 20 14:03:36 at-host-05 kernel: entry_SYSCALL64_slow_path+0x25/0x25
Apr 20 14:03:36 at-host-05 kernel: RIP: 0033:0x7f7397d74107
Apr 20 14:03:36 at-host-05 kernel: RSP: 002b:00007ffe378de258 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 20 14:03:36 at-host-05 kernel: RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f7397d74107
Apr 20 14:03:36 at-host-05 kernel: RDX: 00007ffe378de290 RSI: 0000000000005a46 RDI: 0000000000000005
Apr 20 14:03:36 at-host-05 kernel: RBP: 00007ffe378de290 R08: ffffffffffffffff R09: 0000000000020000
Apr 20 14:03:36 at-host-05 kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 00007ffe378e18e0
Apr 20 14:03:36 at-host-05 kernel: R13: 0000000000000005 R14: 0000000000005a46 R15: 00000000008b8960

ZFS 0.7.8
SPL 0.7.8
Kernel Linux at-host-05 4.12.5 #1 SMP Fri Aug 11 09:55:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

The pool was created with 4.9.31 and ZFS 0.7.0-rc4 -> I will now try to recreate it with 4.12.5 and 0.7.8

Drop_caches or increasing arc_min did nothing.

Update: this seems to be somehow connected to the snapshot - or the target disk, because I have already copied this one snapshot multiple times to other servers without an issue - same kernel and same ZFS version!

Update 2: also reproducible with kernels 4.12.5 and 4.16.3 and zfs 0.7.8

I now watched very closely: first it starts to print out 0 KB/s for some time, then there are quick bursts of ~600 MB/s and then again 0 MB/s, while zpool iostat shows ~30 MB/s, until it reaches the 35 GB mark -> then it's dead ...

Interestingly, I can get past this part with the resume functionality (zfs send -t xxx) - that one stalled at ~41 GB transferred.

Update 3: now tried without the -c and -p flags -> it ran through like it should. Could there be a bug in that code path?

@h1z1

h1z1 commented Apr 20, 2018

Given this goes back to 2015 and I've hit what I think is it in 0.7.8, can anyone else confirm this was ever resolved or are we all doing the same dumb thing to trigger it? The default zfs_arc_min appears to still be 0 (if that was the cause?).

@stewartadam

@behlendorf I'm about to have to do a large transfer again so will likely repro this. Is there any data you'd like me to collect during such a hang, or should I just test for the min size that unblocks the transfer?

@dweeezil
Contributor

This issue appears to have suffered sufficient creep that it might be better to close it and to re-open more specific PRs for anything still outstanding. The original issues appeared to have been related to ARC collapse and were apparently solved by increasing zfs_arc_min.

@jkalousek The non-zero dmu_tx_quota indicates the stalls might have to do with approaching a quota or an almost-full pool. Also, your setting of zfs_arc_min to 1GiB was actually lowering it (see below) because your system has 64GiB of memory.

I'll note that as of the 0.7.2 release, the minimum ARC size was changed from 32MiB to the larger of 32MiB or 1/32 of system memory.
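
To make the arithmetic explicit for the 64 GiB machine above: the default floor works out to max(32 MiB, 64 GiB / 32) = 2 GiB, so an explicit zfs_arc_min of 1 GiB is actually below the built-in default.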
