Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inflated quiesce time caused by lwb_tx during zil_commit(). #12321

Merged
merged 3 commits into from
May 26, 2022

Conversation

jxdking
Copy link
Contributor

@jxdking jxdking commented Jul 3, 2021

Motivation and Context

In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

You can verify this quiesce-time degradation with sync=always on your dataset, and observe txg history.

Description

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the 'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

How Has This Been Tested?

Tested with fio and zfs set sync=always
The improvement depends your slog device speed. The slower slog device is, the more improvement you will see.
After this patch there is no quiesce-time degradation visible after set sync=always.

Here is my test with spinning disk as slog, to show difference of the quiesce time.

with patch

txg      birth            state ndirty       nread        nwritten     reads    writes   otime        qtime        wtime        stime
722      3819311759383    C     0            3072         1033216      2        53       199546499    5026         88637        120417478
723      3819511305882    C     0            0            0            0        0        126672837    1683         17886        106016
724      3819637978719    C     0            1024         1025536      1        53       13947365     1288         36773        112106800
725      3819651926084    C     0            0            0            0        0        118113533    1597         10495        91106
726      3819770039617    C     0            1024         103424       1        38       5095744305   27835        145215       102824714
727      3824865783922    C     0            0            0            0        0        5120053917   4375         178995       417700
728      3829985837839    C     0            0            0            0        0        5119996727   4435         177865       469305
729      3835105834566    C     54018048     307200       139350528    9        858      1469141722   37584        28793        2172787838
730      3836574976288    C     89800704     2048         218882048    2        1276     2173390655   1138         31007        2134700931
731      3838748366943    C     133840896    1024         270976512    1        1463     2134744982   1139         6894         2932904532
732      3840883111925    C     142098432    1024         274039808    1        1414     2932928067   1547         32078        2641837091
733      3843816039992    C     131743744    1024         269938176    1        1447     2641883713   1305         15513        2774043735
734      3846457923705    C     146554880    2048         403771904    1        2571     2774076421   1547         17621        4987809631
735      3849232000126    C     255606784    1536         509530624    1        2865     4987836594   855          12714        8616763778
736      3854219836720    C     253771776    2048         509279232    1        2833     8616787556   986          19922        8308391114
737      3862836624276    C     253247488    1536         498496000    1        2783     8308421970   936          15241        8632259611
738      3871145046246    C     247087104    2048         510252544    1        2915     8632288585   1332         16683        8299059261
739      3879777334831    C     261898240    1024         508218880    1        2753     8299089281   1385         10823        6602355938
740      3888076424112    S     0            0            0            0        0        6602379488   1730         10614        0
741      3894678803600    O     0            0            0            0        0        0            0            0            0

without patch

txg      birth            state ndirty       nread        nwritten     reads    writes   otime        qtime        wtime        stime
789      4102835274754    C     0            2048         1057792      1        65       57904525     1450         37147        97630298
790      4102893179279    C     0            0            0            0        0        106901941    1476         12980        104187
791      4103000081220    C     0            0            1020928      0        49       12889766     857          10118        62213896
792      4103012970986    C     0            0            0            0        0        68551739     1607         36512        80945
793      4103081522725    C     2768896      0            1898496      0        85       4845745890   7530077      19759        196281535
794      4107927268615    C     53886976     192000       112074752    5        613      657526958    26952709     31097        1943359870
795      4108584795573    C     64897024     75776        134765568    4        717      1970413166   44824232     61839        1039096626
796      4110555208739    C     76955648     0            196256768    0        1171     1083993186   31569824     49449        2383267018
797      4111639201925    C     126631936    0            263712256    0        1406     2414895225   33835725     25244        2539729055
798      4114054097150    C     143278080    0            276193280    0        1468     2573601631   60943307     32181        2759894703
799      4116627698781    C     141180928    3072         320628736    1        1850     2820975839   36787516     30533        3222832141
800      4119448674620    C     185876480    0            440704512    0        2639     3259674596   31768175     45426        7350388640
801      4122708349216    C     261242880    4608         520176640    2        2878     7382218476   1369         16087        8646623611
802      4130090567692    C     260456448    0            495588864    0        2692     8646657587   1782         22131        5619056915
803      4138737225279    C     235945984    2560         502074368    1        2919     5619092136   1011         18622        8073224226
804      4144356317415    C     263733248    0            508206592    0        2835     8073272065   1444         31298        6019223002
805      4152429589480    C     244465664    2560         499199488    1        2818     6019264869   2831552      50281        8347371827
806      4158448854349    C     259670016    1536         493344768    1        2642     8350275168   1579         22592        6446452458
807      4166799129517    S     0            0            0            0        0        6446485932   1057         17744        0
808      4173245615449    O     0            0            0            0        0        0            0            0            0

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@jxdking jxdking changed the title Fix long quiesce time caused by lwb_tx during zil_commit(). Fix inflated quiesce time caused by lwb_tx during zil_commit(). Jul 3, 2021
@jxdking jxdking force-pushed the zil-lwb-flush branch 6 times, most recently from 396779e to 53b4f70 Compare July 7, 2021 00:22
@jxdking jxdking marked this pull request as ready for review July 7, 2021 00:25
@ahrens ahrens requested a review from prakashsurya July 9, 2021 04:04
@prakashsurya
Copy link
Member

The improvement depends your slog device speed. The slower slog device is, the more improvement you will see.

Before looking at the code change, just to make sure I'm understanding, the case where this will have a performance improvement, is when the data is able to be written out via spa_sync faster than it's written out via the SLOG? i.e. we issue the LWB write, and that write latency is greater than the latency (would have been) to write that data out via the normal spa_sync code path.. and since we hold the TXG open during the LWB write, we add latency to the spa_sync (which would have otherwise completed)?

* to the next block in the chain, so it's OK to let the txg in
* which we allocated the next block sync.
*/
dmu_tx_commit(tx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahrens can you remind me why we chose to hold the TX for the entirety of the LWB write/flush? the comment mentions the next block in the chain.. why can't we let the TXG sync, until this block is flushed? may the TXG sync impact the next block in the chain, (potentially) causing some sort of corruption of the on-disk linked list?

I believe this PR is attempting to commit this TX at the time that we issue the LWB, rather than at the time the LWB write is flushed (current behavior), so I'm trying to think about why (or not) it'd be safe to commit the TX at the time we issue the write.. and trying to remember why we chose (need?) to wait to commit it unitl after the LWB has been flushed.

Copy link
Contributor Author

@jxdking jxdking Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the goal was to ensure that zil_lwb_flush_vdevs_done() is called
before zil_sync(). zil_sync() is called in syncing context. Holding the txg open until zil_lwb_flush_vdevs_done() is called ensure the correct order.
In zil_lwb_flush_vdevs_done(), we set lwb->lwb_buf = NULL, so that zil_sync() can pick this lwb up and remove it. The order of the call is important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the goal was to ensure that zil_lwb_flush_vdevs_done() is called before zil_sync().

In zil_lwb_flush_vdevs_done(), we set lwb->lwb_buf = NULL, so that zil_sync() can pick this lwb up and remove it.

But if lwb_buf is not NULL, then zil_sync will just skip that LWB (and all later LWBs in the zl_lwb_list).. for now.. in the next sync, it'll try to process zl_lwb_list again, and likely find that it can now be freed, right? meaning it's not critical to prevent zil_sync from being called until after the LWBs are flushed (so there must be another reason for doing this)...?

It's been a little while since I was in this code. so I'll have to refresh my memory..

Copy link
Contributor Author

@jxdking jxdking Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if lwb_buf is not NULL, then zil_sync will just skip that LWB (and all later LWBs in the zl_lwb_list).. for now.. in the next sync, it'll try to process zl_lwb_list again, and likely find that it can now be freed, right? meaning it's not critical to prevent zil_sync from being called until after the LWBs are flushed (so there must be another reason for doing this)...?

It's been a little while since I was in this code. so I'll have to refresh my memory..

I haven't looked in detail of this code path you mentioned above.
However, I tried. I let zil_sync() without waiting zil_lwb_flush_vdevs_done(). zil_close() panics at ASSERT3P(lwb, ==, list_tail(&zilog->zl_lwb_list)).
zil_close() expects all inflight lwbs should be settled after zl_dirty_max_txg or lwb_max_txg is synced.

Another potential issue, in zil_lwb_write_issue(), we call dsl_dataset_dirty(dmu_objset_ds(zilog->zl_os), tx). I believe zil_sync() only is called for dirty dataset in current txg. If we let lwb fly longer, it may not be picked up in next zil_sync() if the dataset is no longer considered as dirty.

@jxdking
Copy link
Contributor Author

jxdking commented Jul 9, 2021

Before looking at the code change, just to make sure I'm understanding, the case where this will have a performance improvement, is when the data is able to be written out via spa_sync faster than it's written out via the SLOG? i.e. we issue the LWB write, and that write latency is greater than the latency (would have been) to write that data out via the normal spa_sync code path.. and since we hold the TXG open during the LWB write, we add latency to the spa_sync (which would have otherwise completed)?

Eh... No.
Current txg_kick(), if there is one txg is quiescing, it implies there is nothing in syncing stage. Before this patch, the latency of slog will be potentially added to quiescing time because of holding lwb_tx. When it is quiescing, it is not syncing.
After this patch, the latency of slog will NOT be added to quiescing time. Thus, I say, the slower the slog device is, the more improvement you will see.

BTW, I always assume the slog device should be faster than main vdev.

@amotin
Copy link
Member

amotin commented Jul 9, 2021

BTW, I always assume the slog device should be faster than main vdev.

SLOG device should have lower latency. It is difficult to have peak bandwidth higher than all main vdevs together, even though everybody would like it.

@sempervictus
Copy link
Contributor

Doesn't every pool now have SLOG metaslabs when no dedicated SLOG is present?
This would have probably saved a pool we recently lost - RAID1 NVME on a PCI card fronting a pool went sour on us, didn't fully die, but one of the RAID1 vols did and the controller apparently doesnt really do IO when a member keels (i'd have used a direct mirror if i could - no such option there). Thing went down to KB/s range and stalled the pool, killing IOs, and being unable to recover the slog after the inevitable crash this caused. If such a diverter valve was in place, it would have written blocks out to the VDEVs instead and might have survived the fall.

@adamdmoss
Copy link
Contributor

FWIW I've been running with this branch merged for 3 or 4 months without a problem. I don't have a dedicated SLOG but I assume this also affects the inline SLOG on the reserved metaslab.

@problame
Copy link
Contributor

problame commented Nov 7, 2021

Remark on the commit message:

The goal here is to ensure that zil_lwb_flush_vdevs_done() is called
before zil_sync(). The cost of the original implimentation mentioned
above is just too high.

I found that pargraph more confusing that helpful.
If I understand this patch correctly, what you're trying to convey is

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the 'syncing' txg state.
(Edit)
That is, in zil_sync().

@sempervictus
Copy link
Contributor

@adamdmoss - do you have any systems where this patch is present but no dedicated SLOG (root pools and such)? Curious if there are any dangers lurking under the skin when the SLOG blocks are on the only VDEV available (concern being for things like boot from USB media where the block dev underpinning the permanent and SLOG blocks is the same, and slow at that).

@jxdking
Copy link
Contributor Author

jxdking commented Nov 8, 2021

Remark on the commit message:

The goal here is to ensure that zil_lwb_flush_vdevs_done() is called
before zil_sync(). The cost of the original implimentation mentioned
above is just too high.

I found that pargraph more confusing that helpful. If I understand this patch correctly, what you're trying to convey is

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the 'syncing' txg state.
(Edit)
That is, in zil_sync().

You are correct. I will change the commit message.

@adamdmoss
Copy link
Contributor

@adamdmoss - do you have any systems where this patch is present but no dedicated SLOG (root pools and such)? Curious if there are any dangers lurking under the skin when the SLOG blocks are on the only VDEV available (concern being for things like boot from USB media where the block dev underpinning the permanent and SLOG blocks is the same, and slow at that).

Yeah, I have four pools without dedicated SLOG and with these patches. I don't boot from any of them, FWIW.

@jxdking
Copy link
Contributor Author

jxdking commented Nov 17, 2021

I have revised the commit message and rebased to latest master.

@adamdmoss
Copy link
Contributor

Is this waiting on anything in particular? Just re-reviews?

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label May 18, 2022
@behlendorf
Copy link
Contributor

Next steps here would be to refresh the patch and get the previous reviewers to approve it.

@sempervictus
Copy link
Contributor

@behlendorf - feel free to tag me as a reviewer if you need an approval: been running this since last year in our builds (grsec & hardened versions).

@behlendorf
Copy link
Contributor

@sempervictus great, thanks for putting this change through its paces!

@behlendorf behlendorf requested review from behlendorf and amotin May 18, 2022 22:59
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request Oct 22, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request Oct 22, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request Oct 23, 2022
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes openzfs#12321
bsdjhb pushed a commit to CTSRD-CHERI/zfs that referenced this pull request Jul 13, 2023
This is not associated with a specific upstream commit but apparently
a local diff applied as part of:

commit e3aa18ad71782a73d3dd9dd3d526bbd2b607ca16
Merge: 645886d028c8 b9d9845
Author: Martin Matuska <mm@FreeBSD.org>
Date:   Fri Jun 3 17:58:39 2022 +0200

    zfs: merge openzfs/zfs@b9d98453f

    Notable upstream pull request merges:
      openzfs#12321 Fix inflated quiesce time caused by lwb_tx during zil_commit()
      openzfs#13244 zstd early abort
      openzfs#13360 Verify BPs as part of spa_load_verify_cb()
      openzfs#13452 More speculative prefetcher improvements
      openzfs#13466 Expose zpool guids through kstats
      openzfs#13476 Refactor Log Size Limit
      openzfs#13484 FreeBSD: libspl: Add locking around statfs globals
      openzfs#13498 Cancel in-progress rebuilds when we finish removal
      openzfs#13499 zed: Take no action on scrub/resilver checksum errors
      openzfs#13513 Remove wrong assertion in log spacemap

    Obtained from:  OpenZFS
    OpenZFS commit: b9d9845
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants