[IMPROVEMENT] Potentially reduce the two minute iSCSI timeout for v1 volumes #8382
Labels
kind/improvement
Request for improvement of existing function
require/backport
Require backport. Only used when the specific versions to backport have not been defined.
require/doc
Require updating the longhorn.io documentation
require/manual-test-plan
Require adding/updating manual test cases if they can't be automated
Is your improvement request related to a feature? Please describe (👍 if you like this request)
While investigating #2187, we did a deep dive into the behavior of v1 volumes when the instance-manager process group is abruptly killed: https://github.com/longhorn/longhorn/wiki/Freezing-File-Systems-With-dmsetup-suspend-Versus-fsfreeze.
During the investigation, we noticed that all I/O was blocked (it could neither complete successfully nor return an error) until two minutes after the crash. Relevant dmesg logs look like:

Since all iSCSI traffic is local to a node, a timeout is unlikely to have any cause OTHER than a tgtd crash, so waiting two minutes does not seem necessary.
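A rough way to reason about the stall (our assumption, not something measured in this issue): open-iscsi holds I/O for a failed session until it notices the connection is gone. It then waits `node.session.timeo.replacement_timeout` seconds (120 in a stock iscsid.conf) before failing that I/O back up the stack. When tgtd is killed outright, the kernel closes its sockets, so detection should be nearly immediate and the replacement timeout alone would account for the two minutes. If the connection instead goes silent, NOP-Out pings (`noop_out_interval` plus `noop_out_timeout`, 5 seconds each by default) have to notice first. The class and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IscsiTimeouts:
    """Hypothetical bundle of open-iscsi timeouts, all in seconds.

    Defaults mirror the values shipped in a stock iscsid.conf.
    """
    noop_out_interval: int = 5      # node.conn[0].timeo.noop_out_interval
    noop_out_timeout: int = 5       # node.conn[0].timeo.noop_out_timeout
    replacement_timeout: int = 120  # node.session.timeo.replacement_timeout


def worst_case_stall(t: IscsiTimeouts, connection_closed: bool) -> int:
    """Rough upper bound on how long I/O stays blocked before erroring.

    connection_closed=True models a killed tgtd (its sockets are closed, so the
    failure is seen at once); False models a silent peer that only NOP-Out
    pings can detect.
    """
    detection = 0 if connection_closed else t.noop_out_interval + t.noop_out_timeout
    return detection + t.replacement_timeout


if __name__ == "__main__":
    print(worst_case_stall(IscsiTimeouts(), connection_closed=True))  # 120
    print(worst_case_stall(IscsiTimeouts(replacement_timeout=15), connection_closed=True))  # 15
```

Under this simplified model, lowering the replacement timeout is what would shorten the stall in the crash case. The NOP-Out settings only matter when the connection hangs without closing.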
Describe the solution you'd like
Reduce the iSCSI timeout if it is practical to do so.
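Suppose investigation confirms that `node.session.timeo.replacement_timeout` is the right knob. In that case, one persistent option is to update the node record with `iscsiadm -m node ... --op update`. The sketch below only builds the argument list. The target IQN and portal in the example are placeholders, and 15 seconds is an arbitrary example value, not a recommendation. As we understand it, an updated node record takes effect on the next login, not for a session that is already established.

```python
import shlex
from typing import List


def replacement_timeout_update_cmd(target_iqn: str, portal: str, seconds: int) -> List[str]:
    """Build (but do not run) an iscsiadm command that rewrites the
    replacement timeout stored in one node record.

    target_iqn and portal ("ip:port") identify the node record; both are
    caller-supplied and hypothetical in this sketch.
    """
    if seconds < 0:
        # This sketch only handles a finite, non-negative timeout.
        raise ValueError("timeout must be non-negative")
    return [
        "iscsiadm", "-m", "node",
        "-T", target_iqn,
        "-p", portal,
        "--op", "update",
        "-n", "node.session.timeo.replacement_timeout",
        "-v", str(seconds),
    ]


if __name__ == "__main__":
    # Placeholder IQN and portal, for illustration only.
    cmd = replacement_timeout_update_cmd("iqn.example:vol", "127.0.0.1:3260", 15)
    print(shlex.join(cmd))
```

Keeping command construction separate from execution makes the command easy to log and test. On a host with iscsiadm installed, a caller could pass the list to `subprocess.run(cmd, check=True)`.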
Describe alternatives you've considered
If it is not practical to reduce the iSCSI timeout, we can keep it like it is.
Additional context
There are various online sources discussing ways to change iSCSI and/or SCSI timeouts. We need to do a bit of investigation to determine which timeout and method is correct for this use case.
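There are two candidates for changing timeouts on a session that is already logged in. The first is the iSCSI transport class's per-session `recovery_tmo` attribute under /sys/class/iscsi_session/, which is the runtime counterpart of the replacement timeout. The second is the per-device SCSI command timeout at /sys/block/<disk>/device/timeout (30 seconds by default in the sd driver). Which of these, if either, is correct for this use case is exactly the open question above. The sketch assumes the caller already knows the session number and disk name; these are hypothetical inputs. The `root` parameter exists only so the plan can be redirected away from the real /sys.

```python
import os
from typing import Dict


def plan_timeout_writes(session_id: int, disk: str, recovery_tmo: int,
                        scsi_timeout: int, root: str = "/") -> Dict[str, str]:
    """Map sysfs attribute paths to the string values that would be written.

    session_id (the N in sessionN) and disk (e.g. "sdb") are hypothetical
    inputs identifying a Longhorn-backed iSCSI session and its SCSI disk.
    """
    session_attr = os.path.join(root, "sys", "class", "iscsi_session",
                                f"session{session_id}", "recovery_tmo")
    disk_attr = os.path.join(root, "sys", "block", disk, "device", "timeout")
    return {session_attr: str(recovery_tmo), disk_attr: str(scsi_timeout)}


def apply_writes(plan: Dict[str, str]) -> None:
    """Write each value; against the real /sys this needs root and existing nodes."""
    for path, value in plan.items():
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Print the plan only; nothing is written to sysfs here.
    for path, value in plan_timeout_writes(3, "sdb", 15, 30).items():
        print(f"{path} <- {value}")
```

Separating the plan from the writes lets the chosen paths and values be reviewed or logged before anything touches a live node.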