[vhost_boot]: bdev_lvol_create_lvstore times out on large device #3050
Comments
Another instance of this failure. Reported by @mikeBashStuff. log: https://ci.spdk.io/public_build/autotest-per-patch_105754.html |
[Bug scrub] For the timeout during lvol store creation, try using the |
Finally got some time to test it. As expected, using With that in mind, I will close this issue. In case it resurfaces we know the potential workaround. |
Another instance of this issue: https://ci.spdk.io/public_build/autotest-spdk-master-vs-dpdk-v23.11_184.html |
Another instance of this issue: https://ci.spdk.io/public_build/autotest-spdk-master-vs-dpdk-v23.11_203.html |
Just to double-check, I ran lvs creation on bdevs of the sizes specified in the issue. There does not seem to be anything out of the ordinary, and the time it takes is roughly linear with the size. I used the same drive here, and sizes below 8TB were measured by creating a split bdev of the specified size:
Every time, the whole region was handled by a single unmap, and the elapsed time is the result of the device processing that I/O operation. Ex:
Summing up, we need to accommodate those times in the tests. The default timeout in rpc.py is 60 seconds, which can be exceeded in worst-case scenarios. spdk/test/vhost/lvol/lvol_test.sh Line 47 in a60e1ac
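Since the creation time is reported to scale roughly linearly with device size, a timeout for a larger device could be estimated from a few local measurements. The following is only a sketch: `fit_line` and `estimate_timeout` are hypothetical helper names, the 1.5x safety margin is an arbitrary assumption, and the measurements themselves must come from your own runs (none are reproduced here).

```python
from typing import Sequence, Tuple

DEFAULT_RPC_TIMEOUT = 60.0  # rpc.py default, per the comment above


def fit_line(samples: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of seconds = slope * size_tb + intercept."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two (size_tb, seconds) samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover at least two distinct sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_timeout(samples: Sequence[Tuple[float, float]],
                     size_tb: float,
                     margin: float = 1.5,
                     floor: float = DEFAULT_RPC_TIMEOUT) -> float:
    """Predict creation time for size_tb, add a margin, never go below floor."""
    slope, intercept = fit_line(samples)
    predicted = slope * size_tb + intercept
    return max(floor, predicted * margin)
```

The floor keeps the estimate from dropping below the existing default, so small devices behave exactly as before.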
@mchomnic Changing clean_method is troublesome, because it would have to be done in most tests and enforced in each new test, and it could still be missed. I was told that the RPC timeout in the tests had been decreased (to 15 sec), but I don't see that in the test scripts, and the first log in this issue times out after 1 minute. If the test suites actually use the default 60 sec, maybe extending it a bit further would help alleviate this issue? |
This seems to be referring to #3312 (comment) though I am not sure what exactly you discussed outside of these issues. There was an internal thread about what I mentioned in this linked comment though. |
@mikeBashStuff , do you think we can just extend the timeout to 30 seconds? I know it's always tricky to raise the timeout whenever it's needed, but it seems that bigger disks require more of it. |
This was mentioned as an offhand comment offline, but as I've said I couldn't find it being the case. Didn't notice the
Please note that even 60 seconds is not sufficient, per the original report. Rather than changing the timeout for all commands, or requiring tests to be written explicitly with a higher timeout (or the clean_method parameter), it would be best to increase the timeout only for lvol store creation, though I'm not sure what options we have for that. |
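For reference, the per-test alternative discussed above (an explicit higher timeout, or a different clear method) might look roughly like this when a test builds the rpc.py command line itself. This is a sketch under assumptions: `lvstore_create_argv` is a hypothetical helper, the `scripts/rpc.py` path and the bdev/lvstore names are placeholders, `-t` is rpc.py's global timeout option, and the clear-method flag is assumed to be spelled `--clear-method`.

```python
from typing import List, Optional


def lvstore_create_argv(bdev_name: str,
                        lvs_name: str,
                        timeout_s: Optional[float] = None,
                        clear_method: Optional[str] = None,
                        rpc_py: str = "scripts/rpc.py") -> List[str]:
    """Build an rpc.py command line for bdev_lvol_create_lvstore."""
    argv = [rpc_py]
    if timeout_s is not None:
        # Global options go before the RPC method name.
        argv += ["-t", str(timeout_s)]
    argv.append("bdev_lvol_create_lvstore")
    if clear_method is not None:
        # Assumed flag spelling; check `rpc.py bdev_lvol_create_lvstore -h`.
        argv += ["--clear-method", clear_method]
    argv += [bdev_name, lvs_name]
    return argv
```

The resulting list can be handed to `subprocess.run(argv, check=True)`. The drawback raised in the thread still applies: every test has to remember to do this.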
You mean teach I like this solution, as long as we document it, and ensure that picking a timeout on the rpc.py command line still takes precedence. It certainly seems possible, from looking at the JSONRPCClient python code. I think it just needs to add a method to set the timeout on a previously created client object. We could just set the timeout dynamically when creating the JSONRPCClient, but that doesn't work for batching mode (where we pipe a bunch of RPCs into one rpc.py invocation). For this latter case we would want only the bdev_lvol_create_lvstore to have the longer timeout, where the rest get the normal default. |
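The idea above (a per-method default that an explicit command-line timeout still overrides, applied per call so that batching works) can be sketched as follows. This is not the actual JSONRPCClient code: `SketchClient`, `set_timeout`, `resolve_timeout`, and `run_batch` are hypothetical names, and the 90.0 second value is taken from the fix referenced in this issue.

```python
from typing import Dict, List, Optional, Tuple

DEFAULT_TIMEOUT = 60.0
# Methods known to need longer than the default (illustrative table).
METHOD_TIMEOUTS: Dict[str, float] = {"bdev_lvol_create_lvstore": 90.0}


def resolve_timeout(method: str, cli_timeout: Optional[float] = None) -> float:
    """An explicit command-line timeout wins; otherwise use the per-method default."""
    if cli_timeout is not None:
        return cli_timeout
    return METHOD_TIMEOUTS.get(method, DEFAULT_TIMEOUT)


class SketchClient:
    """Stand-in for a JSON-RPC client; it records calls instead of sending them."""

    def __init__(self, timeout: float = DEFAULT_TIMEOUT) -> None:
        self.timeout = timeout
        self.log: List[Tuple[str, float]] = []

    def set_timeout(self, timeout: float) -> None:
        # The proposed addition: adjust the timeout on an existing client.
        self.timeout = timeout

    def call(self, method: str) -> None:
        self.log.append((method, self.timeout))


def run_batch(client: SketchClient,
              methods: List[str],
              cli_timeout: Optional[float] = None) -> List[Tuple[str, float]]:
    """Batch mode: set the timeout before each call, not once per client."""
    for method in methods:
        client.set_timeout(resolve_timeout(method, cli_timeout))
        client.call(method)
    return client.log
```

Resolving the timeout per call is what keeps the other RPCs in a batch on the normal default, while only lvol store creation gets the longer one.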
Another instance of this failure. Reported by @spdkci / Known Issue Detector. log: https://ci.spdk.io/public_build/autotest-spdk-v24.01-LTS-vs-dpdk-v23.11_388.html |
Another instance of this failure. Reported by @spdkci / Known Issue Detector. log: https://ci.spdk.io/public_build/autotest-nightly-lts_1795.html |
Another instance of this failure. Reported by @spdkci / Known Issue Detector. log: https://ci.spdk.io/public_build/autotest-nightly-lts_1806.html |
Another instance of this failure. Reported by @spdkci / Known Issue Detector. log: https://ci.spdk.io/public_build/autotest-spdk-v24.01-LTS-vs-dpdk-v22.11_402.html |
Another instance of this failure. Reported by @spdkci / Known Issue Detector. log: https://ci.spdk.io/public_build/autotest-nightly-lts_1838.html |
The function bdev_lvol_create with the default unmap clear method may take more than the default 60.0 sec. Increase the timeout to 90.0 for the function.

Fixes #3050

Signed-off-by: Marek Chomnicki <marek.chomnicki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/22753 (master)
(cherry picked from commit 679c318)
Change-Id: I6b8ac214b20601e7247ba789b485f53d3545955c
Signed-off-by: Marek Chomnicki <marek.chomnicki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/23246
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
https://ci.spdk.io/results/autotest-per-patch/builds/105754/archive/vhost-autotest/index.html
This error was raised after bdev_lvol_create_lvstore() hit a timeout while creating an lvstore on an 8TB NVMe drive. The likely workaround is to increase the timeout, similarly to 487da02; however, app termination should still be handled gracefully.