CDI download fails due to nbdkit: curl error #2561
Comments
|
Have a look at: Adding these filters allows nbdkit to read ahead and recover from short network failures. |
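To illustrate the general idea behind recovering from short network failures, here is a minimal sketch of a read wrapper that retries with exponential backoff. It is not nbdkit's implementation; `TransientError`, `read_with_retry`, `make_flaky_reader` and the default retry count and delay are hypothetical names and values chosen only for this example.

```python
import time


class TransientError(Exception):
    """Stand-in for a short-lived network failure (hypothetical)."""


def read_with_retry(read, offset, length, retries=5, delay=2.0, sleep=time.sleep):
    """Call read(offset, length), retrying transient failures.

    The wait doubles after every failed attempt (delay, 2*delay, 4*delay, ...).
    After `retries` retries the last error is re-raised to the caller.
    """
    attempt = 0
    while True:
        try:
            return read(offset, length)
        except TransientError:
            if attempt >= retries:
                raise
            sleep(delay * (2 ** attempt))
            attempt += 1


def make_flaky_reader(data, failures):
    """Return a reader over `data` that fails `failures` times, then works."""
    state = {"left": failures}

    def read(offset, length):
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("connection reset")
        return data[offset:offset + length]

    return read
```

With a wrapper like this, a download that hits a brief connection reset is delayed rather than aborted, which is roughly the behaviour the filters are meant to provide.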
|
This is how we arrange the filters in virt-v2v, on top of nbdkit-curl-plugin: A few notes about this:
Do you have an example of a source where the current nbdkit command is especially slow? |
|
@rwmjones here is a nice slow url from #2358 https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img |
|
Thanks - I'll play with it tomorrow & see if I can reproduce it and come up with suggestions. |
|
Thanks again for the example, it was useful to have something concrete to experiment with.

Firstly the answer to this bug is to add The answer to the performance issues was more interesting and we're still discussing it on IRC. A summary is in this email: https://listman.redhat.com/archives/libguestfs/2023-February/030581.html

nbdkit curl plugin has some overhead. I measured it about 25% slower than wget. However that overhead completely disappears when NBD multi-conn is enabled - it's basically the same speed as wget. But there are three bugs which get in the way:

(1) The plugin doesn't enable multi-conn. (There is a simple workaround for this involving nbdkit-multi-conn-filter, see email above, but also we will fix the plugin.)

(2) In this particular case because the file is qcow2, you want to run Unfortunately the NBD client in qemu doesn't support multi-conn. In discussion with Eric Blake about if we can fix this.

(3) Also it looks like there may be another problem in qemu-img convert because performance is terrible. I even tried using the internal qemu curl client, which completely bypasses NBD, nbdkit etc, and performance was still terrible. It was taking 4 or 5 minutes to download and convert the image, even with all the caching and readahead features turned up.

If fixing (1) and (2) still shows any performance drop versus wget I'll have a look at this later. |
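As a rough illustration of why several connections can hide per-request overhead, the sketch below splits an image into fixed-size ranges and reads them concurrently. It only models the idea and does not speak the NBD protocol; `split_ranges`, `parallel_read`, the chunk size and the connection count are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, chunk):
    """Return (offset, length) pairs that cover [0, size) in chunk-sized pieces."""
    return [(off, min(chunk, size - off)) for off in range(0, size, chunk)]


def parallel_read(read, size, chunk, connections=4):
    """Read `size` bytes through `read(offset, length)` using several workers.

    Each worker stands in for one connection. pool.map yields results in
    the order of the input ranges, so the joined output is in file order.
    """
    ranges = split_ranges(size, chunk)
    with ThreadPoolExecutor(max_workers=connections) as pool:
        parts = pool.map(lambda r: read(*r), ranges)
        return b"".join(parts)
```

While one worker waits on the network, the others keep transferring data, so latency on a single request no longer stalls the whole copy.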
|
@rwmjones thanks for digging into this! Good to know the retry filter will address the issue here. I think it would be best to track your performance-related work/observations in #2358, where we would like to see whether qemu-img + nbdkit can perform as well as a wget to scratch space followed by a local qemu-img convert |
|
Since #2584 is merged, @k8scoder192 have you been able to replicate your issue again with the retry filter? If the download failure has been addressed I think we can consider closing the issue, unless we want to continue with the performance conversation here. |
|
@alromeros I can't test this unless the fix is backported to CDI v1.54. Can someone do that? 1.55 introduced issues (pod doesn't run as root, which fails with my current cluster config) |
@k8scoder192 sure, we can backport the fix to that version. I'll let you know once we release the version with the fix. |
|
Backport PR is #2666 |
|
Hi, I just released v1.54.1 which contains the retry filter. |
|
Excellent, feel free to close the issue then. |
What happened:
CDI download of large image file (380M) fails with error
Describe on importer pod output
What you expected to happen:
CDI should be able to download and convert image file to raw without connection issues.
wget of source https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img works flawlessly even when run multiple times.
How to reproduce it (as minimally and precisely as possible):
Ceph block storage (PVC in block mode)
CDI v1.54.0
Also cdi CR resources were increased to eliminate this as a possible issue
Additional context:
It appears using nbdkit to curl the file could be causing issues.
Support for this stems from the fact that when using CDI to download .xz files, nbdkit is not used and no connection issues were seen (even when run multiple times).
In the .xz case, a golang http client downloads to scratch space and, when complete, qemu-img starts the conversion to raw. See PR 2351
Others have seen nbdkit curl errors
See issue 1737 and see step 5 (log output) in issue 1980
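The two-step flow described above for the .xz case (download fully to scratch space, then convert) can be sketched as follows. This is not CDI's code, which is written in Go; `download_to_scratch` and `convert_command` are hypothetical helpers, and the command is only built here, not executed.

```python
import os
import shutil


def download_to_scratch(open_source, scratch_path, bufsize=1 << 20):
    """Copy the whole source stream to scratch_path and return its size.

    open_source is any callable that returns a readable binary file
    object (an assumption for this sketch, e.g. an HTTP response body).
    """
    with open_source() as src, open(scratch_path, "wb") as dst:
        shutil.copyfileobj(src, dst, bufsize)
    return os.path.getsize(scratch_path)


def convert_command(scratch_path, target):
    """Build a qemu-img invocation that converts the scratch file to raw."""
    return ["qemu-img", "convert", "-O", "raw", scratch_path, target]
```

Because the conversion only starts once the file is complete on local disk, a network hiccup affects the download step alone and the converter never reads from the remote server.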
Suggestion: Don't use nbdkit to curl; use the same method the code currently uses for .xz files
Environment:
CDI version (kubectl get deployments cdi-deployment -o yaml): v1.54.0
Kubernetes version (kubectl version): v1.24.0 and/or v1.21.0
Kernel (uname -a): 4.15