HTTP Datasource can't handle s3 presigned URL for ISO file #2737
I tested different file formats behind an s3 presigned URL.
If we can handle the ISO file format, the error does not occur.
Thanks for the report. So I am a little confused, because we do detect if an image is RAW (RAW is the same as ISO), and if we detect that, nbdkit should not be involved at all since no conversion is needed. We need to investigate why the detection is not working, as we do have some functional tests that download ISO files, and it seems to be working there.
@awels I dug into the HTTP datasource. For an s3 presigned URL of an ISO image, the relevant code is containerized-data-importer/pkg/importer/http-datasource.go Lines 133 to 141 in 20d21d4
The error is:
This comes from the following code: What we are doing here is a HEAD request when initially opening the URL in order to find out the size of the remote file. This is fairly fundamental to the way that the plugin (and NBD) works since it always needs to know the size of the disk image. The HEAD request is also needed for detecting servers which don't support range requests (and IIRC AWS S3 is one of those too). So I guess I have a few questions ...
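For illustration, here is a minimal Go sketch of the HEAD-then-GET-fallback idea discussed in this thread. The names `newFakeS3` and `remoteSize` are hypothetical, and the test server merely stands in for an s3 presigned URL that permits GET but forbids HEAD; this is not nbdkit's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// isoSize is the size of the pretend ISO served below (1 MiB).
const isoSize = 1 << 20

// newFakeS3 simulates an s3 presigned URL signed only for GET:
// HEAD gets 403 Forbidden, GET succeeds and reports a Content-Length.
func newFakeS3() *httptest.Server {
	payload := make([]byte, isoSize)
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		w.Header().Set("Content-Length", fmt.Sprint(len(payload)))
		w.Write(payload)
	}))
}

// remoteSize tries HEAD first and, if the server rejects it, falls back to
// GET, taking the size from the headers and abandoning the body.
func remoteSize(url string) (int64, error) {
	if resp, err := http.Head(url); err == nil {
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			return resp.ContentLength, nil
		}
	}
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	resp.Body.Close() // headers are enough; drop the transfer
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("GET failed: %s", resp.Status)
	}
	return resp.ContentLength, nil
}

func main() {
	srv := newFakeS3()
	defer srv.Close()
	size, err := remoteSize(srv.URL)
	fmt.Println(size, err) // 1048576 <nil>
}
```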
So I think that is the real bug: if no conversion is needed, we should be able to download directly with the Go client. We shouldn't start nbdkit at that point.
Yeah wonder if we can always return |
I think we introduced nbdkit to save scratch space when processing images. Now there are 3 exception cases in the HTTP datasource: (1) the server forbids HEAD requests, as with an s3 presigned URL signed only for GET; (2) the image sits inside a remote (compressed) tar file; (3) the server requires a custom TLS cert.
This bug hits case-1, so we should download the file to scratch space (or make nbdkit-curl-plugin handle the s3 presigned URL, but I find no neat way to do this). For case-2, I think we can improve the code: we can use nbdkit-tar-filter to read an image inside a remote (compressed) tar file. For case-3, I think the current code has a bug; nbdkit-curl-plugin can handle a custom TLS cert. containerized-data-importer/pkg/image/nbdkit.go Lines 98 to 100 in fab858e
I think it would be better to add some e2e tests to cover these exception cases.
Crazy ... Anyway, it should be possible to fall back to using GET, grabbing the Content-Length header and closing the connection. Let me see how feasible that is.
Yes, we just use a HEAD request to find the size of the HTTP file; a GET request with a "Range: bytes=0-0" header can also "peek" at the size of the HTTP file without downloading it. Anyway, it's nbdkit stuff; I am not sure whether it is proper to handle this case in nbdkit-curl-plugin.
Potential patch for curl plugin posted: https://listman.redhat.com/archives/libguestfs/2023-June/031723.html
This is an idea too, but I wonder if all servers really support zero-length ranges. Do you have an AWS URL we can test against?
Nice patch 😄 I generated an s3 presigned URL of an ISO file for testing. (it will expire in 12 hours)
Thanks for the URL, I tested it here and it works with my posted patches. Unfortunately setting Range: bytes=0-0 doesn't work as it causes S3 to return |
My test:
So it still prints the error on the fallback path, but at least it doesn't fail.
@mhenriks Do we still need a quick fix in the CDI project, or should we just wait for the nbdkit patch to merge and upgrade our nbdkit RPM version?
Not familiar with nbdkit internals, but am curious if the patch works with chunked transfer encoding as well?
I just posted #2743. We can add the nbdkit fix later and maybe eliminate |
We never send |
"All HTTP/1.1 applications that receive entities MUST accept the "chunked" transfer-coding (section 3.6), thus allowing this mechanism to be used for messages when the message length cannot be determined in advance." https://www.rfc-editor.org/rfc/rfc2616#section-4.4 Edit: my point is that you may want to do an HTTP/1.0 connection for the fallback GET request in your patch
Huh, the more you know ... Looks like curl does deal with this case correctly, so we're all good. Edit: So the fallback GET request is an issue if the server doesn't send a valid Content-Length header (for example if it is generating the content on the fly). That's a case for forcing HTTP/1.0 or otherwise disabling chunked encoding; not sure how a server would handle that well. Edit 2: Added to the discussion upstream.
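To illustrate the on-the-fly-content case: when a server flushes output before it knows the total size, the response carries no Content-Length at all, and the client only sees chunked transfer-encoding. A self-contained Go sketch (the helper names are illustrative):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newStreamingServer simulates a server generating content on the fly: it
// flushes mid-response, so it cannot send a Content-Length header and the
// Go HTTP server switches to chunked transfer-encoding.
func newStreamingServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "generated ")
		w.(http.Flusher).Flush() // headers leave now, total size still unknown
		fmt.Fprint(w, "on the fly")
	}))
}

// probe reports the Content-Length and transfer encoding the client saw.
func probe(url string) (int64, []string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	return resp.ContentLength, resp.TransferEncoding, nil
}

func main() {
	srv := newStreamingServer()
	defer srv.Close()
	length, enc, _ := probe(srv.URL)
	fmt.Println(length, enc) // -1 [chunked]
}
```

The `-1` is how Go's client reports "size unknown from headers"; a fallback GET that relies on Content-Length has nothing to work with here.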
For a future commit we will need to be able to simulate an Amazon AWS S3 server, which in some circumstances will fail HEAD requests (403 Forbidden error) but allow GET requests to the same URL. Add this capability to our test web server code. Currently unused, so all existing callers pass an extra ", false" parameter. Related: kubevirt/containerized-data-importer#2737 Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Some servers do not support HEAD for requesting the headers. If the HEAD request fails, fall back to using the GET method, abandoning the transfer as soon as possible after the headers have been received. Fixes: kubevirt/containerized-data-importer#2737 Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Summing up SIG Storage comments: this is now merged in development branches: https://gitlab.com/nbdkit/nbdkit/-/commit/4b111535c803e896e6bc4cd020651db861d1d8b1 We should think about whether we want this backported to any stable branch in the short term, or just let it make its way to us. Note that if we want this in, we need to open backport bugs specifying the versions.
Hey @lxs137, this should've been temporarily fixed with #2841. We plan to release this version today, so feel free to try it when ready and confirm whether your issue persists. Edit: v1.57 is already out: https://github.com/kubevirt/containerized-data-importer/releases/tag/v1.57.0
What happened:
Create a DataVolume with an HTTP datasource. The HTTP URL is an s3 presigned URL, signed only for the HTTP method 'GET' (meaning any HTTP method other than 'GET' is forbidden by s3 policy).
I got an error log from cdi-importer; it looks like nbdkit tries to do an HTTP 'HEAD' on the s3 presigned URL, which is forbidden.
What you expected to happen:
A clear and concise description of what you expected to happen.
How to reproduce it (as minimally and precisely as possible):
Steps to reproduce the behavior.
Additional context:
I can't use the s3 datasource, because I think it's not safe to use s3 AK/SK credentials just to download something.
Environment: