Skip to content

Conversation

@motz0815
Copy link
Contributor

@motz0815 motz0815 commented Apr 28, 2025

Closes #549

This PR adds a new option to the mcopy command:
-O or --remote-name - which does the same as the cURL param -O.

This basically strips off all the things after a ? or # and takes the last section of the URL to determine the filename, avoiding usage of the HEAD request (which fails on presigned S3 urls, see #549 )

I tested this method with the following URL (expires on May 5th, 2025):

Details

https://nbg1.your-objectstorage.com/ziffer-artifacts-staging/builds/4e8f7980-3955-48c4-9ff9-9a8dbedcbf41/test-plugin-1.0-SNAPSHOT.jar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=6OL4X4U2SMEH6AKWA71D/20250428/auto/s3/aws4_request&X-Amz-Date=20250428T135531Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=0832406b9351cda79525911df41b7f76166993eb0949e9ab4135cd07c904c64a

successfully. The mcopy command fails if -O is not provided, and succeeds if -O is provided.

(Sorry for the formatting changes, I couldn't seem to get VSCode working with the .editorconfig correctly, maybe some other Contributor could format it once with IntelliJ?)

Copy link
Owner

@itzg itzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for jumping on this so quickly.

Overall, this is a lot more complexity than I expected and wasn't really wanting to add a command line arg to conditionally alter the behavior. I would prefer the computed filename logic, which is needed for every URL, to simply work better in all cases.

Now that I took a minute to look at the existing code, I was expecting to enhance me.itzg.helpers.http.FetchBuilderBase#isHeadUnsupportedResponse to check for a 403 and return false. Then it would proceed with a GET. It would also involve re-arranging these bits to check for the 403 ahead of the notSuccess call.

if (notSuccess(resp)) {
return failedRequestMono(resp, bodyMono, "Extracting filename");
}
if (isHeadUnsupportedResponse(resp)) {
return assembleFileDownloadNameViaGet(client);
}

The method me.itzg.helpers.http.OutputToDirectoryFetchBuilder#extractFilename already handles the fallback logic of extracting the filename from the URL path; however, if it's incorrectly getting mixed up by query parameters, then that should be fixed there.

URI.getPath() is not supposed to contain the query nor the fragment. Did you find that wasn't the case?

@motz0815
Copy link
Contributor Author

I see how you want to approach this, and I agree that that would be a lot cleaner.

I would be more than happy to jump in on the work and help you out with this if you don't have time right now.

(As for the URI.getPath() function, I haven't really checked the docs at all, the last time I used Java with anything fetch-related is already a loong time ago, so sorry for that oversight.)

@itzg
Copy link
Owner

itzg commented Apr 28, 2025

That would be great if you could continue on with the approach I'm describing.

I feel bad suggesting this, but it might be cleaner to open a new PR with just the new, minimal changes. It's up to you though since I'll squash-merge the PR anyway. So the incremental changes won't clutter the main line in any case.

@motz0815
Copy link
Contributor Author

Okay sure, I also think a new PR would be better, so I'm just going to close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mcopy: avoid use of HEAD to determine filename since some object storage returns 403

2 participants