Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copyurl: Update copyurl to honor HTTP filename directive #5485

Merged
merged 2 commits into from
Jun 20, 2022

Conversation

jptreen
Copy link

@jptreen jptreen commented Jul 26, 2021

What is the purpose of this change?

Adding --auto-filename-header

To set preferred download filenames for HTTP requests, RFC 6226
specifies a 'filename' directive, available within 'Content-Disposition'
header, and suggests both token and quoted string syntaxes should be allowable
for the filename directive. i.e.: either filename=name.txt or
filename="name.txt" should be acceptable.

See below for details:
https://httpwg.org/specs/rfc6266.html#disposition.parameter.filename
https://httpwg.org/specs/rfc6266.html#advice.generating

--auto-filename-header is designed to work in conjunction with --auto-filename, honoring the filename directive when retrieving files from a server.

Directives contained within the Content-Disposition header are semicolon delimited, and can vary in length and order. Consideration had to be given to parsing entries like inline; filename="file.txt"; as well as the more straightforward examples above.

Was the change discussed in an issue or in the forum before?

I raised it on the forum, got no response, and thought it was an clear enough issue to tackle myself without altering existing functionality. Maybe that was good enough. If not, maybe a rogue PR is enough to get a response. 😉

https://forum.rclone.org/t/copyurl-auto-filename-could-be-cleverer-and-respect-the-http-content-disposition-optional-directive-filename/25553

Checklist

  • I have read the contribution guidelines.
  • I have added tests for all changes in this PR if appropriate.
  • I have added documentation for the changes if appropriate.
  • All commit messages are in house style.
  • I'm done, this Pull Request is ready for review :-)

@jptreen jptreen force-pushed the auto-filename-header branch 2 times, most recently from b6de3ab to 9aa9008 Compare July 27, 2021 12:27
@jptreen
Copy link
Author

jptreen commented Jul 27, 2021

Fixed linux workflow failure and squashed the commits.

@jptreen
Copy link
Author

jptreen commented Jul 29, 2021

Fixed some linting issues.

@ncw
Copy link
Member

ncw commented Aug 1, 2021

--auto-filename-header is designed to work in conjunction with --auto-filename, honoring the filename directive when retrieving files from a server.

Is a new flag necessary? Maybe it should always be parsing Content-Disposition if --auto-filename is set? That is what browsers would do, after all.

What do you think?

I raised it on the forum, got no response, and thought it was an clear enough issue to tackle myself without altering existing functionality. Maybe that was good enough. If not, maybe a rogue PR is enough to get a response.

:-) It looks like a sensible idea to me.

I'd be interested to hear what actual use case you have for it.

I'll review the code now.

Copy link
Member

@ncw ncw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline for code comments :-)

If we decide to drop the flag it will simplify the code considerably!

fs/operations/operations.go Outdated Show resolved Hide resolved
fs/operations/operations.go Outdated Show resolved Hide resolved
fs/operations/operations.go Outdated Show resolved Hide resolved
fs/operations/rc.go Outdated Show resolved Hide resolved
@jptreen
Copy link
Author

jptreen commented Aug 14, 2021

Thanks for the review:

Is a new flag necessary? Maybe it should always be parsing Content-Disposition if --auto-filename is set? That is what browsers would do, after all.

Through an abundance of caution, I didn't want to tread on any existing behavior, even if that behavior was wrong. I'm happy, however, to aim to emulate a generic browser functionality.

I'd be interested to hear what actual use case you have for it.

I have to work with a list of temporary web available files, which look like they've been secured with a salted hash, and the filename just isn't in the path, the hash is the final segment. If I don't want to download the files as something like 0894635ac5a1128086581d149e963c2dd9a4ea90, but instead name_i_would_like.jpg, I need to be able to parse the headers. This job was previously done by a person with a browser, so it just never came up as an issue.

In Go a capitalised first letter makes a function externally accessible to the package.

Many thanks for the review. Apologies for my appauling Go code, its not exactly a language I'm best familiar with. The reason I exported this function is because I wanted to test it on its own, and I was getting complaints when trying to run tests directly calling a non-exported function.

I'll take a look at your notes, remove the new flag, check out MIME parsing code, and rework this PR when I get the chance.

@ncw
Copy link
Member

ncw commented Aug 14, 2021

Thanks for the review:

Is a new flag necessary? Maybe it should always be parsing Content-Disposition if --auto-filename is set? That is what browsers would do, after all.

Through an abundance of caution, I didn't want to tread on any existing behavior, even if that behavior was wrong. I'm happy, however, to aim to emulate a generic browser functionality.

I checked the docs for curl and wget and they both require a flag (--remote-header-name and --content-disposition respectively).

Reading the curl docs

   -J, --remote-header-name
          (HTTP) This option tells the -O, --remote-name option to use the
          server-specified  Content-Disposition  filename  instead  of ex‐
          tracting a filename from the URL.
          [snip]
          WARNING:  Exercise  judicious  use of this option, especially on
          Windows. A rogue server could send you the  name  of  a  DLL  or
          other  file  that could possibly be loaded automatically by Win‐
          dows or some third party software.

Made me think that we should probably excercise a little care with the filename

The mozilla docs say (emphasis mine)

filename

Is followed by a string containing the original name of the file transmitted. The filename is always optional and must not be used blindly by the application: path information should be stripped, and conversion to the server file system rules should be done. This parameter provides mostly indicative information. When used in combination with Content-Disposition: attachment, it is used as the default filename for an eventual "Save As" dialog presented to the user.

So I think your flag is a good idea.

I think we should also sanitise the filename...

I'd do this by converting \ to / (with strings.Replace) then running path.Base() on the result to discard the path information as suggested above. This string could still contain characters which can't be represented on the destination filesystem, however the backend should then convert it into something which is safe.

I'd be interested to hear what actual use case you have for it.

I have to work with a list of temporary web available files, which look like they've been secured with a salted hash, and the filename just isn't in the path, the hash is the final segment. If I don't want to download the files as something like 0894635ac5a1128086581d149e963c2dd9a4ea90, but instead name_i_would_like.jpg, I need to be able to parse the headers. This job was previously done by a person with a browser, so it just never came up as an issue.

Thanks for the explanation.

In Go a capitalised first letter makes a function externally accessible to the package.

Many thanks for the review. Apologies for my appauling Go code, its not exactly a language I'm best familiar with.

That is fine - very happy to help newcomers to Go with the syntax!

The reason I exported this function is because I wanted to test it on its own, and I was getting complaints when trying to run tests directly calling a non-exported function.

What you want to do is stick the test in fs/operations/operations_internal_test.go which is the internal tests - the operations_test.go are the external (black box) tests which don't have access to the internal symbols. (The package header is different between those two files, one is operations and the other is operations_test which is why they are different).

I'll take a look at your notes, remove the new flag, check out MIME parsing code, and rework this PR when I get the chance.

Great! Give me a ping when you are read for another review.

@ivandeex
Copy link
Member

ivandeex commented Nov 18, 2021

@jptreen

we should probably exercise a little care with the filename
I think we should also sanitize the filename...
I'd do this by converting \ to / then running path.Base() on the result to discard the path information as suggested above
you want to stick the test in fs/operations/operations_internal_test.go
ping when you are ready for another review.

@ivandeex
Copy link
Member

Unfortunately rclone is short on resources to finish all PRs unfinished by authors.
Marking as draft until we have news from submitter.

@ivandeex ivandeex marked this pull request as draft November 19, 2021 17:26
@ivandeex ivandeex modified the milestones: v1.58, v1.59 Nov 19, 2021
@buengese buengese force-pushed the auto-filename-header branch 3 times, most recently from 1dc8ade to e91e681 Compare June 17, 2022 15:52
Implemented --header-filename for use with copyurl.

For specifically setting preferred download filenames for HTTP requests, RFC 6226
specifies a 'filename' directive, available within 'Content-Disposition'
header. We can handle with 'mime.ParseMediaType'.

See below for details:
https://httpwg.org/specs/rfc6266.html#disposition.parameter.filename
https://httpwg.org/specs/rfc6266.html#advice.generating

Co-authored-by: buengese <buengese@protonmail.com>
@buengese buengese marked this pull request as ready for review June 17, 2022 17:00
@buengese
Copy link
Member

So cleaned up and switched to mime.ParseMediaType. I've kept a separate flag for using the content-disposition header that has to be set in addition to --auto-filename but renamed it --header-filename.

@buengese buengese assigned buengese and unassigned jptreen Jun 17, 2022
@buengese buengese requested a review from ncw June 17, 2022 17:02
Copy link
Member

@ncw ncw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good :-)

Needs a test - luckily there is a framework for them already so it should be easy!

fs/operations/operations_test.go Show resolved Hide resolved
@buengese
Copy link
Member

I've added some test now.

Copy link
Member

@ncw ncw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super job :-)

And thank you for rescuing this and adding tests @buengese

I'll merge this now.

@ncw ncw merged commit ac0dc99 into rclone:master Jun 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants