Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow configuring the message converter in HttpPutFormContentFilter [SPR-14503] #19072

Closed
spring-issuemaster opened this issue Jul 21, 2016 · 11 comments

Comments

@spring-issuemaster
Copy link
Collaborator

commented Jul 21, 2016

Serhii Hromovyi opened SPR-14503 and commented

Hi,

we faced a problem using FormHttpMessageConverter.MultipartHttpOutputMessage while processing multipart request.
Our case is very similar to #16724, but with the only difference - we operate with InputStreamResourse (getFilename always returns null) and filename of multipart is defined in Content-Disposition header. And when MultipartHttpOutputMessage invokes writeHeaders() method - header value is retrieved in ASCII encoding:
byte[] headerValue = getAsciiBytes(headerValueString);
In our case we have there filename, which contains cyrillic symbols, that are transformed to "?" afterwards.
Is it ok, that ASCII encoding is using there?
If so - is it possible to allow us at least to override this encoding? Cause now there is no possibility to do it.


Affects: 4.2.6

Issue Links:

  • #16724 FormHttpMessageConverter writes ASCII encoded so that a multipart form data can not contain filenames with German Umlaute
  • #19115 HttpHeaders.setContentDispositionFormData() doesn't encode non-acsii characters correctly

1 votes, 5 watchers

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 25, 2016

Rossen Stoyanchev commented

Sorry but I'm not following what you mean. How exactly is the filename getting set given that InputStreamResourse does not have a filename?

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 26, 2016

Serhii Hromovyi commented

It's just plain http multipart request.
File, that is sent to us is parsed as InputStreamResourse, but the original filename is stored in ContentDisposition header.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 8, 2016

Rossen Stoyanchev commented

Sorry but the use case is still not clear. The original description mentions MultipartHttpOutputMessage and writing several times and #16724 fixes an issue on the side of writing. The original description also mentions an InputStreamResource, i.e. reading, and so does your more recent comment. So is this on the input or on the output side or if both you'll have to explain better the scenario and where exactly you think the missing feature is.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 22, 2016

Vyacheslav Artemyev commented

@Rossen Stoyanchev I have similar problem with file name encoding. We use Zuul with Ribbon for transfer file from one backend server to another backed server.
When I transfer multipart/form-data file MultipartHttpOutputMessage.getBody() method called.
This method use writeHeaders() method, which work only with ASCII:

byte[] headerName = getAsciiBytes(entry.getKey());

Why?

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 22, 2016

Rossen Stoyanchev commented

Vyacheslav Artemyev, to answer the why question first, MIME header fields are restricted to US-ASCII characters. From the multipart form-data spec RFC 7578:

In most multipart types, the MIME header fields in each part are
restricted to US-ASCII; for compatibility with those systems, file
names normally visible to users MAY be encoded using the percent-
encoding method in Section 2, following how a "file:" URI
[URI-SCHEME] might be encoded.

Now how to actually do non-ASCII filenames? This is where the confusion begins.

The old multipart form-data spec RFC 2388 (in section 4.4) recommends a type of encoding described in RFC 2231 and later specifically for header fields in RFC 5987 which explicitly puts the encoding in the header field, e.g. title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates. We do support that through a property on FormHttpMessageConverter which causes the filename to be encoded. This was work done as part of #16724.

Now the latest RFC 7578 that I quoted in the beginning, in the same section notes not to use RFC 5987 style encoding (as we did for #16724) but to use percent encoding instead. It also adds a comment that "Some commonly deployed systems use multipart/form-data with file names directly encoded including octets outside the US-ASCII range."

This is quite a mess. We could take the advice in the latest spec and use percent encoding but the other side would have to know to decode it likewise. Furthermore the I'm not sure what to make of the comment that some systems plainly ignore the US-ASCII restriction. If we did follow the spec recommendation to percent encode, that would conflict with systems that ignore the ASCII restriction. Oddly enough for an RFC, both of these -- percent encoding and forget ASCII restriction -- make it impossible for the receiving side to do the right thing unless it matched exactly what the sending side did.

At this point I'm more inclined to stick to the existing setMultipartCharset property we have (i.e. RFC 5987 encoding) which we know is supported in libraries and browsers and also explicitly indicates the charset, which does make it possible to do the right thing.

Have you tried that property? Does it work for you?

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 23, 2016

Paul commented

Our detailed use case is following:
We have several frontend systems, that use zuul proxy to upload files to our backed. Of course right now uploading of files from js is pretty straightforward.
I see couple of things we can do to conform to standards:
Encode all non-ascii filenames on frontend. It will require massive changes in all frontend systems.
Patch zuul proxy (pull request and so on). Right now it doesn't use multipartCharset property and I even can't understand in which place I should use it.

But for me it looks like there is no real clients that area beware of this RFC. I think that the most popular client for multipart upload is js and it doesn't respect RFC. Also there may be clients that upload files percent-quoted out of the box, but then they decide filenames out of the box too.

If utf-8 support for filenames will be enabled by default - neither first or second will suffer - things will continue to work for them executive as now.
Is there any situation in which, if will be enabled, utf-8 support will break existing functionality?
It looks for me that not, it won't.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 23, 2016

Vyacheslav Artemyev commented

Rossen Stoyanchev, can you explain how can I use solution with setMultipartCharset?
When our backend receive request following classes are called:

FormHttpMessageConverter -> AllEncompassingFormHttpMessageConverter -> HttpPutFormContentFilter

In this case I can't use setMultipartCharset method directly, unfortunately

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 23, 2016

Rossen Stoyanchev commented

Indeed when used through the HttpPutFormContentFilter the FormHttpMessageConverter it contains is not exposed. You'll have to replace the filter as well in your patch. I'm scheduling this for a fix since clearly changes are needed.

Regarding browsers HTML5 recommends to use the encoding of the form: "File names included in the generated multipart/form-data resource (as part of file fields) must use the character encoding selected above. Previously HTML 4 recommended encoding non-ascii filenames.

So we need to review our present arrangement in light of all this.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 23, 2016

Vyacheslav Artemyev commented

Rossen Stoyanchev, can you explain this phrase:
I'm scheduling this for a fix since clearly changes are needed.

I found that OrderedHttpPutFormContentFilter created in WebMvcAutoConfiguration if no one bean with type HttpPutFormContentFilter found.
I think we can add new method setMultipartCharset to HttpPutFormContentFilter, which will be set multipartCharset in FormHttpMessageConverter.

What do you think about it? If you agree with this approach, I can make new pull request with this solution.
Let me know.

P.S. we aslo can add another property like: spring.mvc.formcontent.putfilter.multipartcharset in property file.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 23, 2016

Rossen Stoyanchev commented

At the level of the HttpPutFormContentFilter it makes sense to add setConverter(FormHttpMessageConverter) as well as a getter. Boot can then have a property like the one you suggest. Before we go farther have you been able to confirm if that property even work in your case?

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator Author

commented Jan 17, 2017

Rossen Stoyanchev commented

Modified title (was: "FormHttpMessageConverter.MultipartHttpOutputMessage writes headers in ASCII so Content-Disposition could not contain filename with non-ascii symbols").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.