HttpPostMultipartRequestDecoder should decode header field parameters #7265

Closed
wants to merge 1 commit into from

Conversation

@dminkovsky
Contributor

commented Sep 29, 2017

Motivation:

I am receiving a multipart/form-data upload from a Mailgun webhook. This webhook used to send parts like this:

--74e78d11b0214bdcbc2f86491eeb4902
Content-Disposition: form-data; name="attachment-2"; filename="attached_файл.txt"
Content-Type: text/plain
Content-Length: 32

This is the content of the file

--74e78d11b0214bdcbc2f86491eeb4902--

but now it posts parts like this:

--74e78d11b0214bdcbc2f86491eeb4902
Content-Disposition: form-data; name="attachment-2"; filename*=utf-8''attached_%D1%84%D0%B0%D0%B9%D0%BB.txt

This is the content of the file

--74e78d11b0214bdcbc2f86491eeb4902--

This new format uses field parameter encoding described in RFC 5987. More about this encoding can be found here.

Netty does not parse this format. The result is the filename is not decoded and the part is not parsed into a FileUpload.
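To make the format concrete, here is a minimal sketch of decoding an RFC 5987 ext-value such as the `filename*` parameter above. It is not the code from this PR: the class and method names are hypothetical, it uses only the JDK, and it assumes the optional language tag between the two single quotes is empty.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

public class Rfc5987Sketch {

    /** Decodes a value of the form charset''percent-encoded-text (hypothetical helper). */
    static String decodeExtValue(String extValue) {
        int sep = extValue.indexOf("''");
        if (sep < 0) {
            throw new IllegalArgumentException("missing '' separator: " + extValue);
        }
        Charset charset = Charset.forName(extValue.substring(0, sep));
        String encoded = extValue.substring(sep + 2);

        // Collect raw bytes first, then interpret them in the declared charset.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c != '%') {
                bytes.write(c); // unescaped characters are expected to be ASCII
                continue;
            }
            if (i + 2 >= encoded.length()) {
                throw new IllegalArgumentException("truncated escape in: " + encoded);
            }
            int hi = Character.digit(encoded.charAt(i + 1), 16);
            int lo = Character.digit(encoded.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid escape in: " + encoded);
            }
            bytes.write((hi << 4) | lo);
            i += 2; // skip the two hex digits just consumed
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decodeExtValue("utf-8''attached_%D1%84%D0%B0%D0%B9%D0%BB.txt"));
    }
}
```

Applied to the header value from the second example, this yields the same file name that the older quoted form carried directly.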

Modification:

  • Added failing test in HttpPostRequestDecoderTest.java and updated HttpPostMultipartRequestDecoder.java
  • Refactored to please Netkins

Result:

Fixes #7265 (this):

  • HttpPostMultipartRequestDecoder identifies the RFC 5987 format and parses it.
  • Previous functionality is retained.

@dminkovsky dminkovsky changed the title add test HttpPostMultipartRequestDecoder should decode header field parameters Sep 29, 2017

@dminkovsky dminkovsky force-pushed the dminkovsky:decode-field-parameters branch from 5153a3e to e4a9853 Sep 29, 2017

@dminkovsky

Contributor Author

commented Sep 30, 2017

First commit on PR is the actual patch.

Second commit was required to please netkins. I was already 5 indents in.

@dminkovsky dminkovsky force-pushed the dminkovsky:decode-field-parameters branch from 69f1809 to 78977d5 Sep 30, 2017

if (!shouldDecode) {
value = value.substring(1, value.length() - 1);
} else {
String[] split = value.split("''", 2);

@fenik17

fenik17 Sep 30, 2017

Contributor

Consider replacing String#split with a precompiled Pattern constant or with value.indexOf("''").

@dminkovsky

dminkovsky Sep 30, 2017

Author Contributor

Thank you. I don't have much experience in performant Java string work, so I was following the lead of existing code a few lines up where you have value.split("=", 2). Does the compiler/runtime optimize this case because it's a unit-length string and therefore effectively a char?

@fenik17

fenik17 Sep 30, 2017

Contributor

Yep. For a one-char string, String.split in OpenJDK doesn't use Pattern. But the string "''" has two chars.

@dminkovsky

dminkovsky Sep 30, 2017

Author Contributor

Ah okay thanks. I opted for the precompiled pattern because it seemed simpler.
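The three options discussed in this thread can be compared side by side. The sketch below is illustrative only; the class and method names are hypothetical. All three produce the same result for a two-character separator, and they differ only in whether a regex is compiled on each call.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitAlternatives {

    // Compiled once and reused on every call.
    private static final Pattern DOUBLE_SINGLE_QUOTE = Pattern.compile("''");

    // A two-char separator is handled as a regex by String.split.
    static String[] viaStringSplit(String value) {
        return value.split("''", 2);
    }

    static String[] viaPattern(String value) {
        return DOUBLE_SINGLE_QUOTE.split(value, 2);
    }

    // No regex at all: locate the separator and cut around it.
    static String[] viaIndexOf(String value) {
        int sep = value.indexOf("''");
        if (sep < 0) {
            return new String[] { value };
        }
        return new String[] { value.substring(0, sep), value.substring(sep + 2) };
    }

    public static void main(String[] args) {
        String value = "utf-8''a%20b";
        System.out.println(Arrays.toString(viaStringSplit(value)));
        System.out.println(Arrays.toString(viaPattern(value)));
        System.out.println(Arrays.toString(viaIndexOf(value)));
    }
}
```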

@@ -805,6 +795,35 @@ private InterfaceHttpData findMultipartDisposition() {
}
}

private Attribute getContentDispositionAttribute(String... values) {
Attribute attribute;

@fenik17

fenik17 Sep 30, 2017

Contributor

Why is it necessary to declare the attribute here?

@dminkovsky

dminkovsky Sep 30, 2017

Author Contributor

That is how it was before my patch on 4.1. For the actual test/patch, please see my first commit (e4a9853). My second commit, which this comes from, was done because Netkins complained I had gone above 5 indents and should refactor. So I extracted that whole block as its own private function. I would rather not include this second commit in this PR because it obfuscates the patch. But I wanted the green check mark in the list of PRs.

@fenik17

fenik17 Sep 30, 2017

Contributor

Ok. But now there is no need for this. Just use fast return without variable:

    return factory.createAttribute(request, name, value);

@dminkovsky

dminkovsky Sep 30, 2017

Author Contributor

Yes of course. Fixed this.

// See http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html
if (HttpHeaderValues.FILENAME.contentEquals(name)) {
// filename value is quoted string so strip them
if (!shouldDecode) {

@fenik17

fenik17 Sep 30, 2017

Contributor

IIUC, RFC 5987 prescribes decoding for more than just the filename header value? Otherwise we should not cut the * from the name for other headers.

@dminkovsky

dminkovsky Sep 30, 2017

Author Contributor

I thought about that too. But I'm not sure that's the case (what other headers are there? they aren't being tested for flow control in this code). I decided to just cover this case and establish how it might be done if someone encountered this problem with other headers.

@fenik17

fenik17 Sep 30, 2017

Contributor

I mean, if we want to apply the new RFC only to the filename header, we should not change the name attribute for other headers. But now you are doing this: name = name.substring(0, last);

@dminkovsky

dminkovsky Sep 30, 2017

Author Contributor

Ah yes that makes sense. I think I avoid this now. Could add a test for it...

@@ -805,6 +796,35 @@ private InterfaceHttpData findMultipartDisposition() {
}
}

private static Pattern doubleSingleQuote = Pattern.compile("''");

@fenik17

fenik17 Oct 1, 2017

Contributor

final


boolean shouldDecode = false;
int last = name.length() - 1;
if (name.charAt(last) == '*' && HttpHeaderValues.FILENAME.contentEquals(name.substring(0, last))) {

@fenik17

fenik17 Oct 1, 2017

Contributor

Would be better to reduce the possible allocations using filename* constant. Something like this:

    private static final String FILENAME_ENCODED = HttpHeaderValues.FILENAME.toString() + '*';

    private Attribute getContentDispositionAttribute(String... values) {
        String name = cleanString(values[0]);
        String value = values[1];

        // See http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html
        if (HttpHeaderValues.FILENAME.contentEquals(name)) {
            // filename value is quoted string so strip them
            value = value.substring(1, value.length() - 1);
        } else if (FILENAME_ENCODED.equals(name)) {
            // filename value is encoded. See https://tools.ietf.org/html/rfc5987
            name = name.substring(0, name.length() - 1);
            String[] split = doubleSingleQuote.split(value, 2);
            value = QueryStringDecoder.decodeComponent(split[1], Charset.forName(split[0]));
        } else {
            // otherwise we need to clean the value
            value = cleanString(value);
        }
        return factory.createAttribute(request, name, value);
    }

@dminkovsky

dminkovsky Oct 2, 2017

Author Contributor

Thanks. Pushed this version.

@fenik17

Contributor

commented Oct 1, 2017

@dminkovsky The RFC says that ext-value can optionally contain language information:

foo: bar; title*=iso-8859-1'en'%A3%20rates

Maybe it makes sense to support this too? At least, take this into account when parsing the value.

@dminkovsky

Contributor Author

commented Oct 1, 2017

Yes, I thought about that, but what would we do with it? Add a FileUpload attribute?

@dminkovsky dminkovsky force-pushed the dminkovsky:decode-field-parameters branch 2 times, most recently from 382ca6c to c24a4d4 Oct 2, 2017

@fenik17

Contributor

commented Oct 2, 2017

Yes, I thought about that, but what would we do with it?

Ignore it? Just split the value on the single quote "'".

String[] split = value.split("'", 3);
value = QueryStringDecoder.decodeComponent(split[2], Charset.forName(split[0]));
@dminkovsky

Contributor Author

commented Oct 2, 2017

@fenik17

Contributor

commented Oct 2, 2017

I suggest not doing anything with the language field. Just skip this.
But the possibility of its presence compels us to split the header value on the single quote into three parts.
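The three-part split suggested here can be sketched with the JDK alone. This is an illustration, not the merged code: the class name is hypothetical, and `java.net.URLDecoder` stands in for Netty's `QueryStringDecoder.decodeComponent`. One known difference to keep in mind is that `URLDecoder` also turns `+` into a space, which RFC 5987 does not call for.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class ExtValueWithLanguage {

    /** Decodes charset'language'percent-encoded-text, skipping the language tag. */
    static String decode(String extValue) {
        // Limit 3 keeps any further single quotes inside the third part.
        String[] parts = extValue.split("'", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected charset'lang'value: " + extValue);
        }
        String charset = parts[0];
        // parts[1] is the optional language tag; it may be empty and is ignored here.
        try {
            return URLDecoder.decode(parts[2], charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("iso-8859-1'en'%A3%20rates"));
        System.out.println(decode("utf-8''attached_%D1%84%D0%B0%D0%B9%D0%BB.txt"));
    }
}
```

The same call handles both the RFC's example with a language tag and the Mailgun value with an empty one, because a positive split limit preserves the empty middle element.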

@dminkovsky dminkovsky force-pushed the dminkovsky:decode-field-parameters branch from c24a4d4 to 6555dd7 Oct 2, 2017

@normanmaurer
Member

left a comment

LGTM

@normanmaurer normanmaurer self-assigned this Oct 2, 2017

@normanmaurer normanmaurer added the defect label Oct 2, 2017

@normanmaurer normanmaurer added this to the 4.0.53.Final milestone Oct 2, 2017

// See http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html
if (HttpHeaderValues.FILENAME.contentEquals(name)) {
// filename value is quoted string so strip them
value = value.substring(1, value.length() - 1);

@carl-mastrangelo

carl-mastrangelo Oct 2, 2017

Member

consider checking that the first and last chars are actually quotes.

@dminkovsky

dminkovsky Oct 4, 2017

Author Contributor

this was here before this PR. should i modify this behavior in this PR?

@dminkovsky

dminkovsky Oct 6, 2017

Author Contributor

Adding this check too.

// filename value is encoded. See https://tools.ietf.org/html/rfc5987
name = name.substring(0, name.length() - 1);
String[] split = DOUBLE_SINGLE_QUOTE.split(value, 2);
value = QueryStringDecoder.decodeComponent(split[1], Charset.forName(split[0]));

@carl-mastrangelo

carl-mastrangelo Oct 2, 2017

Member

Here too, you might consider checking that the array size is 2 so that you don't throw an NPE.

@dminkovsky

dminkovsky Oct 4, 2017

Author Contributor

Considered this, but if not an NPE, then what? There are other spots in the code around this PR that split and don't check.
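One possible answer to "then what?" is to treat every malformed value the same way and let the caller decide how to report it. The sketch below is only an illustration with hypothetical names, using the JDK instead of Netty. It is worth noting that in plain Java, indexing past the end of the array returned by `split` raises an `ArrayIndexOutOfBoundsException` rather than an NPE; either way the failure is an unchecked exception with no useful message.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.util.Optional;

public class LenientExtValue {

    /** Returns the decoded value, or empty if the input is malformed in any way. */
    static Optional<String> tryDecode(String extValue) {
        String[] parts = extValue.split("'", 3);
        if (parts.length != 3) {
            return Optional.empty(); // fewer than two single quotes
        }
        try {
            // forName rejects illegal or unknown charset names up front.
            Charset charset = Charset.forName(parts[0]);
            return Optional.of(URLDecoder.decode(parts[2], charset.name()));
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            // Covers bad charset names and broken percent escapes.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDecode("utf-8''ok.txt"));
        System.out.println(tryDecode("no-quotes-here"));
        System.out.println(tryDecode("utf-8''%ZZ"));
    }
}
```

A decoder that must signal errors to its caller could map the empty result to its own exception type instead of returning `Optional`.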


// https://github.com/netty/netty/pull/7265
@Test
public void testDecodeContentDispositionFieldParameters() throws Exception {

@carl-mastrangelo

carl-mastrangelo Oct 2, 2017

Member

add a negative case? perhaps a malformed header?

@dminkovsky

dminkovsky Oct 4, 2017

Author Contributor

Yes, makes sense. Will do.

@dminkovsky

dminkovsky Oct 6, 2017

Author Contributor

Added a few tests.

@dminkovsky dminkovsky force-pushed the dminkovsky:decode-field-parameters branch from 6555dd7 to 7742cbc Oct 5, 2017

@dminkovsky

Contributor Author

commented Oct 5, 2017

@fenik17 just looked again at your previous comment and yes, of course, it's much better this way. thank you for your reviews!

@dminkovsky dminkovsky force-pushed the dminkovsky:decode-field-parameters branch 2 times, most recently from 5bf0c4c to 5de5c86 Oct 5, 2017

dminkovsky added a commit to dminkovsky/netty that referenced this pull request Oct 6, 2017

dminkovsky added a commit to dminkovsky/netty that referenced this pull request Oct 6, 2017

@dminkovsky dminkovsky force-pushed the dminkovsky:decode-field-parameters branch 2 times, most recently from 1354351 to 2b5dd05 Oct 6, 2017

dminkovsky added a commit to dminkovsky/netty that referenced this pull request Oct 6, 2017

@carl-mastrangelo
Member

left a comment

One comment but otherwise LGTM

final DefaultFullHttpRequest req = new DefaultFullHttpRequest(HttpVersion.HTTP_1_1,
HttpMethod.POST,
"http://localhost",
Unpooled.wrappedBuffer(body.getBytes()));

@carl-mastrangelo

carl-mastrangelo Oct 6, 2017

Member

@normanmaurer do your tests require you to deref all buffers?

@normanmaurer

normanmaurer Oct 23, 2017

Member

yep I think we should call req.release()

@normanmaurer

normanmaurer Oct 24, 2017

Member

@dminkovsky please address this

@dminkovsky

dminkovsky Oct 24, 2017

Author Contributor

@normanmaurer done. Please have a look. Thank you.

if (HttpHeaderValues.FILENAME.contentEquals(name)) {
// filename value is quoted string so strip them
int last = value.length() - 1;
if (value.charAt(0) == HttpConstants.DOUBLE_QUOTE && value.charAt(last) == HttpConstants.DOUBLE_QUOTE) {

@fenik17

fenik17 Oct 6, 2017

Contributor

if (last > 0 && ...

@dminkovsky

dminkovsky Oct 7, 2017

Author Contributor

Ah yes, thank you

value         = token / quoted-string

https://tools.ietf.org/html/rfc5987#section-3.2.1
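The reason for the `last > 0` guard is easiest to see with edge cases. In this minimal sketch (hypothetical class and method names, plain `'"'` in place of `HttpConstants.DOUBLE_QUOTE`), a value that is not wrapped in quotes is returned unchanged, which is an assumption about the desired fallback rather than something stated in this thread.

```java
public class QuotedValue {

    /** Strips one pair of surrounding double quotes, if present. */
    static String stripQuotes(String value) {
        int last = value.length() - 1;
        // last > 0 rules out the empty string (charAt(0) would throw) and a lone
        // quote character (it is both first and last, and substring(1, 0) would throw).
        if (last > 0 && value.charAt(0) == '"' && value.charAt(last) == '"') {
            return value.substring(1, last);
        }
        return value; // token form or malformed input: left as-is in this sketch
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"a.txt\""));
        System.out.println(stripQuotes("token"));
        System.out.println(stripQuotes("\""));
    }
}
```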

@dminkovsky dminkovsky force-pushed the dminkovsky:decode-field-parameters branch from 2b5dd05 to 06dbf04 Oct 7, 2017

dminkovsky added a commit to dminkovsky/netty that referenced this pull request Oct 7, 2017

}
} else if (FILENAME_ENCODED.equals(name)) {
try {
name = name.substring(0, name.length() - 1);

@fenik17

fenik17 Oct 7, 2017

Contributor

We can avoid an extra allocation:

name = HttpHeaderValues.FILENAME.toString();

@dminkovsky

dminkovsky Oct 21, 2017

Author Contributor

Yes, makes sense. Fixed. Thanks.

@dminkovsky

Contributor Author

commented Oct 13, 2017

By the way, could this make 4.1.17? I see only a 4.0 tag. I am wondering because I am on 4.1

@normanmaurer

Member

commented Oct 13, 2017

Everything that is merged into 4.0 will also be merged into 4.1

@dminkovsky dminkovsky force-pushed the dminkovsky:decode-field-parameters branch from 06dbf04 to 9f4b58a Oct 21, 2017

dminkovsky added a commit to dminkovsky/netty that referenced this pull request Oct 21, 2017

@dminkovsky

Contributor Author

commented Oct 21, 2017

Good to hear. Thank you.

@dminkovsky dminkovsky force-pushed the dminkovsky:decode-field-parameters branch from 9f4b58a to c485b39 Oct 24, 2017

@normanmaurer

Member

commented Oct 24, 2017

Cherry-picked into 4.1 (8aeba78) and 4.0 (82b7103).

@dminkovsky thanks!

@fenik17

Contributor

commented Oct 30, 2017

@normanmaurer this is not pushed into 4.0?

@normanmaurer

Member

commented Oct 30, 2017

@fenik17 somehow I did not... Thanks for pinging. Cherry-picked as 82b7103
