HttpPostMultipartRequestDecoder should decode header field parameters #7265

Closed
wants to merge 1 commit into from

Contributor

@dminkovsky dminkovsky commented Sep 29, 2017

Motivation:

I am receiving a multipart/form-data upload from a Mailgun webhook. This webhook used to send parts like this:

--74e78d11b0214bdcbc2f86491eeb4902
Content-Disposition: form-data; name="attachment-2"; filename="attached_файл.txt"
Content-Type: text/plain
Content-Length: 32

This is the content of the file

--74e78d11b0214bdcbc2f86491eeb4902--

but now it posts parts like this:

--74e78d11b0214bdcbc2f86491eeb4902
Content-Disposition: form-data; name="attachment-2"; filename*=utf-8''attached_%D1%84%D0%B0%D0%B9%D0%BB.txt

This is the content of the file

--74e78d11b0214bdcbc2f86491eeb4902--

This new format uses field parameter encoding described in RFC 5987. More about this encoding can be found here.

Netty does not parse this format. As a result, the filename is not decoded and the part is not parsed into a FileUpload.

Modification:

  • Added failing test in HttpPostRequestDecoderTest.java and updated HttpPostMultipartRequestDecoder.java
  • Refactored to please Netkins

Result:

Fixes #7265 (this):

  • HttpPostMultipartRequestDecoder identifies the RFC 5987 format and parses it.
  • Previous functionality is retained.

@dminkovsky dminkovsky changed the title from "add test" to "HttpPostMultipartRequestDecoder should decode header field parameters" Sep 29, 2017
@dminkovsky
Contributor Author

dminkovsky commented Sep 30, 2017

First commit on PR is the actual patch.

Second commit was required to please netkins. I was already 5 indents in.

if (!shouldDecode) {
value = value.substring(1, value.length() - 1);
} else {
String[] split = value.split("''", 2);
Contributor

Consider replacing String#split with a precompiled Pattern constant or value.indexOf("''").

Contributor Author

Thank you. I don't have much experience in performant Java string work, so I was following the lead of existing code a few lines up where you have value.split("=", 2). Does the compiler/runtime optimize this case because it's a unit-length string and therefore effectively a char?

Contributor

Yep. For a one-char string, String.split in OpenJDK doesn't use Pattern. But the string "''" has two chars.

Contributor Author

Ah okay thanks. I opted for the precompiled pattern because it seemed simpler.
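
For readers following this thread, the two alternatives the reviewer suggested can be sketched roughly as below. This is an illustration only, not code from the patch: the class name SplitSketch and both helper methods are hypothetical, and the input string is made up.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch: two ways to split on the two-character delimiter "''"
// without recompiling a regex on every call.
final class SplitSketch {
    // Compiled once and reused, unlike value.split("''", 2),
    // which compiles a Pattern per call for multi-char delimiters.
    private static final Pattern DOUBLE_SINGLE_QUOTE = Pattern.compile("''");

    static String[] splitWithPattern(String value) {
        return DOUBLE_SINGLE_QUOTE.split(value, 2);
    }

    // Regex-free variant: locate the delimiter and cut around it.
    static String[] splitWithIndexOf(String value) {
        int idx = value.indexOf("''");
        if (idx < 0) {
            // Delimiter absent: mirror split() and return the input unchanged.
            return new String[] { value };
        }
        return new String[] { value.substring(0, idx), value.substring(idx + 2) };
    }

    public static void main(String[] args) {
        String value = "utf-8''attached_%D1%84.txt";
        System.out.println(Arrays.toString(splitWithPattern(value)));
        System.out.println(Arrays.toString(splitWithIndexOf(value)));
    }
}
```

Both helpers return a one-element array when the delimiter is missing, so a caller that indexes element 1 still needs a length check.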

@@ -805,6 +795,35 @@ private InterfaceHttpData findMultipartDisposition() {
}
}

private Attribute getContentDispositionAttribute(String... values) {
Attribute attribute;
Contributor

Why is it necessary to declare the attribute here?

Contributor Author

That is how it was before my patch on 4.1. For the actual test/patch, please see my first commit (e4a9853). My second commit, which this comes from, was done because Netkins complained I had gone above 5 indents and should refactor. So I extracted that whole block as its own private function. I would rather not include this second commit in this PR because it obfuscates the patch. But I wanted the green check mark in the list of PRs.

Contributor

Ok. But now there is no need for this. Just use fast return without variable:

    return factory.createAttribute(request, name, value);

Contributor Author

Yes of course. Fixed this.

// See http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html
if (HttpHeaderValues.FILENAME.contentEquals(name)) {
// filename value is quoted string so strip them
if (!shouldDecode) {
Contributor

@fenik17 fenik17 Sep 30, 2017

IIUC, RFC 5987 prescribes decoding for more than just the filename header value? Otherwise we should not cut the * from the name for other headers.

Contributor Author

I thought about that too. But I'm not sure that's the case (what other headers are there? they aren't being tested for flow control in this code). I decided to just cover this case and establish how it might be done if someone encountered this problem with other headers.

Contributor

I mean, if we want to apply the new RFC only to the filename header, we should not change the name attribute for other headers. But now you are doing this: name = name.substring(0, last);

Contributor Author

Ah yes that makes sense. I think I avoid this now. Could add a test for it...

@@ -805,6 +796,35 @@ private InterfaceHttpData findMultipartDisposition() {
}
}

private static Pattern doubleSingleQuote = Pattern.compile("''");
Contributor

final


boolean shouldDecode = false;
int last = name.length() - 1;
if (name.charAt(last) == '*' && HttpHeaderValues.FILENAME.contentEquals(name.substring(0, last))) {
Contributor

It would be better to reduce possible allocations by using a filename* constant. Something like this:

    private static final String FILENAME_ENCODED = HttpHeaderValues.FILENAME.toString() + '*';

    private Attribute getContentDispositionAttribute(String... values) {
        String name = cleanString(values[0]);
        String value = values[1];

        // See http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html
        if (HttpHeaderValues.FILENAME.contentEquals(name)) {
            // filename value is quoted string so strip them
            value = value.substring(1, value.length() - 1);
        } else if (FILENAME_ENCODED.equals(name)) {
            // filename value is encoded. See https://tools.ietf.org/html/rfc5987
            name = name.substring(0, name.length() - 1);
            String[] split = doubleSingleQuote.split(value, 2);
            value = QueryStringDecoder.decodeComponent(split[1], Charset.forName(split[0]));
        } else {
            // otherwise we need to clean the value
            value = cleanString(value);
        }
        return factory.createAttribute(request, name, value);
    }

Contributor Author

Thanks. Pushed this version.

@fenik17
Contributor

fenik17 commented Oct 1, 2017

@dminkovsky The RFC says that ext-value can optionally contain language information:

foo: bar; title*=iso-8859-1'en'%A3%20rates

Maybe it makes sense to support this too? At least, take this into account when parsing the value.

@dminkovsky
Contributor Author

dminkovsky commented Oct 1, 2017

Yes, I thought about that, but what would we do with it? Add a FileUpload attribute?

@dminkovsky dminkovsky force-pushed the decode-field-parameters branch 2 times, most recently from 382ca6c to c24a4d4 Compare October 2, 2017 02:03
@fenik17
Contributor

fenik17 commented Oct 2, 2017

Yes, I thought about that, but what would we do with it?

Ignore it? Just split the value on the single quote "'".

String[] split = value.split("'", 3);
value = QueryStringDecoder.decodeComponent(split[2], Charset.forName(split[0]));

@dminkovsky
Contributor Author

dminkovsky commented Oct 2, 2017 via email

@fenik17
Contributor

fenik17 commented Oct 2, 2017

I suggest not doing anything with the language field. Just skip this.
But the possibility of its presence compels us to split the header value on the single quote into three parts.
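
A minimal sketch of the three-part split described here, using only the JDK. The class ExtValueSketch and its methods are hypothetical. The snippets in this PR use Netty's QueryStringDecoder.decodeComponent for the percent-decoding step; this sketch substitutes a small hand-written decoder so that it runs without Netty on the classpath, and it makes no attempt to validate the full RFC 5987 grammar.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical sketch: parse an RFC 5987 ext-value of the form
// charset'language'percent-encoded-value, ignoring the language field.
final class ExtValueSketch {

    static String decodeExtValue(String extValue) {
        // Limit 3 keeps any further quotes inside the value part.
        String[] parts = extValue.split("'", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed ext-value: " + extValue);
        }
        // parts[1] is the optional language tag; it is skipped on purpose.
        Charset charset = Charset.forName(parts[0]);
        return percentDecode(parts[2], charset);
    }

    // Stand-in for QueryStringDecoder.decodeComponent (an assumption of this
    // sketch): turns %XX escapes into bytes, then decodes with the charset.
    static String percentDecode(String s, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape in: " + s);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape in: " + s);
                }
                out.write((hi << 4) | lo);
                i += 2;
            } else if (c > 0x7F) {
                // Unescaped characters in an ext-value must be ASCII.
                throw new IllegalArgumentException("non-ASCII character in: " + s);
            } else {
                out.write(c);
            }
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decodeExtValue("utf-8''attached_%D1%84%D0%B0%D0%B9%D0%BB.txt"));
        System.out.println(decodeExtValue("iso-8859-1'en'%A3%20rates"));
    }
}
```

Splitting with a limit of 3 handles both the empty-language form (utf-8''...) and the form with a language tag (iso-8859-1'en'...), which is why a two-part split on "''" is not enough.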

Member

@normanmaurer normanmaurer left a comment

LGTM

// See http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html
if (HttpHeaderValues.FILENAME.contentEquals(name)) {
// filename value is quoted string so strip them
value = value.substring(1, value.length() - 1);
Member

consider checking that the first and last chars are actually quotes.

Contributor Author

this was here before this PR. should i modify this behavior in this PR?

Contributor Author

Adding this check too.

// filename value is encoded. See https://tools.ietf.org/html/rfc5987
name = name.substring(0, name.length() - 1);
String[] split = DOUBLE_SINGLE_QUOTE.split(value, 2);
value = QueryStringDecoder.decodeComponent(split[1], Charset.forName(split[0]));
Member

Here too, you might consider checking that the array size is 2 so that you don't throw an NPE.

Contributor Author

Considered this, but if not an NPE, then what? There are other spots in the code around this PR that split and don't check.


// https://github.com/netty/netty/pull/7265
@Test
public void testDecodeContentDispositionFieldParameters() throws Exception {
Member

add a negative case? perhaps a malformed header?

Contributor Author

Yes, makes sense. Will do.

Contributor Author

Added a few tests.

@dminkovsky
Contributor Author

@fenik17 just looked again at your previous comment and yes, of course, it's much better this way. thank you for your reviews!

@dminkovsky dminkovsky force-pushed the decode-field-parameters branch 2 times, most recently from 5bf0c4c to 5de5c86 Compare October 6, 2017 15:35
@dminkovsky dminkovsky force-pushed the decode-field-parameters branch 2 times, most recently from 1354351 to 2b5dd05 Compare October 6, 2017 16:30
Member

@carl-mastrangelo carl-mastrangelo left a comment

One comment but otherwise LGTM

final DefaultFullHttpRequest req = new DefaultFullHttpRequest(HttpVersion.HTTP_1_1,
HttpMethod.POST,
"http://localhost",
Unpooled.wrappedBuffer(body.getBytes()));
Member

@normanmaurer do your tests require you to deref all buffers?

Member

yep I think we should call req.release()

Member

@dminkovsky please address this

Contributor Author

@normanmaurer done. Please have a look. Thank you.

if (HttpHeaderValues.FILENAME.contentEquals(name)) {
// filename value is quoted string so strip them
int last = value.length() - 1;
if (value.charAt(0) == HttpConstants.DOUBLE_QUOTE && value.charAt(last) == HttpConstants.DOUBLE_QUOTE) {
Contributor

if (last > 0 && ...

Contributor Author

Ah yes, thank you

value         = token / quoted-string

https://tools.ietf.org/html/rfc5987#section-3.2.1
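
The guard discussed in this thread might look like the sketch below. The class and method names are hypothetical, and the literal '"' stands in for HttpConstants.DOUBLE_QUOTE so the example runs without Netty. The point of the last > 0 check is that it rules out both the empty string (where charAt(0) would throw) and a lone quote character (where the first and last characters are the same quote).

```java
// Hypothetical sketch: strip surrounding double quotes from a filename value
// only when the value really is a quoted-string.
final class QuoteStripSketch {

    static String stripQuotes(String value) {
        int last = value.length() - 1;
        // last > 0 means at least two characters, so index 0 and index last
        // are distinct positions and substring(1, last) is always valid.
        if (last > 0 && value.charAt(0) == '"' && value.charAt(last) == '"') {
            return value.substring(1, last);
        }
        // A bare token (or malformed input) is returned unchanged.
        return value;
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"file.txt\""));
        System.out.println(stripQuotes("file.txt"));
    }
}
```

Returning the value unchanged for unquoted input matches the grammar line quoted above, where a value may be either a token or a quoted-string.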

}
} else if (FILENAME_ENCODED.equals(name)) {
try {
name = name.substring(0, name.length() - 1);
Contributor

We can avoid an extra allocation:

name = HttpHeaderValues.FILENAME.toString();

Contributor Author

Yes, makes sense. Fixed. Thanks.

@dminkovsky
Contributor Author

By the way, could this make 4.1.17? I see only a 4.0 tag. I am wondering because I am on 4.1

@normanmaurer
Member

Everything that is merged into 4.0 will also be merged into 4.1

@dminkovsky
Contributor Author

Good to hear. Thank you.

@normanmaurer
Member

normanmaurer commented Oct 24, 2017

Cherry-picked into 4.1 (8aeba78) and 4.0 (82b7103).

@dminkovsky thanks!

@fenik17
Contributor

fenik17 commented Oct 30, 2017

@normanmaurer this is not pushed into 4.0?

@normanmaurer
Member

@fenik17 somehow I did not... Thanks for pinging. Cherry-picked as 82b7103
