New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ';' to HttpUrl.QUERY_COMPONENT_ENCODE_SET #1710

Closed
wants to merge 1 commit into
base: master
from

Conversation

3 participants
@GUIpsp

GUIpsp commented Jun 18, 2015

This fixes a real world issue found on GAE.

Add ';' to HttpUrl.QUERY_COMPONENT_ENCODE_SET
This fixes a real world issue found on GAE.
@GUIpsp

This comment has been minimized.

Show comment
Hide comment
@GUIpsp

GUIpsp Jun 18, 2015

Also, I think having different encoded sets for different parts of the URL is kinda silly, and we should just encode everything the RFC says we should encode, but that's better left for another patch. This is just a quick fix.

GUIpsp commented Jun 18, 2015

Also, I think having different encoded sets for different parts of the URL is kinda silly, and we should just encode everything the RFC says we should encode, but that's better left for another patch. This is just a quick fix.

@JakeWharton

This comment has been minimized.

Show comment
Hide comment
@JakeWharton

JakeWharton Jun 18, 2015

Collaborator

The HttpUrl class is not implemented to the URI RFC.

On Thu, Jun 18, 2015, 4:53 PM Guilherme Espada notifications@github.com
wrote:

Also, I think having different encoded sets for different parts of the URL
is kinda silly, and we should just encode everything the RFC
http://www.faqs.org/rfcs/rfc3986.html says we should encode, but that's
better left for another patch. This is just a quick fix.


Reply to this email directly or view it on GitHub
#1710 (comment).

Collaborator

JakeWharton commented Jun 18, 2015

The HttpUrl class is not implemented to the URI RFC.

On Thu, Jun 18, 2015, 4:53 PM Guilherme Espada notifications@github.com
wrote:

Also, I think having different encoded sets for different parts of the URL
is kinda silly, and we should just encode everything the RFC
http://www.faqs.org/rfcs/rfc3986.html says we should encode, but that's
better left for another patch. This is just a quick fix.


Reply to this email directly or view it on GitHub
#1710 (comment).

@swankjesse

This comment has been minimized.

Show comment
Hide comment
@swankjesse

swankjesse Jun 18, 2015

Member

What's the bug?

Member

swankjesse commented Jun 18, 2015

What's the bug?

@GUIpsp

This comment has been minimized.

Show comment
Hide comment
@GUIpsp

GUIpsp Jun 19, 2015

The HttpUrl class is not implemented to the URI RFC.

Any reason why it shouldn't?

What's the bug?

Try posting the equivalent of this form with a script that contains ; (say, print ";"). A browser will urlencode the ; and it will work. OKHttp won't, and it will fail.

That form posts to /exec with input set to the python code which returns and id. You then need to get /exec with id set to the ID returned by the post to fetch the result of the evaluation.

GUIpsp commented Jun 19, 2015

The HttpUrl class is not implemented to the URI RFC.

Any reason why it shouldn't?

What's the bug?

Try posting the equivalent of this form with a script that contains ; (say, print ";"). A browser will urlencode the ; and it will work. OKHttp won't, and it will fail.

That form posts to /exec with input set to the python code which returns and id. You then need to get /exec with id set to the ID returned by the post to fetch the result of the evaluation.

@swankjesse

This comment has been minimized.

Show comment
Hide comment
@swankjesse

swankjesse Jun 19, 2015

Member

Got it. This is about form encoding, which we're already planning on fixing.

Member

swankjesse commented Jun 19, 2015

Got it. This is about form encoding, which we're already planning on fixing.

@swankjesse

This comment has been minimized.

Show comment
Hide comment
@swankjesse
Member

swankjesse commented Jun 19, 2015

@swankjesse swankjesse closed this Jun 19, 2015

@GUIpsp

This comment has been minimized.

Show comment
Hide comment
@GUIpsp

GUIpsp Jun 19, 2015

@swankjesse Neat, but this PR also fixes the issue, and I already signed the CLA.

GUIpsp commented Jun 19, 2015

@swankjesse Neat, but this PR also fixes the issue, and I already signed the CLA.

@swankjesse

This comment has been minimized.

Show comment
Hide comment
@swankjesse

swankjesse Jun 19, 2015

Member

Yeah, but this changes the behavior of URL canonicalization, which we don't want. We only want to change form encoding. If you'd like to submit a PR that does that, please do!

Member

swankjesse commented Jun 19, 2015

Yeah, but this changes the behavior of URL canonicalization, which we don't want. We only want to change form encoding. If you'd like to submit a PR that does that, please do!

@GUIpsp

This comment has been minimized.

Show comment
Hide comment
@GUIpsp

GUIpsp Jun 19, 2015

Is the only char that needs to be added ;?

GUIpsp commented Jun 19, 2015

Is the only char that needs to be added ;?

@swankjesse

This comment has been minimized.

Show comment
Hide comment
@swankjesse

swankjesse Jun 19, 2015

Member

@GUIpsp unsure. You could write a little program to see which characters Chrome and Firefox and Safari and IE escape in forms.

Member

swankjesse commented Jun 19, 2015

@GUIpsp unsure. You could write a little program to see which characters Chrome and Firefox and Safari and IE escape in forms.

@GUIpsp

This comment has been minimized.

Show comment
Hide comment
@GUIpsp

GUIpsp Jun 20, 2015

From MDN, encodeURIComponent escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( ). This seems to match the WHATWG's recommendations If the byte is in the range 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F, 0x61 to 0x7A leave the byte as is. Otherwise encode. However the current code present in okhttp uses a blacklist, not a whitelist. Is it worth it to implement a whitelist-like system for forms?

GUIpsp commented Jun 20, 2015

From MDN, encodeURIComponent escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( ). This seems to match the WHATWG's recommendations If the byte is in the range 0x2A, 0x2D, 0x2E, 0x30 to 0x39, 0x41 to 0x5A, 0x5F, 0x61 to 0x7A leave the byte as is. Otherwise encode. However the current code present in okhttp uses a blacklist, not a whitelist. Is it worth it to implement a whitelist-like system for forms?

@swankjesse

This comment has been minimized.

Show comment
Hide comment
@swankjesse

swankjesse Jun 20, 2015

Member

Yup, OkHttp 2.4.0's behavior is completely wrong, and it's my mistake!

I confirmed with Firefox & Chrome; both do percent-encoding of everything beyond 0x7f which isn't what we're doing.

https://gist.github.com/swankjesse/e46017775bb30f7e52c8

Member

swankjesse commented Jun 20, 2015

Yup, OkHttp 2.4.0's behavior is completely wrong, and it's my mistake!

I confirmed with Firefox & Chrome; both do percent-encoding of everything beyond 0x7f which isn't what we're doing.

https://gist.github.com/swankjesse/e46017775bb30f7e52c8

@swankjesse swankjesse added the bug label Jun 20, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment