Should percent encode the query key and value when being encoded as str #100
Is the URL |
Sorry, I was wrong, the But should not be decoded in the query either?
import furl
url = 'http://host/?q=://'
assert furl.furl(url).url == url
# not consistent
url_percent_encoded = 'http://host/?q=%3A%2F%2F'
assert furl.furl(url_percent_encoded).url != url_percent_encoded |
It's ambiguous. Both
and
are perfectly valid URLs, so furl has a choice of what to do here.
The equality url == furl(url).url is not a goal of furl. There are many URLs that furl happily accepts where that equality does not hold:
>>> from furl import furl
>>> url = 'https://www.google.com/a b/?a=a b'
>>> f = furl(url)
>>> f.url
'https://www.google.com/a%20b/?a=a+b'
>>> url == f.url
False
Zooming out, is there a reason you'd prefer furl to encode |
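The normalization shown in that session can be reproduced with the standard library alone. This is a minimal sketch that assumes Python's urllib.parse as a stand-in for furl's encoder (furl's own rules may differ): a space becomes %20 in the path and + in the query, so parsing and re-serializing is not the identity.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Stand-in for furl's normalization; this is NOT furl's actual implementation.
path = quote("/a b/")                 # spaces in a path become %20
query = "a=" + quote_plus("a b")      # spaces in a query value become +

normalized = "https://www.google.com" + path + "?" + query
print(normalized)  # https://www.google.com/a%20b/?a=a+b

# Decoding recovers the original text, even though the URL strings differ.
print(unquote(path), unquote_plus("a+b"))
```

Both strings describe the same resource; only the spelling of the URL changes.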
Yes. I used furl in a proxy server.
I do not know why the client percent-encodes the query. I suggest furl only decode or encode the reserved characters. But not the
decodeURI('%2f') == "%2f"
encodeURI('/') == '/'
encodeURI(' ') == '%20'
decodeURI('%20') == ' ' |
Interesting. Which software complained about the URL
? And what was the warning or error message, if available? As aforementioned, If URLs like
to
to be safe. |
I think the client encode the query component with
UPDATE:
|
I'm afraid I don't quite follow. Please elaborate on what you mean by
So if
? |
# client side
# pseudo code simulating what the client does
target = encodeURIComponent("http://target.com/") # js function encodeURIComponent
print(target) # -> "http%3A%2F%2Ftarget.com%2F"
hash = hash_with_md5(target)
print(hash) # -> "2063c1608d6e0baf80249c42e2be5804"
original_url_sent_by_client = f"http://server/?q={target}&h={hash}"
# proxy server side
parsed_url = furl(original_url_sent_by_client)
# log some values from parsed_url
...
# proxy client side
url_to_be_sent = parsed_url.url
print(url_to_be_sent) # -> "http://server/?q=http://target.com/&h=2063c1608d6e0baf80249c42e2be5804"
httpclient.send(url_to_be_sent)
# Because the query value changes from "http%3A%2F%2Ftarget.com%2F" to "http://target.com/",
# the hash value no longer matches, and the real server complains about that.
From https://en.wikipedia.org/wiki/Query_string:
According to the algorithm, only
decode(encode(' ')) == ' '
encode(decode('%20')) == '%20'
decode(encode('=')) == '='
encode(decode('%3D')) == '%3D'
decode(encode(':')) == ':'
encode(decode('%3A')) == '%3A' # the current furl will return encoded value as ':' not '%3A'
decode(encode('/')) == '/'
encode(decode('%2F')) == '%2F' # the current furl will return encoded value as '/' not '%2F' |
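The hash mismatch described in the pseudo code can be reproduced without furl. This is a rough sketch under stated assumptions: urllib.parse.quote with safe="" stands in for JavaScript's encodeURIComponent (the two agree for this input), md5_hex is a hypothetical helper, and the re-encoding step only mimics a serializer that leaves ':' and '/' unescaped.

```python
import hashlib
from urllib.parse import quote, unquote


def md5_hex(text: str) -> str:
    """Hypothetical helper: hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Client side: encode the target, then hash the *encoded* form.
target = quote("http://target.com/", safe="")
sent_hash = md5_hex(target)

# Proxy side: decode, then re-encode while leaving ':' and '/' untouched.
reencoded = quote(unquote(target), safe=":/")

print(target)                           # http%3A%2F%2Ftarget.com%2F
print(reencoded)                        # http://target.com/
print(md5_hex(reencoded) == sent_hash)  # False
```

The decoded values are identical, but the hash was computed over the encoded bytes, so any change in spelling breaks it.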
Aha! Thank you for the example with pseudo code. Your situation makes perfect sense. Fundamentally,
In all these scenarios, the hash of While as per RFC 3986
furl's decision not to encode them clearly leads to problems. I'll tweak furl's encoding scheme to percent encode sub-delim characters, too. |
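For reference, the RFC 3986 sub-delims are ! $ & ' ( ) * + , ; = and the gen-delims are : / ? # [ ] @. A minimal sketch of a strict encoder follows; encode_strict is a hypothetical name, and it uses urllib.parse.quote rather than furl's own encoder, so it only illustrates what percent-encoding these characters looks like.

```python
from urllib.parse import quote

SUB_DELIMS = "!$&'()*+,;="  # RFC 3986, section 2.2
GEN_DELIMS = ":/?#[]@"      # RFC 3986, section 2.2


def encode_strict(value: str) -> str:
    """Percent-encode everything except RFC 3986 unreserved characters."""
    return quote(value, safe="")


# Show the encoded form of each delimiter.
for ch in SUB_DELIMS + GEN_DELIMS:
    print(ch, "->", encode_strict(ch))
```

Unreserved characters (letters, digits, and - . _ ~) pass through unchanged, so encode_strict("a-b_c.d~e") returns its input.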
Fixed in 68b0cb9. This fix will ship in the next version of furl, v1.2. |
Thanks for the library. |
The aforementioned query encoding overhaul shipped in furl v1.2.
>>> f = furl('http://example.com/')
>>> f.args['redirect'] = 'http://www.example.com/'
>>> f.url
'http://example.com/?redirect=http%3A%2F%2Fwww.example.com%2F'
Upgrade with
Thank you for bringing this issue to my attention, @xiren7. Don't hesitate to |
I cannot upgrade to furl 1.2 because of the change in 68b0cb9. I am using furl to return URLs containing query params that are also URLs to applications developed by third parties. Some of them may not use proper decoding, and if I upgrade to version 1.2 I will be breaking compatibility with those apps. |
@youtux Thank you for weighing in! Before I add a parameter to If so, is it realistic to reach out and ask them to switch to proper URL |
It is not easy for me to reach them, and to be honest it would be quite time-consuming. But I have seen many applications just "parsing" query params by splitting the URL with the |
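I added a query_dont_quote parameter to furl.tostr() that exempts valid query characters from percent-encoding, either in their entirety (query_dont_quote=True) or selectively (e.g. query_dont_quote='/?'). So, to leave URLs in the query unencoded, use query_dont_quote=True:
>>> f = furl('https://www.google.com/')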
>>> f.args['url'] = 'https://lolsup.ru/pepp'
>>> f.tostr()
'https://www.google.com/?url=https%3A%2F%2Flolsup.ru%2Fpepp'
>>> f.tostr(query_dont_quote=True)
'https://www.google.com/?url=https://lolsup.ru/pepp'
Does this suffice for your needs, @youtux? |
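For readers without furl at hand, the same strict versus relaxed distinction can be approximated with the standard library. This is only a rough analog: urlencode's safe argument plays the role of a selective query_dont_quote value, and the exact set of characters furl exempts may differ.

```python
from urllib.parse import quote, urlencode

params = {"url": "https://lolsup.ru/pepp"}

# Strict: every reserved character in the value is percent-encoded.
strict = urlencode(params, quote_via=quote, safe="")

# Relaxed: ':' and '/' are exempted from percent-encoding.
relaxed = urlencode(params, quote_via=quote, safe=":/")

print("https://www.google.com/?" + strict)   # ...?url=https%3A%2F%2Flolsup.ru%2Fpepp
print("https://www.google.com/?" + relaxed)  # ...?url=https://lolsup.ru/pepp
```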
Thanks for your support, Ansgar. I’m going to check it out on Wednesday and
I’ll let you know, but from your explanation of the change it’s looking
good.
…On Sat, 6 Oct 2018 at 23:23, Ansgar Grunseid ***@***.***> wrote:
I added a query_dont_quote parameter to furl.tostr() that exempts valid
query characters from percent-encoding, either in their entirety
(query_dont_quote=True) or selectively (e.g. query_dont_quote='/?').
So to leave URLs in the query unencoded as in your example, use
query_dont_quote=True:
>>> f = furl('https://www.google.com/')
>>> f.args['url'] = 'https://lolsup.ru/pepp'
>>> f.tostr()
'https://www.google.com/?url=https%3A%2F%2Flolsup.ru%2Fpepp'
>>> f.tostr(query_dont_quote=True)
'https://www.google.com/?url=https://lolsup.ru/pepp'
query_dont_quote will ship in the next version of furl, v2.1.0.
Does this suffice for your needs, @youtux <https://github.com/youtux>?
|
I am trying it using commit c30edb2, but I think I found a bug:
import furl
embedded_url = furl.furl('https://example.com/foos/21312312321312312?a=1&b=2')
assert set(embedded_url.args) == {'a', 'b'}
wrapping_url = furl.furl('https://mydomain.com/').set(args={'foo': embedded_url}).tostr(query_dont_quote=True)
assert set(furl.furl(wrapping_url).args) == {'foo'}  # This fails, 'b' is parsed as a query param of wrapping_url.
The problem is that we don't escape |
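The reported failure is not specific to furl; any standards-following parser behaves the same way when an embedded URL is placed in a query value with nothing escaped. A small sketch using urllib.parse (an assumption made for illustration, since furl's parser is not shown in this thread):

```python
from urllib.parse import parse_qsl, urlsplit

embedded = "https://example.com/foos/21312312321312312?a=1&b=2"
wrapping = "https://mydomain.com/?foo=" + embedded  # nothing escaped

# The outer query is split on '&', so 'b' leaks out of the embedded URL.
outer_params = parse_qsl(urlsplit(wrapping).query)
print(outer_params)
```

The parser sees two outer parameters, foo and b, and the value of foo is truncated at the first unescaped '&'.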
Thank you for testing, @youtux!
The problem is that it's impossible to satisfy both the above and:
If the receiving application extracts query parameters by splitting on
will be incorrectly parsed as the raw strings
This is the original problem I highlighted: furl can only do so much to insulate
Nonetheless, if you want to encode
>>> from furl import furl, Query
>>> url = 'https://example.com/foos/21312312321312312?a=1&b=2'
>>> safe = ''.join(set(Query.SAFE_VALUE_CHARS) - set('?&'))
>>> furl('https://mydomain.com/').set(args={'foo': url}).tostr(query_dont_quote=safe)
'https://mydomain.com/?foo=https://example.com/foos/21312312321312312%3Fa=1%26b=2'
Does this suffice for your needs? If not, I will remove the |
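The selective approach can be checked end to end with the standard library. In this sketch the safe set ":/=" is an illustrative choice, not furl's Query.SAFE_VALUE_CHARS: it leaves ':', '/' and '=' readable while still escaping '?' and '&', so the wrapped URL survives a round trip through a query parser.

```python
from urllib.parse import parse_qsl, quote, urlsplit

embedded = "https://example.com/foos/21312312321312312?a=1&b=2"

# Keep ':', '/' and '=' unescaped, but percent-encode '?' and '&'.
value = quote(embedded, safe=":/=")
wrapping = "https://mydomain.com/?foo=" + value
print(wrapping)

# Parsing the outer URL yields a single parameter holding the full embedded URL.
params = dict(parse_qsl(urlsplit(wrapping).query))
print(params)
```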
The failing test case that I reported has little to do with my specific problem of applications doing a naive split on |
@gruns any update on this? |
@youtux Thank you for the ping. I'll look into this again soon. At first glance, your suggestion
seems solid. |
@youtux I removed
import furl
embedded_url = furl.furl('https://example.com/foos/21312312321312312?a=1&b=2')
assert set(embedded_url.args) == {'a', 'b'}
wrapping_url = furl.furl('https://mydomain.com/').set(args={'foo': embedded_url}).tostr(query_dont_quote=True)
assert set(furl.furl(wrapping_url).args) == {'foo'}  # This fails, 'b' is parsed as a query param of wrapping_url.
now runs without
Does this solve your original problem? |
Yes, it does. Would you mind adding this scenario to the test suite? |
I added more tests in e9855e8. The aforementioned
>>> from furl import furl
>>> url = 'https://example.com/?a=1&b=2'
>>> set(furl(url).args)
{'a', 'b'}
>>> f = furl(url).set(args={'foo': url})
>>> f.tostr(query_dont_quote=True)
'https://example.com/?foo=https://example.com/?a=1%26b=2'
>>> set(f.args)
{'foo'}
>>> f.args['foo']
'https://example.com/?a=1&b=2'
Does this resolve your problem, @youtux? If so, I'll close this Issue once v2.1.0 ships. |
It looks like it does. Thanks for taking care of it, @gruns, much appreciated. |
@gruns Any plan to release 2.1.0 anytime soon? |
I use furl (v1.1) in an HTTP proxy app. On the server side, the proxy receives a URL from the client:
After doing something with the parsed URL, the proxy generates a URL on the client side (to be sent to the real server):
Should the "redirect=http://www.example.com/" part be percent-encoded as in the original query, "redirect=http%3A%2F%2Fwww.example.com%2F"?