URI malformed when trying to get URL with URL-encoded chars #420
Comments
Explanation of failure: This website is Take Why this fails now, but not before: The introduction of caching via lukechilds/cacheable-request introduced the package sindresorhus/normalize-url which uses decodeURI internally. This module could perform a best-effort decoding - falling back to the encoded value - when the string is not I don't think a fix, if any, would be applied here directly in Got.
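To illustrate the "best-effort decoding" idea mentioned above, here is a minimal sketch. The function name `decodeURIBestEffort` is hypothetical and is not part of normalize-url or Got; it only shows the fallback pattern, assuming the goal is to leave the string untouched when `decodeURI` rejects it.

```javascript
// Hypothetical helper, not an API of normalize-url or Got.
// decodeURI throws a URIError when the percent-escapes do not form
// valid UTF-8, which is the case for the Windows-125x encoded URLs
// reported in this issue.
function decodeURIBestEffort(value) {
  try {
    return decodeURI(value);
  } catch (error) {
    if (error instanceof URIError) {
      // Fall back to the still-encoded input instead of failing.
      return value;
    }
    throw error;
  }
}

const reported = 'https://www.kinopoisk.ru/community/city/%D2%E0%EB%EB%E8%ED/';
console.log(decodeURIBestEffort(reported) === reported); // true: input returned unchanged
console.log(decodeURIBestEffort('https://example.com/%D0%A2')); // https://example.com/Т
```

The trade-off of this approach is that invalid escapes pass through silently, so a caller never learns that decoding was skipped.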
Thanks for elaborating @brandon93s. I think the correct fix here is to detect the case early in Got and throw a user-friendly error about the URL having an invalid encoding.
@sindresorhus Are we okay with a brute force Glad to implement...
Yes
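For readers following along, the early-detection approach proposed above could look roughly like the sketch below. This is an assumption about the general shape of such a check, not the code that was actually committed to Got; the function name `assertDecodableUrl` and the error wording are made up for illustration.

```javascript
// Hypothetical pre-flight check, not Got's actual implementation.
// It tries decodeURI up front and converts the terse "URI malformed"
// URIError into a message that names the offending URL.
function assertDecodableUrl(url) {
  try {
    decodeURI(url);
  } catch (error) {
    if (error instanceof URIError) {
      throw new Error(
        `The URL "${url}" contains percent-escapes that are not valid UTF-8`
      );
    }
    throw error;
  }
}

try {
  assertDecodableUrl('https://www.kinopoisk.ru/community/city/%D2%E0%EB%EB%E8%ED/');
} catch (error) {
  console.log(error.message);
}
```

Note that a check like this rejects every URL whose escapes are not UTF-8, which is exactly the assumption questioned in the next comment.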
FWIW, the assumption that any valid http(s) URI must be a UTF-8 octet sequence after percent-decoding is incorrect. (Speaking as co-author of the HTTP specification(s))
For a generic HTTP library, not enforcing http/https URLs to be UTF-8 is the right decision. But such a library should make it easy to use UTF-8 for URIs. And wherever possible, servers should use UTF-8 for their URIs if they contain non-ASCII characters, and should use a suitable baseXX encoding for binary data such as digital signatures and the like. Btw, contrary to what @Brandon93 says at the start of this thread, https://www.kinopoisk.ru/community/city/%D2%E0%EB%EB%E8%ED/ is not in Windows-1252 (Western Europe), but in Windows-1251 (Russia). This of course makes sense because the site has a Russian domain name. The city is Таллин, in Latin letters this is Tallin. You can easily check this by using the URL in a browser. Using Windows-1252 makes no sense because there is no language that contains words like "Òàëëèí" (accented vowels only). This shows the advantage of using UTF-8. It avoids the mess of regional encodings, and because of its internal structure cannot easily be mistaken for some other encoding.
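The Windows-1251 versus Windows-1252 point can be checked with a short sketch. Both helper names are hypothetical. The decoder is deliberately partial: it relies only on the fact that bytes 0xC0 to 0xFF map to U+0410 to U+044F in Windows-1251 and to U+00C0 to U+00FF in Windows-1252, and it does not handle bytes 0x80 to 0xBF correctly.

```javascript
// Hypothetical helper: turn percent-escapes into raw byte values.
function percentDecodeToBytes(segment) {
  const bytes = [];
  for (let i = 0; i < segment.length; i++) {
    const hex = segment.slice(i + 1, i + 3);
    if (segment[i] === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(segment.charCodeAt(i));
    }
  }
  return bytes;
}

// Partial single-byte decoder (assumption: only ASCII and 0xC0-0xFF occur).
// `base` is the code point that byte 0xC0 maps to in the chosen code page.
function decodeHighRange(bytes, base) {
  return bytes
    .map(b => String.fromCharCode(b >= 0xC0 ? base + (b - 0xC0) : b))
    .join('');
}

const bytes = percentDecodeToBytes('%D2%E0%EB%EB%E8%ED');
const asWindows1251 = decodeHighRange(bytes, 0x0410);
const asWindows1252 = decodeHighRange(bytes, 0x00C0);

console.log(asWindows1251); // Таллин
console.log(asWindows1252); // Òàëëèí
```

The same six bytes produce a real Russian word under one code page and a string of accented vowels under the other, which is why the raw octets alone cannot tell a library which regional encoding was intended.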
Hello,
I get a "URI malformed" error in the new version of "got" when I try to send a request to a URL that contains URL-encoded characters.
URL examples:
https://www.kinopoisk.ru/community/city/%D2%E0%EB%EB%E8%ED/
https://www.kinopoisk.ru/news/keyword/%C7%E2%E5%E7%E4%ED%FB%E5+%E2%EE%E9%ED%FB/
nodejs: 9.2.0
got: 8.0.0
Failed code:
Broken at this commit: 3c79205
Any ideas?