Already-tested ideas
I tried the following:
Encode the whole URL with URI.toASCIIString()
val url = "https://example.com/unicode/שלום"
val encodedUrl = URI(url).toASCIIString()
val doc: Document = Jsoup.connect(encodedUrl)
    .followRedirects(true)
    .get()
Encode the whole URL with URLEncoder
val url = "https://example.com/unicode/שלום"
val encodedUrl = URLEncoder.encode(url, StandardCharsets.UTF_8.toString())
val doc: Document = Jsoup.connect(encodedUrl)
    .followRedirects(true)
    .get()
returns:
java.lang.IllegalArgumentException: The supplied URL, 'https%3A%2F%2Fexample.com%2Funicode%2F%D7%A9%D7%9C%D7%95%D7%9D', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls
Encode just the path with URLEncoder
val url = "https://example.com/unicode/" + URLEncoder.encode("שלום", StandardCharsets.UTF_8.toString())
// yields https://example.com/unicode/%D7%A9%D7%9C%D7%95%D7%9D
val doc: Document = Jsoup.connect(url)
    .followRedirects(true)
    .get()
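For comparison, here is a small sketch that prints the string each of the three attempts hands to Jsoup.connect(), without making any network call. The helper name encodingAttempts is hypothetical and only there for illustration; it uses nothing beyond java.net.URI and java.net.URLEncoder.

```kotlin
import java.net.URI
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical helper: builds the URL string produced by each attempt above.
fun encodingAttempts(base: String, segment: String): List<String> {
    val utf8 = StandardCharsets.UTF_8.toString()
    return listOf(
        // Attempt 1: URI.toASCIIString() escapes only the non-ASCII characters.
        URI(base + segment).toASCIIString(),
        // Attempt 2: form-encoding the whole URL also escapes ':' and '/'.
        URLEncoder.encode(base + segment, utf8),
        // Attempt 3: only the last path segment is encoded.
        base + URLEncoder.encode(segment, utf8)
    )
}

fun main() {
    encodingAttempts("https://example.com/unicode/", "שלום").forEach(::println)
}
```

Attempts 1 and 3 produce the same ASCII URL. Attempt 2 no longer starts with https://, which matches the "malformed" error quoted above.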
Also with 0121311, if the query string contains non-ASCII characters, we normalize them to ASCII. Any existing escapes are preserved, which is why this implementation is more complicated than just decoding the URL components, constructing a URI, and letting that do the encoding: existing escapes would get incorrectly smooshed.
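To illustrate the idea of normalizing non-ASCII characters while leaving existing escapes alone, here is a minimal sketch. It is not the code from the commit mentioned above; the function name escapeNonAscii is an assumption made up for this example, and it handles only the "non-ASCII to UTF-8 percent-escapes" part.

```kotlin
// Hypothetical sketch, not jsoup's implementation: percent-escape non-ASCII
// code points as UTF-8 and copy every ASCII character (including an existing
// "%XX" escape) through unchanged.
fun escapeNonAscii(input: String): String {
    val out = StringBuilder()
    var i = 0
    while (i < input.length) {
        val cp = input.codePointAt(i)
        if (cp < 0x80) {
            // ASCII: copied as is, so "%2F" stays "%2F" and is not decoded.
            out.appendCodePoint(cp)
        } else {
            // Non-ASCII: emit one %XX escape per UTF-8 byte.
            val bytes = String(Character.toChars(cp)).toByteArray(Charsets.UTF_8)
            for (b in bytes) {
                out.append('%').append("%02X".format(b.toInt() and 0xFF))
            }
        }
        i += Character.charCount(cp)
    }
    return out.toString()
}

fun main() {
    println(escapeNonAscii("q=a%2Fb&name=שלום"))
}
```

Because nothing is decoded first, an escaped slash such as %2F in the input remains distinguishable from a literal slash in the output, which a decode-then-re-encode approach would not guarantee.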
Hello,
Testing with Jsoup 1.15.4 (the latest version as of this date), Jsoup seems unable to retrieve pages published at URLs with a Unicode path.
Description of the problem
returns:
Expected behaviour
Jsoup should handle the encoding on its side and connect without problem.
It seems that Jsoup is doing a double encoding on its end, but I might be wrong.
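To show what I mean by double encoding, here is a short sketch using only java.net.URLEncoder and java.net.URLDecoder. It does not call Jsoup, so it only illustrates the suspected effect and is not evidence of what Jsoup actually does internally; the helper name encodeSegment is hypothetical.

```kotlin
import java.net.URLDecoder
import java.net.URLEncoder

// Hypothetical helper: percent-encodes a single path segment as UTF-8.
fun encodeSegment(segment: String): String = URLEncoder.encode(segment, "UTF-8")

fun main() {
    val once = encodeSegment("שלום")   // escapes the Hebrew letters
    val twice = encodeSegment(once)    // escapes the '%' signs again as %25
    println(once)
    println(twice)
    // A server that decodes once would see the escaped text, not the original path.
    println(URLDecoder.decode(twice, "UTF-8"))
}
```

If something along these lines happens, the server receives a path containing %25D7... and decodes it to the literal text %D7..., which would explain a page not being found.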
Thank you in advance for your help!