Escape URLs via URI components
Replaces the previous method of re-parsing u.toExternalForm(). When built from the URL's individual components, URI encodes illegal characters in the path correctly.
jhy committed Jan 5, 2023
1 parent b129bc9 commit 45ed002
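
To illustrate the difference the commit message describes, here is a minimal sketch that runs both approaches side by side. The class name `UrlEncodeSketch` and the two method names are hypothetical and not part of jsoup; the bodies only mirror the `java.net` calls that appear in the diff, and each method returns `null` on failure purely to keep the demo simple.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class; not part of jsoup.
public class UrlEncodeSketch {

    // Previous approach: patch spaces in the external form, then re-parse the whole string.
    // The single-argument URI constructor rejects characters such as '[' in the path.
    static String viaExternalForm(String raw) {
        try {
            URL u = new URL(raw);
            String s = u.toExternalForm().replace(" ", "%20");
            return new URI(s).toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            return null; // demo-only signal that the input could not be encoded
        }
    }

    // New approach: hand each component to the multi-argument URI constructor,
    // which quotes characters that are illegal in that component.
    static String viaComponents(String raw) {
        try {
            URL u = new URL(raw);
            URI uri = new URI(u.getProtocol(), u.getUserInfo(), u.getHost(), u.getPort(),
                    u.getPath(), u.getQuery(), u.getRef());
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            return null; // demo-only signal that the input could not be encoded
        }
    }

    public static void main(String[] args) {
        String raw = "https://test.com/foo bar/[One]?q=white space#frag";
        System.out.println("external form: " + viaExternalForm(raw));
        System.out.println("components:    " + viaComponents(raw));
    }
}
```

For the bracketed URL used in the updated test, the external-form route fails while the component route should produce the escaped string the test expects. For the simpler `?q=white space` case, both routes agree.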
Showing 3 changed files with 9 additions and 7 deletions.
4 changes: 4 additions & 0 deletions CHANGES
@@ -17,6 +17,10 @@ Release 1.15.3 [2022-Aug-24]
 lead to incorrect results when parsing from chunked server responses, for e.g.
 <https://github.com/jhy/jsoup/issues/1807>

+* Bugfix: URLs containing characters such as [ and ] were not escaped correctly, and would throw a
+MalformedURLException when fetched.
+<https://github.com/jhy/jsoup/issues/1873>
+
 * Build Improvement: added implementation version and related fields to the jar manifest.
 <https://github.com/jhy/jsoup/issues/1809>

8 changes: 3 additions & 5 deletions src/main/java/org/jsoup/helper/HttpConnection.java
@@ -125,11 +125,9 @@ private static String encodeUrl(String url) {
     static URL encodeUrl(URL u) {
         u = punyUrl(u);
         try {
-            // odd way to encode urls, but it works!
-            String urlS = u.toExternalForm(); // URL external form may have spaces which is illegal in new URL() (odd asymmetry)
-            urlS = urlS.replace(" ", "%20");
-            final URI uri = new URI(urlS);
-            return new URL(uri.toASCIIString());
+            // run the URL through URI, so components are encoded
+            URI uri = new URI(u.getProtocol(), u.getUserInfo(), u.getHost(), u.getPort(), u.getPath(), u.getQuery(), u.getRef());
+            return uri.toURL();
         } catch (URISyntaxException | MalformedURLException e) {
             // give up and return the original input
             return u;

4 changes: 2 additions & 2 deletions src/test/java/org/jsoup/helper/HttpConnectionTest.java
@@ -255,9 +255,9 @@ public void caseInsensitiveHeaders(Locale locale) {
     }

     @Test public void encodeUrl() throws MalformedURLException {
-        URL url1 = new URL("http://test.com/?q=white space");
+        URL url1 = new URL("https://test.com/foo bar/[One]?q=white space#frag");
         URL url2 = HttpConnection.encodeUrl(url1);
-        assertEquals("http://test.com/?q=white%20space", url2.toExternalForm());
+        assertEquals("https://test.com/foo%20bar/%5BOne%5D?q=white%20space#frag", url2.toExternalForm());
     }

     @Test public void noUrlThrowsValidationError() throws IOException {
