Fixed doctype tokeniser to allow whitespace between name and public identifier.
jhy committed Aug 28, 2011
1 parent c98349a commit 70b2cf9
Showing 3 changed files with 13 additions and 1 deletion.
2 changes: 2 additions & 0 deletions CHANGES
@@ -18,6 +18,8 @@ jsoup changelog
 * Tweaked escaped entity detection in attributes to not treat &entity_... as an entity form.
 <https://github.com/jhy/jsoup/issues/129>
 
+* Fixed doctype tokeniser to allow whitespace between name and public identifier.
+
 *** Release 1.6.1 [2011-Jul-02]
 * Fixed Java 1.5 compatibility.
 <https://github.com/jhy/jsoup/issues/103>
4 changes: 3 additions & 1 deletion src/main/java/org/jsoup/parser/TokeniserState.java
@@ -1364,7 +1364,9 @@ void read(Tokeniser t, CharacterReader r) {
 t.transition(Data);
 return;
 }
-if (r.matches('>')) {
+if (r.matchesAny('\t', '\n', '\f', ' '))
+r.advance(); // ignore whitespace
+else if (r.matches('>')) {
 t.emitDoctypePending();
 t.advanceTransition(Data);
 } else if (r.matchConsumeIgnoreCase("PUBLIC")) {
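The patched branch appears to correspond to the HTML spec's "after DOCTYPE name" state: whitespace is now skipped before the `>` and `PUBLIC` checks run. The following is a minimal standalone sketch of that ordering, assuming a plain `String` and index in place of jsoup's `CharacterReader`. The class `AfterDoctypeNameSketch`, its `Step` enum and the `step`/`run` methods are hypothetical names for illustration, not jsoup API, and the end-of-input check placed first is an assumption based on the `return` context lines above the patch.

```java
// Hypothetical sketch of the patched branch; a String and an index stand in
// for jsoup's CharacterReader. This is not jsoup code.
public class AfterDoctypeNameSketch {

    // Outcomes of one step, in the order the patched branch tests them.
    enum Step { SKIP_WHITESPACE, EMIT_DOCTYPE, PUBLIC_KEYWORD, OTHER, EOF }

    static Step step(String input, int pos) {
        if (pos >= input.length())
            return Step.EOF;                // assumed end-of-input check
        char c = input.charAt(pos);
        if (c == '\t' || c == '\n' || c == '\f' || c == ' ')
            return Step.SKIP_WHITESPACE;    // new branch: same four chars as matchesAny(...)
        if (c == '>')
            return Step.EMIT_DOCTYPE;       // would emit the pending doctype
        if (input.regionMatches(true, pos, "PUBLIC", 0, 6))
            return Step.PUBLIC_KEYWORD;     // case-insensitive, like matchConsumeIgnoreCase
        return Step.OTHER;                  // remaining branches are not modelled here
    }

    // Re-runs the state while it only skips whitespace, one character per step.
    static Step run(String input, int pos) {
        Step s;
        while ((s = step(input, pos)) == Step.SKIP_WHITESPACE)
            pos++;
        return s;
    }

    public static void main(String[] args) {
        String html = "<!DOCTYPE html\n PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\">";
        int afterName = html.indexOf("html") + "html".length();
        System.out.println(run(html, afterName));         // PUBLIC_KEYWORD
        System.out.println(run("<!DOCTYPE html >", 14));  // EMIT_DOCTYPE
    }
}
```

Because the whitespace branch only advances one character and leaves the state unchanged, a run of newlines and spaces is consumed by repeated passes through the same state, which is what `run` imitates.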
8 changes: 8 additions & 0 deletions src/test/java/org/jsoup/parser/ParserTest.java
@@ -623,4 +623,12 @@ public class ParserTest {
 Document doc = Jsoup.parse("<a \n href=\"one\" \r\n id=\"two\" \f >");
 assertEquals("<a href=\"one\" id=\"two\"></a>", doc.body().html());
 }
+
+@Test public void handlesWhitespaceInoDocType() {
+String html = "<!DOCTYPE html\n" +
+" PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n" +
+" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">";
+Document doc = Jsoup.parse(html);
+assertEquals("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">", doc.childNode(0).outerHtml());
+}
 }
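The new test feeds a doctype split over three lines and expects it back on one line, with single spaces between the name, `PUBLIC`, and the two quoted identifiers. A minimal sketch of that round trip, outside jsoup and without its dependency, could look like the following. The class `MiniDoctype` and its `normalise` method are hypothetical; the sketch only handles the `<!DOCTYPE name [PUBLIC "id" ["id"]]>` shape with no error recovery, whereas jsoup's tokeniser is a state machine (note the `t.transition(...)` calls in the diff) that builds a pending doctype token.

```java
// Hypothetical mini reader for <!DOCTYPE name [PUBLIC "publicId" ["systemId"]]>.
// A simplified sketch for illustration, not jsoup's implementation.
public class MiniDoctype {
    private final String s;
    private int pos;

    private MiniDoctype(String s) { this.s = s; }

    // Same whitespace set as the patched tokeniser branch.
    private static boolean isWhitespace(char c) {
        return c == '\t' || c == '\n' || c == '\f' || c == ' ';
    }

    private boolean atEnd() { return pos >= s.length(); }

    private boolean at(char c) { return !atEnd() && s.charAt(pos) == c; }

    private boolean atQuote() { return at('"') || at('\''); }

    private void skipWhitespace() {
        while (!atEnd() && isWhitespace(s.charAt(pos))) pos++;
    }

    private boolean consumeIgnoreCase(String word) {
        if (s.regionMatches(true, pos, word, 0, word.length())) {
            pos += word.length();
            return true;
        }
        return false;
    }

    // Reads the doctype name up to whitespace or '>'.
    private String readName() {
        int start = pos;
        while (!atEnd() && !isWhitespace(s.charAt(pos)) && s.charAt(pos) != '>') pos++;
        return s.substring(start, pos);
    }

    // Reads a single- or double-quoted identifier; caller checks atQuote() first.
    private String readQuoted() {
        char quote = s.charAt(pos++);
        int end = s.indexOf(quote, pos);
        if (end < 0) throw new IllegalArgumentException("unterminated identifier");
        String value = s.substring(pos, end);
        pos = end + 1;
        return value;
    }

    // Parses the doctype and re-serialises it on one line, single-space separated.
    static String normalise(String html) {
        MiniDoctype r = new MiniDoctype(html);
        if (!r.consumeIgnoreCase("<!DOCTYPE")) throw new IllegalArgumentException("not a doctype");
        r.skipWhitespace();
        StringBuilder out = new StringBuilder("<!DOCTYPE ").append(r.readName());
        r.skipWhitespace();  // whitespace (including newlines) between name and PUBLIC
        if (r.consumeIgnoreCase("PUBLIC")) {
            r.skipWhitespace();
            if (!r.atQuote()) throw new IllegalArgumentException("expected public identifier");
            out.append(" PUBLIC \"").append(r.readQuoted()).append('"');
            r.skipWhitespace();
            if (r.atQuote()) out.append(" \"").append(r.readQuoted()).append('"');
            r.skipWhitespace();
        }
        if (!r.at('>')) throw new IllegalArgumentException("expected '>' at " + r.pos);
        return out.append('>').toString();
    }

    public static void main(String[] args) {
        String html = "<!DOCTYPE html\n" +
                " PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n" +
                " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">";
        System.out.println(normalise(html));
    }
}
```

Running `main` prints the same single-line string that `handlesWhitespaceInoDocType` asserts against `doc.childNode(0).outerHtml()`.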
