Reversing order of DTD & ProcessingInstruction #388

pwim opened this Issue Dec 21, 2010 · 5 comments

@pwim
pwim commented Dec 21, 2010

The following input

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.1//EN" "http://www.openmobilealliance.org/tech/DTD/xhtml-mobile11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <title>Hello world</title>
    </head>
    <body>
        <p>Hello <a href="http://example.org/">world</a>.</p>
    </body>
</html>

gets transformed to

<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.1//EN" "http://www.openmobilealliance.org/tech/DTD/xhtml-mobile11.dtd">
<?xml version="1.0" encoding="UTF-8"??>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Hello world</title>
  </head>
  <body>
    <p>Hello <a href="http://example.org/">world</a>.</p>
  </body>
</html>

Note that the DTD and the ProcessingInstruction have swapped places. Is this intentional? Is there a way to avoid this?

@flavorjones
Member

Hi!

Questions should go to the nokogiri mailing list. Please read http://nokogiri.org/tutorials/getting_help.html for rationale and guidelines.

Since we're here already, though, you should read the XML 1.0 spec, specifically this section:

http://www.w3.org/TR/REC-xml/#sec-prolog-dtd

which clearly indicates that the DOCTYPE declaration should come first. libxml2 is doing the right thing by emitting the DOCTYPE before the xmldecl.

@zenspider

Mike, read it again. It says the exact opposite:

prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
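
To make the production concrete: the XMLDecl is optional, but when present it can only appear at the very start of the prolog, so in a conforming document it always comes before the doctypedecl. The Ruby sketch below checks just that property on a serialized string. `prolog_order_ok?` is a hypothetical helper, regex-based rather than a real XML parser: it does not handle a leading BOM, comments, or a literal `<?xml ` inside text.

```ruby
# Hypothetical helper: naive check of the rule in
#   prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
# An XML declaration is optional, but if present it must be the first
# thing in the document (offset 0), which also puts it before any DOCTYPE.
# Not a parser: it ignores BOMs and any "<?xml " appearing inside text.
def prolog_order_ok?(doc)
  decl_pos = doc.index("<?xml ")
  decl_pos.nil? || decl_pos.zero?
end

input  = %(<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html>\n<html/>)
output = %(<!DOCTYPE html>\n<?xml version="1.0" encoding="UTF-8"??>\n<html/>)

puts prolog_order_ok?(input)   # prints "true"
puts prolog_order_ok?(output)  # prints "false"
```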
@zenspider

I suspect this is a bug in his version of libxml. It seems to work fine on our side with his input.

pwim, what version of libxml are you using?

@pwim
pwim commented Jan 4, 2011

I'm using version 2.7.8 of libxml and version 1.4.4 of nokogiri. The code I'm using is

Nokogiri::HTML(s).to_xhtml

if that helps.
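
Until the underlying bug is fixed, one stopgap is to repair the serialized string after the fact. The Ruby sketch below is a hypothetical post-processing helper, `move_xml_decl_first`, and is not part of Nokogiri: it pulls the first `<?xml ...>` line out of the output, collapses the doubled `??>` seen in the output above into `?>`, and puts the declaration back at the top. It assumes the declaration sits on its own line, as in the reported output, and it leaves the duplicated `xmlns` attribute alone.

```ruby
# Hypothetical workaround, not part of Nokogiri's API: move a misplaced
# XML declaration back to the top of a serialized XHTML string.
# Assumes the declaration occupies its own line, as in the report above.
XML_DECL_RE = /^<\?xml\s[^>]*>\r?\n?/

def move_xml_decl_first(xhtml)
  decl = xhtml[XML_DECL_RE]
  # Nothing to do if there is no declaration or it is already first.
  return xhtml if decl.nil? || xhtml.start_with?(decl)

  body = xhtml.sub(XML_DECL_RE, "")
  # Collapse the doubled "??>" seen in the reported output into "?>".
  clean = decl.strip.sub(/\?*>\z/, "?>")
  "#{clean}\n#{body}"
end

broken = %(<!DOCTYPE html>\n<?xml version="1.0" encoding="UTF-8"??>\n<html></html>\n)
puts move_xml_decl_first(broken)
# <?xml version="1.0" encoding="UTF-8"?>
# <!DOCTYPE html>
# <html></html>
```

In the reporter's setup this would wrap the existing call, e.g. `move_xml_decl_first(Nokogiri::HTML(s).to_xhtml)`. It is string surgery on the output, so it hides the symptom rather than fixing the serializer.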

@flavorjones
Member

I believe this is a bug in libxml2. We should probably write a C example of this and report it upstream.
