ddfreyne commented Sep 4, 2011

When serializing as HTML, no extra whitespace (indentation) is added. For example:

# => "<pre><code>moo</code></pre>\n"

However, with XHTML (and XML), the original line is split into three lines (pre, code, and /pre), and the middle line is erroneously indented, like this:

# => "<pre>\n  <code>moo</code>\n</pre>\n"
l3x4 commented Oct 1, 2011


I first ran into a similar issue with Sanitize, but its author forwarded it to Nokogiri:

>> Nokogiri::HTML.fragment('<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />').to_xhtml
=> "<b>\n  <a href=\"http://foo.com/\">foo</a>\n</b><img src=\"http://foo.com/bar.jpg\" />"

Here's the original issue discussion: rgrove/sanitize#47 (comment)

It happens with Nokogiri 1.5.0 on REE 1.8.7 (and probably on plain Ruby 1.8.7 as well).


For XML, you can turn off formatting by calling #to_xml as follows:

doc.to_xml(:save_with => 0)

There's some strangeness in the serializers, and it's worth discussing within the core team. We'll probably revamp this for the next major release.

