Tidy transforms some valid XHTML file into an invalid one.
For instance, the source has:
<ul class="ul"><li class="li"></li></ul>
which is valid. Tidy removes the empty li, but not the ul (this
doesn't happen if one removes the class attribute), so that one
gets:
<ul class="ul"></ul>
which is invalid (there must be at least one li).
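The "at least one li" rule can be checked mechanically. What follows is only a minimal sketch in Python, assuming the input is well-formed XML; the helper name `empty_lists` is made up for illustration, and it tests this single content-model constraint rather than full DTD validity.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def empty_lists(xml_text):
    """Return the ul/ol elements that have no li child (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    bad = []
    for elem in root.iter():
        if local_name(elem.tag) in ("ul", "ol"):
            # XHTML 1.0 Strict requires one or more li children here.
            if not any(local_name(child.tag) == "li" for child in elem):
                bad.append(elem)
    return bad


before = '<ul class="ul"><li class="li"></li></ul>'
after = '<ul class="ul"></ul>'
print(len(empty_lists(before)))  # 0
print(len(empty_lists(after)))   # 1
```

Running the same check on a file before and after Tidy would flag the regression described in this report.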
Sample test case:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<!-- $Id: tidy-empty-list.html 43963 2011-05-26 12:08:28Z vinc17/ypig $ -->
<head><title>Test of tidy on an empty list</title></head>
<body>
<p>Debian's <cite>Tidy</cite> 20091223cvs-1 transforms this valid XHTML
file into an invalid one: it removes the empty <samp>li</samp> but keeps
the <samp>ul</samp> element due to its <samp>class</samp> attribute!</p>
<ul class="ul"><li class="li"></li></ul>
</body>
</html>
@hosiet thank you for cross posting this here... and the sample xhtml...
I can confirm that even the current tidy 5.7.16 will drop the empty <li>, as does that old 20091223cvs-1 version...
In the current version you can add the --drop-empty-elements no option to the config to avoid this...
But this ref - https://www.w3.org/2010/04/xhtml10-strict.html#elem_ul - says "At least one of li", thus, as you suggest, an empty list is invalid in XHTML - need more W3C references - and libtidy needs a fix... should not be difficult...
Appreciate further feedback, patches or PR... thanks...
@hosiet looking further into this... at first I thought it might be an HTML4 (one or more li) versus HTML5 (0 or more li) difference, something addressed in #396... but now I think this is maybe a configuration issue...
If you tell tidy the input is to be treated as well formed XML, with either -xml or --input-xml yes, then TY_(ParseXMLDocument)(TidyDocImpl* doc) would be used, which does not end the parsing with TY_(DropEmptyElements)(doc, &doc->root); and I think you will get the desired output...
F:\Projects\tidy-test\test>tidy5 -v
HTML Tidy for Windows version 5.7.16
F:\Projects\tidy-test\test>tidy5 -xml input5\in_768.html
No warnings or errors were found.
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<!-- $Id: tidy-empty-list.html 43963 2011-05-26 12:08:28Z vinc17/ypig $ -->
<head><title>Test of tidy on an empty list</title></head>
<body><p>Debian's
<cite>Tidy</cite>20091223cvs-1 transforms this valid XHTML file
into an invalid one: it removes the empty
<samp>li</samp>but keeps the
<samp>ul</samp>element due to its
<samp>class</samp>attribute!</p>
<ul class="ul"><li class="li"></li></ul>
</body>
</html>
F:\Projects\tidy-test\test>tidy-2009 -v
HTML Tidy for Windows released on 25 March 2009
**same output**
As can be seen, this also works for the tidy-2009 release...
To repeat, this only happens if tidy is allowed to default to using its HTML parser... where, at least in HTML5, such a deletion is not a problem... and can be overridden with the option --drop-empty-elements no, as a user choice...
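For anyone scripting around this, the two workarounds mentioned in this thread (-xml and --drop-empty-elements no) could be wrapped roughly as below. This is just a sketch: the executable name `tidy` is an assumption (the transcript above uses tidy5), and the helper names `tidy_command` and `run_tidy` are invented for illustration.

```python
import subprocess


def tidy_command(path, exe="tidy", xml_mode=False, keep_empty=False):
    """Build an argv list using only the options named in this thread."""
    cmd = [exe]
    if xml_mode:
        # Treat the input as well-formed XML (the XML parser path).
        cmd.append("-xml")
    if keep_empty:
        # Stay on the HTML parser but keep empty elements.
        cmd += ["--drop-empty-elements", "no"]
    cmd.append(path)
    return cmd


def run_tidy(path, **kwargs):
    """Run Tidy and return its stdout; needs a tidy binary on PATH."""
    result = subprocess.run(
        tidy_command(path, **kwargs), capture_output=True, text=True
    )
    return result.stdout


print(tidy_command("in_768.html", xml_mode=True))
print(tidy_command("in_768.html", keep_empty=True))
```

The exit status is deliberately not checked, since Tidy also uses a non-zero status to report warnings.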
The static Bool CanPrune(...) service could be enhanced to do some check on the tidy mode, if this problem needs to be addressed in HTML4 documents... but maybe that could be addressed as a separate new issue... thanks...
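To make the idea of a mode-aware prune check concrete, here is a toy model in Python. It is not libtidy code and does not mirror the real CanPrune(...) logic: the `PRUNABLE` and `REQUIRED_CHILD` tables, the `html5_mode` flag, and the function names are all assumptions made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Toy stand-ins, not libtidy's tables: which empty elements may be dropped,
# and which parents need at least one child of a given kind.
PRUNABLE = {"li", "p", "span", "div"}
REQUIRED_CHILD = {"ul": "li", "ol": "li"}


def is_empty(elem):
    """True if the element has no children and no non-blank text."""
    return len(elem) == 0 and not (elem.text or "").strip()


def can_prune(parent, child, html5_mode):
    """Hypothetical mode-aware prune check."""
    if child.tag not in PRUNABLE or not is_empty(child):
        return False
    if html5_mode:
        return True  # zero list items are acceptable in HTML5
    required = REQUIRED_CHILD.get(parent.tag)
    if required == child.tag:
        # Outside HTML5, keep the last remaining required child.
        return sum(1 for c in parent if c.tag == required) > 1
    return True


def drop_empty(parent, html5_mode):
    """Depth-first pass that removes children approved by can_prune."""
    for child in list(parent):
        drop_empty(child, html5_mode)
        if can_prune(parent, child, html5_mode):
            parent.remove(child)


src = '<body><ul class="ul"><li class="li"></li></ul></body>'
for mode in (True, False):
    root = ET.fromstring(src)
    drop_empty(root, mode)
    print(mode, ET.tostring(root, encoding="unicode"))
```

With `html5_mode=True` the ul ends up with no children, as in the report; with `html5_mode=False` the single empty li is kept, so the list still has at least one item.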
Does this solve the problem of deleting the empty <li>... in valid xhtml... thanks...
I'm forwarding some longstanding downstream issues here, one of which is about an empty list. Previous reports: