Add --doctype option to NOT MEDDLE with the doctype found in the input #435

Closed
jidanni opened this issue Jul 9, 2016 · 13 comments

Comments

@jidanni
Contributor

jidanni commented Jul 9, 2016

I think it is unconscionable that you have no --doctype option,
including the default (auto) that will LEAVE THE USER'S DOCTYPE ALONE NO
MATTER WHAT.

No matter what he picks, there is a danger, however slight, that the
doctype currently sitting in his input file will get changed under his
feet, behind his back, or whatever you want to call it.

Why don't you add an option to not mess with his doctype, and just print
errors against that doctype, without switching it for him? OK?

tidy:
Installed: 1:5.2.0-1.1

@jidanni
Contributor Author

jidanni commented Jul 9, 2016

OK maybe --doctype auto doesn't in fact change the doctype found. But at least the man page should say that it is guaranteed not to.

Also the way the man page is written, the user might really try "--doctype user".

@jidanni
Contributor Author

jidanni commented Jul 9, 2016

P.S.,

`tidy --doctype '"-//ACME//DTD HTML 3.14159//EN"'` didn't do anything, and

`tidy --doctype: '"-//ACME//DTD HTML 3.14159//EN"'` made an error.
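
It may help to see what the shell actually hands to tidy in the two commands above. A small sketch follows; `show_args` is a hypothetical helper, and how tidy then interprets these arguments is not something this snippet demonstrates:

```shell
#!/bin/sh
# Print each argument on its own line, wrapped in [ ] so that any
# literal quote characters that survive shell quoting are visible.
show_args() {
    for a in "$@"; do
        printf '[%s]\n' "$a"
    done
}

show_args --doctype '"-//ACME//DTD HTML 3.14159//EN"'
show_args --doctype: '"-//ACME//DTD HTML 3.14159//EN"'
```

In the first call the value arrives with the double quotes as part of the string. In the second, the option name itself is `--doctype:`, colon included.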

@geoffmcl
Contributor

@jidanni thanks for the (evolving) issue...

I certainly agree, the use of --doctype [enum] has some very weird problems...

Enumerate each, and file a separate issue... of course each issue should include the tidy/libtidy version, sample html, output html, config used, expected html, comments, etc...

And absolutely agree, the present docs help does imply --doctype user will work! And it certainly needs enhancing... a docs issue... please help...

So I agree, you have opened a can of worms... and that is good... separating and isolating the problems, one by one, may lead to a faster solution...

Or do you have a PR for review?

For sure, several --doctype [enum] problems exist, and should be isolated, addressed, and fixed... thanks...

@geoffmcl geoffmcl added this to the 5.3 milestone Jul 10, 2016
smcv added a commit to smcv/html-tidy that referenced this issue Jul 22, 2016
tidy-html5 currently doesn't preserve user-supplied DOCTYPEs
in output: <htacg/tidy-html5#435>

Signed-off-by: Simon McVittie <smcv@debian.org>
@jidanni
Contributor Author

jidanni commented Dec 8, 2016

Add a switch to make tidy promise not to change the doctype when in auto
mode. One likes how tidy detects doctypes, but doesn't like it changing
them. Here's my workaround:

$ cat dantidy
#!/bin/sh
# Copyright       : http://www.gnu.org/licenses/gpl.html
# Author          : http://jidanni.org/
# Last Modified On: Thu Dec  8 20:07:16 2016
# Update Count    : 33
set -eu
f=/tmp/dantidytmp_$USER
# Save stdin so it can be read twice: once to sniff the DOCTYPE, once to tidy.
cat > "$f"
# Ensure tidy won't change the DOCTYPE it finds:
case $(sed 2q "$f") in
    *strict*) DT=strict;;
    *loose*)  DT=loose;;
    '<!DOCTYPE html>'*) DT=html5;;
    *) exit 99;;
esac
perl -C -pwe 's/[^[:ascii:]]/sprintf "\\x{%04x}",ord $&/ge' "$f" | #my uni2ascii(1)
    tidy --doctype "$DT" --gnu-emacs yes --indent-spaces 1 --indent auto -utf8 \
	 --tidy-mark no -quiet --fix-backslash no --enclose-block-text yes \
	 --vertical-space yes "$@" |
    perl -C -pwe 's/\\x\{([[:xdigit:]]{4})\}/chr eval "0x$1"/eg' #my ascii2uni(1)
#(not sure if I still need all the uni2ascii stuff.)
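
The DOCTYPE-detection step of the workaround above can be pulled out into a function and tried on its own. This is a minimal sketch that assumes the same three cases as the script; `detect_doctype` is a hypothetical name, not part of tidy:

```shell
#!/bin/sh
# Hypothetical helper (not part of tidy): read HTML on stdin, look at the
# first two lines only, and print the value to pass to tidy's --doctype.
# Returns 99 if no known DOCTYPE is found, mirroring the script above.
detect_doctype() {
    case $(sed 2q) in
        *strict*) echo strict;;
        *loose*)  echo loose;;
        '<!DOCTYPE html>'*) echo html5;;
        *) return 99;;
    esac
}
```

With a placeholder file `page.html`, `DT=$(detect_doctype < page.html)` would then give the value to pass as `tidy --doctype "$DT"`. Like the original, the match is case-sensitive and only recognizes doctypes whose first two lines contain `strict` or `loose`, or start with `<!DOCTYPE html>`.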

@geoffmcl
Contributor

Also see issue #472, with a suggested patch, and a related config issue #468, which needs testing...

Specifically concerning the failure to use the user supplied <fpi> string, marking this as a bug!

@geoffmcl geoffmcl added the Bug label Dec 16, 2016
@brlin-tw
Contributor

brlin-tw commented Dec 29, 2016

Here is my case:

I'm working on a few mixed-doctype webpages that use legacy attributes. For this particular XHTML 1.0 Transitional webpage, tidy reports:

line 23 column 5 - Warning: <table> proprietary attribute "bordercolor"
(..stripped...)
line 339 column 5 - Warning: <table> proprietary attribute "bordercolor"
Info: Doctype given is "-//W3C//DTD XHTML 1.0 Transitional//EN"
Info: Document content looks like HTML Proprietary

The doctype: auto option causes the XHTML 1.0 Transitional DTD to be removed, and on the second call tidy adds an XHTML5 DTD, claiming the page looks like XHTML5, and outputs a bunch of compliance warnings since the page is, in fact, not XHTML5:

line 2 column 1 - Warning: missing <!DOCTYPE> declaration
(... a bunch of these, stripped)
line 1127 column 5 - Warning: The summary attribute on the <table> element is obsolete in HTML5
line 20 column 3 - Warning: <div> attribute "align" not allowed for XHTML5
(... a bunch of these, stripped)
line 1169 column 13 - Warning: <div> attribute "align" not allowed for XHTML5
line 21 column 5 - Warning: <table> proprietary attribute "bordercolor"
(..stripped...)
line 337 column 5 - Warning: <table> proprietary attribute "bordercolor"
Info: Document content looks like XHTML5
Tidy found 197 warnings and 0 errors!
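
The two `Info:` lines in reports like these can be pulled out mechanically, which makes it easier to see when the declared doctype and tidy's guess disagree. A rough sketch that reads a saved report on stdin; it assumes the message wording shown above, which may differ between tidy versions, and `doctype_info` is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: read tidy's diagnostic output on stdin and print
# the declared doctype and tidy's guess about the content.
# Exits 1 if either of the two Info: lines is missing.
doctype_info() {
    awk '
        /^Info: Doctype given is / {
            given = $0
            sub(/^Info: Doctype given is "/, "", given)
            sub(/"$/, "", given)
        }
        /^Info: Document content looks like / {
            looks = $0
            sub(/^Info: Document content looks like /, "", looks)
        }
        END {
            if (given != "" && looks != "") {
                printf "given: %s\nlooks like: %s\n", given, looks
                exit 0
            }
            exit 1
        }
    '
}
```

Since tidy writes its diagnostics to stderr by default, something like `tidy page.html 2>&1 >/dev/null | doctype_info` (with `page.html` as a placeholder) would feed it a live report. On the second report above the helper exits 1, because the "Doctype given" line is absent.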

@geoffmcl
Contributor

Since we are about to release 5.4, and no work has been done on this, can only move it out to the next milestone, 5.5.

Although I would have to say it is a little unclear exactly what is expected of tidy, so certainly need further clarification... thanks...

@geoffmcl geoffmcl modified the milestones: 5.5, 5.3 Feb 28, 2017
@jidanni
Contributor Author

jidanni commented Feb 28, 2017 via email

@geoffmcl
Contributor

geoffmcl commented Mar 1, 2017

@jidanni I guess I am still very confused... tidy does not change the doctype...

If you leave --doctype at auto, the default, you do not seem to have given us one simple example where tidy changed your existing valid doctype in the output... it may, if there is none, or if it is invalid...

So while it is agreed the documentation presently says "Tidy will use an educated guess based upon the contents of the document", that seems to only mean it will use your valid doctype in the output, but may add an Info: output suggesting what it found...

So it seems tidy does not decide a different doctype would be more suitable, and change the output doctype, but merely shows, as an Info: type message, what it would suggest...

So give us an example where tidy did change the doctype, while in auto mode, and maybe we can do something...

Otherwise this seems closed, except for improving the quickref/man tidy description output... and I hope that will be addressed in 5.5, when we will address many message outputs...

Or have I got something really wrong here? Please help... thanks...
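
One way to produce the kind of example asked for here is to compare the DOCTYPE declaration before and after a run. A minimal sketch, assuming the declaration fits on a single line; `first_doctype` and `doctype_changed` are hypothetical helper names, not tidy features:

```shell
#!/bin/sh
# Print the first line that contains a DOCTYPE declaration
# (case-insensitive), or nothing if the file has none.
# Assumption: the whole declaration sits on one line.
first_doctype() {
    awk 'toupper($0) ~ /<!DOCTYPE/ { print; exit }' "$1"
}

# Succeed (exit 0) if the DOCTYPE line differs between the two files,
# including the case where only one of them has a DOCTYPE at all.
doctype_changed() {
    [ "$(first_doctype "$1")" != "$(first_doctype "$2")" ]
}
```

With placeholder names `in.html` and `out.html`, running `tidy in.html > out.html` and then `doctype_changed in.html out.html && echo "DOCTYPE changed"` would flag a run where the declaration was rewritten, added, or dropped. A declaration that tidy merely re-wraps across lines would also be reported as changed by this simple comparison.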

@jidanni
Contributor Author

jidanni commented Mar 1, 2017 via email

@geoffmcl
Contributor

geoffmcl commented Mar 1, 2017

Sounds good... am adding the Docs. and Feature Request tags in the hope that this will get picked up in the full 5.5 documentation review, and removing the Bug tag... thanks...

@balthisar
Member

Added #614 PR to address the documentation issue.

balthisar added a commit that referenced this issue Sep 26, 2017
Addresses #435 by updating documentation.
@balthisar
Member

Documentation now live as requested, so I will clear this from the queue.
