--quote-ampersand yes doesn't work #876
On 4/25/20, 積丹尼 Dan Jacobson wrote:
```
$ echo '&' | tidy --show-body-only yes -q --quote-ampersand yes --preserve-entities yes
& #Hey, the man page said it would convert this to &amp; !
$ echo '&amp;' | tidy --show-body-only yes -q --quote-ampersand yes --preserve-entities yes
&amp; #Well, at least it didn't destroy it, like the following:
$ echo '&amp;' | tidy --show-body-only yes -q --quote-ampersand yes
&
```
HTML Tidy for Linux version 5.6.0
Two things:
- See #207, "--preserve-entities clashes with --quote-ampersand".
- --quote-ampersand works (at least for me) on HTML 4 docs, so add
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
at the beginning of the file. HTML5 allows plain ampersands, so [I'm
guessing that] if you don't tell tidy it's an HTML 4 doc it uses HTML5
rules.
On 4/30/20, 積丹尼 Dan Jacobson wrote:
Well all I know is
- I want to use HTML5
- I want my naked isolated &amp;'s to be left alone, and not be converted
back to plain &'s,
even if they are allowed. But does the spec say isolated, naked &'s are
no longer allowed?
HTML 4 has a specification. HTML 5... is a moving target. Which is a
long way of saying I don't know, but tidy seems to do what you want:
$ cat x.htm
<!DOCTYPE HTML>
<html><head><title>foo</title></head>
<body>
<p>This is a plain ampersand: & <br>
this is an encoded ampersand: &amp; <br>
this is an incompletely encoded ampersand: &amp <br>
and these are unambigious ampersands: S&P S&P500</p>
</body></html>
$ tidy -q --tidy-mark no --preserve-entities yes x.htm
line 6 column 44 - Warning: entity "&amp" doesn't end in ';'
line 7 column 41 - Warning: unescaped & or unknown entity "&P"
line 7 column 46 - Warning: unescaped & or unknown entity "&P500"
<!DOCTYPE html>
<html>
<head>
<title>foo</title>
</head>
<body>
<p>This is a plain ampersand: &<br>
this is an encoded ampersand: &amp;<br>
this is an incompletely encoded ampersand: &<br>
and these are unambigious ampersands: S&P S&P500</p>
</body>
</html>
On 5/1/20, 積丹尼 Dan Jacobson wrote:
Oh yeah, I forgot. I also want my raw naked &s to become `&amp;`s too...
You can try changing the doctype to html 4, running it thru tidy and
then changing it back to html5. But take a look at the "Errors
involving fragile syntax constructs" section here
https://html.spec.whatwg.org/multipage/introduction.html#syntax-errors
I don't know if changing ampersands in hrefs to the ampersand named
entity is going to mess things up for you or not.
$ cat x.htm
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>foo</title></head>
<body>
<p>This is a plain ampersand: & <br>
this is an encoded ampersand: &amp; <br>
this is an incompletely encoded ampersand: &amp <br>
and these are unambigious ampersands: S&P S&P500<br>
An ampersand in an href: <a href="?bill&ted">Bill and Ted</a><br>
</p>
</body></html>
$ tidy -q --tidy-mark no --quote-ampersand yes x.htm
line 4 column 31 - Warning: unescaped & which should be written as &amp;
line 6 column 44 - Warning: entity "&amp" doesn't end in ';'
line 7 column 41 - Warning: unescaped & or unknown entity "&P"
line 7 column 46 - Warning: unescaped & or unknown entity "&P500"
line 8 column 40 - Warning: unescaped & or unknown entity "&ted"
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>foo</title>
</head>
<body>
<p>This is a plain ampersand: &amp;<br>
this is an encoded ampersand: &amp;<br>
this is an incompletely encoded ampersand: &amp;<br>
and these are unambigious ampersands: S&amp;P S&amp;P500<br>
An ampersand in an href: <a href="?bill&amp;ted">Bill and
Ted</a><br></p>
</body>
</html>
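
The doctype round trip suggested above (switch to an HTML 4 doctype, run tidy, switch back to HTML5) could be scripted roughly like this. It is only a sketch under assumptions: the function names `to_html4`, `to_html5` and `tidy_via_html4` are made up for illustration, the doctype is assumed to sit alone on the first line of the file, and tidy is assumed to write the HTML 4.01 doctype back on a single first line, as in the output above.

```shell
#!/bin/sh
# Sketch of the workaround: present the document to tidy as HTML 4.01 so that
# --quote-ampersand takes effect, then restore the HTML5 doctype afterwards.
# All function names here are hypothetical helpers, not part of tidy.

HTML4_ID='PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"'

# Replace a bare HTML5 doctype on line 1 with the HTML 4.01 Transitional one.
to_html4() {
    sed "1s|^<!DOCTYPE [Hh][Tt][Mm][Ll]>\$|<!DOCTYPE html $HTML4_ID>|"
}

# Reverse step: put the HTML5 doctype back on line 1.
to_html5() {
    sed "1s|^<!DOCTYPE [Hh][Tt][Mm][Ll] $HTML4_ID>\$|<!DOCTYPE html>|"
}

# Full round trip: reads the file named in $1, writes the result to stdout.
# tidy's own exit status is not propagated; the pipeline returns sed's status.
tidy_via_html4() {
    to_html4 < "$1" | tidy -q --tidy-mark no --quote-ampersand yes | to_html5
}
```

Usage would be something like `tidy_via_html4 x.htm > x.tidy.htm`. As the reply above warns, ampersands inside href values get rewritten as well, so check that the links still behave.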
@jidanni, @ler762 thanks for raising this... In addition to #207, see open #861, and maybe others...

The man page is only partially right... Below is a patch to fix this, and #892 -

diff --git a/src/language_en.h b/src/language_en.h
index 60bde02..8d0eb7a 100644
--- a/src/language_en.h
+++ b/src/language_en.h
@@ -1117,7 +1117,7 @@ static languageDefinition language_en = { whichPluralForm_en, {
be translated. */
TidyQuoteAmpersand, 0,
"This option specifies if Tidy should output unadorned <code>&</code> "
- "characters as <code>&amp;</code>. "
+ "characters as <code>&amp;</code>, in legacy doctypes only. "
},
{/* Important notes for translators:
- Use only <code></code>, <var></var>, <em></em>, <strong></strong>, and
@@ -2337,7 +2337,7 @@ static languageDefinition language_en = { whichPluralForm_en, {
" of \"--some-option <value>\", for example, \"--indent-with-tabs yes\".\n"
"\n"
" You can also specify a file containing configuration options with the \n"
- " -options <file> directive, or in one or more files specific to your \n"
+ " -config <file> directive, or in one or more files specific to your \n"
" environment (see next section). \n"
"\n"
" For a list of all configuration options, use \"-help-config\" or refer\n"

As prepared for issue #807, I think if you use a legacy doctype, like in my in_207-5.html, and you should note the different output using … And maybe the …

Please apply the patch, and test with both legacy and html5.

If merged, would this close this issue?

Feedback, discussion, samples, other patches, PR, welcome... thanks...

Docs change for the …

Is there anything else outstanding here? Seems this can be closed...

Please feel free to re-open, or file a new issue... thanks...
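
For anyone who wants to repeat the "test with both legacy and html5" check requested above, a comparison could be set up roughly as follows. This is a sketch only: `sample_doc` and `compare_quote_ampersand` are hypothetical helper names, the sample markup is invented for illustration, and no particular tidy output is claimed here.

```shell
#!/bin/sh
# Hypothetical helpers for comparing tidy's behaviour on a legacy doctype
# and on HTML5, using only options that already appear in this thread.

# Print a minimal document containing bare ampersands; $1 is "legacy" or "html5".
sample_doc() {
    case "$1" in
        legacy) printf '%s\n' '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">' ;;
        html5)  printf '%s\n' '<!DOCTYPE html>' ;;
        *)      echo "usage: sample_doc legacy|html5" >&2; return 2 ;;
    esac
    printf '%s\n' '<html><head><title>t</title></head>' \
                  '<body><p>S&P & more</p></body></html>'
}

# Run both samples through tidy with each --quote-ampersand setting,
# printing a header before each run so the outputs can be compared by eye.
compare_quote_ampersand() {
    for kind in legacy html5; do
        for value in yes no; do
            echo "== $kind, --quote-ampersand $value =="
            sample_doc "$kind" | tidy -q --tidy-mark no --show-body-only yes \
                --quote-ampersand "$value"
        done
    done
}
```

Calling `compare_quote_ampersand` prints four labelled runs; tidy's warnings go to stderr and can be silenced with `2>/dev/null` if only the markup matters.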