Tidy seems to get confused by HTML strings in JavaScript blocks. #700
Comments
@petdance thank you for the issue. Rather than Try passing the following to the validator, and you may see what I mean -
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Is #700-1</title>
</head>
<body>
<script type="text/javascript"><!--
str = '</p>';
--></script>
</body>
</html>
So yes, it seems a tidy bug in a As stated seems yet another case where the tidy behaviour should be modified, depending on the current document mode, established by the I am sure there are probably more of these html4 vs html5 parsing as tidy tries to catch up and deal with both fully correctly in one library... Look forward to further feedback, a patch or a PR... should not be too difficult to fix... thanks...
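For readers wondering why an HTML 4 validator objects to the sample above: in HTML 4.01, script content ends at the first `</` that is followed by a letter, so the `</p>` inside the string literal looks like an end tag. The sketch below only illustrates that rule. `find_etago` is a hypothetical helper written for this note; it is not part of tidy or of any validator.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of libtidy.
 * Returns the offset of the first "</" that is followed by an ASCII
 * letter, which is where an HTML 4 parser would consider the script
 * content to end. Returns -1 if no such sequence exists. */
long find_etago(const char *script)
{
    const char *p = script;

    while ((p = strstr(p, "</")) != NULL) {
        /* p[2] is at worst the terminating NUL, so this read is safe. */
        if (isalpha((unsigned char)p[2]))
            return (long)(p - script);
        p += 2; /* skip this "</" and keep searching */
    }
    return -1;
}
```

Called on `str = '</p>';` it reports the position of the `</p>`, while the escaped form `str = '<\/p>';` contains no such sequence.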
I'm not sure what you're saying here. Are you saying that the behavior should change if it's HTML 4 vs. HTML 5?
@petdance in essence, yes... And internally in the library we have many cases of this behavior changing, depending whether tidy detects a legacy html4 or earlier See But there are other places where Unfortunately, this html4 vs. html5 You may remember back in the history there was a discussion if we should have two separate libraries, But still some obvious work to do on this... thanks...
@petdance started looking at this, and noted we already have an option, The code is in lexer.c, in the service
/*\ if javascript insert backslash before /
* Issue #348 - Add option, escape-scripts, to skip
\*/
if ((TY_(IsJavaScript)(container)) && cfgBool(doc, TidyEscapeScripts))
{
/* Issue #281 - only warn if adding the escape! */
TY_(Report)(doc, NULL, NULL, BAD_CDATA_CONTENT);
Thus extending that code a little, with -
diff --git a/src/lexer.c b/src/lexer.c
index 3d6a489..ca66aee 100644
--- a/src/lexer.c
+++ b/src/lexer.c
@@ -2384,7 +2384,8 @@ static Node *GetCDATA( TidyDocImpl* doc, Node *container )
/*\ if javascript insert backslash before /
* Issue #348 - Add option, escape-scripts, to skip
\*/
- if ((TY_(IsJavaScript)(container)) && cfgBool(doc, TidyEscapeScripts))
+ if ((TY_(IsJavaScript)(container)) && cfgBool(doc, TidyEscapeScripts) &&
+ !TY_(IsHTML5Mode)(doc) ) /* Is #700 - This only applies to legacy html4 mode */
{
/* Issue #281 - only warn if adding the escape! */
TY_(Report)(doc, NULL, NULL, BAD_CDATA_CONTENT);
Still testing this, but looks good... Will get to pushing it to an Meantime, appreciate it if others could apply the patch, build, test and report... thanks
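To make the effect of the extra condition easier to see outside the lexer, here is a minimal standalone sketch. It is not tidy's code: `ScriptContext`, `should_escape` and `copy_script` are hypothetical names, the three flags merely stand in for the three tests in the patched `if`, and the sketch escapes every `</` it finds, which is a simplification of what `GetCDATA` really does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three tests in the patched condition. */
typedef struct {
    bool is_javascript;  /* models TY_(IsJavaScript)(container)      */
    bool escape_scripts; /* models cfgBool(doc, TidyEscapeScripts)   */
    bool html5_mode;     /* models TY_(IsHTML5Mode)(doc)             */
} ScriptContext;

/* Mirrors the patched test: escape only in legacy (non-HTML5) mode. */
bool should_escape(const ScriptContext *ctx)
{
    return ctx->is_javascript && ctx->escape_scripts && !ctx->html5_mode;
}

/* Copies src to dst, writing "<\/" in place of "</" when escaping
 * applies. dst must have room for 2 * strlen(src) + 1 bytes.
 * Returns the number of backslashes inserted, i.e. how many times a
 * warning would be appropriate (assumption: one per escape). */
size_t copy_script(const ScriptContext *ctx, const char *src, char *dst)
{
    size_t added = 0;
    bool escape = should_escape(ctx);

    while (*src) {
        if (escape && src[0] == '<' && src[1] == '/') {
            *dst++ = '<';
            *dst++ = '\\';
            *dst++ = '/';
            src += 2;
            ++added;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return added;
}
```

With `html5_mode` false the string `str = '</p>';` comes out as `str = '<\/p>';` and one escape is counted; with `html5_mode` true the text is copied unchanged and nothing is counted, which is the behaviour the patch aims for.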
@petdance have pushed this fix to an Have also now run our regression tests, and find a difference in one(1) test, case 443576... It is a test where the script element contains But the input does not have a The W3C validator chooses to default to Adding a So I see no problem adjusting our test 443576, to either -
Given the legacy nature of our tests I think adding a doctype is best... Look forward to further feedback on this and testing of the
This seems related to #475.
@petdance it certainly is... in a way the same, but there the use of Here this patch completely drops the warning and the escaping if still in html5 mode, where it is not needed... but keeps it if in legacy html4 mode, as indicated by the Have created PR #703, to merge this As always, look forward to further feedback on this and testing of the
I don't know what I would be doing to test the branch's regression tests. Point me in the right direction and I'll see what I can do.
@petdance the In re-reading the README.md there I am not sure we have kept the But it should be true if you set both to the default Then as the README.md states, there is a directory Of course the This is another reason why by default the console app And in this The tests are in sort of two phases. The first is to run tidy on a test case input, with a config in some cases, and write an output html and error file, and the tidy exit is checked, but this is only the first part of the test. Then the There are also several sets of tests, but only the There are also But we do encourage devs to run the
@petdance have now merged PR #703, and in the Have re-run the Accordingly closing this, but if I missed something, please feel free to re-open, or open a new issue... thanks...
I believe that the <script> tag should be telling tidy not to check its contents, but tidy complains about the </p> in a string.