Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy gets confused with a <span> around a block element #709

Closed
petdance opened this issue Apr 3, 2018 · 10 comments
Closed

Tidy gets confused with a <span> around a block element #709

petdance opened this issue Apr 3, 2018 · 10 comments
Labels

Comments

@petdance
Copy link
Contributor

petdance commented Apr 3, 2018

Consider the following HTML file with a <span> wrapped around an <h2>.

$ cat -n bogus.html
     1  <!DOCTYPE html>
     2  <html>
     3      <head>
     4          <title>Blah</title>
     5      </head>
     6      <body>
     7          <span id="bongo">
     8              <h2>heading</h2>
     9          </span>
    10      </body>
    11  </html>

Running tidy on this file gives an incorrect error <span> anchor "bongo" already defined.

$ tidy -v
HTML Tidy for Linux version 5.6.0
$ tidy -q -e bogus.html
line 7 column 9 - Warning: missing </span> before <h2>
line 8 column 17 - Warning: inserting implicit <span>
line 9 column 9 - Warning: discarding unexpected </span>
line 8 column 17 - Warning: <span> anchor "bongo" already defined

The anchor "bongo" has not been defined anywhere else in the file. Note also that the error for "already defined" is on line 8, which also doesn't make sense.

@petdance petdance added the Bug label Apr 3, 2018
@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 4, 2018

@petdance this is the result of tidy trying to fix the bad document...

It correctly closes the span before the <h1>, and then, since some time before the last 2009 csv source release, propagates it, together with the id="bongo", to inside the <h1> ... </h1>... and advises inserting implicit <span>...

See the tidy output...

<body>
  <span id="bongo"></span>
  <h2><span id="bongo">heading</span></h2>
</body>

And the W3C validator also reports this error in your sample -

Error: Element h2 not allowed as child of element span in this context. (Suppressing further errors from this subtree.)
From line 8, column 11; to line 8, column 14
          <h2>headin
Content model for element span:
Phrasing content.

I guess if anything is to be done to improve this, the warning message in this case could be expanded to implicit <span> anchor "bongo" already defined, indicated that tidy added this, or something...

Given the nearly 10+ year history of this particular tidy action, this is verging on Won't fix, except for perhaps the slightly improved message text...

What else can tidy do? Ideas, comments, patches, PR very welcome...

Look forward to further feedback on this... thanks...

@petdance
Copy link
Contributor Author

petdance commented Apr 4, 2018

I guess if anything is to be done to improve this, the warning message in this case could be expanded to implicit anchor "bongo" already defined, indicated that tidy added this, or something...

Or have tidy not report on actions that it has taken itself?

It makes sense to fix the bad tag, but it's confusing for tidy to report on something that is not in the original source HTML.

@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 4, 2018

@petdance thanks for the further feedback...

Or have tidy not report on actions that it has taken itself?

Well I now think tidy has an obligation to report on all changes it makes, as a sort of ever helpful, educational tool, so the user can avoid these in their own html creation....

In fact I think there are already other issues where tidy makes silent fixes, that others see should be reported, or at least reported much more clearly...

See Issue 21 for the first mention I can find on silent fixes. And again Issue 88; and #132, #412, and you can see my opinion on these silent fixes has fluctuated... probably several other issues as well...

I agree that is the case here. Tidy has applied a fix, and then has to warn about that fix duplicating an id, but can not see any other choice yet, aside from adding the implicit to the message, clearly indicating that it is now in html that tidy had to add, as part of the fix, and not in the original user html...

Will certainly consider a patch or PR making this additional statement, or some other way to make the message emitted clearer... but do not see it should be suppressed... thanks...

@petdance
Copy link
Contributor Author

petdance commented Apr 4, 2018

Well I now think tidy has an obligation to report on all changes it makes,

Yes, I agree. It should report all the changes it makes. I just don't think it should report warnings on the changes it makes.

@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 5, 2018

@petdance I am afraid I do not understand! You agree report all, then not report a warning???

It seems you want it reported, but not as a warning. Is that it?

If you read back into some of the comments on the silent changes you will see there was a suggestion at one time that perhaps there should be some other category of messages for these. This was discussed but eventually agreed a warning is a warning...

The html is faulty, be that what the user wrote, or as generated by tidy as part of a fix, and needs to be further fixed, which tidy can not do...

I have agreed the warning message should be improved, and have experimented. Using my in_709.html sample the present output is -

line 7 column 7 - Warning: missing </span> before <h2>
line 8 column 14 - Warning: inserting implicit <span>
line 9 column 7 - Warning: discarding unexpected </span>
line 8 column 14 - Warning: <span> anchor "bongo" already defined

In an issue-709 branch have experimented with -

line 7 column 7 - Warning: missing </span> before <h2>
line 8 column 14 - Warning: inserting implicit <span>
line 9 column 7 - Warning: discarding unexpected </span>
line 8 column 14 - Warning: Implicit <span> anchor "bongo" duplicated.

That much more accurately describes the problem. In creating an implicit span tidy was forced to duplicate the anchor... an implied sorry about that but had no other choice... and maybe you, or others, have further suggestions on improving the message...

And if you still think this should not be a warning, then the same logic could be applied to the second message - warning tidy inserted a <span> - not in the original user html to try to fix the document... and so on to any and probably every implicit action of tidy, including for instance Warning: inserting missing 'title' element...

I think these have to stay as warnings, until a better idea is proposed, fully spec'ed, agreed, coded, and tested...

Still to fully test and push my issue-709 branch, but this is the patch to date -

diff --git a/include/tidyenum.h b/include/tidyenum.h
index 949d464..97131dc 100644
--- a/include/tidyenum.h
+++ b/include/tidyenum.h
@@ -171,6 +171,7 @@ extern "C" {
 #define FOREACH_REPORT_MSG(FN)        \
     FN(ADDED_MISSING_CHARSET)         \
     FN(ANCHOR_NOT_UNIQUE)             \
+    FN(ANCHOR_DUPLICATED)             \
     FN(APOS_UNDEFINED)                \
     FN(ATTR_VALUE_NOT_LCASE)          \
     FN(ATTRIBUTE_IS_NOT_ALLOWED)      \
diff --git a/src/attrs.c b/src/attrs.c
index 2aaf638..9d634ed 100644
--- a/src/attrs.c
+++ b/src/attrs.c
@@ -1659,7 +1659,10 @@ void CheckName( TidyDocImpl* doc, Node *node, AttVal *attval)
 
         if ((old = GetNodeByAnchor(doc, attval->value)) &&  old != node)
         {
-            TY_(ReportAttrError)( doc, node, attval, ANCHOR_NOT_UNIQUE);
+            if (node->implicit) /* Is #709 - improve warning text */
+                TY_(ReportAttrError)(doc, node, attval, ANCHOR_DUPLICATED);
+            else
+                TY_(ReportAttrError)( doc, node, attval, ANCHOR_NOT_UNIQUE);
         }
         else
             AddAnchor( doc, attval->value, node );
@@ -1687,7 +1690,10 @@ void CheckId( TidyDocImpl* doc, Node *node, AttVal *attval )
 
     if ((old = GetNodeByAnchor(doc, attval->value)) &&  old != node)
     {
-        TY_(ReportAttrError)( doc, node, attval, ANCHOR_NOT_UNIQUE);
+        if (node->implicit) /* Is #709 - improve warning text */
+            TY_(ReportAttrError)(doc, node, attval, ANCHOR_DUPLICATED);
+        else
+            TY_(ReportAttrError)( doc, node, attval, ANCHOR_NOT_UNIQUE);
     }
     else
         AddAnchor( doc, attval->value, node );
diff --git a/src/language_en.h b/src/language_en.h
index c84a7d8..e09b62b 100644
--- a/src/language_en.h
+++ b/src/language_en.h
@@ -1924,6 +1924,7 @@ static languageDefinition language_en = { whichPluralForm_en, {
      ********************************************/    
     { ADDED_MISSING_CHARSET,        0,   "Added appropriate missing <meta charset=...> to %s"                      },
     { ANCHOR_NOT_UNIQUE,            0,   "%s anchor \"%s\" already defined"                                        },
+    { ANCHOR_DUPLICATED,            0,   "Implicit %s anchor \"%s\" duplicated."                                   },
     { APOS_UNDEFINED,               0,   "named entity &apos; only defined in XML/XHTML"                           },
     { ATTR_VALUE_NOT_LCASE,         0,   "%s attribute value \"%s\" must be lower case for XHTML"                  },
     { ATTRIBUTE_IS_NOT_ALLOWED,     0,   "%s attribute \"is\" not allowed for autonomous custom tags."             },
diff --git a/src/message.c b/src/message.c
index eb097de..37aacb2 100644
--- a/src/message.c
+++ b/src/message.c
@@ -261,6 +261,7 @@ static struct _dispatchTable {
 } dispatchTable[] = {
     { ADDED_MISSING_CHARSET,        TidyInfo,        formatStandard          },
     { ANCHOR_NOT_UNIQUE,            TidyWarning,     formatAttributeReport   },
+    { ANCHOR_DUPLICATED,            TidyWarning,     formatAttributeReport   },
     { APOS_UNDEFINED,               TidyWarning,     formatStandard          },
     { ATTR_VALUE_NOT_LCASE,         TidyWarning,     formatAttributeReport   },
     { ATTRIBUTE_VALUE_REPLACED,     TidyInfo,        formatAttributeReport   },
@@ -583,6 +584,7 @@ TidyMessageImpl *formatAttributeReport(TidyDocImpl* doc, Node *element, Node *no
             return TY_(tidyMessageCreateWithNode)(doc, node, code, level, tagdesc, name, HTMLVersion(doc));
 
         case ANCHOR_NOT_UNIQUE:
+        case ANCHOR_DUPLICATED:
         case ATTR_VALUE_NOT_LCASE:
         case PROPRIETARY_ATTR_VALUE:
         case XML_ID_SYNTAX:

Anyway, what do you, and others, think of the amended message? Is it worth the effort? Any other suggestions? Thanks...

@petdance
Copy link
Contributor Author

petdance commented Apr 5, 2018

line 8 column 14 - Warning: Implicit anchor "bongo" duplicated.

Sounds good. Even better, something like

line 8 column 14 - Warning: Implicit anchor "bongo" duplicated (added by tidy)

Just so the user knows why the error appears. The problem is that tidy complains about something that's not in the original source, so the user is confused. At least I was.

Also, note that I'm talking about this strictly from the point of error detection. I have never used tidy's cleaned-up code. I just want to know about the errors so I can fix them.

geoffmcl added a commit that referenced this issue Apr 6, 2018
@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 6, 2018

@petdance thanks for the idea... always for more clarity...

How about -

line 8 column 14 - Warning: Implicit <span> anchor "bongo" duplicated by Tidy.

The Implicit <span> already indicates added by tidy, and now it is clear the anchor was duplicated by Tidy at the same time, since it has no way to choose anything else... needs human intervention...

OT: Interestingly, like you, when I used to work on my site, do not do that so much these days, my MO was to usually manually fix warnings/errors tidy showed, and then accept its clean Pretty Printer indented output...

Have now pushed this to the issue-709 branch, and will get around to setting up a PR... are we good to go? thanks...

@petdance
Copy link
Contributor Author

petdance commented Apr 6, 2018

line 8 column 14 - Warning: Implicit <span> anchor "bongo" duplicated by Tidy.

Sure. That makes it clear that it wasn't in my original source code. I might even do it as

line 8 column 14 - Warning: Implicit <span> anchor "bongo" (duplicated by Tidy)

to make it clear that it's a normal error message, but that Tidy was the cause.

@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 6, 2018

@petdance oops, you lost me... how can the addition of bracket improve it?

Not sure how that makes it any more a "normal error message", whatever that is, and it already indicates it was by tidy... but maybe just a question of different styles... hope you can accept mine in this case...

And could you please use triple back-ticks, ```, to quote a block that has html. Seems the single block-quote, >, will lose the display of the <span>, quite important to this message...

Have now created PR #714 to eventually merge this to next... thanks for the collaboration on this...

geoffmcl added a commit that referenced this issue Apr 23, 2018
Is #709 - Improve message if 'implicit' - PR #714
@geoffmcl
Copy link
Contributor

@petdance have merged PR #714, so guess this can be closed...

If I missed something, please feel free to re-open, or file a new issue... thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants