Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Too many title elements in <title>" should say "Too many title elements in <head>" #692

Closed
petdance opened this issue Mar 13, 2018 · 7 comments
Labels
Milestone

Comments

@petdance
Copy link
Contributor

petdance commented Mar 13, 2018

There are no title elements in <title>. They're in .

$ cat titles.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <HEAD>
        <TITLE>Test stuff</TITLE>
        <TITLE>As if one title isn't enough</TITLE>
    </HEAD>
    <BODY>
        <P>This is my paragraph</P>
    </BODY>
</HTML>
$ tidy -q -e titles.html
line 2 column 5 - Warning: too many title elements in <title>
@petdance petdance added the Bug label Mar 13, 2018
@petdance
Copy link
Contributor Author

petdance commented Mar 13, 2018

In src/message.c, at line 852, this seems to be the problem:

        case COERCE_TO_ENDTAG:
        case NON_MATCHING_ENDTAG:
        case TOO_MANY_ELEMENTS_IN:
            return TY_(tidyMessageCreateWithNode)(doc, rpt, code, level, node->element, node->element );

I would think that the second node->element should be something else. But I don't know enough about tidy internals to say for sure.

@geoffmcl
Copy link
Contributor

@petdance thank you for the issue... good to have you back...

Yes, that is indeed a message bug.

The following patch fixes it, but still testing, checking if there is an alternative, and if what I have done will not lead to other problems, like is the node and element always valid for this message... it seems so, but...

diff --git a/src/message.c b/src/message.c
index eb097de..0c1ff4c 100644
--- a/src/message.c
+++ b/src/message.c
@@ -851,8 +851,10 @@ TidyMessageImpl *formatStandard(TidyDocImpl* doc, Node *element, Node *node, uin
 
         case COERCE_TO_ENDTAG:
         case NON_MATCHING_ENDTAG:
-        case TOO_MANY_ELEMENTS_IN:
             return TY_(tidyMessageCreateWithNode)(doc, rpt, code, level, node->element, node->element );
+        case TOO_MANY_ELEMENTS_IN:
+            return TY_(tidyMessageCreateWithNode)(doc, rpt, code, level, node->element, element->element);
+
     }
 
     return NULL;

Now the warning is correct -

line 2 column 5 - Warning: too many title elements in <head>

But tidy does nothing about fixing the problem, like dropping any 2nd or subsequent <title> and thus the output xhtml, will not pass the W3C validator...

Or maybe tidy should only keep the last, which is probably what a browser would show...

I think it should do a fix... it does for many other warnings...

tidy also has a changed doctype to strict, not sure why! But that does not effect the validator results...

And without the added option --add-meta-charset yes, will get a validator note about no meta charset declaration in the document...

So there seems more to do here... perhaps as separate issues...

Further feedback, testing, welcome, or an alternate patch or PR... thanks...

@geoffmcl geoffmcl added this to the 5.7 milestone Mar 14, 2018
geoffmcl added a commit that referenced this issue Mar 17, 2018
@geoffmcl
Copy link
Contributor

@petdance Pushed the above simple message fix to branch issue-692 for consideration...

But seek further feedback on whether this should also include deleting such multiple <title> elements, and as suggested maybe trying to keep the last, if possible, but at least all after the first... to ensure tidy outputs valid html... thanks...

@geoffmcl
Copy link
Contributor

@petdance Have done further testing on this, and in the case of multiple <title> elements it seems in the 4 browsers I tested, I was surprised to find that they only showed the first. Others seem to just be dropped...

That makes deleting subsequent <title> elements relatively easy in tidy...

So will work on extending the issue-692 branch to do this...

As always further feedback very welcome... thanks...

geoffmcl added a commit that referenced this issue Mar 27, 2018
@geoffmcl
Copy link
Contributor

@petdance have now pushed a further fix to the issue-692 branch to delete any subsequent <title> tags...

In this case it was not so easy. It was not possible to delete the additional <title> node in the parser.c module where the first message comes from. So I created a new service TY_(CleanHead)(doc) in clean.c, called during the post parsing, in tidyDocCleanAndRepair phase.

At this moment now we get two warnings, the original, during the parsing phase, and the second during the post clean-up phase, and on your sample get -

line 2 column 5 - Warning: too many title elements in <head>
line 2 column 5 - Info: <head> previously mentioned
line 4 column 9 - Warning: discarding unexpected <title>

Maybe we could delete the warning during the parsing, and only leave the clean-up phase. Or leave them both? What do you or others think? Thanks...

And this seems to show up another possible item, and that is the line numbers.

In the actual document the second <title> is actually on line 4 column 9, as the second warning correctly states. The first warning shows the location of the <head> tag, which is technically correct in that the message mentions the <head>, but now seems less informative.

Maybe this is an additional reason to to drop the first warning...

Also, I used a message that already existed. Maybe we need to have a more specific message like say discard duplicate <title> tag, or something, but that could be a separate issue...

Look forward to further feedback and testing on this issue-692 branch... thanks...

@petdance
Copy link
Contributor Author

Is there something specific you want me to look at? My time right now is being spent on the HTML::Tidy5 Perl wrapper around it.

balthisar added a commit that referenced this issue Jun 30, 2021
@balthisar
Copy link
Member

Closing due to merging of the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants