Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should tidy allow an empty title element? #839

Closed
lifecrisis opened this issue Sep 1, 2019 · 9 comments
Closed

Should tidy allow an empty title element? #839

lifecrisis opened this issue Sep 1, 2019 · 9 comments

Comments

@lifecrisis
Copy link

Tidy will accept the following minimal document:

<!DOCTYPE html><title></title>

The validator at https://html5.validator.nu/ will accept the same document only if the <title> element is not empty:

<!DOCTYPE html><title>.</title>

Who is correct?

@balthisar
Copy link
Member

No idea where my comment went, but the reason I added "feature request" is because @lifecrisis is right. Tidy even does the wrong thing and adds empty title elements if one is not provided.

@geoffmcl
Copy link
Contributor

@lifecrisis thank you for the issue...

I guess one could ask is <title>.</title> any better than <title></title> ;=)) both bad, but that is not the point here...

Maybe need to also look at legacy documents... using https://validator.w3.org/... it seems to accept <title></title>...

So with experimenting, came up with my first solution... sort of

  • IFF HTML5 - if none, warn, but do not add a blank, AND if a blank, warn...
  • IFF NOT HTML5 - if none, warn, and add a blank, AND if an existing blank, no warning...
diff --git a/src/language_en.h b/src/language_en.h
index 60bde02..65e9ee3 100644
--- a/src/language_en.h
+++ b/src/language_en.h
@@ -2009,7 +2009,7 @@ static languageDefinition language_en = { whichPluralForm_en, {
     { MISSING_SEMICOLON_NCR,        0,   "numeric character reference \"%s\" doesn't end in ';'"                   },
     { MISSING_SEMICOLON,            0,   "entity \"%s\" doesn't end in ';'"                                        },
     { MISSING_STARTTAG,             0,   "missing <%s>"                                                            },
-    { MISSING_TITLE_ELEMENT,        0,   "inserting missing 'title' element"                                       },
+    { MISSING_TITLE_ELEMENT,        0,   "blank or missing 'title' element"                                        },
     { MOVED_STYLE_TO_HEAD,          0,   "moved <style> tag to <head>! fix-style-tags: no to avoid."               },
     { NESTED_EMPHASIS,              0,   "nested emphasis %s"                                                      },
     { NESTED_QUOTATION,             0,   "nested q elements, possible typo."                                       },
diff --git a/src/parser.c b/src/parser.c
index 8569ed5..2f90ebd 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -4713,7 +4713,8 @@ void TY_(ParseDocument)(TidyDocImpl* doc)
         TY_(ParseHTML)(doc, html, IgnoreWhitespace);
     }
 
-    if (!TY_(FindTITLE)(doc))
+    node = TY_(FindTITLE)(doc);
+    if (!node)
     {
         Node* head = TY_(FindHEAD)(doc);
         /* #72, avoid MISSING_TITLE_ELEMENT if show-body-only (but allow InsertNodeAtEnd to avoid new warning) */
@@ -4721,7 +4722,23 @@ void TY_(ParseDocument)(TidyDocImpl* doc)
         {
             TY_(Report)(doc, head, NULL, MISSING_TITLE_ELEMENT);
         }
-        TY_(InsertNodeAtEnd)(head, TY_(InferredTag)(doc, TidyTag_TITLE));
+
+        /* Is #839 - do not add blank title in HTML5 mode */
+        if (!TY_(IsHTML5Mode)(doc))
+        {
+            TY_(InsertNodeAtEnd)(head, TY_(InferredTag)(doc, TidyTag_TITLE));
+        }
+    } 
+    else if (!node->content)
+    {
+        /* Is #839 - warn node is blank in HTML5 */
+        if (TY_(IsHTML5Mode)(doc))
+        {
+            if (!showingBodyOnly(doc))
+            {
+                TY_(Report)(doc, node, NULL, MISSING_TITLE_ELEMENT);
+            }
+        }
     }
 
     AttributeChecks(doc, &doc->root);

@balthisar good to see you back... I guess the tidy founders did not think it a wrong thing to add a blank... back in 2000... but does seems a bit silly by html5 standards... hope my patch covers it all...

Will try to get around to setting up a PR so others can easily test... unless someone beats me to it, or have another, maybe better idea... and still to run the regression tests, to see if anything pops...

There may be an argument for setting up a separate, clearer, warning messages... for each case...

Look forward to further comments... thanks...

PS: Added 4 tests to https://github.com/geoffmcl/tidy-test/tree/master/test/input5 - showing pass or fail validator

  • in_839.html - html5 blank title - fail
  • in_839-1.html - html5 no title - fail
  • in_839-2.html - html4 blank title - PASS
  • in_839-3.html - html4 no title - fail

@balthisar
Copy link
Member

@geoffmcl, thanks. I'm not sure how back I am yet, and in any case, it will be a lot less involvement than pre-child.

@geoffmcl
Copy link
Contributor

@lifecrisis, @balthisar now done a regression tests, using the above patch... AND some 45 cases poped...

Still to fully review this temp-839.diff... in general they all seem ok...

Maybe some could be avoided by adding different messages for each case... this seems like a good idea anyway...

Maybe some by adjusting their config, if they have one...

The most common consequence is accepting fully the following output change for html5 documents -

 <!DOCTYPE html>
 <html>
-<head>
-  <title></title>
-</head>
 <body>
...

Need to complete this review, but would certainly appreciate any feedback, comments, etc... thanks...

@geoffmcl
Copy link
Contributor

@lifecrisis, @balthisar Experimented with using 3 different messages, keeping MISSING_TITLE_ELEMENT as before, and adding a 2 and 3 more specific messages... applied as follows -

diff --git a/include/tidyenum.h b/include/tidyenum.h
index 3daee5b..0ce0c9c 100644
--- a/include/tidyenum.h
+++ b/include/tidyenum.h
@@ -235,6 +235,8 @@ extern "C" {
     FN(MISSING_SEMICOLON)             \
     FN(MISSING_STARTTAG)              \
     FN(MISSING_TITLE_ELEMENT)         \
+    FN(MISSING_TITLE_ELEMENT2)        \
+    FN(MISSING_TITLE_ELEMENT3)        \
     FN(MOVED_STYLE_TO_HEAD)           \
     FN(NESTED_EMPHASIS)               \
     FN(NESTED_QUOTATION)              \
diff --git a/src/language_en.h b/src/language_en.h
index 65e9ee3..e16ba0d 100644
--- a/src/language_en.h
+++ b/src/language_en.h
@@ -2009,7 +2009,9 @@ static languageDefinition language_en = { whichPluralForm_en, {
     { MISSING_SEMICOLON_NCR,        0,   "numeric character reference \"%s\" doesn't end in ';'"                   },
     { MISSING_SEMICOLON,            0,   "entity \"%s\" doesn't end in ';'"                                        },
     { MISSING_STARTTAG,             0,   "missing <%s>"                                                            },
-    { MISSING_TITLE_ELEMENT,        0,   "blank or missing 'title' element"                                        },
+    { MISSING_TITLE_ELEMENT,        0,   "inserting missing 'title' element"                                       },
+    { MISSING_TITLE_ELEMENT2,       0,   "missing 'title' element"                                                 },
+    { MISSING_TITLE_ELEMENT3,       0,   "blank 'title' element"                                                   },
     { MOVED_STYLE_TO_HEAD,          0,   "moved <style> tag to <head>! fix-style-tags: no to avoid."               },
     { NESTED_EMPHASIS,              0,   "nested emphasis %s"                                                      },
     { NESTED_QUOTATION,             0,   "nested q elements, possible typo."                                       },
diff --git a/src/parser.c b/src/parser.c
index 2f90ebd..23a8a49 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -4720,7 +4720,10 @@ void TY_(ParseDocument)(TidyDocImpl* doc)
         /* #72, avoid MISSING_TITLE_ELEMENT if show-body-only (but allow InsertNodeAtEnd to avoid new warning) */
         if (!showingBodyOnly(doc))
         {
-            TY_(Report)(doc, head, NULL, MISSING_TITLE_ELEMENT);
+            if (TY_(IsHTML5Mode)(doc))
+                TY_(Report)(doc, head, NULL, MISSING_TITLE_ELEMENT2);
+            else
+                TY_(Report)(doc, head, NULL, MISSING_TITLE_ELEMENT);
         }
 
         /* Is #839 - do not add blank title in HTML5 mode */
@@ -4736,7 +4739,7 @@ void TY_(ParseDocument)(TidyDocImpl* doc)
         {
             if (!showingBodyOnly(doc))
             {
-                TY_(Report)(doc, node, NULL, MISSING_TITLE_ELEMENT);
+                TY_(Report)(doc, node, NULL, MISSING_TITLE_ELEMENT3);
             }
         }
     }

The test differences are in temp-839-2.diff. It is reduced to some 36, vs the previous 45, cases... so that seems good... positive?

Of course, for each case number, there are usually 2 diffs - one for the html output, and one for the message text output... but 3 are the text only...

Could certainly do with some feedback, comments, etc... especially about, if html5 and now no title added, the dropping of the whole, now empty, head element... are we replacing one broken document, with another? ... thanks...

Still pondering all this... will try to get around to pushing my local issue-839 branch... if anyone is interested...

@geoffmcl
Copy link
Contributor

Oops, missed adding the 2 new messages to message.c -

diff --git a/src/message.c b/src/message.c
index ee2e6c6..dde5cd1 100644
--- a/src/message.c
+++ b/src/message.c
@@ -324,6 +324,8 @@ static struct _dispatchTable {
     { MISSING_SEMICOLON,            TidyWarning,     formatStandard          },
     { MISSING_STARTTAG,             TidyWarning,     formatStandard          },
     { MISSING_TITLE_ELEMENT,        TidyWarning,     formatStandard          },
+    { MISSING_TITLE_ELEMENT2,       TidyWarning,     formatStandard          },
+    { MISSING_TITLE_ELEMENT3,       TidyWarning,     formatStandard          },
     { MOVED_STYLE_TO_HEAD,          TidyWarning,     formatStandard          },
     { NESTED_EMPHASIS,              TidyWarning,     formatStandard          },
     { NESTED_QUOTATION,             TidyWarning,     formatStandard          },
@@ -816,6 +818,8 @@ TidyMessageImpl *formatStandard(TidyDocImpl* doc, Node *element, Node *node, uin
         case MALFORMED_DOCTYPE:
         case MISSING_DOCTYPE:
         case MISSING_TITLE_ELEMENT:
+        case MISSING_TITLE_ELEMENT2:
+        case MISSING_TITLE_ELEMENT3:
         case NESTED_QUOTATION:
         case SUSPECTED_MISSING_QUOTE:
         case XML_DECLARATION_DETECTED:

And re-running the tests, now get temp-839-2.diff, with some 41 cases in the diff... as before, no diff appears to be an actual regression!

But am concerned about the repeated additional trimming empty <head>, and the resultant html output change like -

**expected test result**
<!DOCTYPE html>
<html>
<head>
  <title></title>
</head>
<body>
</body>
</html>

**new test result**
<!DOCTYPE html>
<html>
<body>
</body>
</html>

That seems to be replacing, an agreed, bad document, with what could be called a worse document!

Here I am talking about just from the user's perspective, since to a browser, they are probably the same... and nu flags both, but a simpler, cleaner error message for the first..

I suppose I am suggesting that we retain the insertion of a blank, but for html5 still add the new blank check...

Still experimenting, testing... look forward to feedback, comments, etc ... thanks...

geoffmcl added a commit that referenced this issue Oct 11, 2020
This is a compromise -

   1. Keep insertion of a blank title tag, if none.
   2. Add new warn if title tag is blank, in html5

	modified:   include/tidyenum.h
	modified:   src/language_en.h
	modified:   src/message.c
	modified:   src/parser.c
@geoffmcl
Copy link
Contributor

@lifecrisis well here we are a year later... 2020/10/11... not a lot of feedback, on this exiting issue ;=))

I took another stab at this... and now solidly come down on generating bad document, is better than creating a worse document... and want to leave the insertion of a blank, as is... the idea being, the user should edit the document, adding a suitable title...

But do agree tidy should warn if it finds a html5 doc with a blank title, and created a new message for this... so the full patch is -

diff --git a/include/tidyenum.h b/include/tidyenum.h
index 3daee5b..76ae07d 100644
--- a/include/tidyenum.h
+++ b/include/tidyenum.h
@@ -282,7 +282,8 @@ extern "C" {
     FN(VENDOR_SPECIFIC_CHARS)         \
     FN(WHITE_IN_URI)                  \
     FN(XML_DECLARATION_DETECTED)      \
-    FN(XML_ID_SYNTAX)
+    FN(XML_ID_SYNTAX)                 \
+    FN(BLANK_TITLE_ELEMENT)
 
 
 /** These are report messages added by Tidy's accessibility module. 
diff --git a/src/language_en.h b/src/language_en.h
index 8d0eb7a..f93324e 100644
--- a/src/language_en.h
+++ b/src/language_en.h
@@ -2055,6 +2055,7 @@ static languageDefinition language_en = { whichPluralForm_en, {
     { WHITE_IN_URI,                 0,   "%s discarding whitespace in URI reference"                               },
     { XML_DECLARATION_DETECTED,     0,   "An XML declaration was detected. Did you mean to use input-xml?"         },
     { XML_ID_SYNTAX,                0,   "%s ID \"%s\" uses XML ID syntax"                                         },
+    { BLANK_TITLE_ELEMENT,          0,   "blank 'title' element"                                                   },
 
 
     /***************************************
diff --git a/src/message.c b/src/message.c
index ee2e6c6..de0124b 100644
--- a/src/message.c
+++ b/src/message.c
@@ -372,6 +372,7 @@ static struct _dispatchTable {
     { WHITE_IN_URI,                 TidyWarning,     formatAttributeReport   },
     { XML_DECLARATION_DETECTED,     TidyWarning,     formatStandard          },
     { XML_ID_SYNTAX,                TidyWarning,     formatAttributeReport   },
+    { BLANK_TITLE_ELEMENT,          TidyWarning,     formatStandard          },
 
     { APPLET_MISSING_ALT,                            TidyAccess, formatAccessReport },
     { AREA_MISSING_ALT,                              TidyAccess, formatAccessReport },
@@ -819,6 +820,7 @@ TidyMessageImpl *formatStandard(TidyDocImpl* doc, Node *element, Node *node, uin
         case NESTED_QUOTATION:
         case SUSPECTED_MISSING_QUOTE:
         case XML_DECLARATION_DETECTED:
+        case BLANK_TITLE_ELEMENT:
             return TY_(tidyMessageCreateWithNode)(doc, rpt, code, level );
 
         case ELEMENT_NOT_EMPTY:
diff --git a/src/parser.c b/src/parser.c
index 8569ed5..cda3200 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -4713,7 +4713,8 @@ void TY_(ParseDocument)(TidyDocImpl* doc)
         TY_(ParseHTML)(doc, html, IgnoreWhitespace);
     }
 
-    if (!TY_(FindTITLE)(doc))
+    node = TY_(FindTITLE)(doc);
+    if (!node)
     {
         Node* head = TY_(FindHEAD)(doc);
         /* #72, avoid MISSING_TITLE_ELEMENT if show-body-only (but allow InsertNodeAtEnd to avoid new warning) */
@@ -4723,6 +4724,14 @@ void TY_(ParseDocument)(TidyDocImpl* doc)
         }
         TY_(InsertNodeAtEnd)(head, TY_(InferredTag)(doc, TidyTag_TITLE));
     }
+    else if (!node->content && !showingBodyOnly(doc))
+    {
+        /* Is #839 - warn node is blank in HTML5 */
+        if (TY_(IsHTML5Mode)(doc))
+        {
+            TY_(Report)(doc, node, NULL, BLANK_TITLE_ELEMENT);
+        }
+    }
 
     AttributeChecks(doc, &doc->root);
     ReplaceObsoleteElements(doc, &doc->root);

Putting this in an issue-839-2 branch, generated a tidy-5.7.35.I839-2.exe, and ran the regression tests... 5 cases pop - 1062511 1674502 1773932 427675 427676 - changing the full diff to temp-839-2.diff

In each case it is the addition of this new warning, and a change in the warning count... all acceptable changes...

+line 2 column 1 - Warning: blank 'title' element
 ...
 Info: Document content looks like HTML5
-Tidy found 11 warnings and 0 errors!
+Tidy found 12 warnings and 0 errors!

Will get around to pushing this branch, if all agreed to this compromise... done issue-839-2 - and need to add a regression repo patch, to update the 5 cases...

Look forward to further feedback, comments, patches, PR, etc ... so we can progress on this issue... thanks...

@geoffmcl
Copy link
Contributor

PR #930 created, with strong note about also fixing the regression tests...

Look for to further feedback, testing, ideas, comments... thanks...

balthisar added a commit that referenced this issue Jun 30, 2021
balthisar added a commit that referenced this issue Jun 30, 2021
Is. #839 -  new message for 'blank' title
@balthisar
Copy link
Member

Closed due to merging #930.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants