HTML5 validation warning on "style" #5146

Closed
crystalfp opened this issue Dec 13, 2018 · 7 comments

@crystalfp

Validating a Pandoc-generated HTML5 page elicits a warning.

The source (bug.md):

---
pagetitle: Support
author: Mario Valle
lang: en
...

# Support
Support for bla bla.

The build command (bug.sh):

pandoc -s -c style.css -o bug.html -f markdown+smart -t html bug.md

The validation generates a warning: "The type attribute for the style element is not needed and should be omitted." I am also not sure that defining the namespace and xml:lang on the html element is needed for HTML5:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

Is generating the template with pandoc -D html5, modifying it, and using it the only workaround?
Thanks for clarifying!
mario

pandoc.exe 2.4 (32-bit) on Windows 10 (64-bit)

@mb21
Collaborator

mb21 commented Dec 13, 2018

I think I agree. The type="text/css" is not recommended for HTML5 and apparently not needed for XHTML Polyglot either. @JohnLukeBentley?

The namespace and xml:lang stuff makes the file also a valid XHTML document, which is nice for some use-cases. See #3473

@JohnLukeBentley
Contributor

JohnLukeBentley commented Dec 13, 2018

@mb21 you've done well to catch the relevant history of the project here...

@crystalfp, as @mb21 points to, I managed to persuade @jgm to use a slightly stricter standard than HTML (5). That is, the use of polyglot markup.

Essentially that entails using HTML (5) with an XHTML-conforming syntax (a syntax that is XML valid) and a small number of additional restrictions. That results in markup that is consistent with the HTML (5) spec, which makes it easy to switch between serving a web page as "application/xhtml+xml" or "text/html" MIME type and have it render, in the browser, identically.

So on the assumption that polyglot markup continues to be our standard...

On the html element, default namespace attribute: this is required under polyglot

https://www.w3.org/TR/html-polyglot/#element-level-namespaces

"Polyglot markup declares the default namespaces on the root HTML element, html, ..."

That is also exemplified in the minimal example, https://www.w3.org/TR/html-polyglot/#minimal-polyglot-html-document

On the html element, language attributes: there are two relevant rules.

Firstly, if you specify the language you MUST use lang and xml:lang with identical values. http://www.w3.org/TR/html-polyglot/#language-attributes

Secondly, http://www.w3.org/TR/html-polyglot/#language-attributes

The root element SHOULD always specify the language.
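
As a rough illustration of those two rules, here is a minimal Python sketch that checks the root element of a page. The function name language_problems and the sample markup are made up for this example, and the comparison is an exact string match, following the "identical values" wording above; a case-insensitive comparison may be closer to what the specs intend.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_problems(markup):
    """Return a list of language-attribute problems found on the root element.

    Hypothetical helper: it only looks at the root element, and it assumes
    the markup is already well-formed XML (as polyglot markup should be).
    """
    root = ET.fromstring(markup)
    lang = root.get("lang")
    xml_lang = root.get(XML_LANG)
    problems = []
    if lang is None and xml_lang is None:
        problems.append("root element SHOULD specify the language")
    elif lang != xml_lang:
        # Exact match is an assumption taken from the wording in this thread.
        problems.append("lang and xml:lang MUST have identical values")
    return problems


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
    "<head><title>Support</title></head><body><h1>Support</h1></body></html>"
)
print(language_problems(sample))
```

For the html element pandoc generates (as quoted in the original report) the list should come back empty.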

And by a very extraordinary coincidence I was today looking at the type=text/css issue as, on an unrelated personal project, I came across the same validation warning quoted. I haven't finished a detailed look. But at a glance the HTML spec merely permits it to be dropped:

The default value for the type attribute, which is used if the attribute is absent, is "text/css". https://www.w3.org/TR/html5/document-metadata.html#the-style-element

... and, as @mb21 has already established, polyglot examples drop it from the style block.

So it seems we can be confident to say that, in a style block, type="text/css" is not required by HTML 5 nor polyglot and it COULD be dropped.

It is strange the validation warning is stronger than either standard suggests, in claiming it SHOULD be dropped.

However I think it would be good to take our cue from the validation message by dropping type="text/css" . It would result in cleaner markup.

Whether type="text/css" should be removed from the link element, however, would be a separate matter. I hope to do the relevant reading on that soon ™ and report back. Anyone else, of course, could beat me to the punch on that. It might be worth settling that issue too before closing this issue (or creating a pull request).

Edit: changed

Given all that I think dropping type="text/css" from the style block is a good thing to do. It would result in cleaner markup.

to

However I think it would be good to take our cue from the validation message by dropping type="text/css" . It would result in cleaner markup.

@crystalfp
Author

I was not aware of this polyglot thing. So it seems my best option is to edit the template file, reduce it to pure HTML5 and use it during conversion.
Just a side question. The Polyglot specification is no longer maintained (see beginning of the specification). Is keeping it inside pandoc really necessary?
Thanks!
mario

@jgm
Owner

jgm commented Dec 13, 2018 via email

@crystalfp
Author

OK, understand the rationale for polyglot. Thanks for fixing the style issue.

@jgm
Owner

jgm commented Dec 14, 2018

Seems the text/css isn't really needed in link either:
see https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link

@jgm jgm closed this as completed in 9fe6d91 Dec 14, 2018
@JohnLukeBentley
Contributor

JohnLukeBentley commented Dec 15, 2018

TL;DR:

After further research I can verify the decisions already made are likely to be the right things to do. The commit 'Remove unnecessary type="text/css" on style and link for HTML5' looks correct.

For completeness I'll detail the following points:

  • Indeed the HTML spec seems to show type=text/css as optional on a link element.
  • Therefore nothing there contradicts the initial conclusion that type=text/css should be removed from the style block.
  • @crystalfp, if you have a free choice in the matter, then using polyglot is a good thing to do;
  • @crystalfp, whether you use polyglot or not, having non-empty language tags is a good thing to do;
  • @crystalfp, even if you use polyglot you may very well want to (as you asked about) generate a default template copy, customize that copy, and use that custom template.

Link element and type=text/css

The HTML spec doesn't encourage or discourage the use of type="text/css" in the link element. However, the HTML spec makes it clear it is optional. It has an example as follows:

<link rel="stylesheet" href="A" type="text/plain">
<link rel="stylesheet" href="B" type="text/css">
<link rel="stylesheet" href="C">

https://www.w3.org/TR/html5/document-metadata.html#processing-link-type

The validator (https://validator.w3.org/nu/) doesn't provide any warning if type="text/css" is used on a link element.

However, for consistency with our prior conclusions on the style block I'd discourage its use on the link element when rel="stylesheet" is also used (not that you'd use it when rel="stylesheet" is not used).

I should propose that conclusion to the W3C.

I note that in pandoc type=text/css on the link element was absent anyway. Even before the commit.

Style element and type=text/css

As previously touched on, the HTML spec stipulates that type=text/css is defaulted to if omitted from a style element.

The HTML spec does not specifically discourage the use of type="text/css" in style blocks.

However the validator (https://validator.w3.org/nu/) warns against its use with "Warning: The type attribute for the style element is not needed and should be omitted".

Given there are no surprises from looking at the link element and type=text/css, the initial conclusion that we should take our cue from the validator, for the style element, seems verified.

I should also propose to the W3C that the spec be made consistent with the validation message.
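
For pages that have already been generated with the attribute in place, a post-processing step is one option. The following is only a sketch: drop_css_type is a made-up helper, it uses a regular expression rather than a real HTML parser, and it is not how the pandoc commit itself works (that change was made in the templates).

```python
import re

# Matches type="text/css" (single or double quoted) inside an opening
# <style> or <link> tag. Other type values, such as text/plain, are left alone.
_CSS_TYPE_ATTR = re.compile(
    r"""(<(?:style|link)\b[^>]*?)\s+type\s*=\s*(["'])text/css\2""",
    re.IGNORECASE,
)


def drop_css_type(html):
    """Return html with redundant type="text/css" attributes removed.

    Hypothetical helper for illustration; a regex will not cope with every
    HTML edge case (comments, unquoted attribute values, and so on).
    """
    return _CSS_TYPE_ATTR.sub(r"\1", html)


print(drop_css_type('<style type="text/css">code{white-space: pre;}</style>'))
print(drop_css_type('<link rel="stylesheet" href="style.css" type="text/css" />'))
```

The same effect is cleaner to get at the source by editing a custom template, which is what the rest of this comment describes.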

Polyglot is good to use

Yes, the Polyglot standard has not been maintained for some while now. However, part of the reason may well be that there is nothing further to do with respect to the standard. That is, the Polyglot standard seems to be complete and robust (although it could be presented more clearly).

Whatever the motives for not maintaining it @jgm nails it with

It's quite useful to ensure that the pandoc output is both valid XML and valid HTML5, which is what polyglot gives you. (For one thing, epub3 requires XHTML5.) And there's no cost to this ...

In that sense Polyglot produces pure HTML5 markup. The "XML syntax" is stipulated in the HTML spec as one of the two kinds of syntax to use: https://www.w3.org/TR/html/xhtml.html#xhtml

But even if you weren't using pandoc, and embarking on some general web project, polyglot is a good thing to use.

Choosing to use polyglot markup helps if one's predilection for the stricter XHTML (XML) syntax is met with resistance from the HTML (non-XML) Syntax crowd, the more popular choice, as one moves one's web pages into that social environment. One need only change one's file extension from .xhtml to .html and the pages are going to render as before.

You may well be a member of the HTML (non-XML) Syntax crowd.

But, if one had the freedom to choose, why would one choose to use XHTML (XML) syntax over HTML (non-xml) syntax?

  • When you serve XHTML as "application/xhtml+xml" a browser will first check that it is well-formed XML. If it is not, the browser will show an error message pointing to the problem. This helps ensure that many basic markup errors are caught early … before having to run the file through an external validator, like https://validator.w3.org/unicorn/ (although using an external validator is still good to do, after the basic errors have been caught).

  • XML tools are available to use both in the creation of, and the consumption of, the page. On the consumption side, for example, you could issue an HTTP request from your favoured programming language and parse the incoming data as XML.

    Even if you don't plan to use XML tools to create or consume web pages, keeping your pages XML valid leaves that option open. You never know when life might be made easier if your web pages are already XML valid. This is evidenced by the example given for pandoc by @jgm: "epub3 requires XHTML5". If I recall correctly this was nowhere part of the original motive for changing to polyglot. If that memory is correct then this has become a bonus made apparent down the track (would this be right John?).
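
To make both points concrete, here is a small Python sketch using only the standard library. The sample page and the helper names (is_well_formed, page_title) are invented for the example; a real consumer would fetch the markup over HTTP first.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# A small polyglot page: HTML5 content written in XML-conforming syntax.
polyglot = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><meta charset="utf-8" /><title>Support</title></head>
<body><h1>Support</h1><p>Support for bla bla.</p></body>
</html>"""

# The same page in HTML-only syntax: the void <meta> element is not closed.
html_only = polyglot.replace(" />", ">")


def is_well_formed(markup):
    """Roughly what a browser checks first for application/xhtml+xml."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def page_title(markup):
    """Consume the page with an ordinary XML tool: read the title text."""
    root = ET.fromstring(markup)
    return root.findtext("x:head/x:title", namespaces=XHTML_NS)


print(is_well_formed(polyglot))   # the polyglot page parses as XML
print(is_well_formed(html_only))  # the unclosed <meta> is rejected
print(page_title(polyglot))
```

The HTML-only variant is still fine when served as text/html; it just can't be handed to an XML parser, which is the option polyglot keeps open.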

It's probably still a religious war thing but I can't see why the W3C shouldn't deprecate the HTML (non-xml) Syntax, in favour of XHTML. None of the historical problems (e.g. lack of browser support for the "application/xhtml+xml" MIME type) seem to apply.

I should probably restart that religious war at the W3C.

Language tags are good

In the HTML spec there's no explicit exhortation to include a language tag (or language tags in the case of XHTML). That is, there's nothing like the phrase found above in the Polyglot spec. However, implicitly the HTML spec regards the inclusion of a language tag (or language tags in the case of XHTML) as something you should do:

It provides an explicit indication to user agents about the language of content in order to enable language specific behavior. For example, use of an appropriate language dictionary; selection of an appropriate font or glyphs for characters shared between different languages; or in the case of screen readers and similar assistive technologies with voice output, pronunciation of content using the correct voice / language library.

Incorrect or absent lang attributes can produce unexpected results in other circumstances, as they are also used to determine quotation marks for q elements, styling such as hyphenation, case conversion, line-breaking, and spell-checking in some editors, etc. [Emphasis added].

(W3C HTML5, 2017. HTML 5.2 W3C Recommendation, 14 December 2017), "The lang and xml:lang attributes", https://www.w3.org/TR/html/dom.html#the-lang-and-xmllang-attributes

Custom Pandoc templates

Even if that's persuasive, and you use Polyglot for your projects, you may very well want to use custom pandoc templates (as you ask about). For example, for reasons of personal preference, I remove the IE shiv and I wrap $body$ in (html5) <main> tags.

Of course if you want to use HTML (non-XML) syntax then you can use custom pandoc templates too.

@jgm has made this an easy thing to do and it looks like you may already be across it. But, as an example procedure, in Windows (edit: using Powershell) (my environment):

  • To create a custom template: output the default template for the output format you are interested in (be sure to force UTF-8 encoding) and save that output to a file named my-default.*FORMAT* (where "my-" is any arbitrary string that distinguishes it from default.*FORMAT*):

    pandoc -D html5 | Out-File .\my-pandoc-templates\my-default.html5 -Encoding utf8
    

    @jgm points to iconv for UTF-8 output. That may well be needed on other OSs (like Linux). http://pandoc.org/MANUAL.html#character-encoding

  • Customize that template as desired.

  • Use the custom template: pass it to pandoc with the --template switch

    pandoc TestBasic.md --from markdown --to html5 --output .\output\TestBasic.xhtml --standalone --template .\my-pandoc-templates\my-default.html5
    

    Edit 01: You could just as well use a different output extension in the above command, e.g. .\output\TestBasic.html

Edit 02: Added "(edit: using Powershell)" above.
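
If you'd rather script the procedure above than type it, something along these lines should work on any OS. It is a sketch under assumptions: pandoc must be on the PATH, the function names and file paths are invented for the example, and only the pandoc options already shown above (-D, --from, --to, --output, --standalone, --template) are used.

```python
import subprocess
from pathlib import Path


def export_default_template(fmt, destination):
    """Save pandoc's default template for fmt (e.g. "html5") to destination.

    The bytes are written exactly as pandoc emits them, so no shell
    re-encoding step is involved. Requires pandoc on the PATH.
    """
    result = subprocess.run(["pandoc", "-D", fmt], check=True, capture_output=True)
    target = Path(destination)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(result.stdout)


def render_command(source, output, template):
    """Build the pandoc command line that uses the custom template."""
    return [
        "pandoc", str(source),
        "--from", "markdown",
        "--to", "html5",
        "--output", str(output),
        "--standalone",
        "--template", str(template),
    ]


# Example usage (left commented out because it needs pandoc installed):
# export_default_template("html5", "my-pandoc-templates/my-default.html5")
# ...customize the template by hand, then:
# subprocess.run(render_command("TestBasic.md", "output/TestBasic.xhtml",
#                               "my-pandoc-templates/my-default.html5"), check=True)
```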
