New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue237 html5 polyglot: Make default.html5 polyglot markup conformant. #3473
Conversation
From https://github.com/jgm/pandoc-templates/pull/248
No worries. I suppose part of your motivation for the change it to be able to offer the continuous integration tests against the templates (as for the rest of pandoc). As you can see I've modified the some of the test files (writer.html5, lhs-test.html+lhs, lhs-test.html) in response to a previous run of your CI setup. All the tests now pass except for number 6, under "Allowed Failures". I'll presume "allowed failures" applies in this case until you tell me otherwise. Thinking out aloud, I wonder:
One question for you:
If "All checks have passed" is reliable in this situation you'd be welcome to now assess my (content) commits before I push the remaining content and concomitant test file commits. I'm off for the night. |
Congrats! Excellent PR so far (I'll wait to merge until you make the other changes -- or if you like I can merge now and you can do a new PR for the others). Your assumptions are correct, the lhs-test.html and lhs-test.html+lhs are run against html5 (since that is now the default 'html'). (These date from back when we only had 'html'.) The commit messages are excellent, I appreciate the links and detail. Please keep them hard-wrapped to 80 columns, though. (No big deal if it's hard to change that now, but going forward.) I'm okay with the lang attribute proposal, but I'm not sold on the title proposal. I'd rather just leave it empty (as with the lang attribute). If you think you might make more contributions, it's probably worth installing Haskell locally. I recommend installing stack, which will give you everything you need (assuming you have lots of disk space). Who knows? You might get into Haskell. The org writer was written by someone with no prior exposure, who just looked at the code for another writer. |
Thanks for all that! Yeah, keep the PR open and I'll push more commits. I'll push the lang commit (in a day's time) and get you to look at it. I know that by convention the first line of commit messages ought be less than 50 chars. (and you mention this in CONTRIBUTING.md) ... by my overstepping that rule I'm reminded of it. I wasn't aware of the 80 char hard wrap convention of the commit message overall. So thanks for mentioning it. Having been alerted to it, I see why that's a good convention when I I'm disinclined to correct the current commit messages given my recent investigations into the problems in doing so (with an interactive rebase and force push): http://stackoverflow.com/questions/42405984/is-git-push-force-ever-harmful-when-applied-to-your-origin-contributor-fork (feel free not to read any of that, or correct any misunderstanding there). On the other hand I'm quite able to interactively rebase and force push, and I suppose we can be reasonably confident no one has pulled down JohnLukeBentley:issue237-html5-polyglot (if you haven't), so if you'd rather I clean up my messages I'd be happy to make the attempt (that might teach me something). Either way I intend to hard-wrap in the future. The title commit message (not yet pushed) is intended to be as follows. Is it persuasive? html5. Polyglot. Add default title value: "My Title".One in a series of commits to make default.html5 polyglot markup conformant. This changes default.html5 only. If an author provides a title in their source document, that value will be used by default.html5 otherwise a non empty default, I've chosen "My Title", will be output. The Polyglot Markup spec lists its "Minimal HTML template" with an empty As in:
However, by the HTML 5 spec
https://validator.w3.org/nu/ enforces that rule. That is, ...
... will return ...
Therefore, if the Polyglot Markup spec was correctly written it would require a In addition to having a default
Issue jgm/pandoc-templates#237, "default.html5. Polyglot Markup", contains a prior discussion although the relevant resulting conclusion is expressed here (and in the pull request). |
Without wanting to hurry you along (take all the time you need) I thought I clarify that I'm waiting on you to:
|
Go ahead and rebase, that's no problem for me. I don't know about the title issue, but I really don't like "My Title" as a default. I think I'd prefer to have pandoc emit a warning message if they've not specified a title. (This would be done in the HTML writer, which contructs a |
The first in a series of commits to make default.html5 polyglot markup conformant. This changes default.html5 only. You must have the default xhtml namespace on the root html element, as in ... <html xmlns="http://www.w3.org/1999/xhtml"> "Polyglot markup declares the default namespaces on the root HTML element, html," https://www.w3.org/TR/html-polyglot/#element-level-namespaces This is also reflected in the Polyglot "Minimal HTML document". https://www.w3.org/TR/html-polyglot/#minimal-polyglot-html-document Issue jgm/pandoc-templates#237, "default.html5. Polyglot Markup", contains a prior discussion although the relevant resulting conclusion is expressed here (and in the pull request). This is my first in-production git and github commit.
One in a series of commits to make default.html5 polyglot markup conformant. This changes default.html5 only. Make all `meta` elements self closing. E.g. Change: * `<meta charset="utf-8">` to * `<meta charset="utf-8" />` Although the HTML5 spec permits void elements that aren't self closing, as in `<meta charset="utf-8">`, this is not permitted in Polyglot Markup .... > In the HTML syntax, void elements are elements that always are empty and never have an end tag. All elements listed as void in the HTML specification or in an extension spec, MUST in polyglot markup have the syntactic form of an XML empty-element tag (`<foo/>`). https://www.w3.org/TR/html-polyglot/#empty-elements Without the suggested change my current pandoc converted .xhtml file, using default.html5, will choke in the browser (Firefox 50.1.0) with an error "XML Parsing Error: mismatched tag. Expected: `</meta>` ...", because it can't handle `<meta charset="utf-8">` . In addition I keep a space between then closing forward slash and the last character. That is, I recommend `<meta charset="utf-8" />` rather than `<meta charset="utf-8"/>`. This is generally unnecessary these days as modern browsers handle either form. However, the former guarantees backward compatibility (at no cost). Issue jgm/pandoc-templates#237, "default.html5. Polyglot Markup", contains a prior discussion although the relevant resulting conclusion is expressed here (and in the pull request).
…lank. One in a series of commits to make default.html5 polyglot markup conformant. This changes default.html5 only. This commit changes, where the document author provides a language value, an output html element from something like: `<html xmlns="http://www.w3.org/1999/xhtml" lang="en">` to `<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">` This commit changes, where the document author DOES NOT provide a language value, an output html element from something like: `<html xmlns="http://www.w3.org/1999/xhtml" lang="">` to `<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">` The change entails two considerations: 1. Add `xml:lang=""` to conform to the Polyglot Markup spec ... > When specifying the language mapping of an element, polyglot markup uses both the lang and the xml:lang attributes. Neither attribute is to be used without the other, and polyglot markup maintains identical values for both lang and xml:lang. https://www.w3.org/TR/html-polyglot/#language-attributes 2. That `lang="" xml:lang=""` will be output when the document author does not provide a language value. That is, the language values default to blank. The key justification for having language values default to blank: it turns out the HTML5 spec requires it (as I read it). Under [the HTML5 spec, section "3.2.5.3. The lang and xml:lang attributes"]( https://www.w3.org/TR/html/dom.html#the-lang-and-xmllang-attributes), providing attributes with blank contents both: * Has meaning, "unknown", and * Is a MUST (written as "must") if a language value is not provided ... > The lang attribute (in no namespace) specifies the primary language for the element's contents and for any of the element's attributes that contain text. Its value must be a valid BCP 47 language tag, or the empty string. Setting the attribute to the empty string indicates that the primary language is unknown. In short, it seems that where a language value is not provided then a blank value MUST be provided for Polyglot Markup conformance, because the HTML5 spec stipulates a "must". So although the Polyglot Markup spec is unclear on this issue it would seem that if it was correctly written, it would therefore require blank attributes. Further justifications are found at https://github.com/jgm/pandoc-templates/issues/237#issuecomment-275584181 (but the HTML5 spec justification given above would seem to be the clincher). --- In addition to having lang-values-default-to-blank I recommend that, when an author does not provide a lang value, then upon on pandoc command execution a warning message like the following be provided: > Polyglot markup stipulates that 'The root element SHOULD always specify the language'. It is therefore recommended you specify a language value in your source document. See https://www.w3.org/International/articles/language-tags/ for valid language values.
This adds `lang="" xml:lang=""` to the test files.
Thanks. So, as you can see, the rebase was done (and looks good from my side). I then pushed "html5. Polyglot. html element. Add xml:lang. lang values default to blank." and the concomitant test file modification commit. On title, I'll suggest a slight change to the above proposal and break it down a little differently to facilitate consensus or disagreement.
Under that scheme the resulting html will always have a non empty title. What remains is the question of whether to adjust default.html5. There are up to three decisions to make: A. To leave it as is, and trust the html writer (after being coded) will always supply a title; or have a default value in case the html writer breaks; and On A, I'd suggest it's good defensive programming to have a default value even though we expect it to be overwritten. On B, if your objection to "My Title" is based on an aesthetic objection to seeing "My" anywhere then I sympathize with that. I too have some sort of intuitive distaste for "My" (I was gladdened, for example, when windows switched from calling the folder "My Documents" to just "Documents"). But my idea was to have a self documenting value that indicated to the author that this is a value they have to supply (which beginner HTML folk might not be aware of). But alternatives could be simply "Title" or "There is something wrong with the HTML writer: it should always supply a pagetitle". On C, perhaps something like:
So tying one set of those options together I'll suggest, for the sake of something for you to push against:
But each of the decisions A, B, and C are probably largely arbitrary. So if you stipulated an answer for each of those decisions I'd be surprised if I would have any serious counterpoint. |
I had been thinking that if the result is not valid because of a missing title, that's on the user, not pandoc (especially if pandoc issues a warning). We can't guarantee that the document pandoc produces is valid HTML5, anyway, since raw HTML can be inserted by the user. But I guess I wouldn't mind too much inserting the word I'd rather just handle this in the writer, to avoid making the template more complex than it already is. So, I'll merge your changes and look into making some further changes in the writer. |
If we are to have a default title then
Yes there is a line of thinking from that point that might change my mind. Perhaps, indeed, we ought have it on the user. If the user doesn't supply a title in the source doc, and they ignore the pandoc warning, then we might want a blank title value in order to let the resulting HTML(5) be invalid. That way if a user validates their HTML(5) they'll get an error message that would otherwise be suppressed with "Untitled". If the is right way to go: we keep the default.html5 title as it is but add the pandoc html writer warning. I'm unsure. What do you think? |
Yes, that's what I've done. |
Still need to add a warning about lang, but I think I'll explore a more general mechanism that also checks for the right format. |
I meant: have the warning but don't even pass Noted your lang warning comment. Incidentally, from my end I've merged in your repo:master with my:master and deleted my topic branch. My commit history looks like a noodly mess.
|
Normally the workflow with github is to create a branch for
a single pull request. This can be rebased when needed,
until ready for merging. After it is merged, the branch
can be deleted and you can pull from origin's master into
your master.
So, you could just get a fresh clone of master, assuming
you don't have other stuff on your master that you care
about.
+++ John Luke Bentley [Mar 04 17 02:31 ]:
… Incidentally, from my end I've merged in your repo:master with
my:master and deleted my topic branch. My commit history looks like a
noodly mess.
* Is this just an artefact of the rebase or have I done something
(else) that is wrong?
* Is this something you'd want to clean up (or have me clean up); or
is it a matter of leaving the mess as it is?
—
You are receiving this because you modified the open/close state.
Reply to this email directly, [1]view it on GitHub, or [2]mute the
thread.
References
1. #3473 (comment)
2. https://github.com/notifications/unsubscribe-auth/AAAL5AXXcUTgiO628jvm1YGGOS4MTmB4ks5riT2XgaJpZM4MM-YC
|
I'd describe the github workflow slightly differently, but we might have had the same sort of thing in mind:
To fix the current issue I acted on your helpful tip. I deleted the local repo and got a fresh clone of my origin master. The result: the noodly history has gone. But now my individual commits are missing. The following command ...
... used to return ...
... now it returns an error. I can currently see, however, the commit representing the pull request.
Is it standard that when you, as project owner, merge in a pull request all the individual commits that comprise it are lost (possibly because you, the project owner, chose to rebase when you merge, rather than merge-commit, for a simpler history)? |
+++ John Luke Bentley [Mar 06 17 02:18 ]:
Is it standard that when you, as project owner, merge in a pull request
all the individual commits that comprise it are lost (possibly because
you, the project owner, chose to rebase when you merge, rather than
merge-commit, for a simpler history)?
Yes, one can do it either way, but in this case it made more
sense to put everything in one commit, I thought. (Usually
one thing on my mind is -- what if we find we need to revert
this? Easier to revert one commit than a whole bunch.) In fact
GitHub makes this easy by having a single button to do this.
|
That makes sense. The revert consideration is good to point to. I suppose, then, you (project owner) could just as well in another case be wanting to preserve the commits within a (multi-commit) pull request, if you anticipated the possibility of reverting only some of those commits. |
This is a pull request (pr) to make default.html5 polyglot markup conformant. default.html4 is left alone.
Issue jgm/pandoc-templates#237, "default.html5. Polyglot Markup",
(the first two commit messages wrongly reference #237 in jgm/pandoc)contains a prior discussion although the resulting conclusions are expressed here, in the pr comments and commit messages. The following commits are intended to be pushed to this PR. Those ticked have been pushed to this PR:html5. Polyglot. Add default title value: "My Title".html5. Polyglot. Add default title value: "Untitled".In-principle consensus has been reached in issue jgm/pandoc-templates#237 on the changes contained in the first two commits. Consensus has not yet been achieved on the changes contained in the last two commit (yet to be pushed).
Polyglot Markup is a W3C standard: https://www.w3.org/TR/html-polyglot/. Polyglot Markup essentially specifies a syntax that facilitates switching between serving a web document either as HTML or as XHTML, the later requiring a document to be "well-formed" in XML terms. Without these changes default.html5 currently results in a web file that can't be served as XHTML.
Although Polyglot Markup results in a XHTML syntax the specification is not "Use XHTML syntax over HTML syntax", rather it is (using my own words):
(In practice the mime type is generally set by changing the file extension of your (x)html file: your web server generally looks at the file extension and sends the corresponding mime type in the response header).
These commits either include changes that:
This is my first in-production git and github pull request.
References:
Edit 01: Struck out "the first two commit messages wrongly reference #237 in jgm/pandoc" due to rebase.
Edit02: Changed last commit message first line to "html5. Polyglot. Add default title value: "Untitled"."
Edit03: struck out "html5. Polyglot. Add default title value: "Untitled"."