Markdown checkboxes #3051

Closed
1 of 2 tasks
graymalkin opened this issue Aug 1, 2016 · 55 comments · Fixed by #5139

Comments

@graymalkin

graymalkin commented Aug 1, 2016

It would be nice if Pandoc's GFM supported checkboxes, either through an extension or native to the GFM.

  • example unchecked
  • example checked
 - [ ] example unchecked
 - [x] example checked

I hope I'm not just being thick here, I couldn't find anything about it in the manual, and no one appears to have mentioned it in the issue tracker.

@humanfactors

I agree, this would be an excellent enhancement. I'm pretty sure it would parse GFM through pdflatex right? There are plenty of simple and elegant solutions to creating a checkbox in LaTeX.

@alok

alok commented Oct 13, 2016

I'm also interested in seeing this added to the Pandoc Markdown syntax. It's a common and rather obvious bit of markup, and with it, I think Pandoc Markdown becomes a strict superset of every common Markdown variant.

@dimus

dimus commented Oct 27, 2016

I would vote on this as well, great feature to have

@ickc
Contributor

ickc commented Nov 20, 2016

with it, I think Pandoc Markdown becomes a strict superset of every common Markdown variant.

Not opposing the idea of supporting this feature, but this statement is far from true.


An important question to ask is how it can be done. It seems to be an "AST change" level of difficulty. If so, I guess it won't be implemented for a long time.

I personally also like to use markdown to manage todo lists. For example, the taskpaper-style syntax @todo does a similar thing (but in pandoc @ has a special meaning), and it has been used in a markdown variant.

@graymalkin
Author

It seems reasonable that github flavoured markdown should implement GitHub markdown. (Checkboxes aren't part of daringfireball's original markdown, iirc - it's a GitHub extension)

@ickc
Contributor

ickc commented Nov 23, 2016

Yes, it should, but again, how?

As far as I understand, pandoc's unique feature is that it implements different markdown extensions separately, so that turning different extensions on or off yields another markdown variant. But the premise is that pandoc already has that extension.

When pandoc doesn't have such an extension, you need to ask what it takes to add it. Is it a document model pandoc already supports? E.g. in the case of MultiMarkdown inline footnotes, pandoc does not support the syntax, but pandoc's internals can surely handle footnotes (inline or not), so adding such an extension only requires a change in the markdown parser.

But in this case, if I am not mistaken, it requires an AST change, because pandoc's internals just can't represent it. If I'm correct, this puts the required change into the most difficult category: "AST change". It requires a change in all existing writers and readers. And if you look at the graph on <pandoc.org>, there are certainly many of them.

Now, among all these "AST change"-level feature requests (see a list of them by clicking GitHub's label), how many are older, or more important? E.g. one important feature request that involves an AST change is column/row spans in tables. To me that is something much more important and general to have.

So what I'm suggesting is not that it should not happen; I want it to happen too. But I'm saying that given the level of difficulty, the workload of the core developers, the number of open issues, and the priorities they might have, this is unlikely to happen in the foreseeable future.

One thing people often get wrong about the markdown variants in pandoc (say, markdown_github, markdown_mmd, etc.) is that they are supposed to be fully compatible, and whenever pandoc falls short of that, people tend to think the feature should be added because pandoc said it supported it. But it is not. I know that the manual mentions this limitation, but perhaps it is not emphasized enough. So I guess the most urgent "bug fix" is to emphasize what a markdown variant provided by pandoc means. Note that it is not useless even if not fully compatible. But it is only useful when the limitation is understood.

Again, some of the points above are based on the assumption that it requires an AST change. Feel free to correct me if I'm wrong.

@graymalkin
Author

Ah, sorry I misunderstood your previous comment.

I don't know how the AST in Pandoc works, I'm here ignorantly opening bug reports with a "would-be-nice-if" flavour. Naively pecking through the code base it looks like the only way to propagate the checkboxes will be a new AST node type.

Is there a node type available for "fallback"? So you can specify 2 different ways to represent a given part of the AST, in the case of Checkbox some Markdown.Checkbox type as well as Teletype with value "[ ]" or "[x]"?

This sort of thing might give developers more freedom to add features in a way which doesn't trample on every single backend. It also encourages a bad behaviour of adding features which aren't going to be visible in every backend, but perhaps that's okay.

@ickc
Contributor

ickc commented Nov 23, 2016

As far as I understand, a fallback would not work. I've suggested a similar approach for column/row spans in tables, but I was told it won't work. So unfortunately any AST change will be a very daunting task: at least all writers, all readers and pandoc-types need to be changed (sometimes more is involved, say, pandoc-citeproc, templates, etc.)

I think the core developers have been thinking about AST changes. I don't know much about it, but if I were to make such a big change, I would want to get it right the second time and include as many useful features that require AST changes as possible (so that there's no third time), which only makes the task more daunting.

However, another unique feature of pandoc is its filter system. So if this is something you sorely need right now, I suggest you write a filter to do it. How it should be done depends on your needs, e.g. are you writing to or reading from GFM, and is pandoc markdown only an intermediate format (e.g. you want gfm -> PDF)? If you are interested in writing a filter or need help with that, you can open a thread on pandoc-discuss; lots of experts there can give you advice and some might even write one for you (don't count on that, however).

@jgm
Owner

jgm commented Nov 23, 2016

Yes, you can write a filter that finds list items
of the form

[Plain (Str "[":Space:Str "]":xs)]

and replace these with e.g.

[Plain (RawInline "html" htmlCheckbox : xs)]

This should work fine for HTML output.
RawBlock and RawInline are your "fallback."
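
For anyone who would rather not write Haskell, here is a minimal sketch of such a filter in Python, working directly on pandoc's JSON AST with only the standard library. The function names and the exact HTML strings are my own assumptions, not anything pandoc defines; it also handles `[x]`, which the markdown reader produces as a single `Str "[x]"`.

```python
import json
import sys

# Assumed HTML for the two states; adjust to taste.
UNCHECKED = '<input type="checkbox" disabled> '
CHECKED = '<input type="checkbox" checked disabled> '


def raw_html(html):
    """Build a RawInline "html" node in pandoc's JSON encoding."""
    return {"t": "RawInline", "c": ["html", html]}


def convert_item(item):
    """Rewrite one list item (a list of blocks) if it starts with a checkbox."""
    if not item or item[0].get("t") not in ("Plain", "Para"):
        return item
    inlines = item[0]["c"]
    types = [i.get("t") for i in inlines[:3]]
    # "[ ]" is read as Str "[", Space, Str "]"
    if (types == ["Str", "Space", "Str"]
            and inlines[0]["c"] == "[" and inlines[2]["c"] == "]"):
        box, rest = UNCHECKED, inlines[3:]
    # "[x]" is read as the single node Str "[x]"
    elif types[:1] == ["Str"] and inlines[0]["c"] in ("[x]", "[X]"):
        box, rest = CHECKED, inlines[1:]
    else:
        return item
    # Drop the Space after the marker; the raw HTML already ends with one.
    if rest and rest[0].get("t") == "Space":
        rest = rest[1:]
    first = {"t": item[0]["t"], "c": [raw_html(box)] + rest}
    return [first] + item[1:]


def walk(node):
    """Recursively copy the AST, converting the items of every BulletList."""
    if isinstance(node, list):
        return [walk(n) for n in node]
    if isinstance(node, dict):
        node = {k: walk(v) for k, v in node.items()}
        if node.get("t") == "BulletList":
            node["c"] = [convert_item(item) for item in node["c"]]
        return node
    return node


def main():
    """Filter entry point: JSON AST on stdin, transformed JSON AST on stdout."""
    json.dump(walk(json.load(sys.stdin)), sys.stdout)
```

To use it as an actual filter, add a call to `main()` at the bottom of the script, make it executable, and pass it to pandoc with `--filter`. As noted above, this is only meaningful for HTML output; other writers will drop the raw HTML.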

@graymalkin
Author

I've written some pandoc filters in the past to do something similar, and I can't say I'm desperate. Like I say it's a "would-be-nice-if".

If it's going to involve such heavy re-work I'm happy to leave this as WontFix, and use a filter. Thanks for your input :)

@tajmone
Contributor

tajmone commented Dec 13, 2016

I think that implementing GFM task lists is important. Here is a real-world scenario showing the problems that the lack of this feature can cause.

If I use task lists in a markdown document, like this:

- [ ] Mercury
- [x] Venus
- [x] Earth (Orbit/Moon)

and then I use pandoc to clean up the markdown source:

pandoc -f markdown_github -t markdown_github

the file gets cleaned up, except for the Task List, which gets corrupted by escaping the brackets:

-   \[ \] Mercury
-   \[x\] Venus
-   \[x\] Earth (Orbit/Moon)

That's a pity. Pandoc is a great tool for cleaning up markdown source files (especially with --smart --wrap=none --normalize options): you get properly aligned tables, a standard syntax (where multiple syntaxes are possible), normalization of extra whitespaces, etc. — all of which is not only good for the eye, but also in Git controlled projects, because it reduces diffing nightmares and false positives in status changes.

But right now, this can't be used on GFM docs which make use of task lists, or else they break. In many GitHub projects I use batch scripts to clean up all markdown files via pandoc (from GFM to GFM) before committing. This REALLY helps: I work with "lazy" markdown syntax, but after cleanup all files are up to pandoc standard (e.g. I work with ATX-style headers, but commit with Setext-style headers, etc.); most of all, it makes for much cleaner diffs when merging in contributions and resolving conflicts.

So I have to choose: either I don't use task lists in markdown docs, or I don't use script automation to clean up source files.

Since task lists are part of the GFM standard, they ought to be implemented in pandoc's markdown_github.

@ickc
Contributor

ickc commented Dec 14, 2016

@tajmone

Also see this thread in pandoc-discuss. @jgm has specifically said pandoc is not designed as a linter. So we are on our own when we push pandoc beyond what it is designed for.

And please read my comments in this thread. It is likely you don't understand what markdown_github means, the philosophy behind pandoc, and the level of difficulties involved in supporting this feature.

@jgm
Owner

jgm commented Dec 14, 2016 via email

@tajmone
Contributor

tajmone commented Dec 14, 2016

@ickc

Thanks. I am aware of the AST problems and complexities. Nonetheless, I wanted to put forth this particular usage case.

So, it seems that the only solution for now would be to create a filter that preserves [ ] and [x] when they are the first three chars at the beginning of a list element. But couldn't this be implemented outside the AST, by having pandoc simply leave them verbatim on the text leaf when working with the markdown_github format?

Unfortunately I have no knowledge of Haskell, so I can't contribute much on this issue. But I could look into creating a filter.

But I did look into the pandoc sources, to inspect the AST structure. From what I gather, a checklist is just a list subtype, like a roman-numeral list is just a subtype of an ordered list. Couldn't the AST accommodate some extra attribute to specify that checkboxes are unordered/bullet list items with an extra checkbox qualifier (with an on/off boolean status)? After all, in GFM - [ ] becomes a checkbox which replaces the original bullet. This approach would mean that checkbox items are converted to normal bullet items when converting to formats which don't support them, but it would at least allow preserving them in conversions from and to GFM.

@jgm has specifically said pandoc is not designed as a linter. So we are on our own when we push pandoc beyond what it is designed for.

That's a pity though. Pandoc does a good job at cleaning up documents (because of the AST). Maybe in future editions it could have a special --cleanup option to implicitly carry out a -from -to sameformat operation on the input file.

After all, people look to pandoc because they like the idea of having a standalone single binary (ok, + citeproc) tool to handle format conversion. But if we need to install Node.js, Python or Ruby just to access a linter, then its benefits tend to get diluted (and possibly you end up installing a different linter for each format, with dozens of dependencies).

@jgm
Owner

jgm commented Dec 14, 2016 via email

@ickc
Contributor

ickc commented Dec 15, 2016

Yes, we could special-case this in the markdown writer. @jgm

It seems unnatural to special-case this, since such a case (GFM checklist) would only happen when converting both from and to markdown_github; outside this markdown variant it doesn't mean anything (or, no meaning has been assigned yet).

But I could look into creating a filter. @tajmone

If the only thing you need is to change \[x\] and \[ \] back to [x] and [ ], a post-processor might be simpler, as long as there are no such pairs in your document that don't mean a checklist. A filter is definitely stricter and more worry-free.
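
A rough sketch of such a post-processor in Python, assuming the escaped markers only count when they directly follow a bullet at the start of a line (the function name and the regular expression are mine, purely illustrative):

```python
import re

# An escaped task marker right after a bullet at the start of a line,
# e.g. "-   \[ \] Mercury" or "*   \[x\] Venus".
ESCAPED_TASK = re.compile(r"^(\s*[-*+]\s+)\\\[([ xX])\\\]", re.MULTILINE)


def unescape_task_markers(markdown):
    """Turn '\\[ \\]' and '\\[x\\]' back into '[ ]' and '[x]' in list items."""
    return ESCAPED_TASK.sub(r"\1[\2]", markdown)
```

Run it over the output of pandoc -f markdown_github -t markdown_github. Escaped brackets elsewhere in a line are left alone, which reduces the risk of false matches but does not remove it.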

That's a pity though. Pandoc does a good job at cleaning up documents (because of the AST). Maybe in future editions it could have a special --cleanup option to implicitly carry out a -from -to sameformat operation on the input file. @tajmone

I've made a similar suggestion before. But the problem of using pandoc as some sort of "linter" is 2-fold:

  1. configurable styling
  2. being "idempotent", i.e. after it is linted, further linting would not change the document further.

The first one is optional but nice to have as a linter. It can already be done partially by +/- markdown extensions. And this will probably never be the goal of pandoc.

The second one is more critical, but it does not currently hold. It is very hard to achieve, and @jgm has mentioned this is an area he wants to improve (but cannot guarantee). It is important because it guarantees that the output captures what the AST represents, i.e. it matters not only for linting but for any reader/writer pair in general.
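
To make the idempotence point concrete, here is a small sketch of how one could test it. The helper names are hypothetical; `pandoc_roundtrip` simply shells out to the pandoc binary with the same format on both sides, so it assumes pandoc is on the PATH.

```python
import subprocess


def pandoc_roundtrip(text, fmt="markdown_github"):
    """Run `pandoc -f fmt -t fmt` on text and return the normalized output."""
    result = subprocess.run(
        ["pandoc", "-f", fmt, "-t", fmt],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout


def is_idempotent(normalize, text):
    """True if normalizing twice gives the same result as normalizing once."""
    once = normalize(text)
    return normalize(once) == once
```

Calling is_idempotent(pandoc_roundtrip, source) on your own files shows whether a second pass still changes anything for that particular input; it says nothing about other inputs.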

See more on this topic in How to programmatically enforcing a pandoc markdown style - Google Groups. (I clicked the link to this that I gave in my last post, but the link is wrong. This is not the first time I've had problems posting a link to a particular post on pandoc-discuss, probably related to the mobile version of Google Groups. If the link doesn't work, search for the topic there and you'll find it.)

After all, people look to pandoc because they like the idea of having a standalone single binary (ok, + citeproc) tool to handle format conversion. But if we need to install Node.js, Python or Ruby just to access a linter, then its benefits tend to get diluted (and possibly you end up installing a different linter for each format, with dozens of dependencies). @tajmone

  1. Although pandoc isn't designed as a linter, that can't stop us from trying to use it like one. For example, I use pandoc to "define" a markdown variant I like (by using +/- extensions, filters, pre/post-processors, etc.) and use that to lint some of my md. This cannot be done by any other linter since it is a "custom markdown variant".

  2. On the other hand, if there exists a linter designed for the markdown variant you specifically use, you should almost definitely use that, since it will be more reliable (it is designed for the job rather than being pushed beyond what it is designed for).

I agree, it would be good to support this somehow. One
option that wouldn't require an AST change would be to parse

- [x] foo

as

[BulletList
 [[Plain [Span ("",["checkbox checked"],[]) [Str "[",Space,Str "]"],
   Space,Str "m"]]]]

@jgm

This approach is interesting, since it circumvents the need for an "AST change". On one hand, it feels unnatural. But on the other hand, if it is functionally equivalent to an "AST change" without being one, maybe we shouldn't care too much about being "syntactically correct".

Just to bring this out explicitly, the example at the beginning of this thread is rendered by GitHub as

<ul class="contains-task-list">
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled=""> example unchecked</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" checked="" disabled=""> example checked</li>
</ul>

@tarleb
Collaborator

tarleb commented Dec 15, 2016

It seems unnatural to special-case this, since such a case (GFM checklist) would only happen when converting both from and to markdown_github; outside this markdown variant it doesn't mean anything (or, no meaning has been assigned yet).

Not sure what you mean here. Org-mode has a similar feature, and checkboxes can be output in HTML and probably in docx and odf. There's a textile issue about implementing a feature like this, too.

@tajmone
Contributor

tajmone commented Dec 15, 2016

Lots of cool suggestions here! Hopefully this feature might be first implemented via some filters or custom readers and writers, to test the grounds with different approaches ...

What could be a way to represent checkboxes in non-HTML formats? I remember coming across various solutions in doc/pdf, like using some common dingbats (of the sort you should find on all OSes).

I've found an MS Office Help article suggesting use of Wingdings font.

Unicode symbols could be a more universal approach, provided the font being used contains the glyphs (I think there is a fallback mechanism for missing glyphs, resorting to use default fonts). On Wikipedia I've found some unicode symbols that might do the job:

Code    Character  Symbol
U+237B  ⍻          NOT CHECK MARK
U+2611  ☑          BALLOT BOX WITH CHECK
U+2705  ✅         WHITE HEAVY CHECK MARK
U+2713  ✓          CHECK MARK
U+2714  ✔          HEAVY CHECK MARK
U+2610  ☐          BALLOT BOX (checkbox)
U+2612  ☒          BALLOT BOX WITH X (square with cross)
U+2717  ✗          BALLOT X (cross)
U+2718  ✘          HEAVY BALLOT X (bold cross)

The choice is between using a pair of checkbox symbols (ticked box / empty box) or check marks (tick / cross). The latter is often confusing.

Here in Italy we use both systems, and the checkboxes can be interpreted differently, depending on whether or not there is a distinction between check- and X-marks:

(3-states approach)
 [v] = yes           |   [x] = no   |   [ ] = irrelevant

or:

(boolean approach)
 [x]  or  [v] = yes  |   [ ] = no

I like the GFM checkbox because it is clearly a yes/no binary choice. But maybe for formats other than html there might be some other standard ways in place, which I am not aware of.
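
As a quick illustration of the boolean approach with the ballot box pair, a sketch in Python that rewrites GFM task markers in place while keeping the bullets (the function name and regular expression are my own assumptions, not anything pandoc does):

```python
import re

BALLOT_BOX = "\u2610"             # U+2610 BALLOT BOX
BALLOT_BOX_WITH_CHECK = "\u2611"  # U+2611 BALLOT BOX WITH CHECK

# A bullet at the start of a line followed by "[ ]", "[x]" or "[X]".
TASK_ITEM = re.compile(r"^([ \t]*[-*+][ \t]+)\[([ xX])\]", re.MULTILINE)


def tasks_to_unicode(markdown):
    """Replace GFM task markers with Unicode ballot boxes, keeping the bullet."""
    def repl(match):
        box = BALLOT_BOX if match.group(2) == " " else BALLOT_BOX_WITH_CHECK
        return match.group(1) + box
    return TASK_ITEM.sub(repl, markdown)
```

Whether the glyphs actually show up still depends on the font used by the output format, as mentioned above.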

@ickc
Contributor

ickc commented Dec 15, 2016

Not sure what you mean here. Org-mode has a similar feature, and checkboxes can be output in HTML and probably in docx and odf. There's a textile issue about implementing a feature like this, too. @tarleb

If you put the quote in context, I guess @jgm means special-casing that particular combination in the markdown writer (while leaving the AST and every other writer untouched). That discussion is independent of implementing the whole checklist feature, which would no longer be a "special case".

@cdornan

cdornan commented Feb 13, 2017

I would also like this feature and am having to work around its absence. GFM task lists have become really quite prevalent. I understand the difficulties of adding them but it is surely a matter of time before they get added?

@craigforr

+1 vote for GFM checkboxes support when exported as markdown_github.

As for presentation, I would vote for the simple <input type="checkbox" checked disabled> that GitHub itself renders, or Ballot Box and Ballot Box with Check if Unicode characters are used.

Bullets with hyphens/dashes:

  • Not checked
  • Checked

Bullets with asterisks/stars:

  • Not checked
  • Checked

And the case of the X chars used to check the box is irrelevant when GitHub renders them.

@ec1oud

ec1oud commented Feb 27, 2017

I'm voting for this too.

@ickc
Contributor

ickc commented Feb 27, 2017

Sorry for being the bad guy, but if one wants to vote, use the emoji reactions on the right of each message. The difference is that those won't notify people and cause spam.

e.g. you can read more about this in Reactions to Pull Requests, Issues, and Comments · Issue #141 · dear-github/dear-github.

In some repos, a thread like this would be locked very soon (but not in pandoc, because the developers here are nice). It is not that the developers don't see value in this issue (see @jgm's comment above for example), but it is difficult to handle properly (and if one wants a hackish approach, suggestions have already been made above).

@ec1oud

ec1oud commented Feb 28, 2017

Sorry.

I actually wanted to start by just showing some formatting (including checkboxes) in a terminal-based viewer. One that I can use as a less filter without needing to invoke a web browser just for that. There are hacks where less invokes pandoc to generate man output which is then processed by groff and then viewed with man; or, have it generate html and then view with w3m or lynx. But it's practically plain text with just a little formatting, usually, so that all seems like overkill to me. I'm kind of surprised that when working with GitHub repos that typically contain markdown files, nobody has a better way to just view the markdown nicely. Or edit it in anything other than a plain text editor, or some ridiculous webkit/javascript mashup.

So I started throwing one together in go for now: https://github.com/ec1oud/mdcat (and using my fork of blackfriday https://github.com/ec1oud/blackfriday ) mainly because the blackfriday parser seemed like a good starting point, and because I've been curious about go. (Probably Haskell is better, but I haven't gotten around to climbing that learning curve yet.)

At some point hopefully the world will stop calling this feature something from "github markdown" and expect it to be part of markdown itself. A de-facto extension, or even part of the standard. IMO it's one of the most useful extensions of all, and it's also easy to implement.

I think pandoc should also have an output mode for ANSI terminal codes (to style some text spans, like headings and emphasized phrases) plus unicode (for checkboxes, bullets, fractions, "smartypants" quotes and ellipses etc., block quote indentation bars, and box drawing around tables). Then it could be used directly as a filter for less.

@ickc
Contributor

ickc commented Feb 28, 2017

IMO it's one of the most useful extensions of all, and it's also easy to implement.

If I understand you correctly that you mean it is easy to implement GitHub checklist in pandoc, then my point all along is that it is actually not. Try to follow the discussion above.

P.S. I'd consider discussion like this helpful though, unlike the voting message above. And I've been there too, so don't worry.

@ec1oud

ec1oud commented Feb 28, 2017

Because you have an AST, it needs to be extended for this. I get it.

@nichtich
Contributor

nichtich commented Nov 2, 2018

@lollipopman I'd expect the html writer to output with extension +checkboxes:

<ul>
<li><input type="checkbox"> example unchecked</li>
<li><input type="checkbox" checked> example checked</li>
</ul>

and with -checkboxes:

<ul>
<li>☐ example unchecked</li>
<li>☒ example checked</li>
</ul>

what more do you want?

@lollipopman
Contributor

I was thinking something like this:

<ul>
  <li><input type="checkbox" checked />checked</li>
  <li><input type="checkbox" unchecked />unchecked</li>
</ul>  

2018-11-02-144436_215x69_scrot

@tarleb
Collaborator

tarleb commented Nov 2, 2018

@lollipopman the task-list filter posted above does exactly that. It serves as a stopgap measure until the best way to handle checkboxes has been decided.

@mb21
Collaborator

mb21 commented Nov 3, 2018

This requires no change to the AST because checkboxes are internally represented by Unicode characters.

That sounds like a great idea! The only output format that has a special way to represent checkboxes is HTML anyway, and for the others the unicode is a perfect fallback – which we get for free in this proposal.

@quasicomputational
Contributor

If the HTML writer does some magic for that character, how would you literally have that character in HTML output?

@mb21
Collaborator

mb21 commented Nov 3, 2018

@quasicomputational Yes, the markdown writer should also be sensitive to the checkboxes option. But I think it's fine to enable by default... alternatively, you could use a filter to transform to raw html or generic raw attributes.

@OleMussmann

@lollipopman

I was thinking something like this:

<ul>
  <li><input type="checkbox" checked />checked</li>
  <li><input type="checkbox" unchecked />unchecked</li>
</ul>  

2018-11-02-144436_215x69_scrot

As far as I know the Github checkboxes don't have bullets and are disabled. Should be something along these lines, then:

<ul style="list-style-type: none; padding: 0 7px;">
    <li><input type="checkbox" checked disabled> pet kittens </li>
    <li><input type="checkbox" disabled> world domination </li>
</ul>

Some padding might be necessary for indentation.
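
For illustration, a small Python sketch that generates markup along these lines from a list of (checked, text) pairs. The function name is made up, and the inline style is just the one from my snippet above, not what GitHub emits.

```python
from html import escape


def render_task_list(items):
    """Render (checked, text) pairs as a bullet-less list of disabled checkboxes."""
    lines = ['<ul style="list-style-type: none; padding: 0 7px;">']
    for checked, text in items:
        attrs = "checked disabled" if checked else "disabled"
        lines.append(f'    <li><input type="checkbox" {attrs}> {escape(text)}</li>')
    lines.append("</ul>")
    return "\n".join(lines)
```

For example, render_task_list([(True, "pet kittens"), (False, "world domination")]) gives the same structure as the list above, with the item text HTML-escaped.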

screenshot from 2018-11-27 15-40-07

@lollipopman
Contributor

@OleMussmann I agree that it looks better without the bullet point

@jgm
Owner

jgm commented Nov 27, 2018

One possibility would be to parse

- [x] Foo
- [ ] Bar

into the pandoc structure

Div ("",["checklist"],[])
 [ BulletList
   [ [Plain [Span ("",["checkbox","checked"],[]) [Str "",Space], Str "Foo"]]
   , [Plain [Span ("",["checkbox","unchecked"],[]) [Str "",Space], Str "Bar"]]
   ]
 ]

In most formats, this would come out as a bullet list with the unicode checkbox characters.
But specific writers could be taught to give special output, e.g. the HTML output suggested above.
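
A rough sketch of the two behaviours, written in Python over pandoc's JSON encoding of inlines rather than in pandoc's own Haskell writers. It assumes the checkbox Span contains a Unicode ballot box character followed by a Space; the helper names and the emitted HTML are illustrative only.

```python
from html import escape


def span(classes, inlines):
    """Build a Span node with the given classes in pandoc's JSON encoding."""
    return {"t": "Span", "c": [["", classes, []], inlines]}


def to_text(inlines):
    """Generic writer: ignore Span attributes, so the Unicode box falls through."""
    out = []
    for node in inlines:
        if node["t"] == "Str":
            out.append(node["c"])
        elif node["t"] == "Space":
            out.append(" ")
        elif node["t"] == "Span":
            out.append(to_text(node["c"][1]))
    return "".join(out)


def to_html(inlines):
    """Special-casing writer: a 'checkbox' Span becomes an <input> element."""
    out = []
    for node in inlines:
        if node["t"] == "Str":
            out.append(escape(node["c"]))
        elif node["t"] == "Space":
            out.append(" ")
        elif node["t"] == "Span":
            _ident, classes, _attrs = node["c"][0]
            if "checkbox" in classes:
                checked = " checked" if "checked" in classes else ""
                out.append(f'<input type="checkbox"{checked} disabled> ')
            else:
                out.append(to_html(node["c"][1]))
    return "".join(out)
```

The class test uses list membership, so the "unchecked" class is not mistaken for "checked".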

@mb21
Collaborator

mb21 commented Nov 28, 2018

I suppose the extra Span wrapping the unicode string lowers the probability of a clash even further, though I guess it's not strictly necessary. But yeah, maybe it's somewhat cleaner...

@jgm
Owner

jgm commented Nov 28, 2018 via email

mb21 added a commit to mb21/pandoc that referenced this issue Dec 10, 2018
mb21 added a commit to mb21/pandoc that referenced this issue Dec 15, 2018
mb21 added a commit to mb21/pandoc that referenced this issue Dec 15, 2018
mb21 added a commit to mb21/pandoc that referenced this issue Dec 17, 2018
mb21 added a commit to mb21/pandoc that referenced this issue Dec 22, 2018
mb21 added a commit to mb21/pandoc that referenced this issue Jan 1, 2019
closes jgm#3051

changes CommonMark Writer to output raw "markdown"
@jgm jgm closed this as completed in #5139 Jan 2, 2019
jgm pushed a commit that referenced this issue Jan 2, 2019
@zethdubois

I have a hack that I'm using to render a box next to a text list.

I'm writing Markdown in Obsidian, and it allows for LaTeX math symbols, natively.

$\Box$ This is left side checkbox

Check box to the right of this line of text $\Box$

Pandoc conversions of the Markdown to pdf using --pdf-engine=xelatex render to PDF with no issues.
