New syntax for raw source #38

Closed
fletcher opened this Issue Mar 6, 2017 · 17 comments

Owner

fletcher commented Mar 6, 2017

Since the early days of MMD, HTML comments could be used to pass raw LaTeX (or whatever) to the exported file. This was a bit of a hack, and while useful, it's time to rethink it.

I think the best approach now would be a new syntax that allows passing raw text to the output, based on which output format is selected. This would allow customizing the output based on the chosen format. This can already be done using wildcard transclusion, but in that case the text would still be processed as MultiMarkdown.

Contributor

jasedit commented Mar 7, 2017

The ability to do this is something that I use extensively in writing academic papers, so I'm very interested in the direction you take with this.

I'm going to guess one proposal under consideration would be to introduce a new block syntax which can be annotated with some type information. I suppose you would potentially want to have a way to indicate that blocks are alternatives of one another, so something like:

|latex|
\LaTeX text goes here
||
|html|
<p> HTML text goes here </p>
||

or (with marking variants of the same block)

/// latex
\LaTeX text goes here
\\\ html
<p> HTML text goes here </p>
///

Do you have a list of options in mind, or are you still brainstorming this change?

Owner

fletcher commented Mar 7, 2017

Still brainstorming. And realizing that there are actually several things happening here, and maybe I should make sure they all play nicely together (it could be pretty powerful if done correctly).

  1. Transclusion places the contents of one file "inside" another file, and then processes the MMD syntax.

  2. Transclusion offers a wildcard syntax that transcludes different files depending on the output format.

  3. Passing raw source is "sort of" a form of transclusion, but occurs after the MMD has been processed and doesn't require a separate file. Obviously, passing raw HTML into a LaTeX document is not particularly useful, but that doesn't necessarily mean I should prevent that from happening.

  4. Including different text based on the chosen output is useful, and would not necessarily need to be restricted to including raw source. For example, there are times where it is helpful to include {{TOC}} to trigger a table of contents, but this is not necessarily needed in LaTeX documents that do this automatically.

So, it seems to me that there are three separate, somewhat orthogonal, things:

  1. Transclude contents of a file

  2. Pass text without parsing it as MMD

  3. Change behavior based on the chosen output format.

Proper syntax for each of these three things, that allows combinations of options for each of the three would be very powerful indeed.

Owner

fletcher commented Mar 7, 2017

Actually, the three things may better be described as:

  1. Pass text without processing as MMD

  2. Include text from a file

  3. Change behavior based on output format.

One could therefore:

  • Transclude text from a file but don't process as MMD.

  • Include a section of text as raw LaTeX, but only if exporting to LaTeX

  • Use different variations of an image depending on which export format we are using

  • etc, etc,
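
To make the three independent options concrete, here is a minimal sketch in Python. It is not MultiMarkdown's implementation: the `Fragment` class, the `render` function, and the `process_mmd` and `read_file` callbacks are all hypothetical names invented for illustration. It only shows that the three choices (raw or processed, inline or from a file, all formats or one format) can be combined freely.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Fragment:
    """Hypothetical model of one piece of a document."""
    text: str                        # inline text, or a file name when from_file is True
    raw: bool = False                # 1. pass text without processing as MMD
    from_file: bool = False          # 2. include text from a file
    only_for: Optional[str] = None   # 3. restrict to one output format (None = every format)


def render(fragment: Fragment,
           output_format: str,
           process_mmd: Callable[[str], str],
           read_file: Callable[[str], str]) -> str:
    """Apply the three options independently of one another."""
    # Option 3: skip the fragment entirely when the format does not match.
    if fragment.only_for is not None and fragment.only_for != output_format:
        return ""
    # Option 2: the text is either given inline or loaded from a file.
    text = read_file(fragment.text) if fragment.from_file else fragment.text
    # Option 1: raw text bypasses the (stand-in) MMD processor.
    return text if fragment.raw else process_mmd(text)
```

For example, `Fragment("\\newpage", raw=True, only_for="latex")` covers the "raw LaTeX, but only if exporting to LaTeX" case, and `Fragment("intro.md", raw=True, from_file=True)` covers "transclude text from a file but don't process as MMD".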

Contributor

jasedit commented Mar 7, 2017

It seems as though the syntax to disable processing MMD cannot be a block style, because enclosing the other operations inside it would disable the ability to execute those commands (or have the disabling command only disable some aspects of the MMD parsing, which seems counter-intuitive.) My current thought would be to have the syntax to disable MMD processing work more as a decorator than a block syntax (e.g. {{!!raw.mmd}} and !!! text block !!!.)

It would be useful if this could also be used for frontmatter lines, replacing some of the author vs. latex author rules. It could also potentially help with the input/footer commands.

Owner

fletcher commented May 2, 2017

I'm wondering about using a modification of the code span/code block syntax to indicate that the contents are output as is, but only if the selected output format matches the one specified.

Something like:

```[latex]
This is raw \latex only.
```

(I'm not proposing that exact syntax, but demonstrating a concept).

This would indicate that the selected text is not a code block, but rather raw text to be copied verbatim, if the output format is latex.

iandol commented May 2, 2017

Hi, Pandoc is debating a similar issue (lots of discussion), and it would be neat if you'd consider a shared syntax (it is a shame when Markdown variants diverge).

jgm/pandoc#3537

Owner

fletcher commented May 2, 2017

There's some additional discussion on the pandoc issues page. It's not clear that a convergent syntax will work:

  1. I understand why pandoc may head in the direction of using something similar to the existing attributes syntax {foo}.

  2. Since MMD doesn't have that syntax, I have more flexibility.

The question may come down to whether it's better for MMD to use a syntax that is similar to, but different than, that used by pandoc? Or is it better to have a different syntax that is less likely to be confused between the two programs?

iandol commented May 3, 2017

My point as someone who uses both MMD and Pandoc is that I'd rather have as similar syntax as possible.

Curly braces seem fine to me. I agree that Pandoc have to balance their other uses of curly braces and it may require a secondary character like : or = which would be fine for Pandoc but would be technically unnecessary for MMD. But personally, I'd still prefer to have to use {=html} and retain compatibility than use the ever so slightly simpler [html]. But MMD-only users who do not need Pandoc to extend MMD's output flexibility may think otherwise.

iandol commented May 3, 2017

Also, {.attributes} would be a cool addition for MMD, and if you might consider it in the future, would that weigh on your current decision? Using the Pandoc syntax would make much more sense then.

Owner

fletcher commented May 3, 2017

The problem for me is largely the poor aesthetics of {=html}.

I don't intend on supporting pandoc-style attributes. I understand why some people want them, but:

  1. I bet in the vast majority of cases they are unnecessary. Clever CSS can do quite a bit without them.

  2. They serve a different purpose/user base -- at some point additional features start to drive unneeded complexity in Markdown documents. The great beauty of Markdown is its simplicity, and every new feature risks tipping the balance towards feature bloat. MultiMarkdown is designed, as much as possible, to minimize the friction between a writer and a reasonably polished final product, without the writer needing to be a computer programmer to make that happen. ;)

iandol commented May 3, 2017

I do agree that CSS selectors can probably solve many cases, but attributes are a way of attaching semantic labels to items, not just to do styling. And most of these extensions to markdown are all optional, so a user can keep writing simple markdown. For example, I personally don't need raw source, but do recognise other users may, and it doesn't affect me if there is a way to do this in MMD, I just don't need to use it. I personally appreciate that one day, if I suddenly need to be able to generate an INFO box in a document for two output formats, I can use raw code (or attributes in Pandoc), and my future self wins.

Of course there is no black and white in this feature space, some people need raw source, some need more flexible attribute handling, some need both! You need to define what you think is the best balance for MMD (it is your baby and you have the right to say "I think X is fugly!"), and though I would benefit from better convergence, perhaps you and most other users do not have the same workflow requirements.

Owner

fletcher commented May 3, 2017

The problem is that adding each feature has an associated cost (in initial implementation, upkeep, and potential conflict with other features in the future), even if a given individual never uses it. Microsoft Word probably has more features than any other word processor, but I'd be hard pressed to find any sane individual who thinks it's a good program.

The balance I try to strike with MMD is the minimum feature set to cover 80% of the needs of 80% of users. It's a vague metric, but valuable nonetheless. And I'm not saying it's right for everyone, but that's why we have choice. ;)

We'll see -- maybe the syntax will converge, but even if it doesn't, I suspect the difference would be trivial to interconvert with a regular expression.
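
As a rough illustration of that regular-expression conversion, the sketch below rewrites the bracketed fence marker floated earlier in this thread (three backticks followed by `[latex]`) into the curly-brace form under discussion on the pandoc side (`{=latex}`). Neither syntax was final at this point in the thread, so both patterns are assumptions, and the function name `bracket_to_pandoc_style` is made up for the example.

```python
import re

# Matches a fence line such as three backticks followed by "[latex]".
# Group 1 is the run of backticks, group 2 the format name (or "*").
BRACKET_FENCE = re.compile(r"^(`{3,})\[([A-Za-z0-9*]+)\][ \t]*$", re.MULTILINE)


def bracket_to_pandoc_style(source: str) -> str:
    """Rewrite fence markers of the form [format] as {=format}."""
    return BRACKET_FENCE.sub(r"\1{=\2}", source)


if __name__ == "__main__":
    fence = "`" * 3  # built at run time to keep literal fences out of this listing
    sample = f"{fence}[latex]\nThis is raw \\latex only.\n{fence}\n"
    print(bracket_to_pandoc_style(sample))
```

Only opening fence lines carry a marker, so closing fences and ordinary code blocks are left untouched.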

Owner

fletcher commented May 3, 2017

One open question is the potential significance of the fact that a fenced code block falls back to a code span in regular Markdown:

```
foo
```

becomes:

<p><code>
foo
</code></p>

I think it's worth considering that using the same syntax for code blocks and spans would be useful in this scenario. I recognize that neither would do any good when processed by a program that doesn't "understand" this feature, but it seems unwise to ignore the possible benefit prematurely. At the very least, we should consider what things will look like when passed through a vanilla Markdown processor.

The first question is where should the format marker go? I see a few options (arbitrarily using {foo} as a placeholder):

  1. Inside the code block/span, at the front. (e.g. `{foo} bar`)

  2. Inside the code, at the end. (`bar {foo}`)

  3. Before the code. ({foo}`bar`)

  4. After the code. (`bar`{foo})

Some things to consider are what the raw source looks like, what the result looks like when passed through a plain Markdown processor, and what fits well with existing syntax.

Contributor

jasedit commented May 25, 2017

Any update/plans on adding this to MMD6? This is currently the major blocker on switching for me. It isn't a high priority, but it would be nice to be able to make the switch and take advantage of the other nice features.

Owner

fletcher commented May 25, 2017

I don't have a specific timeline -- once I add this, I "can't" undo it. So I really want to implement this in the best possible way. I was hoping to get some more input, or see more discussion on the pandoc side. But it seems to have quieted down.

The things that need to be decided:

  1. Syntax for marking fenced code blocks as raw, and for which format.

  2. Syntax for marking code spans as raw, and for which format.

  3. Can raw blocks/spans be marked for more than one output format? What about wildcards? (I think yes, but it would have implications for the marker syntax.)

  4. If the output format specified is invalid (e.g. 'word'), do we ignore and treat as regular block/span? Or do we ignore the text for all outputs (since there is no valid output)? I think the latter.
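
One way to read the preferences in items 3 and 4 is sketched below. Everything here is an assumption for the sake of illustration: the comma-separated list of formats, the `*` wildcard spelling, the contents of `KNOWN_FORMATS`, and the function name `emit_raw` are not taken from MultiMarkdown.

```python
# Illustrative set only; the real list of supported output formats may differ.
KNOWN_FORMATS = {"html", "latex", "odt", "epub"}


def emit_raw(marker: str, output_format: str) -> bool:
    """Decide whether a raw block/span marked with `marker` is emitted.

    `marker` is a hypothetical comma-separated list such as "html, latex"
    or the wildcard "*".
    """
    targets = {part.strip().lower() for part in marker.split(",") if part.strip()}
    if "*" in targets:
        # Item 3: a wildcard matches every output format.
        return True
    # Item 4: unknown names such as "word" are discarded, so a marker that
    # names only invalid formats matches nothing and the text is dropped
    # for all outputs.
    return output_format in (targets & KNOWN_FORMATS)
```

Under this reading, `emit_raw("html, latex", "html")` is true, while `emit_raw("word", fmt)` is false for every `fmt`.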

ttscoff commented Jun 8, 2017

I'll admit I'm kind of jumping into this conversation and have only skimmed it. I apologize if these comments are redundant. But just to complicate the issue further without actually helping anything:

Marked 2 uses a syntax like <<[filename] to transclude markdown text files. The brackets can be changed to () to insert as code (lang definition guessed by file extension and wrapped in pre/code blocks), and {} to insert raw. Raw inserts are done post-processing with MMD, bypassing any processing of included text.

I've also added (for the next update) support for iA Writer's "content block" syntax. It just uses a forward slash at the beginning of a line followed by a filename to include any file, automatically determining how to handle it based on extension (image, code block, markdown, etc). It does a pretty good job, and it seems like specific handling for .tex files might be safe to assume in any kind of implementation. (Side note: its conversion of CSV files to MMD tables is pretty cool, and I've implemented that with the update.)

Given that you want this to be universal and triggered by syntax, though, I really like jasedit's decorator idea above, using {{!!foo.md}} in the same way that Marked uses <<{foo.md}.

For non-transcluded raw text, I do think that a similar code block language modifier (```[latex]) is a syntactically clean option, though because most Markdown implementations use that space for language definitions that affect a class applied to the pre or code block, it wouldn't be portable. I know Fletcher has some reservations about Pandoc attributes, but I would argue for their usefulness. I use them frequently (via Kramdown) on my blog, where I can't just target a bold or emphasis tag within the global CSS. I also use them for adding attributes like rel="nofollow" to links where needed. And anything that keeps me from having to run regex reformats when switching between Markdown processors is welcome in my workflow.

I will, of course, implement whatever is decided for MMD6 in Marked. When MMD or GFM is selected as a processor, Marked normalizes syntax as much as possible, aiding in portability to some extent.

Owner

fletcher commented Jun 9, 2017

Here's what I've decided.

  1. The visual distinction between the less cluttered variants and the (presumably) pandoc-compatible versions is minimal.

  2. There is a benefit to being compatible with other variants. The magnitude of this benefit is debatable.

So, code spans can be specified as raw source using the following syntax:

foo `*bar*`{=html}

You can use wildcard matching, though this is probably of minimal utility, since raw text will almost never work properly across all formats:

foo `*bar*`{=*}

Code blocks are similar:

```{=latex}
*foo*
```

I pushed this to the develop branch, and will consider whether to keep/change it. Feedback welcome!
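
The span syntax above can be approximated with a short Python sketch. This is not the code on the develop branch: the regular expression, the function name `resolve_raw_spans`, and the choice to drop non-matching spans entirely are assumptions made for illustration, and the sketch ignores fenced blocks and nested backticks.

```python
import re

# A code span immediately followed by {=format}, e.g. `*bar*`{=html}.
RAW_SPAN = re.compile(r"`([^`]+)`\{=([^}\s]+)\}")


def resolve_raw_spans(text: str, output_format: str) -> str:
    """Replace raw spans with their verbatim contents when the format matches."""
    def replace(match: re.Match) -> str:
        content, target = match.group(1), match.group(2)
        # "*" is the wildcard; anything else must equal the output format.
        # Assumption: spans marked for another format are omitted.
        return content if target in ("*", output_format) else ""
    return RAW_SPAN.sub(replace, text)
```

With this sketch, the first example above yields `foo *bar*` when exporting to HTML and `foo ` (the span removed) when exporting to LaTeX, while the `{=*}` variant is kept for both. Ordinary code spans without a `{=...}` marker are not touched.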
