Markdown #431

gaborcsardi · 2015-11-07T23:43:58Z

From #365. This is not ready yet for merging, but it is close. :)

This PR adds markdown parsing for fields: title, description, details, references, concept, note, seealso, keywords, return, author, section, format, source, param, slot, field, method.

It should leave Rd notation unaffected. One glitch I noticed is that the commonmark parser removes leading whitespace. This is mostly fine, because whitespace is not significant in Rd except within \preformatted{}, and I have a workaround for that case. (FIXME: any other case where leading whitespace matters?)

It is possible that some Rd notation is picked up as markdown, but all the examples I could come up with were quite artificial. Nevertheless we should have a @nomarkdown tag that forbids Markdown parsing.

For the supported notation, see the test cases and the readme at maxygen: https://github.com/gaborcsardi/maxygen

TODO:

Write a vignette, once we have decided about all the details.
Add a @nomarkdown tag. Or @noMd is better.

Feedback welcome!

hadley · 2015-11-09T15:24:59Z

NEWS.md

@@ -7,6 +7,10 @@

 * Fixed bug in `@noRd`, where usage would cause error (#418).

+* Most fields can now be written using Markdown markup instead of the


FWIW I think it's more robust to always put new items at the top of the list

gaborcsardi · 2015-11-12T11:04:40Z

What do you think about the @nomarkdown tag? Is that OK?

hadley · 2015-11-12T12:53:06Z

I think it should be @noMarkdown to be consistent with other tags, or maybe just @noMd to be consistent with @noRd

gaborcsardi · 2015-11-12T12:54:15Z

OK, @noMd it is.

gaborcsardi · 2015-11-15T12:00:47Z

Here is a problem I just found. The Md emphasis markers are picked up from different lines, and this is sometimes a problem. E.g.

    #' Title
    #'
    #' Description with some *keywords* included.
    #' So far so good. \\preformatted{ *these are not
    #'   emphasised*. Or are they?
    #' }

results in

    \description{
    Description with some \emph{keywords} included. So far so
    good. \preformatted{ \emph{these are not emphasised}. Or are
    they? }
    }

This is of course correct for Md, but it will screw up Rd. _ is similar, I guess.
Possible solutions:

1. Escape (replace, really) * and _ within \preformatted and \eqn before the markdown parsing, and then put them back.
2. Do not parse markdown within \preformatted and \eqn at all.

The second is more sensible, I think. I would remove them completely before the parsing,
and then put them back at the appropriate place after. You would not want to write Md within
\preformatted and \eqn, anyway, right?

Thoughts? @hadley @jeroenooms ?

gaborcsardi · 2015-11-18T10:14:39Z

I chatted with @jeroenooms about this. Here are three good options to make sure that Rd markup still works as expected after merging this PR.

1. The first one is simple: we require an @md tag for docs that are in Markdown. If @md is present, everything is parsed as md. This could be specified at the package level or the R object level.
2. The second one is simple conceptually, but a bit harder to implement. We would protect all Rd commands from the Markdown parser, so they would not be parsed at all. This also means that if you have an \itemize{}, you cannot use Markdown in it. The implementation would just replace Rd tags with markers before calling the md parser, and then put them back afterwards. This is similar to how Markdown is not interpreted within HTML.
3. The third is similar to the previous one, but we would only apply it to Rd tags that are potentially dangerous: \code, \preformatted, \deqn and \eqn, etc. The complete list can be assembled from https://developer.r-project.org/parseRd.pdf We would protect tags which most probably do not contain markdown markup, but might contain text, code or markup that is accidentally interpreted by the markdown parser. The only annoying thing with this solution is that if your Rd docs are still picked up as md, there might be no errors or warnings; your docs output will just be incorrect. But I don't think this would happen too often.

I think I prefer number three. It is potentially (slightly) dangerous, but it would allow a seamless transition, and I think it would work just fine in almost all cases.

@hadley What do you think?

hadley · 2015-12-02T04:10:33Z

3 sounds good to me.

gaborcsardi · 2016-01-01T23:43:44Z

This is actually quite tricky. Ideally you would want to parse the text with Rd first, so that you can be sure that you find all the Rd tags, without any errors. But of course we don't want to do that, and with the markdown markup included, the text might not even parse as Rd.

So the plan is to have a very simple parser, that only cares about \{}% characters. It would find the Rd macro arguments that we want to ignore, and replace them with markers, before the Md parsing. And then put them back in the end.
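The protect-and-restore plan above could be sketched roughly as follows. This is a minimal illustration in Python, chosen only to keep the sketch short; it is not the R/C code in this PR. The `FRAGILE` set, the `RDPLACEHOLDER<n>END` marker format, and the function names `find_arg_end`, `protect` and `restore` are all assumptions made for the example.

```python
import re

# Assumed subset of the macros whose arguments must stay verbatim.
FRAGILE = {"code", "preformatted", "eqn", "deqn", "verb", "url"}

MACRO = re.compile(r"\\([A-Za-z0-9]+)\{")


def find_arg_end(text, start):
    """Return the index just past the '}' matching the '{' at text[start],
    or -1 if the argument is never closed."""
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \%
            continue
        if ch == "%":
            # An unescaped % starts an Rd comment running to end of line.
            newline = text.find("\n", i)
            if newline == -1:
                return -1
            i = newline + 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1


def protect(text):
    """Replace fragile macros (name and argument) with alphanumeric markers."""
    saved, out, i = [], [], 0
    while i < len(text):
        m = MACRO.match(text, i)
        if m and m.group(1) in FRAGILE:
            end = find_arg_end(text, m.end() - 1)
            if end != -1:
                out.append("RDPLACEHOLDER%dEND" % len(saved))
                saved.append(text[i:end])
                i = end
                continue
        if text[i] == "\\" and i + 1 < len(text):
            # Copy an escape pair as a unit so "\\code{" is not mistaken
            # for a macro.
            out.append(text[i:i + 2])
            i += 2
            continue
        out.append(text[i])
        i += 1
    return "".join(out), saved


def restore(text, saved):
    """Put the saved macros back after the Markdown pass."""
    for n, original in enumerate(saved):
        text = text.replace("RDPLACEHOLDER%dEND" % n, original)
    return text
```

The markers are purely alphanumeric so that a Markdown parser would pass them through unchanged; the Markdown conversion itself would run between `protect` and `restore`.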

I am a bit afraid that writing this in R will be somewhat slow, but we'll see. I'll include the list of all Rd macros, and how we would handle them, in another comment.

gaborcsardi · 2016-01-01T23:46:33Z

Summary

The tables below contain all Rd macros, according to https://developer.r-project.org/parseRd.pdf and Writing R Extensions.

+ in the MD column means that we will parse the contents of the argument(s) of the macro as Markdown. - means that we will not, and roxygen will treat them verbatim.

For some macros I am not quite sure what to do; these are marked with a question mark, and there is a short discussion about them at the end.

Sectioning macros

Macro	Text type	Roxy tag	MD
`\arguments`	latex	implicit	+
`\author`	latex	`@author`	+
`\concept`	latex	`@concept`	+?
`\description`	latex	`@description`	+
`\details`	latex	`@details`	+
`\docType`	latex	`@docType`	-
`\encoding`	latex	`@encoding`	-
`\format`	latex	`@format`	+
`\keyword`	latex	`@keywords`	-
`\name`	latex	`@name`	-
`\note`	latex	`@note`	+
`\references`	latex	`@references`	+
`\section`	latex	`@section`	+
`\seealso`	latex	`@seealso`	+
`\source`	latex	`@source`	+
`\title`	latex	`@title`	+
`\value`	latex	`@value`	+

`\examples`	R	`@examples`	-
`\usage`	R	`@usage`	-

`\alias`	verbatim	`@aliases`	-
`\Rdversion`	verbatim		-
`\synopsis`	verbatim		-

`\Sexpr`	R		-
`\RdOpts`	verbatim		-

Markup macros within sections taking LaTeX-like text

Macro	Text type	MD
`\acronym`	latex	-?
`\bold`	latex	+
`\cite`	latex	+
`\command`	latex	+?
`\dfn`	latex	+
`\dQuote`	latex	+
`\email`	latex	-
`\emph`	latex	+
`\file`	latex	-
`\item`	latex	+
`\linkS4class`	latex	-
`\pkg`	latex	-
`\sQuote`	latex	+
`\strong`	latex	+
`\var`	latex	-

`\describe`	latex	+
`\enumerate`	latex	+
`\itemize`	latex	+

`\enc`	latex	+
`\if`	latex	+2
`\ifelse`	latex	+23
`\method`	latex	-
`\S3method`	latex	-
`\S4method`	latex	-
`\tabular`	latex	+
`\subsection`	latex	+

`\link`	latex	-
`\href`	verb+latex	+2

+2 means that MD is parsed in the second argument only. +23 means that MD is parsed in the second and third arguments, but not in the first, etc.
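One way to read the `+2` / `+23` notation is as a per-macro list of argument positions. A small sketch in Python, with a hypothetical `MD_SPEC` table holding a few rows copied from the tables in this comment; treating unlisted macros as `-` is an assumption of the example, not something decided in this thread:

```python
# A few rows from the tables, keyed by macro name without the backslash.
MD_SPEC = {
    "emph": "+",
    "code": "-",
    "if": "+2",
    "ifelse": "+23",
    "href": "+2",
}


def md_arguments(macro, n_args):
    """Return the 1-based positions of the arguments parsed as Markdown."""
    spec = MD_SPEC.get(macro, "-")  # assumption: unknown macros stay verbatim
    if spec == "-":
        return []
    if spec == "+":
        return list(range(1, n_args + 1))
    # "+23" style: every digit after the plus sign is an argument position.
    return [int(d) for d in spec[1:] if int(d) <= n_args]
```

For example, `md_arguments("ifelse", 3)` selects the second and third arguments and leaves the first (the format name) untouched.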

Markup macros within sections taking R-like text, or verbatim text.

Note that some macros do not take arguments; these are not important for our purposes: \cr, \dots, \ldots, \R, \tab.

Macro	Text type	MD
`\code`	R	-?
`\dontshow`	R	-
`\donttest`	R	-
`\testonly`	R	-

`\dontrun`	verbatim	-
`\env`	verbatim	-
`\kbd`	verbatim	-
`\option`	verbatim	-
`\out`	verbatim	-
`\preformatted`	verbatim	-
`\samp`	verbatim	-
`\special`	verbatim	-
`\url`	verbatim	-
`\verb`	verbatim	-
`\deqn`	verbatim	-
`\eqn`	verbatim	-
`\newcommand`	verbatim	-?
`\renewcommand`	verbatim	-?

New functions, not in the original docs

Macro	Text type	Roxy tag	MD
`\figure`	special?		-

User defined macros

These are in share/Rd/macros/system.Rd and can be used in Rd files in general. Here is the current R-devel version: https://github.com/wch/r-source/blob/trunk/share/Rd/macros/system.Rd

Macro	MD
`\CRANpkg`	-
`\PR`	-
`\sspace`	-
`\packageTitle`	-
`\packageDescription`	-
`\packageAuthor`	-
`\packageMaintainer`	-
`\packageDESCRIPTION`	-
`\packageIndices`	-
`\doi`	-

Comments on some macros

\concept usually doesn't need MD markup, but maybe sometimes it does, so I would allow it for now.
I guess we don't need MD within \acronym.
Is it dangerous to parse MD within \command?
\code is tricky, because it is usually R code, so we don't want to parse it, but \link and \var are still interpreted. Anyway, within code, people can write Rd syntax.
\newcommand and \renewcommand are tricky, too. For now we don't parse them.


gaborcsardi · 2016-03-14T22:41:16Z

@hadley @jeroenooms

This is pretty close now I think.

I implemented the escaping of the "fragile" Rd tags: \code, \preformatted, etc. Within these there is no markdown parsing.
I also implemented the @noMd tag. It works at the block level, just like noRd, and uses a special marker in the tags environment, called markdown-support. Somewhat abusing the tags, but also seemed like a natural place to put it. I can certainly put it in another environment if you don't like this solution.

For the fragile tag parsing, I modified the rdComplete C code, because I needed almost the same code, to find the end of the arguments after an Rd tag.

Apart from writing the vignette, what would be the best form for documentation? Is a brief manual page, pointing to the vignette, enough? Just trying to avoid duplication, if possible....

Please test if you have time and you are brave. :) Just kidding, I think it will work mostly out of the box, and if not, you can always use @noMd.

Btw. do we (temporarily?) want to have an option to turn on markdown parsing and leave it off by default? Just to be on the safe side.

gaborcsardi · 2016-03-30T03:30:41Z

I fixed some bugs, and found a new one. More precisely, it is actually not really a bug, but it is still annoying. These docs:
https://github.com/hadley/ggplot2/blob/9b5e097e0aafdca19b5e8f9d2153177eeba809fb/R/data.R#L117
are interpreted as an ordered list:

    ❯ cat(markdown_xml("qqqq\n  1. eeee fff ggg\nrrrr xvxcvxcvxcv"))
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE document SYSTEM "CommonMark.dtd">
    <document xmlns="http://commonmark.org/xml/1.0">
      <paragraph>
        <text>qqqq</text>
      </paragraph>
      <list type="ordered" start="1" delim="period" tight="true">
        <item>
          <paragraph>
            <text>eeee fff ggg</text>
            <softbreak />
            <text>rrrr xvxcvxcvxcv</text>
          </paragraph>
        </item>
      </list>
    </document>

But of course sometimes these are really meant to be ordered lists. So I am not sure what to do about this, if anything. (Other than making sure that there is no error at least, like for ggplot2.)

gaborcsardi · 2016-03-31T12:36:27Z

This is the diff between original roxygen and roxygen with this PR, for ggplot2: https://gist.github.com/gaborcsardi/baed9ffd55c072425bb926595b144324

1. Leading whitespace is deleted (you don't see this in the diff, because I ignored whitespace there).
2. $, _ and % are not escaped.
3. A whole \section{}{} is removed.
4. Four leading spaces are picked up as \preformatted{} when they ideally shouldn't be.
5. The spurious ordered list, mentioned in #431 (comment).

The first three I can fix, those are bugs.

The last two are supposedly features in commonmark, which are better fixed in ggplot2 I think.

I will fix these and then run it on more packages to see other possible bugs.

gaborcsardi · 2016-04-04T00:22:20Z

So, if we don't mind deviating from commonmark, we can preprocess the text, so that

Ordered lists are only picked up after an empty line
The same for block quotes

I.e. this is an ordered list:

blah blah blah

1. first
2. second

but this is not:

blah blah blah April 3,
2016.
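For illustration, such a preprocessing step might look like the sketch below. It is an assumption about how the rule could be implemented, not code from this PR: list and block-quote markers that directly follow a paragraph line are backslash-escaped, which CommonMark reads as literal text. `MARKER` and `require_blank_line` are hypothetical names, and Python is used only to keep the sketch short.

```python
import re

# An ordered-list marker ("1." or "1)") or a block-quote marker (">"),
# indented by at most three spaces.
MARKER = re.compile(r"^( {0,3})(?:(\d{1,9})([.)])(?=\s|$)|(>))")


def require_blank_line(text):
    """Escape markers that would otherwise interrupt a paragraph."""
    out = []
    prev_blank = True   # the start of the text counts as "after an empty line"
    in_block = False    # inside a list or quote that started legitimately
    for line in text.split("\n"):
        m = MARKER.match(line)
        if not line.strip():
            prev_blank, in_block = True, False
        elif m and (prev_blank or in_block):
            # A real list item or quote line: leave it alone.
            prev_blank, in_block = False, True
        elif m:
            indent, digits, delim, quote = m.groups()
            rest = line[m.end():]
            if quote:
                line = indent + "\\>" + rest
            else:
                line = indent + digits + "\\" + delim + rest
            prev_blank = False
        else:
            prev_blank = False
        out.append(line)
    return "\n".join(out)
```

With this rule the first example is left untouched, while the `2016.` line in the second becomes `2016\.` before it reaches the Markdown parser.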

@jeroenooms?

gaborcsardi · 2016-04-04T00:24:06Z

Btw. @hadley, thanks for the +1, please comment next time, because it seems that I don't get a notification about +1s. I am not sure if this is for the good or bad. Thanks.

jeroen · 2016-04-04T00:33:07Z

I suggest we stick with the commonmark standard and put the responsibility of using proper markdown with the package authors. I think these edge cases are quite rare, and defining a different markdown only makes things more confusing and error prone. People will have to deal with the same issue when using markdown in vignettes or elsewhere.


Ideally we should keep leading ws in general, because that's what Roxygen currently does. But it is also tricky, because the ws might be followed by an ordered or unordered list, or a > for a block quote, etc. The parsed XML does not have the ws any more, so we need to handle it before the markdown run.

hadley · 2016-04-04T13:25:01Z

Yeah, I'd say stick with commonmark as well.

gaborcsardi · 2016-04-05T09:53:07Z

OK, I fixed all the bugs I noticed. Still needs some testing.

I think we should turn it off by default, because I cannot reliably fix the removal of the leading whitespace. Sometimes leading whitespace is meaningful for markdown, e.g.

* This is 
* A list
    - With an embedded
    - List inside

I cannot "escape" the leading whitespace in this case, because then commonmark does not parse the list properly. So I think it is better to turn it off by default for now.

@hadley Is there a standard way of specifying Roxygen options for the whole package? If not, then maybe we could have one using the RoxygenNote field?

In addition it would make sense to have an @Md tag as well, to turn it on for a single chunk.

gaborcsardi · 2016-04-06T13:18:42Z

@hadley If there is no way currently to set options for the whole package, how about using the RoxygenNote DESCRIPTION field for that? We could have something like:

RoxygenNote: 5.0.1, markdown = TRUE

I would do this in another PR.

hadley · 2016-04-06T14:45:13Z

Maybe for this version we should just let people opt in with @md? And then in the next version we could think about global switches?

gaborcsardi · 2016-04-06T14:48:32Z

@hadley I was thinking about that, too. It is kinda painful to opt in for every single chunk.

gaborcsardi · 2016-04-06T14:48:48Z

But sure, for now, we can do that.


gaborcsardi · 2016-04-07T00:36:06Z

OK, it is opt-in now. There is an @md tag instead of the @noMd.
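As a rough illustration of a block-level opt-in, a roxygen block could be scanned for the tag before deciding whether to run the Markdown parser at all. The names `MD_TAG` and `block_uses_markdown` are hypothetical, and the sketch (in Python, for brevity) ignores how roxygen actually tokenizes tags:

```python
import re

# A roxygen comment line consisting of nothing but the @md tag.
MD_TAG = re.compile(r"^\s*#'\s*@md\s*$")


def block_uses_markdown(block_lines):
    """True if any line of the roxygen block is a bare @md tag."""
    return any(MD_TAG.match(line) for line in block_lines)
```

Blocks without the tag would then go through the existing Rd-only path unchanged, which is what keeps the existing test cases passing.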

This way I think it is pretty safe to merge, in the end I did not modify any existing test cases, and all of them pass.

There is a single item on the TODO list, the vignette about markdown mentioned in #431 (comment). Maybe I can just write that in another PR, and we can start testing this. :)

EDIT: I mean, start using this. :)

gaborcsardi · 2016-04-07T08:28:16Z

@hadley Btw. when is the next Roxygen release planned?

Just because I am excited to start using this. :) And also, I need to write the vignette before that.....

Anyway, if you missed the previous comment because you are not watching this repo, this is now ready to be merged I think.

jeroen · 2016-04-14T20:38:58Z

@gaborcsardi looking into supporting readthedocs for R documentation with karthik. They also support commonmark as an input format http://docs.readthedocs.org/en/latest/getting_started.html.

gaborcsardi · 2016-04-15T08:03:40Z

@jeroenooms That would be amazing!

Somewhat different from this PR, though. Depending on what exactly you want. I guess there you want some Rd -> markdown translation, whereas this PR is the opposite way.

Anyway, getting something out from Rd and/or roxygen that you can put on readthedocs would be super nice.

hadley · 2016-08-29T19:37:58Z

@gaborcsardi I'm working on roxygen2 again, aiming for a release around September 16. I'm travelling quite a bit so I'm not sure I'll be tackling any big problems for this release, but I'm definitely available to give feedback and discuss options.

gaborcsardi · 2016-08-29T21:24:32Z

OK. I'll rebase this, and also write the vignette.

Since this is opt-in, I think it is fairly safe to merge.

It would be still nice to turn it on for the whole package. :)

gaborcsardi · 2016-08-29T21:32:44Z

@hadley A technical issue. I deleted my fork in the meantime, so I cannot change this PR. I'll close it down and open another one.

hadley reviewed Nov 9, 2015

gaborcsardi added 8 commits March 13, 2016 17:22

Parse markdown within fields

7e47f53

If \preformatted, then keep leading whitespace

6c59069

Leading whitespace is usually removed by commonmark, unless we are in ```, or in a list, etc.

Support Markdown style links to functions, etc.

73e5b7b

Fix NEWS.md file

0d1ea0b

Test case for Markdown & Rd interplay

6234568

This breaks currently, but it should pass once we ignore \preformatted when parsing markdown.

@Keywords is not parsed as markdown

a52c930

Escape fragile Rd tags before markdown processing

f2e131b

Support @nomd

378dab9

gaborcsardi added 3 commits March 14, 2016 23:12

Fix NEWS file, move new entry up

80eef5b

@method is not parsed as markdown

dfe5f5e

Fix for inline_html -> html_inline commonmark tag name change

30cbb82

Fix list items without contents

68eaaa3

gaborcsardi added 2 commits April 3, 2016 19:33

Don't let commonmark unescape \\%, \\_, \\$

a7c9030

We double-escape them before running the markdown parser.
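The double-escaping idea can be illustrated in isolation. The sketch below models only CommonMark's backslash-escape rule (a backslash before an ASCII punctuation character yields that character) with a hypothetical `unescape_like_commonmark` helper; it does not run a real Markdown parser, so emphasis and other inline rules are not modelled. `double_escape` is an assumed name, not the function in this commit.

```python
import re


def double_escape(text):
    """Turn \\%, \\_ and \\$ (one backslash each) into two backslashes + char."""
    return re.sub(r"\\([%_$])", lambda m: "\\\\" + m.group(1), text)


def unescape_like_commonmark(text):
    """Model of CommonMark backslash escapes: drop a backslash that
    precedes an ASCII punctuation character."""
    return re.sub(r"\\([!-/:-@\[-`{-~])", lambda m: m.group(1), text)
```

Without the extra backslash the unescaping pass would strip the Rd escapes; with it, the pass consumes the added backslash and hands back the original text.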

gaborcsardi mentioned this pull request Apr 5, 2016

Error compiling documentation gaborcsardi/maxygen#5

Closed

gaborcsardi added 3 commits April 7, 2016 01:27

Markdown opt-in instead of opt-out

37f1787

So we have an @md tag now, instead of @nomd.

Test that markdown is off by default

6319527

Remove two skipped markdown tests

444f5e3

We'll add them back once we'll have a @nomd tag.

gaborcsardi closed this Aug 29, 2016

gaborcsardi mentioned this pull request Aug 29, 2016

Markdown, take 2 #496

Merged

gaborcsardi mentioned this pull request Sep 22, 2016

Global option for markdown #515

Merged

r2evans mentioned this pull request Sep 30, 2016

forward tokenization failed within roxygen2 examples emacs-ess/ESS#375

Closed
