
# Markdown#431

Closed
wants to merge 17 commits into from

## Conversation


### gaborcsardi commented Nov 7, 2015

From #365. This is not ready for merging yet, but it is close. :)

This PR adds markdown parsing for fields: title, description, details, references, concept, note, seealso, keywords, return, author, section, format, source, param, slot, field, method. It should leave Rd notation unaffected.

One glitch I noticed is that the commonmark parser removes leading whitespace. This is mostly fine for Rd, where whitespace is not significant except within \preformatted{}, so I have a workaround for that. (FIXME: any other case where leading whitespace matters?)

It is possible that some Rd notation is picked up as markdown, but all the examples I could come up with were quite artificial. Nevertheless we should have a @nomarkdown tag that forbids Markdown parsing.

For the supported notation, see the test cases and the readme at maxygen: https://github.com/gaborcsardi/maxygen

TODO:

• Write a vignette, once we have decided about all the details.
• Add a @nomarkdown tag. Or @noMd is better.

Feedback welcome!
 @@ -7,6 +7,10 @@
 * Fixed bug in @noRd, where usage would cause error (#418).
 * Most fields can now be written using Markdown markup instead of the

FWIW I think it's more robust to always put new items at the top of the list

Done.

### gaborcsardi commented Nov 12, 2015

 What do you think about the @nomarkdown tag? Is that OK?

### hadley commented Nov 12, 2015

 I think it should be @noMarkdown to be consistent with other tags, or maybe just @noMd to be consistent with @noRd

### gaborcsardi commented Nov 12, 2015

 OK, @noMd it is.

### gaborcsardi commented Nov 15, 2015

Here is a problem I just found. The Md emphasis markers are picked up across lines, and this is sometimes a problem. E.g.

    #' Title
    #'
    #' Description with some *keywords* included.
    #' So far so good. \\preformatted{ *these are not
    #' emphasised*. Or are they?
    #' }

results in

    \description{
    Description with some \emph{keywords} included.
    So far so good. \preformatted{ \emph{these are not
    emphasised}. Or are they?
    }
    }

This is of course correct for Md, but it will screw up Rd. _ is similar, I guess. Possible solutions:

1. Escape (replace, really) * and _ within \preformatted and \eqn before the markdown parsing, and then put them back.
2. Do not parse markdown within \preformatted and \eqn at all.

The second is more sensible, I think. I would remove them completely before the parsing, and then put them back at the appropriate place after. You would not want to write Md within \preformatted and \eqn anyway, right?

Thoughts? @hadley @jeroenooms ?
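To make the second solution concrete, here is a minimal sketch in Python (chosen only for illustration; it is not roxygen2 code). It assumes the protected macros contain no nested braces, and it uses a one-rule stand-in for the Markdown parser that only turns `*text*` into `\emph{text}`. The names `naive_markdown` and `protected_markdown` are hypothetical.

```python
import re

# Stand-in for a Markdown parser: only converts *text* to \emph{text}.
# Like real Markdown emphasis, a match may span a line break.
EMPH = re.compile(r"\*(.+?)\*", re.DOTALL)

# Assumption: the protected macros contain no nested braces.
FRAGILE = re.compile(r"\\(?:preformatted|eqn)\{[^{}]*\}")


def naive_markdown(text):
    """Apply the emphasis rule everywhere, including inside Rd macros."""
    return EMPH.sub(lambda m: "\\emph{" + m.group(1) + "}", text)


def protected_markdown(text):
    """Cut fragile macros out, convert the rest, then put the macros back."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        # NUL-delimited index as a marker that cannot occur in normal docs.
        return "\x00%d\x00" % (len(saved) - 1)

    masked = FRAGILE.sub(stash, text)
    converted = naive_markdown(masked)
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], converted)


doc = (
    "Description with some *keywords* included.\n"
    "So far so good. \\preformatted{ *these are not\n"
    "  emphasised*. Or are they?\n"
    "}"
)
print(protected_markdown(doc))
```

With the protection step, `*keywords*` is still converted, while the text inside `\preformatted{}` comes back unchanged.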

### gaborcsardi commented Nov 18, 2015

I chatted with @jeroenooms about this. Here are three good options to make sure that Rd markup still works as expected after merging this PR.

1. The first one is simple: we require a @md tag for docs that are in Markdown. If @md is present, everything is parsed as md. This could be specified at the package level or the R object level.
2. The second one is simple conceptually, but a bit harder to implement. We would protect all Rd commands from the Markdown parser, so they would not be parsed at all. This also means that if you have an \itemize{}, you cannot use Markdown in it. The implementation would just replace Rd tags with markers before calling the md parser, and then put them back afterwards. This is similar to how Markdown is not interpreted within HTML.
3. The third one is similar to the previous one, but we would only apply it to Rd tags that are potentially dangerous: \code, \preformatted, \deqn and \eqn, etc. (the complete list can be assembled from https://developer.r-project.org/parseRd.pdf). We would protect tags which most probably do not contain markdown markup, but might contain text, code or markup that is accidentally interpreted by the markdown parser. The only annoying thing with this solution is that if your Rd docs are still picked up as md, there might be no errors or warnings; your docs output will just be incorrect. But I don't think this would happen too often.

I think I prefer number three. It is potentially (slightly) dangerous, but it would allow a seamless transition, and I think it would work just fine in almost all cases.

@hadley What do you think?

### hadley commented Dec 2, 2015

 3 sounds good to me.

### gaborcsardi commented Jan 1, 2016

This is actually quite tricky. Ideally you would want to parse the text with Rd first, so that you can be sure that you find all the Rd tags, without any errors. But of course we don't want to do that, and with the markdown markup included, the text might not even parse as Rd.

So the plan is to have a very simple parser that only cares about the \, {, } and % characters. It would find the Rd macro arguments that we want to ignore, and replace them with markers before the Md parsing, and then put them back at the end.

I am a bit afraid that writing this in R will be somewhat slow, but we'll see. I'll include the list of all Rd macros, and how we would handle them, in another comment.
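A rough sketch of such a scanner, written in Python purely for illustration (the PR itself talks about R and C). It looks only at `\`, `{`, `}` and `%`, treats `%` as a comment running to the end of the line, and ignores everything else, including quoted strings in R-like text. The function names and the short `FRAGILE` list are assumptions made for this example.

```python
import re

# Illustrative subset of macros whose argument should be left alone.
FRAGILE = ("code", "preformatted", "eqn", "deqn")


def find_argument_end(text, start):
    """Return the index just past the '}' closing the '{' at text[start].

    Returns -1 if the braces never balance.
    """
    if start >= len(text) or text[start] != "{":
        raise ValueError("start must point at an opening brace")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \%
            continue
        if ch == "%":
            newline = text.find("\n", i)  # comment runs to end of line
            if newline == -1:
                return -1
            i = newline + 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1


def fragile_spans(text):
    """List (start, end) offsets of fragile macros with their first argument."""
    pattern = re.compile(r"\\(?:" + "|".join(FRAGILE) + r")\{")
    spans = []
    pos = 0
    while True:
        match = pattern.search(text, pos)
        if match is None:
            break
        end = find_argument_end(text, match.end() - 1)
        if end == -1:
            break  # unbalanced braces: stop rather than guess
        spans.append((match.start(), end))
        pos = end
    return spans


text = "See \\code{a \\{ b} and \\preformatted{x {y} % } comment\nz} end"
for start, end in fragile_spans(text):
    print(repr(text[start:end]))
```

Each span can then be swapped for a marker before the Markdown run and restored afterwards. A second argument, as `\eqn` and `\deqn` may take, is not handled here.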

## Summary

The tables below contain all Rd macros, according to https://developer.r-project.org/parseRd.pdf and Writing R Extensions.

+ in the MD column means that we will parse the contents of the argument(s) of the macro as Markdown. - means that we will not, and roxygen will treat them verbatim.

For some macros I am not quite sure what to do. These are marked with a question mark, and there is a short discussion about them at the end.

## Sectioning macros

| Macro | Text type | Roxy tag | MD |
|---|---|---|---|
| \arguments | latex | implicit | + |
| \author | latex | @author | + |
| \concept | latex | @concepts | +? |
| \description | latex | @description | + |
| \details | latex | @details | + |
| \docType | latex | @docType | - |
| \encoding | latex | @encoding | - |
| \format | latex | @format | + |
| \keyword | latex | @keywords | - |
| \name | latex | @name | - |
| \note | latex | @note | + |
| \references | latex | @references | + |
| \section | latex | @section | + |
| \seealso | latex | @seealso | + |
| \source | latex | @source | + |
| \title | latex | @title | + |
| \value | latex | @value | + |
| \examples | R | @examples | - |
| \usage | R | @usage | - |
| \alias | verbatim | @aliases | - |
| \Rdversion | verbatim | | - |
| \synopsis | verbatim | | - |
| \Sexpr | R | | - |
| \RdOpts | verbatim | | - |

## Markup macros within sections taking LaTeX-like text

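| Macro | Text type | Roxy tag | MD |
|---|---|---|---|
| \acronym | latex | | -? |
| \bold | latex | | + |
| \cite | latex | | + |
| \command | latex | | +? |
| \dfn | latex | | + |
| \dQuote | latex | | + |
| \email | latex | | - |
| \emph | latex | | + |
| \file | latex | | - |
| \item | latex | | + |
| \linkS4class | latex | | - |
| \pkg | latex | | - |
| \sQuote | latex | | + |
| \strong | latex | | + |
| \var | latex | | - |
| \describe | latex | | + |
| \enumerate | latex | | + |
| \itemize | latex | | + |
| \enc | latex | | + |
| \if | latex | | +2 |
| \ifelse | latex | | +23 |
| \method | latex | | - |
| \S3method | latex | | - |
| \S4method | latex | | - |
| \tabular | latex | | + |
| \subsection | latex | | + |
| \link | latex | | - |
| \href | verb+latex | | +2 |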

+2 means that MD is parsed in the second argument only. +23 means that MD is parsed in the second and third arguments, but not in the first, etc.
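The `+`, `-`, `+2` and `+23` notation amounts to a lookup from macro name to the set of argument positions that get Markdown parsing. A small Python sketch of that idea, with a handful of rows transcribed from the tables; the dictionary name, the function name and the default for unknown macros are assumptions for illustration:

```python
# Macro name -> 1-based argument positions parsed as Markdown.
# "+" on a one-argument macro becomes {1}, "-" becomes the empty set,
# "+2" becomes {2} and "+23" becomes {2, 3}.
MD_ARGS = {
    "emph": {1},
    "bold": {1},
    "if": {2},
    "ifelse": {2, 3},
    "href": {2},
    "link": set(),
    "email": set(),
    "preformatted": set(),
}


def parses_markdown(macro, arg_index):
    """Return True if argument `arg_index` (1-based) of `macro` is Markdown.

    Assumption: macros missing from the table are left verbatim.
    """
    return arg_index in MD_ARGS.get(macro, set())


print(parses_markdown("ifelse", 1), parses_markdown("ifelse", 3))
```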

## Markup macros within sections taking R-like text, or verbatim text.

Note that some macros do not take arguments; these are not important for our purposes: \cr, \dots, \ldots, \R, \tab.

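| Macro | Text type | Roxy tag | MD |
|---|---|---|---|
| \code | R | | -? |
| \dontshow | R | | - |
| \donttest | R | | - |
| \testonly | R | | - |
| \dontrun | verbatim | | - |
| \env | verbatim | | - |
| \kbd | verbatim | | - |
| \option | verbatim | | - |
| \out | verbatim | | - |
| \preformatted | verbatim | | - |
| \samp | verbatim | | - |
| \special | verbatim | | - |
| \url | verbatim | | - |
| \verb | verbatim | | - |
| \deqn | verbatim | | - |
| \eqn | verbatim | | - |
| \newcommand | verbatim | | -? |
| \renewcommand | verbatim | | -? |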

## New functions, not in the original docs

| Macro | Text type | Roxy tag | MD |
|---|---|---|---|
| \figure | special? | | - |

## User defined macros

These are in share/Rd/macros/system.Rd and can be used in Rd files in general. Here is the current R-devel version: https://github.com/wch/r-source/blob/trunk/share/Rd/macros/system.Rd

| Macro | MD |
|---|---|
| \CRANpkg | - |
| \PR | - |
| \sspace | - |
| \packageTitle | - |
| \packageDescription | - |
| \packageAuthor | - |
| \packageMaintainer | - |
| \packageDESCRIPTION | - |
| \packageIndices | - |
| \doi | - |

• \concept usually doesn't need MD markup, but maybe sometimes it does, so I would allow it for now.
• I guess we don't need MD within \acronym.
• Is it dangerous to parse MD within \command?
• \code is tricky, because it is usually R code, so we don't want to parse it, but \link and \var are still interpreted. Anyway, within code, people can write Rd syntax.

added 8 commits Mar 13, 2016
 Parse markdown within fields 
 7e47f53 
 If \preformatted, then keep leading whitespace 
 6c59069 
Leading whitespace is usually removed by commonmark,
unless we are in , or in a list, etc.
 Support Markdown style links to functions, etc. 
 73e5b7b 
 Fix NEWS.md file 
 0d1ea0b 
 Test case for Markdown & Rd interplay 
 6234568 
This breaks currently, but it should pass once we ignore
\preformatted when parsing markdown.
 @keywords is not parsed as markdown 
 a52c930 
 Escape fragile Rd tags before markdown processing 
 f2e131b 
 Support @nomd 
 378dab9 

### gaborcsardi commented Mar 14, 2016

This is pretty close now, I think. I implemented the escaping of the "fragile" Rd tags: \code, \preformatted, etc. Within these there is no markdown parsing.

I also implemented the @noMd tag. It works at the block level, just like @noRd, and uses a special marker in the tags environment, called markdown-support. This somewhat abuses the tags, but it also seemed like a natural place to put it. I can certainly put it in another environment if you don't like this solution.

For the fragile tag parsing, I modified the rdComplete C code, because I needed almost the same code to find the end of the arguments after an Rd tag.

Apart from writing the vignette, what would be the best form for documentation? Is a brief manual page, pointing to the vignette, enough? Just trying to avoid duplication, if possible....

Please test if you have time and you are brave. :) Just kidding, I think it will work mostly out of the box, and if not, you can always use @noMd.

Btw. do we (temporarily?) want to have an option to turn on markdown parsing and leave it off by default? Just to be on the safe side.

added 3 commits Mar 14, 2016
 Fix NEWS file, move new entry up 
 80eef5b 
 @method is not parsed as markdown 
 dfe5f5e 
 Fix for inline_html -> html_inline commonmark tag name change 
 30cbb82 

### gaborcsardi commented Mar 30, 2016

I fixed some bugs, and found a new one. More precisely, it is actually not really a bug, but it is still annoying. These docs: https://github.com/hadley/ggplot2/blob/9b5e097e0aafdca19b5e8f9d2153177eeba809fb/R/data.R#L117 are interpreted as an ordered list:

    ❯ cat(markdown_xml("qqqq\n 1. eeee fff ggg\nrrrr xvxcvxcvxcv"))
    qqqq eeee fff ggg rrrr xvxcvxcvxcv

But of course sometimes these are really meant to be ordered lists. So I am not sure what to do about this, if anything. (Other than making sure that there is no error at least, like for ggplot2.)
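One way a package author could look for such lines before turning Markdown on is a small lint pass. The Python sketch below is only a heuristic and not part of this PR: it flags lines that look like a CommonMark ordered-list marker and directly follow ordinary text. Whether the parser really starts a list there depends on the commonmark version and the surrounding context, and the function name is made up for this example.

```python
import re

# CommonMark ordered-list marker: up to three spaces of indentation,
# one to nine digits, then "." or ")", then whitespace.
MARKER = re.compile(r"^ {0,3}\d{1,9}[.)]\s")


def suspicious_list_lines(text):
    """Return 1-based numbers of marker lines that follow ordinary text.

    Marker lines after a blank line, or after another marker line, are
    assumed to be intentional lists and are not flagged.
    """
    flagged = []
    previous = ""
    for number, line in enumerate(text.split("\n"), start=1):
        is_marker = MARKER.match(line) is not None
        if is_marker and previous.strip() and not MARKER.match(previous):
            flagged.append(number)
        previous = line
    return flagged


print(suspicious_list_lines("qqqq\n 1. eeee fff ggg\nrrrr xvxcvxcvxcv"))
```

A continuation line inside a list item would make the next item look suspicious, so the output is a list of places to check by hand, nothing more.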

 Fix list items without contents 
 68eaaa3 

This is the diff between original roxygen and roxygen with this PR, for ggplot2: https://gist.github.com/gaborcsardi/baed9ffd55c072425bb926595b144324

• Leading whitespace is deleted (you don't see this in the diff, because in the diff I ignored the whitespace).
• $ _ and % are not escaped
• A whole \section{}{} is removed
• four leading spaces are picked up as \preformatted{} when they ideally shouldn't
• the spurious ordered list, mentioned in #431 (comment)

The first three I can fix, those are bugs. The last two are supposedly features in commonmark, which are better fixed in ggplot2 I think. I will fix these and then run it on more packages to see other possible bugs.

### gaborcsardi commented Apr 4, 2016

So, if we don't mind deviating from commonmark, we can preprocess the text, so that

• Ordered lists are only picked up after an empty line
• The same for block quotes

I.e. this is an ordered list: blah blah blah 1. first 2. second but this is not: blah blah blah April 3, 2016.

### gaborcsardi commented Apr 4, 2016

Btw. @hadley, thanks for the +1, please comment next time, because it seems that I don't get a notification about +1s. I am not sure if this is for the good or bad. Thanks.

### jeroen commented Apr 4, 2016

I suggest we stick with the commonmark standard and put the responsibility of using proper markdown with the package authors. I think these edge cases are quite rare, and defining a different markdown only makes things more confusing and error prone. People will have to deal with the same issue when using markdown in vignettes or elsewhere.

added 2 commits Apr 4, 2016
 Don't let commonmark unescape \\%, \\_, \\$ 
 a7c9030 
We double-escape them before running the markdown parser.
 Markdown: don't escape leading ws, \preformatted is protected 
 dcd305a 
Ideally we should keep leading ws in general, because that's what
Roxygen currently does. But it is also tricky, because the
ws might be followed by an ordered or unordered list, or a >
for a block quote, etc.

The parsed XML does not have the ws any more, so we need to
handle it before the markdown run.
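The double-escaping commit can be illustrated with a short Python sketch. CommonMark treats a backslash followed by ASCII punctuation as an escape and drops the backslash, so an Rd escape such as `\%` would come out as a bare `%`. Doubling the backslash first lets the parser consume one and leave one. The function `commonmark_unescape` below is a simplified stand-in for that single parser rule, not a real Markdown parser, and both function names are assumptions.

```python
import re

# Backslash followed by any ASCII punctuation character.
PUNCT_ESCAPE = re.compile(r"\\([!-/:-@\[-`{-~])")


def commonmark_unescape(text):
    """Stand-in for CommonMark backslash-escape handling."""
    return PUNCT_ESCAPE.sub(lambda m: m.group(1), text)


def double_escape(text):
    """Turn \\%, \\_ and \\$ (one backslash each) into two-backslash forms."""
    return re.sub(r"\\([%_$])", r"\\\\\1", text)


rd = r"50\% of \_x\_ costs \$5"
print(commonmark_unescape(rd))
print(commonmark_unescape(double_escape(rd)))
```

The first call loses the Rd escapes; the second returns the input unchanged.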

### hadley commented Apr 4, 2016

 Yeah, I'd say stick with commonmark as well.

mentioned this pull request Apr 5, 2016

### gaborcsardi commented Apr 5, 2016

OK, I fixed all the bugs I noticed. Still needs some testing. I think we should turn it off by default, because I cannot reliably fix the removal of the leading whitespace. Sometimes leading whitespace is meaningful for markdown, e.g.

    * This is
    * A list
      - With an embedded
      - List inside

I cannot "escape" the leading whitespace in this case, because then commonmark does not parse the list properly. So I think it is better to turn it off by default for now.

@hadley Is there a standard way of specifying Roxygen options for the whole package? If not, then maybe we could have one using the RoxygenNote field? In addition, it would make sense to have an @md tag as well, to turn it on for a single chunk.

### gaborcsardi commented Apr 6, 2016

@hadley If there is no way currently to set options for the whole package, how about using the RoxygenNote DESCRIPTION field for that? We could have something like:

    RoxygenNote: 5.0.1, markdown = TRUE

I would do this in another PR.

### hadley commented Apr 6, 2016

 Maybe for this version we should just let people opt in with @md? And then in the next version we could think about global switches?

### gaborcsardi commented Apr 6, 2016

 @hadley I was thinking about that, too. It is kinda painful to opt in for every single chunk.

### gaborcsardi commented Apr 6, 2016

 But sure, for now, we can do that.

added 3 commits Apr 7, 2016
 Markdown opt-in instead of opt-out 
 37f1787 
So we have an @md tag now, instead of @nomd.
 Test that markdown is off by default 
 6319527 
 Remove two skipped markdown tests 
 444f5e3 
We'll add them back once we'll have a @nomd tag.

### gaborcsardi commented Apr 7, 2016

OK, it is opt-in now. There is an @md tag instead of @noMd. This way I think it is pretty safe to merge: in the end I did not modify any existing test cases, and all of them pass.

There is a single item on the TODO list, the vignette about markdown mentioned in #431 (comment). Maybe I can just write that in another PR, and we can start testing this. :) EDIT: I mean, start using this. :)
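The opt-in rule is easy to state as code: a block is treated as Markdown only if it carries a bare @md tag, and is left alone otherwise. The Python sketch below illustrates just that rule. How roxygen actually stores and reads the tag is not shown in this thread, so the function name and the line-based check are assumptions.

```python
import re

# A roxygen comment line consisting of nothing but the @md tag.
MD_TAG = re.compile(r"^#'\s*@md\s*$")


def uses_markdown(block_lines):
    """Return True if the roxygen block opts in with a bare @md tag."""
    return any(MD_TAG.match(line.strip()) for line in block_lines)


with_tag = ["#' Title", "#'", "#' Some *text*.", "#' @md"]
without_tag = ["#' Title", "#'", "#' Some *text*."]
print(uses_markdown(with_tag), uses_markdown(without_tag))
```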

### gaborcsardi commented Apr 7, 2016

 @hadley Btw. when is the next Roxygen release planned? Just because I am excited to start using this. :) And also, I need to write the vignette before that..... Anyway, if you missed the previous comment because you are not watching this repo, this is now ready to be merged I think.

### jeroen commented Apr 14, 2016

 @gaborcsardi looking into supporting readthedocs for R documentation with karthik. They also support commonmark as an input format http://docs.readthedocs.org/en/latest/getting_started.html.

### gaborcsardi commented Apr 15, 2016

 @jeroenooms That would be amazing! Somewhat different from this PR, though. Depending on what exactly you want. I guess there you want some Rd -> markdown translation, whereas this PR is the opposite way. Anyway, getting something out from Rd and/or roxygen that you can put on readthedocs would be super nice.

### hadley commented Aug 29, 2016

 @gaborcsardi I'm working on roxygen2 again, aiming for a release around September 16. I'm travelling quite a bit so I'm not sure I'll be tackling any big problems for this release, but I'm definitely available to give feedback and discuss options.

### gaborcsardi commented Aug 29, 2016

 OK. I'll rebase this, and also write the vignette. Since this is opt-in, I think it is fairly safe to merge. It would still be nice to be able to turn it on for the whole package. :)

### gaborcsardi commented Aug 29, 2016

 @hadley A technical issue. I deleted my fork in the meanwhile, so I cannot change this PR. I'll close it down and open another one.

mentioned this pull request Aug 29, 2016
mentioned this pull request Sep 22, 2016
mentioned this pull request Sep 30, 2016