-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Markdown #431
Markdown #431
Conversation
@@ -7,6 +7,10 @@ | |||
|
|||
* Fixed bug in `@noRd`, where usage would cause error (#418). | |||
|
|||
* Most fields can now be written using Markdown markup instead of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think it's more robust to always put new items at the top of the list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
What do you think about the |
I think it should be |
OK, |
Here is a problem I just found. The Md emphasis markers are picked up from different lines, and this is sometimes a problem. E.g.
results
This is of course correct for Md, but it will screw up Rd.
The second is more sensible, I think. I would remove them completely before the parsing, Toughts? @hadley @jeroenooms ? |
I chatted with @jeroenooms about this. Here are three good options to make sure that Rd markup still works as expected after merging this PR.
I think I prefer number three. It is potentially (slightly) dangerous, but it would allow a seamless transition, and I think it would work just fine in almost all cases. @hadley What do you think? |
3 sounds good to me. |
This is actually quite tricky. Ideally you would want to parse the text with Rd first, so that you can be sure that you find all the Rd tags, without any errors. But of course we don't want to do that, and with the markdown markup included, the text might not even parse as Rd. So the plan is to have a very simple parser, that only cares about I am a bit afraid that writing this in R will be somewhat slow, but we'll see. I'll include the list of all Rd macros, and how we would handle them, in another comment. |
SummaryThe tables below contain all Rd macros, according to https://developer.r-project.org/parseRd.pdf and Writing R Extensions.
For some macros, I am not quite sure what to do, these are marked with a question mark, and there is a short discussion about them at the end. Sectioning macros
Markup macros within sections taking LaTeX-like text
Markup macros within sections taking R-like text, or verbatim text.Note, that some macros do not take arguments, these are not important for our purposes:
New functions, not in the original docs
User defined macrosThere are in
Comments on some macros
|
Leading whitespace is usually removed by commonmark, unless we are in ```, or in a list, etc.
This breaks currently, but it should pass once we ignore \preformatted when parsing markdown.
This is pretty close now I think.
For the fragile tag parsing, I modified the Apart from writing the vignette, what would be the best form for documentation? Is a brief manual page, pointing to the vignette, enough? Just trying to avoid duplication, if possible.... Please test if you have time and you are brave. :) Just kidding, I think it will work mostly out of the box, and if not, you can always use Btw. do we (temporarily?) want to have an option to turn on markdown parsing and leave it off by default? Just to be on the safe side. |
I fixed some bugs, and found a new one. More precisely, is actually not really a bug, but it is still annoying. This docs: ❯ cat(markdown_xml("qqqq\n 1. eeee fff ggg\nrrrr xvxcvxcvxcv"))
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<paragraph>
<text>qqqq</text>
</paragraph>
<list type="ordered" start="1" delim="period" tight="true">
<item>
<paragraph>
<text>eeee fff ggg</text>
<softbreak />
<text>rrrr xvxcvxcvxcv</text>
</paragraph>
</item>
</list>
</document> But of course sometimes these are really meant to be ordered lists. So I am not sure what to do about this, if anything. (Other than making sure that there is not error at least, like for ggplot2.) |
This is the diff between original roxygen and roxygen with this PR, for ggplot2: https://gist.github.com/gaborcsardi/baed9ffd55c072425bb926595b144324
The first three I can fix, those are bugs. The last two are supposedly features in commonmark, which are better fixed in ggplot2 I think. I will fix these and then run it on more packages to see other possible bugs. |
So, if we don't mind deviating from commonmark, we can preprocess the text, so that
I.e. this is an ordered list: blah blah blah
1. first
2. second but this is not: blah blah blah April 3,
2016. |
Btw. @hadley, thanks for the +1, please comment next time, because it seems that I don't get a notification about +1s. I am not sure if this is for the good or bad. Thanks. |
I suggest we to stick with the commonmark standard and put the responsibility of using proper markdown with the package authors. I think these edge cases are quite rare, and defining a different markdown only makes things more confusing and error prone. People will have to deal with the same issue when using markdown in vignettes or elsewhere. |
We double-escape them before running the markdown parser.
Ideally we should keep leading ws in general, because that's what Roxygen currently does. But it is also tricky, because the ws might be followed by an ordered or unordered list, or a > for a block quote, etc. The parsed XML does not have the ws any more, so we need to handle it before the markdown run.
Yeah, I'd say stick with common as well. |
OK, I fixed all the bugs I noticed. Still needs some testing. I think we should turn it off by default, because I cannot reliably fix the removal of the leading whitespace. Sometimes leading whitespace is meaningful for markdown, e.g.
I cannot "escape" the leading whitespace in this case, because then commonmark does not parse the list properly. So I think it is better to turn it off by default for now. @hadley Is there a standard way of specifying Roxygen options for the whole package? If not, then maybe we could have one using the In addition it would make sense to have an |
@hadley If there is no way currently to set options for the whole package, how about using the
I would do this in another PR. |
Maybe for this version we should just let people opt in with |
@hadley I was thinking about that, too. It is kinda painful to opt in for every single chunk. |
But sure, for now, we can do that. |
We'll add them back once we'll have a @nomd tag.
OK, it is opt-in now. There is an This way I think it is pretty safe to merge, in the end I did not modify any existing test cases, and all of them pass. There is a single item on the TODO list, the vignette about markdown mentioned in #431 (comment). Maybe I can just write that in another PR, and we can start testing this. :) EDIT: I mean, start using this. :) |
@hadley Btw. when is the next Roxygen release planned? Just because I am excited to start using this. :) And also, I need to write the vignette before that..... Anyway, if you missed the previous comment because you are not watching this repo, this is now ready to be merged I think. |
@gaborcsardi looking into supporting readthedocs for R documentation with karthik. They also support commonmark as an input format http://docs.readthedocs.org/en/latest/getting_started.html. |
@jeroenooms That would be amazing! Somewhat different from this PR, though. Depending on what exactly you want. I guess there you want some Rd -> markdown translation, whereas this PR is the opposite way. Anyway, getting something out from Rd and/or roxygen that you can put on readthedocs would be super nice. |
@gaborcsardi I'm working on roxygen2 again, aiming for a release around September 16. I'm travelling quite a bit so I'm not sure I'll be tackling any big problems for this release, but I'm definitely available to give feedback and discuss options. |
OK. I'll rebase this, and also write the vignette. Since this is opt-in, I think it is fairly safe to merge. It would be still nice to turn it on for the whole package. :) |
@hadley A technical issue. I deleted my fork in the meanwhile, so I cannot change this PR. I'll close it down and open another one. |
From #365. This is not ready yet for merging, but it is close. :)
This PR adds markdown parsing for fields: title, description, details, references, concept, note, seealso, keywords, return, author, section, format, source, param, slot, field, method.
It should leave Rd notation unaffected. One glitch I noticed is that the commonmark parser removes leading whitespace. This is mostly fine with Rd, whitespace is not significant, except within
\preformatted{}
, so I have a workaround for that. (FIXME: any other case where leading whitespace matters?)It is possible that some Rd notation is picked up as markdown, but all the examples I could come up with were quite artificial. Nevertheless we should have a
@nomarkdown
tag that forbids Markdown parsing.For the supported notation, see the test cases and the readme at maxygen: https://github.com/gaborcsardi/maxygen
TODO:
@nomarkdown
tag. Or@noMd
is better.Feedback welcome!