Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rmarkdown chunk options disappear after translation #1

Open
joelnitta opened this issue May 10, 2022 · 13 comments
Open

Rmarkdown chunk options disappear after translation #1

joelnitta opened this issue May 10, 2022 · 13 comments

Comments

@joelnitta
Copy link
Owner

Currently, translated chunks looks like this

```r
<code here>
```

with all of the chunk options after r missing.

@zkamvar
Copy link

zkamvar commented May 10, 2022

One cheat is if we use the yaml syntax for chunk options:

```{r}
#| name: chunky
#| fig.alt: a chunk image
plot(thing)
```

@joelnitta
Copy link
Owner Author

Thanks!

My goal is to be able to translate any valid RMarkdown, but this is good to know as a work-around.

@zkamvar
Copy link

zkamvar commented May 12, 2022

I get a feeling that another workaround might be by writing an event that catches the rmarkdown code fence and writing that into an extracting comment: https://mdpo.readthedocs.io/en/master/commands.html#extracting-comments-itself, but I'm not sure how that will work on the po -> markdown side.

@zkamvar
Copy link

zkamvar commented May 17, 2022

FWIW: we have supporting the new knitr options on the roadmap for tinkr (which is used by pegboard): ropensci/tinkr#55, which we should be able to implement easily since knitr::partition_chunk() is now exported.

Also I was thinking about how to work with the chunk options and one thing you could do is take the translations directly from the rendered markdown files in site/built/ since they won't have chunk options, they will have normal code blocks. It makes things like translation of figure alt text and captions more readily available for generated figures.

@joelnitta
Copy link
Owner Author

joelnitta commented May 17, 2022

@zkamvar Thanks for the idea! I must admit I don't know enough about how all of the Workbench packages fit together, but is pegboard parsing of the Rmd required for building the rendered lesson? In other words, are the new knitr options not currently supported by Workbench?

Based on your earlier comment, I was actually just working on a fix that involves parsing the Rmd and converting all chunk options to yaml style using the parsermd package. In such an approach, the creator of the original lesson would not have to worry about which style of chunk options to use, but the translator working with the PO file would see them as if they were in yaml format. I think yaml format is easier for translating anyways since there is one line per setting, and not all settings need to be translated (indeed, I can think of only a few e.g., figure captions, that would need translating). This only works if {sandpaper} can render the lesson using yaml-formatted chunk options though.

I will look into your other idea using site/built/ too...

@joelnitta
Copy link
Owner Author

(update: I just tried yaml-style chunk options with {sandpaper} and it seems to work fine)

@zkamvar
Copy link

zkamvar commented May 19, 2022

is pegboard parsing of the Rmd required for building the rendered lesson?

No. {pegboard} (parsing via {tinkr}) is not required to build the lesson. {knitr} and pandoc build the lesson. {pegboard}'s role is to provide a pre-flight validation mechanism even if pandoc does not exist:

flowchart LR
ep[/episodes/*Rmd/]
md[/site/built/*md/]
html[/site/docs/*html/]
stderr(["output to user"])

ep --> sandpaper
sandpaper --> knitr --> md --> pandoc --> varnish --> html
sandpaper --> pegboard --> XML --> stderr
Loading

@zkamvar
Copy link

zkamvar commented May 19, 2022

Based on your earlier comment, I was actually just working on a fix that involves parsing the Rmd and converting all chunk options to yaml style using the parsermd package.

One reason why I suggested {tinkr} is partially because I'm on the cusp of adding that feature (I really just need to carve out a couple hours of some non-beta release time to do this), so ideally, you could use tinkr to go from RMarkdown with long chunk options to RMarkdown with yaml chunk options.

In the future, {tinkr} could even be used as a backend for an R version of mdpo.

@joelnitta
Copy link
Owner Author

RE: purpose of {pegboard}. Thanks for the clarification. But sandpaper::build_lesson() calls sandpaper::validate_lesson(), which uses {pegboard}. And if lesson validation doesn't pass, the site isn't built (right?). So in that sense, could pegboard validation be considered a pre-requisite to build the lesson? Just trying to clarify my mental model.

Having another look at {pegboard} and {tinkr}, I agree those packages seem well suited for such pre-processing steps. I will hold off on my attempt using {parsermd} for now. No sense in doubling our work.

I agree weaning ourselves off mdpo and implementing a pure R solution scoped to Rmarkdown would be great. But as mentioned already, it won't be trivial... so for now I think mdpo as backend should work. The main thing is that the units of the translation strings (msgid / msgtr) correspond to markdown blocks (paragraphs), not single lines. Such an approach is better suited to prose text and should be easier for the translators. In either case, the dovetail UI should be the same (more or less).

@zkamvar
Copy link

zkamvar commented May 19, 2022

RE: purpose of {pegboard}. Thanks for the clarification. But sandpaper::build_lesson() calls sandpaper::validate_lesson(), which uses {pegboard}. And if lesson validation doesn't pass, the site isn't built (right?). So in that sense, could pegboard validation be considered a pre-requisite to build the lesson? Just trying to clarify my mental model.

A lesson with invalid links, image, div, or heading elements does not produce an error as long as the HTML can be rendered. The reason for this is a programmatic implementation of the growth mindset---understanding that as people work on the lessons, there should be as few barriers as possible to get from ideas to markdown to HTML. All of these errors are often down to simple typos and it's better to have something slightly imperfect than to have nothing at all.

As an example, my tests with r-novice-gapminder currently show that there are 10 elements that need addressing, but the website is still built regardless:

screencapture of Annotations from a GitHub Actions workflow run. There are 10 warnings, the majority of which indicate that HTTP needs to be HTTPS and one indicates uninformative link text of 'here'

(note that the above links will likely become invalid soon as I continue to iterate, so this is why I am including a screenshot.)

I'm sorry for the confusion on that. I'll try to make the docs a bit more clear in the documentation (and give better examples in sandpaper!). For reference, a more detailed documentation of the tests is here (though slightly obscured by the github-formatted output since it was generated on CI): https://carpentries.github.io/pegboard/articles/validation.html

The main thing is that the units of the translation strings (msgid / msgtr) correspond to markdown blocks (paragraphs), not single lines. Such an approach is better suited to prose text and should be easier for the translators. In either case, the dovetail UI should be the same (more or less).

💯! It is my hope that we will be able to come up with something that is stable regardless of the engine under the hood (which is exactly how I designed sandpaper to work).

@joelnitta
Copy link
Owner Author

Thanks for the clarification! All of this sounds great to me, design-wise.

Circling back to your other idea about using the markdown site/built/ for translation, do you see any other benefits besides possibly solving this issue? The downside it seems to me is that it complicates the workflow for dovetail by adding a build_lesson() step before creating the PO or using the PO for translation.

@zkamvar
Copy link

zkamvar commented May 20, 2022

Circling back to your other idea about using the markdown site/built/ for translation, do you see any other benefits besides possibly solving this issue? The downside it seems to me is that it complicates the workflow for dovetail by adding a build_lesson() step before creating the PO or using the PO for translation.

That's a good point! The only benefit I can see is that the generated figure captions are easier to address, but you are correct that it would end up being a bit of a headache for the translators to work on a derivative document. (FWIW, you wouldn't necessarily have to run build_lesson() before running since the intermediate markdown files are stored in the md-outputs branch: https://carpentries.github.io/sandpaper/articles/deployment.html#i-deployment-github).

@joelnitta
Copy link
Owner Author

I realized another downside to using rendered md for translation is that if the translators are looking at the original Rmd they might be confused when HTML starts popping up. So I will avoid this approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants