This repository has been archived by the owner on May 19, 2021. It is now read-only.

Workflow for publications using Rmarkdown with users that won't get past Word/Google docs #42

Open
lauracion opened this issue Apr 25, 2018 · 35 comments

Comments

@lauracion

This is a specific issue related to #27 (and somewhat to #22): how to collaborate successfully and painlessly in a publication workflow using Rmarkdown with researchers (or others) who are not at all interested in getting past Google Docs.

I have no idea how to tackle this. I know this is something I would use a lot.

There is some discussion and ideas in this thread: https://twitter.com/CMastication/status/942151771627155457

And a deeper discussion here: https://community.rstudio.com/t/publishing-rmarkdown-to-google-docs/832 where @jennybc says "The problem we ran into is that compiling .Rmd to Google Doc is not that hard. But for the whole workflow to truly be useful, you then want to go the other direction. And that is really hard." Wondering if this got any easier since last time this was discussed.

FWIW, my 2 cents for the 2018 unconf 🙂

@jzelner

jzelner commented Apr 25, 2018

This is maybe a bit off-topic here, but has anyone come up with a way to make the gdoc/Word -> RMD conversion work such that the R sections are retained? This would go for any backwards conversion from a 'final' document format a non-RMD user might use (TeX, Word, etc). For me, this would really close the collaboration loop for folks who want to edit the text but don't care at all about the nuts and bolts of a literate programming/data science document.

@cboettig
Member

@jzelner The only practical approach I've found for letting collaborators edit the text in Word is to just paste the raw Rmd into a Word document. Like you say, I've found these collaborators are focused only on the text anyway, and are going to skip over any equations or code, so it doesn't matter a jot that equations aren't rendered and code isn't pretty or highlighted.

Of course you get a few errors pasting the Word edits back into Rmd (mostly rich-text characters corrupting some ASCII code characters), but these are easy to spot and fix with a diff. I know this is low-tech, but it is relatively robust; I picked it up from a mathematician who works in TeX but also frequently collaborates with Word-only folks.
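
As an illustration of the kind of cleanup involved, here is a minimal sketch (in Python rather than R, and not part of anyone's actual workflow above) that maps common Word "smart" characters back to ASCII, but only inside fenced code chunks so the prose keeps its typographic quotes. The function names and the character table are assumptions made for this example, and the table is not exhaustive.

```python
# Illustrative, non-exhaustive table of Word "smart" characters and ASCII stand-ins.
SMART_TO_ASCII = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}


def normalize_smart_chars(text: str) -> str:
    """Replace each smart character in `text` with its ASCII stand-in."""
    for smart, plain in SMART_TO_ASCII.items():
        text = text.replace(smart, plain)
    return text


def fix_code_chunks(rmd_text: str) -> str:
    """Normalize smart characters only on lines inside ``` fenced chunks."""
    fixed_lines = []
    in_chunk = False
    for line in rmd_text.split("\n"):
        if line.lstrip().startswith("```"):
            # A fence line toggles chunk state and is kept unchanged.
            in_chunk = not in_chunk
            fixed_lines.append(line)
        elif in_chunk:
            fixed_lines.append(normalize_smart_chars(line))
        else:
            fixed_lines.append(line)
    return "\n".join(fixed_lines)
```

Running the result through a diff against the original Rmd would still be the way to catch anything this table misses.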

@mmulvahill

mmulvahill commented Apr 26, 2018

The best approach I've found* is using Rmd->Word & markdown on my end. I send the Word doc to the collaborator(s) who edit(s) w/ track changes. When I get the edits back, I accept all changes and use pandoc to convert Word to markdown. I can then diff the two markdown files, resolve changes into my Rmd, build report, commit, and send off a new version as necessary. Pandoc does a pretty good job of Word->markdown. (I haven't thought about it until now, but could the word->md and diff/resolve be built into a pkg and possibly integrate w/ RStudio in a slick way?)
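
A rough sketch of the convert-then-diff step, written in Python for illustration. It assumes the pandoc command-line tool is installed and on the PATH; the function names and the file labels in the diff header are made up for this example.

```python
import difflib
import subprocess


def docx_to_markdown(docx_path: str, md_path: str) -> None:
    """Convert a Word file to markdown by calling the pandoc CLI."""
    subprocess.run(
        [
            "pandoc",
            "--from", "docx",
            "--to", "markdown",
            "--wrap=none",  # one line per paragraph keeps later diffs readable
            "--output", md_path,
            docx_path,
        ],
        check=True,  # raise if pandoc exits with an error
    )


def markdown_diff(original_md: str, revised_md: str) -> str:
    """Return a unified diff between two markdown texts ('' if identical)."""
    diff_lines = difflib.unified_diff(
        original_md.splitlines(keepends=True),
        revised_md.splitlines(keepends=True),
        fromfile="original.md",
        tofile="revised.md",
    )
    return "".join(diff_lines)
```

The diff still has to be resolved into the Rmd by hand; this only automates the mechanical part.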

In my consulting biostats work, I'm never on a project with someone who knows our R/markup/Git toolkit and getting MDs to use anything but Word is a massive endeavor in cultural change. The other option a colleague tried was requiring collaborators to use the GitHub/GitLab/BitBucket web editor and save via commits. This project, though, had an MD/PhD primary investigator who was very motivated to conduct the study with reproducible tools and had the clout to enforce it.

(this comment may be more #27, but sort of both)
*Edited - a colleague and I started doing this in our RA consulting lab.

@lauracion
Author

All are great suggestions. I am wondering if we could settle on a very good way to do this, including both Word and Google Docs as options for interacting with folks not interested in getting involved further.

could the word->md and diff/resolve be built into a pkg and possibly integrate w/ RStudio in a slick way?

For example, this 👆, particularly the integration in RStudio part 😉
Having Google Docs as a starting option would be a nice-to-have too.

@batpigandme

I vaguely remember being told this was all old hat when I shared it on Twitter awhile back, but possibly relevant nonetheless:
How to convert a Google Doc to RMarkdown and publish on Github pages

@Pakillo
Member

Pakillo commented Apr 26, 2018

Great thread! A common bottleneck for many, I think.

For some time I thought that using Authorea or Overleaf, which can be easily synced to GitHub (e.g. see here & here) could be a solution: just pull changes from github repo. But then you would have to persuade coauthors to switch platform, which is very hard (even the transition from Word to GDocs).

For now I incorporate changes manually into the Rmd, or use diff/merge of different versions. While we are at it, may I ask for a good tool for diffing and merging manuscripts? I have been using SourceDiffMerge, but it would be great to be able to accept/reject changes word by word, rather than by line or paragraph. Does that tool exist?
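
To show what a word-by-word comparison looks like, here is a minimal Python sketch using the standard library's difflib. It only lists the changed word runs; accepting or rejecting each one interactively would have to be built on top. The function name is an assumption made for this example.

```python
import difflib


def word_diff(old: str, new: str) -> list[tuple[str, str, str]]:
    """List (operation, old words, new words) for every changed run of words."""
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is 'replace', 'delete', or 'insert'
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes
```

For files already under Git, `git diff --word-diff` gives a similar word-level view without any extra code.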

Many thanks, and looking forward to seeing what you develop.

P.S. I think the Word -> markdown conversion can be done through rmarkdown::pandoc_convert

@lauracion changed the title from "Workflow for publications using Rmarkdown with users that won't get past Google docs" to "Workflow for publications using Rmarkdown with users that won't get past Word/Google docs" on Apr 26, 2018
@jzelner

jzelner commented Apr 26, 2018 via email

@jzelner

jzelner commented Apr 26, 2018

Apologies for the weird HTML links inserted via gmail!

@jennybc
Member

jennybc commented Apr 26, 2018

To play the crotchety veteran, this too (or at least parts of it) has been an Unconf project before 🙂

https://github.com/ropenscilabs/gdoc

@mmulvahill

I suspected so -- at least that it had at least been thoroughly considered by the R/rOpenSci/Rstudio community. Curious if there's anything specific/narrow here for a smaller post-Unconf project or side-conversation? I saw somewhere that file diff in the Rstudio text editor has been considered, but that's outside our realm.

I've personally tried to come up with a system that allows me to avoid using MS Office/Gdocs as much as possible, and humbly prefer that to digging into VBA/Office internals. A distant colleague has been developing StatTag -- a Word plugin that reverses our typical workflow (R/SAS/SPSS code goes in the document and StatTag handles executing/printing it) -- which will work for some folks.

@jennybc
Member

jennybc commented Apr 26, 2018

at least that it had at least been thoroughly considered by the R/rOpenSci/Rstudio community

And the goal of being able to un-knit is ... on the radar? Discussed? I want to convey that it's absolutely A Recognized Thing but also that no one should expect anything on a specific date or even at all. It's a very big task.

@noamross

I started writing a response summarizing my attempts to solve this problem and realized it was becoming essay-length, so I put it here: https://github.com/noamross/rmd-rant . I do think there's some space to carve out an unconf-scale project on this issue, possibly along the lines of strategy No. 2 that I describe there.

@njtierney

njtierney commented Apr 30, 2018

On @jennybc 's note of gdoc - @MilesMcBain has a nifty R package for rendering google docs into markdown that could be handy? markdrive

@jtr13

jtr13 commented May 1, 2018

I am super interested in this topic, particularly in terms of advising students on how best to collaborate on projects. While the students are working individually on homework assignments, they're all in rmarkdown-lalaland. Then final projects come along and it's crash and burn for that model. They are all coders, yet they will not collaborate on text w/Git/GitHub; feedback on this approach from one group: "It's insane to deal with merge conflicts involving text. No way." This group switched to Google Docs and then copied and pasted back to Rmd. Quite ironic, since Rmd was supposed to eliminate the copy and paste approach: now we've just flipped the model on its head and copy and paste text instead of code and output (an improvement I suppose but still).

@jtr13

jtr13 commented May 1, 2018

One more thought: there's one small piece of the puzzle that I doubt would be hard to implement and would make a big difference. That is, having an echo=FALSE option for text, to provide the same flexibility for text in progress as we have with code in progress. I can think of so many uses: the ability for example to create assignments with and without solutions. (I know there are workarounds using comments in code chunks but that's not the same.)
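
To make the idea concrete, here is a hedged Python sketch of a pre-processing step that hides or shows marked text before rendering. The `<!-- hide -->` and `<!-- /hide -->` markers are a hypothetical convention invented for this example, not an existing R Markdown or knitr feature, and the function name is likewise made up.

```python
import re

# Hypothetical markers wrapping text that should be hideable (e.g. solutions).
HIDDEN_BLOCK = re.compile(r"<!-- hide -->.*?<!-- /hide -->\n?", flags=re.DOTALL)
MARKER_ONLY = re.compile(r"<!-- /?hide -->\n?")


def render_text(source: str, show_hidden: bool) -> str:
    """Keep the marked text (dropping only the markers) or remove it entirely."""
    if show_hidden:
        return MARKER_ONLY.sub("", source)
    return HIDDEN_BLOCK.sub("", source)
```

With this, one source file could yield both an assignment (`show_hidden=False`) and its solution sheet (`show_hidden=True`).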

@jennybc
Member

jennybc commented May 1, 2018

Writing prose, collaboratively, with plain text + version control tends to be WAY more difficult than working on code collaboratively.

https://twitter.com/emhrt_/status/740777537547173889

Writing a multi-author paper on github is a far bigger test of git skills than any coding I've done.

@jennybc
Member

jennybc commented May 1, 2018

This group switched to Google Docs and then copied and pasted back to Rmd. Quite ironic, since Rmd was supposed to eliminate the copy and paste approach

Maybe it's fine for them to do collaborative editing of prose as a Google Doc? You could even script the regular task of pulling that file down into the repo (possibly converting to a specific format?) and committing, so that the repo remains the definitive source of the entire project, even though a prose document has been vendored out to Google Docs, due to a more humane user interface for collaborative editing.

@jzelner

jzelner commented May 1, 2018 via email

@jennybc
Member

jennybc commented May 1, 2018

I do wonder if there's a solution to many of these problems that doesn't
reinvent too many wheels and is more of a workflow solution

Yes, I have definitely counselled students working on group projects to solve this by breaking their report or website up into smaller pieces or pages with an index. This greatly reduces the problem of multiple people touching the same things at the same time.

@wlandau

wlandau commented May 1, 2018

I do wonder if there's a solution to many of these problems that doesn't
reinvent too many wheels and is more of a workflow solution. For example, I
try to construct my RMD, at least for manuscripts, so that there is minimal
computation going on in the document. Instead, I try to 'compile' all of my
results to some kind of external data file. This way, the ultimate target
of all the analysis code is this omnibus data file, rather than the PDF. When
I'm feeling extra-organized, it's JSON or YAML. When I'm feeling lazy (i.e.
most of the time), it's a bunch of R objects serialized to Rds files and
CSVs.

Sorry if I have been repeating myself too much, but drake was designed to accommodate this use case. The user specifies a declarative workflow with intermediate data objects and files, R Markdown reports included. Here, a dynamic report is just a target in the workflow rather than the overarching workflow manager (minimal computational burden on knitr), and there are ways to tell drake about the dependencies of each report. Example: https://ropensci.github.io/drake/articles/drake.html. As @ldecicco-USGS pointed out in #30, things get tougher once we try to integrate with Google Drive / Google Docs.
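
The quoted idea of making a results file, rather than the PDF, the target of the analysis can be sketched roughly as follows. This is a Python illustration, not drake and not anyone's actual pipeline; the function names, the file name, and the numbers are all made up for the example.

```python
import json
import tempfile
from pathlib import Path
from string import Template


def save_results(results: dict, path: Path) -> None:
    """Analysis side: write every number the manuscript needs to one JSON file."""
    path.write_text(json.dumps(results, indent=2))


def render_manuscript(template_text: str, results_path: Path) -> str:
    """Writing side: fill $placeholders in the prose from the results file."""
    results = json.loads(results_path.read_text())
    return Template(template_text).substitute(results)


with tempfile.TemporaryDirectory() as tmp:
    results_file = Path(tmp) / "results.json"
    # Placeholder values, not real data.
    save_results({"n": 120, "mean_age": 43.5}, results_file)
    rendered = render_manuscript(
        "We enrolled $n patients (mean age $mean_age).", results_file
    )
```

Because the prose only refers to named placeholders, collaborators can edit the text freely without touching, or re-running, the analysis code.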

@jtr13

jtr13 commented May 1, 2018

@jennybc I don't have a problem with Google Docs, at least until the tools get better. I haven't done an ethnography of how students work, but I hypothesize that they need different tools than seasoned researchers who can divvy up the work more readily (you do the abstract, I'll write up the methods...) Many are out of their comfort zone writing. They struggle together with how to write each piece of a report and then put the pieces together. All that to say that some of them at least are truly working together on each sentence and derive a great deal of benefit from the Google Docs platform.

@jzelner

jzelner commented May 1, 2018 via email

@wlandau

wlandau commented May 2, 2018

Sure! Basically, that data frame is like a Makefile, your recipes are R code chunks, and you declare dependencies just by using them in the recipes. I am happy to receive questions as new issues, especially since they help build up the FAQ.

@jzelner

jzelner commented May 2, 2018 via email

@lauracion
Author

Wow, such a great discussion is happening here. Thank you, everyone! I lost track of this until @stefaniebutland made me notice the discussion was far from over. There is a lot to digest here for me. I will try to come back to it in the upcoming days before the unconf and see if I can summarize all the directions proposed for this issue thus far so we can choose which direction may be worth tackling in Seattle.

@mpadge
Member

mpadge commented May 10, 2018

There's a pre-emptive unconf-y-thing happening as I write that overlaps somewhat with this issue: the eLife Innovation Sprint aiming

to collaboratively prototype new innovations that bring cutting-edge technology to open research communication.

The outputs of that will be well worth keeping an eye on and feeding back into this issue, and potentially into the broader unconf in general.

@lauracion
Author

Ok, folks, an initial step before getting to @noamross's summary request. Here are the items I see that could lead to unconf projects*:

  1. Develop further @noamross strategy 2

  2. Add echo = FALSE for text in progress, a nice-have feature suggested by @jtr13

  3. @jzelner's package approach for users who are willing to collaborate using RStudio:

     1. Compile to a zipfile or other archive, with a) an RDS file containing all of the R objects needed in the course of generating the final PDF/HTML/MD document, b) a directory of binary or text files (e.g. figures, csv files), c) a requirements.txt style manifest listing both what is in the archive and any R dependencies.
     2. At document-generation time, the archive is mounted and accessed without expanding it into the filesystem, and executed like a normal RMD.
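
For anyone trying to picture item 3, here is a minimal Python sketch of the archive idea: a zip file holding data files plus a manifest, with members read directly from the archive rather than extracted to disk. It stands in for the RDS-based version described above; the function names, the `manifest.json` layout, and the example file are assumptions made for illustration.

```python
import io
import json
import zipfile


def build_archive(files: dict[str, bytes], dependencies: list[str]) -> bytes:
    """Pack files plus a manifest listing contents and package dependencies."""
    buffer = io.BytesIO()
    manifest = {"files": sorted(files), "dependencies": dependencies}
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.json", json.dumps(manifest))
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def read_member(archive_bytes: bytes, name: str) -> bytes:
    """Read one member straight from the archive, without extracting to disk."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)


# Hypothetical contents: one small CSV and one R package dependency.
archive_bytes = build_archive({"data/results.csv": b"x,y\n1,2\n"}, ["ggplot2"])
manifest = json.loads(read_member(archive_bytes, "manifest.json"))
```

The manifest lets the document-generation step check that everything it needs is present before it tries to render.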

I can open new issues for 1) and 2) (but Noam and Joyce may be better authors for those issues than I - please let me know your preference).

I can't open a new issue for 3) because I don't follow the idea completely 😬. Could you please do the honors, Jon?

* I am omitting workflow descriptions posted not because they lack interest, but 'cause I don't see a project there - if you see a project from something I am omitting, please open an issue with it 🙏

@lauracion
Author

lauracion commented May 18, 2018

Summary: All projects I identified from the discussion for this issue are now summarized in #73, #74, and #75.

@fBedecarrats

Hi, that’s also my #1 bottleneck in everyday work with R. Where can I know if you made progress on this during this unconference? I wonder if a temporarily satisfying workflow might be to use the Open Science Framework. R users can push to it directly with Git (thanks to the osfr package, developed I think at an unconference, or simply by having their OSF project mirror a GitHub repo), and non-R users can view and edit it directly in their browser thanks to the online OSF viewer (which looks somewhat like any browser-based text editor) and accompany any change with a comment on the commit they're making in the browser. OSF also has the great advantage of supporting a Zotero/Mendeley connection plus synchronization with any cloud storage solution.

@mattpollock

mattpollock commented Oct 11, 2018

https://github.com/ColinFay/backyard looks like a really interesting candidate solution here (not a self-interested pitch - I just stumbled upon this repo yesterday). It still needs a lot more features (the author has a big WIP warning in the readme), but if "open" could be tied to taking out a new branch (maybe username/date or something) and "save" committed the branch and opened a PR to master, then we could be getting close.

This will not appease those who will do nothing outside of Word, but it may be a workable solution for more supportive management types who just want something that, given no R/md experience, they can work with.

@stefaniebutland
Member

@fBedecarrats

Where can I know if you made progress on this during this unconference?

Here's the WIP repo that was done at the unconf https://github.com/ropenscilabs/trackmd

@trashbirdecology

Word->markdown. (I haven't thought about it until now, but could the word->md and diff/resolve be built into a pkg and possibly integrate w/ RStudio in a slick way?)

@mmulvahill how are you resolving the differences between new .md and original .rmd?

@mmulvahill

mmulvahill commented May 8, 2019

Hi @trashbirdecology --

I use a text editor to highlight the differences between the original and revised files, and just copy the changes from one to the other. If you're vim inclined, then vimdiff is great (vimdiff new.md original.md), but VS Code, SublimeText, TextWrangler, Notepad++, etc. should do this comparably.

Sometimes I use the keep_md YAML option so that I have an md version of the original for comparison without the Rmd extras, but this is not that helpful since the original Rmd is the one needing updating.

@noamross

My latest attempt at this may be of interest: https://noamross.github.io/redoc/

@lauracion
Author

Can't wait to try it. Thank you, Noam!
