developing rmdrive #5

Closed
ClaudioZandonella opened this issue Jan 11, 2021 · 13 comments

Comments

@ClaudioZandonella
Owner

ClaudioZandonella commented Jan 11, 2021

Hi @ekothe!

@filippogambarota and I stumbled across your package. We found it very interesting, and we believe it would be a handy tool for enhancing the use of R Markdown in research. Many PhD students and young researchers already recognise the utility of literate programming for writing reproducible documents and research papers. However, the review process with colleagues often becomes a bottleneck: not everyone knows R and, even with shared repositories and git, the workflow is still far from ideal. Your package, instead, allows a smooth workflow that uses the powerful and well-known Google Docs interface to carry out a review.

We would like to collaborate on your package to improve its functions and to publish it on CRAN. Would you be interested?

We started from the forked repository by Januz and we have already added a few improvements:

  • add the option to remove code chunks from the uploaded document. This avoids scaring colleagues who are not familiar with R code and lets them focus on the actual text. Chunks are restored when the document is downloaded.
  • add the possibility to upload also a pdf report including the actual chunks output (e.g., figures and tables). Colleagues can use this to evaluate figures and tables and eventually comment on the pdf directly on google drive to require changes.
  • upload the final document in a pdf/HTML for the final check of the desired paper
  • faster google API call to identify dribble objects

We are still working on it on the develop branch (link our repo). Most of the documentation is missing and we are keen to add a few more features as well (e.g., support for .Rnw files). We plan to complete the last few things and then prepare the submission to CRAN. Would you be interested?

Alternatively, could you please add a license to your repository so that we can understand whether we can build on it?

@ekothe
Collaborator

ekothe commented Feb 23, 2021

Hi @ClaudioZandonella, I saw you also sent this as an email. As you no doubt saw when you emailed, I was on maternity leave last year. I’ve just returned to work and have seen your work on this. This looks like a fantastic package, I’d be very happy to be involved.

@januz
Collaborator

januz commented Mar 1, 2021

@ClaudioZandonella I'd be interested in developing the package collaboratively too.

@ClaudioZandonella
Owner Author

Hi @ekothe and @januz,
I'm really happy about the renewed interest in this package. I think it is really useful and it offers a great solution for collaborative workflows when writing documents in R. I have already used it a couple of times with some colleagues, and it works really smoothly. @filippogambarota and I worked on it recently to add new features. We created a separate repository starting from the forked version and renamed the package "reviewdown". Look at the develop branch for the latest changes (https://github.com/ClaudioZandonella/reviewdown/tree/develop).

Main changes are:

  • add support for both .rmd and .rnw documents. Now you can simply indicate the full path to the file including the extension.
  • add the option to remove code chunks and header info from the uploaded document; placeholders of type [[chunk-<name>]] are shown instead. This avoids scaring colleagues who are not familiar with R, YAML or LaTeX code and lets them focus on the actual text. Code is restored when the document is downloaded. Moreover, a brief set of instructions is placed at the top of the document. An example: https://docs.google.com/document/d/1NAJxaBe5IAuVEBZAFQpWZyCIaPYKSUNoqeQ_HRzw0u8/edit?usp=sharing.
  • revise functions from the namer R package to parse chunk names (https://github.com/ClaudioZandonella/reviewdown/blob/develop/R/utils_namer.R). We added regex support for .rnw chunks as well.
  • create a hidden folder .reviewdown where chunk info and header info are saved to be restored later.
  • add the option to upload the document output as well, preferably as a pdf file; an HTML output can be converted into pdf using pagedown and Chrome. The pdf output is really useful because it helps the inexperienced reader follow the text where the source is messier (inline code or specific LaTeX/HTML jargon). Moreover, the pdf allows colleagues to use Google comments to suggest or request changes to general text formatting, figures, tables, etc.
  • add faster google API call to identify dribble objects (see all code in https://github.com/ClaudioZandonella/reviewdown/blob/develop/R/utils_dribble.R)
  • add unit tests for general functions as well as API functions. Unit tests of API functions require managing tokens and access credentials, which we have partially solved, at least locally and in GitHub Actions using the vcr R package (https://books.ropensci.org/http-testing/vcr.html)
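
The placeholder idea in the list above can be sketched in base R. This is only an illustration under stated assumptions, not the reviewdown implementation: `chunk_label()`, `hide_chunks()` and `restore_chunks()` are hypothetical names, only .Rmd-style fenced chunks are handled, and unlabelled chunks get a generated `unnamed-<i>` name.

```r
# Hypothetical helpers for illustration only (not the reviewdown API).
fence <- strrep("`", 3)  # the three-backtick chunk delimiter

# Label = first chunk option if it is not a key=value pair; otherwise a
# generated name (an assumption of this sketch).
chunk_label <- function(header, index) {
  opts <- sub(paste0("^", fence, "\\{r(.*)\\}[[:space:]]*$"), "\\1", header)
  first <- trimws(strsplit(opts, ",", fixed = TRUE)[[1]][1])
  if (is.na(first) || first == "" || grepl("=", first, fixed = TRUE)) {
    paste0("unnamed-", index)
  } else {
    first
  }
}

# Replace every chunk with a [[chunk-<name>]] placeholder and keep the
# original chunk lines in a named list so they can be restored later.
hide_chunks <- function(lines) {
  open <- grep(paste0("^", fence, "\\{r"), lines)
  close <- grep(paste0("^", fence, "[[:space:]]*$"), lines)
  keep <- rep(TRUE, length(lines))
  chunks <- list()
  for (i in seq_along(open)) {
    end <- close[close > open[i]][1]  # first closing fence after the opening
    if (is.na(end)) stop("unclosed chunk starting at line ", open[i])
    name <- chunk_label(lines[open[i]], i)
    chunks[[name]] <- lines[open[i]:end]
    lines[open[i]] <- paste0("[[chunk-", name, "]]")
    keep[(open[i] + 1):end] <- FALSE
  }
  list(text = lines[keep], chunks = chunks)
}

# Put the stored chunk lines back wherever a known placeholder appears.
restore_chunks <- function(text, chunks) {
  out <- lapply(text, function(line) {
    name <- sub("^\\[\\[chunk-(.*)\\]\\]$", "\\1", line)
    if (name != line && !is.null(chunks[[name]])) chunks[[name]] else line
  })
  unlist(out)
}

doc <- c(
  "# Results",
  paste0(fence, "{r fit, echo=FALSE}"),
  "fit <- lm(y ~ x, data = d)",
  fence,
  "The model fits well."
)
hidden <- hide_chunks(doc)
restored <- restore_chunks(hidden$text, hidden$chunks)
print(hidden$text)
```

In this sketch the `chunks` list is what would be written to the hidden folder (for example with `saveRDS()`) between upload and download; the storage format is an assumption here.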

Our idea of the package is a tool that allows a group of researchers to collaborate on writing .Rmd or .Rnw files in Google Drive. We assume that at least one member of the group is more experienced in coding: she/he uses git to track changes, manages the general workflow, and is able to solve possible issues created by other colleagues (e.g., errors in the .rmd or .rnw syntax). Thus, we intend to limit the package functionality to the minimum, avoiding overlap with other tools (e.g., keeping track of changes should be done using git).

At the moment development is at an intermediate stage: almost all internal functions are ready, and we are now revising the main functions to properly manage all objects and calls between functions, adding the remaining unit tests, and revising the workflow to make everything run smoothly. So the current version is not stable but a work in progress. As we changed the structure of the objects returned by the internal functions, we still have to update the main functions accordingly.

@filippogambarota and I would be very happy to collaborate with all of you on the development of the package. However, if it is OK, we would prefer to first reach a stable version in the current development process. After that, it would be perfect to revise everything together, proposing changes and further improvements. This is to avoid something like the tale of "Penelope's Web", where we add and remove code continuously without progressing. So we would prefer to first reach a new stable point from which to decide the new direction together.

Regarding the package name, we think that reviewdown is a really good choice since the package is not limited to .rmd files but also supports .rnw files, and it is "in line" with the other packages used in R to write documents (e.g., rmarkdown, pagedown, bookdown). We would like to know your opinion on this.

Moreover, we plan to send a contribution to the useR conference to present the package (deadline 15th of March, https://user2021.r-project.org/participation/call-for-abstracts/). Would you be interested?
Subsequently, we plan to submit the package to rOpenSci, a peer-review process for R packages (https://ropensci.org/software-review/, see their guide https://devguide.ropensci.org/) and of course publish the package on CRAN (we are already following all recommended practices and testing the package build on all three OSs; see the badges in the README file of the develop branch).

Finally I would like to know about the contribution to the package of @Lingtax and @benmarwick as there are some commits authored by them.

We plan to reach a stable version in two to three weeks and then we will be ready to revise everything together. Please let me know if this sounds good to you!

@januz
Collaborator

januz commented Mar 2, 2021

@ClaudioZandonella Thanks for your response and the thorough overview of what you have planned! This all sounds great to me. Some of the features you mention (e.g., having a hidden folder for metadata) were also among the things I wanted to implement. And others like hiding code chunks and also uploading the output file seem like good ideas to make it easier to collaborate with people who will never open the file in an R editor. Generally, I am very happy that you have taken on the project because I always felt like the concept deserves a package that lives on CRAN and has more extensive documentation using pkgdown etc.

I am good with giving you the time to develop and finish all the features you want to implement, and then reviewing the code, talking about potential changes/additions, and potentially dividing up future work. I should probably mention at this point already, though, that a user of my fork recently wrote to me and reported issues with using the package on Windows. I assume that the problems have to do with how I wrote the sanitize_gfile() function, especially the replacement of line breaks (I assume that on Windows one has to replace "\r\n" instead of the "\n" used on UNIX-like systems), but I haven't had the time (or a Windows computer/VM) to test yet. As far as I can see, the sanitize_gfile() function still exists in your rewrite, so it might be good to check for cross-platform compatibility.
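
To illustrate the suspected line-break issue, here is a small base R sketch. `normalize_newlines()` is a hypothetical helper, not a function from rmdrive or any of its forks; it assumes the downloaded text may contain Windows ("\r\n"), stray carriage-return ("\r") or Unix ("\n") line endings and converts them all to "\n" before any further sanitizing. Whether this is actually the cause of the reported Windows problem is untested.

```r
# Hypothetical helper: make line endings uniform before further processing.
normalize_newlines <- function(x) {
  x <- gsub("\r\n", "\n", x, fixed = TRUE)  # Windows line endings -> "\n"
  gsub("\r", "\n", x, fixed = TRUE)         # remaining lone "\r" -> "\n"
}

win_text <- "First paragraph.\r\n\r\nSecond paragraph.\r\n"
unix_text <- normalize_newlines(win_text)

# After normalizing, splitting on "\n" behaves the same on every platform.
print(strsplit(unix_text, "\n", fixed = TRUE)[[1]])
```

Normalizing once at the point where the file is read keeps later replacements free of platform-specific branches.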

I like the new name of the package and that you aligned it with the other packages in the ...down family. I wonder though whether it best represents what the package can be used for. The package can be used for reviewing a co-author's work, but I would assume that many would also use it to "co-author" (i.e., collaboratively develop different parts of the document at the same time without a clear distinction between who writes a first draft and who reviews or edits), which would speak for a more neutral term like collabdown. Furthermore, the term "review", at least in academic circles, has a strong association with reviewing manuscript submissions to scientific journals. I don't have a super strong opinion here and am fine with going with reviewdown; I just wanted to mention these concerns. For me, the main objective of the package still is to extend a local R Markdown workflow by using Google Drive, which @ekothe represented with her package name and which would speak for something like gdrivedown / gdocdown / googledown if one wanted to include the ...down association.

I'm happy to contribute to useR and rOpenSci submissions wherever you'd like help. Thanks again for taking on this project and giving it the attention and effort it deserves!

@Lingtax
Collaborator

Lingtax commented Mar 2, 2021

I'm just going to chime in briefly to water down my contributions. @ekothe originally wrote the underlying code in a standard script file (i.e. as interactive). I thought it was ready to be more publicly accessible, so I wrapped it in functions, put it in the package structure, and put it on github before turning it over to her again. I'm happy to defer to you all on the value of that.
Both development branches are a little outside my skillset at the moment, but I would be happy to contribute where I can moving forward.

@ekothe
Collaborator

ekothe commented Mar 3, 2021

@ClaudioZandonella I'm very happy for you and @filippogambarota to take the lead on this, and I would be very eager to see an rOpenSci and CRAN submission come about from this. At almost every workshop on rmarkdown I've ever attended this challenge of how to collaborate with co-authors who aren't R users has come up, so I am confident that there will be an audience for this once we have a stable release.

Submission to useR 2021 sounds like a great goal and I'd love to be involved if there is anything you need help with in order to get an abstract together by the deadline.

In terms of package name, I'd be inclined to agree with @januz about the specific meaning of the word 'review', collabdown would be a nice option if we think it would be possible to incorporate other editors in the future (beyond googledocs). If we think it is likely to only cover googledrive integration then I'd be inclined to go with a package name that indicates that in some way. However, I should note I am not at all wedded to rmdrive and have no problem having the package change names (especially since there is such a substantial rebuild).

@ClaudioZandonella
Owner Author

ClaudioZandonella commented Mar 5, 2021

Thanks to everyone! I am very happy about the encouraging comments. I love this kind of collaboration and enthusiasm, as it is the root of open source... For this reason, I would like to include everyone in the development of the package, as everyone can offer a valuable contribution... therefore @Lingtax and @benmarwick, any time you have a suggestion, just jump into the discussion...

Moving on... thanks @januz for pointing out possible issues on Windows (I am sure it will not be the only one😅). The solution you propose seems perfect! Unfortunately, I don't have a Windows computer, but as soon as we do some more testing with colleagues we can check all compatibility issues.

Necessary things to do now for the useR submission are deciding the reference repository and the package name.

Regarding the repository, it would make sense to use @ekothe's original repo... it would require changing the name (a matter of seconds) but it will undergo substantial changes... I don't know, maybe you prefer to keep the two projects separate? For me it is the same; what we all really need is access to the repository as collaborators (I think this is enough to also guarantee the use of GitHub Actions workflows)...

Regarding the package name, this is an important choice... I see the package as something used to revise a document in its late stages rather than for collaborative writing of a new document. This is because it would be extremely difficult (and not useful) to write code and text directly in a Google Doc. At least in my experience, there are two scenarios when writing a document.

1) Text is written separately and then integrated with code to add figures, tables and references (usually the case for academic papers). In this case, I suppose the text is written in a normal Google Doc and then a member of the group takes responsibility for writing the rmd/rnw document, copying the text and adding code. The final file can be revised using our package to make the last changes, ask other members for revisions/suggestions, and conduct the actual peer-review process.

2) The document is written directly in rmd/rnw (most often the case for reports, supplemental materials, and tutorials). In this case, if there is more than one author, I would expect them to work on separate chapters or sections of the paper using git and an online repository to manage the workflow. Once the first draft is finished, they could use our package to revise their work or ask for further revisions (from other collaborators or supervisors).

For these reasons, I think reviewdown is appropriate. I agree that in academia "review" is linked with submissions to scientific journals; still, it is an intuitive name and I think collaborative writing is possible but not the main aim of the package. Writing properly formatted markdown text and R code in Google Docs rather than a code editor would be challenging... moreover, "Review" is also the menu section in Microsoft Word used to comment and track changes. In Google Docs, instead, we have "Editing/Suggesting".

I would keep the ...down ending so possible names are:

  • reviewdown
  • collabdown
  • editdown
  • gdrivedown / gdocdown / googledown
  • reviewdrive
  • ....other?

I would not use rmd as we support other file extensions as well... I think it would be difficult to support other editors beyond googledrive (I don't know of other options); still, we should be careful about including googledrive in the name so as not to overlap with other packages, in particular googledrive from the tidyverse...

I don't have strong opinions but I think the name choice is important. It is probably worth discussing the name a little (and the package goals, how we expect it to be used), as the name has to be informative and catchy, and it should sound good (e.g. suggestdown sounds weird).

Please share your opinions.

@ekothe
Collaborator

ekothe commented Mar 5, 2021

Regarding the package name, this is an important choice... I see the package as something used to revise a document in its late stages rather than for collaborative writing of a new document. This is because it would be extremely difficult (and not useful) to write code and text directly in a Google Doc. @ClaudioZandonella

@ClaudioZandonella I think there may be substantial differences in workflow that reflect discipline norms and expectations as well as the way that different teams work collaboratively.

Here is how I currently use rmdrive (and more generally rmarkdown with github)

  • Create a rmd paper template (typically using papaja) in R and then upload to googledocs (single R literate author)
  • Non-R literate authors add their author details to the YAML header in googledocs
  • Write some sections of the paper that contain no code (e.g. the introduction and the skeleton of the methods and results) in googledocs (multiple authors with varying R literacy)
  • Import the googledocs back into R so the results and sections of the methods are generated from code. This includes some code chunks that output tables and some inline code within text sections (especially within the methods) (multiple R literate authors)
  • Add explanatory text for the newly generated results in googledocs (multiple authors with varying R literacy)
  • Write the bulk of the discussion in googledocs (multiple authors with varying R literacy)
  • Make changes to tables/figures and other analytic code in R in response to questions/comments from co-authors based on the text in googledocs (and often on a knitted version of the paper) (multiple R literate authors)
  • Lots and lots of text editing in googledocs (multiple authors with varying R literacy)
  • Eventually - final knitting ready for paper submission

These steps don't necessarily occur in order and some happen simultaneously or multiple times. Steps conducted in R are often conducted collaboratively with several R literate authors who may be helping to implement data munging, analyses, and reporting. Steps conducted in googledocs will normally include the same R literate authors as well as some other collaborators who are involved in less analytically heavy components of the project (like writing introductions and interpreting results). We specifically try to avoid hard coding any numerical values into text and, where possible, also want to avoid any copying and pasting between documents since this can introduce errors. It is not uncommon for some sections of text to be written before any data is even collected, whereas other sections of a paper might be completed much later. All sections undergo multiple rounds of editing from multiple authors, and it is not uncommon for components that require R coding, like data viz and table reporting, to require editing late in the manuscript editing process. I have some co-authors who will co-write substantial sections of text and code and some who will simply make comments and suggestions based on a more completed text.

One paper I recently worked on this way had the following text chunk in the methods section

Intention was measured as the mean of three items (1= Strongly Disagree, 5 = Strongly Agree) regarding intention to engage in the referent behaviour "have a flu vaccination in the next flu season" (e.g. “I intend to have a flu vaccination in the next flu season”). Higher scores indicate greater intention to have the vaccination ($\alpha$ = `r flu %>% select(flu_int_1, flu_int_2, flu_int_3) %>% psych::alpha() %>% purrr::pluck(1, "std.alpha")`).

The first version of this text section was written in googledocs before data was even collected.

Intention was measured as the mean of three items (1= Strongly Disagree, 5 = Strongly Agree) regarding intention to engage in the referent behaviour "have a flu vaccination in the next flu season" (e.g. “I intend to have a flu vaccination in the next flu season”). Higher scores indicate greater intention to have the vaccination

We then added R code when calculating alpha and moved the text back to googledocs for more text editing. We also had several measures with a very similar format. As such, some R code was actually written in googledocs, since it was possible for one of our relatively R-naive co-authors to understand the pattern in the code sufficiently to generate the code for other variables just by changing the relevant variable names, like so...

Higher scores indicate greater perceived severity of seasonal influenza ($\alpha$ = `r flu %>% select(flu_sev_1, flu_sev_2, flu_sev_3) %>% psych::alpha() %>% purrr::pluck(1, "std.alpha")`).

Higher scores indicate greater perceived susceptibility to negative impacts of seasonal influenza ($\alpha$ = `r flu %>% mutate(flu_sus_2 = invert_likert(flu_sus_2)) %>% select(flu_sus_1, flu_sus_2, flu_sus_3) %>% psych::alpha() %>% purrr::pluck(1, "std.alpha")`).
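
Since these inline snippets differ only in the item prefix, the pattern could also be generated rather than retyped. The sketch below is a hypothetical helper (`alpha_inline()` is not part of rmdrive); it only builds the inline-code string and assumes items are named `<prefix>_1`, `<prefix>_2`, ... in a data frame called `flu`, as in the examples above. Reverse-scored items, as in the susceptibility example, would still need the extra `mutate()` step added by hand.

```r
# Hypothetical helper: build the inline R snippet for a scale's alpha.
# It returns text to paste into the document; it does not compute alpha.
alpha_inline <- function(prefix, n_items = 3, data = "flu") {
  items <- paste0(prefix, "_", seq_len(n_items), collapse = ", ")
  sprintf(
    '`r %s %%>%% select(%s) %%>%% psych::alpha() %%>%% purrr::pluck(1, "std.alpha")`',
    data, items
  )
}

cat(alpha_inline("flu_sev"), "\n")
cat(alpha_inline("flu_int"), "\n")
```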

For these reasons, I think reviewdown is appropriate. I agree that in academia "review" is linked with submissions to scientific journals; still, it is an intuitive name and I think collaborative writing is possible but not the main aim of the package. @ClaudioZandonella

I recognise that this may simply be a reflection of the workflow used within our lab, but certainly the primary intention of creating rmdrive was to allow collaborative writing. I see this as the central use case. For that reason I'd be more inclined to go with a package name that didn't privilege the "reviewing" stage of manuscript preparation. I'd also argue that review by co-authors is still a central part of collaboration, and so a name like collabdown doesn't necessarily give the impression that the package wouldn't be useful in that stage of manuscript writing as well.

@Lingtax
Collaborator

Lingtax commented Mar 6, 2021

Maybe in the same way that you don't want a name that privileges one format (e.g.rmd), or one software (e.g. googledrive) maybe we need a label that doesn't privilege a workflow, as it seems we have two very different workflows here both strongly adopted by their respective teams.

Maybe something that references the broader category of activities happening after movement (e.g. editing, changing), the type of material being changed (e.g. words, prose), or a category of external platforms (e.g. cloud).

@ClaudioZandonella
Owner Author

Thanks @ekothe for the detailed explanation. Now it is much clearer to me and I agree with your and @Lingtax's point. The label should not privilege the review workflow. I agree collaborative writing is a much more important use case.

So possible names could be:

  • collabdown (this sounds close to google colab to me)
  • editdown
  • drivedown
  • editdrive
  • worddrive
  • ...(any other combination of collaborate/edit/change/word/drive/down/cloud etc..😅)

Honestly, I would leave the choice to you (@ekothe, @januz, @Lingtax) since some combinations sound weird to me, but probably only because they remind me of something different in Italian.

@januz
Collaborator

januz commented Mar 6, 2021

In my work, I have experienced different workflows that lie in between the ones you outline, @ClaudioZandonella and @ekothe (i.e., varying number of users with varying R knowledge and involvement in the writing/coding process). Thus a more general term would be best from my point of view to encompass all possible usage scenarios.

For me, it comes down to what added benefit the package delivers. For me that was mostly being able to use track changes in a Microsoft Word-like fashion with colleagues who might not be able or want to use a git-based collaborative writing process. So, in line with @Lingtax's categorization I would probably go for a name hinting at that process (collabdown, editdown, trackdown?) or the external platform (gdocdown, drivedown, sharedown).

As I'm also not a native speaker, maybe @ekothe and @Lingtax should make the final decision :)

@Lingtax
Collaborator

Lingtax commented Mar 6, 2021 via email

@filippogambarota
Collaborator

filippogambarota commented Mar 7, 2021

Hello everyone! Thank you for all the interesting ideas and suggestions, and also for your willingness to collaborate on this project!
Regarding the name, it is always a difficult point :). I am Italian like @ClaudioZandonella, and beyond the down part, which is very common in the literate programming world with R, the review part was the name with the best sound and meaning, at least for us. I also agree with @januz and @ClaudioZandonella to leave the final choice to @ekothe and @Lingtax. Personally, I would keep the down part. Google is very inconsistent and variable with naming (gdoc is always the full office suite), so I would avoid using drive, gdoc, etc. I would stay with the action that the package is made for: collab, review, track, etc.

@ekothe your workflow is very interesting, and for this reason it could be useful at some point to share some ideas about it. The package has a lot of potential across several workflows and we should think about ways to be as open as possible. For example, if I used the package I would avoid writing/modifying R code in Google Docs. However, this could at the same time be a great possibility, especially for simple code chunks.


5 participants