cleaning up introduction and removing content that goes off track #57

Merged
nuest merged 5 commits into master from cleaning-up-sections
Apr 15, 2020

Conversation

vsoch
Collaborator

@vsoch vsoch commented Apr 2, 2020

This pull request is a first shot at cleaning up the manuscript, namely:

  • The introduction got sidetracked into research compendia, which was distracting and out of scope for an introduction that should lead into building data science containers.
  • I removed lines that (although possibly true / meaningful) didn't add to the flow of the paper.

I can add comments on any specific choice for reviewers who are interested.

Signed-off-by: vsoch vsochat@stanford.edu

Signed-off-by: vsoch <vsochat@stanford.edu>
Don't want to imply that donoho talked about containerization
also one sentence per line
Owner

@nuest nuest left a comment

Thanks @vsoch - good job. I'll do another run-through and try to remove some repeated content and close the remaining issues today. Besides the open issues, the only thing missing now is a better example Dockerfile for the article.

@@ -1,5 +1,5 @@
---
-title: "Ten Simple Rules for Writing Dockerfiles for Reproducible Research"
+title: "Ten Simple Rules for Writing Dockerfiles for Reproducible Data Science"
Owner

Good!

By providing this recipe, authors of scientific articles greatly improve their work's level of documentation, transparency, and reusability.
Such practice is one important part of common practices for scientific computing [@wilson_best_2014; @wilson_good_2017], with the result that it is much more likely both the author and others are able to reproduce and extend an analysis workflow.
The containers built from these recipes are portable encapsulated snapshots of a specific computing environment.
Such containers have been demonstrated for capturing scientific notebooks [@rule_ten_2019] and reproducible workflows [@sandve_ten_2013].
Research compendia also allow for proper citation of the used computing environment, which is not possible within containers alone.
Best practices are still a work in progress [cf. @katz_software_2018], but you should try your best to give credit to creators of software you rely on by following recommendations of projects such as CodeMeta ([https://codemeta.github.io/](https://codemeta.github.io/)) and the Citation File Format ([https://citation-file-format.github.io/](https://citation-file-format.github.io/)).
Owner

Killed one of my darlings here... good job :-).

@@ -369,7 +337,7 @@ You can view all labels with [`docker inspect`](https://docs.docker.com/engine/r
Labels serve as structured metadata that can be leveraged by services, e.g., https://microbadger.com/labels.
For example, software versions, license, and maintainer contact information are commonly seen and very useful if a `Dockerfile` is discovered out of context.
While you can add arbitrarily complex information with labels, for research compendia the user-facing documentation is much more important.
-If you want to earn extra points, and you never know what future algorithms will be able to make sense of, include global identifiers such as [ORCID identifiers](https://orcid.org/) for people, a DOI of the research compendium, e.g., [reserved on Zenodo](https://help.zenodo.org/) before publishing the research compendium, or your funding agency's grant number.
+Important metadata that might be more utilized with future tools includes global identifiers such as [ORCID identifiers](https://orcid.org/), DOIs of the research compendium, e.g., [reserved on Zenodo](https://help.zenodo.org/), or a funding agency's grant number.
Owner

I've re-added the research compendium link here, but am fine with not using the term throughout the article.
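
For readers following the label discussion in this hunk, here is a minimal sketch of how such metadata could be read back out of `docker inspect` output. It assumes the JSON array that `docker inspect` prints for an image, where labels sit under `Config.Labels`. The function name `extract_labels`, the sample data, and the `org.example.*` label keys are hypothetical and only serve as illustration; the ORCID and DOI values are placeholders, not real identifiers.

```python
import json


def extract_labels(inspect_output: str) -> dict:
    """Return the labels of the first object in `docker inspect` JSON output.

    Returns an empty dict if the output lists no objects or no labels are set.
    """
    objects = json.loads(inspect_output)
    if not objects:
        return {}
    # "Config" or "Labels" may be missing or null when no labels are set.
    config = objects[0].get("Config") or {}
    return config.get("Labels") or {}


# Hypothetical sample, shaped like the relevant part of `docker inspect` output.
SAMPLE = """
[
  {
    "Config": {
      "Labels": {
        "maintainer": "Jane Doe <jane@example.org>",
        "org.opencontainers.image.licenses": "MIT",
        "org.example.orcid": "0000-0000-0000-0000",
        "org.example.doi": "10.5281/zenodo.0000000"
      }
    }
  }
]
"""

if __name__ == "__main__":
    for key, value in sorted(extract_labels(SAMPLE).items()):
        print(f"{key}: {value}")
```

In practice the input string would probably come from running `docker inspect` on a real image, for example via `subprocess.run(["docker", "inspect", name], capture_output=True, text=True).stdout`, rather than from an embedded sample.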

Depositing the image next to other project files, i.e., data, code, and the used `Dockerfile`, in a public repository makes them likely to be preserved, but it is highly unlikely that over time you will be able to recreate it precisely from the accompanying `Dockerfile`.
Publishing the image and the contained metadata therein (e.g., the Docker version used) may even allow future science historians to emulate the Docker runtime environment.
Applying proper preservation strategies (cf. [@emsley_framework_2018]) can be highly complex, but simply running an image "as-is", i.e. with the default command and entrypoint (see \ruleref{rule:interactive}), and observing the output is quite likely to work for many years into the future.
Owner

Another one of my darlings 💀 !! I agree though that container preservation as a topic is not mature enough yet to expose to the target audience of this article.

@nuest nuest merged commit f811f9a into master Apr 15, 2020
Collaborator Author

vsoch commented Apr 15, 2020

Woohoo! Thank you @nuest !

@nuest nuest deleted the cleaning-up-sections branch May 13, 2020 06:42