This is the GitHub repository for our recent study "A comprehensive analysis of the usability and archival stability of omics computational tools and resources." Please see the repository's wiki for more information and details on how this repository is connected to the paper.

smangul1/good.software

Analysis of the usability and archival stability of omics computational tools

This project contains links to the datasets and the code used in our study, "A comprehensive analysis of the usability and archival stability of omics computational tools and resources".

How to cite this study

Mangul, Serghei, et al. "A comprehensive analysis of the usability and archival stability of omics computational tools and resources." bioRxiv, doi: https://doi.org/10.1101/452532

Datasets

Archival stability

Hosted on Figshare-DOI: 10.6084/m9.figshare.7738901

We downloaded open access papers via PubMed from 10 systems and computational biology journals. Raw data in XML format is available here. Our approach to extract software links from the downloaded papers and verify the archival stability of links is described in the Methods section of the paper and Figure S1. Timeout links were manually verified.
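
As a rough illustration of the link-extraction step, the sketch below collects URLs from a JATS-style XML article and labels each one as coming from the abstract or the body. It is not the repository's code: the authors' actual procedure is the one described in the Methods section and Figure S1. The element names (`abstract`, `body`, `ext-link`), the sample document, and the function name `extract_links` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# XLink namespace commonly used for href attributes in JATS-style XML
# (assumed input format, not confirmed by the repository).
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def extract_links(xml_text):
    """Return (section, url) pairs for ext-link elements in the abstract and body."""
    root = ET.fromstring(xml_text)
    links = []
    for section in ("abstract", "body"):
        for container in root.iter(section):
            for link in container.iter("ext-link"):
                url = link.get(XLINK_HREF)
                if url:
                    links.append((section, url))
    return links


# Hypothetical miniature article used only to exercise the function.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><article-meta><abstract>
    <p>Code: <ext-link ext-link-type="uri" xlink:href="http://example.org/tool">link</ext-link></p>
  </abstract></article-meta></front>
  <body>
    <p>Data: <ext-link ext-link-type="uri" xlink:href="ftp://example.org/data">link</ext-link></p>
  </body>
</article>"""

if __name__ == "__main__":
    for section, url in extract_links(SAMPLE):
        print(section, url)
```

Checking whether each extracted URL still resolves would be a separate step that requires network access, so it is left out of this sketch.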

Links extracted from the abstracts and the body of the surveyed papers (n=48,393) are available in CSV format here. The CSV file contains the following fields:

  • The type of link. The links were classified as extracted from abstract or the body of the paper
  • Name of the journal
  • Year the paper was published
  • URL
  • HTTP status: 0-300 (success), 300-400 (redirection), 400 (broken link), or -1 (timeout). See more details here
  • Binary flag indicating whether the link was present in a single paper or was shared across multiple papers
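
The status categories above can be applied to the link data along the following lines. This is a minimal sketch, not the notebooks' code: the column names (`type`, `journal`, `year`, `url`, `status`, `shared`) and the sample rows are hypothetical, and the handling of the boundary values (300 treated as redirection, 400 and above treated as broken) is an assumption about how the ranges are meant to be read.

```python
import csv
import io
from collections import Counter


def classify_status(code):
    """Map a recorded status code to one of the four categories listed above."""
    if code == -1:
        return "timeout"
    if 0 <= code < 300:
        return "success"
    if 300 <= code < 400:  # assumption: 300 itself counts as redirection
        return "redirection"
    if code >= 400:        # assumption: every code from 400 up is a broken link
        return "broken"
    raise ValueError(f"unexpected status code: {code}")


def summarize(csv_text):
    """Count links per category; expects a header row with a 'status' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(classify_status(int(row["status"])) for row in reader)


# Hypothetical rows; the real file's header names may differ.
SAMPLE_CSV = """type,journal,year,url,status,shared
abstract,Journal A,2010,http://example.org/a,200,0
body,Journal A,2012,http://example.org/b,301,0
body,Journal B,2015,http://example.org/c,404,1
body,Journal B,2016,http://example.org/d,-1,0
"""

if __name__ == "__main__":
    for category, count in summarize(SAMPLE_CSV).items():
        print(category, count)
```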

Usability

Hosted on Figshare-DOI: 10.6084/m9.figshare.7738949

We randomly selected 99 tools across various domains of computational biology. The methodology used to select the tools and the list of domains are presented in the Methods section of our paper.

Information about the usability of the 99 tools is presented in CSV format here. The CSV file contains the following fields:

  • tool ID
  • Name of the package manager from which the tool was available, or "NA" if the tool was not available via a package manager
  • Number of citations per year
  • Number of commands executed during the installation process
  • Number of commands suggested in the installation manual of the tool
  • The proportion of undocumented commands (not specified in the manual)
  • Binary flag indicating whether the tool passed the automatic installation test. Tools that require no manual intervention are considered to pass the automatic installation test.
  • The total installation time
  • Flag indicating how easy it was to install the software tool. We categorized a tool as ‘easy to install’ if it could be installed in 15 minutes or less; ‘complex installation’ if it required more than 15 minutes but was successfully installed before the two-hour limit; and ‘not installed’ if the tool could not be successfully installed within two hours
  • Binary flag to indicate if the example dataset was provided
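
The ease-of-installation rule described above can be written as a small function. This is a sketch under stated assumptions rather than the study's code: the function name `install_category`, its arguments, and the choice to express time in minutes are hypothetical, and treating exactly 120 minutes as still within the two-hour limit is an assumption.

```python
EASY_LIMIT_MINUTES = 15    # 'easy to install' threshold from the description above
HARD_LIMIT_MINUTES = 120   # two-hour limit from the description above


def install_category(installed, minutes):
    """Return the installation category for one tool.

    installed: True if the installation eventually succeeded.
    minutes:   total installation time in minutes (assumed unit).
    """
    # Assumption: a tool that needed more than two hours counts as not installed.
    if not installed or minutes > HARD_LIMIT_MINUTES:
        return "not installed"
    if minutes <= EASY_LIMIT_MINUTES:
        return "easy to install"
    return "complex installation"


if __name__ == "__main__":
    for installed, minutes in [(True, 5), (True, 45), (False, 120)]:
        print(installed, minutes, install_category(installed, minutes))
```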

Reproducing results

We have prepared Jupyter Notebooks that utilize the raw data described above to reproduce the results and figures presented in our manuscript.

For more information about reproducing the data collection process used in the archival stability section of our study, see the README.md file in the download.parse.data/ directory.

Would you like to play with our data and code? There is no need to download or install anything: this repository is set up to run on Binder.

Further acknowledgements

We thank our peer reviewers, as well as online commenters on social media, for suggesting that we make the figures colorblind friendly. We also acknowledge the resources that helped us achieve the final result.

License

This repository is under MIT license. For more information, please read our LICENSE.md file.

Contact

Please do not hesitate to contact us (smangul@ucla.edu, thiago.mosqueiro@gmail.com, blekhman@umn.edu) if you have any comments, suggestions, or clarification requests regarding the study or if you would like to contribute to this resource.
