Observations about Proposed Standards for Public Health Bioinformatics Software #33

Open
svarona opened this issue Dec 14, 2023 · 0 comments

Collaborator

svarona commented Dec 14, 2023

My group and I have been reviewing the Proposed Standards for Public Health Bioinformatics Software document from the perspective of a team dedicated to developing analysis pipelines. Based on our experience, we have some observations that we hope will help in its development.

We believe it needs to be clearly defined whether these are minimum requirements, best practices, or guidelines, which we think is already under discussion in the meetings. We also think it should be clarified whether these are standards for pipelines or for software, as some points may not apply to pipelines, and vice versa.

In addition to indicating how each point will be evaluated, as Frank is doing in his PR (#32), we believe it could be useful to add a section to each point with resources, such as links or documents, that can assist developers in meeting that standard.

Next, I will describe our observations on some of the points:

  • Version Control: Links like https://semver.org/spec/v2.0.0.html and https://keepachangelog.com/en/1.0.0/ could be added regarding the CHANGELOG.
  • Commitment to Maintain: Perhaps it could be replaced with a section like "Maintenance Capability", since even if a commitment to maintain exists, it may not be fulfilled due to external factors or bad faith. A description or demonstration of how the software will be maintained may be sufficient, stated in the README and considered in light of the pipeline's background (research project, master's project, community...).
  • Documentation for Local Installation and/or Remote Access (e.g. Web Server or Galaxy/Terra Workflow): Some recommendations like pip, conda, etc., could be included for installation.
  • Software Performance: We do not believe that documenting this should be a minimum requirement for a pipeline. Besides, it is relatively difficult for a small group of developers working alone, and it seems to depend more on obtaining feedback from other groups or users.
  • Common File Formats: Is this going to be reviewed against a list of formats? A list would need to be provided in the standard's definition but also kept up to date over time, which seems complicated but also relevant.
  • Software Security and Vulnerabilities: We do not see this as necessary for a pipeline. Perhaps for a website or database, but even then, in many cases it is handled by the institution's security department, independently of the code itself.
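
As an illustration of the kind of per-point resource suggested above, the Version Control point could come with a small helper that checks whether a release tag follows Semantic Versioning. The following is only a minimal sketch: the names `parse_version` and `bump` are hypothetical, the optional leading `v` is a common tagging habit rather than part of the specification, and the pattern is a simplified form of the grammar described at semver.org, not the full one.

```python
import re

# Simplified MAJOR.MINOR.PATCH pattern with optional pre-release and build
# metadata. Assumption: this is looser than the full SemVer 2.0.0 grammar
# (e.g. it does not reject leading zeros inside pre-release identifiers).
SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$"
)


def parse_version(tag):
    """Return (major, minor, patch) for a valid tag, or None otherwise."""
    # Accept an optional leading "v", as in tags like "v1.2.3".
    text = tag[1:] if tag.startswith("v") else tag
    match = SEMVER_RE.match(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups()[:3])


def bump(version, change):
    """Return the next (major, minor, patch) for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = version
    if change == "major":
        return (major + 1, 0, 0)
    if change == "minor":
        return (major, minor + 1, 0)
    if change == "patch":
        return (major, minor, patch + 1)
    raise ValueError(f"unknown change type: {change!r}")
```

A check like this could run in continuous integration so that a release tag that does not parse is flagged before a CHANGELOG entry is written for it.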

Here is just a proposal for a reorganization to reduce the list to 10, which I believe was one of the next objectives:

  1. Publicly-Accessible Repository
  2. Version Control
  3. Pipeline Documentation
    • Open-Source License
    • Contribution, Authorship, and Verified Point of Contact
    • Maintenance Capability
    • Conflict of Interest Statement
  4. Pipeline Guidelines
    • Documentation for Local Installation and/or Remote Access
    • Software Functionality
    • Statement of Need with Respect to Public Health Pathogen Genomics
    • Example Usage
    • Container/Packaged Software
  5. Software Testing
  6. Community Guidelines for Contribution and Support
  7. Benchmark/Validation Datasets
  8. Common File Formats
  9. Reference Data Requirements