OpenOmics: Library for integration of multi-omics, annotation, and interaction data

Submitting Author: Jonny Tran (@JonnyTran)  
All current maintainers: @JonnyTran
Package Name: openomics
One-Line Description of Package: Library for integration of multi-omics, annotation, and interaction data
Repository Link:  https://github.com/JonnyTran/OpenOmics
Version submitted: [0.8.4](https://github.com/JonnyTran/OpenOmics/releases/tag/v0.8.4)
Editor: @NickleDave   
Reviewer 1: @gawbul
Reviewer 2: @ksielemann
Archive: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4731011.svg)](https://doi.org/10.5281/zenodo.4731011)
JOSS DOI: [![DOI](https://joss.theoj.org/papers/10.21105/joss.03249/status.svg)](https://joss.theoj.org/papers/10.21105/joss.03249)
Version accepted: v [0.8.8](https://github.com/JonnyTran/OpenOmics/releases/tag/v0.8.8)
Date accepted (month/day/year): 04/17/2021

---

## Description

OpenOmics is a Python library that assists with the integration of heterogeneous multi-omics bioinformatics data. By providing an API of data-manipulation tools as well as a web interface (WIP), OpenOmics facilitates common coding tasks involved in preparing data for bioinformatics analysis. It features support for:
- Genomics, Transcriptomics, Proteomics, and Clinical data.
- Harmonization with 20+ popular annotation, interaction, and disease-association databases (e.g. GENCODE, Ensembl, RNAcentral, BioGRID, DisGeNET).

OpenOmics also has an efficient data pipeline that bridges the popular Pandas data-manipulation library and Dask distributed processing to address the following use cases:
- A standard pipeline for dataset indexing, table joining, and querying that is transparent and customizable for end users.
- Efficient disk storage for large multi-omics datasets using the Parquet file format.
- Multiple data types that support both interaction and sequence data, and that can be exported to NetworkX graphs or to downstream machine learning.
- An easy-to-use API that works seamlessly with the external Galaxy tool interface or the built-in Dash web interface (WIP).
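
As a rough illustration of the annotation harmonization and table joining described above, the sketch below uses plain pandas, not the OpenOmics API. The expression values and the annotation table are hypothetical; in practice the annotation would come from a database such as GENCODE or Ensembl.

```python
import pandas as pd

# Hypothetical expression matrix: rows are samples, columns are Ensembl gene IDs.
expression = pd.DataFrame(
    {"ENSG00000141510": [5.2, 3.1], "ENSG00000012048": [1.4, 2.8]},
    index=pd.Index(["sample_A", "sample_B"], name="sample"),
)

# Hypothetical gene annotation table, e.g. as might be parsed from a GENCODE GTF
# (version suffixes on gene IDs assumed to be stripped already).
annotation = pd.DataFrame(
    {
        "gene_id": ["ENSG00000141510", "ENSG00000012048", "ENSG00000139618"],
        "gene_name": ["TP53", "BRCA1", "BRCA2"],
        "gene_type": ["protein_coding"] * 3,
    }
)

# Left join: keep every measured gene, attach annotation where available,
# and drop annotations for genes that were not measured (BRCA2 here).
measured = pd.DataFrame({"gene_id": expression.columns})
gene_info = measured.merge(annotation, on="gene_id", how="left").set_index("gene_id")

# Use the harmonized annotation to relabel expression columns with gene symbols.
expression_by_symbol = expression.rename(columns=gene_info["gene_name"].to_dict())
print(expression_by_symbol)
```

A left join is used so that measured genes without an annotation record are kept (with missing values) rather than silently dropped.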

## Scope 
- Please indicate which [category or categories][PackageCategories] this package falls under:
	- [x] Data retrieval
	- [x] Data extraction
	- [x] Data munging
	- [ ] Data deposition
	- [x] Reproducibility
	- [ ] Geospatial
	- [ ] Education
	- [ ] Data visualization*

\* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see [notes on categories][NotesOnCategories] of our guidebook.

- Explain how and why the package falls under these categories (briefly, 1-2 sentences):

>   OpenOmics' core functionality is a suite of tools for data preprocessing, data integration, and retrieval from public databases. Its main goal is to maximize transparency and reproducibility in multi-omics data integration.


-   Who is the target audience and what are the scientific applications of this package?  

>   OpenOmics' primary target audience is computational bioinformaticians, and the scientific application of this package is scalable, ad-hoc data-frame manipulation for reproducible multi-omics data integration. We are also developing an interactive web dashboard and interfaces to the Galaxy Tool Shed to make the tool accessible to biologists without a programming background.


-   Are there other Python packages that accomplish the same thing? If so, how does yours differ?

> Existing Python packages on PyPI within the scope of multi-omics data analysis are "pythomics" and "omics". They appear to lack support for manipulating integrated multi-omics datasets, retrieving data from public databases, and an extensible object-oriented design. OpenOmics aims to follow modern software best practices and package publishing standards.
>  
> Aside from multi-omics integration tools, several specialized Python packages exist for single-omics data, such as the AnnData format used by Scanpy and the Loom file format. They provide an intuitive data structure for expression arrays and side annotations, and Loom even supports out-of-core data-frame processing. However, they do not yet provide mechanisms for multi-omics data integration, where each omics dataset may cover only partially overlapping samples or have different numbers of rows and columns.
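
To make the integration problem above concrete, here is a minimal sketch in plain pandas (not the OpenOmics API) of aligning two hypothetical omics tables whose samples only partially overlap and whose feature counts differ. All sample and feature names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical omics tables (rows = samples). The sample sets only partially
# overlap, and each modality has a different number of feature columns.
mrna = pd.DataFrame(
    {"gene_1": [5.2, 3.1, 4.0], "gene_2": [1.4, 2.8, 0.9], "gene_3": [0.3, 0.6, 0.2]},
    index=["S1", "S2", "S3"],
)
protein = pd.DataFrame(
    {"protein_1": [0.7, 1.1, 0.4], "protein_2": [0.2, 0.5, 0.8]},
    index=["S2", "S3", "S4"],
)

# Option 1: restrict to samples measured in every modality.
shared = mrna.index.intersection(protein.index)
complete_cases = pd.concat(
    {"mRNA": mrna.loc[shared], "protein": protein.loc[shared]}, axis=1
)

# Option 2: keep all samples; missing modalities are filled with NaN.
all_samples = pd.concat({"mRNA": mrna, "protein": protein}, axis=1, join="outer")

print(complete_cases.shape)  # (2, 5)
print(all_samples.shape)     # (4, 5)
```

Passing a dict to `pd.concat` makes the outer column level record which modality each feature came from, so `complete_cases["protein"]` recovers a single omics layer after alignment.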


-   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or `@tag` the editor you contacted:

> https://github.com/pyOpenSci/software-review/issues/30


## Technical checks

For details about the pyOpenSci packaging requirements, see our [packaging guide][PackagingGuide]. Confirm each of the following by checking the box.  This package:

- [x] does not violate the Terms of Service of any service it interacts with. 
- [x] has an [OSI approved license][OsiApprovedLicense].
- [x] contains a README with instructions for installing the development version. 
- [x] includes documentation with examples for all functions.
- [x] contains a vignette with examples of its essential functions and uses.
- [x] has a test suite.
- [x] has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

## Publication options

- [x] Do you wish to automatically submit to the [Journal of Open Source Software][JournalOfOpenSourceSoftware]? If so:

<details>
 <summary>JOSS Checks</summary>  

- [x] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [x] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [x] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [x] The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.4441167

*Note: Do not submit your package separately to JOSS*
  
</details>

## Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

- [x] Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

## Code of conduct

- [x] I agree to abide by [pyOpenSci's Code of Conduct][PyOpenSciCodeOfConduct] during the review process and in maintaining my package should it be accepted.


**P.S.** *Have feedback/comments about our review process? Leave a comment [here][Comments]*

## Editor and Review Templates

[Editor and review templates can be found here][Templates]

[PackagingGuide]: https://www.pyopensci.org/contributing-guide/authoring/index.html#packaging-guide

[PackageCategories]: https://www.pyopensci.org/contributing-guide/open-source-software-peer-review/aims-and-scope.html?highlight=data#package-categories

[NotesOnCategories]: https://www.pyopensci.org/contributing-guide/open-source-software-peer-review/aims-and-scope.html?highlight=data#notes-on-categories


[JournalOfOpenSourceSoftware]: http://joss.theoj.org/

[JossSubmissionRequirements]: https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements

[JossPaperRequirements]: https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain

[PyOpenSciCodeOfConduct]: https://www.pyopensci.org/contributing-guide/open-source-software-peer-review/code-of-conduct.html?highlight=code%20conduct

[OsiApprovedLicense]: https://opensource.org/licenses

[Templates]: https://www.pyopensci.org/contributing-guide/appendices/templates.html

[Comments]: https://github.com/pyOpenSci/governance/issues/8
