MARIO: single-cell proteomic data matching and integration pipeline

Description

This GitHub repo includes mario-py and mario-R. MARIO is a Python package for matching and integrating multi-modal single-cell data with partially overlapping features. The method is specifically tailored toward proteomic datasets. For a detailed description of the algorithm, including the core methodology, mathematical ingredients, applications to various biological samples, and extensive benchmarking, please refer to the paper.

This work has been led by Shuxiao Chen from Zongming Lab @Upenn and Bokai Zhu from Nolan lab @Stanford.

Getting Started

Dependencies

For ease of use, we suggest building a conda virtual environment with Python 3.8:

conda create -n mario python=3.8

Installing

MARIO can be installed with pip (package name pyMARIO):

python -m pip install pyMARIO
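
After installing, a quick sanity check can confirm that the package is visible to the interpreter you intend to use. The snippet below is a minimal sketch and not part of MARIO: the helper name `check_environment` is hypothetical, and the import name `mario` is taken from the import statement in the quick example. It uses only the Python standard library.

```python
import importlib.util
import sys


def check_environment(module_name="mario", suggested=(3, 8)):
    """Return (module_found, version_matches) for the running interpreter."""
    # find_spec returns None when a top-level module cannot be located.
    module_found = importlib.util.find_spec(module_name) is not None
    # Compare only major.minor against the suggested Python version.
    version_matches = sys.version_info[:2] == suggested
    return module_found, version_matches


found, version_ok = check_environment()
print(f"mario importable: {found}")
print(f"running Python {sys.version_info.major}.{sys.version_info.minor}, "
      f"matches suggested 3.8: {version_ok}")
```

If the first line reports `False`, the most likely cause is that pip installed the package into a different environment than the one currently active.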

How to use

Quick example:

To use MARIO in Python:

from mario.match import pipelined_mario
final_matching_lst, embedding_lst = pipelined_mario(data_lst=[df1, df2])

Here df1 and df2 are the two dataframes to match and integrate, with rows as cells and columns as features. For shared features, the column names must be identical across dataframes. The input list can contain more than two dataframes, as MARIO accommodates matching and integration of multiple datasets.
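
Because shared features are identified by identical column names, it can help to compare the column names of the inputs before running the pipeline. The following is a minimal sketch and not part of MARIO: the helper name `summarize_shared_features` and the marker names are hypothetical. It lists the names shared exactly and flags pairs that differ only by case or surrounding whitespace, which would presumably not be recognized as shared features.

```python
def summarize_shared_features(columns_a, columns_b):
    """Compare two collections of column names.

    Returns (shared, near_misses): `shared` is the sorted list of names
    present in both inputs exactly; `near_misses` is a sorted list of
    (name_a, name_b) pairs that match only after stripping whitespace
    and lowercasing.
    """
    set_a, set_b = set(columns_a), set(columns_b)
    shared = sorted(set_a & set_b)

    def normalize(name):
        return str(name).strip().lower()

    # Index the names found only in the second input by normalized form.
    only_b = {}
    for name in set_b - set_a:
        only_b.setdefault(normalize(name), []).append(name)

    # Pair each name found only in the first input with its look-alikes.
    near_misses = []
    for name in set_a - set_b:
        for candidate in only_b.get(normalize(name), []):
            near_misses.append((name, candidate))
    return shared, sorted(near_misses)


# Hypothetical marker names, for illustration only.
cols1 = ["CD3", "CD4", "CD8", "CD19"]
cols2 = ["CD3", "cd4 ", "CD8", "CD56"]
shared, near_misses = summarize_shared_features(cols1, cols2)
print(shared)
print(near_misses)
```

With pandas dataframes, the same check could be called as `summarize_shared_features(df1.columns, df2.columns)`; any near misses can then be renamed before the dataframes are passed to `pipelined_mario`.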

The result contains a matching list (matching) and an embedding list (integration). For detailed usage, please refer to the Full tutorial section.

Similarly, to use MARIO in R (with the reticulate package):

library(reticulate)
myenvs <- conda_list() # list the available conda environments
envname <- myenvs$name[12] # select the environment where MARIO is installed; the index (12 here) depends on your machine
use_condaenv(envname, required = TRUE)
mario.match <- import("mario.match") # import main mario-py module

pipelined_res <- mario.match$pipelined_mario(data_lst=list(df1, df2))

Here the result also contains the matching list and the embedding list.

Full tutorial:

For step-by-step tutorials on how to use MARIO, with fine-tuned parameters for optimal results and full functionality, please refer to the documents provided here:

Python - Jupyter notebook: Match and Integration of Human Bonemarrow datasets

Python - Jupyter notebook: Match and Integration of multiple Xspecies datasets

R - Rmarkdown: Match and Integration of Human Bonemarrow datasets

License and Citation

MARIO is released under the Academic Software License Agreement; please use it accordingly.
