MARIO: single-cell proteomic data matching and integration using both shared and distinct features

shuxiaoc/mario-py


MARIO: single-cell proteomic data matching and integration pipeline

Description

This GitHub repo includes mario-py and mario-R. MARIO is a Python package for matching and integrating multi-modal single-cell data with partially overlapping features. The method is specifically tailored toward proteomic datasets. For a detailed description of the algorithm, including the core methodology, mathematical ingredients, applications to various biological samples, and extensive benchmarking, please refer to the paper.

This work has been led by Shuxiao Chen from Zongming Lab @UPenn and Bokai Zhu from Nolan Lab @Stanford.

Getting Started

Dependencies

For ease of use, we suggest building a conda environment with Python 3.8:

conda create -n mario python=3.8

Installing

Install MARIO with pip (the package name is pyMARIO):

python -m pip install pyMARIO
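
After installing, it can be useful to confirm that the package is visible to the interpreter inside the conda environment. The following is a minimal sketch, not part of MARIO itself: `is_importable` is a hypothetical helper, and the top-level module name `mario` is taken from the import statement used in the quick example.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found by the current interpreter.

    Hypothetical helper for illustration; it only looks the module up
    and does not import it.
    """
    return importlib.util.find_spec(module_name) is not None


# "mario" is the top-level module name used by `from mario.match import ...`.
if is_importable("mario"):
    print("mario is available in this environment")
else:
    print("mario not found; check that the right conda environment is active")
```

If the module is not found, the most likely cause is that pip installed into a different environment than the one currently active.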

How to use

Quick example:

To use MARIO in Python:

from mario.match import pipelined_mario
final_matching_lst, embedding_lst = pipelined_mario(data_lst=[df1, df2])

Here df1 and df2 are the two dataframes to match and integrate, with rows as cells and columns as features. For shared features, the column names must be identical across dataframes. The input list can contain more than two dataframes, since MARIO accommodates matching and integration of multiple datasets.
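
Because shared features are identified by identical column names, it may help to check the overlap before running the pipeline. The sketch below uses only pandas; `split_features` is a hypothetical helper, and the marker names and values are made up for illustration. It does not call MARIO.

```python
import pandas as pd


def split_features(df_a, df_b):
    """Return (shared, only_a, only_b) column names, keeping original column order.

    Hypothetical helper for illustration; not part of the MARIO API.
    """
    shared = [c for c in df_a.columns if c in df_b.columns]
    only_a = [c for c in df_a.columns if c not in df_b.columns]
    only_b = [c for c in df_b.columns if c not in df_a.columns]
    return shared, only_a, only_b


# Toy inputs: rows are cells, columns are protein features (values are invented).
df1 = pd.DataFrame({"CD3": [1.2, 0.1, 0.8], "CD4": [0.9, 0.0, 0.7], "CD19": [0.0, 1.5, 0.1]})
df2 = pd.DataFrame({"CD3": [1.0, 0.2], "CD4": [0.8, 0.1], "CD8": [0.1, 1.1]})

shared, only_1, only_2 = split_features(df1, df2)
if not shared:
    raise ValueError("No shared features: check that shared column names are identical")

print("shared features:", shared)
print("distinct to df1:", only_1)
print("distinct to df2:", only_2)
```

A shorter shared list than expected usually points to inconsistent naming (for example, differing capitalization or trailing spaces) rather than truly distinct features.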

The result contains a matching list (matching) and an embedding list (integration). For detailed usage, please refer to the Full tutorial section.

Similarly, to use MARIO in R (with the reticulate package):

library(reticulate)
myenvs=conda_list() # list the available conda environments
envname=myenvs$name[12] # pick the environment where MARIO is installed; the index (12 here) depends on your machine
use_condaenv(envname, required = TRUE)
mario.match <- import("mario.match") # import main mario-py module

pipelined_res = mario.match$pipelined_mario(data_lst=list(df1, df2))

Where the result also contains the matching list and embedding list.

Full tutorial:

For step-by-step tutorials on how to use MARIO with its full functionality, including fine-tuned parameters for optimal results, please refer to the documents provided here:

Python - Jupyter notebook: Match and Integration of Human Bonemarrow datasets

Python - Jupyter notebook: Match and Integration of multiple Xspecies datasets

R - Rmarkdown: Match and Integration of Human Bonemarrow datasets

License and Citation

MARIO is released under the Academic Software License Agreement; please use it accordingly.
