liulab-dfci/lisa

 
 

LISA

The web version and documentation are hosted at http://lisa.cistrome.org. For large-scale gene set analysis, we recommend installing a local version.

Preparation of Anaconda environment

wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
export PATH="${HOME}/miniconda3/bin:$PATH"

conda create -n lisa python=3.6 && conda config --add channels conda-forge && conda config --add channels bioconda

Installation

conda activate lisa
# or for old conda
source activate lisa
export MKL_THREADING_LAYER=GNU

conda install -c qinqian lisa

To update, use git clone https://github.com/qinqian/lisa && cd lisa && python setup.py develop.

Get pre-computed datasets from CistromeDB

Users can download the hg38 (human) or mm10 (mouse) datasets, depending on the species of their experiments. The password can be obtained after LISA is published.

wget --user=lisa --password='xxx'  http://lisa.cistrome.org/cistromedb_data/lisa_v1.0_hg38.tar.gz

# or

wget --user=lisa --password='xxx'  http://lisa.cistrome.org/cistromedb_data/lisa_v1.1_mm10.tar.gz

Then uncompress the datasets and update the LISA configuration.

tar xvfz lisa_v1.0_hg38.tar.gz
lisa_update_conf --folder hg38/ --species hg38

# or

tar xvfz lisa_v1.1_mm10.tar.gz
lisa_update_conf --folder mm10/ --species mm10

Usage

Given multiple gene set files (gene_set1, gene_set2, gene_set3, etc.), each containing one gene (RefSeq ID or gene symbol) per row, users can predict transcriptional regulator rankings with random background genes using the following command:

time lisa model --method="all" --web=True --new_rp_h5=None --new_count_h5=None --species hg38 --epigenome "['DNase', 'H3K27ac']" --cluster=False --covariates=False --random=True --prefix first_run --background=None --stat_background_number=1000 --threads 4 gene_set1 gene_set2 gene_set3 ...
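
The one-gene-per-row input format can be prepared with standard shell tools. The following is a minimal sketch, not part of LISA itself: the file name raw_genes.txt and the gene names GENE_A, GENE_B and GENE_C are hypothetical placeholders. It trims stray whitespace and drops empty rows so that each remaining row holds exactly one identifier.

```shell
# Hypothetical raw list with a blank row and stray whitespace (placeholder gene names).
printf 'GENE_A\n\n  GENE_B \nGENE_C\n' > raw_genes.txt

# Strip leading and trailing whitespace, then keep only non-empty rows.
sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' raw_genes.txt \
  | awk 'NF' > gene_set1

# Show the cleaned gene set, one identifier per row.
cat gene_set1
```

The resulting gene_set1 can then be passed to lisa model as in the command above.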

Alternatively, users can generate a fixed set of background genes based on TAD and promoter activity and pass it to lisa:

lisa_premodel_background_selection --species hg38 --epigenomes="['DNase']" --gene_set=None --prefix=test --random=None --background=dynamic_auto_tad
cut -f 5 -d: test.background_gene.3000 > test.fixed.background_gene
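
The cut call above keeps only the fifth colon-separated field of each row. As an illustration, here is a small sketch on a made-up file. The record layout and the names demo.background_gene, NM_000000 and GENE_A are assumptions chosen for the example; the actual layout of test.background_gene.3000 may differ.

```shell
# Hypothetical colon-delimited records; the real file layout is an assumption here.
printf 'chr1:1000:2000:NM_000000:GENE_A\nchr2:3000:4000:NM_000001:GENE_B\n' > demo.background_gene

# -d: sets ':' as the field delimiter; -f 5 keeps only the fifth field of each row.
cut -f 5 -d: demo.background_gene > demo.fixed.background_gene

# Each row of the output now holds a single field.
cat demo.fixed.background_gene
```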

time lisa model --method="all" --web=True --new_rp_h5=None --new_count_h5=None --species hg38 --epigenome "['DNase', 'H3K27ac']" --cluster=False --covariates=False --random=True --prefix first_run --background=test.fixed.background_gene --stat_background_number=1000 --threads 4 gene_set1 gene_set2 gene_set3 ...

Users can also supply a customized set of background genes, which should include more than 30 unique RefSeq genes. All input genes are used for modeling and computing statistics, so --stat_background_number is ignored.

time lisa model --method="all" --web=True --new_rp_h5=None --new_count_h5=None --species hg38 --epigenome "['DNase', 'H3K27ac']" --cluster=False --covariates=False --random=True --prefix first_run --background=test.fixed.background_gene --threads 4 gene_set1 gene_set2 gene_set3 ...
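
Because a customized background should contain more than 30 unique genes, it can help to count them before starting a run. The sketch below is one possible check using standard tools; the file name custom.background_gene and the NM_ identifiers are hypothetical placeholders, and the check is not a LISA command.

```shell
# Hypothetical background file: 31 placeholder IDs plus one duplicate row.
seq 1 31 | sed 's/^/NM_/' > custom.background_gene
echo 'NM_1' >> custom.background_gene

# Count unique non-empty rows (tr removes padding that some wc versions add).
n_unique=$(awk 'NF' custom.background_gene | sort -u | wc -l | tr -d ' ')

# Report whether the file meets the "more than 30 unique genes" requirement.
if [ "$n_unique" -gt 30 ]; then
  echo "OK: $n_unique unique genes"
else
  echo "Too few: $n_unique unique genes (more than 30 required)" >&2
fi
```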

Update LISA

git clone http://github.com/qinqian/lisa/
source activate lisa
cd lisa && python setup.py develop
lisa_update_conf --folder hg38/ --species hg38
lisa_update_conf --folder mm10/ --species mm10

Remove LISA

conda env remove -n lisa
rm -r mm10/ hg38/

Citation

Qin Q, Fan J, Zheng R, Wan C, Mei S, Wu Q. Inferring transcriptional regulators through integrative modeling of public chromatin accessibility and ChIP-seq data. 2019.

Please note that the reference is a preprint hosted at bioRxiv.

About

epigenome analysis to rank transcription factors
