LISA

The Lisa web version and documentation are hosted at the Lisa website. The web version is currently down for an upgrade and may be restored in the near future. In the meantime, we recommend installing the command-line Lisa version 1 described below (only Mac OS X and Linux have been tested). An exploratory Lisa version 2 (https://github.com/liulab-dfci/lisa2/) is also under development. Both command-line versions are worth trying for large-scale gene set analysis to infer transcriptional regulators. For more information and citation, please see Qin Q, Fan J, Zheng R, Wan C, Mei S, Wu Q, et al. Inferring transcriptional regulators through integrative modeling of public chromatin accessibility and ChIP-seq data. Genome Biology. 2020;21:32.

Preparation of Anaconda environment and Installation

wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
export PATH="${HOME}/miniconda3/bin:$PATH"

conda config --add channels defaults && conda config --add channels conda-forge && conda config --add channels bioconda && conda install mamba -c conda-forge && mamba create -n lisa -c qinqian lisa_minimal python=3.6.6

# or

conda config --add channels defaults && conda config --add channels conda-forge && conda config --add channels bioconda && conda install mamba -c conda-forge && mamba create -n lisa -c qinqian lisa=1.0 python=3.6.6

export MKL_THREADING_LAYER=GNU

Update package

To update to the latest development version, clone the repository and install it in development mode:

git clone https://github.com/qinqian/lisa && cd lisa && python setup.py develop

Get pre-computed datasets from CistromeDB

Download the hg38 (human) or mm10 (mouse) dataset, depending on the species of your experiment.

wget -c http://lisa.cistrome.org/cistromedb_data/lisa_v1.2_hg38.tar.gz

# or

wget -c http://lisa.cistrome.org/cistromedb_data/lisa_v1.2_mm10.tar.gz

Then uncompress the dataset and update the lisa configuration with the absolute path of the extracted folder.

tar xvfz lisa_v1.2_hg38.tar.gz
lisa_update_conf --folder absolute_path_hg38/ --species hg38

# or

tar xvfz lisa_v1.2_mm10.tar.gz
lisa_update_conf --folder absolute_path_mm10/ --species mm10
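Because lisa_update_conf takes an absolute path, it can be convenient to resolve the extracted folder's location rather than typing it by hand. The following is a minimal sketch; abs_dir is a hypothetical helper (not part of lisa), and the folder name in the usage comment is an assumption, so substitute whatever directory tar actually created.

```shell
# Hypothetical helper, not part of lisa: print the absolute path of a directory.
# The subshell keeps the caller's working directory unchanged.
abs_dir() {
  (cd "$1" && pwd)
}

# Example usage (folder name is an assumption; check what tar extracted):
# lisa_update_conf --folder "$(abs_dir lisa_v1.2_hg38)/" --species hg38
```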

Usage

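First, activate the conda environment. Use the name given to -n when the environment was created (both install commands above use -n lisa):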

conda activate lisa_minimal

# or 

conda activate lisa

Given multiple gene set files (gene_set1, gene_set2, gene_set3, etc.), each listing one gene (RefSeq id or gene symbol) per row, the following command predicts a transcriptional regulator ranking using random background genes:

time lisa model --clean=True --method="all" --web=True --new_rp_h5=None --new_count_h5=None --species hg38 --epigenome "['DNase', 'H3K27ac']" --cluster=False --covariates=False --random=True --prefix first_run --background=None --stat_background_number=1000 --threads 4 gene_set1 gene_set2 gene_set3 ...
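Since each gene set file is expected to hold one gene per row, it can help to normalize a raw gene list before running the model. The sketch below is one way to do that; clean_gene_set is a hypothetical helper (not part of lisa), and it assumes that removing carriage returns, surrounding whitespace, blank rows, and duplicate rows is appropriate for your input.

```shell
# Hypothetical helper, not part of lisa: normalize a gene list to
# one unique gene identifier per row, preserving the original order.
clean_gene_set() {
  # $1: input file with one gene (RefSeq id or gene symbol) per row
  tr -d '\r' < "$1" |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    awk 'NF > 0 && !seen[$0]++'
}

# Example usage (file names are placeholders):
# clean_gene_set raw_genes.txt > gene_set1
```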

Alternatively, generate a fixed set of background genes based on TAD and promoter activity, and pass it to lisa:

lisa_premodel_background_selection --species hg38 --epigenomes="['DNase']" --gene_set=None --prefix=test --random=None --background=dynamic_auto_tad
cut -f 5 -d: test.background_gene.3000 > test.fixed.background_gene

time lisa model --clean=True --method="all" --web=True --new_rp_h5=None --new_count_h5=None --species hg38 --epigenome "['DNase', 'H3K27ac']" --cluster=False --covariates=False --random=True --prefix first_run --background=test.fixed.background_gene --stat_background_number=1000 --threads 4 gene_set1 gene_set2 gene_set3 ...
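The cut step above keeps only the fifth colon-separated field of each line in the background file. The sketch below illustrates that on a made-up line; the field layout shown is an assumption for illustration only, so inspect your own test.background_gene.3000 before relying on it.

```shell
# Hypothetical background line; the real file layout may differ.
line='chr1:11868:14409:NM_000001:GENE1'

# -d: sets the field delimiter to a colon; -f 5 selects the fifth field.
printf '%s\n' "$line" | cut -f 5 -d:
```

For this made-up line the command prints GENE1.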

You can also supply a customized set of background genes, which should include more than 30 unique RefSeq genes. All input genes are used for modeling and computing statistics, so --stat_background_number is ignored.

time lisa model --method="all" --clean=True --web=True --new_rp_h5=None --new_count_h5=None --species hg38 --epigenome "['DNase', 'H3K27ac']" --cluster=False --covariates=False --random=True --prefix first_run --background=test.fixed.background_gene --threads 4 gene_set1 gene_set2 gene_set3 ...
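Before passing a customized background file, it may be worth checking that it meets the "more than 30 unique genes" requirement. The following is a rough sketch; both function names are hypothetical (not part of lisa), and the check only counts unique non-empty rows without verifying that they are valid RefSeq ids.

```shell
# Hypothetical helper, not part of lisa: count unique non-empty rows.
count_unique_genes() {
  awk 'NF > 0' "$1" | sort -u | wc -l | tr -d ' '
}

# Hypothetical check: succeed only if the file has more than 30 unique rows.
has_enough_background() {
  [ "$(count_unique_genes "$1")" -gt 30 ]
}

# Example usage:
# has_enough_background test.fixed.background_gene || echo "need more than 30 unique genes" >&2
```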

Uninstall LISA

conda env remove -n lisa

# or

conda env remove -n lisa_minimal

rm -r absolute_path_mm10/ absolute_path_hg38/

Preprocessing datasets to update the database

Two separate repositories contain the scripts for preprocessing CistromeDB datasets, including Peak-RP and Chrom-RP.

Troubleshooting

Sometimes numpy and pandas end up with conflicting versions after Lisa is installed. To fix this, uninstall both packages and reinstall one of the following version pairs: numpy 1.15.4 with pandas 0.23.4 (thanks @ChangliangWang), numpy 1.15.1 with pandas 0.25.2, numpy 1.15.1 with pandas 1.0.5, or numpy 1.17.2 with pandas 1.0.5. With the latest Lisa, you may also need to reinstall scikit-learn 0.21.3, because the lbfgs solver does not support the l1 penalty for logistic regression.
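To see whether an environment already has one of the version pairs listed above, a small check along these lines may help. This is only a sketch: is_known_good_pair is a hypothetical helper (not part of lisa), and the commented usage assumes that python in the activated environment can import both packages and that pip is the installer you want to use.

```shell
# Hypothetical helper, not part of lisa: succeed if the given
# numpy/pandas versions match one of the pairs listed above.
is_known_good_pair() {
  # $1: numpy version, $2: pandas version
  case "$1/$2" in
    1.15.4/0.23.4 | 1.15.1/0.25.2 | 1.15.1/1.0.5 | 1.17.2/1.0.5) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage inside the activated environment:
# np="$(python -c 'import numpy; print(numpy.__version__)')"
# pd="$(python -c 'import pandas; print(pandas.__version__)')"
# is_known_good_pair "$np" "$pd" || echo "consider: pip install numpy==1.15.4 pandas==0.23.4" >&2
```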