PyRHE

PyRHE is an efficient and portable Python package for RHE-mc (Randomized Haseman–Elston regression for Multi-variance Components). It converts large matrices into tensors to accelerate matrix multiplication and uses multiprocessing to process jackknife blocks in parallel. It runs on both CPUs and CUDA-enabled GPUs, and is easy to install and integrate into other applications.

Installation

git clone git@github.com:sriramlab/PyRHE.git
pip install pyrhe/
# Also install the appropriate version of PyTorch for your platform from https://pytorch.org/

Example Usage

Run PyRHE as follows:

python run_rhe.py <command_line arguments>

Alternatively, you may run PyRHE using a newline-separated config file:

python run_rhe.py --config <config file>

See the example folder for a usage example.
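When PyRHE is launched from another script, it can help to assemble the command line programmatically. The sketch below is only an illustration: `build_rhe_command` is a hypothetical helper, not part of PyRHE, and the file paths are placeholders. The short flags are the ones listed in the Parameters section.

```python
import shlex
import sys

# Short flags as listed in the Parameters section of this README.
SHORT_FLAGS = {
    "genotype": "-g",
    "phenotype": "-p",
    "covariate": "-c",
    "annotation": "-annot",
    "num_vec": "-k",
    "num_block": "-jn",
    "output": "-o",
    "seed": "-s",
}


def build_rhe_command(options, script="run_rhe.py"):
    """Return an argv list for run_rhe.py (hypothetical helper).

    Options whose value is None are skipped, so optional inputs
    such as the covariate file can simply be left out.
    """
    argv = [sys.executable, script]
    for name, value in options.items():
        if value is None:
            continue
        argv += [SHORT_FLAGS[name], str(value)]
    return argv


# Placeholder paths; replace them with real files before running.
command = build_rhe_command({
    "genotype": "data/geno",
    "phenotype": "data/pheno.txt",
    "covariate": None,
    "annotation": "data/annot.txt",
    "num_vec": 10,
    "num_block": 100,
    "output": "results/run1",
})
print(shlex.join(command))
# To launch the run: subprocess.run(command, check=True)
```

Building the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.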

Parameters

genotype (-g): The path of PLINK BED genotype file
phenotype (-p): The path of phenotype file
covariate (-c): The path of covariate file
annotation (-annot): The path of genotype annotation file
num_vec (-k): The number of random vectors (10 is recommended)
num_block (-jn): The number of jackknife blocks (100 is recommended)
      Memory usage grows with the number of jackknife blocks.
output (-o): The path of the output file prefix
streaming: Whether to use the streaming version or not
num_workers: The number of workers
seed (-s): The random seed
device: Device to use (cpu or gpu)
      The CPU version already performs well; a GPU can improve performance further.
cuda_num: CUDA device number of the GPU to use
geno_impute_method: How to impute missing genotypes ("binary" (binary imputation) or "mean" (mean imputation))
cov_impute_method: How to impute missing covariates ("ignore" (ignore individuals with missing covariates) or "mean" (mean imputation))
samp_prev: Sample prevalence of binary phenotype (for conversion to liability scale)
pop_prev: Population prevalence of binary phenotype (for conversion to liability scale)
trace (-tr): Save the stochastic trace estimates as trace summary statistics (.trace) with metadata (.MN)
trace_dir: Directory to save the trace estimates
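
To illustrate what samp_prev and pop_prev are used for, here is a minimal sketch of the standard liability-threshold transformation for case-control data. Whether PyRHE applies exactly this formula internally is an assumption, and `observed_to_liability` is a hypothetical name chosen for this example.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, pop_prev, samp_prev):
    """Convert an observed-scale heritability to the liability scale.

    Standard threshold-model transformation with an ascertainment
    correction (assumed here, not confirmed as PyRHE's implementation):
        h2_liab = h2_obs * K^2 (1 - K)^2 / (z^2 * P (1 - P))
    where K is the population prevalence, P the sample prevalence,
    and z the standard normal density at the liability threshold.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - pop_prev)  # liability cut-off
    z = std_normal.pdf(threshold)                   # density at the cut-off
    scale = (pop_prev * (1.0 - pop_prev)) ** 2 / (
        z ** 2 * samp_prev * (1.0 - samp_prev)
    )
    return h2_observed * scale


# Example: a rare trait (1% prevalence) studied in a balanced case-control sample.
print(observed_to_liability(0.2, pop_prev=0.01, samp_prev=0.5))
```

When the sample prevalence equals the population prevalence, the factor reduces to K(1 - K) / z^2, the usual correction for an unascertained sample.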

PyRHE can be incorporated into your own project; see the example notebook for how to do so.

Comparison between PyRHE & Original RHE

Accuracy of Estimation

Runtime Comparison

Example testing pipeline:

The steps below describe an example testing pipeline. The pipeline can still be run when some files are missing (e.g., the annotation file).

1. Set Up: Create a .env file and specify RESULT_DIR (where the results are stored) and DATA_DIR (where the simulated phenotype, generated annotation file, etc. are stored).
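
As an illustration of the .env convention, the sketch below parses simple KEY=VALUE lines with the standard library only. How the PyRHE scripts actually read the file is not stated here, so `load_env_file` is a hypothetical helper and the paths in the comments are placeholders.

```python
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file (hypothetical helper).

    Blank lines, comment lines starting with '#', and lines without
    '=' are skipped. Surrounding quotes around values are removed.
    """
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# A .env file for the pipeline might look like this (placeholder paths):
#   RESULT_DIR=/path/to/results
#   DATA_DIR=/path/to/data
#
# Usage:
#   env = load_env_file(".env")
#   result_dir = env["RESULT_DIR"]
#   data_dir = env["DATA_DIR"]
```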

2. Generate Annotation File:

cd core
python generate_annot.py -g {geno_path} -b {num_bin} -o {output_file}

3. Simulate Phenotype:

Use the Simulator to simulate phenotypes without covariates. To add covariates, run:

cd core
python simulate_pheno.py -b {num_bin} -c {cov_file_path}

4. Run Original RHE:

Run the original RHE using:

python run_original.py -g {geno_path} -b {num_bin} -c {cov_file_path} -k {num_vec} -jn {num_block} --output {output_file}

Then parse the outputs using parse_output.py.

5. Run PyRHE:

python run_rhe.py -g {geno_path} -b {num_bin} -k {num_vec} -c {cov_file_path} -jn {num_block} --output {output_file}
