This repository contains the analysis code for the POP-Inf paper: Assumption-lean and data-adaptive post-prediction inference.
Simulation scripts are in the folder simulations. Results and figures can be reproduced in three steps:
- Run the simulations for mean estimation and OLS coefficient estimation with 1-1_sim_mean.Rmd and 1-2_sim_ols.Rmd, respectively. These scripts automatically create a subfolder named results containing the raw results.
- Summarize the simulation results into CSV files with 2-1_sum_mean.Rmd and 2-2_sum_ols.Rmd.
- Reproduce the simulation figures in the manuscript with 3_make_figure.Rmd.
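The three simulation steps can be chained from a shell. The following is a minimal sketch, not part of the repository: it assumes R with the rmarkdown package is installed, that it is run from inside the simulations folder, and that the Rmd files need no extra parameters. The helper names (render_rmd, run_simulation_pipeline) and the RUN variable are hypothetical. By default it only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
set -eu

# Assumption: only print the commands unless RUN=1 is set,
# since the simulations may take a long time to finish.
RUN="${RUN:-0}"

# Hypothetical helper: render one R Markdown file with rmarkdown::render().
render_rmd() {
  cmd="rmarkdown::render('$1')"
  if [ "$RUN" = "1" ]; then
    Rscript -e "$cmd"
  else
    printf 'Rscript -e "%s"\n' "$cmd"
  fi
}

# Run the steps in the order listed above: simulate, summarize, plot.
run_simulation_pipeline() {
  for rmd in 1-1_sim_mean.Rmd 1-2_sim_ols.Rmd \
             2-1_sum_mean.Rmd 2-2_sum_ols.Rmd \
             3_make_figure.Rmd; do
    render_rmd "$rmd"
  done
}

run_simulation_pipeline
```

Because of `set -e`, a failed step stops the chain, so the summary and figure steps never run on incomplete results.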
Our GTEx real data analysis is based on the code and data used in "Hypergraph factorization for multi-tissue gene expression imputation" and "The impact of sex on gene expression across human tissues". We thank the authors for sharing their code.
The pipeline has five steps:
- Clean the data downloaded from GTEx: 1_clean_gtex.R.
- Run HYFA to impute gene expression: 2_run_hyfa.sh.
- Process the observed and imputed gene expression: 3_process_gtex.R.
- Run POP-Inf and compare it with alternative approaches for detecting sex-biased genes using imputed gene expression: 4_popinf.R.
- Make Figure 3 of the POP-Inf paper: 5_make_figure.R.
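The five GTEx steps can be run in sequence in the same way. This is a rough sketch under several assumptions: the scripts sit in the current directory, take no command-line arguments, have their input and output paths configured internally, and HYFA is already installed for step 2. Running the .sh step with bash is also an assumption. The helper names (command_for, run_gtex_pipeline) and the RUN variable are hypothetical. By default it only prints each command; set RUN=1 to execute them.

```shell
#!/bin/sh
set -eu

# Assumption: only print the commands unless RUN=1 is set.
RUN="${RUN:-0}"

# Hypothetical helper: choose an interpreter from the file extension.
command_for() {
  case "$1" in
    *.R)  echo "Rscript $1" ;;
    *.sh) echo "bash $1" ;;
    *)    echo "unknown step: $1" >&2; return 1 ;;
  esac
}

# Run the five steps in order; each step consumes the previous step's output.
run_gtex_pipeline() {
  for step in 1_clean_gtex.R 2_run_hyfa.sh 3_process_gtex.R \
              4_popinf.R 5_make_figure.R; do
    cmd="$(command_for "$step")"
    echo "+ $cmd"
    if [ "$RUN" = "1" ]; then
      $cmd
    fi
  done
}

run_gtex_pipeline
```

With `set -e`, the loop stops at the first failing step, which avoids running POP-Inf on stale or partial imputation output.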
Please submit an issue or contact Jiacheng (jiacheng.miao@wisc.edu) or Xinran (xinran.miao@wisc.edu) for questions.
@article{miao2023assumption,
title={Assumption-lean and Data-adaptive Post-Prediction Inference},
author={Miao, Jiacheng and Miao, Xinran and Wu, Yixuan and Zhao, Jiwei and Lu, Qiongshi},
journal={arXiv preprint arXiv:2311.14220},
year={2023}
}