S-PrediXcan Command Line Tutorial

Alvaro Barbeira edited this page Mar 2, 2020 · 1 revision

Introduction

This tutorial covers S-PrediXcan command line usage, as described in the Readme.

Prerequisites

#Background

S-PrediXcan's command line implementation is flexible when processing GWAS summary statistics: it accepts GWAS files in different formats and with different kinds of data. S-PrediXcan is meant to be used with the precomputed prediction models and LD references published at http://predictdb.org/

This page shows a sample usage and setup.

#Get Sample data

If you have not already done so, download the example data from https://uchicago.box.com/s/us7qhue3juubq66tktpogeansahxszg9

The download contains a sample GWAS and the genetic reference data that S-PrediXcan needs to run: in this case, a transcriptome model built from Depression Genes and Networks (DGN) whole blood samples, and SNP covariance matrices built from 1000 Genomes samples.
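
Before running anything, it can help to confirm that the download unpacked as expected. The following is a minimal sketch, assuming the archive was extracted into a `data` folder next to SPrediXcan.py; the file names are the ones used by the command later on this page, and the helper name `check_sample_data` is hypothetical (it is not part of S-PrediXcan).

```shell
# Hypothetical helper: report which of the sample files are present.
# Assumes the archive was unpacked into the folder given as $1 (default: data).
check_sample_data() {
    local dir="${1:-data}"
    local missing=0
    local path
    for path in "$dir/DGN-WB_0.5.db" "$dir/covariance.DGN-WB_0.5.txt.gz" "$dir/GWAS"; do
        if [ -e "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
            missing=1
        fi
    done
    # Non-zero status when at least one path is missing.
    return "$missing"
}

check_sample_data data || echo "Some sample files are missing; re-check the download."
```

If your copy of the data lives elsewhere, pass that folder as the first argument instead of `data`.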

#Run S-PrediXcan

If the download completed without errors, you are ready to go. Just type:

$ ./SPrediXcan.py \
--model_db_path data/DGN-WB_0.5.db \
--covariance data/covariance.DGN-WB_0.5.txt.gz \
--gwas_folder data/GWAS \
--gwas_file_pattern ".*gz" \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column BETA \
--pvalue_column P \
--output_file results/test.csv

These command line options mean:

  • --model_db_path Path to the tissue transcriptome model.
  • --covariance Path to the file containing covariance information. This covariance must correspond to the tissue transcriptome model.
  • --gwas_folder Folder containing the GWAS summary statistics data.
  • --gwas_file_pattern Pattern (such as ".*gz") used to select which files in the GWAS folder to use, based on their names. This lets you ignore support files that your GWAS analysis might have generated, such as plink logs.
  • --snp_column Name of the column containing the RSIDs.
  • --effect_allele_column Name of the column containing the effect allele (i.e. the one being regressed on).
  • --non_effect_allele_column Name of the column containing the non-effect allele.
  • --beta_column Name of the column containing the phenotype beta for each SNP in the input GWAS files.
  • --pvalue_column Name of the column containing the p-value for each SNP in the input GWAS files.
  • --output_file Path where the results will be saved.
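
Since the column options must match the header of your GWAS files, a quick check of those headers can catch mismatches early. This is only a sketch under stated assumptions: the files are gzipped, their first line is a whitespace- or tab-delimited header, and the helper name `has_columns` is hypothetical. The shell glob `*gz` is used as a rough stand-in for the ".*gz" pattern and may not select exactly the same files.

```shell
# Hypothetical helper: check that a gzipped GWAS file has the expected header columns.
# Assumes the first line is a whitespace- or tab-delimited header.
has_columns() {
    local file="$1"; shift
    local header col status=0
    # Put one column name per line so each can be matched exactly.
    header=$(gzip -dc "$file" | head -n 1 | tr -s ' \t' '\n')
    for col in "$@"; do
        if ! printf '%s\n' "$header" | grep -qxF -- "$col"; then
            echo "$file: column '$col' not found"
            status=1
        fi
    done
    return "$status"
}

# Check every gzipped file in data/GWAS, if the folder exists.
for f in data/GWAS/*gz; do
    [ -e "$f" ] || continue
    if has_columns "$f" SNP A1 A2 BETA P; then
        echo "$f: ok"
    fi
done
```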

The previous command runs S-PrediXcan on the sample GWAS files in data/GWAS and writes its results to results/test.csv. S-PrediXcan accepts either a folder with GWAS statistics split among several files or a single file with all summary statistics.
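
Once the run finishes, a quick look at the output confirms that something was written. A minimal sketch, assuming the output is a plain-text CSV whose first line is a header; the helper name `summarize_results` is hypothetical and the column names it prints are whatever S-PrediXcan produced.

```shell
# Hypothetical helper: print the header and the number of data rows of a results file.
# Assumes a plain-text CSV whose first line is a header.
summarize_results() {
    local file="$1"
    echo "columns: $(head -n 1 "$file")"
    echo "rows:    $(( $(wc -l < "$file") - 1 ))"
}

# Only summarize if the run actually produced the file.
if [ -f results/test.csv ]; then
    summarize_results results/test.csv
fi
```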

Parameter defaults

S-PrediXcan supports several command line parameters not shown here, to accommodate different GWAS formats. For example, it assumes by default that A1 is the effect allele column (--effect_allele_column) and A2 is the non-effect allele column (--non_effect_allele_column). If your GWAS uses different names for those columns, you can specify them with those arguments.
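
As an illustration, the same options can be pointed at a GWAS whose header uses other names. The sketch below uses hypothetical column names (rsid, effect_allele, other_allele, effect, pval) and a hypothetical wrapper, `spredixcan_dry_run`, that only prints the command instead of running it; the options themselves are the ones described on this page.

```shell
# Hypothetical wrapper: print (do not run) the S-PrediXcan command for a GWAS
# with custom column names. Arguments, in order:
#   $1 SNP column, $2 effect allele, $3 non-effect allele, $4 beta, $5 p-value
spredixcan_dry_run() {
    echo ./SPrediXcan.py \
        --model_db_path data/DGN-WB_0.5.db \
        --covariance data/covariance.DGN-WB_0.5.txt.gz \
        --gwas_folder data/GWAS \
        --gwas_file_pattern '".*gz"' \
        --snp_column "$1" \
        --effect_allele_column "$2" \
        --non_effect_allele_column "$3" \
        --beta_column "$4" \
        --pvalue_column "$5" \
        --output_file results/test.csv
}

# Column names below are made up for the example; use the ones in your own files.
spredixcan_dry_run rsid effect_allele other_allele effect pval
```

Printing the command first makes it easy to review the column mapping; copy the printed line into the terminal to actually run it.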

Conclusion

The previous steps showed how to use S-PrediXcan's command line tool. The example relied on a specific set of reference data; however, the same reference data may be useful for other GWAS concerned with blood phenotypes.

Please check the Reference for a thorough description of command line parameters.