Hae Kyung Im edited this page Jun 3, 2020 · 17 revisions

Welcome!

Here you will find several articles that may help you get up and running with the PrediXcan family of tools or troubleshoot common issues.

Introduction

In the beginning, there was PrediXcan. It made it possible to integrate information about molecular mechanisms into GWAS of complex traits. A principled way to associate genes with traits was born, and everything was good.

To compute associations between genes and a phenotype, PrediXcan relied on:

  • individual-level genotype and phenotype data
  • a prediction model for each gene (or any other molecular feature)
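
As a rough illustration of the individual-level approach described above, the sketch below predicts a gene's expression as a weighted sum of genotype dosages and then regresses the phenotype on that prediction. It is a minimal toy under stated assumptions, not the PrediXcan implementation: the function names, the variant IDs, the weights, and the data are all hypothetical, and covariates are ignored.

```python
from math import sqrt

def predict_expression(dosages, weights):
    """Predicted expression per individual: sum over variants of weight * dosage.

    dosages: list of dicts {variant_id: dosage}, one dict per individual.
    weights: dict {variant_id: weight} from a (hypothetical) prediction model.
    Assumes every model variant is present for every individual.
    """
    return [sum(w * d[v] for v, w in weights.items()) for d in dosages]

def simple_regression(x, y):
    """Least-squares fit of y on x with an intercept; returns (slope, se, zscore)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, se, slope / se

# Toy data (hypothetical): two variants, five individuals.
weights = {"rs1": 0.5, "rs2": -0.2}
dosages = [
    {"rs1": 0, "rs2": 0},
    {"rs1": 1, "rs2": 0},
    {"rs1": 2, "rs2": 1},
    {"rs1": 1, "rs2": 2},
    {"rs1": 2, "rs2": 0},
]
phenotype = [1.0, 2.1, 2.5, 1.3, 3.0]

predicted = predict_expression(dosages, weights)
slope, se, zscore = simple_regression(predicted, phenotype)
print(predicted, slope, se, zscore)
```

The gene-level association here is just the slope of the phenotype on predicted expression, along with its standard error and z-score.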

To ease application to GWAS cohorts with large sample sizes, S-PrediXcan was born. It uses GWAS summary statistics and a compilation of population LD to infer gene-trait associations without needing individual-level data. And everything was better.
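
To make the summary-statistics idea concrete, here is a minimal sketch of the gene-level z-score as described in the S-PrediXcan publication: each variant's GWAS z-score is weighted by its model weight and its standard deviation in a reference panel, and the sum is divided by the standard deviation of predicted expression (computed from the weights and the variant covariance). The function name, variant IDs, and numbers are hypothetical, and the sketch assumes every model variant appears in both the GWAS and the covariance.

```python
from math import sqrt

def spredixcan_zscore(weights, gwas_z, cov):
    """Approximate gene-level z-score from GWAS summary statistics.

    weights: dict {variant: model weight}.
    gwas_z:  dict {variant: GWAS z-score (effect size / standard error)}.
    cov:     dict {(variant_a, variant_b): covariance in a reference panel},
             containing both orderings of each pair.
    """
    variants = list(weights)
    # Variance of predicted expression: w' * Cov * w.
    var_g = sum(
        weights[a] * weights[b] * cov[(a, b)] for a in variants for b in variants
    )
    # Weighted sum of GWAS z-scores, scaled by each variant's standard deviation.
    numerator = sum(weights[v] * sqrt(cov[(v, v)]) * gwas_z[v] for v in variants)
    return numerator / sqrt(var_g)

# Toy inputs (hypothetical).
weights = {"rs1": 0.5, "rs2": -0.2}
gwas_z = {"rs1": 3.0, "rs2": -1.0}
cov = {
    ("rs1", "rs1"): 0.5,
    ("rs2", "rs2"): 0.4,
    ("rs1", "rs2"): 0.1,
    ("rs2", "rs1"): 0.1,
}
gene_z = spredixcan_zscore(weights, gwas_z, cov)
print(gene_z)
```

With a single-variant model the expression reduces to the GWAS z-score of that variant, signed by the weight, which is a quick sanity check on the formula.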

But things could be even better, so MultiXcan (and its sibling S-MultiXcan for GWAS summary statistics) came into being. MultiXcan leverages multiple tissues simultaneously, exploiting cross-tissue QTL sharing.

These are called the "PrediXcan" family of tools. They are components of the MetaXcan Framework, which combines associations between molecular mechanisms and complex traits with complementary colocalization measures and analyses to prioritize candidate genes, introns, or other mechanisms such as brain morphology. Currently we use ENLOC as the colocalization measure.

GTEx v8 models

We have released two different families of prediction models on GTEx v8:

  • Elastic Net: available here
  • MASHR-based: available here, including both sQTL and eQTL models, and S-MultiXcan results.

The Elastic Net models can be used just like the previous GTEx v6 and v7 models from PredictDB. The MASHR-based models are biologically informed and perform better, but require some GWAS preprocessing. See here for a conceptual overview of best practices, and here for a tutorial.

Containerization example

Kevin Kunzmann published a PrediXcan analysis in a containerized environment here.

Software Release Notes

0.6.2 - 0.6.11

Bug fixing.

Bug fixing. Added a new command line argument to filter which models to use when running SMultiXcan.sh

This version includes the first release of the MulTiXcan and S-MulTiXcan methods. It adds a minimalistic implementation of PrediXcan that supports naive adjustment for covariates.

  • Added --gwas_file option to input a single GWAS file.
  • Addressed a plethora of edge cases in malformed input GWAS files.
  • MetaXcanUI.py is no longer supported, and left for reference purposes.

This version is a major overhaul of the main MetaXcan analysis tools. The GWAS parsing engine was reimplemented using pandas. The MetaXcan calculation was optimized; we observed runtime decreases between 30% and 60%. Many command line arguments changed:

  • "--weight_db_path" changed to "--model_db_path" in MetaXcan.py, M03_betas.py, M04_zscores.py, MetaMany.py
  • MetaXcan no longer writes intermediate statistics, and now works entirely in memory. This means that the "--beta_folder" argument in MetaXcan.py is no longer available. If you need these stats, they are still available from M03_betas.py.
  • "--compressed_gwas" argument was dropped. Gzip compression or flat file status is now inferred from the file name, i.e. files ending with common gzip extensions will be assumed to be compressed.
  • "--scheme", "--zscore_scheme", "--normalization_scheme", "--selected_dosage_folder" arguments were dropped.
  • "--overwrite" optional argument was added. If set on the command line, the results file will be overwritten if it exists.
  • "--beta_zscore_column" renamed to "--zscore_column"
  • Pandas module is now a dependency. Please install it if it is not yet part of your environment.
  • Results file header is now all lower case.
  • Updated MetaMany.py
  • Command line parameter changes
  • Refactor to support new features
  • Fixed a bug in the --zscore_column parameter, where signs wouldn't be flipped when the GWAS and the transcriptome model didn't agree on the effect allele.
  • Support for new PredictDB database format. (more info on this soon)
  • New output parameters: prediction performance pvalue, prediction performance qvalue, MetaXcan association effect size.
  • A swarm of tiny usability improvements across all scripts.
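
Two of the GWAS-parsing behaviors mentioned in these notes can be sketched in a few lines: inferring gzip compression from the file name, and flipping the sign of a z-score when the GWAS and the model disagree on the effect allele. This is only an illustration under stated assumptions, not MetaXcan's code: the function names are hypothetical, only the ".gz" extension is checked, and strand flips and ambiguous alleles are not handled.

```python
import gzip

def open_gwas(path):
    """Open a GWAS file as text; names ending in '.gz' are treated as gzip-compressed."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")

def align_zscore(zscore, gwas_effect, gwas_other, model_effect, model_other):
    """Express a GWAS z-score relative to the model's effect allele.

    Returns the z-score unchanged when the alleles agree, its negative when
    effect and non-effect alleles are swapped, and None when the allele pairs
    do not match at all.
    """
    gwas = (gwas_effect.upper(), gwas_other.upper())
    model = (model_effect.upper(), model_other.upper())
    if gwas == model:
        return zscore
    if gwas == (model[1], model[0]):
        return -zscore
    return None

print(align_zscore(2.5, "A", "G", "A", "G"))
print(align_zscore(2.5, "G", "A", "A", "G"))
print(align_zscore(2.5, "A", "T", "A", "G"))
```

Returning None for unmatched alleles leaves it to the caller to decide whether to drop the variant or report it.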