psoerensen edited this page Feb 20, 2016 · 2 revisions

Genomic Feature Linear Mixed Model Analyses

The qgg package provides a range of genomic feature linear mixed modeling approaches for predicting quantitative trait phenotypes from high-resolution genomic polymorphism data. Genomic features are regions of the genome that are hypothesized to be enriched for causal variants affecting the trait. Several genomic feature classes can be formed from previous studies and other sources of information, including genes, chromosomes, biological pathways, gene ontologies, sequence annotation, prior QTL regions, or other types of external evidence. Using prior information on genomic features matters because prediction is difficult in populations of unrelated individuals when the number of causal variants is low relative to the total number of polymorphisms and each causal variant has only a small effect on the trait. The models are implemented using likelihood-based or Bayesian methods.
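The next sketch makes the idea of a genomic feature concrete. It maps gene intervals to the indices of the markers they contain, which is the kind of marker set that feature-based models operate on. It assumes sorted base-pair positions on a single chromosome and an optional flanking window. The gene names, coordinates and the `snps_in_features` function are hypothetical illustrations, not part of the qgg API.

```python
from bisect import bisect_left, bisect_right

def snps_in_features(snp_pos, features, flank=0):
    """Map each feature (name -> (start, end)) to the indices of the SNPs it contains.

    Hypothetical helper: snp_pos must be sorted positions on one chromosome, and
    flank (in bp) widens every interval on both sides.
    """
    sets = {}
    for name, (start, end) in features.items():
        lo = bisect_left(snp_pos, start - flank)   # first SNP at or after the start
        hi = bisect_right(snp_pos, end + flank)    # one past the last SNP at or before the end
        sets[name] = list(range(lo, hi))
    return sets

# Illustrative marker positions and gene intervals (not real annotation).
snp_pos = [100, 250, 400, 1200, 1500, 3100]
genes = {"geneA": (200, 450), "geneB": (1400, 1600), "geneC": (5000, 6000)}
sets = snps_in_features(snp_pos, genes)
print(sets)  # {'geneA': [1, 2], 'geneB': [4], 'geneC': []}
```

Other feature classes (pathways, GO categories) could be formed the same way by taking the union of the marker sets of their member genes.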

Genomic feature best linear unbiased prediction (GFBLUP) models can be fitted. We have demonstrated that a GFBLUP model using prior information on Gene Ontology categories can increase prediction accuracy, compared with existing models, for three quantitative traits in the unrelated, sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel. Simulation studies supported these results and further illustrated how trait-specific and genomic feature-specific factors affect prediction accuracy. We have extended these models to include multiple features and multiple traits, and we are currently applying them to a range of quantitative traits in dairy cattle and pigs and to disease traits in humans. Different genetic models (e.g. additive, dominance, gene-by-gene and gene-by-environment interactions) can be used. Further extensions include a weighted GFBLUP model in which individual genetic markers are weighted differently when the genomic relationships are constructed.
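The next block is a minimal NumPy sketch of the two-component GFBLUP idea. It assumes the usual formulation in which markers inside the genomic feature and the remaining markers each contribute their own genomic relationship matrix, y = Xb + f + r + e. Variance components are treated as known inputs; in practice they would be estimated, for example by REML. The function names, the 0/1/2 genotype coding, the simulated data and the optional per-marker `weights` (one hypothetical way to mimic a weighted relationship matrix) are all illustrative assumptions, not the qgg API.

```python
import numpy as np

def standardize(M):
    """Centre and scale 0/1/2-coded marker columns (assumes no monomorphic markers)."""
    p = M.mean(axis=0) / 2.0
    return (M - 2 * p) / np.sqrt(2 * p * (1 - p))

def grm(W, weights=None):
    """Genomic relationship matrix W D W' / sum(d); D = I gives the unweighted W W'/m."""
    d = np.ones(W.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    return (W * d) @ W.T / d.sum()

def gfblup(y, X, W, feature, sigma_f, sigma_r, sigma_e, weights=None):
    """BLUP for y = Xb + f + r + e, f ~ N(0, G_f*sigma_f), r ~ N(0, G_r*sigma_r), e ~ N(0, I*sigma_e)."""
    w = np.ones(W.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    G_f = grm(W[:, feature], w[feature])       # relationships from markers inside the feature
    G_r = grm(W[:, ~feature], w[~feature])     # relationships from all remaining markers
    V = sigma_f * G_f + sigma_r * G_r + sigma_e * np.eye(len(y))
    Vinv_X = np.linalg.solve(V, X)
    b = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)   # GLS estimate of fixed effects
    resid = np.linalg.solve(V, y - X @ b)             # V^{-1}(y - Xb)
    return b, sigma_f * G_f @ resid, sigma_r * G_r @ resid

# Simulated example: causal markers placed inside a hypothetical 50-marker feature.
rng = np.random.default_rng(1)
n, m = 200, 1000
M = rng.binomial(2, rng.uniform(0.1, 0.5, m), size=(n, m)).astype(float)
W = standardize(M)
feature = np.zeros(m, dtype=bool)
feature[:50] = True
beta = np.zeros(m)
beta[:10] = rng.normal(0, 0.5, 10)
y = 10 + W @ beta + rng.normal(0, 1, n)
X = np.ones((n, 1))
# Assumed (not estimated) variance components.
b, f_hat, r_hat = gfblup(y, X, W, feature, sigma_f=2.0, sigma_r=0.2, sigma_e=1.0)
g_hat = f_hat + r_hat   # predicted total genomic value per individual
```

With all marker weights equal, `grm` reduces to the familiar W W'/m. The prior information enters only through how the markers are partitioned into `feature` and the rest, so the feature component can help only if that partition actually captures the causal markers.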

Bayesian multiple-trait and multiple-genomic-feature models can be fitted. These models are implemented using an empirical Bayesian method that handles multiple features and multiple traits, and they rely on spectral decomposition, which plays an important computational role in the Markov chain Monte Carlo (MCMC) strategy. The result is a very flexible and formal statistical framework for using prior information to decompose genomic (co)variances and predict trait phenotypes. We applied the model to reveal strong genetic control of environmental variation in quantitative traits of Drosophila melanogaster using whole-genome sequence data. Furthermore, we are currently extending these models to binary outcomes related to human diseases and evaluating different strategies for optimal use of prior information.
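A much simpler single-trait, single-kernel model, y ~ N(0, sigma_g G + sigma_e I), illustrates why spectral decomposition matters computationally. Decomposing G = U D U' once costs O(n^3). After rotating the data to y* = U'y, V becomes diagonal for every value of (sigma_g, sigma_e), so each likelihood evaluation inside an MCMC loop costs only O(n). This is an assumption-laden sketch, not the qgg sampler: it ignores fixed effects (y is assumed centred), uses a random-walk Metropolis step with a flat prior on the log variances, and all function names are hypothetical.

```python
import numpy as np

def loglik_direct(y, G, sg, se):
    """log N(y | 0, sg*G + se*I) computed directly: O(n^3) per call."""
    n = len(y)
    V = sg * G + se * np.eye(n)
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(V, y))

def loglik_spectral(ystar, d, sg, se):
    """Same log-likelihood using G = U diag(d) U' and ystar = U'y: O(n) per call."""
    v = sg * d + se                          # eigenvalues of V
    return -0.5 * (len(ystar) * np.log(2 * np.pi) + np.log(v).sum() + np.sum(ystar**2 / v))

def metropolis(ystar, d, n_iter=2000, step=0.2, seed=0):
    """Random-walk Metropolis on (log sg, log se) with a flat prior on the log scale."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    cur = loglik_spectral(ystar, d, *np.exp(theta))
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        prop = theta + step * rng.normal(size=2)
        new = loglik_spectral(ystar, d, *np.exp(prop))
        if np.log(rng.uniform()) < new - cur:    # symmetric proposal: likelihood ratio only
            theta, cur = prop, new
        draws[t] = np.exp(theta)
    return draws

# Simulated data with centred standard-normal "markers" as a stand-in for genotypes.
rng = np.random.default_rng(2)
n, m = 200, 500
Z = rng.normal(size=(n, m))
Z -= Z.mean(axis=0)
G = Z @ Z.T / m
d, U = np.linalg.eigh(G)                     # one-off O(n^3) decomposition
y = rng.multivariate_normal(np.zeros(n), 1.5 * G + 1.0 * np.eye(n))
ystar = U.T @ y                              # rotated phenotypes
draws = metropolis(ystar, d)
print(draws[500:].mean(axis=0))              # posterior means of (sg, se) after burn-in
```

With more than one genomic feature or trait, a single eigendecomposition of one G no longer diagonalizes V for all parameter values. The sketch therefore shows only the basic computational saving, not the full multi-feature, multi-trait sampler.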

We have developed SNP set tests derived from the GBLUP model. The premise of the genomic feature models presented above is that genomic features are enriched for causal variants affecting the traits. In reality, however, the number, location and effect sizes of the true causal variants within a genomic feature are unknown. We have therefore developed and evaluated a number of SNP set tests derived from a standard genomic BLUP (GBLUP) model. Although the GBLUP model is often regarded as a “black box” modelling approach, we have shown using simulations that powerful SNP set tests for identifying genomic features enriched for causal variants can be derived from it. These approaches are computationally very fast, which allows us to rapidly analyze different layers of genomic feature classes and discover genomic features potentially enriched for causal variants. Results from these analyses are built into the prediction models described above. We have applied these tests to genotypes and quantitative traits from dairy cattle, pigs and Drosophila melanogaster, and to disease traits in humans. We are currently extending the GBLUP-derived set tests to multiple and longitudinal traits.
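The sketch below shows one simple way to obtain a SNP set test from a GBLUP fit. It is an illustrative choice, not necessarily one of the tests evaluated above. Marker effects are back-solved from the GBLUP solution as a_hat = sigma_g W' V^-1 (y - X b_hat) / m, so that W a_hat reproduces the GBLUP genomic values. A sum-of-squares statistic is then computed for the markers in the feature, and an empirical p-value is obtained by comparing it with random marker sets of the same size. Variance components are assumed known, the random-set null ignores linkage disequilibrium between neighbouring markers, and all names and simulated data are hypothetical.

```python
import numpy as np

def gblup_marker_effects(y, W, sigma_g, sigma_e):
    """Fit y = 1*mu + g + e with g ~ N(0, G*sigma_g), G = W W'/m; return g_hat and marker effects."""
    n, m = W.shape
    G = W @ W.T / m
    V = sigma_g * G + sigma_e * np.eye(n)
    X = np.ones((n, 1))
    Vinv_X = np.linalg.solve(V, X)
    b = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)   # GLS intercept
    resid = np.linalg.solve(V, y - X @ b)             # V^{-1}(y - Xb)
    g_hat = sigma_g * G @ resid                       # GBLUP genomic values
    a_hat = sigma_g * W.T @ resid / m                 # back-solved marker effects
    return g_hat, a_hat

def set_test(a_hat, in_set, n_random=1000, seed=0):
    """Sum of squared marker effects in the set, with an empirical p-value from random sets."""
    rng = np.random.default_rng(seed)
    k = int(in_set.sum())
    t_obs = np.sum(a_hat[in_set] ** 2)
    t_null = np.array([np.sum(rng.choice(a_hat, size=k, replace=False) ** 2)
                       for _ in range(n_random)])
    p_value = (1 + np.sum(t_null >= t_obs)) / (1 + n_random)
    return t_obs, p_value

rng = np.random.default_rng(3)
n, m = 300, 2000
W = rng.normal(size=(n, m))
W -= W.mean(axis=0)                      # centred stand-in for standardized genotypes
gene_set = np.zeros(m, dtype=bool)
gene_set[100:150] = True                 # hypothetical 50-marker feature
beta = np.zeros(m)
beta[100:110] = rng.normal(0, 0.4, 10)   # causal markers placed inside the feature
y = W @ beta + rng.normal(0, 1, n)
g_hat, a_hat = gblup_marker_effects(y, W, sigma_g=1.0, sigma_e=1.0)
t_obs, p_value = set_test(a_hat, gene_set)
print(t_obs, p_value)
```

Only one GBLUP fit is needed. Many feature sets can then be screened by re-running just `set_test` on the same `a_hat`, which is one reason this kind of test is cheap to apply across several layers of feature classes.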
