---
title: "GObayesC Report"
output:
  workflowr::wflow_html:
    toc: true
    latex_engine: "xelatex"
    code_folding: "hide"
editor_options:
  chunk_output_type: console
---
```{r 0-setup, include=FALSE, warning=FALSE}
# regular packages
if (TRUE) {
  library(dplyr)
  library(data.table)
  library(ggplot2)
  library(cowplot)
  library(qqman)
  library(viridis)
  library(scales)
  library(tidyverse)
  library(ggcorrplot)
  library(melt)
  library(reshape2)
  library(knitr)
  library(kableExtra)
  # options
  options(bitmapType = "cairo")
  options(error = function() traceback(3))
  # seed
  set.seed(123)
  # ggplot holder list
  gg <- vector(mode = 'list', length = 12)
}
```
```{r prepload}
load('snake/data/topTables.Rdata')
options(knitr.kable.NA = '')
```
## Summary
We wanted to confirm that the genes of interest that recur across GO terms were significant in our models, so we examined their posterior inclusion probability (PIP) in each subset. The PIP cutoff was set to 0.5 for associated genes and to 0.7 for non-associated genes. We then tallied the filtered genes within each distribution separately, and then across both together, to find genes that appeared scarcely in both.
Our findings here are mostly consistent with our initial findings. Notably, the adipokinetic hormone receptor (AkhR) is a top hit, while the hormone itself (Akh) is not found in either sex.
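
The filter-and-tally step described above can be sketched roughly as follows. This is a minimal illustration and not the code that produced `topTables.Rdata`: the data frame `pip`, its column names, every gene name other than AkhR, and all PIP values are hypothetical. It also assumes that "associated" means GO-associated and that the cutoffs are inclusive, neither of which is stated above.

```r
# Hypothetical PIP table: one row per gene, GO term, and distribution.
# All values below are made up for illustration.
pip <- data.frame(
  gene = c("AkhR", "geneA", "geneB", "AkhR", "geneC", "geneA"),
  term = c("GO:1", "GO:1", "GO:1", "GO:2", "GO:2", "GO:2"),
  set  = c("GO", "GO", "nonGO", "nonGO", "nonGO", "GO"),
  pip  = c(0.92, 0.41, 0.75, 0.81, 0.66, 0.58),
  stringsAsFactors = FALSE
)

# Cutoffs from the summary: 0.5 for associated genes, 0.7 for non-associated genes.
cutoff <- c(GO = 0.5, nonGO = 0.7)

# Keep rows whose PIP meets the cutoff of their own distribution
# (assumption: the comparison is inclusive).
kept <- pip[pip$pip >= cutoff[pip$set], ]

# Tally how often each gene survives the filter, per distribution and overall.
go_tally  <- sort(table(kept$gene[kept$set == "GO"]), decreasing = TRUE)
non_tally <- sort(table(kept$gene[kept$set == "nonGO"]), decreasing = TRUE)
all_tally <- sort(table(kept$gene), decreasing = TRUE)

# Genes that pass the filter in both distributions.
both <- intersect(names(go_tally), names(non_tally))
```

With real data, `go_tally`, `non_tally`, and `all_tally` would play the role of the per-sex tables shown in the sections below, though the actual objects may be organized differently.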
### Female Genes
```{r femaleTop, warning=FALSE}
kable(fGO_top, caption= 'GO genes', 'simple') %>%
kable_styling(full_width = FALSE, position = "float_left")
kable(fNON_top, caption= 'non-GO genes', 'simple') %>%
kable_styling(full_width = FALSE, position = "float_right")
kable(fALL_topGenes, caption = 'Top Female Genes', "simple")
```
### Male Genes
```{r maleTop, warning=FALSE}
kable(mGO_top, caption= 'GO genes', 'simple') %>%
kable_styling(full_width = FALSE, position = "float_left")
kable(mNON_top, caption= 'non-GO genes', 'simple') %>%
kable_styling(full_width = FALSE, position = "float_right")
kable(mALL_topGenes, caption = 'Top Male Genes', "simple")
```
## Posterior Inclusion Probability Plots
Below are the PIP plots for the top GO terms, separated by sex. These were made from the full model (all 198 lines, no CV) rather than from the cross-validation models used for prediction. The left column contains PIP plots for the distribution associated with the GO-related genes, while the right column contains PIP plots for all other genes. Each row is a GO term.
### Female Plots
```{r listReadF}
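sex <- 'f'
# Saved ggplot objects for this sex; list.files() returns paths sorted alphabetically.
plotList <- list.files(path = paste0('snake/data/go/26_pip/sex', sex), full.names = TRUE)
ggF <- lapply(plotList, readRDS)
# Plots are read as consecutive pairs: odd index on the left, even index on the right.
# seq_len() skips the loop when no plots are found, where 1:0 would index out of range.
for (i in seq_len(length(ggF) %/% 2)) {
  b <- 2 * i
  a <- b - 1
  print(plot_grid(ggF[[a]], ggF[[b]], ncol = 2))
}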
```
### Male Plots
```{r listReadM}
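sex <- 'm'
# Saved ggplot objects for this sex; list.files() returns paths sorted alphabetically.
plotList <- list.files(path = paste0('snake/data/go/26_pip/sex', sex), full.names = TRUE)
ggM <- lapply(plotList, readRDS)
# Plots are read as consecutive pairs: odd index on the left, even index on the right.
# seq_len() skips the loop when no plots are found, where 1:0 would index out of range.
for (i in seq_len(length(ggM) %/% 2)) {
  b <- 2 * i
  a <- b - 1
  print(plot_grid(ggM[[a]], ggM[[b]], ncol = 2))
}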
```