analysis/pippin.Rmd

---
title: "GObayesC Report"
output:
  workflowr::wflow_html:
    toc: true
    latex_engine: "xelatex"
    code_folding: "hide"
editor_options:
  chunk_output_type: console
---

```{r 0-setup, include=FALSE, warning=FALSE}
#regular
if(1){
  library(dplyr)
  library(data.table)
  library(ggplot2)
  library(cowplot)
  library(qqman)
  library(viridis)
  library(scales)
  library(tidyverse)
  library(ggcorrplot)
  library(melt)
  library(reshape2)
  library(knitr)
  library(kableExtra)
  
  #options
  options(bitmapType = "cairo")
  options(error = function() traceback(3))
  
  #seed
  set.seed(123)
  
  #ggplot holder list
  gg <- vector(mode='list', length=12)
  
}
```

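```{r prepload}
# Load the precomputed top-gene tables used in the sections below
load('snake/data/topTables.Rdata')

# Render NA cells as blanks in kable output
options(knitr.kable.NA = '')
```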


## Summary

We wanted to confirm that the genes of interest that recur across GO terms are significant in our models, so we examined their posterior inclusion probability (PIP) in each subset. The PIP cutoff was set to 0.5 for GO-associated genes and 0.7 for non-associated genes. We then tallied the filtered genes, first within each distribution and then across both, to find genes that appeared scarcely in both.

Our findings here are mostly consistent with the initial findings. Notably, the Adipokinetic hormone receptor (AkhR) is a top hit, while the hormone itself (Akh) is not found in either sex.
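
The filtering and tallying described above can be sketched as follows. This is a minimal illustration rather than the pipeline that produced `topTables.Rdata`: the toy data frame, its column names (`term`, `gene`, `group`, `pip`), and the use of `>=` at the cutoff are all assumptions made for the example. Only the two cutoff values come from the text.

```r
# Hypothetical toy input: one row per gene per GO-term model.
# `group` marks whether the gene belongs to the GO-associated ("go")
# or the non-associated ("non") distribution in that model.
pip <- data.frame(
  term  = c("GO:A", "GO:A", "GO:A", "GO:B", "GO:B", "GO:B"),
  gene  = c("g1", "g2", "g3", "g1", "g2", "g4"),
  group = c("go", "go", "non", "go", "non", "non"),
  pip   = c(0.80, 0.40, 0.75, 0.60, 0.90, 0.65),
  stringsAsFactors = FALSE
)

# Cutoffs from the text: 0.5 for GO-associated genes, 0.7 for the rest.
cutoff <- c(go = 0.5, non = 0.7)

# Keep rows whose PIP meets the cutoff of their own distribution
# (assumption: the comparison is inclusive).
keep <- pip$pip >= cutoff[pip$group]
kept <- pip[keep, ]

# Tally genes within each distribution, then across both.
tally_by_group <- table(kept$gene, kept$group)
tally_all <- sort(table(kept$gene), decreasing = TRUE)
```

Looking up the cutoff by the row's `group` keeps the two thresholds in one place, so changing either value does not require touching the filter itself.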


### Female Genes

```{r femaleTop, warning=FALSE}

kable(fGO_top, caption= 'GO genes', 'simple') %>%
  kable_styling(full_width = FALSE, position = "float_left")

kable(fNON_top, caption= 'non-GO genes', 'simple') %>%
  kable_styling(full_width = FALSE, position = "float_right")

kable(fALL_topGenes, caption = 'Top Female Genes', "simple")

```

### Male Genes

```{r maleTop, warning=FALSE}

kable(mGO_top, caption= 'GO genes',  'simple') %>%
  kable_styling(full_width = FALSE, position = "float_left")

kable(mNON_top, caption= 'non-GO genes', 'simple') %>%
  kable_styling(full_width = FALSE, position = "float_right")

kable(mALL_topGenes, caption = 'Top Male Genes', "simple")

```


## Posterior Inclusion Probability Plots

Below are the PIP plots for the top GO terms, separated by sex. These were made from the full model (all 198 lines, no CV) rather than from the cross-validation runs used for prediction. The left column contains the PIP plots for the distribution associated with the GO-related genes, while the right column contains the PIP plots for all other genes. Each row is a GO term.

### Female Plots

```{r listReadF}
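sex <- 'f'
# list.files() returns paths in alphabetical order; the pairing below assumes
# that order places each GO-gene plot directly before its non-GO counterpart
plotList <- list.files(path=paste0('snake/data/go/26_pip/sex', sex), full.names = TRUE)

ggF <- lapply(plotList, readRDS)

# One row per GO term: GO-gene plot on the left, all other genes on the right.
# seq_len() avoids the 1:0 trap when no plot files are found.
for(i in seq_len(length(ggF) %/% 2)){
  b <- 2*i
  a <- b - 1
  print(plot_grid(ggF[[a]], ggF[[b]], ncol=2))
}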

```

### Male Plots

```{r listReadM}
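sex <- 'm'
# list.files() returns paths in alphabetical order; the pairing below assumes
# that order places each GO-gene plot directly before its non-GO counterpart
plotList <- list.files(path=paste0('snake/data/go/26_pip/sex', sex), full.names = TRUE)

ggM <- lapply(plotList, readRDS)

# One row per GO term: GO-gene plot on the left, all other genes on the right.
# seq_len() avoids the 1:0 trap when no plot files are found.
for(i in seq_len(length(ggM) %/% 2)){
  b <- 2*i
  a <- b - 1
  print(plot_grid(ggM[[a]], ggM[[b]], ncol=2))
}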

```