# PIP calibration
- A well-calibrated method should produce points near the `y = x` line.
- A dot at (>0, 0) means the true effects of all genes in that bin are 0 (no signal), yet at least one PIP is greater than 0: a false positive.
- A dot at (>0, 1) means every gene in that bin has a true effect, so any PIP below 1 in the bin understates a real signal: a false negative.
- A missing point means no gene has a PIP in that bin (range).
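
To make the reading of these plots concrete, here is a minimal simulated sketch that is not part of the original analysis. It assumes PIPs drawn uniformly on (0, 1) and compares a calibrated case, where a gene is a true signal with probability equal to its PIP, with an overconfident case, where signals occur only half as often as the PIP claims. All variable names (`pip`, `calibrated`, `overconfident`, `calib`) are made up for illustration.

```r
set.seed(1)
n <- 100000
pip <- runif(n)                        # simulated PIPs, uniform on (0, 1)
calibrated    <- rbinom(n, 1, pip)     # signal with probability equal to the PIP
overconfident <- rbinom(n, 1, pip / 2) # signal only half as often as the PIP claims

# Ten equal-width bins; cut() labels each PIP with the bin it falls in.
bin <- cut(pip, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

calib <- data.frame(
  mean_pip           = as.numeric(tapply(pip, bin, mean)),
  freq_calibrated    = as.numeric(tapply(calibrated, bin, mean)),
  freq_overconfident = as.numeric(tapply(overconfident, bin, mean))
)
print(round(calib, 3))
```

In the calibrated column the observed frequency should sit close to the mean PIP in every bin (points near `y = x`), while in the overconfident column it falls below the line.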

In [1]:
library(ggplot2)
library(cowplot)
library(dplyr)


********************************************************
Note: As of version 1.0.0, cowplot does not change the
  default ggplot2 theme anymore. To recover the previous
  behavior, execute:
  theme_set(theme_cowplot())
********************************************************

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

## Use blocks with at least one effect
- 985 genes
- 88 blocks

In [2]:
dat = readRDS("/home/min/GIT/cnv-gene-mapping/data/deletion_simu_30_shape0.777_scale0.843/PIP_calib_block_with_effect.rds")

In [3]:
head(dat)

| | logit | logit1 | susie | pymc3_new | is_signal |
|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 0 | 0.125 | 0.015036 | 0 | 0.0174 | 0 |
| 1 | 0.125 | 0.015036 | 0 | 0.0139 | 0 |
| 2 | 0.125 | 0.015036 | 0 | 0.0174 | 0 |
| 3 | 0.125 | 0.015036 | 0 | 0.0168 | 0 |
| 4 | 0.125 | 0.015036 | 0 | 0.019 | 1 |
| 5 | 0.125 | 0.015036 | 0 | 0.0184 | 0 |


In [4]:
bin_size = 10
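# Lower and upper bounds of bin_size equal-width bins on [0, 1].
# Both columns are computed as k / bin_size so adjacent bin edges match exactly.
bins = cbind((seq_len(bin_size) - 1) / bin_size, seq_len(bin_size) / bin_size)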

In [5]:
bins

| lower bound | upper bound |
|---|---|
| 0.0 | 0.1 |
| 0.1 | 0.2 |
| 0.2 | 0.3 |
| 0.3 | 0.4 |
| 0.4 | 0.5 |
| 0.5 | 0.6 |
| 0.6 | 0.7 |
| 0.7 | 0.8 |
| 0.8 | 0.9 |
| 0.9 | 1.0 |


In [6]:
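# For each method column, summarise every PIP bin as one row:
# (sum of PIPs, number of true signals, number of genes).
# Relies on the global `bins` matrix defined above.
calc_pip = function(data) {
    pip_cali = list()
    # Every column except the last one (`is_signal`) holds the PIPs of one method.
    for (name in rev(colnames(data))[-1]) {
        for (i in 1:nrow(bins)) {
            # Genes whose PIP falls in the half-open bin [lower, upper).
            # Note: a PIP of exactly 1 is therefore not counted in any bin.
            tmp = data[which(data[[name]] >= bins[i,1] & data[[name]] < bins[i,2]), ]
            pip_cali[[name]] = rbind(pip_cali[[name]], c(sum(tmp[[name]]), sum(tmp$is_signal), length(tmp$is_signal)))
        }
        #pip_cali[[name]][which(is.na(pip_cali[[name]]))] = 0 
    }
    return(pip_cali)
}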
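
One detail of the half-open bins `[lower, upper)` used in `calc_pip` is that a PIP of exactly 1 falls in no bin, because the last bin is `[0.9, 1.0)`. The sketch below illustrates this on a made-up vector and shows one possible alternative, base R's `findInterval()` with `rightmost.closed = TRUE`. The toy vector `pip` and the other names are assumptions for illustration, not part of the analysis.

```r
bin_size <- 10
lower <- (seq_len(bin_size) - 1) / bin_size
upper <- seq_len(bin_size) / bin_size

pip <- c(0, 0.05, 0.55, 0.95, 1)  # toy values; the last one is exactly 1

# Half-open bins [lower, upper): count how many bins contain each value.
in_any_bin <- sapply(pip, function(p) sum(p >= lower & p < upper))
print(in_any_bin)  # 1 1 1 1 0: the value 1 is in no bin

# Alternative: close the last bin on the right so PIP == 1 lands in bin 10.
breaks <- seq(0, 1, length.out = bin_size + 1)
bin_id <- findInterval(pip, breaks, rightmost.closed = TRUE)
print(bin_id)      # 1 1 6 10 10
```

Whether to count PIPs of exactly 1 in the top bin is a design choice; including them changes the top-bin point and possibly which bins are empty.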

In [6]:
pip_cali = calc_pip(dat)

In [7]:
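get_cali = function(alist, col) {
    # Rows are bins; columns are (sum of PIPs, number of signals, number of genes).
    res = alist[[col]]
    # Divide by the bin count to get the mean PIP and the observed signal frequency.
    # The count in column 3 is kept unchanged (the earlier `res + alist[[col]]`
    # step doubled it, which shrank the standard errors computed later).
    res[,c(1,2)] = res[,c(1,2)] / res[,3]
    return(res)
}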

In [15]:
res = list("susie" = get_cali(pip_cali, 'susie'),
             "logit" = get_cali(pip_cali, 'logit'),
             "logit_regional" = get_cali(pip_cali, 'logit1'),
             "pymc3" = get_cali(pip_cali, 'pymc3_new'))

In [14]:
# Calibration plot: mean PIP per bin (x) against observed signal frequency (y).
# Note: the title uses the global variable `name`, which is set by the loop
# that calls this function.
dot_plot = function(dataframe) {
  ggplot(dataframe, aes(x=mean_pip, y=observed_freq)) +
    geom_errorbar(aes(ymin=observed_freq-se, ymax=observed_freq+se), colour="gray", size = 0.2, width=.01) +
    geom_point(size=1.5, shape=21, fill="#002b36") + # 21 is filled circle
    xlab("Mean PIP") +
    ylab("Observed frequency") +
    coord_cartesian(ylim=c(0,1), xlim=c(0,1)) +
    geom_abline(slope=1,intercept=0,colour='red', size=0.2) + # y = x reference line
    ggtitle(name) +
    expand_limits(y=0) + 
    theme_cowplot()
}

In [18]:
for (name in names(res)) {
    # Replace the bin count (column 3) with the error-bar half-width:
    # two binomial standard errors, 2 * sqrt(p * (1 - p) / n).
    res[[name]][,3] = sqrt(res[[name]][,2] * (1 - res[[name]][,2]) / res[[name]][,3]) * 2
    res[[name]] = as.data.frame(res[[name]])
    # Note: the column named "se" holds two standard errors, not one.
    colnames(res[[name]]) = c("mean_pip", "observed_freq", "se")
    pdf(paste0("/home/min/GIT/cnv-gene-mapping/data/deletion_simu_30_shape0.777_scale0.843/", name, '_' , 'effect.pdf'), width=3, height=3, pointsize=16)
    print(dot_plot(res[[name]]))
    dev.off()
}

“Removed 4 rows containing missing values (geom_point).”
“Removed 5 rows containing missing values (geom_point).”
“Removed 5 rows containing missing values (geom_point).”
“Removed 3 rows containing missing values (geom_point).”
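
These warnings are consistent with empty bins: when no gene has a PIP in a bin, the count is 0, the division in `get_cali` gives `0/0 = NaN`, and ggplot2 drops that row. The sketch below reproduces the effect with made-up per-bin counts (the matrix `counts` is hypothetical) and also computes the error-bar half-width of two binomial standard errors used in the loop.

```r
# Hypothetical per-bin summaries: sum of PIPs, number of signals, number of genes.
counts <- rbind(c(1.2, 1, 40),
                c(0.0, 0,  0),   # empty bin
                c(4.5, 5, 10))

mean_pip      <- counts[, 1] / counts[, 3]
observed_freq <- counts[, 2] / counts[, 3]
# Error-bar half-width: two binomial standard errors.
half_width    <- 2 * sqrt(observed_freq * (1 - observed_freq) / counts[, 3])

summary_df <- data.frame(mean_pip, observed_freq, half_width)
print(summary_df)

# The empty bin yields NaN in every column, so it cannot be plotted.
print(sum(is.nan(summary_df$observed_freq)))
```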


## All blocks
- 2290 genes
- 528 blocks

In [8]:
dat1 = readRDS("/home/min/GIT/cnv-gene-mapping/data/deletion_simu_30_shape0.777_scale0.843/PIP_calib_all_block.rds")

In [9]:
head(dat1)

| | varbvs_pip | susie_pip | logit_pip3 | logit_pip2 | logit_pip | pymc3 | is_signal |
|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 0 | 0.05422173 | 0 | 0.01503649 | 0.01503649 | 0.125 | 0.0174 | 0 |
| 1 | 0.05421956 | 0 | 0.01503649 | 0.01503649 | 0.125 | 0.0139 | 0 |
| 2 | 0.05421748 | 0 | 0.01503649 | 0.01503649 | 0.125 | 0.0174 | 0 |
| 3 | 0.05421555 | 0 | 0.01503649 | 0.01503649 | 0.125 | 0.0168 | 0 |
| 4 | 0.05421383 | 0 | 0.01503649 | 0.01503649 | 0.125 | 0.019 | 1 |
| 5 | 0.05421235 | 0 | 0.01503649 | 0.01503649 | 0.125 | 0.0184 | 0 |


In [10]:
pip_cali_1 = calc_pip(dat1)

In [11]:
names(pip_cali_1)

In [12]:
res1 = list("susie" = get_cali(pip_cali_1, 'susie_pip'),
             "logit" = get_cali(pip_cali_1, 'logit_pip'),
             "logit_regional" = get_cali(pip_cali_1, 'logit_pip2'),
             "pymc3" = get_cali(pip_cali_1, 'pymc3'))

In [15]:
for (name in names(res1)) {
    # Replace the bin count (column 3) with the error-bar half-width:
    # two binomial standard errors, 2 * sqrt(p * (1 - p) / n).
    res1[[name]][,3] = sqrt(res1[[name]][,2] * (1 - res1[[name]][,2]) / res1[[name]][,3]) * 2
    res1[[name]] = as.data.frame(res1[[name]])
    # Note: the column named "se" holds two standard errors, not one.
    colnames(res1[[name]]) = c("mean_pip", "observed_freq", "se")
    pdf(paste0("/home/min/GIT/cnv-gene-mapping/data/deletion_simu_30_shape0.777_scale0.843/", name, '_' , 'all_blocks_10bins.pdf'), width=3, height=3, pointsize=16)
    print(dot_plot(res1[[name]]))
    dev.off()
}

“Removed 1 rows containing missing values (geom_point).”
