## Toy example of gene configurations overlapping with CNVs

### Overlap of genes and CNVs
Suppose there are three adjacent genes, gene1, gene2 and gene3, in a genomic region. Only gene1 is causal. A simulated CNV overlaps none of the genes, exactly one gene, two adjacent genes, or all three genes.

For a given sample, each gene is either overlapped by a CNV or not, coded as 1 or 0. The overlap status of the three genes is then one of the following seven scenarios: {0,0,0}, {1,0,0}, {0,1,0}, {0,0,1}, {1,1,0}, {0,1,1}, {1,1,1}.
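
The seven scenarios can be enumerated programmatically. The sketch below assumes that a single CNV covers an unbroken run of adjacent genes, so only patterns whose 1s are contiguous are kept; the helper name `is_contiguous` is hypothetical.

```python
from itertools import product


def is_contiguous(pattern):
    """True if all 1s in the pattern form one unbroken run (or there are none)."""
    ones = [i for i, v in enumerate(pattern) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


n_genes = 3
# Keep every 0/1 pattern that a single CNV (or no CNV) could produce.
scenarios = [p for p in product((0, 1), repeat=n_genes) if is_contiguous(p)]

for s in scenarios:
    print(s)
print(len(scenarios))  # 7
```

The only pattern dropped for three genes is (1, 0, 1), which would need two separate CNVs.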

### Case and control assignment
#### 3 adjacent genes
The true fraction of each scenario is unknown. Here I assume that {0,0,0} accounts for half of the samples and that the other six scenarios split the remaining half evenly. For example, there are 600 samples for {0,0,0} and 100 samples for each of the other six. Half of all simulated samples are cases and the other half are controls. Scenarios {0,0,0}, {0,1,0}, {0,0,1} and {0,1,1} tend to be controls, while the other three, {1,0,0}, {1,1,0} and {1,1,1}, tend to be cases because they cover the causal gene.

To assign case or control status, I simulate a normal random variable for each sample and add a constant for each of the three scenarios that tend to be cases. Specifically, 1.0 ($1\sigma$) is added to {1,0,0}, 0.66 to {1,1,0} and 0.33 to {1,1,1}. These constants can be adjusted. Since {1,0,1} is uncommon and can be split into {1,0,0} and {0,0,1}, I ignore it.

| Number | Scenario | Constant added | 
|:-----:|:-----:|:-----:|
| 600 | {0,0,0} | |
| 100 | {1,0,0} | 1.0 or 0.66 |
| 100 | {0,1,0} | |
| 100 | {0,0,1} | |
| 100 | {1,1,0} | 0.66 or 0.44 |
| 100 | {0,1,1} | |
| 100 | {1,1,1} | 0.33 or 0.22 |
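
The assignment described above can be sketched as follows. This is a minimal illustration, not the original simulation code: it uses the first set of constants in the table (1.0, 0.66, 0.33), draws from a standard normal, and assumes that "half are cases" means the top half of the shifted scores are labelled cases. The names `SCENARIOS` and `simulate` are hypothetical.

```python
import random

# Scenario table from the text: (overlap pattern, number of samples, constant added).
SCENARIOS = [
    ((0, 0, 0), 600, 0.0),
    ((1, 0, 0), 100, 1.0),
    ((0, 1, 0), 100, 0.0),
    ((0, 0, 1), 100, 0.0),
    ((1, 1, 0), 100, 0.66),
    ((0, 1, 1), 100, 0.0),
    ((1, 1, 1), 100, 0.33),
]


def simulate(scenarios, seed=0):
    """Return a list of (pattern, is_case) tuples, one per sample."""
    rng = random.Random(seed)
    scored = []
    for pattern, n, shift in scenarios:
        for _ in range(n):
            # Standard normal draw plus the scenario-specific constant.
            scored.append((rng.gauss(0.0, 1.0) + shift, pattern))
    # Assumption: the top half of the scores become cases, the rest controls.
    scored.sort(key=lambda t: t[0], reverse=True)
    n_cases = len(scored) // 2
    return [(pattern, rank < n_cases) for rank, (_, pattern) in enumerate(scored)]


samples = simulate(SCENARIOS)
n_cases = sum(is_case for _, is_case in samples)
print(len(samples), n_cases)  # 1200 600
```

Because the split is made at the median of the shifted scores, the case fraction is exactly one half by construction, and scenarios with a positive constant end up enriched among cases.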

#### 4 adjacent genes
There are 16 scenarios, for example {1,0,0,0}, {1,1,0,0} and {1,1,1,1}. The first gene is causal.

Extreme case

| Number | Scenario | Constant added | 
|:-----:|:-----:|:-----:|
| 1000 | {0,0,0,0} | |
| 500 | {1,1,0,0} | 0.75 |
| 500 | {0,0,1,1} | |
| 0 | Others | |
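
What makes this configuration extreme is that gene1 and gene2 are always overlapped together, and so are gene3 and gene4. A short check, using only the counts in the table above (the variable names are hypothetical), shows that the per-gene overlap indicators are pairwise identical:

```python
# Extreme-case table from the text: (overlap pattern, number of samples).
EXTREME = [
    ((0, 0, 0, 0), 1000),
    ((1, 1, 0, 0), 500),
    ((0, 0, 1, 1), 500),
]

# Expand to one row per sample, then read off one column per gene.
rows = [pattern for pattern, n in EXTREME for _ in range(n)]
columns = list(zip(*rows))

print(len(rows))                 # 2000
print(columns[0] == columns[1])  # True: gene1 and gene2 are indistinguishable
print(columns[2] == columns[3])  # True: gene3 and gene4 are indistinguishable
```

With identical columns, the overlap data alone cannot separate the causal gene1 from gene2.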

Normal case

| Number | Scenario | Constant added | 
|:-----:|:-----:|:-----:|
| 200 | {0,0,0,0} | |
| 150 | {1,1,1,1} | 0.25 |
| 100 | {1,0,0,0} | 1.0 |
| 20 | {0,1,0,0} | |
| 20 | {0,0,1,0} | |
| 100 | {0,0,0,1} | |
| 150 | {1,1,0,0} | 0.75 |
| 30 | {0,1,1,0} | |
| 150 | {0,0,1,1} | |
| 10 | {1,0,1,0} | 1.0 |
| 10 | {0,1,0,1} | |
| 10 | {1,0,0,1} | 1.0 |
| 200 | {1,1,1,0} | 0.50 |
| 200 | {0,1,1,1} | |
| 10 | {1,1,0,1} | 1.0 |
| 10 | {1,0,1,1} | 1.0 |
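
As a sanity check on the table above, the following sketch tallies the total sample size and the number of samples in which each gene is overlapped. It only restates the counts from the table; the name `NORMAL` is hypothetical.

```python
# Normal-case table from the text: (overlap pattern, number of samples, constant added).
NORMAL = [
    ((0, 0, 0, 0), 200, 0.0),
    ((1, 1, 1, 1), 150, 0.25),
    ((1, 0, 0, 0), 100, 1.0),
    ((0, 1, 0, 0), 20, 0.0),
    ((0, 0, 1, 0), 20, 0.0),
    ((0, 0, 0, 1), 100, 0.0),
    ((1, 1, 0, 0), 150, 0.75),
    ((0, 1, 1, 0), 30, 0.0),
    ((0, 0, 1, 1), 150, 0.0),
    ((1, 0, 1, 0), 10, 1.0),
    ((0, 1, 0, 1), 10, 0.0),
    ((1, 0, 0, 1), 10, 1.0),
    ((1, 1, 1, 0), 200, 0.50),
    ((0, 1, 1, 1), 200, 0.0),
    ((1, 1, 0, 1), 10, 1.0),
    ((1, 0, 1, 1), 10, 1.0),
]

total = sum(n for _, n, _ in NORMAL)
# Number of samples in which each gene is overlapped by a CNV.
carriers = [sum(n for p, n, _ in NORMAL if p[g] == 1) for g in range(4)]

print(total)     # 1370
print(carriers)  # [640, 770, 770, 640]
```

Every scenario with a nonzero constant has gene1 overlapped, which matches the statement that the first gene is causal.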


### Methods
1) Use the R package varbvs to calculate the posterior inclusion probability (PIP) for each gene.

2) Use DAP to calculate the PIP for each gene.

3) Calculate the odds ratio and the p-value of Fisher's exact test for each gene.
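
Step 3 can be illustrated with a small standard-library sketch. It is not the code used to produce the results below: it computes the sample odds ratio $ad/(bc)$ and a two-sided Fisher's exact p-value by summing hypergeometric probabilities of all tables no more likely than the observed one. The function name `fisher_exact_2x2` and the example counts are hypothetical.

```python
from math import comb, log10


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: cases with CNV overlap,    b: controls with CNV overlap,
    c: cases without overlap,     d: controls without overlap.
    Returns (sample odds ratio, p-value).
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x overlapped cases given fixed margins.
        return comb(row1, x) * comb(total - row1, col1 - x) / denom

    lo = max(0, col1 - (total - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts for one gene, for illustration only.
odds_ratio, p_value = fisher_exact_2x2(70, 30, 530, 570)
print(odds_ratio, -log10(p_value))
```

Applying the function to each gene's 2x2 table of overlap status against case status gives one odds ratio and one $-\log_{10}$(p-value) per gene.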

### Results

#### 4 genes, extreme case, sample size 2000, run time 0.64 seconds
1) varbvs

    variable	prob	PVE	coef	Pr(coef.>0.95)
    gene1	1.000000000	NA	1.20021579	[+1.007,+1.402]
    gene4	0.005535446	NA	-0.02558938	[-0.224,+0.168]
    gene3	0.005535398	NA	-0.02558722	[-0.231,+0.173]
    gene2	0.005522906	NA	0.01360461	[-0.208,+0.220]

2) DAP

    Posterior inclusion probability

    1 chr6.100001   5.48922e-01     26.142
    2 chr6.100002   5.48922e-01     26.142

3) OR
Odds ratio for 4 genes:

    gene1 3.3693563248650782
    gene2 3.3693563248650782
    gene3 0.66511562323745066
    gene4 0.66511562323745066

#### 4 genes, sample size 1370
1) varbvs

    variable	prob	PVE	coef	Pr(coef.>0.95)
    gene1	1.00000000	NA	0.8859984	[+0.674,+1.094]
    gene2	0.12003916	NA	-0.2736790	[-0.484,-0.057]
    gene3	0.09093133	NA	-0.2622626	[-0.474,-0.047]
    gene4	0.02197247	NA	-0.1788246	[-0.384,+0.059]

2) DAP

    Posterior inclusion probability

    1 chr6.100001   1.00000e+00     12.570
    2 chr6.100002   3.72234e-01     -1.489
    3 chr6.100003   1.37045e-01     -0.561

3) OR and p-value for Fisher's exact test
Odds ratio for 4 genes:

    gene1 2.4161986980694548
    gene2 1.1126891862250494
    gene3 0.77924589107315501
    gene4 0.60945878072949899

$-\log_{10}$(p-value) for 4 genes:

    gene1 5.3922160065576534
    gene2 0.45027764339334964
    gene3 1.5928938699383275
    gene4 4.9480343315517779

#### 3 genes, sample size 1210
1) varbvs

    variable    prob          PVE    coef          Pr(coef.>0.95)
    gene1       1.00000000    NA      1.3036892    [+1.028,+1.561]
    gene2       0.04018127    NA     -0.2375393    [-0.476,+0.010]
    gene3       0.02796016    NA     -0.2240704    [-0.488,+0.038]

2) DAP

    Posterior inclusion probability

    1 chr6.100001   1.00000e+00     15.761
    2 chr6.100002   6.98946e-02     -0.817
    3 chr6.100003   3.97663e-02     -1.567

3) OR and p-value for Fisher's exact test
Odds ratio for 3 genes:

    gene1    3.763157894736842
    gene2    1.2527202527202528
    gene3    0.91492378589152779
 
$-\log_{10}$(p-value) for 3 genes:

    gene1    5.4061609811311113
    gene2    1.1209781660647247
    gene3    0.2607852230581864

### Issues
#### The constants added might be too large
#### The size of each scenario might not be accurate