## Toy example: gene configurations overlapped by CNVs

### Overlap of genes and CNVs
Suppose there are three adjacent genes, gene1, gene2 and gene3, in a genomic region, and only gene1 is causal. Simulated CNVs overlap none, exactly one, two adjacent, or all three genes.

For a given sample, each gene is either overlapped by one CNV or by none, coded as 1 or 0, respectively. The overlap status of the three genes is then one of the following seven patterns: {0,0,0}, {1,0,0}, {0,1,0}, {0,0,1}, {1,1,0}, {0,1,1}, {1,1,1}.
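Because a CNV covers a contiguous stretch of the genome, these patterns are the all-zero pattern plus every run of consecutive genes. A minimal Python sketch that enumerates them, assuming at most one CNV per sample in the region (`contiguous_patterns` is a hypothetical helper, not part of any package):

```python
def contiguous_patterns(n_genes, max_len=None):
    """Enumerate 0/1 overlap patterns produced by at most one contiguous CNV.

    The all-zero pattern (no CNV) comes first, followed by every run of
    consecutive genes of length 1 up to ``max_len`` (default: n_genes).
    """
    if max_len is None:
        max_len = n_genes
    patterns = [(0,) * n_genes]
    for length in range(1, max_len + 1):
        for start in range(n_genes - length + 1):
            patterns.append(
                tuple(1 if start <= i < start + length else 0 for i in range(n_genes))
            )
    return patterns


for pattern in contiguous_patterns(3):
    print(pattern)
```

For three genes this prints the seven patterns in the order listed above, and the non-contiguous {1,0,1} never appears. `contiguous_patterns(4)` returns 11 patterns, matching the 4-gene setup; the 15-gene setup further down uses only a subset of such runs.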

### Cases and Controls assignment
#### 3 adjacent genes
The true frequency of each pattern is unknown. Here I assume that {0,0,0} accounts for half of the samples and that the other six patterns share the remaining half evenly; for example, 600 samples carry {0,0,0} and about 100 samples carry each of the other six patterns. Half of all simulated samples are cases and the other half are controls. Patterns {0,0,0}, {0,1,0}, {0,0,1} and {0,1,1} tend to be controls, while the other three, {1,0,0}, {1,1,0} and {1,1,1}, tend to be cases because they cover the causal gene.

To assign each sample to cases or controls, I simulated a normal random variable for each sample and added a constant to it for each of the three patterns that tend to be cases. For example, 1.0 ($1\sigma$) could be added to {1,0,0}, 0.66 to {1,1,0} and 0.33 to {1,1,1}; in the simulation tabulated below, 1.0 was added to all three. These constants can be adjusted. The pattern {1,0,1} is ignored because it is uncommon and can be decomposed into {1,0,0} and {0,0,1}.
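A minimal Python sketch of this assignment step, under stated assumptions: the noise is N(0, 1), so a constant of 1.0 equals $1\sigma$; and, because the tables split samples exactly in half, the sketch labels the half of samples with the highest scores as cases. That ranking rule, the function name `assign_cases` and the seed are assumptions for illustration, not the original simulation code, so the resulting counts will differ from the table.

```python
import random


def assign_cases(pattern_counts, shifts, seed=0):
    """Split simulated samples into cases and controls.

    pattern_counts: dict mapping an overlap pattern (tuple of 0/1) to the
        number of samples carrying it.
    shifts: dict mapping a pattern to the constant added to its scores;
        patterns not listed get 0.
    Each sample gets a N(0, 1) score plus its pattern's shift. As an assumed
    rule, the half of samples with the highest scores become cases.
    Returns a dict mapping each pattern to (n_cases, n_controls).
    """
    rng = random.Random(seed)
    samples = []  # (score, pattern) for every simulated sample
    for pattern, count in pattern_counts.items():
        shift = shifts.get(pattern, 0.0)
        for _ in range(count):
            samples.append((rng.gauss(0.0, 1.0) + shift, pattern))
    samples.sort(key=lambda s: s[0], reverse=True)
    n_cases = len(samples) // 2
    tally = {pattern: [0, 0] for pattern in pattern_counts}
    for rank, (_, pattern) in enumerate(samples):
        tally[pattern][0 if rank < n_cases else 1] += 1
    return {pattern: tuple(c) for pattern, c in tally.items()}


# Sample sizes from the 3-gene table; the three gene1-covering patterns
# get a shift of 1.0 (use 1.0 / 0.66 / 0.33 for the graded variant).
counts = {
    (0, 0, 0): 600, (1, 0, 0): 105, (0, 1, 0): 100, (0, 0, 1): 105,
    (1, 1, 0): 100, (0, 1, 1): 100, (1, 1, 1): 100,
}
shifts = {(1, 0, 0): 1.0, (1, 1, 0): 1.0, (1, 1, 1): 1.0}
result = assign_cases(counts, shifts)
for pattern, (n_case, n_ctrl) in result.items():
    print(pattern, n_case, n_ctrl)
```

Ranking instead of a fixed threshold guarantees the exact 50/50 case/control split regardless of the constants chosen.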

| Number | Pattern | Constant added | # in Cases | # in Ctrls |
|:-----:|:-----:|:-----:|:-----:|:-----:|
| 600 | {0,0,0} | | 238 | 362 |
| 105 | {1,0,0} | 1.0 | 83 | 22 |
| 100 | {0,1,0} | | 41 | 59 |
| 105 | {0,0,1} | | 44 | 61 |
| 100 | {1,1,0} | 1.0 | 78 | 22 |
| 100 | {0,1,1} | | 38 | 62 |
| 100 | {1,1,1} | 1.0 | 83 | 17 |

#### 4 adjacent genes
There are 11 patterns, such as {1,0,0,0}, {1,1,0,0} and {1,1,1,1}. The first gene is causal.

Extreme case

| Number | Pattern | Constant added | # in Cases | # in Ctrls |
|:-----:|:-----:|:-----:|:-----:|:-----:|
| 1000 | {0,0,0,0} | | 405 | 595 |
| 500 | {1,1,0,0} | 1.0 | 392 | 108 |
| 500 | {0,0,1,1} | | 203 | 297 |
| 0 | Others | |  |  |

Normal case

| Number | Pattern | Constant added | # in Cases | # in Ctrls |
|:-----:|:-----:|:-----:|:-----:|:-----:|
| 200 | {0,0,0,0} | | 71 | 129 |
| 150 | {1,1,1,1} | 1.0 | 106 | 44 |
| 100 | {1,0,0,0} | 1.0 | 62 | 38 |
| 20 | {0,1,0,0} | | 5 | 15 |
| 20 | {0,0,1,0} | | 7 | 13 |
| 100 | {0,0,0,1} | | 34 | 66 |
| 150 | {1,1,0,0} | 1.0 | 104 | 46 |
| 30 | {0,1,1,0} | | 7 | 23 |
| 150 | {0,0,1,1} | | 50 | 100 |
| 200 | {1,1,1,0} | 1.0 | 146 | 54 |
| 200 | {0,1,1,1} | | 68 | 132 |

#### 15 adjacent genes, CNVs overlapping at most 10 genes
There are 25 patterns, and the 8th gene is causal.

| Number | Pattern | Constant added | # in Cases | # in Ctrls |
|:-----:|:-----:|:-----:|:-----:|:-----:|
| 50 | {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0} | |  |  |
| 50 | {1,0,0,0,0,0,0,0,0,0,0,0,0,0,0} | |  |  |
| 50 | {1,1,0,0,0,0,0,0,0,0,0,0,0,0,0} | |  |  |
| 50 | {1,1,1,0,0,0,0,0,0,0,0,0,0,0,0} | |  |  |
| 50 | {1,1,1,1,0,0,0,0,0,0,0,0,0,0,0} | |  |  |
| 50 | {1,1,1,1,1,0,0,0,0,0,0,0,0,0,0} | |  |  |
| 50 | {1,1,1,1,1,1,0,0,0,0,0,0,0,0,0} | |  |  |
| 50 | {1,1,1,1,1,1,1,0,0,0,0,0,0,0,0} | |  |  |
| 50 | {1,1,1,1,1,1,1,1,0,0,0,0,0,0,0} | 1.0| 33 | 17 |
| 50 | {1,1,1,1,1,1,1,1,1,0,0,0,0,0,0} | 1.0| 33 | 17 |
| 50 | {1,1,1,1,1,1,1,1,1,1,0,0,0,0,0} | 1.0| 34 | 16 |
| 50 | {0,1,1,1,1,1,1,1,1,1,1,0,0,0,0} | 1.0| 33 | 17 |
| 50 | {0,0,1,1,1,1,1,1,1,1,1,1,0,0,0} | 1.0| 30 | 20 |
| 50 | {0,0,0,1,1,1,1,1,1,1,1,1,1,0,0} | 1.0| 24 | 26 |
| 50 | {0,0,0,0,1,1,1,1,1,1,1,1,1,1,0} | 1.0| 34 | 16 |
| 50 | {0,0,0,0,0,1,1,1,1,1,1,1,1,1,1} | 1.0| 37 | 13 |
| 50 | {0,0,0,0,0,0,1,1,1,1,1,1,1,1,1} | 1.0| 39 | 11 |
| 50 | {0,0,0,0,0,0,0,1,1,1,1,1,1,1,1} | 1.0| 40 | 10 |
| 50 | {0,0,0,0,0,0,0,0,1,1,1,1,1,1,1} | |  |  |
| 50 | {0,0,0,0,0,0,0,0,0,1,1,1,1,1,1} | |  |  |
| 50 | {0,0,0,0,0,0,0,0,0,0,1,1,1,1,1} | |  |  |
| 50 | {0,0,0,0,0,0,0,0,0,0,0,1,1,1,1} | |  |  |
| 50 | {0,0,0,0,0,0,0,0,0,0,0,0,1,1,1} | |  |  |
| 50 | {0,0,0,0,0,0,0,0,0,0,0,0,0,1,1} | |  |  |
| 50 | {0,0,0,0,0,0,0,0,0,0,0,0,0,0,1} | |  |  |

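The 25 patterns in this table follow a regular layout: apart from the all-zero pattern, a window of 10 consecutive genes slides across the 15 genes one position at a time and is clipped at both ends. A short Python sketch that regenerates the list, assuming that reading of the table (`sliding_patterns` is a hypothetical name):

```python
def sliding_patterns(n_genes=15, width=10):
    """Return the all-zero pattern followed by a window of `width`
    overlapped genes slid from left to right and clipped to the region."""
    patterns = [(0,) * n_genes]
    for start in range(-(width - 1), n_genes):
        lo, hi = max(start, 0), min(start + width, n_genes)
        patterns.append(tuple(1 if lo <= i < hi else 0 for i in range(n_genes)))
    return patterns


patterns = sliding_patterns()
causal = 7  # 0-based index of the 8th gene
covering = [p for p in patterns if p[causal] == 1]
print(len(patterns), len(covering))  # 25 10
```

The 10 patterns that cover the causal gene are exactly the rows that receive the constant 1.0 in the table.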


### Methods
1) Use the R package varbvs to calculate the posterior inclusion probability (PIP) for each gene.

2) Use DAP to calculate the PIP for each gene.

3) Calculate the odds ratio and the p-value of Fisher's exact test for each gene.
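A dependency-free Python sketch of step 3, assuming each gene's 2×2 table is built as overlapped vs. not overlapped by cases vs. controls. `fisher_exact_2x2` and `per_gene_tests` are hypothetical helpers. The two-sided p-value sums the probabilities of all tables with the same margins that are no more probable than the observed one, which is the definition `scipy.stats.fisher_exact` uses by default; the odds ratio is the sample odds ratio $ad/(bc)$.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (sample odds ratio, two-sided p-value) for the table
    [[a, b], [c, d]]: rows = overlapped / not overlapped,
    columns = cases / controls."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        px = prob(x)
        if px <= p_obs * (1 + 1e-7):  # tolerance for floating-point ties
            p_value += px
    if b * c == 0:
        odds_ratio = float("nan") if a * d == 0 else float("inf")
    else:
        odds_ratio = a * d / (b * c)
    return odds_ratio, min(p_value, 1.0)


def per_gene_tests(table):
    """table maps each overlap pattern to (n_cases, n_controls).
    Returns one (odds_ratio, p_value) tuple per gene, in gene order."""
    n_genes = len(next(iter(table)))
    out = []
    for g in range(n_genes):
        a = sum(n_case for p, (n_case, _) in table.items() if p[g] == 1)
        b = sum(n_ctrl for p, (_, n_ctrl) in table.items() if p[g] == 1)
        c = sum(n_case for p, (n_case, _) in table.items() if p[g] == 0)
        d = sum(n_ctrl for p, (_, n_ctrl) in table.items() if p[g] == 0)
        out.append(fisher_exact_2x2(a, b, c, d))
    return out


# Case/control counts copied from the 4-gene "extreme case" table.
extreme = {
    (0, 0, 0, 0): (405, 595),
    (1, 1, 0, 0): (392, 108),
    (0, 0, 1, 1): (203, 297),
}
for gene, (odds, p) in enumerate(per_gene_tests(extreme), start=1):
    print(f"gene{gene}  OR={odds:.4f}  p={p:.3g}")
```

On the extreme-case table this reproduces the odds ratios reported in the Results (about 5.325 for gene1 and gene2, about 0.603 for gene3 and gene4). Exact integer binomials avoid floating-point overflow at these sample sizes.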

### Results

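#### 4 genes, extreme case, sample size 2000, constant 1.0
In this design gene1 and gene2 are always overlapped together, as are gene3 and gene4, so each pair has identical overlap indicators across samples. As a result, DAP gives gene1 and gene2 the same PIP, and the odds ratios come in identical pairs.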
1) varbvs

    variable	prob	PVE	coef	Pr(coef.>0.95)
    gene1	1.000000000	NA	1.20021579	[+1.007,+1.402]
    gene4	0.005535446	NA	-0.02558938	[-0.224,+0.168]
    gene3	0.005535398	NA	-0.02558722	[-0.231,+0.173]
    gene2	0.005522906	NA	0.01360461	[-0.208,+0.220]

2) DAP (run time 0.64 s)

    Posterior inclusion probability

    1 chr6.100001   5.48922e-01     26.142
    2 chr6.100002   5.48922e-01     26.142

3) Odds ratio (OR) for the 4 genes:

    gene1 5.3250487329434701
    gene2 5.3250487329434701
    gene3 0.60288793413009223
    gene4 0.60288793413009223

#### 4 genes, normal case, sample size 1320, constant 1.0
1) varbvs

    variable	prob	PVE	coef	Pr(coef.>0.95)
    gene1	1.00000000	NA	0.8859984	[+0.674,+1.094]
    gene2	0.12003916	NA	-0.2736790	[-0.484,-0.057]
    gene3	0.09093133	NA	-0.2622626	[-0.474,-0.047]
    gene4	0.02197247	NA	-0.1788246	[-0.384,+0.059]

2) DAP

    Posterior inclusion probability

    1 chr6.100001   1.00000e+00     12.570
    2 chr6.100002   3.72234e-01     -1.489
    3 chr6.100003   1.37045e-01     -0.561

3) Odds ratio (OR) for the 4 genes:

    gene1 4.5364635364635362
    gene2 2.144790718835305
    gene3 1.1176051318602993
    gene4 0.5967530767216549

#### 4 genes, normal case, sample size 1320, constant 0 (random case/control assignment)
1) varbvs

    variable	prob	PVE	coef	Pr(coef.>0.95)
    gene2	0.002438732	NA	-0.08537077	[-0.284,+0.130]
    gene1	0.002415618	NA	-0.08443071	[-0.317,+0.131]
    gene3	0.002001100	NA	0.04889482	[-0.164,+0.262]
    gene4	0.001847391	NA	0.02403800	[-0.190,+0.221]

2) DAP (run time 0.33 s)

    Posterior inclusion probability

    1 chr6.100002   9.24884e-03     -1.558
    2 chr6.100001   9.12697e-03     -1.561
    3 chr6.100003   7.58374e-03     -1.646
    4 chr6.100004   6.96667e-03     -1.681

3) Odds ratio (OR) for the 4 genes:

    gene1 0.9179898641152402
    gene2 0.91716273398838122
    gene3 1.050646996191883
    gene4 1.0247459787784485

#### 3 genes, sample size 1210, constant 1.0
1) varbvs

    variable	prob	   PVE	coef	   Pr(coef.>0.95)
	gene1	1.00000000	NA	 1.3036892	[+1.028,+1.561]
	gene2	0.04018127	NA	-0.2375393	[-0.476,+0.010]
	gene3	0.02796016	NA	-0.2240704	[-0.488,+0.038]

2) DAP

    Posterior inclusion probability

    1 chr6.100001   1.00000e+00     15.761
    2 chr6.100002   6.98946e-02     -0.817
    3 chr6.100003   3.97663e-02     -1.567

3) Odds ratio (OR) for the 3 genes:

    gene1    3.763157894736842
    gene2    1.2527202527202528
    gene3    0.91492378589152779

#### 15 genes (CNVs spanning at most 10 genes), sample size 1250, constant 0 (random case/control assignment)
1) varbvs

    variable	   prob	PVE	coef    	Pr(coef.>0.95)
    pos7	0.004022293	NA	0.16992886	[-0.070,+0.398]
    pos6	0.003431462	NA	0.15675717	[-0.086,+0.388]
    pos8	0.002576724	NA	0.13018907	[-0.080,+0.345]

2) DAP

    Posterior inclusion probability

    1 chr6.100007   4.64897e-03     -1.183
    2 chr6.100006   3.94239e-03     -1.256
    3 chr6.100008   2.94352e-03     -1.383

3) Odds ratio (OR) for the 15 genes:

    pos1   1.0833111170354572
    pos2   1.0689571472704003
    pos3   1.0134258463653549
    pos4   0.97367756064690025
    pos5   0.89875878267560294
    pos6   0.87506996082193966
    pos7   0.88683686402317585
    pos8   0.93547787665434723
    pos9   0.92307692307692313
    pos10  0.93547787665434723
    pos11  0.93547787665434723
    pos12  1.0134258463653549
    pos13  1.0833111170354572
    pos14  1.0689571472704003
    pos15  1.0833111170354572

#### 15 genes (CNVs spanning at most 10 genes), sample size 1250, constant 1.0, 8th gene causal
1) varbvs

    variable	prob   	PVE	   coef   	Pr(coef.>0.95)
    pos8 	1.000000000	NA	1.4260558	[+1.190,+1.669]
    pos14	0.031398993	NA	0.2259943	[+0.022,+0.443]
    pos15	0.024979621	NA	0.2115926	[-0.011,+0.440]

2) DAP

    Posterior inclusion probability

    1 chr6.100008   9.97239e-01     31.553
    2 chr6.100014   2.25464e-02     -0.851
    3 chr6.100015   1.98889e-02     -1.668

3) Odds ratio (OR) for genes 6 to 11:

    pos6   2.2143140364789851
    pos7   3.062323517547398
    pos8   4.2358222355229689
    pos9   3.3401850825195103
    pos10  2.5102106276061158
    pos11  1.9034140736036771

#### PIP of the top-ranked gene for different constants (8th gene causal)
| Constant | varbvs PIP | DAP PIP | Top gene |
|:-----:|:-----:|:-----:|:-----:|
| 0.0 | 0.00299 | 0.00347 | random (same gene for both methods) |
| 0.2 | 0.02985 | 0.04073 | 8 |
| 0.5 | 0.99999 | 0.99835 | 8 |
| 1.0 | 1.0 | 0.99960 | 8 |

### Issues
#### The added constant may be too large
#### The number of samples in each pattern may not be realistic