## Toy example of gene configurations overlapping with CNVs

#### Overlap between genes and CNVs
Suppose there are three adjacent genes, gene1, gene2 and gene3, in a genomic region. Only gene1 is the causal gene. Each simulated CNV overlaps none of the genes, exactly one gene, two adjacent genes, or all three genes.

For a given sample, each gene is either overlapped by a CNV or not, coded as 1 or 0. The overlap status of the three genes (gene1, gene2, gene3) is therefore one of the following seven scenarios: {0,0,0}, {1,0,0}, {0,1,0}, {0,0,1}, {1,1,0}, {0,1,1}, {1,1,1}.
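The seven scenarios can be enumerated mechanically. The sketch below assumes that the overlapped genes in a sample must form one unbroken run, since the CNVs described above cover only adjacent genes; this is why {1,0,1} does not appear in the list. The names `GENES` and `is_contiguous` are illustrative and not part of the original analysis.

```python
from itertools import product

GENES = ("gene1", "gene2", "gene3")  # illustrative labels, in genomic order


def is_contiguous(pattern):
    """Return True if the 1s in `pattern` form at most one unbroken run."""
    ones = [i for i, bit in enumerate(pattern) if bit]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


# Keep every 0/1 pattern over the three genes whose overlapped genes are adjacent.
scenarios = [p for p in product((0, 1), repeat=len(GENES)) if is_contiguous(p)]

for pattern in scenarios:
    print(pattern)
```

Only the pattern (1, 0, 1) is filtered out, which leaves the seven scenarios listed above.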

#### Case and control assignment

The fraction of samples in each scenario is unknown. Here I assume that {0,0,0} accounts for 0.5 of the samples and that the other six scenarios share the remaining 0.5 evenly. For example, with 1200 samples there are 600 samples for {0,0,0} and 100 samples for each of the other six scenarios.

Half of all simulated samples are cases and the other half are controls. Scenarios {0,0,0}, {0,1,0}, {0,0,1} and {0,1,1} tend to be controls, while the other three, {1,0,0}, {1,1,0} and {1,1,1}, tend to be cases because they cover the causal gene.

To determine the category, I simulate a normal random variable for each sample and add a constant to the samples in each of the three scenarios that tend to be cases. Two sets of constants are used: 1.0 or 0.66 is added to {1,0,0}, 0.66 or 0.44 is added to {1,1,0}, and 0.33 or 0.22 is added to {1,1,1}. These constants can be adjusted.

| Number of samples | Scenario | Constant added |
|:-----:|:-----:|:-----:|
| 600 | {0,0,0} | |
| 100 | {1,0,0} | 1.0 or 0.66 |
| 100 | {0,1,0} | |
| 100 | {0,0,1} | |
| 100 | {1,1,0} | 0.66 or 0.44 |
| 100 | {0,1,1} | |
| 100 | {1,1,1} | 0.33 or 0.22 |
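The assignment described above can be sketched in a few lines of Python. This is not the original simulation code. It assumes a standard normal variable (mean 0, standard deviation 1) and assumes that samples scoring above the overall median are labeled as cases, which is one simple way to make exactly half of the samples cases; the text does not state the cutoff rule. The names `DESIGN`, `simulate` and `case_fraction` are hypothetical, and the constants are the larger set from the table.

```python
import random
import statistics

# (number of samples, overlap pattern, constant added), taken from the table,
# using the larger set of constants (1.0, 0.66, 0.33).
DESIGN = [
    (600, (0, 0, 0), 0.0),
    (100, (1, 0, 0), 1.0),
    (100, (0, 1, 0), 0.0),
    (100, (0, 0, 1), 0.0),
    (100, (1, 1, 0), 0.66),
    (100, (0, 1, 1), 0.0),
    (100, (1, 1, 1), 0.33),
]


def simulate(design, seed=0):
    """Return (patterns, labels); label 1 marks a case, 0 a control."""
    rng = random.Random(seed)
    patterns, scores = [], []
    for count, pattern, shift in design:
        for _ in range(count):
            patterns.append(pattern)
            # Normal draw plus the scenario-specific constant.
            scores.append(rng.gauss(0.0, 1.0) + shift)
    # ASSUMPTION: cut at the median so that half of the samples are cases.
    cutoff = statistics.median(scores)
    labels = [1 if score > cutoff else 0 for score in scores]
    return patterns, labels


def case_fraction(patterns, labels, pattern):
    """Fraction of cases among the samples with the given overlap pattern."""
    picked = [lab for pat, lab in zip(patterns, labels) if pat == pattern]
    return sum(picked) / len(picked)


patterns, labels = simulate(DESIGN)
for _, pattern, _ in DESIGN:
    print(pattern, round(case_fraction(patterns, labels, pattern), 2))
```

Swapping the constants for 0.66, 0.44 and 0.22 gives the weaker-effect setting.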

#### Methods
1) Use the R package varbvs to calculate the posterior inclusion probability (PIP) for each gene.

2) Calculate the odds ratio and the p-value of Fisher's exact test for each gene.
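Step 2 reduces to a 2x2 table per gene: samples whose CNV overlaps the gene versus those that do not, split into cases and controls. The following standard-library sketch shows the sample odds ratio and a two-sided Fisher's exact test. It is not the code that produced the results below. The two-sided rule used here (summing all tables no more probable than the observed one) and the function names are assumptions, and the counts in the usage lines are made up.

```python
from math import comb, log10


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]].

    a: overlapped cases,     b: overlapped controls,
    c: non-overlapped cases, d: non-overlapped controls.
    Raises ZeroDivisionError if b * c == 0.
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x cases among the overlapped samples.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts, for illustration only.
table = (8, 2, 1, 5)
print("odds ratio:", odds_ratio(*table))
print("-log10(p):", -log10(fisher_exact_two_sided(*table)))
```

In the toy example, each gene would get its own table built from the simulated overlap patterns and case/control labels.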

#### Results
1) Sample size: 1200,  constant added: $\textbf{0.66, 0.44 and 0.22}$

PIP for three genes:

    variable  prob         PVE  coef        Pr(coef.>0.95)
    gene1     0.999999816  NA    0.8815534  [+0.620,+1.157]
    gene2     0.012486887  NA   -0.1412246  [-0.394,+0.098]
    gene3     0.009969412  NA   -0.1118736  [-0.377,+0.150]

The PIP for gene1 (causal) is close to 1, and the PIPs for the other two genes are close to 0.

Odds ratio for three genes (the odds ratio for gene1 is the largest):

    gene1    2.4518160045430979
    gene2    1.1974117130949771
    gene3    0.98237911337572137

$-\text{log}_{10}$(p-value) for three genes:

    gene1    5.4061609811311113
    gene2    0.798730116322797
    gene3    0.0237162043415813

2) Sample size: 1200,  constant added: $\textbf{1.0, 0.66 and 0.33}$

PIP for three genes:

    variable  prob        PVE  coef        Pr(coef.>0.95)
    gene1     1.00000000  NA    1.3036892  [+1.028,+1.561]
    gene2     0.04018127  NA   -0.2375393  [-0.476,+0.010]
    gene3     0.02796016  NA   -0.2240704  [-0.488,+0.038]

Odds ratio for three genes:

    gene1    3.763157894736842
    gene2    1.2527202527202528
    gene3    0.91492378589152779
 
$-\text{log}_{10}$(p-value) for three genes:

    gene1    5.4061609811311113
    gene2    1.1209781660647247
    gene3    0.2607852230581864