# Adding a null component to SuSiE

Here we evaluate the possible benefit of adding a null component to SuSiE. The hope is that the credible sets (CS) will be easier to prune (without using purity) and that the pruned CS can achieve a smaller FDR.

## A simple simulation case

In [1]:
set.seed(1)
n = 500
p = 1000
b = rep(0,p)
b[200] = 1
b[800] = 1
X = matrix(rnorm(n*p),nrow=n,ncol=p)
X[,200] = X[,400]
X[,600] = X[,800]
y = X %*% b + rnorm(n)

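# Fit SuSiE with L = 5, print every 95% CS, and print the PIPs of the null
# component (index p+1) and of variables 200, 400, 600 and 800.
# With the defaults (purity = 0, dedup = FALSE) no CS is filtered out.
diag_susie = function(purity = 0, dedup = FALSE, ...) {
    s = susieR::susie(X, y, L=5, scaled_prior_variance=0.2, track_fit=FALSE, coverage=NULL, ...)
    # cbind(X,0) appends an all-zero column standing for the null component
    sets = susieR::susie_get_CS(s, X=cbind(X,0), coverage=0.95,min_abs_corr=purity,dedup=dedup)
    str(sets$cs)
    print('PIP for the null (1st PIP) and causal (other PIPs)')
    pip = susieR::susie_get_PIP(s, sets$cs_index)
    # index p+1 does not exist when no null component is fitted, hence NA
    print(pip[c(p+1,200,400,600,800)])
    return(s)
}

# Same as diag_susie, but with a purity threshold and duplicate CS removed.
run_susie = function(purity = 0.1, dedup = TRUE, ...) {
    diag_susie(purity=purity, dedup=dedup, ...)
}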

### Run SuSiE in "diagnostics" mode

"Diagnostics" means that we set no purity threshold and do not remove duplicate CS. The PIP computation is based on the unprocessed result.
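
For reference, here is a minimal sketch of how a PIP can be obtained from the per-component posterior weights, using the standard SuSiE formula $\mathrm{PIP}_j = 1 - \prod_l (1 - \alpha_{lj})$. The helper `pip_from_alpha` and its `keep` argument are hypothetical names for illustration; the assumption is that restricting to a subset of components mimics what happens when only the retained CS enter the PIP computation. It is not the `susieR` implementation.

```r
# alpha: L x p matrix; row l holds the posterior weights of single-effect
# component l over the p variables (each row sums to 1).
# keep: optional vector of component indices to use (hypothetical argument);
# NULL means all components are used, i.e. the unprocessed result.
pip_from_alpha = function(alpha, keep = NULL) {
    if (!is.null(keep)) alpha = alpha[keep, , drop = FALSE]
    # A variable is excluded only if every component excludes it
    1 - apply(1 - alpha, 2, prod)
}

# Toy example with made-up numbers: L = 2 components, p = 4 variables.
# Component 1 splits its weight between two identical variables,
# component 2 is diffuse.
alpha_toy = rbind(c(0.50, 0.50, 0.00, 0.00),
                  c(0.25, 0.25, 0.25, 0.25))
pip_all = pip_from_alpha(alpha_toy)             # all components
pip_first = pip_from_alpha(alpha_toy, keep = 1) # only component 1
print(pip_all)
print(pip_first)
```

In the toy example, dropping the diffuse component removes its small contribution from every PIP, which is why PIPs computed before and after filtering can differ.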

First, fit with SuSiE as is, but over-specify $L$ and report all CS obtained (setting `min_abs_corr` to zero)

In [2]:
s = diag_susie()

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L3: int [1:876] 1 2 4 5 6 8 9 10 11 12 ...
 $ L4: int [1:876] 1 2 4 5 6 8 9 10 11 12 ...
 $ L5: int [1:877] 1 2 4 5 6 8 9 10 11 12 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1]        NA 0.5006026 0.5006026 0.5006017 0.5006017
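
To see why some components yield a two-variable CS while others cover most of the variables, here is a sketch of the usual construction of a credible set from one component's posterior weights: add variables in decreasing order of weight until the requested coverage is reached. The function `cs_from_alpha` is a hypothetical helper written for illustration, assuming the weights sum to 1; it is not the code used by `susie_get_CS`.

```r
# Build a credible set from the posterior weights of one component.
cs_from_alpha = function(alpha_l, coverage = 0.95) {
    ord = order(alpha_l, decreasing = TRUE)
    # Smallest number of top-ranked variables whose weights reach the coverage
    n_keep = which(cumsum(alpha_l[ord]) >= coverage)[1]
    sort(ord[seq_len(n_keep)])
}

# Concentrated component: two identical variables share all the weight.
alpha_signal = c(0.5, 0.5, 0, 0, 0, 0, 0, 0, 0, 0)
# Diffuse component: weight spread evenly over all 10 variables.
alpha_diffuse = rep(0.1, 10)

cs_signal = cs_from_alpha(alpha_signal)
cs_diffuse = cs_from_alpha(alpha_diffuse)
print(cs_signal)
print(length(cs_diffuse))
```

A component that captures no signal has diffuse weights, so its CS has to include most variables before reaching 95% coverage.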


So the 3rd, 4th and 5th CS are large (about 876 of the 1000 variables each). Now add a penalty to null,

In [3]:
s = diag_susie(null_weight=0.005)

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L4: int [1:872] 1 2 4 5 6 8 9 10 11 12 ...
 $ L5: int [1:873] 1 2 4 5 6 8 9 10 11 12 ...
 $ L3: int [1:871] 1 2 4 5 6 8 9 10 11 12 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.1008317 0.5005786 0.5005786 0.5005777 0.5005777
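
The first PIP printed above belongs to the null component. As a hedged sketch of how a null weight can enter the prior, assume that the $p$ variables start with uniform prior weights, that these are scaled by `1 - null_weight`, and that the null component (the extra all-zero column, index $p+1$) receives `null_weight`. The helper `add_null_weight` is a hypothetical name, and whether `susieR` builds its prior exactly this way is an assumption.

```r
# Hypothetical helper: prior weights over p variables plus one null component.
add_null_weight = function(p, null_weight) {
    stopifnot(null_weight >= 0, null_weight < 1)
    # Variables share 1 - null_weight equally; the null gets null_weight
    c(rep((1 - null_weight) / p, p), null_weight)
}

w = add_null_weight(p = 1000, null_weight = 0.005)
print(length(w))      # p + 1 entries
print(sum(w))         # weights still sum to 1
print(w[1001] / w[1]) # prior odds of the null versus any single variable
```

Under this assumption, even a small `null_weight` puts several times more prior mass on the null than on any single variable, so components without signal shift weight toward the null.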


As expected, the CS became slightly narrower, and the PIP for the null (about 0.10) is much larger than $1/p$ (0.001 in this case). Now I increase the penalty,

In [4]:
s = diag_susie(null_weight=0.01)

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L5: int [1:868] 1 2 4 5 6 8 9 10 11 12 ...
 $ L4: int [1:867] 1 2 4 5 6 8 9 10 11 12 ...
 $ L3: int [1:866] 1 2 4 5 6 8 9 10 11 12 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.1886082 0.5005565 0.5005565 0.5005557 0.5005557


In [5]:
s = diag_susie(null_weight=0.05)

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L3: int [1:822] 1 2 4 5 6 8 9 10 11 12 ...
 $ L5: int [1:828] 1 2 4 5 6 8 9 10 11 12 ...
 $ L4: int [1:825] 1 2 4 5 6 8 9 10 11 12 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.6041899 0.5004224 0.5004224 0.5004218 0.5004218


In [6]:
s = diag_susie(null_weight=0.1)

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L5: int [1:772] 2 4 5 6 8 9 10 11 12 13 ...
 $ L3: int [1:768] 2 4 5 6 8 9 10 11 12 13 ...
 $ L4: int [1:769] 2 4 5 6 8 9 10 11 12 13 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.8101352 0.5003204 0.5003204 0.5003199 0.5003199


In [7]:
s = diag_susie(null_weight=0.2)

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L3: int [1:647] 2 5 6 8 10 11 12 13 14 15 ...
 $ L5: int [1:648] 2 5 6 8 10 11 12 13 14 15 ...
 $ L4: int [1:647] 2 5 6 8 10 11 12 13 14 15 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.9419801 0.5002054 0.5002054 0.5002052 0.5002052


In [8]:
s = diag_susie(null_weight=0.3)

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L3: int [1:501] 5 8 10 11 13 14 15 16 17 18 ...
 $ L4: int [1:502] 5 8 10 11 13 14 15 16 17 18 ...
 $ L5: int [1:506] 5 8 10 11 13 14 15 16 17 18 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.9782793 0.5001411 0.5001411 0.5001409 0.5001409


In [9]:
s = diag_susie(null_weight=0.4)

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L4: int [1:340] 8 10 11 13 14 23 25 26 31 32 ...
 $ L5: int [1:342] 8 10 11 13 14 23 25 26 31 32 ...
 $ L3: int [1:339] 8 10 11 13 14 23 25 26 31 32 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.9914398 0.5001001 0.5001001 0.5001000 0.5001000


In [10]:
s = diag_susie(null_weight=0.5)

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L5: int [1:179] 8 10 25 31 32 33 36 37 38 45 ...
 $ L3: int [1:179] 8 10 25 31 32 33 36 37 38 45 ...
 $ L4: int [1:179] 8 10 25 31 32 33 36 37 38 45 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.9966153 0.5000713 0.5000713 0.5000713 0.5000713


In [11]:
s = diag_susie(null_weight=0.6)

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L3: int [1:52] 8 36 63 90 112 173 246 248 339 352 ...
 $ L4: int [1:52] 8 36 63 90 112 173 246 248 339 352 ...
 $ L5: int [1:52] 8 36 63 90 112 173 246 248 339 352 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.9987379 0.5000499 0.5000499 0.5000499 0.5000499


In [12]:
s = diag_susie(null_weight=0.7)

List of 5
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
 $ L3: int [1:4] 112 768 954 1001
 $ L4: int [1:4] 112 768 954 1001
 $ L5: int [1:4] 112 768 954 1001
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.9995926 0.5000333 0.5000333 0.5000333 0.5000333


In [13]:
s = diag_susie(null_weight=0.8)

List of 5
 $ L2: int [1:2] 600 800
 $ L3: int 1001
 $ L4: int 1001
 $ L5: int 1001
 $ L1: int [1:2] 200 400
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.9999042 0.5000200 0.5000200 0.5000200 0.5000200


In [14]:
s = diag_susie(null_weight=0.98)

List of 5
 $ L2: int [1:2] 600 800
 $ L3: int 1001
 $ L4: int 1001
 $ L5: int 1001
 $ L1: int [1:2] 200 400
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.9999999 0.5000017 0.5000017 0.5000017 0.5000017


### Run SuSiE in default mode

Here we set the purity threshold to 0.1 and remove duplicate CS.
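
As a reminder of what the purity filter measures, here is a minimal sketch assuming that purity is the smallest absolute pairwise correlation among the variables in a CS (which is what the argument name `min_abs_corr` suggests). The helper `cs_purity` and the toy matrix `X_toy` are hypothetical and deliberately do not touch the `X` and `y` used in this notebook.

```r
# Purity of a CS: minimum absolute pairwise correlation among its variables.
# A single-variable CS is given purity 1 by convention here.
# Constant columns (such as an all-zero null column) have undefined
# correlation and are not handled in this sketch.
cs_purity = function(X, cs) {
    if (length(cs) < 2) return(1)
    R = abs(cor(X[, cs, drop = FALSE]))
    min(R[upper.tri(R)])
}

# Toy design: columns 1 and 2 are identical, column 3 is only partly
# correlated with them.
X_toy = cbind(c(1, 2, 3, 4, 5),
              c(1, 2, 3, 4, 5),
              c(5, 3, 4, 1, 2))
purity_pair = cs_purity(X_toy, c(1, 2)) # identical columns
purity_all = cs_purity(X_toy, 1:3)      # includes the weaker pair
print(purity_pair)
print(purity_all)
```

A CS is kept only if its purity is at least the threshold, so a large CS of weakly correlated variables is discarded.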

In [None]:
s = run_susie()

In this example the default SuSiE with purity filter is good enough. No need to bother with a penalty.

In [15]:
s = run_susie(null_weight=0.7)

List of 2
 $ L2: int [1:2] 600 800
 $ L1: int [1:2] 200 400
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.0 0.5 0.5 0.5 0.5


In [17]:
s = run_susie(null_weight=0.75)

List of 3
 $ L2: int [1:2] 600 800
 $ L4: int [1:2] 954 1001
 $ L1: int [1:2] 200 400
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1] 0.9405439 0.5000088 0.5000088 0.5000088 0.5000088


At least from this example, we get stuck with a null set (a CS that contains the null component, index 1001) only when `null_weight` is very high (0.75 here, but not 0.7).

## A null simulation

We simulate random data,

In [18]:
set.seed(1)
n = 500
p = 1000
X = matrix(rnorm(n*p),nrow=n,ncol=p)
y = rnorm(n)

and run SuSiE without and with the purity filter:

In [20]:
s = diag_susie()

List of 5
 $ L5: int [1:887] 1 2 4 5 6 8 9 10 11 12 ...
 $ L4: int [1:887] 1 2 4 5 6 8 9 10 11 12 ...
 $ L2: int [1:887] 1 2 4 5 6 8 9 10 11 12 ...
 $ L3: int [1:887] 1 2 4 5 6 8 9 10 11 12 ...
 $ L1: int [1:887] 1 2 4 5 6 8 9 10 11 12 ...
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1]          NA 0.002444137 0.007357031 0.002841289 0.002235344


In [23]:
s = run_susie()

 NULL
[1] "PIP for the null (1st PIP) and causal (other PIPs)"
[1]          NA 0.002444137 0.007357031 0.002841289 0.002235344


For this simple case, the purity filter by itself is good enough to distinguish signal from noise.

## A case demonstrating the usefulness of penalty

I took this data set from our simulation. This is a case where SuSiE makes a false discovery in a multi-signal setting.