# Trait Simulation Update

## Huwenbo Shi
#### UCLA Bioinformatics -- Prof. Bogdan Pasaniuc's Lab
#### shihuwenbo@ucla.edu

## Functionalities

1. Simulate under a generalized linear model or a generalized linear mixed model
2. Simulate single or multiple correlated traits
3. Specify the simulation model concisely, using formulas such as ```Y ~ 0.2X1+2.0``` and the ```@vc``` macro for variance components
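
For reference, the two model classes in item 1 can be written out as follows. These are the standard textbook definitions rather than anything taken from the package source, and the symbols $g$, $\beta$, $u_k$, $\sigma_k^2$, and $V_k$ are notation assumed here. A generalized linear model ties the mean of a trait $y$ to the covariates $X$ through a link function $g$:

$$g(\mathbb{E}[y \mid X]) = X\beta$$

A generalized linear mixed model adds random effects $u_k$, each with its own covariance $\sigma_k^2 V_k$:

$$g(\mathbb{E}[y \mid X, u]) = X\beta + \sum_k u_k, \qquad u_k \sim N(0, \sigma_k^2 V_k)$$

In the examples below, $X\beta$ corresponds to the right-hand side of a formula, and each $\sigma_k^2 V_k$ to one variance component.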

## Examples

### First, simulate some data

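In [1]:
include("../src/TraitSimulation.jl");
using DataFrames, Distributions, TraitSimulation;
srand(1);  # fix the random seed so the simulated data are reproducible
# Genotypes: 10 people by 5 SNPs, each entry an allele count in {0, 1, 2}
npeople, nsnp = (10, 5);
snp_data = Matrix{Float64}(npeople, nsnp);
freqs = [0.2, 0.3, 0.4, 0.7, 0.5];  # allele frequency of each SNP
for i=1:nsnp
    snp_data[:,i] = rand(Binomial(2,freqs[i]), npeople);
end
# Covariates: HDL and LDL levels drawn uniformly from [20, 80]
hdl_data, ldl_data = (Vector{Float64}(npeople),
    Vector{Float64}(npeople));
for i=1:npeople
    hdl_data[i] = rand(Uniform(20,80));
    ldl_data[i] = rand(Uniform(20,80));
end
# Combine genotypes and covariates into one data frame with named columns
data = [snp_data hdl_data ldl_data]
data_frame = convert(DataFrame, data);
names!(data_frame, [:X1, :X2, :X3, :X4, :X5, :HDL, :LDL])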

| Row | X1 | X2 | X3 | X4 | X5 | HDL | LDL |
|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 1.0 | 1.0 | 2.0 | 1.0 | 34.162000739722814 | 40.79102085151763 |
| 2 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 38.76241810016405 | 20.474557003433645 |
| 3 | 0.0 | 0.0 | 2.0 | 2.0 | 0.0 | 49.31676980477008 | 32.65809212951216 |
| 4 | 0.0 | 0.0 | 2.0 | 1.0 | 1.0 | 77.11498039014404 | 79.99427953391682 |
| 5 | 2.0 | 0.0 | 2.0 | 1.0 | 1.0 | 35.09973098191831 | 79.19998201392798 |
| 6 | 0.0 | 1.0 | 0.0 | 0.0 | 2.0 | 53.34506523947434 | 46.22647847657751 |
| 7 | 0.0 | 0.0 | 1.0 | 2.0 | 1.0 | 45.48307102970789 | 66.39338290744263 |
| 8 | 2.0 | 1.0 | 0.0 | 1.0 | 0.0 | 36.871413937143785 | 32.568342391884244 |
| 9 | 0.0 | 1.0 | 0.0 | 1.0 | 2.0 | 35.0827525875335 | 21.222492122760038 |
| 10 | 2.0 | 0.0 | 1.0 | 0.0 | 1.0 | 37.26209073654137 | 71.57072816525965 |


### Simulate a single trait with Normal response

$\mu = -0.2X_1 + 0.1X_2 \times X_5 + 0.3\log(\text{HDL} + \text{LDL})$

$y \sim N(\mu, 1.0)$

In [None]:
model = Model(Y ~ -0.2X1+0.1X2*X5+0.3log(HDL+LDL),
    IdentityLink(), NormalResponse(1.0))
simulate(model, data_frame)
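
To make the model concrete, here is a minimal hand-rolled sketch of the same draw, written for Julia 1.x with only the standard library. It does not use the package: the helper `mean_trait` is a hypothetical name, the covariate values are rounded from the first three rows of the data frame above, and `1.0` is taken as the noise scale (standard deviation and variance coincide at that value).

```julia
using Random

# Covariates rounded from the first three rows of the data frame above.
X1  = [0.0, 1.0, 0.0]
X2  = [1.0, 0.0, 0.0]
X5  = [1.0, 1.0, 0.0]
HDL = [34.16, 38.76, 49.32]
LDL = [40.79, 20.47, 32.66]

# Hypothetical helper: the linear predictor from the formula.
# With an identity link it is also the mean of the trait.
mean_trait(X1, X2, X5, HDL, LDL) =
    -0.2 .* X1 .+ 0.1 .* X2 .* X5 .+ 0.3 .* log.(HDL .+ LDL)

rng = MersenneTwister(1)
mu = mean_trait(X1, X2, X5, HDL, LDL)

# Normal response: add independent standard normal noise scaled by 1.0.
y = mu .+ 1.0 .* randn(rng, length(mu))
```

Each person gets an independent draw centred on their own mean, so repeated runs with a different seed give different traits for the same covariates.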

### Simulate three traits with different means but the same response distribution

$\mu_1 = 0.2X_1+3.0$, $\mu_2 = 0.1X_3+2.0$, $\mu_3 = 0.3X_4+\text{HDL}$

$y_1 \sim N(\mu_1, 1.0)$, $y_2 \sim N(\mu_2, 1.0)$, $y_3 \sim N(\mu_3, 1.0)$

In [None]:
model = Model([Y1 ~ 0.2X1+3.0, Y2 ~ 0.1X3+2.0, Y3 ~ 0.3X4+HDL],
    IdentityLink(), NormalResponse(1.0))
simulate(model, data_frame)

### Simulate three traits with Binomial, Poisson, and Normal response

$\text{logit}(\mu_1) = 0.2X_1 + 3.0$,
$y_1 \sim \text{Bin}(100, \mu_1)$

$\log(\mu_2) = 0.1X_3 + 2.0$,
$y_2 \sim \text{Pois}(\mu_2)$

$\mu_3 = 0.3X_4 + \text{HDL}$,
$y_3 \sim N(\mu_3, 2.0)$

In [None]:
μ = [Y1 ~ 0.2X1+3.0, Y2 ~ 0.1X3+2.0, Y3 ~ 0.3X4+HDL]
link = [LogitLink(), LogLink(), IdentityLink()]
dist = [BinomialResponse(100), PoissonResponse(), NormalResponse(2.0)]
model = Model(μ, link, dist)
simulate(model, data_frame)
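
The cell above pairs each formula with a link function. The sketch below shows how the inverse of each link turns the formula's value into a valid distribution parameter before sampling. It is an illustration under assumptions, not the package's implementation: the helper names (`inv_logit`, `rand_binomial`, `rand_poisson`) and the covariate values for the single person are hypothetical, and `2.0` is treated as a variance, following the $N(\mu, \sigma^2)$ convention, which the package may or may not share. It targets Julia 1.x and uses only the standard library.

```julia
using Random

# Inverse links: map the value of the formula (eta) to the mean parameter.
inv_logit(eta) = 1 / (1 + exp(-eta))   # logit link    -> probability in (0, 1)
inv_log(eta) = exp(eta)                # log link      -> positive rate
inv_identity(eta) = eta                # identity link -> unchanged

# Binomial(n, p) as the number of successes in n Bernoulli trials.
rand_binomial(rng, n, p) = count(_ -> rand(rng) < p, 1:n)

# Poisson(lambda) by Knuth's multiplication method (adequate for small lambda).
function rand_poisson(rng, lambda)
    threshold = exp(-lambda)
    k, p = 0, rand(rng)
    while p > threshold
        k += 1
        p *= rand(rng)
    end
    return k
end

rng = MersenneTwister(1)
x1, x3, x4, hdl = 1.0, 2.0, 1.0, 40.0   # hypothetical covariates for one person

p1 = inv_logit(0.2 * x1 + 3.0)          # success probability for Y1
lambda2 = inv_log(0.1 * x3 + 2.0)       # Poisson rate for Y2
mu3 = inv_identity(0.3 * x4 + hdl)      # Normal mean for Y3

y1 = rand_binomial(rng, 100, p1)
y2 = rand_poisson(rng, lambda2)
y3 = mu3 + sqrt(2.0) * randn(rng)       # assumes 2.0 is a variance
```

Without the logit link, a formula value such as 3.2 could not serve as a Binomial success probability; the inverse link maps it to roughly 0.96.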

### Simulate a single Poisson-distributed trait with two variance components

$\log(\mu) = (0.2X_1 + 2.0) + u + \epsilon$, $u \sim N(0, 0.2K)$, $\epsilon \sim N(0, 0.8I)$

$y \sim \text{Pois}(\mu)$

In [20]:
μ = Y ~ 0.2X1+2.0
K = cor(data')
I = eye(npeople)
Σ = [VarianceComponent(0.2, K), VarianceComponent(0.8, I)]
model = Model(μ, Σ, LogLink(), PoissonResponse())
simulate(model, data_frame)

| Row | Y |
|---|---|
| 1 | 3 |
| 2 | 3 |
| 3 | 7 |
| 4 | 0 |
| 5 | 4 |
| 6 | 1 |
| 7 | 0 |
| 8 | 21 |
| 9 | 5 |
| 10 | 7 |
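
One common way to draw a random effect with covariance $0.2K + 0.8I$, the two components passed to `VarianceComponent` above, is through a Cholesky factor. The sketch below illustrates that technique for Julia 1.x with the standard library; it is not necessarily how the package does it. The matrix `covariates` is a hypothetical stand-in for `data`, and the names `Sigma`, `g`, `eta`, and `lambda` are introduced here for illustration.

```julia
using LinearAlgebra, Random, Statistics

rng = MersenneTwister(1)
n = 10

# Hypothetical stand-in for the data matrix above: n people by 7 covariates.
covariates = randn(rng, n, 7)
K = cor(covariates')            # n-by-n correlation between people
Sigma = 0.2 * K + 0.8 * I       # sum of the two variance components

# If z ~ N(0, I) and Sigma = L * L', then L * z ~ N(0, Sigma).
L = cholesky(Symmetric(Sigma)).L
g = L * randn(rng, n)

x1 = rand(rng, 0:2, n)          # hypothetical genotypes for X1
eta = 0.2 .* x1 .+ 2.0 .+ g     # fixed effects plus random effect, on the log scale
lambda = exp.(eta)              # Poisson means after inverting the log link
```

`K` alone is rank-deficient here (10 people, 7 covariates), so the factorization works only because the `0.8 * I` term makes `Sigma` positive definite. Because the random effect enters on the log scale, a modest positive draw can inflate the Poisson mean considerably, which would be consistent with occasional large counts in the output.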


### A simpler way to express a variance component model

The ```@vc``` macro expresses the same covariance more compactly than ```[VarianceComponent(0.2, K), VarianceComponent(0.8, I)]```:

In [19]:
μ = Y ~ 0.2X1+2.0
K = cor(data')
I = eye(npeople)
model = Model(μ, (@vc 0.2K + 0.8I), LogLink(), PoissonResponse())
simulate(model, data_frame)

| Row | Y |
|---|---|
| 1 | 13 |
| 2 | 56 |
| 3 | 0 |
| 4 | 11 |
| 5 | 28 |
| 6 | 8 |
| 7 | 2 |
| 8 | 24 |
| 9 | 1 |
| 10 | 5 |


### Simulate two traits with two variance components and cross covariance

In [21]:
A = [0.2 -0.1; -0.1 0.3]
B = [0.8 -0.2; -0.2 0.7]
μ = [Y1 ~ X1+0.2X2*X3+1.0, Y2 ~ X3+0.1log(HDL+LDL)+0.1]
model = Model(μ, (@vc A ⊗ K + B ⊗ I), IdentityLink(), NormalResponse(1.0))
simulate(model, data_frame)

| Row | Y1 | Y2 |
|---|---|---|
| 1 | -0.1022299766945473 | 1.242014570782589 |
| 2 | 1.6985645798391524 | -1.7314710486455065 |
| 3 | 0.6579712947986169 | 2.2033770111233077 |
| 4 | 1.0769182112301432 | 1.7076290809531234 |
| 5 | 2.9038228662096914 | 3.303032046765953 |
| 6 | 1.468375569093222 | -0.3125435357685245 |
| 7 | -1.309610909380149 | 2.193982272458793 |
| 8 | 3.753520436943161 | -2.1563700974996456 |
| 9 | 1.1805594385983615 | 0.2966892266339943 |
| 10 | 2.7257943887238394 | -0.0419361451563232 |


## Future work

1. Add missingness to the trait simulation module
2. Speed up the code in variance component simulation
3. Add code to check user input and handle errors gracefully