# GxE problem

- I updated the simulation function (scaling model) so that sigma is always positive: the residual SD is now the exponential of a linear function of `E`.
- I use `yt`, defined in the function below, as the outcome when estimating the models.

In [5]:
library(brms)
library(data.table)
library(texreg)

In [15]:
# Domingue's simulation function
simData = function(E,i=1,b0=.8,b1=.2,b2=0,b3=.05,h=sqrt(.6), a0 = 0, a1=.5, sigma=1,scaling=TRUE) {
    N = length(E)
    G = rnorm(N,0,1)
    eps = rnorm(N,1,sigma)
    if (scaling){
        e = sqrt(1-h^2)
        # ystar and y follow the original simulation; y is only used in the lm comparison (m1), not in the distributional models
        ystar = h*G+e*eps
        y = a0 + a1*E+(b0 + b1*E)*ystar
        # residual SD of yt: exp() of a linear function of E, so it is always positive
        fsigma = exp(b0*e + b1*e*E) 
        # final y values
        yt = a0 + a1*E + b0*h*G + b1*h*E*G + rnorm(N, 0, fsigma)

    } else {   
        y = b1*G + b2*E+ b3*G*E + eps
        fsigma = exp(0.1 + 0.4*E) 
        yt = a0 + b1*G + b2*E+ b3*G*E + rnorm(N, 0, fsigma)
    }
    df = data.frame( E=E, y=y,yt = yt, g=G)
    df
}
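
Written out, the `scaling = TRUE` branch above appears to generate `yt` from the following model. This is only a restatement of the code: $\sigma(E)$ stands for `fsigma`, and the symbols $\varepsilon$ and $\sigma(E)$ are my own labels, not names used in the function.

$$
\begin{aligned}
y_t &= a_0 + a_1 E + b_0 h\,G + b_1 h\,E\,G + \varepsilon, \qquad \varepsilon \sim \mathcal{N}\!\left(0,\ \sigma(E)^2\right),\\
\log \sigma(E) &= b_0 e + b_1 e\,E, \qquad e = \sqrt{1 - h^2}.
\end{aligned}
$$

Read this way, a regression of `yt` with a log link on sigma should show mean coefficients $b_0 h$ (for `g`) and $b_1 h$ (for `g:E`), and sigma coefficients $b_0 e$ (intercept) and $b_1 e$ (slope on `E`). In the `scaling = FALSE` branch the mean is $a_0 + b_1 G + b_2 E + b_3 G E$ and $\log \sigma(E) = 0.1 + 0.4\,E$, so no such link between the mean and sigma coefficients is built in.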


## Scaling model

In [7]:
E = rnorm(5000, 0, 1)
dts = data.table(simData(E, scaling = TRUE))
summary(dts)



       E                   y                 yt                 g           
 Min.   :-3.650110   Min.   :-2.3114   Min.   :-7.14639   Min.   :-4.10540  
 1st Qu.:-0.689086   1st Qu.:-0.2348   1st Qu.:-1.33631   1st Qu.:-0.67692  
 Median : 0.008993   Median : 0.4117   Median :-0.08744   Median :-0.03472  
 Mean   :-0.004884   Mean   : 0.5005   Mean   :-0.03744   Mean   :-0.01684  
 3rd Qu.: 0.671552   3rd Qu.: 1.1462   3rd Qu.: 1.17967   3rd Qu.: 0.66152  
 Max.   : 3.408695   Max.   : 4.7236   Max.   : 7.76733   Max.   : 3.37867  

In [8]:
cnames = c("ystar", "y")
m1 = lm(y ~ g + E + g * E, data = dts)
m2 = lm(yt ~ g + E + g * E, data = dts)
cat(screenreg(list(m1, m2)))


             Model 1      Model 2    
-------------------------------------
(Intercept)     0.51 ***    -0.02    
               (0.01)       (0.02)   
g               0.63 ***     0.62 ***
               (0.01)       (0.02)   
E               0.62 ***     0.56 ***
               (0.01)       (0.02)   
g:E             0.16 ***     0.17 ***
               (0.01)       (0.02)   
-------------------------------------
R^2             0.74         0.20    
Adj. R^2        0.74         0.20    
Num. obs.    5000         5000       
*** p < 0.001; ** p < 0.01; * p < 0.05



# Bayesian distributional model


In [None]:
# Bayesian distributional model: both the mean of yt and log(sigma) depend on E

f = bf(yt ~ g + E + g * E, sigma ~ 1 + E)
m3 = brm(f, data = dts, family = brmsfamily("gaussian", link_sigma = "log"))


In [10]:
# The model recovers the sigma coefficients used in the simulation (b0*e ≈ 0.51, b1*e ≈ 0.13)
cat(screenreg(m3))


                 Model 1      
------------------------------
Intercept           -0.02     
                 [-0.07; 0.02]
sigma_Intercept      0.51 *   
                 [ 0.50; 0.53]
g                    0.62 *   
                 [ 0.58; 0.66]
E                    0.56 *   
                 [ 0.52; 0.60]
g:E                  0.17 *   
                 [ 0.13; 0.21]
sigma_E              0.12 *   
                 [ 0.11; 0.14]
------------------------------
R^2                  0.20     
Num. obs.         5000        
loo IC           19336.35     
WAIC             19336.34     
* 0 outside the confidence interval.
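
As a quick check on the estimates above, the values implied by the default arguments of `simData(..., scaling = TRUE)` can be computed directly. This is a minimal sketch: the vector name `expected` is mine, and the mapping from simulation parameters to coefficients assumes the scaling branch is read as mean `b0*h*G + b1*h*E*G` with log residual SD `b0*e + b1*e*E`, as coded in the function.

```r
# Default simulation parameters from simData()
b0 <- 0.8
b1 <- 0.2
h  <- sqrt(0.6)
e  <- sqrt(1 - h^2)

# Coefficients the distributional model should recover (scaling branch)
expected <- c(
  g               = b0 * h,  # main effect of g
  "g:E"           = b1 * h,  # g-by-E interaction
  sigma_Intercept = b0 * e,  # intercept of log(sigma)
  sigma_E         = b1 * e   # slope of log(sigma) on E
)
print(round(expected, 3))
```

These come out at roughly 0.62, 0.15, 0.51 and 0.13, and all four fall inside the intervals reported in the table.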


In [12]:

# The 95% CI [-0.04, 0.02] includes 0, so I cannot reject the null hypothesis (consistent with the scaling model)
hyp <- "g * sigma_E = g:E * sigma_Intercept"
(hyp <- hypothesis(m3, hyp, alpha = 0.05))


Hypothesis Tests for class b:
                Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (g*sigma_E)-(g:E*... = 0    -0.01      0.01    -0.04     0.02         NA
  Post.Prob Star
1        NA     
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
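
The hypothesis tested above follows from the scaling model. A short derivation, assuming the fitted coefficients map onto the simulation parameters as coded in `simData`, that is $\beta_g = b_0 h$, $\beta_{gE} = b_1 h$, $\gamma_0 = b_0 e$ and $\gamma_E = b_1 e$ (the $\beta$ and $\gamma$ labels are my notation for the mean and log-sigma coefficients `g`, `g:E`, `sigma_Intercept` and `sigma_E`):

$$
\beta_g\,\gamma_E = (b_0 h)(b_1 e) = (b_1 h)(b_0 e) = \beta_{gE}\,\gamma_0
\quad\Longrightarrow\quad
\beta_g\,\gamma_E - \beta_{gE}\,\gamma_0 = 0 .
$$

Equivalently, the ratio of the interaction to the main effect of `g` equals the ratio of the sigma slope to the sigma intercept; both are $b_1 / b_0$. When the data are not generated by scaling, nothing forces these ratios to agree.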

In [16]:
dt = data.table(simData(E, scaling = FALSE))
summary(dt)

       E                   y                 yt                 g           
 Min.   :-3.650110   Min.   :-2.2042   Min.   :-7.16783   Min.   :-3.18134  
 1st Qu.:-0.689086   1st Qu.: 0.3189   1st Qu.:-0.72768   1st Qu.:-0.68430  
 Median : 0.008993   Median : 1.0036   Median : 0.01175   Median :-0.02271  
 Mean   :-0.004884   Mean   : 1.0189   Mean   :-0.02700   Mean   :-0.02305  
 3rd Qu.: 0.671552   3rd Qu.: 1.7094   3rd Qu.: 0.69747   3rd Qu.: 0.62100  
 Max.   : 3.408695   Max.   : 5.2326   Max.   : 8.86018   Max.   : 3.42192  

In [None]:
f = bf(yt ~ g + E + g * E, sigma ~ 1 + E)
m4 = brm(f, data = dt, family = brmsfamily("gaussian", link_sigma = "log"))

In [18]:
cat(screenreg(m4))


                 Model 1      
------------------------------
Intercept           -0.02     
                 [-0.04; 0.01]
sigma_Intercept      0.09 *   
                 [ 0.08; 0.11]
g                    0.21 *   
                 [ 0.18; 0.24]
E                   -0.02     
                 [-0.04; 0.00]
g:E                  0.06 *   
                 [ 0.04; 0.08]
sigma_E              0.39 *   
                 [ 0.37; 0.41]
------------------------------
R^2                  0.03     
Num. obs.         5000        
loo IC           15091.61     
WAIC             15091.60     
* 0 outside the confidence interval.


In [19]:
# The 95% CI [0.06, 0.09] excludes 0, so I reject the null hypothesis (the scaling constraint does not hold)
hyp <- "g * sigma_E = g:E * sigma_Intercept"
(hyp <- hypothesis(m4, hyp, alpha = 0.05))

Hypothesis Tests for class b:
                Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (g*sigma_E)-(g:E*... = 0     0.08      0.01     0.06     0.09         NA
  Post.Prob Star
1        NA    *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
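
For the non-scaling data the contrast in the hypothesis is not expected to vanish. A rough check using the values hard-coded in the `scaling = FALSE` branch of `simData` with its default arguments (the variable names below are mine, chosen for illustration):

```r
# True values in the scaling = FALSE branch of simData()
beta_g  <- 0.2   # b1, coefficient on G
beta_gE <- 0.05  # b3, coefficient on G*E
gamma_0 <- 0.1   # intercept of log(fsigma)
gamma_E <- 0.4   # slope of log(fsigma) on E

# Same contrast as the hypothesis: g * sigma_E - g:E * sigma_Intercept
contrast <- beta_g * gamma_E - beta_gE * gamma_0
print(contrast)
```

The result is 0.075, in line with the posterior estimate of 0.08 and inside its interval of [0.06, 0.09].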