### Experiments with Simulated Data
Suppose we have $100$ training data points and another $50$ testing data points, for $n=150$ samples in total.

Discrete variables $X=\{X_1,X_2,X_3,X_4\}$; $X_i \in \{0,1\}$; $X\in \mathcal{R}^{n\times 4}$

Continuous variable $\mathbf{z}$; $\mathbf{z} \in \mathcal{R}^{n \times 1}$

Continuous target variable $\mathbf{y}$; $\mathbf{y} \in \mathcal{R}^{n \times 1}$

True moderator variables: $X_1, X_2$

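Data-generating model:
$$y = z\beta_i+b_i+\epsilon$$
where the slope $\beta_i$ and intercept $b_i$ depend on the values of the moderators $(X_1, X_2)$, and $\epsilon \sim \mathcal{N}(0,1)$ is noise.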
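
Written out per moderator combination, the model has four regimes. The pairing of the index $i$ with $(X_1, X_2)$ shown here follows the generation code in this section, where row $i$ of `beta` (zero-based) holds the slope in column 0 and the intercept in column 1:

$$
y =
\begin{cases}
z\beta_0 + b_0 + \epsilon, & X_1 = 0,\ X_2 = 0 \\
z\beta_1 + b_1 + \epsilon, & X_1 = 0,\ X_2 = 1 \\
z\beta_2 + b_2 + \epsilon, & X_1 = 1,\ X_2 = 0 \\
z\beta_3 + b_3 + \epsilon, & X_1 = 1,\ X_2 = 1
\end{cases}
\qquad \text{i.e. } i = 2X_1 + X_2 .
$$

$X_3$ and $X_4$ do not appear in the model, so they act as irrelevant features.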

In [34]:
import numpy as np
import pickle

#### Generating simulation data

In [26]:
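probs = [0.1,0.3,0.5,0.7]  # P(X_i = 1) for each binary feature
n = 150  # 100 training + 50 testing points
X = np.zeros((n,4))
for i,p in enumerate(probs):
    X[:,i] = np.random.binomial(1,p,size=n)
z = np.random.randn(n).reshape(n,1)
y = np.zeros(n)
features = np.hstack([X,z])
# One row per (X_1, X_2) combination: column 0 is the slope, column 1 the intercept
beta = np.random.randn(4,2)*2
print(beta)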

[[ 2.24613543 -0.1223977 ]
 [-0.43766271 -0.05272313]
 [ 1.22065594  0.32961376]
 [-0.52867745 -3.21989783]]


In [35]:
# Each (X_1, X_2) combination selects one row of beta:
# (0,0) -> 0, (0,1) -> 1, (1,0) -> 2, (1,1) -> 3
for i in range(n):
    row = int(2*X[i,0] + X[i,1])
    # z has shape (n,1), so use the scalar z[i,0] rather than the length-1 array z[i]
    y[i] = beta[row,0]*z[i,0] + beta[row,1]
y = y+np.random.randn(n)  # add standard normal noise
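
The same targets can be produced without an explicit loop by turning $(X_1, X_2)$ into a row index and using NumPy integer indexing. The following is a minimal sketch, not the notebook's own code: the fixed seed, the `rng` object, and the names `group` and `y_clean` are assumptions introduced for illustration.

```python
import numpy as np

# Fixed seed is an assumption, used only to make the sketch reproducible.
rng = np.random.RandomState(0)

n = 150
probs = [0.1, 0.3, 0.5, 0.7]

# One binary column per probability; binomial() returns integer arrays.
X = np.column_stack([rng.binomial(1, p, size=n) for p in probs])
z = rng.randn(n)
beta = rng.randn(4, 2) * 2  # column 0: slope, column 1: intercept

# Map (X_1, X_2) to a row of beta: (0,0)->0, (0,1)->1, (1,0)->2, (1,1)->3.
group = 2 * X[:, 0] + X[:, 1]

# Noise-free targets, then targets with standard normal noise added.
y_clean = beta[group, 0] * z + beta[group, 1]
y = y_clean + rng.randn(n)
```

Keeping `y_clean` separate from `y` makes it easy to check the vectorized version against the loop, since the noise-free parts must agree exactly.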

In [None]:
# pickle writes bytes, so the files must be opened in binary mode
with open('data.pkl','wb') as f:
    pickle.dump((X,z,y),f)
with open('coeff.pkl','wb') as f:
    pickle.dump(beta,f)
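
To use the saved data, the file can be read back and divided into the 100 training and 50 testing points. The sketch below is one possible way to do this: the helper `load_and_split` is hypothetical, and splitting by row position (first 100 rows for training) is an assumption, since the split rule is not stated here. The demo writes a stand-in file of zeros with the same layout as `data.pkl` so that the block runs on its own.

```python
import os
import pickle
import tempfile

import numpy as np


def load_and_split(path, n_train=100):
    """Load an (X, z, y) tuple from a pickle file and split it by row position.

    Hypothetical helper: rows [0, n_train) become the training set and the
    remaining rows the test set.
    """
    with open(path, 'rb') as f:  # binary mode, matching how the file is written
        X, z, y = pickle.load(f)
    train = (X[:n_train], z[:n_train], y[:n_train])
    test = (X[n_train:], z[n_train:], y[n_train:])
    return train, test


# Stand-in file with the same layout as data.pkl (placeholder zeros, not real data).
demo_path = os.path.join(tempfile.mkdtemp(), 'data.pkl')
with open(demo_path, 'wb') as f:
    pickle.dump((np.zeros((150, 4)), np.zeros((150, 1)), np.zeros(150)), f)

(X_train, z_train, y_train), (X_test, z_test, y_test) = load_and_split(demo_path)
print(X_train.shape, X_test.shape)
```

Because the rows are generated independently, a positional split is equivalent to a random one for this data set; a shuffled split would matter only if the rows were ordered.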