This is a take on Scirpus' MCMC notebook. Most of it follows the original; at the end I compare the result with OOF (out-of-fold) stacking methods.

In [None]:
import numpy as np
import pymc3 as pm
from sklearn.metrics import mean_absolute_error
import matplotlib.pyplot as plt
%matplotlib inline

The following cell creates two models by adding noise to a target.
Note that the first model has more noise than the second, so one would expect model 1 to perform worse than model 2.

In [None]:
size = 500
true_intercept = 1
true_slope = 2
x = np.linspace(0, 1, size)
# y = a + b*x
true_regression_line = true_intercept + true_slope * x
# add noise
model1 = true_regression_line + np.random.normal(scale=.5, size=size) #Noisy
model2 = true_regression_line + np.random.normal(scale=.2, size=size) #Less Noisy

In [None]:
np.random.seed(0)  # seed the RNG so the train/test split is reproducible
permutation_set = np.random.permutation(size)
train_set = permutation_set[0:size//2]
test_set = permutation_set[size//2:size]

Let us see what the MAE looks like

In [None]:
print(mean_absolute_error(true_regression_line[test_set],model1[test_set]))
print(mean_absolute_error(true_regression_line[test_set],model2[test_set]))

As expected, the noisier model does worse.
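
The gap is roughly what the noise scales predict. For zero-mean Gaussian noise with standard deviation sigma, the expected absolute error is sigma * sqrt(2 / pi), so the two scores should land near the values computed here. This is only a quick sanity check; the helper name `expected_gaussian_mae` is mine and not part of the original notebook.

```python
import numpy as np

def expected_gaussian_mae(scale):
    """Expected |e| for e ~ Normal(0, scale): scale * sqrt(2 / pi)."""
    return scale * np.sqrt(2.0 / np.pi)

# Noise scales used for model1 and model2 above
for name, scale in [("Model 1", 0.5), ("Model 2", 0.2)]:
    print(name, "expected MAE:", round(expected_gaussian_mae(scale), 4))
```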

Now let us look at the straight average

In [None]:
print(mean_absolute_error(true_regression_line[test_set],(model1*.5+model2*.5)[test_set]))

As one can see, this isn't as good as our top model.
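
This is what one would expect when the noise levels differ. If the two noise terms are independent, a blend w1*model1 + w2*model2 with w1 + w2 = 1 has noise variance w1^2 * s1^2 + w2^2 * s2^2, and with equal weights that comes out larger than the variance of model 2 alone. A minimal sketch, assuming independent noise and using a hypothetical helper `blend_noise_std`, compares equal weights with inverse-variance weights:

```python
import numpy as np

def blend_noise_std(weights, scales):
    """Noise std of a weighted blend of models with independent Gaussian noise."""
    weights = np.asarray(weights, dtype=float)
    scales = np.asarray(scales, dtype=float)
    return np.sqrt(np.sum((weights * scales) ** 2))

scales = np.array([0.5, 0.2])  # noise scales of model1 and model2

equal_weights = np.array([0.5, 0.5])
# Inverse-variance weights minimise the blend variance when weights sum to 1
inv_var_weights = (1 / scales**2) / np.sum(1 / scales**2)

print("equal weights std:   ", blend_noise_std(equal_weights, scales))
print("inverse-variance w:  ", inv_var_weights)
print("inverse-variance std:", blend_noise_std(inv_var_weights, scales))
```

With equal weights the blend is noisier than model 2 alone (about 0.27 versus 0.2), while leaning heavily on model 2 brings it down to about 0.19. A regression-based blend can end up with different weights, since it also fits an intercept and does not force the weights to sum to one.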

Now comes the cool part. We are going to use MCMC to draw samples from the posterior over the blending coefficients and get stats on how to combine our raw models to get the best out of them.

Important: Please look at the PyMC3 documentation [here][1] for details.


  [1]: https://pymc-devs.github.io/pymc3/index.html

In [None]:
data = dict(x1=model1[train_set], x2=model2[train_set], y=true_regression_line[train_set])
with pm.Model() as model:
    # Specify the GLM and pass in the data. The resulting linear model, its likelihood
    # and all its parameters are automatically added to our model.
    # Note: pm.glm.glm is the older PyMC3 API; later releases use pm.glm.GLM.from_formula.
    pm.glm.glm('y ~ x1 + x2', data)
    step = pm.NUTS() # Instantiate MCMC sampling algorithm
    trace = pm.sample(2000, step, progressbar=False)

It takes a while. Now it is time to look at what it gives us.

In [None]:
pm.traceplot(trace, figsize=(7,7))
plt.tight_layout();

One can see that every drawn sample gives parameter values for the intercept, x1 and x2.

In [None]:
intercept = np.median(trace.Intercept)
print(intercept)
x1param = np.median(trace.x1)
print(x1param)
x2param = np.median(trace.x2)
print(x2param)
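
The medians are point estimates, but the same draws also describe the uncertainty around each weight. A minimal sketch of such a summary using only NumPy follows; `summarize_draws` is a hypothetical helper, and the synthetic `draws` array merely stands in for something like `trace.x2`, so its numbers are not real sampler output.

```python
import numpy as np

def summarize_draws(draws, interval=95):
    """Return median, standard deviation and a central credible interval."""
    draws = np.asarray(draws, dtype=float)
    tail = (100 - interval) / 2
    lower, upper = np.percentile(draws, [tail, 100 - tail])
    return {"median": np.median(draws), "sd": draws.std(ddof=1),
            "lower": lower, "upper": upper}

# Stand-in for posterior draws of one coefficient (NOT real sampler output)
rng = np.random.RandomState(0)
draws = rng.normal(loc=0.8, scale=0.03, size=2000)

summary = summarize_draws(draws)
print(summary)
```

In the notebook itself one would pass `trace.Intercept`, `trace.x1` or `trace.x2` instead of the synthetic array.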

I created a quick imitation of a train/test split in order to compare with OOF ensembling methods.

In [None]:
model1_train = model1[train_set]
model2_train = model2[train_set]
x_train = np.vstack((model1_train, model2_train)).T

model1_test = model1[test_set]
model2_test = model2[test_set]
x_test = np.vstack((model1_test, model2_test)).T

y = true_regression_line[train_set]

Now to check whether linear regression finds a similar solution.

In [None]:
from sklearn.linear_model import LinearRegression
clfLR = LinearRegression()
clfLR.fit(x_train, y)
y_pred_LR = clfLR.predict(x_test)
print(clfLR.intercept_)
print(clfLR.coef_[0])
print(clfLR.coef_[1])
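
For comparison, the same kind of least-squares weights can be computed without scikit-learn. The sketch below re-simulates data in the same way as the notebook (so the numbers will not match the cells above exactly) and solves the problem with `np.linalg.lstsq`; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.RandomState(0)
size = 500
x = np.linspace(0, 1, size)
target = 1 + 2 * x                               # noise-free target
m1 = target + rng.normal(scale=0.5, size=size)   # noisier model
m2 = target + rng.normal(scale=0.2, size=size)   # less noisy model

# Design matrix with a column of ones for the intercept
design = np.column_stack([np.ones(size), m1, m2])
coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
intercept_ls, w1_ls, w2_ls = coefs

print("intercept:", intercept_ls)
print("weight on model 1:", w1_ls)
print("weight on model 2:", w2_ls)
```

The weights need not sum to one: because both inputs are noisy versions of the target, the fit typically shrinks the slopes and makes up the difference with the intercept.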

And a simple neural net.

In [None]:
from sklearn.neural_network import MLPRegressor
clfMLP = MLPRegressor()
clfMLP.fit(x_train, y)
y_pred_MLP = clfMLP.predict(x_test)

And a GBM.

In [None]:
from sklearn.ensemble import GradientBoostingRegressor
clfGBR = GradientBoostingRegressor(random_state=0)
clfGBR.fit(x_train, y)
y_pred_GBR = clfGBR.predict(x_test)

Now let's compare:

In [None]:
print('Model 1:',mean_absolute_error(true_regression_line[test_set],model1[test_set]))
print('Model 2:', mean_absolute_error(true_regression_line[test_set],model2[test_set]))
print('Average:',mean_absolute_error(true_regression_line[test_set],(model1*.5+model2*.5)[test_set]))
print('MCMC:',mean_absolute_error(true_regression_line[test_set],
                                  (intercept+x1param*model1+x2param*model2)[test_set]))
print('LR:',mean_absolute_error(true_regression_line[test_set], y_pred_LR))
print('MLP:',mean_absolute_error(true_regression_line[test_set], y_pred_MLP))
print('GBM:',mean_absolute_error(true_regression_line[test_set], y_pred_GBR))

Looks like MCMC did not outperform linear regression, although it was pretty close, and both come up with similar coefficients. Additionally, MCMC gives you a sense of the uncertainty (standard deviation) of each coefficient from the spread of the samples, although it runs significantly slower. Seems like a good tool to have in the ensembling tool belt! Thanks Scirpus!