<a href="https://colab.research.google.com/github/namson98/Complete-Python-3-Bootcamp/blob/master/Ensemble_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ensemble learning

Please make a copy of this Colab notebook so that you can make changes **(File -> Save a copy in Drive)**.

This notebook contains practical exercises designed to support the theoretical lecture on Ensemble Learning.

In [None]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

# For plotting like a pro
!pip install plotnine
from plotnine import *

# Basic ensemble via averaging (in the case of regression)

As in the lecture slides, we start by building one of the most basic types of ensemble: combining predictions from different models into one. Here we combine linear regression algorithms (vanilla, ridge and lasso). In practice you can use almost any regression model as part of such an ensemble. Note that combining predictions produced by the same model trained on the same data will not improve performance.

In [None]:
example_data = pd.DataFrame({'x':[1,2,3,4,5], 'y':[2,4,5,4,5]})

In [None]:
fig = (
    ggplot(data = example_data,
          mapping = aes(x = 'x', y = 'y')) +
    geom_point(fill = '#36B059', 
               size = 5.0,
               stroke = 2.5,
               colour = '#2BE062',
               shape = 'o') +
    labs(
        title ='',
        x = 'X',
        y = 'y',
    ) +
    xlim(0, 6) +
    ylim(0, 7) +
    theme_bw() + 
    theme(figure_size = (5, 5),
          axis_line = element_line(size = 0.5, colour = "black"),
          panel_grid_major = element_line(size = 0.05, colour = "black"),
          panel_grid_minor = element_line(size = 0.05, colour = "black"),
          axis_text = element_text(colour ='black'))
)
fig

Let's split this magnificently large dataset further into `training` and `test` sets.

In [None]:
example_data

In [None]:
train_df = example_data.iloc[[0,2,3],:] # select 1st, 3rd and 4th samples into the training set
print(train_df)

test_df = example_data.iloc[[1,4],:] # select 2nd and 5th samples into the test set
print(test_df)

In [None]:
fig = (
    ggplot(data = train_df,
          mapping = aes(x = 'x', y = 'y')) +
    geom_point(fill = '#36B059', 
               size = 5.0,
               stroke = 2.5,
               colour = '#2BE062',
               shape = 'o') +
    labs(
        title ='',
        x = 'X',
        y = 'y',
    ) +
    xlim(0, 6) +
    ylim(0, 7) +
    theme_bw() + 
    theme(figure_size = (5, 5),
          axis_line = element_line(size = 0.5, colour = "black"),
          panel_grid_major = element_line(size = 0.05, colour = "black"),
          panel_grid_minor = element_line(size = 0.05, colour = "black"),
          axis_text = element_text(colour ='black'))
)

fig = fig + geom_point(data = test_df,
          mapping = aes(x = 'x', y = 'y'), fill = 'blue', 
               size = 5.0,
               stroke = 2.5,
               colour = 'lightblue',
               shape = 'o')
fig

Now that data is ready let's train three linear regression models: basic linear regression, ridge regression and lasso regression.

In [None]:
from sklearn.linear_model import LinearRegression, Lasso, Ridge

# Initialising all three regression models
lr = LinearRegression()

lambd = 1 # you can use lambda_

# Ridge regression (template)
lr_ridge = Ridge(lambd)

# Lasso 
lr_lasso = Lasso(lambd)

We fit all three models on our improvised training data (`train_df`)

In [None]:
lr.fit(train_df[['x']], train_df[['y']])
lr_ridge.fit(train_df[['x']], train_df[['y']])
lr_lasso.fit(train_df[['x']], train_df[['y']])

Visualise all three lines on one plot:

In [None]:
fig = fig + geom_abline(intercept = lr.intercept_, slope = lr.coef_[0], linetype="dashed", size=1)
fig = fig + geom_abline(intercept = lr_ridge.intercept_, slope = lr_ridge.coef_[0], color="red", linetype="solid", size=1)
fig = fig + geom_abline(intercept = lr_lasso.intercept_, slope = lr_lasso.coef_[0], color="blue", linetype="solid", size=1)
fig

What can you say about this plot? How well does each model perform on the training set? On the test set?

In [None]:
# predicting test set by each of the models
lr_pred = lr.predict(test_df[['x']])
lr_ridge_pred = lr_ridge.predict(test_df[['x']])
lr_lasso_pred = lr_lasso.predict(test_df[['x']]).reshape((2,1)) # we need to reshape the resulting vector

In [None]:
print(lr_pred.shape)
print(lr_ridge_pred.shape)
print(lr_lasso_pred.shape)

Here we add a function that computes the Residual Sum of Squares (RSS):

In [None]:
def rss(predicted, true): # RSS == Residual Sum of Squares
  return(np.sum((true - predicted)**2))

In [None]:
# compute RSS for each model 
lr_rss = rss(lr_pred, test_df[['y']])
lr_ridge_rss = rss(lr_ridge_pred, test_df[['y']])
lr_lasso_rss = rss(lr_lasso_pred, test_df[['y']])

print(f'RSS for Linear Regression: {np.array(lr_rss)}')
print(f'RSS for Ridge Regression: {np.array(lr_ridge_rss)}')
print(f'RSS for Lasso Regression: {np.array(lr_lasso_rss)}')

Let's now combine predictions of three regression models into an ensemble by averaging

In [None]:
ensemble_preds = np.mean([lr_pred, lr_ridge_pred, lr_lasso_pred], axis = 0)
print(ensemble_preds)

What about the RSS of the averaged ensemble? What value would you expect?

In [None]:
print(np.mean([lr_rss, lr_ridge_rss, lr_lasso_rss]))

What do we actually get?

In [None]:
ensemble_rss = rss(ensemble_preds, test_df[['y']])
print(np.array(ensemble_rss))
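A short derivation may help explain the comparison above. It assumes $M$ models whose predictions $\hat{y}_{m,i}$ for test point $i$ are averaged with equal weights, as in `ensemble_preds`; the symbols are introduced here for illustration only. Because squaring is a convex function, the squared error of the averaged prediction cannot exceed the average of the individual squared errors:

$$
\Big(y_i - \frac{1}{M}\sum_{m=1}^{M}\hat{y}_{m,i}\Big)^2
= \Big(\frac{1}{M}\sum_{m=1}^{M}\big(y_i - \hat{y}_{m,i}\big)\Big)^2
\le \frac{1}{M}\sum_{m=1}^{M}\big(y_i - \hat{y}_{m,i}\big)^2
$$

Summing over all test points gives $\mathrm{RSS}_{\text{ensemble}} \le \frac{1}{M}\sum_{m}\mathrm{RSS}_m$. Note that this only bounds the ensemble by the *average* RSS; the ensemble can still be worse than the best individual model.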

How do we visualise the resulting model?

In [None]:
# Create dummy data spanning the x range and project it onto the ensemble to visualise its predictions
background_data = pd.DataFrame({'x': np.linspace(start=0, stop=6, num=61)})
background_data['ensemble_y'] = np.mean((lr.predict(background_data[['x']]), lr_ridge.predict(background_data[['x']]), lr_lasso.predict(background_data[['x']]).reshape((61,1))), axis = 0)

Let's make our ensemble purple (mature colour)

In [None]:
fig + geom_path(data = background_data, mapping = aes(x = 'x', y = 'ensemble_y'), size = 1.5, colour = 'purple') 

# Basic ensemble via majority vote (in case of classification)
In classification, a majority vote is used to merge the predictions of different models into an ensemble. But first we shall generate some synthetic data.

In [None]:
np.random.seed(2342347823) # random seed, this number was random, no need to make conspiracies around it

D = 2 # two dimensions
N = 100 # points per class

# Generating N points for the first class
mu_vec1 = np.zeros(D) # creates a vector of zeros, these are averages across each dimension
cov_mat1 = np.eye(D) # creates a diagonal matrix of size D x D, all values except diagonal are 0
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, N)

In [None]:
# Same as above, but the means are shifted to 1
mu_vec2 = np.ones(D) # creates a vector of ones
cov_mat2 = np.eye(D)
class2_sample = np.random.multivariate_normal(mu_vec2, cov_mat2, N)

In [None]:
# a lot of boring things....
# gluing together two matrices generated above
train = np.concatenate((class1_sample, class2_sample), axis=0)
train_data = pd.DataFrame(train)

# Create names for columns, actually there are only two this time
train_data.columns = [ 'x' + str(i) for i in (np.arange(D)+1)]

# Create a class column
train_data['class'] = np.concatenate((np.repeat(0, N), np.repeat(1, N)))

# This is important for plotting and modelling
train_data['class'] = train_data['class'].astype('category')

# Randomly splitting data into train (60%) and validation (40%)
from sklearn.model_selection import train_test_split
train, val = train_test_split(train_data, random_state = 111, test_size = 0.40) 

In [None]:
# Function that draws data points and colours them based on the class
def draw_points_ggplot2(point_set):
  fig = (
    ggplot(data = point_set,
          mapping = aes(x = 'x1', y = 'x2')) +
    geom_point(aes(colour = 'class', 
                   shape = 'class',
                   fill = 'class'), 
               size = 5.0,
               stroke = 2.5) +
    labs(
        title ='',
        x = 'x1',
        y = 'x2',
    ) +
    theme_bw() + 
    scale_color_manual(['#EC5D57', '#51A7F9']) + 
    scale_fill_manual(['#C82506', '#0365C0']) + 
    scale_shape_manual(['o', 's']) + 
    theme(figure_size = (5, 5),
          axis_line = element_line(size = 0.5, colour = "black"),
          panel_grid_major = element_line(size = 0.05, colour = "black"),
          panel_grid_minor = element_line(size = 0.05, colour = "black"),
          axis_text = element_text(colour ='black'))
  )
  return(fig)

In [None]:
# let's test it!
draw_points_ggplot2(train)

In [None]:
draw_points_ggplot2(val)

Now we will train three different classifiers, namely DT, KNN and LogisticRegression, which is a classifier despite its name. 

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

np.random.seed(1111) # random seed for consistency

# define all three classifiers
model1 = DecisionTreeClassifier(max_depth = 5)
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

# train classifiers
model1.fit(train[['x1', 'x2']],train[['class']])
model2.fit(train[['x1', 'x2']],train[['class']])
model3.fit(train[['x1', 'x2']],train[['class']])

# predict validation set
val['model1'] = model1.predict(val[['x1', 'x2']])
val['model2'] = model2.predict(val[['x1', 'x2']])
val['model3'] = model3.predict(val[['x1', 'x2']])

**Exercise** Now that we have predictions from all three models, it is time to combine them using a majority vote. Make a new column `ensemble` in the pandas DataFrame `val` with the ensembled predictions of the three models.

Hint: which mathematical function returns the value that appears most often?

In [None]:
##### YOUR CODE STARTS #####
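# mode() returns a DataFrame; with three binary voters there are no ties, so column 0 holds the majority vote
val['ensemble'] = val[['model1', 'model2', 'model3']].mode(axis = 1)[0]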
##### YOUR CODE ENDS #####

One handy way to compute the accuracy of an `sklearn` model is to use its `score` method. Each classification model has its own `score` method, but in our case all of them return accuracy by default. You can use your own metric or choose one from the exhaustive list: https://scikit-learn.org/stable/modules/model_evaluation.html. 

In [None]:
print(f"Accuracy of DT {model1.score(val[['x1', 'x2']], val['class'])*100}%")
print(f"Accuracy of NN {model2.score(val[['x1', 'x2']], val['class'])*100}%")
print(f"Accuracy of LR {model3.score(val[['x1', 'x2']], val['class'])*100}%")

This trick will not work for our ensemble, as we don't have a model object on which to call `score`. Let's calculate the accuracy the old-school way.

In [None]:
##### YOUR CODE STARTS #####
print(f"Accuracy of ensemble of DT, NN and LR {(np.sum(val['ensemble'] == val['class'])/len(val))*100}%")
##### YOUR CODE ENDS #####

Let's visualise the decision boundaries of the three classifiers and the ensemble.

The following function generates a synthetic 2D point grid that spans from `start` to `stop` along the `x1` and `x2` dimensions. You should be able to specify the number of points per unit of distance (`ppu`). For example, if there were only one dimension (e.g. `x1`) spanning from 0 (`start`) to 2 (`stop`) with 3 points per unit of distance, you would need to create the vector `[0, 0.4, 0.8, 1.2, 1.6, 2.0]`. You can create this output with `np.linspace(start=0, stop=2, num=3*(2+0))`. Now do the same in 2D.

In [None]:
def generate_grid(start, stop, ppu):
  num_points = (np.abs(start) + np.abs(stop))*ppu
  grid_data = pd.concat([pd.DataFrame({'x1': np.repeat(x, num_points), 
                                       'x2': np.linspace(start=start, stop=stop, num=num_points)}) for x in np.linspace(start=start, stop=stop, num=num_points)])
  return(grid_data)
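As an alternative, the same kind of grid can be sketched with `np.meshgrid`. This is only an illustration: the name `generate_grid_meshgrid` is hypothetical, and the number of points per axis is computed as `(stop - start) * ppu`, which matches the function above whenever `start <= 0 <= stop`.

```python
import numpy as np
import pandas as pd

def generate_grid_meshgrid(start, stop, ppu):
    # number of points along each axis (assumption: range length times points per unit)
    num_points = int((stop - start) * ppu)
    axis = np.linspace(start=start, stop=stop, num=num_points)
    # every combination of an x1 value with an x2 value
    x1, x2 = np.meshgrid(axis, axis, indexing='ij')
    return pd.DataFrame({'x1': x1.ravel(), 'x2': x2.ravel()})

demo_grid = generate_grid_meshgrid(-3, 4, 20)
print(demo_grid.shape)
```

Unlike the `pd.concat` version, the resulting DataFrame has a unique index, which can be convenient when new columns are assigned to it later.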

In [None]:
start = -3 
stop = 4
ppu = 20 # points per unit

grid_data = generate_grid(start, stop, ppu)
print(grid_data.shape) # it should be (num_points squared, 2)

Now that you have the grid, predict each point of this grid by each of our models, including the ensemble:

In [None]:
##### YOUR CODE STARTS #####
grid_data['model1'] = model1.predict(grid_data[['x1', 'x2']])
grid_data['model2'] = model2.predict(grid_data[['x1', 'x2']])
grid_data['model3'] = model3.predict(grid_data[['x1', 'x2']])
# column 0 of mode() holds the majority vote (no ties with three binary voters)
grid_data['ensemble'] = grid_data[['model1', 'model2', 'model3']].mode(axis = 1)[0]
##### YOUR CODE ENDS #####

We are ready to visualise each model together with its respective decision regions.

In [None]:
draw_points_ggplot2(val) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model1)'),  size = .5, alpha = 0.2) + annotate("text", label = "DecisionTree", x = 2.8, y = 3.5, size = 12, colour = "black")

In [None]:
draw_points_ggplot2(val) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model2)'),  size = .5, alpha = 0.2) + annotate("text", label = "K-NN", x = 2.8, y = 3.5, size = 12, colour = "black")

In [None]:
draw_points_ggplot2(val) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model3)'),  size = .5, alpha = 0.2) + annotate("text", label = "LogisticRegression", x = 2.5, y = 3.5, size = 12, colour = "black")

In [None]:
draw_points_ggplot2(val) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(ensemble)'),  size = .5, alpha = 0.2) + annotate("text", label = "Ensemble", x = 2.8, y = 3.5, size = 12, colour = "black")

The `VotingClassifier` class from `sklearn` implements a simple ensemble of different classifiers. Let's see how it works.

In [None]:
# Import our guest
from sklearn.ensemble import VotingClassifier
np.random.seed(1111) # nothing interesting here, read on

# Specify correct estimators/classifiers
ensemble_model = VotingClassifier(estimators=[('dt', model1), ('knn', model2), ('lr', model3)], voting='hard')

# Train the VotingClassifier model on training data 
ensemble_model.fit(train[['x1', 'x2']],train[['class']])

# Predict validation data using trained model
# (stored in a separate column so that the hand-made `ensemble` column is not overwritten)
val['sklearn_ensemble'] = ensemble_model.predict(val[['x1', 'x2']])

# Use score function to evaluate VotingClassifier's performance
print(f"Accuracy of sklearn ensemble {ensemble_model.score(val[['x1', 'x2']], val[['class']])*100}%")

As a reminder, here is the accuracy of our hand-made ensemble:

In [None]:
print(f"Accuracy of ensemble of DT, NN and LR {np.mean(val['ensemble'] == val['class'])*100}%")

## Weighted ensemble (Classification)
As we learnt in the lecture, sometimes we trust some classifiers more than others, and this is reflected in how ensembles are constructed. Here we will use the MNIST dataset to test the weighted ensembling approach. 

In the meantime, some setup code:

In [None]:
# old school TF
%tensorflow_version 1.x
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR) # credit to Dmitry Lekhovitsky

# MNIST lives here:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)

images = np.vstack([img.reshape(-1,) for img in mnist.train.images])
labels = mnist.train.labels
print(f"images are of shape: {images.shape} and labels: {labels.shape}")

train_images = images[0:2000,:]
train_labels = labels[0:2000]

test_images = images[2000:3000,:]
test_labels = labels[2000:3000]

train_images = pd.DataFrame(np.matrix(train_images))
test_images = pd.DataFrame(np.matrix(test_images))

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

# initialize model templates
model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()
model = VotingClassifier(estimators=[('dt', model1), ('knn', model2), ('lr', model3)], voting='hard')

Train each model on the training images

In [None]:
np.random.seed(1111) 
model1.fit(train_images, train_labels)

np.random.seed(1111) 
model2.fit(train_images, train_labels)

np.random.seed(1111) 
model3.fit(train_images, train_labels)

np.random.seed(1111) 
model.fit(train_images, train_labels)
print('done')

Here we predict classes for the test images

In [None]:
model1_pred = model1.predict(test_images)
model2_pred = model2.predict(test_images)
model3_pred = model3.predict(test_images)
ensemble_pred = model.predict(test_images)

In [None]:
print(f"Accuracy of DT {model1.score(test_images, test_labels)*100}%")
print(f"Accuracy of NN {model2.score(test_images, test_labels)*100}%")
print(f"Accuracy of LR {model3.score(test_images, test_labels)*100}%")
print(f"Accuracy of ensemble {model.score(test_images, test_labels)*100}%")

87.6% is not the greatest performance. Let's see if we can improve it.
`VotingClassifier` has a parameter `weights` that specifies the "level of trust" we have in each model: the higher the weight, the more we trust the model. Let's replicate the basic ensemble using the `weights` parameter.

In [None]:
np.random.seed(1111) 

##### YOUR CODE STARTS #####
# set equal weights for each of the classifiers to reproduce the basic majority vote ensemble:
model = VotingClassifier(estimators=[('dt', model1), ('knn', model2), ('lr', model3)], voting='hard', weights = [0.3,0.3,0.3])
##### YOUR CODE ENDS #####

model.fit(train_images, train_labels)
print(f"Accuracy of ensemble {model.score(test_images, test_labels)*100}%")
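To make the role of `weights` in hard voting more concrete, here is a minimal sketch of weighted majority voting written in plain NumPy. The function name `weighted_hard_vote` and the toy predictions are made up for illustration; this is not the `sklearn` implementation, only the same idea: each model adds its weight to the class it predicts, and the class with the largest total wins.

```python
import numpy as np

def weighted_hard_vote(predictions, weights, n_classes):
    # predictions: (n_models, n_samples) array of integer class labels
    predictions = np.asarray(predictions)
    weights = np.asarray(weights, dtype=float)
    n_samples = predictions.shape[1]
    votes = np.zeros((n_samples, n_classes))
    for model_preds, w in zip(predictions, weights):
        # each model adds its weight to the class it predicted for every sample
        votes[np.arange(n_samples), model_preds] += w
    return votes.argmax(axis=1)

toy_preds = [[0, 1, 2],   # model A
             [1, 1, 0],   # model B
             [1, 2, 0]]   # model C
equal_vote = weighted_hard_vote(toy_preds, [1, 1, 1], n_classes=3)
trusted_a_vote = weighted_hard_vote(toy_preds, [3, 1, 1], n_classes=3)
print(equal_vote)       # [1 1 0]
print(trusted_a_vote)   # [0 1 2]
```

With equal weights, models B and C outvote model A on the first and third samples; once A's weight exceeds the combined weight of B and C, A's prediction always wins.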

Now let's change the weights. As we saw in the lecture, a model's performance across CV iterations is a reasonable basis for estimating how much we trust it.



---

Let's use a `cross_val_score` function from `sklearn`!

In [None]:
from sklearn.model_selection import cross_val_score

X = np.array(train_images)
y = np.array(train_labels)

scores_model1 = cross_val_score(model1, X, y, cv=4)
print(f'Average validation accuracy for model1 is {np.mean(scores_model1)}')

scores_model2 = cross_val_score(model2, X, y, cv=4)
print(f'Average validation accuracy for model2 is {np.mean(scores_model2)}')

scores_model3 = cross_val_score(model3, X, y, cv=4)
print(f'Average validation accuracy for model3 is {np.mean(scores_model3)}')

**NB!** Keep in mind that `cross_val_score` does not shuffle your data by default. To shuffle, you can pass a `StratifiedKFold` object created with the option `shuffle=True` as the value of the `cv` parameter.
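A minimal sketch of the shuffled variant, assuming the small `load_digits` dataset bundled with `sklearn` as a stand-in for our MNIST subset (the names `X_demo`, `y_demo` and `shuffled_cv` are illustrative only):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# stand-in data: 8x8 digit images shipped with sklearn
X_demo, y_demo = load_digits(return_X_y=True)

# 4 stratified folds; samples are shuffled before splitting, reproducibly
shuffled_cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=1111)

demo_scores = cross_val_score(KNeighborsClassifier(), X_demo, y_demo, cv=shuffled_cv)
print(f'Average validation accuracy with shuffled folds is {np.mean(demo_scores)}')
```

Passing `random_state` together with `shuffle=True` keeps the fold assignment the same between runs, so scores of different models remain comparable.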

Now we can use these scores to infer model weights.

In [None]:
np.random.seed(1111) 

##### YOUR CODE STARTS #####
model = VotingClassifier(estimators=[('dt', model1), ('knn', model2), ('lr', model3)], voting='hard', weights = [np.mean(scores_model1), np.mean(scores_model2), np.mean(scores_model3)])
##### YOUR CODE ENDS ##### (please do not delete this line)

# Train a new ensemble
model.fit(train_images, train_labels)

# Evaluate its performance on the test images
print(f"Accuracy of ensemble {model.score(test_images, test_labels)*100}%")

# Bagging (**B**ootstrap + **AGG**regation = **BAGG**ing)

## Bootstrap (1st step)

Here is the familiar decision tree model we have built before:

In [None]:
from sklearn.tree import DecisionTreeClassifier

np.random.seed(1111) # random seed for consistency

model1 = DecisionTreeClassifier()
model1.fit(train[['x1', 'x2']],train[['class']])
print(f"Accuracy of a single DT {model1.score(val[['x1', 'x2']], val[['class']])*100}%")

Let's bootstrap 3 equally sized random resamples of the training data. 

**NB!** What is the difference between calling `np.random.seed(1111)` as a separate command and passing `random_state = 1111` to the `resample` function?
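One way to explore this question is the small experiment below. It is a sketch on a toy array (`toy` and the other variable names are illustrative): the global seed is set once and then consumed by successive calls, while `random_state` re-seeds each individual call.

```python
import numpy as np
from sklearn.utils import resample

toy = np.arange(10)

# Global seed: set once, every later draw advances the shared random state
np.random.seed(1111)
global_first = resample(toy)
global_second = resample(toy)   # almost certainly differs from global_first

# Per-call seed: every call starts from the same state
local_first = resample(toy, random_state=1111)
local_second = resample(toy, random_state=1111)   # identical to local_first

print(global_first, global_second)
print(local_first, local_second)
```

This is why the list comprehension in the next cell relies on the global seed: with a fixed `random_state` inside `resample`, all three bootstrap samples would be the same.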

In [None]:
from sklearn.utils import resample
n_bootstraps = 3
np.random.seed(1111)

# from StackOverFlow
resamples = [resample(train[['x1', 'x2']], n_samples = int(train[['x1', 'x2']].shape[0]*0.8)).index.values for i in range(n_bootstraps)]

In [None]:
# first resample
train_resample1 = train.loc[resamples[0]]

# second resample
train_resample2 = train.loc[resamples[1]]

# third resample
train_resample3 = train.loc[resamples[2]]

In [None]:
draw_points_ggplot2(train_resample1)

In [None]:
draw_points_ggplot2(train_resample2)

Let's train **3** identical DTs on each resample.

In [None]:
# We couldn't use only one variable as
# we wouldn't be able to capture progress of each DT independently
model1 = DecisionTreeClassifier()
model2 = DecisionTreeClassifier()
model3 = DecisionTreeClassifier()

In [None]:
np.random.seed(1111) # random seed for consistency

# Note, that we cannot use VotingClassifier as before, 
# as each tree has to be trained on its own data
model1.fit(train_resample1[['x1','x2']], train_resample1[['class']])
model2.fit(train_resample2[['x1','x2']], train_resample2[['class']])
model3.fit(train_resample3[['x1','x2']], train_resample3[['class']])

In [None]:
start = -3 
stop = 4
ppu = 20 # points per unit

grid_data = generate_grid(start, stop, ppu)

grid_data['model1'] = model1.predict(grid_data[['x1', 'x2']])
grid_data['model2'] = model2.predict(grid_data[['x1', 'x2']])
grid_data['model3'] = model3.predict(grid_data[['x1', 'x2']])

Let's visualise these resamples together with the decision regions of the tree trained on each of them

In [None]:
draw_points_ggplot2(train_resample1) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model1)'),  size = .5, alpha = 0.2)

In [None]:
draw_points_ggplot2(train_resample2) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model2)'),  size = .5, alpha = 0.2)

In [None]:
draw_points_ggplot2(train_resample3) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model3)'),  size = .5, alpha = 0.2)

## Aggregation (2nd step)

In [None]:
draw_points_ggplot2(val) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model3)'),  size = .5, alpha = 0.2)

In [None]:
val['model1'] = model1.predict(val[['x1', 'x2']])
val['model2'] = model2.predict(val[['x1', 'x2']])
val['model3'] = model3.predict(val[['x1', 'x2']])

In [None]:
# column 0 of mode() holds the majority vote (no ties with three binary voters)
val['bagg_ensemble'] = val[['model1', 'model2', 'model3']].mode(axis = 1)[0]

In [None]:
print(f"Accuracy of hand-made bagged ensemble with 3 DTs {np.sum(val['bagg_ensemble'] == val['class'])/len(val[['class']])*100}%")
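The two steps above generalise to any number of trees. The sketch below repeats them in a loop on synthetic data from `make_classification`; the data, the number of trees and all variable names are assumptions made for illustration, not part of the exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# toy 2D binary problem (stand-in for the notebook's train/val split)
X_toy, y_toy = make_classification(n_samples=200, n_features=2, n_informative=2,
                                   n_redundant=0, random_state=0)
X_tr, y_tr = X_toy[:120], y_toy[:120]
X_va, y_va = X_toy[120:], y_toy[120:]

n_trees = 5  # odd number, so a binary majority vote cannot tie
trees = []
for seed in range(n_trees):
    # 1st step (bootstrap): sample 80% of the training set with replacement
    X_boot, y_boot = resample(X_tr, y_tr, n_samples=int(0.8 * len(X_tr)), random_state=seed)
    trees.append(DecisionTreeClassifier(random_state=seed).fit(X_boot, y_boot))

# 2nd step (aggregation): majority vote over the trees' 0/1 predictions
all_votes = np.stack([tree.predict(X_va) for tree in trees])   # shape (n_trees, n_val)
bagged_pred = (all_votes.mean(axis=0) > 0.5).astype(int)
bagged_acc = np.mean(bagged_pred == y_va)
print(f"Accuracy of hand-made bagging with {n_trees} DTs {bagged_acc*100}%")
```

Averaging the 0/1 votes and thresholding at 0.5 is equivalent to a majority vote only for binary labels; for more classes a per-class vote count would be needed.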

# Bagging in sklearn

In [None]:
# In sklearn there is also BaggingRegressor, as you might have imagined
# BaggingClassifier is described in the docs as a bagging meta-estimator
from sklearn.ensemble import BaggingClassifier

# Base classifier
from sklearn.tree import DecisionTreeClassifier

# humdrum random seed thingy (aka piano in the bushes)
np.random.seed(1111)

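# The base estimator is passed positionally because its keyword was renamed
# from `base_estimator` to `estimator` in newer sklearn releases
bagger = BaggingClassifier(DecisionTreeClassifier(), n_estimators=3, max_samples=0.8)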

# Train bagger
bagger.fit(train[['x1','x2']], train['class'])

print(f"Accuracy of sklearn bagging with {3} DTs {bagger.score(val[['x1', 'x2']], val[['class']])*100}%")

What if we try more estimators?

In [None]:
# Initialise our bagging classifier that consists of 9 DTs
# (base estimator passed positionally to work across sklearn versions)
n_trees = 9
bagger = BaggingClassifier(DecisionTreeClassifier(),
                        n_estimators=n_trees, max_samples=0.8, random_state = 1).fit(train[['x1','x2']], train['class'])

print(f"Accuracy of sklearn bagging with {n_trees} DTs {bagger.score(val[['x1', 'x2']], val[['class']])*100}%")



---


# Random Forest algorithm
I would call Random Forest the workhorse of ML. Here we will not implement the Random Forest algorithm, but we will get very close to understanding it.

First, let's regenerate some data:

In [None]:
np.random.seed(2342347823) # random seed for consistency

D = 50
N = 50

# Generating 50 points for the first class
mu_vec1 = np.zeros(D) 
cov_mat1 = np.eye(D) # creates a diagonal matrix of size D x D, all values except diagonal are 0
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, N)

# Generating 50 points for the second class
mu_vec2 = np.ones(D)
cov_mat2 = np.eye(D)
class2_sample = np.random.multivariate_normal(mu_vec2, cov_mat2, N)

train = np.concatenate((class1_sample, class2_sample), axis=0)
train_data = pd.DataFrame(train)

# Create names for columns, x1, x2 ... x50
train_data.columns = [ 'x' + str(i) for i in (np.arange(D)+1)]

# Create a class column
train_data['class'] = np.concatenate((np.repeat(0, N), np.repeat(1, N)))

# This is important for plotting and modelling
train_data['class'] = train_data['class'].astype('category')

from sklearn.model_selection import train_test_split
train, val = train_test_split(train_data, random_state = 111, test_size = 0.40) 

Regular `DecisionTree` will suffer from the curse of dimensionality with this high-dimensional data:

In [None]:
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier() 

np.random.seed(1111)
# 2D data
model.fit(train[['x1','x2']], train['class'])
print(f"Validation accuracy is {model.score(val[['x1','x2']], val[['class']])*100}%")

# 5D data
model.fit(train.iloc[:, :5], train['class'])
print(f"Validation accuracy is {model.score(val.iloc[:, :5], val[['class']])*100}%")

# 50D data
model.fit(train.iloc[:, :50], train['class'])
print(f"Validation accuracy is {model.score(val.iloc[:, :50], val[['class']])*100}%")

More stable estimates can be obtained using `cross_val_score` function

In [None]:
model = DecisionTreeClassifier() 

np.random.seed(1111)

# as `cross_val_score` does not shuffle the data by itself
shuffled_train_data = train_data.sample(frac=1)

# 2D data
scores_model1 = cross_val_score(model, shuffled_train_data[['x1','x2']], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 2D is {np.mean(scores_model1)*100}%')

# 5D data
scores_model2 = cross_val_score(model, shuffled_train_data.iloc[:, :5], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 5D is {np.mean(scores_model2)*100}%')

# 50D data
scores_model3 = cross_val_score(model, shuffled_train_data.iloc[:, :50], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 50D is {np.mean(scores_model3)*100}%')

This is not yet a fair comparison with random forest, because each DT is a single tree, while a random forest is a bagged ensemble of trees. So let's bag our trees first:

In [None]:
# base estimator passed positionally to work across sklearn versions
bagger = BaggingClassifier(DecisionTreeClassifier(), max_samples=0.8, n_estimators=9, random_state=1111)

# 2D data
scores_model1 = cross_val_score(bagger, shuffled_train_data[['x1','x2']], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 2D is {np.mean(scores_model1)*100}%')

# 5D data
scores_model2 = cross_val_score(bagger, shuffled_train_data.iloc[:, :5], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 5D is {np.mean(scores_model2)*100}%')

# 50D data
scores_model3 = cross_val_score(bagger, shuffled_train_data.iloc[:, :50], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 50D is {np.mean(scores_model3)*100}%')

To take another step from a bag of decision trees towards a random forest, we can set the `max_features` parameter to a value below 1.0 (e.g. 0.8). Each tree in the bag is then trained on its own random subset of the features.

In [None]:
# base estimator passed positionally to work across sklearn versions
bagger = BaggingClassifier(DecisionTreeClassifier(), max_samples = 0.8, max_features = 0.8, n_estimators=9, random_state=1111)

# 2D data
scores_model1 = cross_val_score(bagger, shuffled_train_data[['x1','x2']], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 2D is {np.mean(scores_model1)*100}%')

# 5D data
scores_model2 = cross_val_score(bagger, shuffled_train_data.iloc[:, :5], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 5D is {np.mean(scores_model2)*100}%')

# 50D data
scores_model3 = cross_val_score(bagger, shuffled_train_data.iloc[:, :50], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 50D is {np.mean(scores_model3)*100}%')
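To see what `max_features` does inside a bagging ensemble, we can inspect which features each tree received. The sketch below uses synthetic data from `make_classification` (the names `X_demo`, `y_demo` and `demo_bagger` are illustrative) and the fitted attribute `estimators_features_`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# toy data with 10 features
X_demo, y_demo = make_classification(n_samples=100, n_features=10, random_state=0)

demo_bagger = BaggingClassifier(DecisionTreeClassifier(), n_estimators=3,
                                max_samples=0.8, max_features=0.8, random_state=0)
demo_bagger.fit(X_demo, y_demo)

# each tree is trained on its own subset of 8 out of the 10 feature indices
for tree_id, feature_ids in enumerate(demo_bagger.estimators_features_):
    print(tree_id, sorted(feature_ids))
```

Here the feature subset is drawn once per tree. `RandomForestClassifier` instead considers a random subset of features at every split, which is one of the remaining differences between this bagged ensemble and a true random forest.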

Finally, let's train the random forest classifier itself.

In [None]:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()

np.random.seed(1111)

# 2D data
scores_model1 = cross_val_score(model, shuffled_train_data[['x1','x2']], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 2D is {np.mean(scores_model1)*100}%')

# 5D data
scores_model2 = cross_val_score(model, shuffled_train_data.iloc[:, :5], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 5D is {np.mean(scores_model2)*100}%')

# 50D data
scores_model3 = cross_val_score(model, shuffled_train_data.iloc[:, :50], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 50D is {np.mean(scores_model3)*100}%')

Extremely Randomized Trees (extreme RF)

In [None]:
from sklearn.ensemble import ExtraTreesClassifier
model = ExtraTreesClassifier()

np.random.seed(1111)

# 2D data
scores_model1 = cross_val_score(model, shuffled_train_data[['x1','x2']], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 2D is {np.mean(scores_model1)*100}%')

# 5D data
scores_model2 = cross_val_score(model, shuffled_train_data.iloc[:, :5], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 5D is {np.mean(scores_model2)*100}%')

# 50D data
scores_model3 = cross_val_score(model, shuffled_train_data.iloc[:, :50], shuffled_train_data['class'], cv=4)
print(f'Average validation accuracy on 50D is {np.mean(scores_model3)*100}%')

# Setup before part II

In [None]:
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

# For plotting like a pro
!pip install plotnine
from plotnine import *

In [None]:
# Function that draws data points and colours them based on the class
def draw_points_ggplot2(point_set):
  fig = (
    ggplot(data = point_set,
          mapping = aes(x = 'x1', y = 'x2')) +
    geom_point(aes(colour = 'class', 
                   shape = 'class',
                   fill = 'class'), 
               size = 5.0,
               stroke = 2.5) +
    labs(
        title ='',
        x = 'x1',
        y = 'x2',
    ) +
    theme_bw() + 
    scale_color_manual(['#EC5D57', '#51A7F9']) + 
    scale_fill_manual(['#C82506', '#0365C0']) + 
    scale_shape_manual(['o', 's']) + 
    theme(figure_size = (5, 5),
          axis_line = element_line(size = 0.5, colour = "black"),
          panel_grid_major = element_line(size = 0.05, colour = "black"),
          panel_grid_minor = element_line(size = 0.05, colour = "black"),
          axis_text = element_text(colour ='black'))
  )
  return(fig)

In [None]:
def generate_grid(start, stop, ppu):
  num_points = (np.abs(start) + np.abs(stop))*ppu
  grid_data = pd.concat([pd.DataFrame({'x1': np.repeat(x, num_points), 
                                       'x2': np.linspace(start=start, stop=stop, num=num_points)}) for x in np.linspace(start=start, stop=stop, num=num_points)])
  return(grid_data)

In [None]:
start = 0 
stop = 6
ppu = 20 # points per unit

grid_data = generate_grid(start, stop, ppu)
grid_data.shape

# Boosting

## Adaptive boosting (Adaboost)
We shall build decision stumps (decision trees of depth 1) on the toy data.

In [None]:
example_data = pd.DataFrame({'x1':[1,2,3,4,5], 'x2':[2,4,5,4,5], 'class':[1,0,1,1,0]})
example_data['class'] = example_data['class'].astype('category') # note that we turn class into categories
draw_points_ggplot2(example_data)

In [None]:
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
dt.fit(example_data[['x1', 'x2']], example_data[['class']])

start = 0 
stop = 6
ppu = 20 # points per unit

grid_data = generate_grid(start, stop, ppu)
grid_data['dt'] = dt.predict(grid_data[['x1', 'x2']])

# visualise the unrestricted decision tree (no depth limit)
draw_points_ggplot2(example_data) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(dt)'),  size = .5, alpha = 0.2)

Let's build the first decision stump:

In [None]:
from sklearn.tree import DecisionTreeClassifier
model1 = DecisionTreeClassifier(max_depth=1) # remember that it can only have 1 level

In [None]:
initial_weights = np.ones(len(example_data)) # egalitarian world of samples
print(initial_weights)

np.random.seed(1111)

model1.fit(example_data[['x1', 'x2']], example_data[['class']], sample_weight = initial_weights)

In [None]:
start = 0 
stop = 6
ppu = 20 # points per unit

grid_data = generate_grid(start, stop, ppu)
grid_data['model1'] = model1.predict(grid_data[['x1', 'x2']])

# visualise the initial egalitarian tree
draw_points_ggplot2(example_data) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model1)'),  size = .5, alpha = 0.2) + geom_text(aes(label = initial_weights), nudge_y = 0.4)

In [None]:
model1.predict(example_data[['x1', 'x2']]) != example_data['class']

In [None]:
incorrect = model1.predict(example_data[['x1', 'x2']]) != example_data['class']
print(np.array(incorrect))

**Exercise** Update the weights as discussed in the lecture: add 0.5 to the weight of every misclassified point and subtract 0.5 from the weight of every correctly classified point.

In [None]:
import copy
new_weights = copy.deepcopy(initial_weights)
##### YOUR CODE STARTS #####
new_weights[np.array(~incorrect)] = 
new_weights[np.array(incorrect)] = 
##### YOUR CODE ENDS #####

print(new_weights)

In [None]:
draw_points_ggplot2(example_data) + geom_text(aes(label = new_weights), nudge_y = 0.2) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model1)'),  size = .5, alpha = 0.2)

**Exercise** Repeat the same process for models #2 and #3.

Let's build the second tree using these new weights

In [None]:
np.random.seed(1111)
##### YOUR CODE STARTS #####
model2 = 
model2.fit
##### YOUR CODE ENDS #####

Visualising boundaries of the second tree:

In [None]:
##### YOUR CODE STARTS #####
grid_data['model2'] = 
##### YOUR CODE ENDS #####
draw_points_ggplot2(example_data) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model2)'),  size = .5, alpha = 0.2) + geom_text(aes(label = new_weights), nudge_y = 0.4)

In [None]:
##### YOUR CODE STARTS #####
incorrect = 
##### YOUR CODE ENDS #####

print(np.array(incorrect))

Changing the weights for the second time:

In [None]:
##### YOUR CODE STARTS #####
newer_weights = copy.deepcopy(new_weights)
newer_weights[np.array(incorrect)] = 
newer_weights[np.array(~incorrect)] = 
##### YOUR CODE ENDS #####
print(newer_weights)

In [None]:
np.random.seed(1111)

##### YOUR CODE STARTS #####
model3 = 
model3.fit
##### YOUR CODE ENDS #####

Visualising the decision boundaries of the third tree

In [None]:
##### YOUR CODE STARTS #####
grid_data['model3'] = 
##### YOUR CODE ENDS #####
draw_points_ggplot2(example_data) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(model3)'),  size = .5, alpha = 0.2) + geom_text(aes(label = newer_weights), nudge_y = 0.4)

Putting all these trees together into one model

In [None]:
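# majority vote: the row-wise mode of the three predictions (mode returns a DataFrame, so we take its first column)
grid_data['ensemble'] = grid_data[['model1', 'model2', 'model3']].mode(axis = 1)[0]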
draw_points_ggplot2(example_data) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(ensemble)'),  size = .5, alpha = 0.2)
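
The row-wise mode above is a plain majority vote in which every tree counts equally. As a rough sketch on made-up votes, this is how an unweighted vote can differ from a weighted one. The values in `toy_votes` and `toy_alphas` are hypothetical and not taken from the trees trained above.

```python
import numpy as np

# Hypothetical 0/1 predictions of three models (columns) for four points (rows)
toy_votes = np.array([[1, 0, 1],
                      [0, 0, 1],
                      [1, 1, 1],
                      [0, 1, 0]])

# Unweighted majority vote: class 1 wins if at least two of three models say 1
majority = (toy_votes.sum(axis=1) >= 2).astype(int)

# Weighted vote: map 0/1 to -1/+1 and weigh each model by a made-up "say"
toy_alphas = np.array([0.2, 0.3, 0.9])
signed_votes = 2 * toy_votes - 1
weighted = (signed_votes @ toy_alphas > 0).astype(int)

print(majority)   # [1 0 1 0]
print(weighted)   # [1 1 1 0]
```

In the second row the third model outvotes the other two because its weight is larger than the sum of theirs.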

Let's compare this to the official `AdaBoostClassifier` implementation from `sklearn`. Pay attention to the parameters: we want 3 models, each of them a `DecisionTreeClassifier` with `max_depth = 1`.


In [None]:
from sklearn.ensemble import AdaBoostClassifier
# note: scikit-learn >= 1.2 calls this parameter `estimator`; older versions use `base_estimator`
model = AdaBoostClassifier(n_estimators=3, estimator=DecisionTreeClassifier(max_depth=1), random_state=1)

# train AdaBoost on our data
model.fit(example_data[['x1','x2']], example_data[['class']])

Here we visualise AdaBoost decision boundaries

In [None]:
grid_data['ada_ensemble'] = model.predict(grid_data[['x1', 'x2']])
draw_points_ggplot2(example_data) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(ada_ensemble)'),  size = .5, alpha = 0.2)
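
The +0.5 / -0.5 rule from the exercise is a simplification. Below is a minimal sketch of the textbook (discrete) AdaBoost update for two classes, written from the standard formula rather than copied from `sklearn`. The helper `adaboost_reweight` and the toy arrays are hypothetical names introduced only for this illustration.

```python
import numpy as np

def adaboost_reweight(weights, incorrect):
    """One textbook AdaBoost reweighting step (assumes 0 < weighted error < 1)."""
    weights = np.asarray(weights, dtype=float)
    incorrect = np.asarray(incorrect, dtype=bool)
    weights = weights / weights.sum()            # work with normalised weights
    err = weights[incorrect].sum()               # weighted error of the current model
    alpha = 0.5 * np.log((1.0 - err) / err)      # the model's "say" in the final vote
    # misclassified points are scaled up by exp(alpha), the rest down by exp(-alpha)
    new_weights = weights * np.exp(np.where(incorrect, alpha, -alpha))
    return new_weights / new_weights.sum(), alpha

toy_weights = np.ones(5)
toy_incorrect = np.array([False, True, False, False, False])
updated_weights, alpha = adaboost_reweight(toy_weights, toy_incorrect)
print(updated_weights)  # [0.125 0.5 0.125 0.125 0.125]
print(alpha)            # log(2), about 0.693
```

After the update the misclassified points hold half of the total weight, so the next stump cannot ignore them.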

## Gradient boosting machines (GBM)

In [None]:
example_data = pd.DataFrame({'x1':[1,2,3,4,5], 'x2':[2,4,5,4,5], 'class':[1,0,1,1,0]})
# note that now we actually don't turn "class" into categorical
# we will treat this problem as regression now

Fit the first **`DecisionTreeRegressor`** model on the original data. I have not found any restrictions on the size of the tree for the gradient boosting algorithm, but let's keep decision stumps as before.


In [None]:
from sklearn.tree import DecisionTreeRegressor
model1 = DecisionTreeRegressor(max_depth=1) # let's keep 1 level trees

np.random.seed(111)

model1.fit(example_data[['x1', 'x2']], example_data[['class']])

Now, let's predict the data using this first tree

In [None]:
predictions_model1 = model1.predict(example_data[['x1','x2']])
print(f'predictions of the first tree: {predictions_model1}')

Find the residuals (subtract predictions from the ground truth)

In [None]:
errors_model1 = example_data['class'] - predictions_model1
print(f'residuals: {errors_model1}')

Now use these errors as a `target` for the second tree!

In [None]:
np.random.seed(1111)

model2 = DecisionTreeRegressor(max_depth=1)
model2.fit(X = example_data[['x1', 'x2']], y = errors_model1)

**Exercise** implement the same procedure for the second and third models

In [None]:
##### YOUR CODE STARTS #####
predictions_model2 = 
##### YOUR CODE ENDS #####
print(f'predictions of the second tree: {predictions_model2}')

Add these to the predictions obtained by the first model. Subtract the resulting sum from the ground truth.

In [None]:
##### YOUR CODE STARTS #####
errors_model2 = 
##### YOUR CODE ENDS #####
print(f'residuals: {errors_model2}')

Do the same for the third (and last) tree

In [None]:
np.random.seed(1111)

##### YOUR CODE STARTS #####
model3 = 
model3.fit
##### YOUR CODE ENDS #####

In [None]:
##### YOUR CODE STARTS #####
predictions_model3 = 
##### YOUR CODE ENDS #####
print(f'predictions of the third tree: {predictions_model3}')

In [None]:
##### YOUR CODE STARTS #####
errors_model3 = 
##### YOUR CODE ENDS #####
print(f'residuals: {errors_model3}')

In [None]:
grid_data['gbm'] = model1.predict(grid_data[['x1', 'x2']]) + model2.predict(grid_data[['x1', 'x2']]) + model3.predict(grid_data[['x1', 'x2']])

In [None]:
fig = (
    ggplot(data = grid_data,
          mapping = aes(x = 'x1', y = 'x2')) +
    geom_point(aes(colour = 'gbm'), 
               size = 1.0) +
    labs(
        title ='',
        x = 'x1',
        y = 'x2',
    ) +
    theme_bw() + 
    theme(figure_size = (5, 5),
          axis_line = element_line(size = 0.5, colour = "black"),
          panel_grid_major = element_line(size = 0.05, colour = "black"),
          panel_grid_minor = element_line(size = 0.05, colour = "black"),
          axis_text = element_text(colour ='black'))
  )
fig

In [None]:
grid_data.loc[grid_data['gbm'] < 0.5, 'gbm'] = 0
grid_data.loc[grid_data['gbm'] >= 0.5, 'gbm'] = 1

In [None]:
example_data['class'] = example_data['class'].astype('category') # now we can cast `class` back into categorical
draw_points_ggplot2(example_data) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(gbm)'),  size = .5, alpha = 0.2)
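
The residual-fitting procedure above can be written as a short loop. The sketch below uses a separate 1-D regression toy problem (`toy_X`, `toy_y` are made up) and mirrors the cells above: the first tree is fitted to the raw targets and there is no learning rate. Library implementations typically add a constant initial prediction and a learning rate, both left out here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy regression data, separate from example_data
toy_X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
toy_y = np.array([1.0, 1.2, 3.0, 3.1, 5.0, 5.2])

trees = []
running_prediction = np.zeros_like(toy_y)
mse_history = []

for _ in range(3):
    # each new stump is trained on what the ensemble still gets wrong
    residuals = toy_y - running_prediction
    tree = DecisionTreeRegressor(max_depth=1, random_state=0).fit(toy_X, residuals)
    trees.append(tree)
    # the ensemble prediction is the sum of all stumps fitted so far
    running_prediction += tree.predict(toy_X)
    mse_history.append(np.mean((toy_y - running_prediction) ** 2))

print(mse_history)
```

The training error cannot go up from one round to the next, because each stump predicts the mean residual within its leaves.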

### Gradient Boosting from sklearn (compare the results)

In [None]:
from sklearn.ensemble import GradientBoostingClassifier

gbm = GradientBoostingClassifier(n_estimators=3, random_state=1) # uses DecisionTreeRegressor by default

# train GBM on our data
gbm.fit(example_data[['x1','x2']], example_data[['class']])

In [None]:
grid_data['gbm_ensemble'] = gbm.predict(grid_data[['x1', 'x2']]) # use the fitted GBM, not the AdaBoost `model`
draw_points_ggplot2(example_data) + geom_point(data = grid_data, mapping = aes(x = 'x1', y = 'x2', colour = 'factor(gbm_ensemble)'),  size = .5, alpha = 0.2)

## Homework exercise 1: eXtreme Gradient Boosting (XGBoost)


<font color='red'> Let's finally build for ourselves a new shiny XGBoost model, the most popular algorithm for Kaggle competitions. </font>

<font color='red'> First, we need to load data (we shall use MNIST data again). </font>

In [None]:
# old school TF
%tensorflow_version 1.x

# MNIST lives here:
from tensorflow.examples.tutorials.mnist import input_data

# Downloading MNIST from tensorflow into MNIST_data/ folder
mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)

# Extracting individual images and labels
images = np.vstack([img.reshape(-1,) for img in mnist.train.images])
labels = mnist.train.labels

# Split into train and test as before
train_images = images[0:2000,:]
train_labels = labels[0:2000]

test_images = images[2000:3000,:]
test_labels = labels[2000:3000]

<font color='red'> **(a)** Use the tutorial page (https://xgboost.readthedocs.io/en/latest/python/python_intro.html and https://www.kaggle.com/anktplwl91/mnist-xgboost) to fill in the gaps in the following code and train the XGBoost model. **(3 points)** </font>

In [None]:
import xgboost as xgb

##### YOUR CODE STARTS #####

# XGBoost wants data to be wrapped into a special format
dtrain = 
dtest = 

# most meaningful parameters
param_list = [("objective", "multi:softmax"), ("eval_metric", "merror"), ("num_class", 10)]

# Number of trees
n_rounds = 600

# if nothing seems to improve for 50 iterations - stop
early_stopping = 50

# train for training and test for ... validation!    
eval_list = 

# 1,2,3.. go!
bst = xgb.train(param_list, dtrain, n_rounds, evals=eval_list, early_stopping_rounds=early_stopping, verbose_eval=True)
##### YOUR CODE ENDS #####
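
A note on the `merror` metric used above: it is the multiclass classification error rate, i.e. the fraction of wrong predictions, so accuracy is `1 - merror`. Here is a small sketch with made-up labels (`toy_true` and `toy_pred` are hypothetical, not MNIST outputs):

```python
import numpy as np

# Hypothetical true and predicted digit labels
toy_true = np.array([3, 1, 4, 1, 5, 9, 2, 6])
toy_pred = np.array([3, 1, 4, 7, 5, 9, 2, 0])

merror = np.mean(toy_pred != toy_true)  # 2 wrong out of 8
accuracy = 1.0 - merror

print(merror)    # 0.25
print(accuracy)  # 0.75
```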

<font color='red'> **(b)** Use the same tutorial page (https://xgboost.readthedocs.io/en/latest/python/python_intro.html) to find out how to evaluate the model **(1 point)** </font>

In [None]:
##### YOUR CODE STARTS #####

##### YOUR CODE ENDS #####

<font color='red'> Are you impressed with XGBoost performance? </font>

<font color='red'> **(c)** Train a simple KNN model from sklearn (KNeighborsClassifier) on the same training data and evaluate on the same validation data **(2 points)** </font>

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = 
knn.fit
knn.score

<font color='red'> How do these two models compare? </font>

In [None]:
# Write your comment here:


<font color='red'> **(d)** Gain 2 additional bonus points if you can improve XGBoost's performance by at least 5% without changing the model parameters. **(2 bonus points)** </font>

In [None]:
##### YOUR CODE STARTS #####
# Do something here to improve XGBoost by 5%

##### YOUR CODE ENDS #####

In [None]:
##### YOUR CODE STARTS #####
# copy your solution to (a) here:

##### YOUR CODE ENDS #####

In [None]:
##### YOUR CODE STARTS #####
# evaluate your XGBoost model as before
# it should be better than before

##### YOUR CODE ENDS #####

# Stacking
On top of everything we have seen, you can still improve the results by training a meta-learner (meta-model) that uses the predictions of other models as input.

In [None]:
# old school TF
%tensorflow_version 1.x

# MNIST lives here:
from tensorflow.examples.tutorials.mnist import input_data

# Downloading MNIST from tensorflow into MNIST_data/ folder
mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)

# Extracting individual images and labels
images = np.vstack([img.reshape(-1,) for img in mnist.train.images])
labels = mnist.train.labels

# Split into train and test as before
train_images = images[0:2000,:]
train_labels = labels[0:2000]

test_images = images[2000:3000,:]
test_labels = labels[2000:3000]

First, we should again train the three familiar models

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

model1 = DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

np.random.seed(1111) 
model1.fit(train_images, train_labels)

np.random.seed(1111) 
model2.fit(train_images, train_labels)

np.random.seed(1111) 
model3.fit(train_images, train_labels)


We can go ahead and test these machine learning models on the test data

In [None]:
model1_pred = model1.predict(test_images)
model2_pred = model2.predict(test_images)
model3_pred = model3.predict(test_images)

print(f"Accuracy of DT {model1.score(test_images, test_labels)*100}%")
print(f"Accuracy of KNN {model2.score(test_images, test_labels)*100}%")
print(f"Accuracy of LR {model3.score(test_images, test_labels)*100}%")

In classical stacking, we need to run cross-validation (CV) and record the predictions made by each model on the hold-out data. We will then use these predictions as training data for the meta-learner.

In [None]:
from sklearn.model_selection import StratifiedKFold

hold_out_pred_model1 = []
hold_out_pred_model2 = []
hold_out_pred_model3 = []

n_folds = 4

X = np.array(train_images)
y = np.array(train_labels)

# initialise splitting mechanism
folds = StratifiedKFold(n_splits=n_folds, shuffle=False) # no need to shuffle the data (random_state only applies when shuffle=True)

# here actual splitting is done
folds.get_n_splits(X, y)

fold_indx = 1

# folds.split is an iterator that loops over different folds
# returning a tuple with train and val indices
for train_index, val_index in folds.split(X, y):
  print(f"CV #{fold_indx}")
  X_train, X_val = X[train_index], X[val_index]
  y_train, y_val = y[train_index], y[val_index]

  # train all three models
  model1.fit(X_train, y_train)
  model2.fit(X_train, y_train)
  model3.fit(X_train, y_train)

  # make predictions on hold out set
  hold_out_pred_model1.append(model1.predict_proba(X_val)) # we use predict_proba function to get a vector of probabilities for each class
  hold_out_pred_model2.append(model2.predict_proba(X_val))
  hold_out_pred_model3.append(model3.predict_proba(X_val))

  fold_indx += 1

Let's concatenate all these predictions into one dataset. Each model outputs probabilities for each class (there are 10 classes in the dataset), which means that for each digit (2000 in the training data) we will have 10 values from each model, which adds up to 30 values in total (from 3 models). 

In [None]:
train_stacking = np.concatenate([np.concatenate(hold_out_pred_model1, axis = 0), 
                                 np.concatenate(hold_out_pred_model2, axis = 0), 
                                 np.concatenate(hold_out_pred_model3, axis = 0)], 
                                axis = 1)
train_stacking.shape
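
As an alternative sketch, `cross_val_predict` from `sklearn` collects the same kind of hold-out predictions and returns them in the original row order, so they line up with the labels without extra bookkeeping. The example uses the small `load_digits` dataset bundled with `sklearn` instead of MNIST and only two base models to keep it quick. All `toy_*` names are made up for this illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Small bundled digits dataset (8x8 images, 10 classes), used here instead of MNIST
toy_X, toy_y = load_digits(return_X_y=True)
toy_X, toy_y = toy_X[:400], toy_y[:400]

toy_models = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# one (n_samples, 10) block of out-of-fold class probabilities per model
oof_blocks = [
    cross_val_predict(m, toy_X, toy_y, cv=4, method="predict_proba")
    for m in toy_models
]

# stack the blocks side by side: 2 models x 10 classes = 20 meta-features
toy_train_stacking = np.concatenate(oof_blocks, axis=1)
print(toy_train_stacking.shape)
```

Because row `i` of `toy_train_stacking` always belongs to sample `i`, the meta-learner can be trained directly against `toy_y`.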

We also need a test set for the stacking model, but this is simpler

In [None]:
model1_pred = model1.predict_proba(test_images)
model2_pred = model2.predict_proba(test_images)
model3_pred = model3.predict_proba(test_images)

test_stacking = np.concatenate([model1_pred, 
                                model2_pred, 
                                model3_pred], 
                                axis = 1)

test_stacking.shape

Train another model (e.g. LogisticRegression or DecisionTree or something else) on these predictions

In [None]:
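from sklearn.svm import SVC
stacking_model = SVC()

# hold-out predictions were stored fold by fold, so the labels must follow the same order
train_stacking_labels = np.concatenate([y[val_index] for _, val_index in folds.split(X, y)])

np.random.seed(1111) 
stacking_model.fit(train_stacking, train_stacking_labels)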

In [None]:
print(f"Accuracy of stacking ensemble {stacking_model.score(test_stacking, test_labels)*100}%")
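
For comparison, `sklearn` also ships a ready-made `StackingClassifier` that wraps the whole procedure. The sketch below is not a replacement for the manual steps above. It again uses the bundled `load_digits` data and two base models, and every `toy_*` name is hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small bundled digits dataset, split by position like the MNIST data above
toy_X, toy_y = load_digits(return_X_y=True)
toy_X_train, toy_y_train = toy_X[:400], toy_y[:400]
toy_X_test, toy_y_test = toy_X[400:600], toy_y[400:600]

stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=SVC(),          # the meta-learner
    cv=4,                           # folds used to build the meta-features
    stack_method="predict_proba",   # feed class probabilities to the meta-learner
)
stack.fit(toy_X_train, toy_y_train)

# meta-features seen by the SVC: 2 models x 10 class probabilities
toy_meta_features = stack.transform(toy_X_test)
toy_accuracy = stack.score(toy_X_test, toy_y_test)
print(toy_meta_features.shape, toy_accuracy)
```

One design difference worth noting: `StackingClassifier` refits the base models on the full training set after building the meta-features, whereas in the manual loop above `model1`, `model2` and `model3` stay fitted on the last fold's training part.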

## Homework exercise 2: implement blending approach
<font color='red'> In this exercise you will practice using the blending approach to meta-learning. </font>

<font color='red'> **(a)** To implement blending, we first need to create a separate validation set that is independent of the training and test data. Below, use images from 0 to 1500 as training data, images from 1500 to 2000 as validation data, and images from 2000 to 3000 as the test set. **(1 point)** </font>

In [None]:
##### YOUR CODE STARTS #####
train_images = 
train_labels = 

val_images = 
val_labels = 

test_images = 
test_labels = 
##### YOUR CODE ENDS #####

<font color='red'> **(b)** Train three models (a decision tree, a k-nearest neighbors classifier, and logistic regression) with default parameters on the training data. **(1 point)** </font>

In [None]:
##### YOUR CODE STARTS #####
model1 = 
model2 = 
model3 = 
##### YOUR CODE ENDS #####

np.random.seed(1111) 
##### YOUR CODE STARTS #####
model1
##### YOUR CODE ENDS #####

np.random.seed(1111) 
##### YOUR CODE STARTS #####
model2
##### YOUR CODE ENDS #####

np.random.seed(1111) 
##### YOUR CODE STARTS #####
model3
##### YOUR CODE ENDS #####

<font color='red'> **(c)** Create a training set for the meta-learner by concatenating the predictions made by individual models on validation images. Hint: use function `np.concatenate` and `predict_proba` as we did for stacking. **(1 point)** </font>

In [None]:
##### YOUR CODE STARTS #####
train_blending = 
##### YOUR CODE ENDS #####

train_blending_labels = val_labels
train_blending.shape # if all was done correctly this shape should be (500, 30)

<font color='red'> **(d)** Create a test set for the meta-learner by concatenating the predictions made by each model on test images. Use the same function as in the cell above. **(1 point)** </font>

In [None]:
##### YOUR CODE STARTS #####
test_blending = 
##### YOUR CODE ENDS #####

test_blending.shape

<font color='red'> **(e)** Use an SVM model as the meta-learner and train it on the `train_blending` data. **(1 point)** </font>

In [None]:
from sklearn.svm import SVC
np.random.seed(1111) 

##### YOUR CODE STARTS #####
blending_model = 
blending_model
##### YOUR CODE ENDS #####

<font color='red'> **(f)** Evaluate the performance of the blending ensemble on the test set and comment on the difference between blending and stacking.  **(1 point)** </font>

In [None]:
##### YOUR CODE STARTS #####
print()
##### YOUR CODE ENDS #####

In [None]:
# What is your take on the difference between blending and stacking?
# Which one would you prefer and why?
# Comment here:


# Thank you!