###### The University of Melbourne, School of Computing and Information Systems
# COMP30027 Machine Learning, 2020 Semester 1

## Week 8 - Practical Workshop

Today, we first examine the **Logistic Regression** classifier. Then we will use many of the classifier models we have covered so far to build different ensembles and analyse their outputs.


### Exercise 1. 
Let's start with *Logistic Regression*. Use the IRIS dataset (again) and train a Logistic Regression model.

In [1]:
from sklearn import datasets
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


In [None]:
iris = datasets.load_iris()

X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=88)

lgr = LogisticRegression()
lgr.fit(..,..)
print("Accuracy:",lgr.score(..,..))
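
Logistic regression learns one weight vector per class and turns the resulting linear scores into class probabilities. The sketch below recomputes `predict_proba` by hand from `coef_` and `intercept_` with a softmax. It assumes scikit-learn 0.22 or later, where the default `lbfgs` solver fits a multinomial (softmax) model when there are more than two classes; the `softmax` helper and `max_iter=1000` are our own choices, not part of the exercise.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=88)

# max_iter raised from the default (assumption) to avoid a convergence warning
lgr = LogisticRegression(max_iter=1000)
lgr.fit(X_train, y_train)

# One linear score per class: z_k = w_k . x + b_k
scores = X_test @ lgr.coef_.T + lgr.intercept_
print("Matches decision_function:", np.allclose(scores, lgr.decision_function(X_test)))

def softmax(z):
    # Subtract the row maximum for numerical stability; the result is unchanged
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

probs = softmax(scores)
predicted = lgr.classes_[probs.argmax(axis=1)]
print("Matches predict_proba:", np.allclose(probs, lgr.predict_proba(X_test)))
print("Matches predict:", np.array_equal(predicted, lgr.predict(X_test)))
```

The predicted class is simply the one with the largest probability, which is also the one with the largest linear score.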

#### Exercise 1. (a)
Now, using the same split, compare the results from Logistic Regression with the other classifiers we have covered so far. You may use: Zero-R, Gaussian Naive Bayes, Multinomial Naive Bayes, linear SVM, kNN and Decision Tree.

Compare their accuracy and the time required for prediction. Analyse the results.

Note: please use each classifier's default hyperparameters (no tuning).
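
Prediction on a test set this small takes only a fraction of a millisecond, so a single pair of `time.time()` readings can be dominated by clock resolution. One option, sketched here with a hypothetical `time_predict` helper, is to use the higher-resolution `time.perf_counter()` and average over repeated calls; the repeat count of 100 and the use of Gaussian NB on the full iris data are arbitrary assumptions.

```python
import time
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

def time_predict(model, X, repeats=100):
    """Hypothetical helper: mean wall-clock seconds for one model.predict(X) call."""
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(X)
    return (time.perf_counter() - start) / repeats

iris = datasets.load_iris()
model = GaussianNB().fit(iris.data, iris.target)
mean_t = time_predict(model, iris.data)
print("GNB mean predict time (s):", mean_t)
```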

In [None]:
from sklearn import svm
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.dummy import DummyClassifier
import time


models = [DummyClassifier(strategy='most_frequent'),
          GaussianNB(),
          MultinomialNB(),
          ...
          DecisionTreeClassifier(),
          KNeighborsClassifier(),
          LogisticRegression()]
titles = ['Zero-R',
          'GNB',
          'MNB',
          'LinearSVC',
          'Decision Tree',
          'KNN',
          'Logistic Regression']

for title, model in zip(titles, models):
    model.fit(X_train,y_train)
    start = time.time()
    acc = ...
    end = time.time()
    t = ...
    print(title, "Accuracy:",acc, 'Time:', t)

Why is `sklearn` not happy with us when we use the linear SVM?

#### Exercise 1.(b)
Do the same comparison using the 10-fold cross-validation evaluation strategy. Analyse the results.

In [None]:
from sklearn.model_selection import cross_val_score

for title, model in zip(titles, models):
    start = time.time()
    acc = ...
    end = time.time()
    t = ...
    print(title, "Accuracy:",acc, 'time:', t)
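
To see what `cross_val_score` does under the hood, the sketch below repeats it by hand for one model. For a classifier, an integer `cv=10` means 10 stratified folds without shuffling, and each fold trains a fresh, unfitted copy of the model. Gaussian NB on the full iris data is just an illustrative choice.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X, y = iris.data, iris.target
model = GaussianNB()

# For a classifier, cv=10 corresponds to 10 stratified folds, no shuffling
skf = StratifiedKFold(n_splits=10)
manual_scores = []
for train_idx, test_idx in skf.split(X, y):
    fold_model = clone(model)  # fresh, unfitted copy for every fold
    fold_model.fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

auto_scores = cross_val_score(model, X, y, cv=10)
print("Manual mean accuracy:", manual_scores.mean())
print("Same as cross_val_score:", np.allclose(manual_scores, auto_scores))
```

The reported cross-validation accuracy is usually the mean of the 10 fold scores.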

### Exercise 2
Next, we turn to the concept of *stacking*. We want to train a meta-classifier (level-1 model) on the outputs of the base classifiers (level-0 models).

#### Exercise 2.(a)
Using the IRIS dataset, build a stack of the following classifiers:
- Zero-R
- Logistic Regression
- KNN
- Gaussian NB
- Multinomial NB
- Decision Tree

Scikit-learn supports stacking directly; see: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingClassifier.html
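
A minimal sketch of the built-in class (available since scikit-learn 0.22) on the same iris split. It is imported under the alias `SkStackingClassifier` so that it does not clash with the `StackingClassifier` class written by hand in this notebook. The choice of `LogisticRegression` as the meta-classifier (which is also the library's default `final_estimator`), `cv=5`, and `max_iter=1000` are illustrative assumptions, not a recommended answer.

```python
from sklearn import datasets
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import StackingClassifier as SkStackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=88)

# Level-0 models are passed as (name, estimator) pairs
base_models = [
    ('zero_r', DummyClassifier(strategy='most_frequent')),
    ('lr', LogisticRegression(max_iter=1000)),
    ('knn', KNeighborsClassifier()),
    ('gnb', GaussianNB()),
    ('mnb', MultinomialNB()),
    ('dt', DecisionTreeClassifier(random_state=0)),
]

# Level-1 model (assumption); meta-features come from 5-fold out-of-fold predictions
sk_stacker = SkStackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
sk_stacker.fit(X_train, y_train)
sk_acc = sk_stacker.score(X_test, y_test)
print("scikit-learn StackingClassifier accuracy:", sk_acc)
```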

However, to better understand stacking, we can implement it ourselves with the following steps:
- Train each of our models (using `fit()`).
- Classify each training instance with each model (using `predict()`, or `predict_proba()` to obtain class probabilities).
- Build a matrix with one row per training instance, whose attributes are the predictions of each model for that instance.
- Train the final (meta) learner on this matrix of predictions.
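
The four steps above can be sketched with hard labels from `predict()`, exactly as the list describes (the class below uses `predict_proba()` instead). The resulting columns are class codes, i.e. categorical values, which is worth keeping in mind when choosing the meta-classifier. The base models and the Decision Tree used as the meta-model here are placeholders only.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target
base_models = [GaussianNB(), KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]

# Step 1: train every level-0 model
for model in base_models:
    model.fit(X, y)

# Steps 2-3: one row per training instance, one column per model;
# each entry is that model's predicted label for that instance
X_meta = np.column_stack([model.predict(X) for model in base_models])
print(X_meta.shape)  # (150, 3)

# Step 4: train the level-1 (meta) model on the matrix of predictions
meta = DecisionTreeClassifier(random_state=0).fit(X_meta, y)
```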

**NOTE:** Think about which classifier is best suited to be the final meta-classifier in this situation.

In [5]:
from sklearn.metrics import accuracy_score

np.random.seed(1)

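class StackingClassifier:
    """Minimal stacking: base classifiers feed class probabilities to a meta-classifier."""

    def __init__(self, classifiers, metaclassifier):
        self.classifiers = classifiers        # level-0 (base) models
        self.metaclassifier = metaclassifier  # level-1 (meta) model

    def fit(self, X, y):
        # Train every base model on the full training set
        for clf in self.classifiers:
            clf.fit(X, y)
        # Build the meta-feature matrix and train the meta-classifier on it
        X_meta = self._predict_base(X)
        self.metaclassifier.fit(X_meta, y)
        return self

    def _predict_base(self, X):
        # One block of columns per base model: its predicted class probabilities
        yhats = [clf.predict_proba(X) for clf in self.classifiers]
        yhats = np.concatenate(yhats, axis=1)
        assert yhats.shape[0] == X.shape[0]
        return yhats

    def predict(self, X):
        # Base-model probabilities become the features for the meta-classifier
        X_meta = self._predict_base(X)
        return self.metaclassifier.predict(X_meta)

    def score(self, X, y):
        # Accuracy of the stacked model
        return accuracy_score(y, self.predict(X))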
    


classifiers = [...]
titles = [...]

meta_classifier = ...
stacker = StackingClassifier(classifiers, meta_classifier)

In [None]:
print("IRIS dataset\n")

for title,clf in zip(titles,classifiers):
    clf.fit(X_train,y_train)
    print(title, "Accuracy:",...)
    
stacker.fit(X_train, y_train)
print('\nStacker Accuracy:', ...)
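
One thing to watch in the class above: the meta-classifier is trained on predictions the base models make for the very instances they were fitted on, so a model that memorises its training data (e.g. a fully grown Decision Tree, or kNN, whose neighbours include the instance itself) looks far more reliable at level 1 than it will be on unseen data. A common remedy, which scikit-learn's own `StackingClassifier` also applies via `cross_val_predict`, is to build the level-1 training matrix from out-of-fold predictions. The hypothetical `OOFStackingClassifier` below is a minimal sketch of that idea; the base models, meta-model, and `cv=5` are assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


class OOFStackingClassifier:
    """Hypothetical stacker whose meta-features are out-of-fold predict_proba outputs."""

    def __init__(self, classifiers, metaclassifier, cv=5):
        self.classifiers = classifiers
        self.metaclassifier = metaclassifier
        self.cv = cv

    def fit(self, X, y):
        # Each training instance is predicted by a fold model that did NOT see it
        oof = [cross_val_predict(clf, X, y, cv=self.cv, method='predict_proba')
               for clf in self.classifiers]
        self.metaclassifier.fit(np.concatenate(oof, axis=1), y)
        # The base models used at prediction time are refit on all training data
        for clf in self.classifiers:
            clf.fit(X, y)
        return self

    def predict(self, X):
        X_meta = np.concatenate([clf.predict_proba(X) for clf in self.classifiers], axis=1)
        return self.metaclassifier.predict(X_meta)

    def score(self, X, y):
        return np.mean(self.predict(X) == y)


iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=88)

oof_stacker = OOFStackingClassifier(
    [GaussianNB(), KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)],
    LogisticRegression(max_iter=1000))
oof_stacker.fit(X_train, y_train)
oof_acc = oof_stacker.score(X_test, y_test)
print("Out-of-fold stacker accuracy:", oof_acc)
```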


#### [OPTIONAL] Exercise 2.(b)
Use the same stack to process the `car` dataset, with a holdout strategy and a 30% test split.

In [None]:
from sklearn.preprocessing import OneHotEncoder

def load_data(i_file):
    .
    .
    .
    return X, y

X, y = load_data('car.data')

#print('labels:', set(y))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=30027)

print("Car dataset\n")

for title,clf in zip(titles,classifiers):
    clf.fit(X_train,y_train)
    print(title, "Accuracy:",...)
    
stacker.fit(X_train, y_train)
print('\nStacker Accuracy:', ...)
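
The body of `load_data` is left for you to write. As one hypothetical illustration of how the imported `OneHotEncoder` could fit in, the separately named `load_categorical_csv` below assumes a comma-separated file with no header, all attributes categorical, and the class label in the last column by default. Check the actual layout of each data file; `label_col` exists because the label position may differ. The three rows are made-up examples in the car-data format, and fitting the encoder on the whole file before splitting is a simplification.

```python
import csv
import io
import numpy as np
from sklearn.preprocessing import OneHotEncoder

def load_categorical_csv(f, label_col=-1):
    """Hypothetical loader: comma-separated rows, no header, categorical attributes,
    class label in column `label_col` (last column by default)."""
    rows = [row for row in csv.reader(f) if row]  # skip blank lines
    data = np.array(rows, dtype=object)
    y = data[:, label_col]
    X_raw = np.delete(data, label_col, axis=1)
    # One binary column per (attribute, value) pair seen in the data
    encoder = OneHotEncoder(handle_unknown='ignore')
    X = encoder.fit_transform(X_raw).toarray()
    return X, y

sample = io.StringIO("vhigh,vhigh,2,2,small,low,unacc\n"
                     "low,low,4,4,big,high,vgood\n"
                     "low,med,4,more,med,high,good\n")
X_demo, y_demo = load_categorical_csv(sample)
print(X_demo.shape, list(y_demo))
```

With a real file, the same helper could be called as `with open('car.data') as f: X, y = load_categorical_csv(f)`.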


### Exercise 3
Bagging is often associated with Decision Trees, but in scikit-learn it can be applied to any learner.

If we use bagging with Decision Trees, we build a number of trees, each on a re-sampled version of the data:
- For each tree, we randomly select N instances *with replacement* from the N training instances (a bootstrap sample). Each tree therefore sees a training set of the same size as the single deterministic tree would, but each is based on a different sample of the data.
- We then build the tree as usual.
- We classify a test instance by **voting**: each tree casts one vote (the class it would predict for the test instance), and the class with the plurality of votes wins.
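
The three steps above can be sketched by hand on the iris split. The sketch also reports how many distinct training instances each bootstrap sample contains: on average about 1 - 1/e, roughly 63.2%, of the instances appear, with the rest of the sample made up of repeats, so each tree really does see a different data set. The 10 trees and the seeds are assumptions; ties in the vote go to the smallest class label because of `argmax`.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=88)
N = X_train.shape[0]

trees, unique_fracs = [], []
for _ in range(10):
    # Bootstrap sample: N indices drawn WITH replacement from the N instances
    idx = rng.integers(0, N, size=N)
    unique_fracs.append(len(np.unique(idx)) / N)
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Each tree votes; the class with the most votes (plurality) wins
votes = np.array([tree.predict(X_test) for tree in trees])  # shape (10, n_test)
y_pred = np.array([np.bincount(col).argmax() for col in votes.T])

print("Mean fraction of distinct instances per bootstrap:", np.mean(unique_fracs))
print("Bagged accuracy:", np.mean(y_pred == y_test))
```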

#### Exercise 3.(a)
Load the `lymphography` dataset and implement bagging with 10 kNN base estimators.


In [93]:
from sklearn.ensemble import BaggingClassifier

X, y = load_data('lymphography.data')

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=30027)

KNN = KNeighborsClassifier()
# In scikit-learn 1.2+ this parameter is named `estimator` (`base_estimator` was removed in 1.4)
bagging = BaggingClassifier(base_estimator=...,n_estimators=..., max_samples=0.5, max_features=0.5)

KNN.fit(X_train,y_train)
bagging.fit(X_train,y_train)

print("KNN:",...)
print("KNN Bagging Accuracy:",...)



See the documentation: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html.

The bagging classifier builds N base classifiers, each trained on a random subset of the samples and/or features. For each base classifier:

- Randomly select a subset of `max_features * X.shape[1]` features.
- Randomly select a subset of `max_samples * X.shape[0]` samples (by default, samples are drawn with replacement and features without).
- Create a new training set `X_base`, with labels `y_base`, from the selected samples and features.
- Fit the base classifier on `X_base` and `y_base`.

Then combine the base classifiers' predictions for `X_test` by voting or averaging.
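
A hand-rolled version of these steps might look like the sketch below. The helper names `fit_subset_bagging` and `predict_subset_bagging`, the seed, and the use of iris in place of lymphography (so the sketch runs stand-alone) are assumptions. It roughly mirrors `BaggingClassifier`'s defaults (`bootstrap=True`, `bootstrap_features=False`): rows are drawn with replacement, columns without. Predictions are combined by averaging class probabilities, which is what the library does when the base estimator has `predict_proba`.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def fit_subset_bagging(base, X, y, n_estimators=10, max_samples=0.5,
                       max_features=0.5, seed=0):
    """Hypothetical helper: fit each member on a random subset of rows and columns."""
    rng = np.random.default_rng(seed)
    n_rows = max(1, int(max_samples * X.shape[0]))
    n_cols = max(1, int(max_features * X.shape[1]))
    members = []
    for _ in range(n_estimators):
        rows = rng.integers(0, X.shape[0], size=n_rows)            # samples, with replacement
        cols = rng.choice(X.shape[1], size=n_cols, replace=False)  # features, without replacement
        model = clone(base).fit(X[np.ix_(rows, cols)], y[rows])
        members.append((model, cols))
    return members


def predict_subset_bagging(members, X, classes):
    """Soft voting: sum each member's class probabilities, then take the argmax."""
    proba = np.zeros((X.shape[0], len(classes)))
    for model, cols in members:
        # A member may not have seen every class; map its columns to the global order
        pos = np.searchsorted(classes, model.classes_)
        proba[:, pos] += model.predict_proba(X[:, cols])
    return classes[proba.argmax(axis=1)]


X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
members = fit_subset_bagging(KNeighborsClassifier(), X_train, y_train)
y_pred = predict_subset_bagging(members, X_test, np.unique(y_train))
print("Manual kNN bagging accuracy:", np.mean(y_pred == y_test))
```

Summing rather than averaging the probabilities does not change the argmax, so the predicted classes are the same.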

#### Exercise 3.(b)
What is the significance of `max_samples` and `max_features`, and why might we wish to use values less than 1.0?

Build three different Decision Tree bagging classifiers using different combinations of `max_samples` and `max_features`. Can you analyse the results?

In [None]:
DT = DecisionTreeClassifier()
bagging_one = BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=10,
                                max_samples=..., max_features=...)
bagging_two = BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=10,
                                max_samples=..., max_features=...)
bagging_three = BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=10,
                                  max_samples=..., max_features=...)

DT.fit(X_train,y_train)
bagging_one.fit(X_train,y_train)
bagging_two.fit(X_train,y_train)
bagging_three.fit(X_train,y_train)

print("DT:",DT.score(X_test,y_test))
print("Option 1: bagging Accuracy:",bagging_one.score(X_test,y_test))
print("Option 2: bagging Accuracy:",bagging_two.score(X_test,y_test))
print("Option 3: bagging Accuracy:",bagging_three.score(X_test,y_test))
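
Rather than writing out each configuration by hand, the comparison can be looped over a list of `(max_samples, max_features)` pairs. In this sketch the pairs, `random_state=0`, and the use of the iris data in place of lymphography (so it runs stand-alone) are all assumptions. The tree is passed as the first positional argument, which works whether that parameter is called `base_estimator` (older releases) or `estimator` (scikit-learn 1.2 onwards).

```python
from sklearn import datasets
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris stands in for lymphography here, only so the sketch runs on its own
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=30027)

# Illustrative (max_samples, max_features) pairs, not recommended values
settings = [(1.0, 1.0), (0.5, 1.0), (0.5, 0.5)]
results = {}
for max_samples, max_features in settings:
    # Base estimator passed positionally (see the note above about its keyword name)
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                max_samples=max_samples, max_features=max_features,
                                random_state=0)
    bagging.fit(X_train, y_train)
    acc = bagging.score(X_test, y_test)
    results[(max_samples, max_features)] = acc
    print(f"max_samples={max_samples}, max_features={max_features}: accuracy={acc:.3f}")
```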