###### The University of Melbourne, School of Computing and Information Systems
# COMP30027 Machine Learning, 2021 Semester 1

## Week 8 - Practical Workshop

Today, we first examine the **Logistic Regression** classifier. Then we will use many of the classifier models we have covered so far to build different ensembles and analyse their outputs.


### Exercise 1. 
Let's start with *Logistic Regression*. Use the IRIS dataset (again) and train a Logistic Regression model.

In [1]:
from sklearn import datasets
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


In [2]:
iris = datasets.load_iris()

X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=88)

lgr = LogisticRegression()
lgr.fit(X_train,y_train)
print("Accuracy:",lgr.score(X_test,y_test))

Accuracy: 0.88
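To see what the fitted model is doing, the sketch below recomputes its predictions by hand: a linear score per class, followed by a softmax. It assumes a recent scikit-learn, where the default multiclass Logistic Regression is multinomial; `max_iter=1000` is our own addition to avoid a convergence warning and is not part of the exercise.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=88)

# max_iter is raised only to avoid a convergence warning (our assumption;
# the exercise itself uses the default hyperparameters)
lgr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Linear scores: one column per class, shape (n_test, 3)
scores = X_test @ lgr.coef_.T + lgr.intercept_

# Softmax turns the scores into class probabilities
# (subtracting the row maximum is for numerical stability only)
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# The most probable class is the prediction
manual_pred = lgr.classes_[probs.argmax(axis=1)]

print("Matches predict_proba:", np.allclose(probs, lgr.predict_proba(X_test)))
print("Matches predict:", np.array_equal(manual_pred, lgr.predict(X_test)))
```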




#### Exercise 1. (a)
Now, using the same split, compare the results from Logistic Regression with the other classifiers we have covered so far. You may use: Zero-R, Gaussian Naive Bayes, Multinomial Naive Bayes, linear SVM, kNN and Decision Tree.

Compare their accuracy and the time required for prediction. Analyse the results.

Note: Please use the classifiers' default hyperparameters (no tuning).

In [3]:
from sklearn import svm
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.dummy import DummyClassifier
import time


models = [DummyClassifier(strategy='most_frequent'),
          GaussianNB(),
          MultinomialNB(),
          svm.LinearSVC(),
          DecisionTreeClassifier(),
          KNeighborsClassifier(),
          LogisticRegression()]
titles = ['Zero-R',
          'GNB',
          'MNB',
          'LinearSVC',
          'Decision Tree',
          'KNN',
          'Logistic Regression']

for title, model in zip(titles, models):
    model.fit(X_train,y_train)
    start = time.time()
    acc = model.score(X_test,y_test)
    end = time.time()
    t = end - start
    print(title, "Accuracy:",acc, 'Time:', t)


Zero-R Accuracy: 0.24 Time: 0.0
GNB Accuracy: 0.96 Time: 0.001001119613647461
MNB Accuracy: 0.58 Time: 0.0
LinearSVC Accuracy: 0.9 Time: 0.0
Decision Tree Accuracy: 0.9 Time: 0.0
KNN Accuracy: 0.92 Time: 0.0020012855529785156
Logistic Regression Accuracy: 0.88 Time: 0.0




Why is `sklearn` not happy with us when using Linear SVM?

*Because the problem is not fully linear (or linearly separable)*
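Assuming the complaint is a `ConvergenceWarning` (the usual one for `LinearSVC` when the solver runs out of iterations), the sketch below counts such warnings for the default model and for a variant that standardises the features and allows more iterations. The helper `fit_and_report` and the value `max_iter=10000` are our own choices for illustration, and whether the default model warns at all depends on the scikit-learn version.

```python
import warnings
from sklearn import datasets, svm
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=88)

def fit_and_report(model):
    """Fit model; return (test accuracy, number of ConvergenceWarnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X_train, y_train)
    n_warn = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model.score(X_test, y_test), n_warn

# Default hyperparameters, as in the exercise
default_acc, default_warn = fit_and_report(svm.LinearSVC())

# Standardised features and a larger iteration budget (hypothetical settings)
scaled_acc, scaled_warn = fit_and_report(
    make_pipeline(StandardScaler(), svm.LinearSVC(max_iter=10000)))

print("Default:", default_acc, "convergence warnings:", default_warn)
print("Scaled :", scaled_acc, "convergence warnings:", scaled_warn)
```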

#### Exercise 1.(b)
Do the same comparison using the 10-fold cross-validation evaluation strategy. Analyse the results.

In [4]:
from sklearn.model_selection import cross_val_score

for title, model in zip(titles, models):
    start = time.time()
    acc = np.mean(cross_val_score(model, X, y, cv=10))
    end = time.time()
    t = end - start
    print(title, "Accuracy:",acc, 'time:', t)

Zero-R Accuracy: 0.33333333333333337 time: 0.0040035247802734375
GNB Accuracy: 0.9533333333333334 time: 0.0070476531982421875
MNB Accuracy: 0.9533333333333334 time: 0.006998777389526367
LinearSVC Accuracy: 0.9666666666666668 time: 0.06806135177612305
Decision Tree Accuracy: 0.96 time: 0.006005525588989258
KNN Accuracy: 0.9666666666666668 time: 0.012011051177978516
Logistic Regression Accuracy: 0.9533333333333334 time: 0.010009050369262695




*There are a few things we can notice here; one is that scikit-learn is not impressed that we are trying to cross-validate with so few instances of each class. This will make stratification impossible, which is undesirable in an evaluation framework.*

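*Looking at the performance, we can see that, except for Zero-R, most of the classifiers have roughly similar results under cross-validation. What is somewhat surprising is how much one result changes between the holdout strategy and cross-validation: in the outputs above it is Multinomial NB that moves from 0.58 to about 0.95, while Gaussian NB stays between 0.95 and 0.96. A single holdout split of such a small dataset can be unrepresentative, so the cross-validated figures are the more reliable estimate.*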
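To make explicit what `cross_val_score(model, X, y, cv=10)` does for a classifier, the sketch below runs the ten stratified folds by hand and compares the per-fold accuracies with the library call. Gaussian NB is used only because it is deterministic; the variable names are ours.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = datasets.load_iris(return_X_y=True)
model = GaussianNB()

manual_scores = []
# For classifiers, an integer cv means unshuffled stratified folds
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    fold_model = clone(model)                      # fresh, unfitted copy
    fold_model.fit(X[train_idx], y[train_idx])     # train on 9 folds
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))

manual_scores = np.array(manual_scores)
library_scores = cross_val_score(model, X, y, cv=10)

print("Manual mean accuracy :", manual_scores.mean())
print("Library mean accuracy:", library_scores.mean())
```

Every instance is used for testing exactly once, which is why the averaged figure is less sensitive to one lucky or unlucky split than the holdout accuracy.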


### Exercise 2
Now we get to the concept of *stacking*. We want to train a meta-classifier (level-1 model) over the outputs of the base classifiers (level-0 models).

#### Exercise 2.(a)
Using the IRIS dataset, build a stacking of the classifiers:
- Zero_R
- Logistic Regression
- KNN
- Gaussian NB
- Multinomial NB
- Decision Tree

Scikit-learn does support stacking. Check the following: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingClassifier.html
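As a rough illustration of the built-in class, the sketch below stacks the six classifiers listed above with a Logistic Regression meta-classifier. The alias `SkStackingClassifier`, the estimator names, `max_iter=1000` and `random_state=0` are our own choices. Note that the built-in class trains the meta-classifier on cross-validated predictions of the base classifiers (`cv=5` here), which differs from fitting it on predictions for the same instances the base classifiers were trained on.

```python
from sklearn import datasets
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import StackingClassifier as SkStackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=88)

# (name, estimator) pairs; the names are arbitrary labels
estimators = [
    ('zero_r', DummyClassifier(strategy='most_frequent')),
    ('lr', LogisticRegression(max_iter=1000)),
    ('knn', KNeighborsClassifier()),
    ('gnb', GaussianNB()),
    ('mnb', MultinomialNB()),
    ('dt', DecisionTreeClassifier(random_state=0)),
]

sk_stacker = SkStackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)
sk_stacker.fit(X_train, y_train)

# transform() exposes the meta-features: class probabilities per base model
meta_features = sk_stacker.transform(X_test)
sk_acc = sk_stacker.score(X_test, y_test)
print("Meta-feature matrix shape:", meta_features.shape)
print("Stacker accuracy:", sk_acc)
```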

However, to have a better understanding of Stacking, we can implement it by ourselves based on the following steps:
- We need to train each of our models (using fit()),
- And then classify each training instance (using predict()),
- We build up a matrix whose instances are composed of attributes that correspond to the predictions of each model on that training instance.
- We then train our final learner on this matrix of predictions.

**NOTE:** You should think about which classifier is most suited to being the final meta–classifier in this situation.

#### *Answer*

*As mentioned in the lecture slides a simple choice for final meta-classifier of stacking can be logistic regression. We can also try other classifiers like decision tree, or nonlinear SVC as meta-classifier.*



In [11]:
from sklearn.metrics import accuracy_score

np.random.seed(1)

class StackingClassifier():
    """Minimal stacking: base classifiers (level 0) feed a meta-classifier (level 1)."""

    def __init__(self, classifiers, metaclassifier):
        self.classifiers = classifiers
        self.metaclassifier = metaclassifier

    def fit(self, X, y):
        # Train every base classifier on the training data
        for clf in self.classifiers:
            clf.fit(X, y)
        # Train the meta-classifier on the base classifiers' outputs
        X_meta = self._predict_base(X)
        self.metaclassifier.fit(X_meta, y)

    def _predict_base(self, X):
        # One block of class probabilities per base classifier, side by side
        yhats = []
        for clf in self.classifiers:
            yhat = clf.predict_proba(X)
            yhats.append(yhat)
        yhats = np.concatenate(yhats, axis=1)
        assert yhats.shape[0] == X.shape[0]
        return yhats

    def predict(self, X):
        X_meta = self._predict_base(X)
        yhat = self.metaclassifier.predict(X_meta)
        return yhat

    def score(self, X, y):
        yhat = self.predict(X)
        return accuracy_score(y, yhat)



classifiers = [DummyClassifier(strategy='most_frequent'),
                LogisticRegression(),
                KNeighborsClassifier(),
                GaussianNB(),
                MultinomialNB()]
titles = ['Zero_R',
          'Logistic Regression',
          'KNN',
          'Gaussian NB',  
          'Multinomial NB']



meta_classifier_lr = LogisticRegression()
stacker_lr = StackingClassifier(classifiers, meta_classifier_lr)

meta_classifier_dt = DecisionTreeClassifier()
stacker_dt = StackingClassifier(classifiers, meta_classifier_dt)

In [12]:
print("IRIS dataset\n")
for title,clf in zip(titles,classifiers):
    clf.fit(X_train,y_train)
    print(title, "Accuracy:",clf.score(X_test,y_test))
    
stacker_lr.fit(X_train, y_train)
print('\nStacker Accuracy (Logistic Regression):', stacker_lr.score(X_test, y_test))

stacker_dt.fit(X_train, y_train)
print('\nStacker Accuracy (Decision Tree):', stacker_dt.score(X_test, y_test))


IRIS dataset

Zero_R Accuracy: 0.24
Logistic Regression Accuracy: 0.88
KNN Accuracy: 0.92
Gaussian NB Accuracy: 0.96
Multinomial NB Accuracy: 0.58

Stacker Accuracy (Logistic Regression): 0.96

Stacker Accuracy (Decision Tree): 0.96




#### [OPTIONAL] Exercise 2.(b)
Use the same *stack* to process the `car` dataset, using a holdout strategy with a 30% split ratio.

In [15]:
from sklearn.preprocessing import OneHotEncoder


def load_data(i_file):
    X = []
    y = []
    with open(i_file, mode='r') as fin:
        for line in fin:
            atts = line.strip().split(",")
            X.append(atts[:-1]) #all atts minus the last one
            y.append(atts[-1])
    onehot = OneHotEncoder()
    X = onehot.fit_transform(X).toarray()
    return X, y


X, y = load_data('car.data')

#print('labels:', set(y))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=30027)

print("Car dataset\n")
for title,clf in zip(titles,classifiers):
    clf.fit(X_train,y_train)
    print(title, "Accuracy:",clf.score(X_test,y_test))
    
#meta_classifier_lr = LogisticRegression()
#stacker = StackingClassifier(classifiers, meta_classifier_lr)
    
stacker_dt.fit(X_train, y_train)
print('\nStacker Accuracy:', stacker_dt.score(X_test, y_test))


Car dataset

Zero_R Accuracy: 0.6742556917688266
Logistic Regression Accuracy: 0.8458844133099825
KNN Accuracy: 0.9001751313485113
Gaussian NB Accuracy: 0.8178633975481612
Multinomial NB Accuracy: 0.8143607705779334

Stacker Accuracy: 0.9457092819614711




### Exercise 3
Bagging is often associated with Decision Trees, but in scikit-learn it can be applied to any learner.

If we use bagging with Decision Tree, we will build a number of Decision Trees by re-sampling the data:
- For each tree, we randomly select (with replacement) N instances out of the possible N instances, so that we have the same amount of data as a single (non-bagged) decision tree, but each tree is built from a different data set.
- We then build the tree as usual.
- We classify the test instance by **voting** - each tree gets a vote (the class it would predict for the test instance), and the class with the plurality wins.
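The three steps above can be sketched by hand as follows. This is only an illustration on the IRIS data with ten trees; the seed, the variable names and `random_state=0` are our own assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=88)

rng = np.random.default_rng(0)
n = len(X_train)
n_classes = len(np.unique(y_train))   # IRIS labels are 0, 1, 2

# votes[i, c] counts how many trees predict class c for test instance i
votes = np.zeros((len(X_test), n_classes), dtype=int)
for _ in range(10):
    # Step 1: draw N instances out of N, with replacement
    idx = rng.integers(0, n, size=n)
    # Step 2: build the tree as usual on the resampled data
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    # Step 3: each tree casts one vote per test instance
    pred = tree.predict(X_test)
    votes[np.arange(len(X_test)), pred] += 1

# The class with the plurality of votes wins
bagged_pred = votes.argmax(axis=1)
bagged_acc = np.mean(bagged_pred == y_test)
print("Hand-rolled bagging accuracy:", bagged_acc)
```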

#### Exercise 3.(a)
Load the `lymphography` dataset and implement bagging of 10 estimators for kNN.


In [18]:
from sklearn.ensemble import BaggingClassifier

X, y = load_data('lymphography.data')

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=30027)

KNN = KNeighborsClassifier()
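# The base learner is passed positionally: the keyword was renamed from
# base_estimator to estimator in later scikit-learn releases
bagging = BaggingClassifier(KNeighborsClassifier(), n_estimators=10,
                            max_samples=0.5, max_features=0.5)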
KNN.fit(X_train,y_train)
bagging.fit(X_train,y_train)

print("KNN:",KNN.score(X_test,y_test))
print("KNN Bagging Accuracy:",bagging.score(X_test,y_test))

KNN: 0.40816326530612246
KNN Bagging Accuracy: 0.5510204081632653


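Note on encoding: if you previously applied a LabelEncoder to convert the categories to integers before the OneHotEncoder, that step is no longer needed; the OneHotEncoder can now be applied to the string categories directly, as in `load_data` above.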
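A tiny sketch of what the OneHotEncoder does to string-valued attributes; the three rows are made up for illustration and are not taken from either dataset.

```python
from sklearn.preprocessing import OneHotEncoder

# Two categorical attributes per row, both stored as strings
rows = [['low', '2'],
        ['high', '4'],
        ['low', '4']]

onehot = OneHotEncoder()
encoded = onehot.fit_transform(rows).toarray()

# Categories are sorted per attribute: ['high', 'low'] and ['2', '4']
print(onehot.categories_)
# Each row gets one indicator column per category value:
# [[0. 1. 1. 0.]
#  [1. 0. 0. 1.]
#  [0. 1. 0. 1.]]
print(encoded)
```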


Look at the documentation https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html.

The bagging classifier builds N base classifiers, each trained (fitted) on a subset of the features and samples. For each base classifier:

- Randomly select max_features * X.shape[1] subset of features.
- Randomly select max_samples * X.shape[0] subset of samples.
- Create a new X_base from the selected features and samples.
- Fit the base classifier on X_base and y_base.

Then use voting or averaging to combine the predictions of the base classifiers for X_test.
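The subsets drawn for each base classifier can be inspected after fitting. The sketch below uses the IRIS data (4 features, 150 instances) and three trees purely for illustration; `random_state=0` is our own addition.

```python
from sklearn import datasets
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)

# Base learner passed positionally so the call works across versions
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=3,
                        max_samples=0.5, max_features=0.5, random_state=0)
bag.fit(X, y)

# estimators_features_: feature indices used by each base classifier
# estimators_samples_: sample indices drawn for each base classifier
for i, (feats, samples) in enumerate(
        zip(bag.estimators_features_, bag.estimators_samples_)):
    print("Base classifier", i,
          "| features:", sorted(feats.tolist()),
          "| number of samples drawn:", len(samples))
```

With `max_features=0.5` each tree should see 2 of the 4 features, and with `max_samples=0.5` roughly half of the instances.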

#### Exercise 3.(b)
What is the significance of max_samples and max_features, and why might we wish to use values less than 1.0?

Build 3 different Decision Tree bagging classifiers using different combinations of max_samples and max_features. Can you analyse the results?

In [19]:
DT = DecisionTreeClassifier()
# The base learner is passed positionally (the keyword was renamed from
# base_estimator to estimator in later scikit-learn releases)
bagging_one = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                max_samples=1.0, max_features=1.0)
bagging_two = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                max_samples=0.5, max_features=1.0)
bagging_three = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                                  max_samples=0.5, max_features=0.5)

DT.fit(X_train,y_train)
bagging_one.fit(X_train,y_train)
bagging_two.fit(X_train,y_train)
bagging_three.fit(X_train,y_train)

print("DT:",DT.score(X_test,y_test))
print("Option 1: bagging Accuracy:",bagging_one.score(X_test,y_test))
print("Option 2: bagging Accuracy:",bagging_two.score(X_test,y_test))
print("Option 3: bagging Accuracy:",bagging_three.score(X_test,y_test))

DT: 0.42857142857142855
Option 1: bagging Accuracy: 0.46938775510204084
Option 2: bagging Accuracy: 0.5102040816326531
Option 3: bagging Accuracy: 0.46938775510204084


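*If max_features=1.0 and max_samples=1.0, every base classifier sees all the features and a full-size sample, so the base classifiers will probably be similar and there is little point in combining them. Values below 1.0 give each base classifier a different subset of samples and features, which makes the ensemble members more diverse.*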