# Classification on Iris Dataset

## IMPORTANT: make sure to rerun all the code from the beginning to obtain the results for the final version of your notebook, since this is the way we will do it before evaluating your notebook!!!

### Dataset description

We will be working with the famous “Iris” dataset that has been deposited on the UCI machine learning repository (https://archive.ics.uci.edu/ml/datasets/Iris).
The iris dataset contains measurements for 150 iris flowers from three different species.

The three classes in the Iris dataset:

- Iris-setosa (n=50)

- Iris-versicolor (n=50)

- Iris-virginica (n=50)
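This minimal sketch checks the class sizes listed above. It assumes scikit-learn is installed and uses the copy of the dataset bundled with it, where the three species are encoded as the integers 0, 1 and 2:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
# np.bincount counts how many samples carry each integer label 0, 1, 2
counts = np.bincount(iris.target)
for name, n in zip(iris.target_names, counts):
    print(name, n)
print(iris.data.shape)  # 150 flowers (rows), 4 features (columns)
```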



### Four features (regressors) are considered for the Iris dataset:



1) sepal length in cm

2) sepal width in cm

3) petal length in cm

4) petal width in cm




We first import all the packages that are needed.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt


import numpy as np
import scipy as sp
from scipy import stats
from sklearn import datasets
from sklearn import linear_model

# Perceptron
We will implement the perceptron and use it to learn a halfspace.

**TO DO** Set the random seed to your ID (matricola).

In [None]:
IDnumber = 1110975
np.random.seed(IDnumber)

Load the dataset from scikit-learn and apply a random permutation to it; the permuted data will then be split into a training set and a test set (50%-50%).

In [None]:
# Load the dataset from scikit learn
iris = datasets.load_iris()
m = iris.data.shape[0]
permutation = np.random.permutation(m)
X, Y = iris.data[permutation], iris.target[permutation]

We are going to classify class "1" against the other two classes (0 and 2). To use the labels directly with the perceptron, we relabel the other classes (0 and 2) as "-1".

In [None]:
#let's relabel classes 0 and 2 as -1
for i in range(len(Y)):
    if Y[i] != 1:
        Y[i] = -1
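The same relabeling can also be written without an explicit loop. A minimal sketch on a made-up label vector, using `np.where`; unlike the loop above, it returns a new array and leaves the original labels untouched:

```python
import numpy as np

# Made-up label vector containing the three Iris classes 0, 1, 2
Y = np.array([0, 1, 2, 1, 0, 2])

# Keep class 1 as +1 and map every other class to -1 in one vectorized step
Y_binary = np.where(Y == 1, 1, -1)
print(Y_binary)
```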

**TO DO** Divide the data into training set and test set (50% of the data each)

In [None]:
#Divide in training and test: make sure that your training set
#contains at least 10 elements from class 1 and at least 10 elements
#from class -1! If it does not, modify the code so as to apply more random
#permutations (or the same permutation multiple times) until this happens.
#IMPORTANT: do not change the random seed.

#m_training needs to be the number of samples in the training set
m_training = m // 2

#m_test needs to be the number of samples in the test set
m_test = m - m_training

# Since the permutation is random, the samples are simply split in the middle.
# The while loop makes sure that the training set contains at least 10
# elements from class 1 and at least 10 elements from class -1.
while True:
    # Instances and labels for the training set
    X_training, Y_training = X[:m_training], Y[:m_training]
    # Instances and labels for the test set
    X_test, Y_test = X[m_training:], Y[m_training:]

    # Count the elements of each class in the training set
    count1 = np.sum(Y_training == 1)
    countm1 = np.sum(Y_training == -1)

    # if the condition is met exit the while loop, otherwise draw a new permutation
    if count1 >= 10 and countm1 >= 10:
        break
    permutation = np.random.permutation(m)
    X, Y = iris.data[permutation], iris.target[permutation]
    Y[Y != 1] = -1  # relabel classes 0 and 2 as -1 again

print(Y_training)  # to make sure that Y_training contains both 1 and -1
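The resampling logic can also be packaged as a function. The sketch below uses a hypothetical helper, `split_until_balanced` (not part of the assignment), that draws a fresh permutation before every check, including the first, and counts labels in the training part only. It assumes every label occurs at least `min_per_class` times in the whole dataset; otherwise the loop never ends.

```python
import numpy as np

def split_until_balanced(X, Y, m_training, min_per_class=10, rng=np.random):
    """Hypothetical helper: reshuffle (X, Y) until the first m_training
    samples contain at least min_per_class samples of every label."""
    labels = np.unique(Y)
    while True:
        perm = rng.permutation(len(Y))
        X, Y = X[perm], Y[perm]  # same permutation keeps rows and labels aligned
        Y_training = Y[:m_training]
        # count each label in the training part only
        if all(np.sum(Y_training == c) >= min_per_class for c in labels):
            return X[:m_training], Y_training, X[m_training:], Y[m_training:]

# Usage on made-up data: row i is [2i, 2i + 1], label +1 for the first 10 rows
np.random.seed(0)
X = np.arange(40).reshape(20, 2)
Y = np.array([1] * 10 + [-1] * 10)
X_tr, Y_tr, X_te, Y_te = split_until_balanced(X, Y, 10, min_per_class=3)
print(Y_tr)
```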

**TO DO** Now add a 1 in front of each sample so that we can use a single vector to describe all the coefficients of the model, including the bias term. You can use the function $hstack$ in $numpy$.

In [None]:
#add a 1 to each sample
X_training = np.insert(X_training, 0, 1, axis=1)
X_test = np.insert(X_test, 0, 1, axis=1)
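The instructions mention `np.hstack`, while the cell uses `np.insert`. This small sketch on two made-up samples shows that both build the same matrix with a leading column of ones:

```python
import numpy as np

X = np.array([[5.1, 3.5], [4.9, 3.0]])

# Option 1: np.insert puts the value 1 before column 0 of every row
X_insert = np.insert(X, 0, 1, axis=1)

# Option 2: np.hstack glues a column of ones to the left of X
X_hstack = np.hstack((np.ones((X.shape[0], 1)), X))

print(X_hstack)
```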

**TO DO** Now complete the function *perceptron*. Since the perceptron does not terminate if the data is not linearly separable, your implementation should return the desired output (see below) when it reaches the termination condition seen in class or when the maximum number of iterations has been run, where 1 iteration corresponds to 1 update of the perceptron weights. If termination is reached because the maximum number of iterations has been completed, the implementation should return **the best model** seen so far.

The input parameters to pass are:
- $X$: the matrix of input features, one row for each sample
- $Y$: the vector of labels for the input features matrix X
- $max\_num\_iterations$: the maximum number of iterations for running the perceptron

The output values are:
- $best\_w$: the vector with the coefficients of the best model
- $best\_error$: the *fraction* of misclassified samples for the best model

In [None]:
def perceptron(X, Y, max_num_iterations):
    w = np.zeros(X.shape[1])  # current weights
    # fraction of misclassified samples (np.sign(0) = 0 counts as an error)
    best_w, best_error = w.copy(), np.mean(np.sign(X.dot(w)) != Y)
    for _ in range(max_num_iterations):  # 1 iteration = 1 update of w
        misclassified = np.flatnonzero(np.sign(X.dot(w)) != Y)
        if misclassified.size == 0:
            # termination condition: all samples are correctly classified
            return w, 0.
        # update using the first misclassified sample
        j = misclassified[0]
        w = w + Y[j] * X[j]
        # keep track of the best model seen so far (pocket algorithm)
        error = np.mean(np.sign(X.dot(w)) != Y)
        if error < best_error:
            best_w, best_error = w.copy(), error
    return best_w, best_error
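To see the perceptron update $w \leftarrow w + y_j x_j$ in isolation, here is a minimal sketch on a hypothetical, linearly separable toy dataset (first column set to 1 for the bias). It updates on the first misclassified sample until none is left; the cap on the number of updates and the best-model bookkeeping required above are omitted because this toy data is separable.

```python
import numpy as np

# Hypothetical toy data: first column is the constant 1 (bias term),
# label +1 when x1 + x2 > 1, else -1 (linearly separable by construction)
X = np.array([[1, 0.0, 0.0], [1, 1.0, 1.0], [1, 0.2, 0.3], [1, 0.9, 0.8]])
Y = np.array([-1, 1, -1, 1])

w = np.zeros(X.shape[1])
updates = 0
while True:
    # np.sign(0) = 0 never equals a label, so the all-zero start counts as wrong
    wrong = np.flatnonzero(np.sign(X.dot(w)) != Y)
    if wrong.size == 0:
        break  # termination condition: the data is perfectly separated
    j = wrong[0]
    w = w + Y[j] * X[j]  # perceptron update on the first misclassified sample
    updates += 1

print(w, updates)
```

On this data the loop stops after 3 updates with $w = (-1, 1, 1)$, that is, the separating line $x_1 + x_2 = 1$.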

Now we use the implementation above of the perceptron to learn a model from the training data using 100 iterations and print the error of the best model we have found.

In [None]:
#now run the perceptron for 100 iterations
w_found, error = perceptron(X_training, Y_training, 100)
print(w_found, error)

**TO DO** Use the best model $w\_found$ to predict the labels for the test dataset and print the fraction of misclassified samples in the test set (that is an estimate of the true loss).

In [None]:
#now use the w_found to make predictions on test dataset

num_errors = 0.
for i in range(X_test.shape[0]):
    if np.sign(np.inner(w_found, X_test[i])) != Y_test[i]:
        num_errors += 1

true_loss_estimate = num_errors/m_test
#NOTE: you can avoid using num_errors if you prefer, as long as true_loss_estimate is correct
print(true_loss_estimate)
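The loop above can also be written as a single vectorized expression: compute all the scores $Xw$ at once and average the boolean mismatch vector. A minimal sketch with made-up weights and samples:

```python
import numpy as np

# Made-up weights (bias first) and test samples with a leading 1
w = np.array([-1.0, 1.0, 1.0])
X_test = np.array([[1, 0.1, 0.2], [1, 0.9, 0.9], [1, 0.6, 0.6], [1, 0.2, 0.1]])
Y_test = np.array([-1, 1, -1, 1])

# mean of a boolean array = fraction of True entries = fraction misclassified
true_loss_estimate = np.mean(np.sign(X_test.dot(w)) != Y_test)
print(true_loss_estimate)
```

Here the last two samples are misclassified, so the estimate is 0.5.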

**TO DO** Copy the code from the last 2 cells above in the cell below and repeat the training with 10000 iterations. Then print the error in the training set and the estimate of the true loss obtained from the test set.

In [None]:
#now run the perceptron for 10000 iterations here!

w_found, error = perceptron(X_training, Y_training, 10000)

print(w_found, error)

num_errors = 0.

for i in range(X_test.shape[0]):
    if np.sign(np.inner(w_found, X_test[i])) != Y_test[i]:
        num_errors += 1

true_loss_estimate = num_errors/m_test
print(true_loss_estimate)

**TO DO** [Answer the following] What changes in the training error and in the test error (in terms of fraction of misclassified samples)? Explain what you observe. [Write the answer in this cell]

**ANSWER**
With more iterations we observe a (slightly) lower training error, meaning that the perceptron had more "time" to find better coefficients. Since the perceptron returns the best model seen so far (as required above) and the first 100 updates are identical in both runs, the training error with 10000 iterations cannot be higher than with 100. In terms of test error, the result is basically the same: even though the perceptron found coefficients that work better on the training set, this improvement does not carry over to data it has not seen, such as the test set. Nevertheless, the coefficients found are reasonably good, since the gap between the training error and the test error is small.

# Logistic Regression
Now we use logistic regression, as implemented in Scikit-learn, to predict labels. We first do it for 2 labels and then for 3 labels. We will also plot the decision region of logistic regression.

We first load the dataset again.

In [None]:
# Load the dataset from scikit learn
iris = datasets.load_iris()
m = iris.data.shape[0]
permutation = np.random.permutation(m)
X, Y = iris.data[permutation], iris.target[permutation]

**TO DO** As for the previous part, divide the data into training and test (50%-50%), relabel classes 0 and 2 as -1, and add a 1 as first component to each sample.

In [None]:
#Divide in training and test: make sure that your training set
#contains at least 10 elements from class 1 and at least 10 elements
#from class -1! If it does not, modify the code so as to apply more random
#permutations (or the same permutation multiple times) until this happens.
#IMPORTANT: do not change the random seed.

m_training = m // 2
m_test = m - m_training

#let's relabel classes 0 and 2 as -1
Y[Y != 1] = -1

# Since the permutation is random, the samples are simply split in the middle.
# The while loop makes sure that the training set contains at least 10
# elements from class 1 and at least 10 elements from class -1.
while True:
    # Instances and labels for the training set
    X_training, Y_training = X[:m_training], Y[:m_training]
    # Instances and labels for the test set
    X_test, Y_test = X[m_training:], Y[m_training:]

    # if the condition is met exit the while loop, otherwise draw a new permutation
    if np.sum(Y_training == 1) >= 10 and np.sum(Y_training == -1) >= 10:
        break
    permutation = np.random.permutation(m)
    X, Y = iris.data[permutation], iris.target[permutation]
    Y[Y != 1] = -1  # relabel classes 0 and 2 as -1 again

#add a 1 to each sample
X_training = np.insert(X_training, 0, 1, axis=1)
X_test = np.insert(X_test, 0, 1, axis=1)

To define a logistic regression model in Scikit-learn use the instruction

$linear\_model.LogisticRegression(C=1e5)$

($C$ is the inverse of the strength of *regularization*, a technique that
we will see later in the course. Setting it to a high value is almost
equivalent to ignoring regularization, so the instruction above corresponds
to the logistic regression you have seen in class.)
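A minimal sketch of what "weaker regularization" means in practice, assuming a recent scikit-learn: fit class 1 against the rest on the full Iris data for increasing values of $C$ and print the norm of the learned coefficients. Because $C$ is the inverse of the regularization strength, the norm does not shrink as $C$ grows. The `max_iter=10000` argument is added here only to give the solver room to converge on the unscaled features.

```python
import numpy as np
from sklearn import datasets, linear_model

iris = datasets.load_iris()
X = iris.data
Y = np.where(iris.target == 1, 1, -1)  # class 1 vs the rest, as in the notebook

norms = {}
for C in [0.01, 1.0, 1e5]:
    # larger C means a weaker L2 penalty on the coefficients
    model = linear_model.LogisticRegression(C=C, max_iter=10000)
    model.fit(X, Y)
    norms[C] = np.linalg.norm(model.coef_)
    print(C, norms[C])
```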

To learn the model use the $fit(...)$ method, and to predict use the $predict(...)$ method. See the Scikit-learn documentation for how to use them.
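A minimal sketch of the `fit`/`predict` workflow on a 50%-50% split of Iris (class 1 against the rest), assuming a recent scikit-learn. The seed is illustrative and not the notebook's. It also shows that `score`, which returns the accuracy, gives the same error rate as counting mismatches by hand:

```python
import numpy as np
from sklearn import datasets, linear_model

iris = datasets.load_iris()
rng = np.random.RandomState(0)  # illustrative seed, not the notebook's one
perm = rng.permutation(len(iris.target))
X = iris.data[perm]
Y = np.where(iris.target[perm] == 1, 1, -1)  # class 1 vs the rest
X_training, Y_training = X[:75], Y[:75]
X_test, Y_test = X[75:], Y[75:]

logreg = linear_model.LogisticRegression(C=1e5, max_iter=10000)
logreg.fit(X_training, Y_training)        # learn the coefficients
prediction_test = logreg.predict(X_test)  # one predicted label per test row

# fraction of misclassified samples
error_rate_test = np.mean(prediction_test != Y_test)
# score() returns the accuracy, so 1 - score() is the same error rate
print(error_rate_test, 1 - logreg.score(X_test, Y_test))
```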

**TO DO** Define the logistic regression model, then learn the model using the training set and predict on the test set. Then print the fraction of misclassified samples in the training set and in the test set.

In [None]:
#part on logistic regression for 2 classes
logreg = linear_model.LogisticRegression(C=1e5)

#learn from training set
logreg.fit(X_training, Y_training)

#predict on training set
prediction_training = logreg.predict(X_training)

#print the error rate = fraction of misclassified samples
error_rate_training = np.mean(prediction_training != Y_training)
print("Error rate on training set: " + str(error_rate_training))

#predict on test set
prediction_test = logreg.predict(X_test)

#print the error rate = fraction of misclassified samples
error_rate_test = np.mean(prediction_test != Y_test)
print("Error rate on test set: " + str(error_rate_test))

Now we do logistic regression for classification with 3 classes.

**TO DO** First: let's load the data once again (with the same permutation from before).

In [None]:
#part on logistic regression for 3 classes

#Divide in training and test: make sure that your training set
#contains at least 10 elements from each of the 3 classes!
#If it does not, modify the code so as to apply more random
#permutations (or the same permutation multiple times) until this happens.
#IMPORTANT: do not change the random seed.
X = iris.data[permutation]
Y = iris.target[permutation]

# Since the permutation is random, the samples are simply split in the middle.
# The while loop makes sure that the training set contains at least 10
# elements from each class.
while True:
    # Instances and labels for the training set
    X_training, Y_training = X[:m_training], Y[:m_training]
    # Instances and labels for the test set
    X_test, Y_test = X[m_training:], Y[m_training:]

    # Count the elements of each class (0, 1, 2) in the training set
    counts = np.bincount(Y_training, minlength=3)

    # if the condition is met exit the while loop, otherwise draw a new permutation
    if counts.min() >= 10:
        break
    permutation = np.random.permutation(m)
    X, Y = iris.data[permutation], iris.target[permutation]
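Per-class counts can be obtained in a single call with `np.bincount`; the `minlength` argument makes a class with zero samples show up as a 0 instead of shortening the result. A minimal sketch on made-up training labels, with a threshold of 3 instead of the notebook's 10:

```python
import numpy as np

# Made-up training labels for the three Iris classes
Y_training = np.array([0, 1, 2, 2, 1, 0, 0, 2, 1, 1])

counts = np.bincount(Y_training, minlength=3)  # samples per class 0, 1, 2
print(counts)
enough = bool(counts.min() >= 3)  # the notebook uses a threshold of 10
print(enough)
```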

**TO DO** Now perform logistic regression (instructions as before) for 3 classes, learning a model from the training set and predicting on the test set. Print the fraction of misclassified samples on the training set and the fraction of misclassified samples on the test set.

In [None]:
#part on logistic regression for 3 classes
logreg = linear_model.LogisticRegression(C=1e5)

#learn from training set
logreg.fit(X_training, Y_training)

#predict on training set
prediction_training = logreg.predict(X_training)

#print the error rate = fraction of misclassified samples
error_rate_training = np.mean(prediction_training != Y_training)
print("Error rate on training set: " + str(error_rate_training))

#predict on test set
prediction_test = logreg.predict(X_test)

#print the error rate = fraction of misclassified samples
error_rate_test = np.mean(prediction_test != Y_test)
print("Error rate on test set: " + str(error_rate_test))
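With 3 classes, `predict_proba` returns one column per class (in the order of `classes_`), and each row sums to 1. A minimal sketch, assuming a recent scikit-learn and, for brevity, fitting on the whole dataset rather than on the notebook's training split:

```python
import numpy as np
from sklearn import datasets, linear_model

iris = datasets.load_iris()
logreg = linear_model.LogisticRegression(C=1e5, max_iter=10000)
logreg.fit(iris.data, iris.target)

proba = logreg.predict_proba(iris.data[:5])  # shape (5, 3): one column per class
print(logreg.classes_)
print(proba.round(3))
# predict() returns the class with the largest estimated probability
print(logreg.predict(iris.data[:5]), logreg.classes_[proba.argmax(axis=1)])
```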

**TO DO** Now pick two features and restrict the dataset to include only those two features, whose indices are specified in the $features$ vector below. Then split into training and test.

In [None]:
#to make the plot we need to reduce the data to 2D, so we choose two features
features_list = ['sepal length', 'sepal width', 'petal length', 'petal width']
labels_list = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']

index_feature1 = 0
index_feature2 = 2
features = [index_feature1, index_feature2]

feature_name0 = features_list[features[0]]
feature_name1 = features_list[features[1]]

X = X[:, features]

# Since the permutation is random, the samples are simply split in the middle.
# The while loop makes sure that the training set contains at least 10
# elements from each class.
while True:
    # Instances and labels for the training set
    X_training, Y_training = X[:m_training], Y[:m_training]
    # Instances and labels for the test set
    X_test, Y_test = X[m_training:], Y[m_training:]

    # if every class (0, 1, 2) has at least 10 training samples exit the while loop
    if np.bincount(Y_training, minlength=3).min() >= 10:
        break
    # otherwise draw a new permutation, keeping only the two chosen features
    permutation = np.random.permutation(m)
    X, Y = iris.data[permutation][:, features], iris.target[permutation]
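Restricting to two features is plain column indexing with a list of indices. A minimal sketch on a made-up 3-sample matrix; it also shows that after reshuffling the rows, the same column selection has to be applied again to the reshuffled full matrix:

```python
import numpy as np

# Made-up 3-sample data matrix with the 4 Iris features as columns
X_full = np.array([[5.1, 3.5, 1.4, 0.2],
                   [7.0, 3.2, 4.7, 1.4],
                   [6.3, 3.3, 6.0, 2.5]])
features = [0, 2]  # sepal length and petal length

X_2d = X_full[:, features]  # keep all rows, only the chosen columns
print(X_2d.shape)

# After a new permutation, select the same columns again on the reshuffled rows
perm = np.array([2, 0, 1])
X_2d_perm = X_full[perm][:, features]
print(X_2d_perm)
```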

Now learn a model using the training data.

In [None]:
#part on logistic regression for three classes restricted to only two features
logreg = linear_model.LogisticRegression(C=1e5)

#learn from training set
logreg.fit(X_training, Y_training)

#predict on training set
prediction_training = logreg.predict(X_training)

#print the error rate = fraction of misclassified samples
error_rate_training = np.mean(prediction_training != Y_training)
print("Error rate on training set: " + str(error_rate_training))

#predict on test set
prediction_test = logreg.predict(X_test)

#print the error rate = fraction of misclassified samples
error_rate_test = np.mean(prediction_test != Y_test)
print("Error rate on test set: " + str(error_rate_test))

If everything is ok, the code below uses the model in $logreg$ to plot the decision region for the two features chosen above, with colors denoting the predicted value. It also plots the points (with correct labels) in the training set. It makes a similar plot for the test set.

In [None]:
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
h = .02  # step size in the mesh
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot also the training points
plt.scatter(X_training[:, 0], X_training[:, 1], c=Y_training, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel(feature_name0)
plt.ylabel(feature_name1)

plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title('Training set')

plt.show()

# Put the result (already reshaped above) into a new color plot
plt.figure(2, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot also the test points 
plt.scatter(X_test[:, 0], X_test[:, 1], c=Y_test, edgecolors='k', cmap=plt.cm.Paired, marker='s')
plt.xlabel(feature_name0)
plt.ylabel(feature_name1)

plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title('Test set')

plt.show()
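The decision-region plot relies on a reshaping round trip: `np.meshgrid` builds the grid, `ravel` plus `np.c_` turn it into one row per grid point for the classifier, and `reshape(xx.shape)` puts the predictions back into grid layout for `pcolormesh`. A minimal sketch with a coarse grid and a hypothetical stand-in `toy_predict` in place of `logreg.predict`:

```python
import numpy as np

h = 0.5  # coarse step so the grid stays small
xx, yy = np.meshgrid(np.arange(0, 2, h), np.arange(0, 1.5, h))
# one row per grid point, columns = (x, y), in row-major order
grid = np.c_[xx.ravel(), yy.ravel()]

def toy_predict(points):
    """Hypothetical stand-in for logreg.predict: label 1 above the line y = x / 2."""
    return (points[:, 1] > points[:, 0] / 2).astype(int)

Z = toy_predict(grid).reshape(xx.shape)  # back to the grid layout for pcolormesh
print(xx.shape, grid.shape, Z.shape)
```

Rows of `xx`/`yy` correspond to $y$ values and columns to $x$ values, so `Z[i, j]` is the prediction at the point $(x_j, y_i)$.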