# ML Bootcamp: Logistic Regression

In this lab, we will provide an interactive demonstration of logistic regression, predicting whether an iris flower belongs to the species setosa from measurements of the flower.

Let's start by loading the data. We use scikit-learn's iris dataset, which contains 150 flowers, 50 from each of three species (setosa, versicolor, and virginica). Each example consists of four measurements in centimeters (sepal length, sepal width, petal length, and petal width) and a label (0, 1, or 2) identifying the species.
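The cells below load scikit-learn's iris dataset through `datasets.load_iris()`. Before printing the full arrays, you can inspect the dataset's structure. The following minimal sketch checks the shape of the feature matrix, the feature and species names stored with the data, and the number of examples per species.

```python
import numpy as np
from sklearn import datasets

# Load the same dataset that the lab uses
iris = datasets.load_iris()

# One row per flower, one column per measurement
print(iris.data.shape)         # (150, 4)
print(iris.feature_names)      # names of the four measurement columns
print(iris.target_names)       # species names, indexed by label 0, 1, 2

# Count how many examples carry each label
print(np.bincount(iris.target))  # [50 50 50]
```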

In [3]:
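# Load the iris data; our task is to classify setosa vs. not setosa
import numpy as np
from sklearn import datasets


iris = datasets.load_iris()
iris_X = iris.data    # features: shape (150, 4)
iris_y = iris.target  # labels: 0 = setosa, 1 = versicolor, 2 = virginica

print(iris_X)
print(iris_y)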

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.6 1.4 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 4.
(output truncated)

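We only want to distinguish setosa from the other two species, so we relabel the targets: setosa (label 0) becomes 1, and versicolor and virginica (labels 1 and 2) become -1.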

In [None]:
# Relabel: setosa (class 0) -> 1, versicolor and virginica (classes 1, 2) -> -1
iris_y = np.where(iris_y == 0, 1, -1)
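After relabeling, the two classes differ in size: 50 setosa examples against 100 non-setosa ones. The following minimal sketch runs on the full dataset rather than on a split. It confirms the counts and computes the accuracy of a trivial baseline that always predicts the majority label -1. This baseline is a reference point for judging a classifier's test accuracy.

```python
import numpy as np
from sklearn import datasets

# Relabel as in the lab: setosa -> 1, everything else -> -1
y = np.where(datasets.load_iris().target == 0, 1, -1)

# np.unique returns the labels sorted, with their counts
labels, counts = np.unique(y, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))  # {-1: 100, 1: 50}

# Accuracy of a baseline that always predicts the majority label (-1)
baseline_acc = np.mean(y == -1)
print(baseline_acc)
```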

As is standard practice in machine learning, we split the data into a subset to **train** on and a held-out subset to **test** on. We shuffle the examples first, because the iris data is stored sorted by species.

In [None]:
# Shuffle the example indices; the iris data is sorted by species,
# so an unshuffled split would put only virginica in the test set
np.random.seed(0)
indices = np.random.permutation(len(iris_X))

# Number of examples to hold out for testing
num_test = 50

iris_X_train = iris_X[indices[:-num_test]]  # training features
iris_y_train = iris_y[indices[:-num_test]]  # training labels

iris_X_test = iris_X[indices[-num_test:]]   # test features
iris_y_test = iris_y[indices[-num_test:]]   # test labels
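scikit-learn can also do the shuffle and split in one call with `sklearn.model_selection.train_test_split`. The following minimal sketch assumes the same ±1 labels. The `stratify=y` option and `random_state=0` are our additions: stratifying keeps roughly the same setosa fraction in both subsets. This split will not match the one produced by the permutation above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data
y = np.where(iris.target == 0, 1, -1)  # setosa -> 1, others -> -1

# Hold out 50 examples for testing; stratify=y preserves the roughly
# 1:2 ratio of setosa to non-setosa, random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=50, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)  # (100, 4) (50, 4)
print(np.mean(y_train == 1), np.mean(y_test == 1))
```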

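Next, we choose a classifier from scikit-learn. The logistic regression model (`linear_model.LogisticRegression`) is left commented out in the cell below. The active line instead uses a support vector machine with a linear kernel (`svm.SVC(kernel='linear')`). Either model can be trained and evaluated with the code that follows: we train on the training subset and then use the model to make predictions for the test subset.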

In [None]:
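# Logistic regression (uncomment these two lines to use it instead of the SVM)
# from sklearn import linear_model
# model = linear_model.LogisticRegression()

# Support vector machine with a linear kernel (the model used below)
from sklearn import svm

model = svm.SVC(kernel='linear')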
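The logistic regression model that is commented out above is also a linear classifier. It computes a score z = w · x + b and turns it into a probability P(y = 1 | x) = σ(z), where σ(z) = 1 / (1 + e^(-z)) is the sigmoid. The following minimal sketch checks this against scikit-learn. It fits on the full relabeled dataset purely for illustration. We set `max_iter=1000` ourselves, raised from the default to avoid convergence warnings.

```python
import numpy as np
from sklearn import datasets, linear_model

iris = datasets.load_iris()
X = iris.data
y = np.where(iris.target == 0, 1, -1)  # setosa -> 1, others -> -1

model = linear_model.LogisticRegression(max_iter=1000)
model.fit(X, y)

# Linear score w . x + b for the positive class (classes_ is sorted: [-1, 1])
scores = X @ model.coef_.ravel() + model.intercept_[0]

# The sigmoid maps each score to P(y = 1 | x)
manual_proba = 1.0 / (1.0 + np.exp(-scores))

print(np.allclose(scores, model.decision_function(X)))          # True
print(np.allclose(manual_proba, model.predict_proba(X)[:, 1]))  # True
```

Because σ(0) = 0.5, predicting the label with the larger probability is the same as predicting 1 exactly when z > 0.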

Now, we can train the classifier on the training subset.

In [None]:
model.fit(iris_X_train, iris_y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
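With a linear kernel, the trained SVC is a single hyperplane. The weight vector w is exposed as `coef_` and the bias b as `intercept_`. The following minimal sketch reconstructs the decision scores and predictions by hand. For illustration, it refits on the full relabeled dataset rather than on the lab's training split.

```python
import numpy as np
from sklearn import datasets, svm

iris = datasets.load_iris()
X = iris.data
y = np.where(iris.target == 0, 1, -1)  # setosa -> 1, others -> -1

model = svm.SVC(kernel='linear')
model.fit(X, y)

# With a linear kernel the decision boundary is w . x + b = 0
w = model.coef_.ravel()   # one weight per feature
b = model.intercept_[0]
scores = X @ w + b

# A positive score means classes_[1] (= 1), otherwise classes_[0] (= -1)
manual_preds = np.where(scores > 0, model.classes_[1], model.classes_[0])

print(np.allclose(scores, model.decision_function(X)))  # True
print(np.array_equal(manual_preds, model.predict(X)))   # True
print("support vectors per class:", model.n_support_)
```

This is the same linear form that logistic regression uses. The two models differ in the loss used to choose w and b: hinge loss for the SVM, log loss for logistic regression.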

With the trained model, we can make predictions on the testing subset.

In [None]:
preds = model.predict(iris_X_test)


Finally, we compute the classification accuracy: the fraction of examples in the test subset that are classified correctly.

In [None]:
# Compute number of examples classified correctly

num_correct = 0
for i in range(len(preds)):
  if preds[i] == iris_y_test[i]:
    num_correct += 1
    
print("The fraction of correctly classified examples in the test set is: " + str(num_correct / len(preds)))

The fraction of correctly classified examples in the test set is: 1.0
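The counting loop can be replaced with a single vectorized expression, or with scikit-learn's `accuracy_score`. The following minimal sketch runs on small hypothetical label arrays, not on the lab's data. In the lab itself, the equivalent calls would be `np.mean(preds == iris_y_test)` or `accuracy_score(iris_y_test, preds)`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions, for illustration only
y_true = np.array([1, -1, -1, 1, -1])
y_pred = np.array([1, -1, 1, 1, -1])

# Vectorized version of the counting loop: the mean of a boolean array
manual_acc = np.mean(y_pred == y_true)

# scikit-learn's built-in metric computes the same fraction
sk_acc = accuracy_score(y_true, y_pred)

print(manual_acc, sk_acc)  # 0.8 0.8
```

scikit-learn classifiers also provide `model.score(X, y)`, which returns the same mean accuracy on the given data.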
