# Baby steps with Scikit-learn

In this notebook, we will train a perceptron using Scikit-learn.

Let's start by loading the Iris data set.

In [1]:
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()

print(type(iris.data))
print(type(iris.target))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


The Iris data set was loaded and stored in the `iris` object. This `iris` object is of type `Bunch` (a dictionary on steroids 💉). A function or method can return such an object to hand back _multiple values_ that are accessible **by key**.

So, the `Bunch` object returned by the function `load_iris` has a key called `data` associated with the *feature set*, and another one called `target` associated with the _labels_. Notice that both values associated with those keys are NumPy arrays.

We'll thus use both `iris.data` and `iris.target` to manipulate our data set.
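As a small sketch of what "accessible by key" means in practice, the snippet below reloads the data set and reads the same array both by key and by attribute. The variable name `same_object` is only introduced here for illustration.

```python
from sklearn import datasets

iris = datasets.load_iris()

# A Bunch behaves like a dict, so its contents can be listed by key.
print(sorted(iris.keys()))

# The same array is reachable by key or by attribute.
same_object = iris["data"] is iris.data
print(same_object)

# One row per flower in data, one label per flower in target.
print(iris.data.shape, iris.target.shape)
```

The attribute form (`iris.data`) is just a shortcut for the key form (`iris["data"]`), which is why the rest of this notebook uses it.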

But remember, we will be using only two features of the dataset for visualization purposes, exactly like we did in the previous chapter. As a reminder, the two features we will use are the **petal length** and the **petal width**.

Let's form our training set :)

In [2]:
X = iris.data[:, [2, 3]]  # select all rows, but only the 3rd and 4th columns of the dataset
y = iris.target  # grab the label column

print("Class labels: ", np.unique(y))

Class labels:  [0 1 2]


The training set was successfully formed, and `np.unique(y)` returned the three unique class labels of our training set, where `Iris-setosa = 0`, `Iris-versicolor = 1`, and `Iris-virginica = 2`.
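If you want to double-check which columns and which species the integers refer to, the `Bunch` also carries the names. The sketch below reads them from `feature_names` and `target_names`; the names `petal_features` and `label_to_name` are made up for this example. Note that Scikit-learn stores the species names without the `Iris-` prefix.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Columns 2 and 3 are the two petal measurements used for X.
petal_features = [iris.feature_names[i] for i in (2, 3)]
print(petal_features)

# target_names[k] is the species name for integer label k.
label_to_name = {int(k): str(name) for k, name in enumerate(iris.target_names)}
print(label_to_name)
```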

Scikit-learn conveniently assigned numbers to our labels. Remember, it was a task we had to do ourselves in the previous chapter. Isn't that nice? 😊

Now, let's split our dataset.

In Machine Learning, it is common to **cut** the dataset into **two parts**: one for **training** the model, and the other for **evaluating** it. So far, we have used the entire dataset to train our models. In Chapter 4, we will discuss this practice in more detail.

In [3]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

**What just happened?**

We asked `train_test_split` to randomly split `X` and `y` into 30% test data and 70% training data. The `stratify` option makes sure that the training and test subsets have the same proportions of class labels as the input dataset. We can verify that using NumPy's `bincount` function.

In [4]:
print('Labels counts in y:', np.bincount(y))
print('Labels counts in y_train:', np.bincount(y_train))
print('Labels counts in y_test:', np.bincount(y_test))

Labels counts in y: [50 50 50]
Labels counts in y_train: [35 35 35]
Labels counts in y_test: [15 15 15]
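To see what `stratify` buys us, here is a rough comparison of the same split with and without it. Everything except the `stratify` argument is kept identical; the variable names ending in `_strat` and `_plain` are only used in this sketch. No output is shown for the non-stratified case, since its counts depend on the random shuffle.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

# Same split as above, with stratification on the labels.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Identical call, except that stratify is left at its default (None).
_, _, _, y_test_plain = train_test_split(
    X, y, test_size=0.3, random_state=1
)

strat_counts = np.bincount(y_test_strat, minlength=3)
plain_counts = np.bincount(y_test_plain, minlength=3)
print("Stratified test counts:    ", strat_counts)
print("Non-stratified test counts:", plain_counts)
```

Both test subsets contain 45 examples, but only the stratified one is guaranteed to hold 15 flowers of each class.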


Now, let's standardize the features of our training set. Remember, standardizing (a form of feature scaling) brings the different variables onto the same scale.

In [5]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(X_train) # determines the sample mean and std deviation for each feature

X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
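For intuition, the same computation can be sketched by hand with NumPy. The arrays below are made-up toy values, not Iris data, and the names `mu` and `sigma` are chosen for this example. The point to notice is that the mean and standard deviation come from the training rows only and are then reused for the test rows, which is what calling `fit` on `X_train` and `transform` on both subsets does.

```python
import numpy as np

# Made-up measurements (two features), used only for illustration.
X_train_toy = np.array([[1.0, 0.2], [4.0, 1.3], [5.5, 2.1], [1.5, 0.3]])
X_test_toy = np.array([[4.5, 1.5], [1.3, 0.2]])

# Per-feature statistics are estimated from the training rows only.
mu = X_train_toy.mean(axis=0)
sigma = X_train_toy.std(axis=0)  # population standard deviation (ddof=0)

# The same mu and sigma are applied to both subsets.
X_train_toy_std = (X_train_toy - mu) / sigma
X_test_toy_std = (X_test_toy - mu) / sigma

print(X_train_toy_std.mean(axis=0))  # close to 0 for each feature
print(X_train_toy_std.std(axis=0))   # close to 1 for each feature
```

The standardized test rows will generally not have mean 0 and standard deviation 1 exactly, and that is expected: they are scaled with the training statistics so that the model sees test data on the same scale it was trained on.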

Let's now train a perceptron model using our standardized training set.

In [6]:
from sklearn.linear_model import Perceptron

perceptron = Perceptron(eta0=0.1, random_state=1)
perceptron.fit(X_train_std, y_train)

Perceptron(eta0=0.1, random_state=1)
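Once fitted, the model exposes what it learned. The sketch below repeats the steps above in compact form so that it runs on its own, then inspects the fitted attributes. With three classes, Scikit-learn trains one binary perceptron per class (one-vs-rest), so we should expect one weight vector and one bias per class.

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X, y = iris.data[:, [2, 3]], iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)
X_train_std = StandardScaler().fit_transform(X_train)

perceptron = Perceptron(eta0=0.1, random_state=1)
perceptron.fit(X_train_std, y_train)

# One row of weights per class, one column per feature.
print(perceptron.coef_.shape)
# One bias term per class.
print(perceptron.intercept_.shape)
# Class labels seen during fit.
print(perceptron.classes_)
```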

Let's test our perceptron and print out the number of examples it misclassified.

In [7]:
y_pred = perceptron.predict(X_test_std)
print('Misclassified examples: %d' % (y_test != y_pred).sum())

Misclassified examples: 1


We can choose to look at either the number of misclassifications or the accuracy to judge the performance of our model.

Our perceptron misclassified 1 of the 45 flower examples in the test subset. The misclassification error is therefore `1 / 45 ≈ 0.022`, or `2.2%`, which means that the accuracy of our perceptron is `1 - 0.022 = 0.978`, or `97.8%`. We can also obtain this metric in code.

In [8]:
from sklearn.metrics import accuracy_score

print('Accuracy: %.3f' % accuracy_score(y_test, y_pred))

Accuracy: 0.978
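The accuracy is nothing more than the fraction of predictions that match the true labels, so it can be sketched with plain NumPy. The two arrays below are hypothetical labels made up for illustration; they are not the Iris results.

```python
import numpy as np

# Hypothetical labels, made up for illustration (not the Iris results).
y_true_toy = np.array([0, 1, 2, 2, 1, 0, 1, 2])
y_pred_toy = np.array([0, 1, 2, 1, 1, 0, 1, 2])

# Count the mismatches, then turn the count into rates.
n_errors = int((y_true_toy != y_pred_toy).sum())
error_rate = n_errors / len(y_true_toy)
accuracy = float(np.mean(y_true_toy == y_pred_toy))

print("Misclassified examples: %d" % n_errors)
print("Error rate: %.3f" % error_rate)
print("Accuracy: %.3f" % accuracy)
```

The error rate and the accuracy always add up to 1, which is the relationship used in the `1 - 0.022` computation above.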


Remember that the perceptron performs best when the classes are linearly separable. If we were to plot the decision regions (which I encourage you to do), we'd see that some of the classes cannot be _perfectly_ separated by a linear boundary.

**Hint**: See the book for the `plot_decision_regions` function.
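If you don't have the book at hand, here is a rough starting point. It is **not** the book's `plot_decision_regions` function: `decision_grid` is a hypothetical helper written for this sketch, and its `step` and `margin` parameters are arbitrary choices. It only computes the predicted class for every point of a 2D grid and leaves the drawing to you.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X, y = iris.data[:, [2, 3]], iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)

perceptron = Perceptron(eta0=0.1, random_state=1).fit(X_train_std, y_train)


def decision_grid(classifier, X, step=0.02, margin=1.0):
    """Hypothetical helper: predict a class for every point of a 2D grid."""
    x1 = np.arange(X[:, 0].min() - margin, X[:, 0].max() + margin, step)
    x2 = np.arange(X[:, 1].min() - margin, X[:, 1].max() + margin, step)
    xx1, xx2 = np.meshgrid(x1, x2)
    # Stack the grid coordinates into an (n_points, 2) feature matrix.
    grid_points = np.column_stack([xx1.ravel(), xx2.ravel()])
    labels = classifier.predict(grid_points).reshape(xx1.shape)
    return xx1, xx2, labels


xx1, xx2, regions = decision_grid(perceptron, X_train_std)
print(regions.shape, np.unique(regions))
```

The three returned arrays can be passed to Matplotlib, for example `plt.contourf(xx1, xx2, regions, alpha=0.3)`, and the standardized examples can then be drawn on top with `plt.scatter` to see where the linear boundaries fail to separate the classes.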