## A "Hello World" Example of Machine Learning - Revisit

In [52]:
import numpy as np


class AdalineSGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
      Learning rate (between 0.0 and 1.0)
    n_iter : int
      Passes over the training dataset.
    shuffle : bool (default: True)
      Shuffles training data every epoch if True to prevent cycles.
    random_state : int
      Random number generator seed for random weight
      initialization.


    Attributes
    -----------
    w_ : 1d-array
      Weights after fitting.
    cost_ : list
      Sum-of-squares cost function value averaged over all
      training samples in each epoch.

        
    """
    def __init__(self, eta=0.01, n_iter=10, shuffle=True, random_state=None):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.shuffle = shuffle
        self.random_state = random_state
        
    def fit(self, X, y):
        """ Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
          Training vectors, where n_samples is the number of samples and
          n_features is the number of features.
        y : array-like, shape = [n_samples]
          Target values.

        Returns
        -------
        self : object

        """
        self._initialize_weights(X.shape[1])
        self.cost_ = []
        for i in range(self.n_iter):
            if self.shuffle:
                X, y = self._shuffle(X, y)
            cost = []
            for xi, target in zip(X, y):
                cost.append(self._update_weights(xi, target))
            avg_cost = sum(cost) / len(y)
            self.cost_.append(avg_cost)
        return self

    def partial_fit(self, X, y):
        """Fit training data without reinitializing the weights"""
        if not self.w_initialized:
            self._initialize_weights(X.shape[1])
        if y.ravel().shape[0] > 1:
            for xi, target in zip(X, y):
                self._update_weights(xi, target)
        else:
            self._update_weights(X, y)
        return self

    def _shuffle(self, X, y):
        """Shuffle training data"""
        r = self.rgen.permutation(len(y))
        return X[r], y[r]
    
    def _initialize_weights(self, m):
        """Initialize weights to small random numbers"""
        self.rgen = np.random.RandomState(self.random_state)
        self.w_ = self.rgen.normal(loc=0.0, scale=0.01, size=1 + m)
        self.w_initialized = True
        
    def _update_weights(self, xi, target):
        """Apply Adaline learning rule to update the weights"""
        output = self.activation(self.net_input(xi))
        error = (target - output)
        self.w_[1:] += self.eta * xi.dot(error)
        self.w_[0] += self.eta * error
        cost = 0.5 * error**2
        return cost
    
    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation"""
        return X

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(self.net_input(X)) >= 0.0, 1, -1)
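
To make the learning rule in `_update_weights` concrete, here is a minimal sketch of a single stochastic-gradient step on one sample. The function name `adaline_sgd_step` and the sample values are made up for illustration; only the arithmetic mirrors the class above.

```python
import numpy as np

def adaline_sgd_step(w, xi, target, eta=0.01):
    """Return updated weights and the cost for one sample (hypothetical helper)."""
    output = np.dot(xi, w[1:]) + w[0]   # net input; the activation is the identity
    error = target - output
    w_new = w.copy()
    w_new[1:] += eta * error * xi       # feature weights
    w_new[0] += eta * error             # bias unit
    return w_new, 0.5 * error ** 2

# Start from zero weights so the step is easy to follow by hand.
w, cost = adaline_sgd_step(np.zeros(3), np.array([1.0, 2.0]), target=1.0, eta=0.1)
print(w, cost)
```

With zero initial weights the output is 0, the error is 1, and each weight moves by `eta` times its input, which is the same update the class applies sample by sample.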

Loading the Iris dataset from scikit-learn. 

The first column represents sepal length, the second sepal width, the third petal length, and the fourth petal width of the flower samples. The classes (species) are already encoded as integer labels: 0 = Iris-Setosa, 1 = Iris-Versicolor, 2 = Iris-Virginica.

Here, we use only two features: the first and fourth columns (sepal length and petal width).

In [53]:
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()
#iris.data

In [54]:
X = iris.data[:, [0, 3]]

In [55]:
y = iris.target

In [56]:
print('Class labels:', np.unique(y))

Class labels: [0 1 2]


Scikit-learn's linear classifiers, such as Perceptron, support multi-class classification via the one-versus-rest (OvR) method.
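
As a rough sketch of the one-versus-rest idea: each class gets its own binary problem ("this class" against "the rest"), and the predicted label is the class whose binary model scores highest. The labels and the `scores` matrix below are made-up values for illustration, not output from the Perceptron in this notebook.

```python
import numpy as np

y_demo = np.array([0, 1, 2, 1, 0, 2])
classes = np.unique(y_demo)

# One binary target vector per class: +1 for "this class", -1 for "the rest".
binary_targets = {c: np.where(y_demo == c, 1, -1) for c in classes}

# Hypothetical decision scores: rows are samples, columns are the per-class models.
scores = np.array([[0.2, -1.0, 0.7],
                   [1.5, 0.3, -0.4]])
predicted = classes[np.argmax(scores, axis=1)]  # class with the highest score wins
print(binary_targets[1], predicted)
```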

Splitting data into 70% training and 30% test data:

In [57]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

In [58]:
print('Labels counts in y:', np.bincount(y))
print('Labels counts in y_train:', np.bincount(y_train))
print('Labels counts in y_test:', np.bincount(y_test))

Labels counts in y: [50 50 50]
Labels counts in y_train: [35 35 35]
Labels counts in y_test: [15 15 15]
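
The equal counts above are the effect of `stratify=y`. A minimal NumPy sketch of the idea follows; `stratified_split_indices` is a hypothetical helper, not the algorithm scikit-learn uses internally, so the selected samples will differ from `train_test_split` even though the class counts match.

```python
import numpy as np

def stratified_split_indices(y, test_size=0.3, seed=1):
    """Hypothetical helper: split indices so each class keeps its share in both parts."""
    rng = np.random.RandomState(seed)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))  # shuffle this class's indices
        n_test = int(round(test_size * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

y_demo = np.repeat([0, 1, 2], 50)  # same class balance as the Iris labels
train_idx, test_idx = stratified_split_indices(y_demo)
print(np.bincount(y_demo[train_idx]), np.bincount(y_demo[test_idx]))
```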


### Standardizing the features:

In [59]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()  # rescales each feature to zero mean and unit standard deviation
sc.fit(X_train)  # estimate the per-feature mean and standard deviation from the training data only
X_train_std = sc.transform(X_train)  # apply the training statistics to the training data
X_test_std = sc.transform(X_test)  # reuse the same training statistics on the test data
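
For reference, the same standardization can be written out by hand. This is a sketch on made-up data (`X_tr` and `X_te` are illustrative arrays, not the Iris split); it uses the population standard deviation (`ddof=0`), which is also what `StandardScaler` uses.

```python
import numpy as np

rng = np.random.RandomState(0)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(100, 2))  # made-up training data
X_te = rng.normal(loc=5.0, scale=2.0, size=(40, 2))   # made-up test data

mu = X_tr.mean(axis=0)     # per-feature mean, training data only
sigma = X_tr.std(axis=0)   # per-feature population standard deviation (ddof=0)

X_tr_std = (X_tr - mu) / sigma
X_te_std = (X_te - mu) / sigma  # the test data reuses the training statistics
print(X_tr_std.mean(axis=0), X_tr_std.std(axis=0))
```

The test features will generally not have exactly zero mean and unit variance, because they are scaled with statistics estimated on the training set.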

In [60]:
from sklearn.linear_model import Perceptron

ppn = Perceptron(max_iter=100, eta0=0.1, random_state=42)
ppn.fit(X_train_std, y_train)

Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=0.1,
           fit_intercept=True, max_iter=100, n_iter_no_change=5, n_jobs=None,
           penalty=None, random_state=42, shuffle=True, tol=0.001,
           validation_fraction=0.1, verbose=0, warm_start=False)

### Test the model with the hold-out test set

In [61]:
y_pred = ppn.predict(X_test_std)
print('Misclassified samples: ' + str((y_test != y_pred).sum()))

Misclassified samples: 10


In [62]:
from sklearn.metrics import accuracy_score

print('Accuracy: ' + str(accuracy_score(y_test, y_pred)))

Accuracy: 0.7777777777777778
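
The accuracy follows directly from the misclassification count: 10 errors out of the 45 test samples. A short sketch of the arithmetic, plus the equivalent element-wise form on made-up labels (`y_true` and `y_hat` are illustrative arrays):

```python
import numpy as np

n_test = 45            # 15 samples per class in the test split
n_misclassified = 10   # count reported above
accuracy = (n_test - n_misclassified) / n_test
print(accuracy)

# Equivalent element-wise form: the fraction of positions where the labels agree.
y_true = np.array([0, 1, 2, 2])
y_hat = np.array([0, 1, 1, 2])
toy_accuracy = np.mean(y_true == y_hat)
print(toy_accuracy)
```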


In [63]:
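# Note: these samples are in raw units, while ppn was trained on standardized features;
# applying sc.transform(X_new) first would be needed for the predictions to be meaningful.
X_new = [[1.1, 0.2],[0.4, 1.9], [1.4, 0.2]]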
y_new = ppn.predict(X_new)
y_new

array([2, 2, 2])

### Evaluate the model using cross-validation

In [64]:
from sklearn.model_selection import cross_val_score
cross_val_score(ppn, X_train_std, y_train, cv=4, scoring="accuracy")

array([0.88888889, 0.92592593, 0.77777778, 0.79166667])
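
The four scores can be read as fractions of correctly classified samples per fold. The sketch below assumes stratified folds of near-equal size, which is what `cross_val_score` uses by default for a classifier when `cv` is an integer; the `correct` counts are inferred from the scores above, not read from the model.

```python
import numpy as np

n_per_class = 35   # training samples per class after the stratified split
n_classes = 3
cv = 4

# Split each class's 35 samples as evenly as possible across the 4 folds.
per_class_fold_sizes = [len(part) for part in np.array_split(np.arange(n_per_class), cv)]
fold_sizes = [n_classes * s for s in per_class_fold_sizes]
print(fold_sizes)

# Correct-prediction counts that reproduce the scores above (inferred, see lead-in).
correct = [24, 25, 21, 19]
fold_scores = [c / n for c, n in zip(correct, fold_sizes)]
print(np.round(fold_scores, 8))
```

With only 24 to 27 samples per validation fold, a single extra error moves a score by about four percentage points, so differences between folds should be read with care.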

In [65]:
# Two features: accuracy scores: array([0.88888889, 0.48148148, 0.7037037 , 0.95833333])

### Exercise 1: Use all four features to train the model and use cross-validation to check whether the results are better. Briefly explain why.

In [111]:
X = iris.data[:, [1, 3]]  # note: selects sepal width and petal width only; iris.data would use all four features
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
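sc.fit(X_train)  # estimate the per-feature mean and standard deviation from the training data only
X_train_std = sc.transform(X_train)  # apply the training statistics to the training data
X_test_std = sc.transform(X_test)  # reuse the same training statistics on the test data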

ppn = Perceptron(max_iter=100, eta0=0.1, random_state=42)
ppn.fit(X_train_std, y_train)
y_pred = ppn.predict(X_test_std)
print('Misclassified samples: ' + str((y_test != y_pred).sum()))

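# Note: the return value is discarded because this is not the last expression in the cell;
# wrap the call in print() to display the fold scores.
cross_val_score(ppn, X_train_std, y_train, cv=4, scoring="accuracy")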
print('Accuracy: ' + str(accuracy_score(y_test, y_pred)) + '\n \n Most accurate when the last 3 features are used instead of all of them. The results did not improve when I used all the features because not every feature was relevant to classifying the samples.')

Misclassified samples: 1
Accuracy: 0.9777777777777777
 
 Most accurate when the last 3 features are used instead of all of them. The results did not improve when I used all the features because not every feature was relevant to classifying the samples.
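
One possible way to run the four-feature comparison the exercise asks for is sketched below. It is an illustration rather than the cell that produced the output above: the names `X_all`, `model`, and `scores_all` are introduced here, and wrapping the scaler and the Perceptron in a pipeline is a design choice so that the scaler is refitted inside every fold.

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_all, y_all = iris.data, iris.target  # all four features

X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=1, stratify=y_all)

# The pipeline refits the scaler on the training part of each fold,
# so the validation part never influences the scaling statistics.
model = make_pipeline(
    StandardScaler(),
    Perceptron(max_iter=100, eta0=0.1, random_state=42))

scores_all = cross_val_score(model, X_tr, y_tr, cv=4, scoring="accuracy")
print(scores_all, scores_all.mean())
```

Replacing `iris.data` with a column subset such as `iris.data[:, [2, 3]]` gives the matching two-feature scores for a like-for-like comparison.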


### Exercise 2: Try the scikit-learn stochastic gradient descent model instead of the perceptron. Use all four features. Evaluate with cross-validation how the model performs in terms of accuracy using both two features and four features.

In [110]:
X = iris.data[:, [3, 3]]  # note: selects petal width twice, i.e. one distinct feature
X_std = np.copy(X)  # note: not standardized, because the two lines below are commented out
#X_std[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
#X_std[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()
from sklearn.linear_model import SGDClassifier
sgd = SGDClassifier(random_state=42)  # note: created but never fitted or evaluated below
ada = AdalineSGD(n_iter=15, eta=0.01, random_state=1)
ada.fit(X_std, y)  # note: fitted on the 0/1/2 labels but not used for the predictions below
# Note: the predictions and scores below still come from ppn, the Perceptron fitted in
# Exercise 1, so the printed numbers repeat the Exercise 1 result.
y_pred = ppn.predict(X_test_std)
print('Misclassified samples: ' + str((y_test != y_pred).sum()))

cross_val_score(ppn, X_train_std, y_train, cv=4, scoring="accuracy")
print('Accuracy: ' + str(accuracy_score(y_test, y_pred)) + '\n \n Accuracy does not seem to vary with the inclusion of more or fewer features. This observation holds for both 2 features [2,3] and 4 features [0,3]. This is likely because the features are standardized, meaning variation is reduced.')

Misclassified samples: 1
Accuracy: 0.9777777777777777
 
 Accuracy does not seem to vary with the inclusion of more or fewer features. This observation holds for both 2 features [2,3] and 4 features [0,3]. This is likely because the features are standardized, meaning variation is reduced.
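
For comparison, here is a hedged sketch of how `SGDClassifier` itself could be evaluated with cross-validation on two and on four features. The helper `cv_accuracy`, the column choices, and the use of a pipeline are assumptions made for illustration; no scores are quoted because they depend on the installed scikit-learn version.

```python
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

def cv_accuracy(columns):
    """Hypothetical helper: 4-fold accuracy of SGDClassifier on the given feature columns."""
    pipe = make_pipeline(StandardScaler(), SGDClassifier(random_state=42))
    return cross_val_score(pipe, iris.data[:, columns], iris.target,
                           cv=4, scoring="accuracy")

scores_two = cv_accuracy([2, 3])          # petal length and petal width
scores_four = cv_accuracy([0, 1, 2, 3])   # all four features

print('Two features: ', scores_two, scores_two.mean())
print('Four features:', scores_four, scores_four.mean())
```

Because each call builds and fits its own pipeline, the two results come from separately trained SGD models rather than from a previously fitted estimator.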
