# Introduction

This notebook contains two parts. **Part 1, Testing Perceptrons**, provides you an opportunity to demonstrate your ability to apply course concepts by implementing a test function for a perceptron. **Part 2, Mine Detection**, provides you an opportunity to practice using widely-used ML libraries and an ML workflow to solve a classification problem.

You do not need to complete Part 1 in order to complete Part 2. If you get stuck on Part 1, and choose to work on Part 2, be sure that all of your code for Part 1 runs without error. You can comment out your code in Part 1 if necessary.

# Part 1: Testing Perceptrons

Given a simple Perceptron classifier and a test set of cancer diagnoses, demonstrate your ability to implement a perceptron's `test` function, such that it returns the accuracy of its predictions for the class labels of samples in a test set.

## The Perceptron Implementation

Let's first introduce the classifier, which you should find familiar and do not need to modify. Notice that the `test` method is stubbed.

In [1]:
import random

class Perceptron:

    def __init__(self, alpha = 0.1, max_epochs = 1000):
        self.alpha = alpha
        self.max_epochs = max_epochs

    def train(self, training_set):
        self.weights = self._initial_weights(len(training_set[0]))
        for i in range(self.max_epochs):
            correct_classifications = 0
            for record in training_set:
                y = record[0]
                y_hat = self.predict(record[1:])
                if y == y_hat: correct_classifications += 1
                self._update_weights(y - y_hat, [1] + record[1:])
            if correct_classifications / len(training_set) == 1.0:
                print(f"Epoch {i} Accuracy: {correct_classifications / len(training_set)}")
                break

    def predict(self, features):
        return self._sign_of(self._dot_product(self.weights, [1] + features))

    def test(self, test_set):
        pass

    def _update_weights(self, error, features):
        self.weights[0] += self.alpha * error
        for i in range(1, len(self.weights)):
            self.weights[i] += self.alpha * error * features[i]

    def _dot_product(self, w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def _sign_of(self, value):
        return 1 if value >= 0 else -1

    def _initial_weights(self, length):
        return [random.uniform(0, 1) for _ in range(length)]

## The Data Set

There is no need for you to manually load the data set. We have provided a subset of the cancer diagnoses data set here, already split into a training set, `diagnoses_training`, and a test set, `diagnoses_test`. Each data set is a simple, two-dimensional Python list, where each sub-list represents the attributes for one diagnosis.

We have preprocessed the data, such that the **first dimension is the class label**: a `1` indicating malignant, and `-1` indicating benign.
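The label-first layout can be illustrated with a minimal sketch. The record below is a shortened, illustrative stand-in that reuses the label and first three feature values of the first training record; the names `example_record`, `label`, and `features` are assumptions made for this example only.

```python
# Shortened stand-in for one diagnosis record: label first, then features.
example_record = [1, 17.99, 10.38, 122.8]

label = example_record[0]      # class label: 1 = malignant, -1 = benign
features = example_record[1:]  # everything after the label

print(label)     # 1
print(features)  # [17.99, 10.38, 122.8]
```

This is the same slicing that the `train` method applies to each record (`record[0]` and `record[1:]`).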


In [2]:
diagnoses_training = [
    [1, 17.99, 10.38, 122.8, 1001, 0.1184, 0.2776, 0.3001, 0.1471, 0.2419, 0.07871, 1.095, 0.9053, 8.589, 153.4, 0.006399, 0.04904, 0.05373, 0.01587, 0.03003, 0.006193, 25.38, 17.33, 184.6, 2019, 0.1622, 0.6656, 0.7119, 0.2654, 0.4601, 0.1189],
    [1, 20.57, 17.77, 132.9, 1326, 0.08474, 0.07864, 0.0869, 0.07017, 0.1812, 0.05667, 0.5435, 0.7339, 3.398, 74.08, 0.005225, 0.01308, 0.0186, 0.0134, 0.01389, 0.003532, 24.99, 23.41, 158.8, 1956, 0.1238, 0.1866, 0.2416, 0.186, 0.275, 0.08902],
    [1, 19.69, 21.25, 130, 1203, 0.1096, 0.1599, 0.1974, 0.1279, 0.2069, 0.05999, 0.7456, 0.7869, 4.585, 94.03, 0.00615, 0.04006, 0.03832, 0.02058, 0.0225, 0.004571, 23.57, 25.53, 152.5, 1709, 0.1444, 0.4245, 0.4504, 0.243, 0.3613, 0.08758],
    [1, 11.42, 20.38, 77.58, 386.1, 0.1425, 0.2839, 0.2414, 0.1052, 0.2597, 0.09744, 0.4956, 1.156, 3.445, 27.23, 0.00911, 0.07458, 0.05661, 0.01867, 0.05963, 0.009208, 14.91, 26.5, 98.87, 567.7, 0.2098, 0.8663, 0.6869, 0.2575, 0.6638, 0.173],
    [1, 20.29, 14.34, 135.1, 1297, 0.1003, 0.1328, 0.198, 0.1043, 0.1809, 0.05883, 0.7572, 0.7813, 5.438, 94.44, 0.01149, 0.02461, 0.05688, 0.01885, 0.01756, 0.005115, 22.54, 16.67, 152.2, 1575, 0.1374, 0.205, 0.4, 0.1625, 0.2364, 0.07678],
    [1, 12.45, 15.7, 82.57, 477.1, 0.1278, 0.17, 0.1578, 0.08089, 0.2087, 0.07613, 0.3345, 0.8902, 2.217, 27.19, 0.00751, 0.03345, 0.03672, 0.01137, 0.02165, 0.005082, 15.47, 23.75, 103.4, 741.6, 0.1791, 0.5249, 0.5355, 0.1741, 0.3985, 0.1244],
    [1, 18.25, 19.98, 119.6, 1040, 0.09463, 0.109, 0.1127, 0.074, 0.1794, 0.05742, 0.4467, 0.7732, 3.18, 53.91, 0.004314, 0.01382, 0.02254, 0.01039, 0.01369, 0.002179, 22.88, 27.66, 153.2, 1606, 0.1442, 0.2576, 0.3784, 0.1932, 0.3063, 0.08368],
    [1, 13.71, 20.83, 90.2, 577.9, 0.1189, 0.1645, 0.09366, 0.05985, 0.2196, 0.07451, 0.5835, 1.377, 3.856, 50.96, 0.008805, 0.03029, 0.02488, 0.01448, 0.01486, 0.005412, 17.06, 28.14, 110.6, 897, 0.1654, 0.3682, 0.2678, 0.1556, 0.3196, 0.1151],
    [1, 13, 21.82, 87.5, 519.8, 0.1273, 0.1932, 0.1859, 0.09353, 0.235, 0.07389, 0.3063, 1.002, 2.406, 24.32, 0.005731, 0.03502, 0.03553, 0.01226, 0.02143, 0.003749, 15.49, 30.73, 106.2, 739.3, 0.1703, 0.5401, 0.539, 0.206, 0.4378, 0.1072],
    [1, 12.46, 24.04, 83.97, 475.9, 0.1186, 0.2396, 0.2273, 0.08543, 0.203, 0.08243, 0.2976, 1.599, 2.039, 23.94, 0.007149, 0.07217, 0.07743, 0.01432, 0.01789, 0.01008, 15.09, 40.68, 97.65, 711.4, 0.1853, 1.058, 1.105, 0.221, 0.4366, 0.2075],
    [1, 16.02, 23.24, 102.7, 797.8, 0.08206, 0.06669, 0.03299, 0.03323, 0.1528, 0.05697, 0.3795, 1.187, 2.466, 40.51, 0.004029, 0.009269, 0.01101, 0.007591, 0.0146, 0.003042, 19.19, 33.88, 123.8, 1150, 0.1181, 0.1551, 0.1459, 0.09975, 0.2948, 0.08452],
    [1, 15.78, 17.89, 103.6, 781, 0.0971, 0.1292, 0.09954, 0.06606, 0.1842, 0.06082, 0.5058, 0.9849, 3.564, 54.16, 0.005771, 0.04061, 0.02791, 0.01282, 0.02008, 0.004144, 20.42, 27.28, 136.5, 1299, 0.1396, 0.5609, 0.3965, 0.181, 0.3792, 0.1048],
    [1, 19.17, 24.8, 132.4, 1123, 0.0974, 0.2458, 0.2065, 0.1118, 0.2397, 0.078, 0.9555, 3.568, 11.07, 116.2, 0.003139, 0.08297, 0.0889, 0.0409, 0.04484, 0.01284, 20.96, 29.94, 151.7, 1332, 0.1037, 0.3903, 0.3639, 0.1767, 0.3176, 0.1023],
    [1, 15.85, 23.95, 103.7, 782.7, 0.08401, 0.1002, 0.09938, 0.05364, 0.1847, 0.05338, 0.4033, 1.078, 2.903, 36.58, 0.009769, 0.03126, 0.05051, 0.01992, 0.02981, 0.003002, 16.84, 27.66, 112, 876.5, 0.1131, 0.1924, 0.2322, 0.1119, 0.2809, 0.06287],
    [1, 13.73, 22.61, 93.6, 578.3, 0.1131, 0.2293, 0.2128, 0.08025, 0.2069, 0.07682, 0.2121, 1.169, 2.061, 19.21, 0.006429, 0.05936, 0.05501, 0.01628, 0.01961, 0.008093, 15.03, 32.01, 108.8, 697.7, 0.1651, 0.7725, 0.6943, 0.2208, 0.3596, 0.1431],
    [1, 14.54, 27.54, 96.73, 658.8, 0.1139, 0.1595, 0.1639, 0.07364, 0.2303, 0.07077, 0.37, 1.033, 2.879, 32.55, 0.005607, 0.0424, 0.04741, 0.0109, 0.01857, 0.005466, 17.46, 37.13, 124.1, 943.2, 0.1678, 0.6577, 0.7026, 0.1712, 0.4218, 0.1341],
    [1, 14.68, 20.13, 94.74, 684.5, 0.09867, 0.072, 0.07395, 0.05259, 0.1586, 0.05922, 0.4727, 1.24, 3.195, 45.4, 0.005718, 0.01162, 0.01998, 0.01109, 0.0141, 0.002085, 19.07, 30.88, 123.4, 1138, 0.1464, 0.1871, 0.2914, 0.1609, 0.3029, 0.08216],
    [1, 16.13, 20.68, 108.1, 798.8, 0.117, 0.2022, 0.1722, 0.1028, 0.2164, 0.07356, 0.5692, 1.073, 3.854, 54.18, 0.007026, 0.02501, 0.03188, 0.01297, 0.01689, 0.004142, 20.96, 31.48, 136.8, 1315, 0.1789, 0.4233, 0.4784, 0.2073, 0.3706, 0.1142],
    [1, 19.81, 22.15, 130, 1260, 0.09831, 0.1027, 0.1479, 0.09498, 0.1582, 0.05395, 0.7582, 1.017, 5.865, 112.4, 0.006494, 0.01893, 0.03391, 0.01521, 0.01356, 0.001997, 27.32, 30.88, 186.8, 2398, 0.1512, 0.315, 0.5372, 0.2388, 0.2768, 0.07615],
    [-1, 13.54, 14.36, 87.46, 566.3, 0.09779, 0.08129, 0.06664, 0.04781, 0.1885, 0.05766, 0.2699, 0.7886, 2.058, 23.56, 0.008462, 0.0146, 0.02387, 0.01315, 0.0198, 0.0023, 15.11, 19.26, 99.7, 711.2, 0.144, 0.1773, 0.239, 0.1288, 0.2977, 0.07259],
    [-1, 13.08, 15.71, 85.63, 520, 0.1075, 0.127, 0.04568, 0.0311, 0.1967, 0.06811, 0.1852, 0.7477, 1.383, 14.67, 0.004097, 0.01898, 0.01698, 0.00649, 0.01678, 0.002425, 14.5, 20.49, 96.09, 630.5, 0.1312, 0.2776, 0.189, 0.07283, 0.3184, 0.08183],
    [-1, 9.504, 12.44, 60.34, 273.9, 0.1024, 0.06492, 0.02956, 0.02076, 0.1815, 0.06905, 0.2773, 0.9768, 1.909, 15.7, 0.009606, 0.01432, 0.01985, 0.01421, 0.02027, 0.002968, 10.23, 15.66, 65.13, 314.9, 0.1324, 0.1148, 0.08867, 0.06227, 0.245, 0.07773],
    [1, 15.34, 14.26, 102.5, 704.4, 0.1073, 0.2135, 0.2077, 0.09756, 0.2521, 0.07032, 0.4388, 0.7096, 3.384, 44.91, 0.006789, 0.05328, 0.06446, 0.02252, 0.03672, 0.004394, 18.07, 19.08, 125.1, 980.9, 0.139, 0.5954, 0.6305, 0.2393, 0.4667, 0.09946],
    [1, 21.16, 23.04, 137.2, 1404, 0.09428, 0.1022, 0.1097, 0.08632, 0.1769, 0.05278, 0.6917, 1.127, 4.303, 93.99, 0.004728, 0.01259, 0.01715, 0.01038, 0.01083, 0.001987, 29.17, 35.59, 188, 2615, 0.1401, 0.26, 0.3155, 0.2009, 0.2822, 0.07526],
    [1, 16.65, 21.38, 110, 904.6, 0.1121, 0.1457, 0.1525, 0.0917, 0.1995, 0.0633, 0.8068, 0.9017, 5.455, 102.6, 0.006048, 0.01882, 0.02741, 0.0113, 0.01468, 0.002801, 26.46, 31.56, 177, 2215, 0.1805, 0.3578, 0.4695, 0.2095, 0.3613, 0.09564],
    [1, 17.14, 16.4, 116, 912.7, 0.1186, 0.2276, 0.2229, 0.1401, 0.304, 0.07413, 1.046, 0.976, 7.276, 111.4, 0.008029, 0.03799, 0.03732, 0.02397, 0.02308, 0.007444, 22.25, 21.4, 152.4, 1461, 0.1545, 0.3949, 0.3853, 0.255, 0.4066, 0.1059],
    [1, 14.58, 21.53, 97.41, 644.8, 0.1054, 0.1868, 0.1425, 0.08783, 0.2252, 0.06924, 0.2545, 0.9832, 2.11, 21.05, 0.004452, 0.03055, 0.02681, 0.01352, 0.01454, 0.003711, 17.62, 33.21, 122.4, 896.9, 0.1525, 0.6643, 0.5539, 0.2701, 0.4264, 0.1275],
    [1, 18.61, 20.25, 122.1, 1094, 0.0944, 0.1066, 0.149, 0.07731, 0.1697, 0.05699, 0.8529, 1.849, 5.632, 93.54, 0.01075, 0.02722, 0.05081, 0.01911, 0.02293, 0.004217, 21.31, 27.26, 139.9, 1403, 0.1338, 0.2117, 0.3446, 0.149, 0.2341, 0.07421],
    [1, 15.3, 25.27, 102.4, 732.4, 0.1082, 0.1697, 0.1683, 0.08751, 0.1926, 0.0654, 0.439, 1.012, 3.498, 43.5, 0.005233, 0.03057, 0.03576, 0.01083, 0.01768, 0.002967, 20.27, 36.71, 149.3, 1269, 0.1641, 0.611, 0.6335, 0.2024, 0.4027, 0.09876],
    [1, 17.57, 15.05, 115, 955.1, 0.09847, 0.1157, 0.09875, 0.07953, 0.1739, 0.06149, 0.6003, 0.8225, 4.655, 61.1, 0.005627, 0.03033, 0.03407, 0.01354, 0.01925, 0.003742, 20.01, 19.52, 134.9, 1227, 0.1255, 0.2812, 0.2489, 0.1456, 0.2756, 0.07919],
    [1, 18.63, 25.11, 124.8, 1088, 0.1064, 0.1887, 0.2319, 0.1244, 0.2183, 0.06197, 0.8307, 1.466, 5.574, 105, 0.006248, 0.03374, 0.05196, 0.01158, 0.02007, 0.00456, 23.15, 34.01, 160.5, 1670, 0.1491, 0.4257, 0.6133, 0.1848, 0.3444, 0.09782],
    [1, 11.84, 18.7, 77.93, 440.6, 0.1109, 0.1516, 0.1218, 0.05182, 0.2301, 0.07799, 0.4825, 1.03, 3.475, 41, 0.005551, 0.03414, 0.04205, 0.01044, 0.02273, 0.005667, 16.82, 28.12, 119.4, 888.7, 0.1637, 0.5775, 0.6956, 0.1546, 0.4761, 0.1402],
    [1, 17.02, 23.98, 112.8, 899.3, 0.1197, 0.1496, 0.2417, 0.1203, 0.2248, 0.06382, 0.6009, 1.398, 3.999, 67.78, 0.008268, 0.03082, 0.05042, 0.01112, 0.02102, 0.003854, 20.88, 32.09, 136.1, 1344, 0.1634, 0.3559, 0.5588, 0.1847, 0.353, 0.08482],
    [1, 19.27, 26.47, 127.9, 1162, 0.09401, 0.1719, 0.1657, 0.07593, 0.1853, 0.06261, 0.5558, 0.6062, 3.528, 68.17, 0.005015, 0.03318, 0.03497, 0.009643, 0.01543, 0.003896, 24.15, 30.9, 161.4, 1813, 0.1509, 0.659, 0.6091, 0.1785, 0.3672, 0.1123],
    [1, 16.13, 17.88, 107, 807.2, 0.104, 0.1559, 0.1354, 0.07752, 0.1998, 0.06515, 0.334, 0.6857, 2.183, 35.03, 0.004185, 0.02868, 0.02664, 0.009067, 0.01703, 0.003817, 20.21, 27.26, 132.7, 1261, 0.1446, 0.5804, 0.5274, 0.1864, 0.427, 0.1233],
    [1, 16.74, 21.59, 110.1, 869.5, 0.0961, 0.1336, 0.1348, 0.06018, 0.1896, 0.05656, 0.4615, 0.9197, 3.008, 45.19, 0.005776, 0.02499, 0.03695, 0.01195, 0.02789, 0.002665, 20.01, 29.02, 133.5, 1229, 0.1563, 0.3835, 0.5409, 0.1813, 0.4863, 0.08633],
    [1, 14.25, 21.72, 93.63, 633, 0.09823, 0.1098, 0.1319, 0.05598, 0.1885, 0.06125, 0.286, 1.019, 2.657, 24.91, 0.005878, 0.02995, 0.04815, 0.01161, 0.02028, 0.004022, 15.89, 30.36, 116.2, 799.6, 0.1446, 0.4238, 0.5186, 0.1447, 0.3591, 0.1014],
    [-1, 13.03, 18.42, 82.61, 523.8, 0.08983, 0.03766, 0.02562, 0.02923, 0.1467, 0.05863, 0.1839, 2.342, 1.17, 14.16, 0.004352, 0.004899, 0.01343, 0.01164, 0.02671, 0.001777, 13.3, 22.81, 84.46, 545.9, 0.09701, 0.04619, 0.04833, 0.05013, 0.1987, 0.06169],
    [1, 14.99, 25.2, 95.54, 698.8, 0.09387, 0.05131, 0.02398, 0.02899, 0.1565, 0.05504, 1.214, 2.188, 8.077, 106, 0.006883, 0.01094, 0.01818, 0.01917, 0.007882, 0.001754, 14.99, 25.2, 95.54, 698.8, 0.09387, 0.05131, 0.02398, 0.02899, 0.1565, 0.05504],
    [1, 13.48, 20.82, 88.4, 559.2, 0.1016, 0.1255, 0.1063, 0.05439, 0.172, 0.06419, 0.213, 0.5914, 1.545, 18.52, 0.005367, 0.02239, 0.03049, 0.01262, 0.01377, 0.003187, 15.53, 26.02, 107.3, 740.4, 0.161, 0.4225, 0.503, 0.2258, 0.2807, 0.1071],
    [1, 13.44, 21.58, 86.18, 563, 0.08162, 0.06031, 0.0311, 0.02031, 0.1784, 0.05587, 0.2385, 0.8265, 1.572, 20.53, 0.00328, 0.01102, 0.0139, 0.006881, 0.0138, 0.001286, 15.93, 30.25, 102.5, 787.9, 0.1094, 0.2043, 0.2085, 0.1112, 0.2994, 0.07146],
    [1, 10.95, 21.35, 71.9, 371.1, 0.1227, 0.1218, 0.1044, 0.05669, 0.1895, 0.0687, 0.2366, 1.428, 1.822, 16.97, 0.008064, 0.01764, 0.02595, 0.01037, 0.01357, 0.00304, 12.84, 35.34, 87.22, 514, 0.1909, 0.2698, 0.4023, 0.1424, 0.2964, 0.09606],
    [1, 19.07, 24.81, 128.3, 1104, 0.09081, 0.219, 0.2107, 0.09961, 0.231, 0.06343, 0.9811, 1.666, 8.83, 104.9, 0.006548, 0.1006, 0.09723, 0.02638, 0.05333, 0.007646, 24.09, 33.17, 177.4, 1651, 0.1247, 0.7444, 0.7242, 0.2493, 0.467, 0.1038],
    [1, 13.28, 20.28, 87.32, 545.2, 0.1041, 0.1436, 0.09847, 0.06158, 0.1974, 0.06782, 0.3704, 0.8249, 2.427, 31.33, 0.005072, 0.02147, 0.02185, 0.00956, 0.01719, 0.003317, 17.38, 28, 113.1, 907.2, 0.153, 0.3724, 0.3664, 0.1492, 0.3739, 0.1027],
    [1, 13.17, 21.81, 85.42, 531.5, 0.09714, 0.1047, 0.08259, 0.05252, 0.1746, 0.06177, 0.1938, 0.6123, 1.334, 14.49, 0.00335, 0.01384, 0.01452, 0.006853, 0.01113, 0.00172, 16.23, 29.89, 105.5, 740.7, 0.1503, 0.3904, 0.3728, 0.1607, 0.3693, 0.09618],
    [1, 18.65, 17.6, 123.7, 1076, 0.1099, 0.1686, 0.1974, 0.1009, 0.1907, 0.06049, 0.6289, 0.6633, 4.293, 71.56, 0.006294, 0.03994, 0.05554, 0.01695, 0.02428, 0.003535, 22.82, 21.32, 150.6, 1567, 0.1679, 0.509, 0.7345, 0.2378, 0.3799, 0.09185],
    [-1, 8.196, 16.84, 51.71, 201.9, 0.086, 0.05943, 0.01588, 0.005917, 0.1769, 0.06503, 0.1563, 0.9567, 1.094, 8.205, 0.008968, 0.01646, 0.01588, 0.005917, 0.02574, 0.002582, 8.964, 21.96, 57.26, 242.2, 0.1297, 0.1357, 0.0688, 0.02564, 0.3105, 0.07409]
]

diagnoses_test = [
    [-1, 8.95, 15.76, 58.74, 245.2, 0.09462, 0.1243, 0.09263, 0.02308, 0.1305, 0.07163, 0.3132, 0.9789, 3.28, 16.94, 0.01835, 0.0676, 0.09263, 0.02308, 0.02384, 0.005601, 9.414, 17.07, 63.34, 270, 0.1179, 0.1879, 0.1544, 0.03846, 0.1652, 0.07722],
    [1, 15.22, 30.62, 103.4, 716.9, 0.1048, 0.2087, 0.255, 0.09429, 0.2128, 0.07152, 0.2602, 1.205, 2.362, 22.65, 0.004625, 0.04844, 0.07359, 0.01608, 0.02137, 0.006142, 17.52, 42.79, 128.7, 915, 0.1417, 0.7917, 1.17, 0.2356, 0.4089, 0.1409],
    [-1, 11.34, 21.26, 72.48, 396.5, 0.08759, 0.06575, 0.05133, 0.01899, 0.1487, 0.06529, 0.2344, 0.9861, 1.597, 16.41, 0.009113, 0.01557, 0.02443, 0.006435, 0.01568, 0.002477, 13.01, 29.15, 83.99, 518.1, 0.1699, 0.2196, 0.312, 0.08278, 0.2829, 0.08832],
    [1, 20.92, 25.09, 143, 1347, 0.1099, 0.2236, 0.3174, 0.1474, 0.2149, 0.06879, 0.9622, 1.026, 8.758, 118.8, 0.006399, 0.0431, 0.07845, 0.02624, 0.02057, 0.006213, 24.29, 29.41, 179.1, 1819, 0.1407, 0.4186, 0.6599, 0.2542, 0.2929, 0.09873],
    [-1, 12.36, 18.54, 79.01, 466.7, 0.08477, 0.06815, 0.02643, 0.01921, 0.1602, 0.06066, 0.1199, 0.8944, 0.8484, 9.227, 0.003457, 0.01047, 0.01167, 0.005558, 0.01251, 0.001356, 13.29, 27.49, 85.56, 544.1, 0.1184, 0.1963, 0.1937, 0.08442, 0.2983, 0.07185],
    [1, 21.56, 22.39, 142, 1479, 0.111, 0.1159, 0.2439, 0.1389, 0.1726, 0.05623, 1.176, 1.256, 7.673, 158.7, 0.0103, 0.02891, 0.05198, 0.02454, 0.01114, 0.004239, 25.45, 26.4, 166.1, 2027, 0.141, 0.2113, 0.4107, 0.2216, 0.206, 0.07115],
    [-1, 9.777, 16.99, 62.5, 290.2, 0.1037, 0.08404, 0.04334, 0.01778, 0.1584, 0.07065, 0.403, 1.424, 2.747, 22.87, 0.01385, 0.02932, 0.02722, 0.01023, 0.03281, 0.004638, 11.05, 21.47, 71.68, 367, 0.1467, 0.1765, 0.13, 0.05334, 0.2533, 0.08468],
    [1, 20.13, 28.25, 131.2, 1261, 0.0978, 0.1034, 0.144, 0.09791, 0.1752, 0.05533, 0.7655, 2.463, 5.203, 99.04, 0.005769, 0.02423, 0.0395, 0.01678, 0.01898, 0.002498, 23.69, 38.25, 155, 1731, 0.1166, 0.1922, 0.3215, 0.1628, 0.2572, 0.06637],
    [-1, 12.63, 20.76, 82.15, 480.4, 0.09933, 0.1209, 0.1065, 0.06021, 0.1735, 0.0707, 0.3424, 1.803, 2.711, 20.48, 0.01291, 0.04042, 0.05101, 0.02295, 0.02144, 0.005891, 13.33, 25.47, 89, 527.4, 0.1287, 0.225, 0.2216, 0.1105, 0.2226, 0.08486],
    [1, 16.6, 28.08, 108.3, 858.1, 0.08455, 0.1023, 0.09251, 0.05302, 0.159, 0.05648, 0.4564, 1.075, 3.425, 48.55, 0.005903, 0.03731, 0.0473, 0.01557, 0.01318, 0.003892, 18.98, 34.12, 126.7, 1124, 0.1139, 0.3094, 0.3403, 0.1418, 0.2218, 0.0782],
    [-1, 14.26, 19.65, 97.83, 629.9, 0.07837, 0.2233, 0.3003, 0.07798, 0.1704, 0.07769, 0.3628, 1.49, 3.399, 29.25, 0.005298, 0.07446, 0.1435, 0.02292, 0.02566, 0.01298, 15.3, 23.73, 107, 709, 0.08949, 0.4193, 0, 0.1503, 0.07247, 0.2438, 0.08541],
    [1, 20.6, 29.33, 140.1, 1265, 0.1178, 0.277, 0.3514, 0.152, 0.2397, 0.07016, 0.726, 1.595, 5.772, 86.22, 0.006522, 0.06158, 0.07117, 0.01664, 0.02324, 0.006185, 25.74, 39.42, 184.6, 1821, 0.165, 0.8681, 0.9387, 0.265, 0.4087, 0.124]
]


## What to Do
 
Demonstrate your understanding of course concepts by implementing a `test` function for the perceptron. The function is stubbed for you below, in the subclass `TestablePerceptron`. It should return the accuracy rate of the perceptron's predictions, given `diagnoses_test`. In the end, your work should reflect the principles seen thus far in the course.

Please be sure to demonstrate:

1. Your implementation, as code, in the subclass below.
2. Use of your `test` function; this call is already written for you, following the class definition.

## Tips

- Be sure that you have spent time with the Exploration materials in this course.
- Ask questions on the course forum if you get stuck (describe what you are trying to do, and any errors that you encounter).
- Keep it simple. This is quite straightforward.
- Be sure to run the code cell containing your TestablePerceptron class and the code cell that invokes the `test` method, or use the *>> Run All* button.

## Implementation As Code

In the code cell below, implement the `test` function by replacing `pass` with your code. It should accept a test set as input, and *return* a number representing the accuracy of the perceptron based on predictions made with the test set.

In [3]:
class TestablePerceptron(Perceptron):
    
    def test(self, test_set):
        correct_classifications = 0
        for record in test_set:
            y = record[0]
            y_hat = self.predict(record[1:])
            if y == y_hat: correct_classifications += 1
        accuracy = correct_classifications / len(test_set)
        return accuracy

*In the code above, the subclass overrides the stubbed `test` method. The method takes `test_set` as input and returns the accuracy of the perceptron's predictions on that set.*

In [4]:
perceptron = TestablePerceptron(alpha = 0.1, max_epochs = 10000)
perceptron.train(diagnoses_training)
print(perceptron.test(diagnoses_test))

Epoch 4525 Accuracy: 1.0
0.9166666666666666


With your `test` method complete, the code above, which trains and tests a TestablePerceptron, should run without error. You do not need to modify that code, but once your `test` method is complete, you should see the accuracy of the test printed, instead of `None`.

## Conclusion of Part 1

Write a few sentences here that describe what your `test` function does in "plain English." Try expressing this accurately and authentically *without* giving us a line-by-line summary of what your function does (we can read the code for that). Tell us what the test error is. Then, describe how many learning epochs it took for the perceptron to achieve 100% accuracy during *training*, and explain why that number changes each time you run the code cell above.

*My test function evaluates a trained perceptron on a given test set by comparing each record's actual class label with the label the model predicts, and it returns the fraction of records predicted correctly. The test error is 1 minus that accuracy, that is, the proportion of test examples the model misclassifies. In the run shown above, the accuracy is about 0.917, so the test error is about 0.083 (1 of the 12 test records).*
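The relationship between accuracy and test error can be checked with a few lines of arithmetic. This is a small sketch that plugs in the accuracy printed by the run above and the number of records in `diagnoses_test`; the variable names are chosen for this example only.

```python
# Accuracy printed by the test run shown in this notebook.
accuracy = 0.9166666666666666
n_test = 12  # number of records in diagnoses_test

test_error = 1 - accuracy                   # proportion misclassified
misclassified = round(test_error * n_test)  # back to a whole-record count

print(f"Test error: {test_error:.4f}")            # Test error: 0.0833
print(f"Misclassified records: {misclassified}")  # Misclassified records: 1
```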

*In the run shown above, the perceptron reached 100% training accuracy at epoch 4525 (the epoch counter starts at 0). That number changes from run to run because the initial weights are drawn at random with `random.uniform(0, 1)`; the training records themselves are always presented in the same order. Different starting weights lead to a different sequence of weight updates, so the number of epochs needed to converge on a solution varies.*
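The Perceptron class in Part 1 draws its starting weights with `random.uniform`, which is the source of run-to-run variation. The sketch below mirrors `_initial_weights` as a standalone function and shows that fixing the seed makes the draw repeatable. The seed value 42 and the helper name `initial_weights` are arbitrary choices for this illustration, not part of the assignment.

```python
import random

def initial_weights(length):
    # Mirrors Perceptron._initial_weights: one uniform draw in [0, 1] per weight.
    return [random.uniform(0, 1) for _ in range(length)]

random.seed(42)              # arbitrary seed chosen for this sketch
first = initial_weights(3)

random.seed(42)              # re-seeding restarts the same pseudo-random sequence
second = initial_weights(3)

third = initial_weights(3)   # no re-seed: the sequence continues, so values differ

print(first == second)  # True
print(first == third)   # False
```

Calling `random.seed(...)` once before `train` would make the reported epoch count repeatable, at the cost of hiding the variation the question asks about.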


# Part 2: Mine Detection

In this second part of the notebook, you will build a classifier that can predict whether a sonar signature is from a mine or a rock. We'll use a version of the [sonar data set](https://www.openml.org/search?type=data&sort=runs&id=40&status=active) by Gorman and Sejnowski. Take a moment now to [familiarize yourself with the subject matter of this data set](https://datahub.io/machine-learning/sonar%23resource-sonar), and look at the details of the version of this data set, [Mines vs Rocks, hosted on Kaggle](https://www.kaggle.com/datasets/mattcarter865/mines-vs-rocks).

Unlike previous notebooks, where we provide code for each step of the ML process, this notebook expects each student to implement the ML workflow steps. We will get you started by providing the first step, loading the data, and providing some landmarks below. Your process should demonstrate:

1. Loading the data
2. Exploring the data
3. Preprocessing the data
4. Preparing the training and test sets
5. Creating and configuring a sklearn.linear_model.Perceptron
6. Training the perceptron
7. Testing the perceptron
8. Demonstrating making predictions
9. Evaluating (and improving) the results

Can you train a classifier that can predict whether a sonar signature is from a mine or a rock? "Three trained human subjects were each tested on 100 signals, chosen at random from the set of 208 returns used to create this data set. Their responses ranged between 88% and 97% correct." Can your classifier outperform the human subjects?



## Step 1: Load the Data

The notebook comes pre-bundled with the [Mines vs Rocks data set](https://www.kaggle.com/datasets/mattcarter865/mines-vs-rocks). Our first step is to create a pandas DataFrame from the CSV file. Note that the CSV file has no header row. Loading the CSV file into a DataFrame will make it easy for us to explore the data, preprocess it, and split it into training and test sets.


In [5]:
import pandas as pd
import matplotlib.pyplot as plt

sonar_csv_path = "../input/mines-vs-rocks/sonar.all-data.csv"
sonar_data = pd.read_csv(sonar_csv_path, header=None)

We now have a pandas DataFrame encapsulating the sonar data, and can proceed with our data exploration.

## Step 2: Explore the Data

*I plan to use the `info` method to display summary information: the number of rows and columns in the data frame, the data type of each column, and the count of non-null values per column, which reveals any missing values.*


In [6]:
# info() prints its summary directly and returns None, so no print() is needed
sonar_data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 208 entries, 0 to 207
Data columns (total 61 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       208 non-null    float64
 1   1       208 non-null    float64
 2   2       208 non-null    float64
 3   3       208 non-null    float64
 4   4       208 non-null    float64
 5   5       208 non-null    float64
 6   6       208 non-null    float64
 7   7       208 non-null    float64
 8   8       208 non-null    float64
 9   9       208 non-null    float64
 10  10      208 non-null    float64
 11  11      208 non-null    float64
 12  12      208 non-null    float64
 13  13      208 non-null    float64
 14  14      208 non-null    float64
 15  15      208 non-null    float64
 16  16      208 non-null    float64
 17  17      208 non-null    float64
 18  18      208 non-null    float64
 19  19      208 non-null    float64
 20  20      208 non-null    float64
 21  21      208 non-null    float64
 22  22

*I also think the `describe` method is useful for computing summary statistics for each numeric column in the data frame. It lets me look at statistical values such as the mean, standard deviation, min, and max of each numeric column.*

In [7]:
print(sonar_data.describe())

               0           1           2           3           4           5   \
count  208.000000  208.000000  208.000000  208.000000  208.000000  208.000000   
mean     0.029164    0.038437    0.043832    0.053892    0.075202    0.104570   
std      0.022991    0.032960    0.038428    0.046528    0.055552    0.059105   
min      0.001500    0.000600    0.001500    0.005800    0.006700    0.010200   
25%      0.013350    0.016450    0.018950    0.024375    0.038050    0.067025   
50%      0.022800    0.030800    0.034300    0.044050    0.062500    0.092150   
75%      0.035550    0.047950    0.057950    0.064500    0.100275    0.134125   
max      0.137100    0.233900    0.305900    0.426400    0.401000    0.382300   

               6           7           8           9   ...          50  \
count  208.000000  208.000000  208.000000  208.000000  ...  208.000000   
mean     0.121747    0.134799    0.178003    0.208259  ...    0.016069   
std      0.061788    0.085152    0.118387    0.1

*Running `sonar_data.info()` displays information about the data frame: the number of rows and columns, the data type of each column, and the non-null count per column. This is useful for understanding the structure of the data set.*

*Running `print(sonar_data.describe())` displays a table of summary statistics for each numeric column in the Mines vs Rocks data set, including the count, mean, standard deviation, min, and max. This gives me insight into the range and distribution of each predictor, which is helpful when making decisions about preprocessing and visualization.*

*In the `info` output, the feature columns have type float64, indicating that they are numeric variables. The last column (60) holds the class label, `R` or `M`, rather than a number. Every column listed reports 208 non-null values out of 208 rows, so I found no missing values.*
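Beyond `info` and `describe`, missing values and class balance can be checked directly. The following is a minimal sketch on a tiny made-up data frame (`toy`), laid out like `sonar_data` with numeric columns followed by a label column; the values are illustrative and not taken from the sonar data.

```python
import pandas as pd

# Toy stand-in for sonar_data: two numeric columns and a label column.
toy = pd.DataFrame({
    0: [0.02, 0.05, 0.03, 0.01],
    1: [0.04, 0.05, 0.06, 0.02],
    2: ["R", "R", "M", "M"],
})

missing_per_column = toy.isna().sum()  # count of missing values per column
label_counts = toy[2].value_counts()   # number of rows per class

print(missing_per_column)
print(toy.dtypes)
print(label_counts)
```

Applied to the real data, the same calls would be `sonar_data.isna().sum()` and `sonar_data[60].value_counts()`, assuming the label sits in column 60 as the `head` output suggests.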

## Step 3: Preprocess the Data

*This code works on the Mines vs Rocks data, which was loaded into a pandas data frame called `sonar_data`. I separated the predictor variables and the target variable from the data frame using `iloc`. The next step was to standardize the predictor variables: subtracting each column's mean and dividing by its standard deviation centers the data around 0 and scales each column to a standard deviation of 1.*

*Next, I encoded the target variable as a binary variable: the `map` function maps the values `R` and `M` to the numeric values 0 and 1. Lastly, I print the first few rows with the `head` method. Note that the call is made on `sonar_data`, so it displays the original rows rather than the preprocessed `X` and `y`.*



In [8]:
# Separate the predictor variables and the target variable
X = sonar_data.iloc[:, :-1]
y = sonar_data.iloc[:, -1]

# Standardize the predictor variables
X = (X - X.mean()) / X.std()

# Encode the target variable as a binary variable
y = y.map({'R': 0, 'M': 1})

print(sonar_data.head())

       0       1       2       3       4       5       6       7       8   \
0  0.0200  0.0371  0.0428  0.0207  0.0954  0.0986  0.1539  0.1601  0.3109   
1  0.0453  0.0523  0.0843  0.0689  0.1183  0.2583  0.2156  0.3481  0.3337   
2  0.0262  0.0582  0.1099  0.1083  0.0974  0.2280  0.2431  0.3771  0.5598   
3  0.0100  0.0171  0.0623  0.0205  0.0205  0.0368  0.1098  0.1276  0.0598   
4  0.0762  0.0666  0.0481  0.0394  0.0590  0.0649  0.1209  0.2467  0.3564   

       9   ...      51      52      53      54      55      56      57  \
0  0.2111  ...  0.0027  0.0065  0.0159  0.0072  0.0167  0.0180  0.0084   
1  0.2872  ...  0.0084  0.0089  0.0048  0.0094  0.0191  0.0140  0.0049   
2  0.6194  ...  0.0232  0.0166  0.0095  0.0180  0.0244  0.0316  0.0164   
3  0.1264  ...  0.0121  0.0036  0.0150  0.0085  0.0073  0.0050  0.0044   
4  0.4459  ...  0.0031  0.0054  0.0105  0.0110  0.0015  0.0072  0.0048   

       58      59  60  
0  0.0090  0.0032   R  
1  0.0052  0.0044   R  
2  0.0095  0.0078   

*The values in each row are the frequency response readings as loaded; because `head` was called on `sonar_data`, they are the original values, not the standardized ones stored in `X`. The last column indicates whether the sonar signal was reflected from a rock or a mine.*
 
*One thing to keep in mind is that the predictor variables in `X` have been standardized, so they are now comparable to each other in magnitude and scale. Another is that the target variable in `y` has been encoded as a binary variable, so it can be used as the target for a classification model.*
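The effect of the standardization and label encoding can be verified on a small example. The sketch below applies the same two expressions used in the preprocessing cell to a made-up data frame; `toy_X`, `toy_y`, and their values are assumptions for illustration only.

```python
import pandas as pd

# Made-up predictors and labels, standing in for X and y.
toy_X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})
toy_y = pd.Series(["R", "M", "M", "R"])

# Same z-score expression as the notebook; pandas' std() uses ddof=1 by default.
toy_X_std = (toy_X - toy_X.mean()) / toy_X.std()

# Same label encoding as the notebook.
toy_y_enc = toy_y.map({"R": 0, "M": 1})

print(toy_X_std.mean().round(6))  # each column's mean is 0 (up to rounding)
print(toy_X_std.std().round(6))   # each column's standard deviation is 1
print(toy_y_enc.tolist())         # [0, 1, 1, 0]
```

One design choice to be aware of: the notebook computes the mean and standard deviation over the full data set before splitting. A common alternative is to compute them on the training rows only and reuse those values for the test rows, so that no information from the test set influences preprocessing.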

## Step 4: Prepare the Training and Test Data Sets

*After separating the label column above, the next step is to use the `train_test_split` function to randomly split the data into two sets (training and testing). I used a 75/25 split: I set `test_size` to 0.25 and `random_state` to a fixed value of 50.*

In [9]:
from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=50)


*With a 75/25 split, 25% of the data is used for testing and 75% for training; for the 208 rows here, that is 156 training rows and 52 test rows. The value 50 for `random_state` is arbitrary. It seeds the random number generator that `train_test_split` uses to shuffle the rows, so the same split is obtained on every run.*
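The role of `random_state` can be demonstrated on a small synthetic array. In this sketch, `toy_X` and `toy_y` are made-up stand-ins for the sonar predictors and labels; only `train_test_split` and its `test_size` and `random_state` parameters come from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

toy_X = np.arange(40).reshape(20, 2)  # 20 rows, 2 features
toy_y = np.array([0, 1] * 10)         # alternating made-up labels

# Two independent calls with the same seed.
a_train, a_test, _, _ = train_test_split(toy_X, toy_y, test_size=0.25, random_state=50)
b_train, b_test, _, _ = train_test_split(toy_X, toy_y, test_size=0.25, random_state=50)

print(a_train.shape, a_test.shape)     # (15, 2) (5, 2)
print(np.array_equal(a_test, b_test))  # True: same seed, same split
```

`train_test_split` also accepts a `stratify` argument (for example `stratify=toy_y`) that keeps the class proportions similar in both sets, which may be worth considering for a data set as small as this one.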

## Step 5: Instantiate and Configure a Perceptron

*After reading the documentation for `sklearn.linear_model`, I planned to create an instance of the `Perceptron` class from the scikit-learn library. I configured the perceptron by setting three hyperparameters: the regularization strength (`alpha`), the penalty, and the maximum number of iterations.*

In [10]:
from sklearn.linear_model import Perceptron

perceptron = Perceptron(alpha=0.001, penalty='l2', max_iter=1000)

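*What I learned from the documentation is that `penalty` selects the regularization term, and `alpha` is the constant that multiplies the regularization term when `penalty` is not None. Note that `alpha` here is not the learning rate; in scikit-learn's `Perceptron` the learning rate is `eta0`, which I left at its default. I also left `max_iter` at its default of 1000. I am aware that changing hyperparameters, such as the learning rate and the number of iterations, can change the model's performance.*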
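The configured hyperparameters, including those left at their defaults, can be inspected with `get_params()`. This is a minimal sketch; the variable name `clf` is used here only to avoid overwriting the notebook's `perceptron` object.

```python
from sklearn.linear_model import Perceptron

clf = Perceptron(alpha=0.001, penalty="l2", max_iter=1000)
params = clf.get_params()

print(params["alpha"])     # regularization strength set above
print(params["penalty"])   # regularization term set above
print(params["max_iter"])  # maximum number of passes over the training data
print(params["eta0"])      # learning rate, left at its default
```

Listing the parameters this way makes the distinction between `alpha` (regularization) and `eta0` (learning rate) explicit before any tuning is attempted.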

## Step 6: Train the Perceptron

*During this step, I train the perceptron model on the training data using the `fit` method. The `fit` method adjusts the weights of the perceptron based on the input data and target labels so that it can make accurate predictions. When I train the perceptron on the training data, I expect it to learn a decision boundary that separates the two classes in the input data.*

In [11]:
perceptron.fit(X_train, y_train)
print(perceptron.coef_)

[[ 10.2078455   -5.88900334  -5.87087624   9.23508891  -0.06451157
   -1.52808759  -9.94071347  -6.62191235   8.76147944  -0.71174532
    1.77098057   9.42786181   3.56122738   2.39295479   2.07182407
   -4.96581888  -7.63042175   1.99202981   2.96006026   7.29242865
    5.77405439   0.92157968   3.58578393   8.085807     2.86091137
    3.55694463   4.13729908   0.70149901   4.63777617   5.24334197
   -7.8546351    8.57880863   1.61737225   1.51995502   1.6579728
   -6.46064903  -5.19526423   0.50418289  -2.40831971 -11.99577187
    0.2117059    1.12670828   3.68672559  -0.99358779   5.4950311
    5.45955366   1.15768667   8.1979555   10.40831806  -8.3024678
    8.08638964   7.01735493  -3.05577402   3.56346032  -8.67267147
   -1.98542422  -6.30834658   1.82566527   0.76688226   5.96767085]]


*After running the code, the perceptron has been fit to the training data, and the printed array shows the learned weights, one per input feature. Note that how well the model predicts depends on its hyperparameters as well as other factors, such as the quality of the data.*
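
*To show what `fit` leaves behind, here is a minimal sketch on hypothetical toy data (`X_toy`, `y_toy`, and `toy_model` are illustrative names, not part of this notebook's dataset). For a two-class problem, `coef_` holds one weight per feature and `intercept_` holds the bias term.*

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Hypothetical toy data: two features, label is 1 when x0 + x1 > 0.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)

toy_model = Perceptron(max_iter=1000, random_state=0)
toy_model.fit(X_toy, y_toy)

print(toy_model.coef_.shape)       # (1, 2): one weight per feature
print(toy_model.intercept_.shape)  # (1,): the bias term
print(toy_model.n_iter_)           # number of epochs run before stopping
```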

## Step 7: Test the Perceptron

**Training Set Accuracy**

*In this step, I evaluate the trained perceptron on the training data, using the `accuracy_score()` function to measure its accuracy. Comparing this figure with the test set accuracy lets me check whether the model is overfitting.*


In [12]:
from sklearn.metrics import accuracy_score

# make predictions on the training set
y_pred_train = perceptron.predict(X_train)

# calculate the accuracy of the perceptron on the training set
accuracy_train = accuracy_score(y_train, y_pred_train)

print(f"Perceptron training set accuracy: {accuracy_train}")

Perceptron training set accuracy: 0.7948717948717948


*After running the code, the training set accuracy is about 0.79. This score tells me how well the model fits the data it was trained on; it does not by itself show how the model will do on new data, which is what the test set is for. If the accuracy is low, the model is not fitting well and should be revised.*

**Testing Set Accuracy**

*In this step, I use the trained perceptron to make predictions on the test set, then calculate its accuracy by comparing the predicted labels with the true labels. Because the model never saw this data during training, a test accuracy well below the training accuracy would suggest overfitting.*

In [13]:
# make predictions on the test set
y_pred = perceptron.predict(X_test)

# calculate the accuracy of the perceptron on the test set
accuracy = accuracy_score(y_test, y_pred)

print(f"Perceptron test set accuracy: {accuracy}")

Perceptron test set accuracy: 0.7307692307692307


*The output is the accuracy of the perceptron on the test set, that is, the fraction of test instances it classified correctly: about 0.73 (38 of 52 samples), compared with about 0.79 on the training set.*
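
*To make "fraction of correctly classified instances" concrete, here is a small sketch that computes accuracy by hand and compares it with `accuracy_score`. The labels in `y_true_demo` and `y_pred_demo` are hypothetical and are not taken from the mine dataset.*

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels, not taken from the mine dataset.
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy is the share of positions where prediction and truth agree.
manual = sum(t == p for t, p in zip(y_true_demo, y_pred_demo)) / len(y_true_demo)

print(manual)                                    # 0.75
print(accuracy_score(y_true_demo, y_pred_demo))  # 0.75
```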

## Step 8: Demonstrate Making Predictions

*In this section, I use the trained perceptron to make predictions on data it did not see during training. The `predict` method returns an array with one predicted class label per instance. The goal is to show the model classifying new data using what it learned from the training data.*

In [14]:
y_pred = perceptron.predict(X_test)
print(y_pred)

[1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 0 1 1 0 0 0 1 1 1 1 1 0 1
 0 0 1 1 0 0 0 0 1 1 0 1 0 1 0]


*After running the code, we get an array of predicted class labels, one for each instance in the test set. These predictions reflect the patterns the model learned from the training data. The data can be noisy and contain outliers, which may affect the results.*
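
*As a sketch of how `predict` arrives at its labels, the example below trains a perceptron on hypothetical toy data and classifies three made-up rows in `X_new`. For a two-class problem with labels 0 and 1, the predicted label is 1 wherever the linear score from `decision_function` is positive. All names here are illustrative.*

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Hypothetical toy data: label is 1 when the first feature is positive.
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(100, 2))
y_toy = (X_toy[:, 0] > 0).astype(int)
toy_model = Perceptron(random_state=0).fit(X_toy, y_toy)

# Three made-up rows the model has not seen.
X_new = np.array([[2.0, 0.1], [-2.0, 0.3], [1.5, -0.4]])
labels = toy_model.predict(X_new)
scores = toy_model.decision_function(X_new)

print(labels.shape)                                      # (3,): one label per row
print(np.array_equal(labels, (scores > 0).astype(int)))  # True
```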

## Step 9: Evaluate (and Improve?)

Describe the configuration and performance of your classifier, and what the results mean. Is this a good classifier? Why isn't it better than what it is? What might you try next to improve it? How accurate can you make your classifier? If you have time, see if you can increase the accuracy. Can you beat 98%? 🙀

*The classifier is a perceptron configured with an L2 regularization penalty, a regularization strength (`alpha`) of 0.001, and a maximum of 1000 iterations. It is a linear binary classifier that learns to separate the input data into two classes. It scored an accuracy of 0.73 on the test set and 0.79 on the training set, which is moderate performance and leaves room for improvement.*

*Because a perceptron can only learn a linear decision boundary, the next thing I would try is a more complex model that can learn more complex patterns in the data.*

*The accuracy of the classifier could be improved by moving to a more complex model that incorporates different methods, such as deep learning techniques.*
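
*One way such a comparison might be set up is sketched below. This is not something I ran on the mine data: `X_demo` and `y_demo` are synthetic stand-ins from `make_classification`, and the choice of `MLPClassifier` with a single hidden layer of 32 units is an assumption made only for illustration. No scores are quoted because they depend entirely on the data used.*

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Hypothetical synthetic data standing in for the real features and labels.
X_demo, y_demo = make_classification(
    n_samples=208, n_features=60, n_informative=10, random_state=0
)

candidates = {
    "perceptron": Perceptron(alpha=0.001, penalty="l2", max_iter=1000),
    "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}

# 5-fold cross-validation gives a steadier estimate than a single split.
mean_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in mean_scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```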

## Conclusion

Give us a quick recap of what you've done here in **Part 2**. Mention _three_ things that were most notable in this process, whether it's related to exploration, preprocessing, configuring, training, or evaluating. If you put in the effort to try to improve the perceptron, describe what you did and what led you to try what you did, and describe the results. Conclude with some questions about the problem or what you might do next to increase the performance of your classifier.

*Throughout this notebook, I built a perceptron classifier with the scikit-learn library. I preprocessed the data by splitting it into a training set and a test set, trained the model on the training set, and evaluated it on both sets using the accuracy score metric.*

*Three things stood out in this process. The first is the simplicity of the perceptron model and its learning algorithm, which made it easy to implement. The second is how important preprocessing the input data can be, including making sure the training and test sets represent the same underlying distribution. The last is the need to watch carefully for underfitting or overfitting when configuring the hyperparameters (which is why I left `max_iter` at its default).*

*For future exploration, I want to dig deeper into other machine learning models. I want to learn more about how those models are used elsewhere, apply them to classifying this dataset, and compare the models in terms of performance.*