# Machine Learning Project

## Predict whether a mammogram mass is benign or malignant

Using the "mammographic masses" public dataset from the UCI repository (source: https://archive.ics.uci.edu/ml/datasets/Mammographic+Mass)

This data contains 961 instances of masses detected in mammograms, and contains the following attributes:


   1. BI-RADS assessment: 1 to 5 (ordinal)  
   2. Age: patient's age in years (integer)
   3. Shape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal)
   4. Margin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal)
   5. Density: mass density high=1 iso=2 low=3 fat-containing=4 (ordinal)
   6. Severity: benign=0 or malignant=1 (binominal)
   
BI-RADS is an assesment of how confident the severity classification is; it is not a "predictive" attribute and so we will discard it. The age, shape, margin, and density attributes are the features that we will build our model with, and "severity" is the classification we will attempt to predict based on those attributes.

Although "shape" and "margin" are nominal data types, which sklearn typically doesn't deal with well, they are close enough to ordinal that we shouldn't just discard them. The "shape" for example is ordered increasingly from round to irregular.

A lot of unnecessary anguish and surgery arises from false positives arising from mammogram results. If we can build a better way to interpret them through supervised machine learning, it could improve a lot of lives.

several different supervised machine learning techniques were applied to this data set, and using that I tried to determine which one yields the highest accuracy.
* Decision tree
* Random forest
* KNN
* Naive Bayes
* SVM
* Logistic Regression
* a neural network using Keras.


In [5]:
from azureml.core import Workspace
ws = Workspace.from_config()
from azureml.core import Experiment

In [None]:
for i in range (100):
    
    #cleaning and preparing the data
    import numpy as np
    import pandas as pd
    import sklearn
    from sklearn.utils import shuffle
    
    input_file = "mammographic_masses.data"
    df = pd.read_csv(input_file, header=0)
    df = shuffle(df)
    print(df.head())
    
    df = df.replace("?","NaN")
    df = df.rename(columns = {'5':'BI_RADS', '67':'age','3':'shape','5.1':'margin','3.1':'density','1':'severity'})
    df = df.drop("BI_RADS",axis=1)
    print(df.head())
    
    print(df.describe())
    
    #studying the data
    
    df['age'] = pd.to_numeric(df['age'], errors = 'coerce')
    meanAge = df.loc[:,'age'].mean()
    print('mean of age : ' + str(meanAge))

    df['shape'] = pd.to_numeric(df['shape'], errors = 'coerce')
    modeShape = df.loc[:,'shape'].mode()
    print('mode of shape : ' + str(modeShape))

    df['margin'] = pd.to_numeric(df['margin'], errors = 'coerce')
    modeMargin = df.loc[:,'margin'].mode()
    print('mode of margin : ' + str(modeMargin))

    df['density'] = pd.to_numeric(df['density'], errors = 'coerce')
    modeDensity = df.loc[:,'density'].mode()
    print('mode of density : ' + str(modeDensity))

    df['severity'] = pd.to_numeric(df['severity'], errors = 'coerce')
    modeSeverity = df.loc[:,'severity'].mode()
    print('mode of severity : ' + str(modeSeverity))

    print(df.count)
    
    #dropping null values    
    df = df.dropna()
    df.head(10)
    print(df.count)
    
    #Decision Tree model   
    experiment = Experiment(workspace=ws, name="MDHT-decisiontrees")
    run = experiment.start_logging()

    from sklearn.model_selection import train_test_split
    import numpy as np
    from pylab import *
    X_train1 = df.iloc[0:600, 0:4]
    y_train1 = df.iloc[0:600, 4:5]
    X_test1 = df.iloc[600: , 0:4]
    y_test1 = df.iloc[600: , 4:5]
    from sklearn import tree
    clf1 = tree.DecisionTreeClassifier()
    clf1 = clf1.fit(X_train1,y_train1)
    from sklearn import metrics
    y_pred1 = clf1.predict(X_test1)
    print("Accuracy of Decision Tree : ", metrics.accuracy_score(y_test1, y_pred1))
    run.log("accuracy", metrics.accuracy_score(y_test1, y_pred1))
    run.complete()
    
    #Random forest model   
    experiment = Experiment(workspace=ws, name="MDHT-randomforest")
    run = experiment.start_logging()

    from sklearn.ensemble import RandomForestClassifier
    from sklearn import metrics
    X3 = df.iloc[: , 0:4]
    y3 = df.iloc[: , 4:5]
    X_train3, X_test3, y_train3, y_test3 = train_test_split(X3, y3, test_size=0.25)
    clf3 = RandomForestClassifier(n_estimators=100)
    clf3.fit(X_train3, y_train3)
    y_pred3 = clf3.predict(X_test3)
    print("Accuracy of Random Forest : ",metrics.accuracy_score(y_test3, y_pred3))
    run.log("accuracy", metrics.accuracy_score(y_test3, y_pred3))
    run.complete()
    
    #SVM Linear kernel
    experiment = Experiment(workspace=ws, name="MDHT-SVMlinearkernel")
    run = experiment.start_logging()

    from sklearn import svm, metrics
    C=1.0
    X4 = df.iloc[: , 0:4]
    y4 = df.iloc[: , 4:5]
    X_train4, X_test4, y_train4, y_test4 = train_test_split(X4, y4, test_size=0.25)
    clf4 = svm.SVC(kernel='linear', C=C).fit(X4,y4)
    y_pred4 = clf4.predict(X_test4)
    print("Accuracy of SVM with linear kernel : ", metrics.accuracy_score(y_test4, y_pred4))
    run.log("accuracy", metrics.accuracy_score(y_test4, y_pred4))
    run.complete()
    
    #KNN
    experiment = Experiment(workspace=ws, name="MDHT-KNN")
    run = experiment.start_logging()

    from matplotlib import pyplot as plt
    from sklearn.metrics import confusion_matrix
    from sklearn.neighbors import KNeighborsClassifier
    import seaborn as sns
    X5 = df.iloc[: , 0:4]
    y5 = df.iloc[: , 4:5]
    X_train5, X_test5, y_train5, y_test5 = train_test_split(X5, y5, test_size=0.25)
    knn = KNeighborsClassifier(n_neighbors=10, metric = 'euclidean')
    knn.fit(X_train5, y_train5)
    y_pred5 = knn.predict(X_test5)
    print("Accuracy of KNN : ", metrics.accuracy_score(y_test5, y_pred5))
    run.log("accuracy", metrics.accuracy_score(y_test5, y_pred5))
    run.complete()
    
    # KNN i Neighbours
    experiment = Experiment(workspace=ws, name="MDHT-KNNineighbours")
    run = experiment.start_logging()
    l = []
    for i in range (1,51):
        X6 = df.iloc[: , 0:4]
        y6 = df.iloc[: , 4:5]
        X_train6, X_test6, y_train6, y_test6 = train_test_split(X6, y6, test_size=0.25)
        knn = KNeighborsClassifier(n_neighbors=i, metric = 'euclidean')
        knn.fit(X_train6, y_train6)
        y_pred6 = knn.predict(X_test6)
        l.append( metrics.accuracy_score(y_test6, y_pred6))
    print(max(l))
    run.log("accuracy", max(l))
    run.complete()
    
    #Naive Bayes
    experiment = Experiment(workspace=ws, name="MDHT-naivebayes")
    run = experiment.start_logging()

    from sklearn.naive_bayes import MultinomialNB
    X7 = df.iloc[: , 0:4]
    y7 = df.iloc[: , 4:5]
    X_train7, X_test7, y_train7, y_test7 = train_test_split(X7, y7, test_size=0.25)
    mnb = MultinomialNB()
    mnb.fit(X_train7, y_train7)
    y_pred7 = mnb.predict(X_test7)
    print ('Accuracy of Multinomial Naive Bayes : ' ,metrics.accuracy_score(y_test7, y_pred7) )
    run.log("accuracy", metrics.accuracy_score(y_test7, y_pred7))
    run.complete()
    
    #SVM rbf kerenel
    experiment = Experiment(workspace=ws, name="MDHT-SVMrbfkernel")
    run = experiment.start_logging()
    from sklearn import svm, metrics
    C=1.0
    X8 = df.iloc[: , 0:4]
    y8 = df.iloc[: , 4:5]
    X_train8, X_test8, y_train8, y_test8 = train_test_split(X8, y8, test_size=0.25)
    svm1 = svm.SVC(kernel='rbf', C=C).fit(X8,y8)
    y_pred8 = svm1.predict(X_test8)
    print("Accuracy of SVM with rbf kernel: ", metrics.accuracy_score(y_test8, y_pred8))
    run.log("accuracy", metrics.accuracy_score(y_test8, y_pred8))
    run.complete()
    
    #SVM sigmoid kernel
    experiment = Experiment(workspace=ws, name="MDHT-svmsigmoidkernel")
    run = experiment.start_logging()
    X9 = df.iloc[: , 0:4]
    y9 = df.iloc[: , 4:5]
    X_train9, X_test9, y_train9, y_test9 = train_test_split(X9, y9, test_size=0.25)
    svm2 = svm.SVC(kernel='sigmoid', C=C).fit(X9,y9)
    y_pred9 = svm2.predict(X_test9)
    print("Accuracy of SVM with sigmoid kernel: ", metrics.accuracy_score(y_test9, y_pred9))
    run.log("accuracy", metrics.accuracy_score(y_test9, y_pred9))
    run.complete()
    
    #Logistic regression
    experiment = Experiment(workspace=ws, name="MDHT-logisticregression")
    run = experiment.start_logging()
    from sklearn.linear_model import LogisticRegression
    X11 = df.iloc[: , 0:4]
    y11 = df.iloc[: , 4:5]
    X_train11, X_test11, y_train11, y_test11 = train_test_split(X11, y11, test_size=0.25)
    logreg = LogisticRegression()
    logreg.fit(X_train11,y_train11)
    y_pred11 = logreg.predict(X_test11)
    print("Accuracy of Logistic Regression : ", metrics.accuracy_score(y_test11, y_pred11))
    run.log("accuracy", metrics.accuracy_score(y_test11, y_pred11))
    run.complete()

    #Neural Networks (10 perceptrons - 3 layers)
    experiment = Experiment(workspace=ws, name="MDHT-neuralnetworks10perc3lay")
    run = experiment.start_logging()
    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPClassifier
    scaler = StandardScaler()
    X12 = df.iloc[: , 0:4]
    y12 = df.iloc[: , 4:5]
    X_train12, X_test12, y_train12, y_test12 = train_test_split(X12, y12, test_size=0.25)
    scaler.fit(X_train12)
    X_train12 = scaler.transform(X_train12)
    X_test12 = scaler.transform(X_test12)
    mlp1 = MLPClassifier(hidden_layer_sizes=(10,10,10), max_iter=1000)
    mlp1.fit(X_train12, y_train12.values.ravel())
    y_pred12 = mlp1.predict(X_test12)
    print("Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers : ", metrics.accuracy_score(y_test12, y_pred12))
    run.log("accuracy", metrics.accuracy_score(y_test12, y_pred12))
    run.complete()
    
    #Neural Networks (20 perceptrons - 3 layers)
    experiment = Experiment(workspace=ws, name="MDHT-neuralnetworks20perc3lay")
    run = experiment.start_logging()
    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPClassifier
    scaler = StandardScaler()
    X13 = df.iloc[: , 0:4]
    y13 = df.iloc[: , 4:5]
    X_train13, X_test13, y_train13, y_test13 = train_test_split(X13, y13, test_size=0.25)
    scaler.fit(X_train13)
    X_train13 = scaler.transform(X_train13)
    X_test13 = scaler.transform(X_test13)
    mlp2 = MLPClassifier(hidden_layer_sizes=(20,20,20), max_iter=1000)
    mlp2.fit(X_train13, y_train13.values.ravel())
    y_pred13 = mlp2.predict(X_test13)
    print("Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers : ", metrics.accuracy_score(y_test13, y_pred13))
    run.log("accuracy", metrics.accuracy_score(y_test13, y_pred13))
    run.complete()
    
    #Neural Networks (30 perceptrons - 3 layers)
    experiment = Experiment(workspace=ws, name="MDHT-neuralnetworks30perc3lay")
    run = experiment.start_logging()
    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPClassifier
    scaler = StandardScaler()
    X14 = df.iloc[: , 0:4]
    y14 = df.iloc[: , 4:5]
    X_train14, X_test14, y_train14, y_test14 = train_test_split(X14, y14, test_size=0.25)
    scaler.fit(X_train14)
    X_train14 = scaler.transform(X_train14)
    X_test14 = scaler.transform(X_test14)
    mlp3 = MLPClassifier(hidden_layer_sizes=(30,30,30), max_iter=1000)
    mlp3.fit(X_train14, y_train14.values.ravel())
    y_pred14 = mlp3.predict(X_test14)
    print("Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers : ", metrics.accuracy_score(y_test14, y_pred14))
    run.log("accuracy", metrics.accuracy_score(y_test14, y_pred14))
    run.complete()
    
    #Neural Networks (30 perceptrons - 4 layers)
    experiment = Experiment(workspace=ws, name="MDHT-neuralnetworks30perc4lay")
    run = experiment.start_logging()
    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPClassifier
    scaler = StandardScaler()
    X15 = df.iloc[: , 0:4]
    y15 = df.iloc[: , 4:5]
    X_train15, X_test15, y_train15, y_test15 = train_test_split(X15, y15, test_size=0.25)
    scaler.fit(X_train15)
    X_train15 = scaler.transform(X_train15)
    X_test15 = scaler.transform(X_test15)
    mlp4 = MLPClassifier(hidden_layer_sizes=(30,30,30,30), max_iter=1000)
    mlp4.fit(X_train15, y_train15.values.ravel())
    y_pred15 = mlp4.predict(X_test15)
    print("Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers : ", metrics.accuracy_score(y_test15, y_pred15))
    run.log("accuracy", metrics.accuracy_score(y_test15, y_pred15))
    run.complete()

     5  67  3 5.1 3.1  1
707  4  71  2   1   3  1
440  5  56  2   3   3  1
119  5  51  4   4   3  1
759  4  35  4   4   3  0
651  4  41  2   1   3  0
    age shape margin density  severity
707  71     2      1       3         1
440  56     2      3       3         1
119  51     4      4       3         1
759  35     4      4       3         0
651  41     2      1       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000
mean of age : 55.47539267015707
mode of shape : 0    4.0
dtype: float64
mode of margin : 0    1.0
dtype: float64
mode of density : 0    3.0
dtype: float64
mode of severity : 0    0
dtype: int64
<bound method DataFrame.count of       age  shape  margin  density  severity
707  71.0    2.0     1.0      3.0         1
440  56.0    2.0     3.0      3.0         1
119  51.0    4.0     4.0      3.0         1
759  35.0    4.0     4.0      3.0         0
651  41



Accuracy of Random Forest :  0.75


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.7836538461538461




Accuracy of KNN :  0.8028846153846154




0.8317307692307693


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7548076923076923


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8125


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5192307692307693


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.7596153846153846
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.7403846153846154
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8076923076923077
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.7548076923076923
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.8076923076923077
     5  67  3 5.1 3.1  1
154  4  59  3   4   3  0
804  5  31  4   4   2  1
199  4  49  2   1   3  0
741  4  60  2   1   3  0
639  4  67  4   4   3  1
    age shape margin density  severity
154  59     3      4       3         0
804  31     4      4       2         1
199  49     2      1       3         0
741  60     2      1       3         0
639  67     4      4       3         1
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.7451923076923077


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.7692307692307693




Accuracy of KNN :  0.8028846153846154




0.8509615384615384


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7115384615384616


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8269230769230769


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5192307692307693


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.7788461538461539
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.7980769230769231
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8269230769230769
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.7307692307692307
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7692307692307693
     5  67  3 5.1 3.1  1
516  5  76  4   5   3  1
191  5  60  3   1   3  0
104  4  54  1   1   ?  0
10   3  42  2   1   3  1
838  4  50  2   1   3  0
    age shape margin density  severity
516  76     4      5       3         1
191  60     3      1       3         0
104  54     1      1     NaN         0
10   42     2      1       3         1
838  50     2      1       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.7548076923076923


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.7884615384615384




Accuracy of KNN :  0.7355769230769231




0.8269230769230769


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.75


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8509615384615384


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5625


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.8509615384615384
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.8076923076923077
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8028846153846154
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.8509615384615384
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7932692307692307
     5  67  3 5.1 3.1  1
336  5  87  4   4   3  1
27   5  45  4   5   3  1
237  4  63  4   4   3  1
959  4  62  3   3   3  0
554  4  46  4   3   3  0
    age shape margin density  severity
336  87     4      4       3         1
27   45     4      5       3         1
237  63     4      4       3         1
959  62     3      3       3         0
554  46     4      3       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.7788461538461539


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.8221153846153846




Accuracy of KNN :  0.7644230769230769




0.8269230769230769


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7307692307692307


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.875


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.4807692307692308


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.8173076923076923
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.8413461538461539
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.7788461538461539
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.8125
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7932692307692307
     5  67  3 5.1 3.1  1
824  4  63  1   1   3  0
841  5  62  4   4   3  1
296  4  50  4   5   3  1
478  4  36  1   1   3  0
396  4  53  2   1   3  0
    age shape margin density  severity
824  63     1      1       3         0
841  62     4      4       3         1
296  50     4      5       3         1
478  36     1      1       3         0
396  53     2      1       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000
mean of age 



Accuracy of Random Forest :  0.7788461538461539


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.8028846153846154




Accuracy of KNN :  0.7644230769230769




0.8269230769230769


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7403846153846154


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8942307692307693


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.47115384615384615


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.8076923076923077
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.8269230769230769
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8125
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.7596153846153846
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.8125
     5  67  3 5.1 3.1  1
696  4  59  2   1   3  1
516  5  76  4   5   3  1
354  4  36  2   1   3  0
750  4  32  2   1   3  0
260  4  43  1   1   3  0
    age shape margin density  severity
696  59     2      1       3         1
516  76     4      5       3         1
354  36     2      1       3         0
750  32     2      1       3         0
260  43     1      1       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000
mean of age : 55.4753926



Accuracy of Random Forest :  0.7692307692307693


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.8221153846153846




Accuracy of KNN :  0.7596153846153846




0.8317307692307693


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7307692307692307


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8605769230769231


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5384615384615384


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.8413461538461539
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.8076923076923077
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8509615384615384
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.7836538461538461
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7644230769230769
     5  67  3 5.1 3.1  1
596  4  59  4   4   3  0
934  4  71  1   1   3  1
455  0  69  4   5   3  1
367  5  58  4   4   3  1
161  4  23  3   1   3  0
    age shape margin density  severity
596  59     4      4       3         0
934  71     1      1       3         1
455  69     4      5       3         1
367  58     4      4       3         1
161  23     3      1       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.7884615384615384


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.8413461538461539




Accuracy of KNN :  0.7548076923076923




0.8221153846153846


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7403846153846154


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8509615384615384


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.4807692307692308


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.8413461538461539
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.7980769230769231
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.7884615384615384
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.7932692307692307
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.8173076923076923
     5  67  3 5.1 3.1  1
81   3  68  1   1   3  1
462  4  18  1   1   3  0
474  4  36  1   1   3  0
238  4  24  2   1   2  0
370  3  46  1   ?   ?  0
    age shape margin density  severity
81   68     1      1       3         1
462  18     1      1       3         0
474  36     1      1       3         0
238  24     2      1       2         0
370  46     1    NaN     NaN         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.8076923076923077


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.7932692307692307




Accuracy of KNN :  0.7836538461538461




0.8269230769230769


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7115384615384616


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8413461538461539


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.4951923076923077


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.7692307692307693
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.8365384615384616
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8365384615384616
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.7740384615384616
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7884615384615384
     5  67  3 5.1 3.1  1
99   5  59  2   ?   ?  0
943  5  70  1   4   3  1
619  4  51  3   4   3  0
580  2  65  ?   1   2  0
867  2  23  1   1   3  0
    age shape margin density  severity
99   59     2    NaN     NaN         0
943  70     1      4       3         1
619  51     3      4       3         0
580  65   NaN      1       2         0
867  23     1      1       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.7163461538461539


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.8317307692307693




Accuracy of KNN :  0.7451923076923077




0.8173076923076923


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.8125


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8557692307692307


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5288461538461539


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.7836538461538461
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.7692307692307693
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8028846153846154
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.7788461538461539
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7788461538461539
     5  67  3 5.1 3.1  1
465  5  84  4   5   3  1
597  5  59  1   5   3  1
182  4  35  1   1   2  0
104  4  54  1   1   ?  0
802  4  53  2   1   3  0
    age shape margin density  severity
465  84     4      5       3         1
597  59     1      5       3         1
182  35     1      1       2         0
104  54     1      1     NaN         0
802  53     2      1       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.7403846153846154


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.8028846153846154




Accuracy of KNN :  0.7596153846153846




0.8413461538461539


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7644230769230769


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8076923076923077


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.8125
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.75
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8269230769230769
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.8076923076923077
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7692307692307693
     5  67  3 5.1 3.1  1
352  4  43  1   1   3  0
176  4  45  2   1   2  0
563  5  79  1   4   3  1
523  2  57  1   1   3  0
678  4  24  2   1   3  0
    age shape margin density  severity
352  43     1      1       3         0
176  45     2      1       2         0
563  79     1      4       3         1
523  57     1      1       3         0
678  24     2      1       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000
mean of age : 55.475392670



Accuracy of Random Forest :  0.7740384615384616


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.8269230769230769




Accuracy of KNN :  0.8365384615384616




0.8365384615384616


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7980769230769231


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8269230769230769


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.49038461538461536


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.7932692307692307
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.7932692307692307
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.7788461538461539
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.8269230769230769
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7836538461538461
     5  67  3 5.1 3.1  1
233  5  64  4   5   3  1
156  4  51  ?   ?   3  0
116  4  57  2   1   2  0
638  5  52  4   5   3  1
100  4  65  2   ?   ?  0
    age shape margin density  severity
233  64     4      5       3         1
156  51   NaN    NaN       3         0
116  57     2      1       2         0
638  52     4      5       3         1
100  65     2    NaN     NaN         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.7980769230769231


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.7932692307692307




Accuracy of KNN :  0.7451923076923077




0.8413461538461539


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7740384615384616


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8846153846153846


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.4855769230769231


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.7596153846153846
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.7836538461538461
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8221153846153846
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.7788461538461539
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.8317307692307693
     5  67  3 5.1 3.1  1
894  4  72  3   3   3  1
686  4  64  4   4   3  1
29   4  46  1   5   2  0
79   4  67  4   5   3  0
477  5  51  4   4   2  1
    age shape margin density  severity
894  72     3      3       3         1
686  64     4      4       3         1
29   46     1      5       2         0
79   67     4      5       3         0
477  51     4      4       2         1
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.7836538461538461


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.8365384615384616




Accuracy of KNN :  0.7980769230769231




0.8076923076923077


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7067307692307693


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8269230769230769


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5048076923076923


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.7692307692307693
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.75
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8269230769230769
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.8125
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7980769230769231
     5  67  3 5.1 3.1  1
926  4  20  1   1   3  0
498  5  80  4   5   3  1
707  4  71  2   1   3  1
774  3  39  1   1   3  0
193  4  50  1   1   3  0
    age shape margin density  severity
926  20     1      1       3         0
498  80     4      5       3         1
707  71     2      1       3         1
774  39     1      1       3         0
193  50     1      1       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000
mean of age : 55.475392670



Accuracy of Random Forest :  0.7596153846153846


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.7980769230769231




Accuracy of KNN :  0.8076923076923077




0.8317307692307693


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7355769230769231


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8798076923076923


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5576923076923077


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.8125
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.7980769230769231
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8413461538461539
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.7836538461538461
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7884615384615384
     5  67  3 5.1 3.1  1
688  5  43  1   4   3  1
359  4  88  4   4   3  1
239  5  72  4   4   3  1
40   4  78  1   1   1  0
431  4  39  2   3   3  0
    age shape margin density  severity
688  43     1      4       3         1
359  88     4      4       3         1
239  72     4      4       3         1
40   78     1      1       1         0
431  39     2      3       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000
mean of age 



Accuracy of Random Forest :  0.7548076923076923


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.7692307692307693




Accuracy of KNN :  0.7403846153846154




0.8221153846153846


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7115384615384616


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8461538461538461


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5048076923076923


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.7980769230769231
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.7980769230769231
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.8269230769230769
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.7403846153846154
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.8173076923076923
     5  67  3 5.1 3.1  1
760  5  77  3   3   3  1
430  5  36  4   3   3  1
886  5  65  4   5   3  1
884  4  65  2   4   3  1
135  4  59  4   4   3  1
    age shape margin density  severity
760  77     3      3       3         1
430  36     4      3       3         1
886  65     4      5       3         1
884  65     2      4       3         1
135  59     4      4       3         1
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.7548076923076923


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.7980769230769231




Accuracy of KNN :  0.7836538461538461




0.8365384615384616


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7403846153846154


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.875


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5096153846153846


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.8221153846153846
Accuracy of Neural Network with 10 perceptrons in each of the 3 hidden layers :  0.7884615384615384
Accuracy of Neural Network with 20 perceptrons in each of the 3 hidden layers :  0.7980769230769231
Accuracy of Neural Network with 30 perceptrons in each of the 3 hidden layers :  0.8028846153846154
Accuracy of Neural Network with 30 perceptrons in each of the 4 hidden layers :  0.7884615384615384
     5  67  3 5.1 3.1  1
318  4  64  4   4   3  1
860  5  64  4   4   3  1
98   4  33  2   1   3  0
58   5  59  2   ?   ?  1
245  4  40  2   1   3  0
    age shape margin density  severity
318  64     4      4       3         1
860  64     4      4       3         1
98   33     2      1       3         0
58   59     2    NaN     NaN         1
245  40     2      1       3         0
         severity
count  960.000000
mean     0.462500
std      0.498852
min      0.000000
25%      0.000000
50%      0.000000
75%      1.000000
max      1.000000




Accuracy of Random Forest :  0.75


  y = column_or_1d(y, warn=True)


Accuracy of SVM with linear kernel :  0.7884615384615384




Accuracy of KNN :  0.75




0.8173076923076923


  y = column_or_1d(y, warn=True)


Accuracy of Multinomial Naive Bayes :  0.7836538461538461


  y = column_or_1d(y, warn=True)


Accuracy of SVM with rbf kernel:  0.8365384615384616


  y = column_or_1d(y, warn=True)


Accuracy of SVM with sigmoid kernel:  0.5528846153846154


  y = column_or_1d(y, warn=True)


Accuracy of Logistic Regression :  0.7836538461538461
