# Using the MLP Algorithm to Classify Breast Cancer from the Wisconsin Dataset 

## Intro

* I'm a student at one of the Microsoft AI Schools in France. We are adapting our first, simple neural network of the perceptron type, so we keep a single hidden layer, even though more neurons and more hidden layers would probably improve the results.
* This notebook isn't intended to achieve the best performance; the goal is a simple, clean guideline.

---

# 1. Import Libraries

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from joblib import dump

# Model and hyperparameter search
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

# Preprocessing
from sklearn.preprocessing import StandardScaler

# Train/test split
from sklearn.model_selection import train_test_split

# Metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

from sklearn.datasets import load_breast_cancer

---

# 2. Loading the Dataset

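Attribute Information (scikit-learn version of the dataset):
* 30 numeric features per sample (the ID number from the original data file is not included)
* Diagnosis target (0 = Malignant, 1 = Benign)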

In [None]:
X, y = load_breast_cancer(return_X_y=True)
X.shape, y.shape

In [None]:
X = pd.DataFrame(X)
X.head()

In [None]:
y[:5]
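To see which column is which and how the two classes are balanced, the same dataset can be loaded as pandas objects with named columns. This is a minimal sketch, assuming a scikit-learn version that supports `as_frame=True`; the names `data`, `features` and `target` are illustrative and not used elsewhere in the notebook.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True returns pandas objects that keep the original feature names
data = load_breast_cancer(as_frame=True)
features = data.data    # DataFrame: one row per sample, one column per feature
target = data.target    # Series: 0 = malignant, 1 = benign

print(features.shape)
print(list(data.target_names))
print(target.value_counts().sort_index())
```

The class counts show that the dataset is moderately imbalanced, which is one reason to stratify the split in the next step.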


---

# 3. Split the Dataset

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=0, 
    shuffle=True,
    stratify=y
)

X_train.shape, y_train.shape, X_test.shape, y_test.shape
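A quick way to confirm what `stratify=y` does is to compare the share of benign samples in the full data, the training set and the test set. The sketch below repeats the split so that it runs on its own; the `frac_*` names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, shuffle=True, stratify=y
)

# Labels are 0/1, so the mean is the fraction of benign samples (label 1)
frac_all = y.mean()
frac_train = y_train.mean()
frac_test = y_test.mean()
print(round(frac_all, 3), round(frac_train, 3), round(frac_test, 3))
```

With stratification the three fractions should be nearly identical; without it they can drift apart on a dataset of this size.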

---

# 4. Preprocessing

In [None]:
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
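The scaler is fit on the training set only and then reused on the test set. A short check, written as a self-contained sketch, shows the consequence: the training columns end up with mean 0 and unit standard deviation, while the test columns are only approximately centred.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from the training set
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

# Largest absolute column mean on the training set (expected to be close to 0)
print(np.abs(X_train_scaled.mean(axis=0)).max())
# Column standard deviations on the training set (expected to be 1)
print(X_train_scaled.std(axis=0).round(3))
# Column means on the test set (close to 0, but not exactly)
print(X_test_scaled.mean(axis=0).round(2))
```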

In [None]:
# Save the scaled test set together with its labels for later evaluation
df_test = pd.DataFrame(X_test_scaled)
df_test["y"] = y_test
df_test.to_csv("df_test.csv")

---

# 5. GridSearch + Training

In [None]:
params = [{
    "hidden_layer_sizes": np.arange(2, 30),
    "random_state": np.arange(51),
    "activation": ["identity", "logistic", "tanh", "relu"]
}]
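Before launching the search it helps to know how many models it will train. The sketch below counts the candidates with scikit-learn's `ParameterGrid`; the multiplication by 5 assumes the default `cv=5` of `GridSearchCV`, since no `cv` argument is passed in this notebook.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

params = [{
    "hidden_layer_sizes": np.arange(2, 30),   # 28 values: 2 to 29 neurons
    "random_state": np.arange(51),            # 51 seeds: 0 to 50
    "activation": ["identity", "logistic", "tanh", "relu"],  # 4 activations
}]

n_candidates = len(ParameterGrid(params))   # 28 * 51 * 4
n_fits = n_candidates * 5                   # assumption: default 5-fold cross-validation
print(n_candidates, n_fits)                 # → 5712 28560
```

Most of that cost comes from treating `random_state` as a hyperparameter; a shorter list of seeds would shrink the search proportionally.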

In [None]:
model = GridSearchCV(
    MLPClassifier(
        solver='lbfgs', 
        alpha=0.0001, 
        max_iter=10000, 
        random_state=0,  # overridden by the random_state values in params
        max_fun=15000
    ),
    params,
    n_jobs=-1,
    verbose=8
)

model.fit(X_train_scaled, y_train)
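Because the scaler was fit on the whole training set before the search, every validation fold has already contributed to the scaling statistics. One common alternative, which this notebook does not use, is to put the scaler and the classifier in a `Pipeline` so that scaling is refit inside each fold. The sketch below uses a deliberately tiny grid; the step names `scaler` and `mlp` and the grid values are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(solver="lbfgs", max_iter=10000, random_state=0)),
])

# Parameter names are prefixed with the pipeline step name and a double underscore.
# The grid is kept tiny so the sketch runs quickly.
small_params = {
    "mlp__hidden_layer_sizes": [(5,), (10,)],
    "mlp__activation": ["tanh", "relu"],
}

search = GridSearchCV(pipe, small_params, cv=5)
search.fit(X_train, y_train)   # raw features: scaling happens inside each fold
print(search.best_params_, search.best_score_)
```

With this layout `search.best_estimator_` contains both the fitted scaler and the network, so a single object can be saved and applied to unscaled data.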

In [None]:
model.best_params_

In [None]:
model.best_score_
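`best_params_` and `best_score_` only describe the winning candidate. To compare all candidates, the `cv_results_` attribute can be turned into a DataFrame and sorted by rank. The sketch below uses a reduced grid with illustrative values so that it finishes quickly; with the notebook's own `model`, the last three lines would apply unchanged to `model.cv_results_`.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, _, y_train, _ = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
X_train_scaled = StandardScaler().fit_transform(X_train)

# Reduced grid (illustrative values only)
small_params = {"hidden_layer_sizes": [2, 5, 10], "activation": ["tanh", "relu"]}
search = GridSearchCV(
    MLPClassifier(solver="lbfgs", max_iter=10000, random_state=0),
    small_params,
)
search.fit(X_train_scaled, y_train)

# One row per candidate, best candidate first
results = pd.DataFrame(search.cv_results_)
cols = ["param_hidden_layer_sizes", "param_activation",
        "mean_test_score", "std_test_score", "rank_test_score"]
print(results.sort_values("rank_test_score")[cols])
```

Looking at `std_test_score` next to `mean_test_score` helps judge whether the best candidate is meaningfully better than the runners-up or within fold-to-fold noise.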

---

# 6. Performance on the Training Set

In [None]:
# Predict
y_pred_train = model.predict(X_train_scaled)

# Confusion Matrix
cm_train = confusion_matrix(y_train, y_pred_train, normalize='true')

# Graph
plt.figure(figsize=(12, 7))
plt.title(f'Accuracy Score on the Training Set: {accuracy_score(y_train, y_pred_train):.4f}', size=25)
sns.heatmap(cm_train, annot=True, fmt='.2%', cmap='Blues')
plt.xlabel('Predicted Values', size=20)
plt.ylabel('True Values', size=20)
plt.show()
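Accuracy on the training set says little about generalisation, because the model has already seen those samples. The same metrics can be computed on the held-out test set. The sketch below is self-contained and uses a single `MLPClassifier` with an arbitrary `hidden_layer_sizes=(10,)` as a stand-in for the grid-searched `model`, so its numbers are not the notebook's results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Stand-in for the grid-searched model; the layer size is an arbitrary choice
clf = MLPClassifier(solver="lbfgs", hidden_layer_sizes=(10,),
                    max_iter=10000, random_state=0)
clf.fit(X_train_scaled, y_train)

y_pred_test = clf.predict(X_test_scaled)
test_accuracy = float(accuracy_score(y_test, y_pred_test))
# normalize="true" divides each row by the number of true samples in that class
cm_test = confusion_matrix(y_test, y_pred_test, normalize="true")

print(round(test_accuracy, 4))
print(cm_test)
```

In the notebook itself, replacing `clf` with `model` and reusing the heatmap code from the cell above would give the matching plot for the test set.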

---

# 7. Save the Model

In [None]:
saved_model = model.best_estimator_
dump(saved_model, "model.joblib")
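To check that the saved file is usable, it can be loaded back with `joblib.load` and compared against the in-memory model. The sketch below is self-contained: it trains a small stand-in classifier instead of `model.best_estimator_` and writes to a temporary directory, which is an assumption made only to avoid side effects.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Stand-in for model.best_estimator_; the layer size is an arbitrary choice
clf = MLPClassifier(solver="lbfgs", hidden_layer_sizes=(10,),
                    max_iter=10000, random_state=0)
clf.fit(X_scaled, y)

# Save and reload inside a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    dump(clf, path)
    restored = load(path)

# The restored model should reproduce the original predictions exactly
same = np.array_equal(clf.predict(X_scaled), restored.predict(X_scaled))
print(same)
```

Note that the saved network expects inputs scaled with the same `StandardScaler`, so the fitted scaler would need to be saved alongside it (for example with a second `dump` call) before the model can be applied to new raw data.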