# Artificial Neural Network

* An artificial neural network is a powerful machine learning algorithm, loosely modeled on the information processing system of the human brain, that can be used for classification or regression problems.



* This model tends to overfit.

## 1-)Data Preprocessing

In [1]:
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split

In [3]:
diabetes = pd.read_csv("diabetes.csv")
df = diabetes.copy()
df = df.dropna()
y = df["Outcome"]
X = df.drop(['Outcome'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.30, 
                                                    random_state=42)

### 1.1) Data standardization

* If the input features of an artificial neural network are on very different scales, the features with larger values can dominate and bias the output.


* Standardizing the input values solves this problem, so that each input contributes to the output as much as it should rather than in proportion to its scale.

In [4]:
from sklearn.preprocessing import StandardScaler 

In [5]:
scaler = StandardScaler()

In [6]:
scaler.fit(X_train)

StandardScaler()

In [7]:
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [9]:
X_test_scaled[0:1]

array([[ 0.69748316, -0.70719864, -0.64639893,  0.81207927,  0.95720244,
         0.26575953, -0.11680393,  0.85019217]])

In [10]:
X_train_scaled[0:1]

array([[-0.8362943 , -0.80005088, -0.53576428, -0.15714558, -0.18973183,
        -1.06015343, -0.61421636, -0.94861028]])
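
As a rough check of what the scaler does, the sketch below reproduces `StandardScaler` by hand on a small made-up array (the values and the `*_demo` names are illustrative and are not taken from `diabetes.csv`). Each column is shifted by the training mean and divided by the training standard deviation, and the same training statistics are reused for the test rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data standing in for X_train / X_test (hypothetical values).
X_train_demo = np.array([[1.0, 100.0],
                         [2.0, 300.0],
                         [3.0, 500.0]])
X_test_demo = np.array([[2.5, 200.0]])

# Fit on the training rows only, then transform both sets.
demo_scaler = StandardScaler().fit(X_train_demo)
train_scaled = demo_scaler.transform(X_train_demo)
test_scaled = demo_scaler.transform(X_test_demo)

# Manual z-score using the training mean and (population) standard deviation.
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)
manual_test_scaled = (X_test_demo - mean) / std

print(np.allclose(test_scaled, manual_test_scaled))
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

Fitting on the training set only keeps information about the test rows out of the preprocessing step, which is why `scaler.fit` is called on `X_train` alone in the cells above.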

## 2-)Model

In [13]:
from warnings import filterwarnings
# Hide warnings (for example convergence warnings) to keep the output short
filterwarnings('ignore')

In [14]:
from sklearn.neural_network import MLPClassifier

In [15]:
mlpc = MLPClassifier().fit(X_train_scaled, y_train)

## 3-)Prediction

In [16]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

In [19]:
y_pred = mlpc.predict(X_test_scaled)
y_pred[0:10]

array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int64)

In [20]:
accuracy_score(y_test, y_pred) # before model tuning

0.7359307359307359

In [21]:
confusion_matrix(y_test, y_pred) # before model tuning

array([[121,  30],
       [ 31,  49]], dtype=int64)

In [22]:
print(classification_report(y_test, y_pred)) # before model tuning

              precision    recall  f1-score   support

           0       0.80      0.80      0.80       151
           1       0.62      0.61      0.62        80

    accuracy                           0.74       231
   macro avg       0.71      0.71      0.71       231
weighted avg       0.74      0.74      0.74       231
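
The numbers in the report can be recomputed from the confusion matrix. The sketch below copies the matrix from the output above and derives the accuracy and the precision, recall and f1-score of class 1 by hand; the variable names are only illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above:
# rows are the true classes (0, 1), columns are the predicted classes (0, 1).
cm = np.array([[121, 30],
               [31, 49]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()      # correct predictions / all predictions
precision_1 = tp / (tp + fp)         # of the predicted 1s, how many are truly 1
recall_1 = tp / (tp + fn)            # of the true 1s, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(accuracy, precision_1, recall_1, f1_1)
```

The row sums of the matrix (151 and 80) are the `support` column of the report.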



## 3-) Model tuning

* In this section, we will search for the best values of **alpha, hidden_layer_sizes, solver, activation function** with the GridSearchCV method.


* GridSearchCV: grid search with cross-validation. Every combination of the candidate values is fitted and scored by cross-validation.



* **alpha, hidden_layer_sizes, solver, activation function** are hyperparameters: they are not learned from the data, so we have to choose them ourselves.



* Instead of relying on intuition to choose these hyperparameters, we let the grid search find the best combination and then build the final model with those values.





In [23]:
mlpc.alpha # default value of alpha 

0.0001

* **alpha** is the strength of the regularization term (an L2 penalty). It controls how strongly large weights are penalized while the weights are optimized.
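
To make the role of **alpha** more concrete, here is a small sketch of the L2 penalty in the form given in the scikit-learn user guide, where alpha / (2n) times the sum of squared weights is added to the loss. The helper `l2_penalty` and the weight values are hypothetical and only serve as an illustration; they are not part of scikit-learn or of the fitted model.

```python
import numpy as np

def l2_penalty(weight_matrices, alpha, n_samples):
    """Illustrative L2 term: alpha / (2 * n_samples) * sum of squared weights."""
    squared_sum = sum(np.sum(w ** 2) for w in weight_matrices)
    return alpha * squared_sum / (2 * n_samples)

# Hypothetical weights of a tiny network (not taken from the fitted model).
weights = [np.array([[1.0, 2.0],
                     [3.0, 4.0]])]

# The same weights are penalized more heavily as alpha grows.
for alpha in (0.0001, 0.1):
    print(alpha, l2_penalty(weights, alpha, n_samples=10))
```

A larger alpha pushes the optimizer toward smaller weights, which is one way to counter the tendency to overfit.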

In [24]:
mlpc.hidden_layer_sizes # default value of  hidden_layer_sizes

(100,)

**hidden_layer_sizes = (100,20)** means that:

* The number of hidden layers is 2. In the usual diagram of a multilayer perceptron, the hidden layers appear as two columns.




* The 1st column is called the 1st hidden layer.


* The 2nd column is called the 2nd hidden layer.


* The size of the 1st hidden layer is 100. This means that there are 100 nodes (neurons) from bottom to top in the 1st hidden layer.


* The size of the 2nd hidden layer is 20. This means that there are 20 nodes (neurons) from bottom to top in the 2nd hidden layer.
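
The layer sizes can be read back from a fitted model. The sketch below fits a network with `hidden_layer_sizes=(100, 20)` on synthetic data with 8 features (an assumption chosen to mirror the 8 scaled features above; the data itself is made up) and prints the shape of each weight matrix.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary data with 8 features (a stand-in for the scaled features).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Two hidden layers: 100 neurons, then 20 neurons.
# max_iter is kept small because only the layer shapes matter here.
demo_mlp = MLPClassifier(hidden_layer_sizes=(100, 20), max_iter=50, random_state=0)
demo_mlp.fit(X_demo, y_demo)

# One weight matrix per connection between consecutive layers.
shapes = [w.shape for w in demo_mlp.coefs_]
print(shapes)
print(demo_mlp.n_layers_)  # input layer + hidden layers + output layer
```

Each shape is (neurons in the previous layer, neurons in the next layer), so the tuple passed to `hidden_layer_sizes` shows up directly in the matrices.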


In [25]:
mlpc.solver # default value of solver

'adam'

* **solver** specifies which optimization algorithm is used to update the weights of the neural network.

In [26]:
mlpc.activation # default value of activation function

'relu'

* **activation** refers to the activation function of the hidden layers. The weighted sum of a neuron's inputs is passed to this function, and the function's result is the neuron's output.


* The available options are 'identity', 'logistic', 'tanh' and 'relu'.


* Which function works best depends on the data set (for example, on its size), so it is worth treating as a hyperparameter.
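
For reference, the four options correspond to the following standard formulas. This is a plain NumPy sketch of the math, not scikit-learn's internal code, and the helper names are only illustrative.

```python
import numpy as np

def identity(z):
    # No transformation: f(z) = z
    return z

def logistic(z):
    # Sigmoid: squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Rectified linear unit: f(z) = max(0, z)
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
for name, fn in [("identity", identity), ("logistic", logistic),
                 ("tanh", np.tanh), ("relu", relu)]:
    print(name, fn(z))
```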

In [27]:
mlpc_params = {"alpha": [0.1, 0.01, 0.02, 0.005, 0.0001,0.00001],
              "hidden_layer_sizes": [(10,10,10),
                                     (100,100,100),
                                     (100,100),
                                     (3,5), 
                                     (5, 3)],
              "solver" : ["lbfgs","adam","sgd"],
              "activation": ["relu","logistic"]}

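
The size of this grid can be checked before starting the search. The sketch below counts the combinations with `ParameterGrid`; the grid is repeated so that the block runs on its own, and the 10 folds are an assumption about the cross-validation setting.

```python
from sklearn.model_selection import ParameterGrid

grid = {"alpha": [0.1, 0.01, 0.02, 0.005, 0.0001, 0.00001],
        "hidden_layer_sizes": [(10, 10, 10), (100, 100, 100),
                               (100, 100), (3, 5), (5, 3)],
        "solver": ["lbfgs", "adam", "sgd"],
        "activation": ["relu", "logistic"]}

n_candidates = len(ParameterGrid(grid))  # 6 * 5 * 3 * 2 combinations
n_folds = 10                             # assumed number of CV folds
print(n_candidates, n_candidates * n_folds)
```

Because every added value multiplies the number of fits, the grid grows quickly, and it is worth counting the combinations before launching a long search.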

In [28]:
mlpc = MLPClassifier()

In [30]:
from sklearn.model_selection import GridSearchCV

In [31]:
mlpc_cv_model = GridSearchCV(mlpc, mlpc_params, 
                         cv = 10, 
                         n_jobs = -1,
                         verbose = 2)

In [32]:
mlpc_cv_model.fit(X_train_scaled, y_train)

Fitting 10 folds for each of 180 candidates, totalling 1800 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   11.7s
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:  1.1min
[Parallel(n_jobs=-1)]: Done 357 tasks      | elapsed:  2.6min
[Parallel(n_jobs=-1)]: Done 640 tasks      | elapsed:  4.3min
[Parallel(n_jobs=-1)]: Done 1005 tasks      | elapsed:  6.4min
[Parallel(n_jobs=-1)]: Done 1450 tasks      | elapsed:  8.1min
[Parallel(n_jobs=-1)]: Done 1800 out of 1800 | elapsed:  9.3min finished


GridSearchCV(cv=10, estimator=MLPClassifier(), n_jobs=-1,
             param_grid={'activation': ['relu', 'logistic'],
                         'alpha': [0.1, 0.01, 0.02, 0.005, 0.0001, 1e-05],
                         'hidden_layer_sizes': [(10, 10, 10), (100, 100, 100),
                                                (100, 100), (3, 5), (5, 3)],
                         'solver': ['lbfgs', 'adam', 'sgd']},
             verbose=2)

In [33]:
mlpc_cv_model.best_params_

{'activation': 'relu',
 'alpha': 0.0001,
 'hidden_layer_sizes': (100, 100, 100),
 'solver': 'sgd'}
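
Retyping the best parameters by hand is easy to get wrong. As a hedged alternative, the sketch below shows two ways to reuse the search result directly: `best_estimator_`, and unpacking `best_params_` into a new model. It runs on synthetic data with a deliberately small grid (both are assumptions made to keep the example fast), so its results say nothing about the diabetes data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data and a small illustrative grid.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
small_grid = {"alpha": [0.0001, 0.1], "hidden_layer_sizes": [(5,), (10,)]}

search = GridSearchCV(MLPClassifier(solver="lbfgs", max_iter=200, random_state=0),
                      small_grid, cv=3)
search.fit(X_demo, y_demo)

# Option 1: the model already refitted on all training data with the best parameters.
best_model = search.best_estimator_

# Option 2: build a fresh model by unpacking the best parameters.
rebuilt = MLPClassifier(solver="lbfgs", max_iter=200, random_state=0,
                        **search.best_params_)

print(search.best_params_)
print(rebuilt.alpha == search.best_params_["alpha"])
```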

### 3.1)Tuned Model

In [34]:
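# Note: best_params_ above reports alpha = 0.0001, but alpha = 0.1 is used here,
# so the results below belong to the alpha = 0.1 model.
mlpc_tuned = MLPClassifier(activation = "relu", 
                           alpha = 0.1, 
                           hidden_layer_sizes = (100, 100, 100),
                           solver = "sgd")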

In [35]:
mlpc_tuned.fit(X_train_scaled, y_train)

MLPClassifier(alpha=0.1, hidden_layer_sizes=(100, 100, 100), solver='sgd')

In [36]:
y_pred1 = mlpc_tuned.predict(X_test_scaled)
y_pred1[0:10]

array([0, 0, 0, 0, 1, 1, 0, 0, 1, 1], dtype=int64)

In [37]:
accuracy_score(y_test, y_pred1) # after model tuning

0.7186147186147186

In [38]:
confusion_matrix(y_test, y_pred1) # after model tuning

array([[120,  31],
       [ 34,  46]], dtype=int64)

In [39]:
print(classification_report(y_test, y_pred1)) # after model tuning

              precision    recall  f1-score   support

           0       0.78      0.79      0.79       151
           1       0.60      0.57      0.59        80

    accuracy                           0.72       231
   macro avg       0.69      0.68      0.69       231
weighted avg       0.72      0.72      0.72       231

