# Neural networks with scikit-learn

This guide is based on the [kdnuggets article](http://www.kdnuggets.com/2016/10/beginners-guide-neural-networks-python-scikit-learn.html) by Jose Portilla.

Further reading in the scikit-learn documentation:
+ [Basic Tutorial](http://scikit-learn.org/stable/tutorial/basic/tutorial.html)
+ [Neural Networks](http://scikit-learn.org/stable/modules/neural_networks_supervised.html)

Import the sample data and show its keys.

In [1]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
cancer.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

You can print the details of each key, for example the dataset description:

In [2]:
print(cancer['DESCR'])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, f

The dataset has 569 rows (samples) with 30 features each.

In [3]:
cancer['data'].shape

(569, 30)

Separate the data from the target classifications.

In [4]:
X = cancer['data']
y = cancer['target']

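Split the dataset into training and test data. If you tune hyperparameters on a held-out validation set (instead of cross-validating on the training data), split into training, validation and test data. Typical splits are 80/20 or 80/10/10. Note that `train_test_split` holds out 25% of the samples by default (143 of the 569 here).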

In [5]:
# train_test_split was moved from cross_validation to model_selection in 0.18
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
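For the 80/10/10 variant mentioned above, one option is to call `train_test_split` twice. This is a standalone sketch, not part of the original notebook: `test_size` and `random_state` are values chosen here for illustration, and `X_hold`/`X_val` are hypothetical names.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# First hold out 20% of the samples, then split that holdout in half:
# roughly 80% training, 10% validation, 10% test.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 455 57 57
```

Fixing `random_state` makes the split, and therefore the reported scores, reproducible between runs.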

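Scale the features: a multi-layer perceptron is sensitive to feature scaling, and standardizing every feature to zero mean and unit variance helps the network converge. The first step fits the scaler (to the training data only), the second step transforms both the training and the test data.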

In [6]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit only to the training data
scaler.fit(X_train)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [7]:
# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
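To make the two steps concrete, the sketch below checks on made-up toy numbers that `StandardScaler` subtracts the per-feature training mean and divides by the per-feature (population) standard deviation, and that the test data is transformed with the training statistics stored in `mean_` and `scale_`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up toy data: two features on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
X_test = np.array([[4.0, 200.0]])

scaler = StandardScaler().fit(X_train)

# Training data: (x - mean) / std, using the population std (ddof=0)
manual_train = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
# Test data: shifted and scaled with the *training* statistics, not its own
manual_test = (X_test - scaler.mean_) / scaler.scale_

print(np.allclose(scaler.transform(X_train), manual_train))  # True
print(np.allclose(scaler.transform(X_test), manual_test))    # True
```

Fitting the scaler on the test data as well would leak information about the test set into preprocessing, which is why the notebook fits it on `X_train` only.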

Import the multi-layer perceptron and configure its hidden layers. We chose 3 hidden layers of 30 neurons each, matching the 30 features.

In [8]:
# MLPClassifier exists since sklearn 0.18
from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

In [9]:
# train the MLP on the training data and its target labels
mlp.fit(X_train, y_train)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(30, 30, 30), learning_rate='constant',
              learning_rate_init=0.001, max_iter=200, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=None, shuffle=True, solver='adam', tol=0.0001,
              validation_fraction=0.1, verbose=False, warm_start=False)
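After fitting, the learned weights are exposed as `coefs_` (one matrix per layer transition) and `intercepts_`. This standalone sketch fits on the full scaled dataset only to inspect these shapes; `max_iter=500` and `random_state=0` are assumptions added here, not settings from the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaled only to help convergence

mlp = MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=500, random_state=0)
mlp.fit(X, y)

# input (30) -> hidden (30) -> hidden (30) -> hidden (30) -> output
print([w.shape for w in mlp.coefs_])       # [(30, 30), (30, 30), (30, 30), (30, 1)]
print([b.shape for b in mlp.intercepts_])  # [(30,), (30,), (30,), (1,)]
# 5 layers in total; a binary problem uses a single logistic output unit
print(mlp.n_layers_, mlp.n_outputs_, mlp.out_activation_)  # 5 1 logistic
```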

[sklearn - confusion matrix](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)

In [10]:
# use the trained mlp to make predictions on the test set
predictions = mlp.predict(X_test)

# confusion matrix layout (rows: true class, columns: predicted class)
# true negatives | false positives
# false negatives | true positives
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,predictions))

[[53  1]
 [ 0 89]]
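For a binary problem the four cells can be unpacked directly with `ravel()`, which flattens the matrix row by row. The labels below are hypothetical toy values. In this dataset, `cancer['target_names']` is `['malignant', 'benign']`, so class 1 (the "positive" class in the layout above) is benign.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels; class 1 is treated as the positive class
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Rows are true labels, columns predicted labels; ravel() reads them row by row
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 1 1 1 2
```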


+ [sklearn - classification report](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html)
+ [Cornell University - Precision, recall, confusion matrix](https://www.cs.cornell.edu/courses/cs578/2003fa/performance_measures.pdf)
+ [hackercollider - Precision, recall, f1](https://hackercollider.com/articles/2016/06/03/recall-vs-precision/)


In [11]:
print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

           0       1.00      0.98      0.99        54
           1       0.99      1.00      0.99        89

    accuracy                           0.99       143
   macro avg       0.99      0.99      0.99       143
weighted avg       0.99      0.99      0.99       143
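The per-class numbers in the report can be reproduced by hand from the confusion matrix printed earlier: precision divides the diagonal by the column sums (everything predicted as that class), recall divides it by the row sums (every true member of that class), and F1 is their harmonic mean. A short sketch using those printed counts:

```python
import numpy as np

# Confusion matrix printed above (rows: true class 0/1, columns: predicted class 0/1)
cm = np.array([[53, 1],
               [0, 89]])

tp = np.diag(cm)                 # correct predictions per class
precision = tp / cm.sum(axis=0)  # column sums: all predictions of that class
recall = tp / cm.sum(axis=1)     # row sums: all true members of that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2))
print(round(accuracy, 2))  # 0.99
```

Rounded to two decimals these give precision 1.00/0.99, recall 0.98/1.00 and F1 0.99/0.99, matching the report.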



Now repeat the same steps with the handwritten digits dataset. Import the sample data and show its keys.

In [12]:
from sklearn.datasets import load_digits
digits = load_digits()
digits.keys()

dict_keys(['data', 'target', 'target_names', 'images', 'DESCR'])

You can print the details of each key, for example the dataset description:

In [13]:
print(digits['DESCR'])

.. _digits_dataset:

Optical recognition of handwritten digits dataset
--------------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 5620
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.

Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each blo

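The loaded data has 1797 rows with 64 features (one per pixel of the 8x8 images). The 5620 instances listed in the description refer to the full UCI dataset; as the description notes, scikit-learn ships a copy of its test set.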

In [14]:
digits['data'].shape

(1797, 64)
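The 64 features are simply the pixels of each 8x8 image in `digits['images']`, flattened row by row. A quick check:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)
# Flattening every 8x8 image reproduces the feature matrix exactly
flat = digits.images.reshape(len(digits.images), -1)
print(np.array_equal(flat, digits.data))  # True
```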

Separate the data from the target classifications.

In [16]:
X_d = digits['data']
y_d = digits['target']

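Split the dataset into training and test data (or into training, validation and test data if you tune hyperparameters on a separate validation set). Typical splits are 80/20 or 80/10/10; with the default 25% test fraction, 450 of the 1797 samples are held out here.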

In [17]:
# train_test_split was moved from cross_validation to model_selection in 0.18
from sklearn.model_selection import train_test_split
X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(X_d, y_d)
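With ten classes, a purely random split can leave some digits slightly over- or under-represented in the test set. Passing `stratify=y_d` keeps each class's share (almost) the same in both parts. This standalone sketch is an optional variation on the cell above; `random_state=0` is an assumption for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X_d, y_d = load_digits(return_X_y=True)

# stratify=y_d preserves the class proportions in the train and test parts
X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(
    X_d, y_d, stratify=y_d, random_state=0)

full_counts = np.bincount(y_d)
test_counts = np.bincount(y_test_d)
print(test_counts)
# Each class contributes close to the same fraction (about 0.25) to the test set
print(np.round(test_counts / full_counts, 3))
```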

Scale the features so the network converges more easily: first fit the scaler to the training data only, then transform both the training and the test data.

In [18]:
from sklearn.preprocessing import StandardScaler
scaler_d = StandardScaler()
# Fit only to the training data
scaler_d.fit(X_train_d)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [19]:
# Now apply the transformations to the data:
X_train_d = scaler_d.transform(X_train_d)
X_test_d = scaler_d.transform(X_test_d)
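The fit-on-train, transform-both pattern can also be bundled into a scikit-learn `Pipeline`, so the scaler is always fitted only on the data passed to `fit`. A minimal sketch on the digits data; `max_iter=500` and `random_state=0` are assumptions, not values from the notebook.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_d, y_d = load_digits(return_X_y=True)
X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(X_d, y_d, random_state=0)

# fit() fits the scaler on the training data, then trains the MLP on the scaled data;
# score()/predict() reuse the stored training statistics for new data
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=500, random_state=0),
)
model.fit(X_train_d, y_train_d)
test_accuracy = model.score(X_test_d, y_test_d)  # mean accuracy on the test set
print(test_accuracy)
```

A pipeline also plugs directly into cross-validation helpers, where refitting the scaler inside each fold matters.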

Import the multi-layer perceptron and configure its hidden layers. We keep the same 3 hidden layers of 30 neurons each, even though the digits data has 64 features.

In [22]:
# MLPClassifier exists since sklearn 0.18
from sklearn.neural_network import MLPClassifier
mlp_d = MLPClassifier(hidden_layer_sizes=(30,30,30))

In [23]:
# train the MLP on the training data and its target labels
mlp_d.fit(X_train_d, y_train_d)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(30, 30, 30), learning_rate='constant',
              learning_rate_init=0.001, max_iter=200, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=None, shuffle=True, solver='adam', tol=0.0001,
              validation_fraction=0.1, verbose=False, warm_start=False)
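With ten classes the output layer has ten units and a softmax activation, so `predict_proba` returns one probability per digit and `predict` picks the most probable one. A standalone sketch; `max_iter=500` and `random_state=0` are assumptions added here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X_d, y_d = load_digits(return_X_y=True)
X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(X_d, y_d, random_state=0)
scaler_d = StandardScaler().fit(X_train_d)
X_train_d, X_test_d = scaler_d.transform(X_train_d), scaler_d.transform(X_test_d)

mlp_d = MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=500, random_state=0)
mlp_d.fit(X_train_d, y_train_d)

print(mlp_d.out_activation_)            # softmax
print([w.shape for w in mlp_d.coefs_])  # [(64, 30), (30, 30), (30, 30), (30, 10)]

proba = mlp_d.predict_proba(X_test_d)   # shape (n_samples, 10)
print(np.allclose(proba.sum(axis=1), 1.0))  # True: each row is a distribution
# predict() returns the class with the highest probability
print(np.array_equal(mlp_d.classes_[proba.argmax(axis=1)], mlp_d.predict(X_test_d)))  # True
```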

[sklearn - confusion matrix](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)

In [24]:
# use the trained mlp to make predictions on the test set
predictions_d = mlp_d.predict(X_test_d)

# confusion matrix layout: rows are the true digits 0-9, columns the predicted digits;
# the diagonal counts correct predictions, everything off the diagonal is a mistake
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test_d,predictions_d))

[[45  0  0  0  1  0  1  0  0  0]
 [ 0 47  0  0  0  0  0  0  1  0]
 [ 0  0 46  0  0  0  0  1  0  0]
 [ 0  0  0 33  0  0  0  0  2  2]
 [ 0  0  1  0 40  0  1  0  0  2]
 [ 0  0  0  0  0 46  0  0  0  0]
 [ 0  0  0  0  0  0 49  0  0  0]
 [ 0  0  0  0  0  0  0 46  0  0]
 [ 0  2  1  0  0  0  1  0 40  0]
 [ 0  1  0  2  0  0  0  1  0 38]]
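For more than two classes there are no single "true negatives" or "false positives"; instead each row and column is read per class. The sketch below recomputes per-class recall and precision and the overall accuracy from the matrix printed above.

```python
import numpy as np

# Digits confusion matrix printed above: rows = true digit, columns = predicted digit
cm = np.array([
    [45, 0, 0, 0, 1, 0, 1, 0, 0, 0],
    [0, 47, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 46, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 33, 0, 0, 0, 0, 2, 2],
    [0, 0, 1, 0, 40, 0, 1, 0, 0, 2],
    [0, 0, 0, 0, 0, 46, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 49, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 46, 0, 0],
    [0, 2, 1, 0, 0, 0, 1, 0, 40, 0],
    [0, 1, 0, 2, 0, 0, 0, 1, 0, 38],
])

correct = np.diag(cm)
recall = correct / cm.sum(axis=1)     # share of each true digit recognised correctly
precision = correct / cm.sum(axis=0)  # share of each predicted digit that was right
accuracy = correct.sum() / cm.sum()

print(cm.sum())            # 450 test samples
print(round(accuracy, 3))  # 0.956
print(np.round(recall, 2))
```

Rounded, these match the classification report below (for example recall 0.89 for digit 3 and accuracy 0.96).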


+ [sklearn - classification report](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html)
+ [Cornell University - Precision, recall, confusion matrix](https://www.cs.cornell.edu/courses/cs578/2003fa/performance_measures.pdf)
+ [hackercollider - Precision, recall, f1](https://hackercollider.com/articles/2016/06/03/recall-vs-precision/)


In [25]:
print(classification_report(y_test_d,predictions_d))

              precision    recall  f1-score   support

           0       1.00      0.96      0.98        47
           1       0.94      0.98      0.96        48
           2       0.96      0.98      0.97        47
           3       0.94      0.89      0.92        37
           4       0.98      0.91      0.94        44
           5       1.00      1.00      1.00        46
           6       0.94      1.00      0.97        49
           7       0.96      1.00      0.98        46
           8       0.93      0.91      0.92        44
           9       0.90      0.90      0.90        42

    accuracy                           0.96       450
   macro avg       0.96      0.95      0.95       450
weighted avg       0.96      0.96      0.96       450



## Further reading

+ [iamtrask - A neural network in 11 lines of python](http://iamtrask.github.io/2015/07/12/basic-python-network/): a really good article that implements a neural network with just NumPy and explains a lot of the basics
+ scikit-learn documentation: [Neural Network functionality](https://scikit-learn.org/stable/modules/neural_networks_supervised.html)

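Finally, the same steps applied to the much larger forest covertype dataset, which is downloaded on first use. Import the sample data and show its keys.

In [2]: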
from sklearn.datasets import fetch_covtype
covertype = fetch_covtype()
covertype.keys()

dict_keys(['data', 'target', 'DESCR'])

You can print the details of each key, for example the dataset description:

In [3]:
print(covertype['DESCR'])

.. _covtype_dataset:

Forest covertypes
-----------------

The samples in this dataset correspond to 30×30m patches of forest in the US,
collected for the task of predicting each patch's cover type,
i.e. the dominant species of tree.
There are seven covertypes, making this a multiclass classification problem.
Each sample has 54 features, described on the
`dataset's homepage <https://archive.ics.uci.edu/ml/datasets/Covertype>`__.
Some of the features are boolean indicators,
while others are discrete or continuous measurements.

**Data Set Characteristics:**

    Classes                        7
    Samples total             581012
    Dimensionality                54
    Features                     int

:func:`sklearn.datasets.fetch_covtype` will load the covertype dataset;
it returns a dictionary-like object
with the feature matrix in the ``data`` member
and the target values in ``target``.
The dataset will be downloaded from the web if necessary.



The dataset has 581012 rows of data with 54 features.

In [4]:
covertype['data'].shape

(581012, 54)

Separate the data from the target classifications.

In [5]:
X = covertype['data']
y = covertype['target']

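Split the dataset into training and test data (or into training, validation and test data if you tune hyperparameters on a separate validation set). Typical splits are 80/20 or 80/10/10; with the default 25% test fraction, 145253 of the 581012 samples are held out here.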

In [6]:
# train_test_split was moved from cross_validation to model_selection in 0.18
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)

Scale the features so the network converges more easily: first fit the scaler to the training data only, then transform both the training and the test data.

In [7]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit only to the training data
scaler.fit(X_train)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [8]:
# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
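The description above notes that some covertype features are boolean indicators; according to the dataset's homepage, 44 of the 54 columns are one-hot wilderness-area and soil-type flags. The notebook standardizes all of them. One alternative, sketched here on synthetic stand-in data (not the real dataset), is to scale only the non-binary columns with a `ColumnTransformer` and pass the 0/1 columns through unchanged. Whether this improves the MLP on covertype is not measured here.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: 3 continuous columns followed by 4 one-hot indicator columns
continuous = rng.normal(loc=1000.0, scale=300.0, size=(100, 3))
indicators = np.eye(4)[rng.integers(0, 4, size=100)]
X = np.hstack([continuous, indicators])

# Treat columns whose values are all 0 or 1 as boolean indicators
is_binary = np.all(np.isin(X, [0, 1]), axis=0)
cont_idx = np.flatnonzero(~is_binary)

# Scale only the continuous columns; remainder="passthrough" keeps the rest as-is.
# Output order: transformed columns first, then the passthrough columns.
ct = ColumnTransformer([("scale", StandardScaler(), cont_idx)], remainder="passthrough")
X_scaled = ct.fit_transform(X)
print(X_scaled.shape)  # (100, 7)
```

As with `StandardScaler`, fit the `ColumnTransformer` on the training data only and use `transform` on the test data.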

Import the multi-layer perceptron and configure its hidden layers. We keep the same 3 hidden layers of 30 neurons each, even though the covertype data has 54 features.

In [9]:
# MLPClassifier exists since sklearn 0.18
from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

In [10]:
# train the MLP on the training data and its target labels
mlp.fit(X_train, y_train)



MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(30, 30, 30), learning_rate='constant',
              learning_rate_init=0.001, max_iter=200, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=None, shuffle=True, solver='adam', tol=0.0001,
              validation_fraction=0.1, verbose=False, warm_start=False)

[sklearn - confusion matrix](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)

In [11]:
# use the trained mlp to make predictions on the test set
predictions = mlp.predict(X_test)

# confusion matrix layout: rows are the true cover types 1-7, columns the predicted types;
# the diagonal counts correct predictions, everything off the diagonal is a mistake
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,predictions))

[[47729  5115     5     0    76    17   194]
 [ 7453 62070   388     0   420   375    31]
 [    7   335  7743    84    25   686     0]
 [    0     1   173   483     0    47     0]
 [  130   715    61     0  1484    15     0]
 [   47   280   850    30    20  3052     0]
 [  870    46     0     0     1     0  4195]]


+ [sklearn - classification report](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html)
+ [Cornell University - Precision, recall, confusion matrix](https://www.cs.cornell.edu/courses/cs578/2003fa/performance_measures.pdf)
+ [hackercollider - Precision, recall, f1](https://hackercollider.com/articles/2016/06/03/recall-vs-precision/)


In [12]:
print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

           1       0.85      0.90      0.87     53136
           2       0.91      0.88      0.89     70737
           3       0.84      0.87      0.86      8880
           4       0.81      0.69      0.74       704
           5       0.73      0.62      0.67      2405
           6       0.73      0.71      0.72      4279
           7       0.95      0.82      0.88      5112

    accuracy                           0.87    145253
   macro avg       0.83      0.78      0.80    145253
weighted avg       0.87      0.87      0.87    145253
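The gap between the macro average (0.78 recall) and the weighted average (0.87) comes from class imbalance: the macro average counts every cover type equally, while the weighted average weights each class by its support, so the rare types 4, 5 and 6 barely affect it. The sketch below recomputes both from the confusion matrix printed above; the weighted recall always equals the accuracy.

```python
import numpy as np

# Covertype confusion matrix printed above (rows = true class 1..7, columns = predicted)
cm = np.array([
    [47729, 5115, 5, 0, 76, 17, 194],
    [7453, 62070, 388, 0, 420, 375, 31],
    [7, 335, 7743, 84, 25, 686, 0],
    [0, 1, 173, 483, 0, 47, 0],
    [130, 715, 61, 0, 1484, 15, 0],
    [47, 280, 850, 30, 20, 3052, 0],
    [870, 46, 0, 0, 1, 0, 4195],
])

support = cm.sum(axis=1)           # true samples per class
recall = np.diag(cm) / support

macro_recall = recall.mean()                           # every class counts equally
weighted_recall = np.average(recall, weights=support)  # classes weighted by support
accuracy = np.diag(cm).sum() / cm.sum()

print(round(macro_recall, 2), round(weighted_recall, 2))  # 0.78 0.87
print(np.isclose(weighted_recall, accuracy))              # True
```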

