**Disclaimer**: All of the information and most of the code in this notebook is taken or derived from https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

In [None]:
import numpy as np
from sklearn.neural_network import MLPClassifier

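MLP trains on two arrays: an array X of shape (n_samples, n_features), which holds the training samples represented as floating-point feature vectors, and an array y of shape (n_samples,), which holds the target values (class labels) for the training samples. Here `X` holds two training samples, `testX` holds two unseen samples used for prediction later, and `y` is defined further below: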

In [None]:
X = ((0., 0.), (1., 1.))
testX = ((2., 2.), (-1., -2.))
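`fit` and `predict` convert these tuples to NumPy arrays internally. As a minimal sketch, the shapes described above can be checked directly; the variable names mirror the notebook's, and `y` here is assumed to be the binary target that the notebook defines further below:

```python
import numpy as np

# Training samples and unseen test samples, as in the notebook
X = np.asarray(((0., 0.), (1., 1.)))
testX = np.asarray(((2., 2.), (-1., -2.)))
# One class label per training sample (binary case)
y = np.asarray((0, 1))

n_samples, n_features = X.shape
print(X.shape, y.shape)  # (2, 2) (2,)

# y needs one label per sample; test samples need the same number of features
assert y.shape == (n_samples,)
assert testX.shape[1] == n_features
```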

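Further, the model supports [multi-label classification](https://scikit-learn.org/stable/modules/multiclass.html#multiclass), in which a sample can belong to more than one class. Setting `multipleLabels = True` below switches `y` to multi-label targets, where each row is a binary indicator vector over the classes.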

In [None]:
multipleLabels = False

In [None]:
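if not multipleLabels:
    # binary classification: one class label per training sample
    y = (0, 1)
    hidden_layer_sizes = (5, 2)
else:
    # multi-label classification: one binary indicator per class for each sample
    y = ((0, 1), (1, 1))
    hidden_layer_sizes = (15,)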
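In the multi-label branch, `predict` returns one indicator row per sample instead of a single label. The sketch below fits the multi-label configuration on its own; the names `y_multi`, `clf_multi` and the two query points are hypothetical, chosen for illustration, and only the output shape is checked because the exact predicted values depend on training:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = ((0., 0.), (1., 1.))
# Multi-label targets: each row is a binary indicator vector over two classes
y_multi = np.array(((0, 1), (1, 1)))

clf_multi = MLPClassifier(solver='lbfgs', alpha=1e-5,
                          hidden_layer_sizes=(15,), random_state=1)
clf_multi.fit(X, y_multi)

# predict returns an indicator matrix of shape (n_samples, n_classes)
pred = clf_multi.predict(((1., 2.), (0., 0.)))
print(pred.shape)  # (2, 2)
```

Each column of `pred` answers "does this sample belong to class j?" independently, which is what allows a sample to carry more than one label.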

In [None]:
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=hidden_layer_sizes,
                    random_state=1)
clf.fit(X, y)

MLPClassifier(alpha=1e-05, hidden_layer_sizes=(5, 2), random_state=1,
              solver='lbfgs')

After fitting (training), the model can predict labels for new samples:

In [None]:
clf.predict(testX)

array([1, 0])

`coefs_` is a list of weight matrices, where the weight matrix at index `i` holds the weights between layer `i` and layer `i + 1`. The corresponding bias vectors are stored separately in `intercepts_`.

In [None]:
clf.coefs_

[array([[-0.14196276, -0.02104562, -0.85522848, -3.51355396, -0.60434709],
        [-0.69744683, -0.9347486 , -0.26422217, -3.35199017,  0.06640954]]),
 array([[ 0.29164405, -0.14147894],
        [ 2.39665167, -0.6152434 ],
        [-0.51650256,  0.51452834],
        [ 4.0186541 , -0.31920293],
        [ 0.32903482,  0.64394475]]),
 array([[-4.53025854],
        [-0.86285329]])]

In [None]:
tuple([coef.shape for coef in clf.coefs_])

((2, 5), (5, 2), (2, 1))
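The shapes follow from the layer widths: input features, then each hidden layer, then the output units (a single unit for binary classification). A minimal sketch that checks this for both `coefs_` and `intercepts_`, assuming a scikit-learn version recent enough (0.24+) to expose `n_features_in_`:

```python
from sklearn.neural_network import MLPClassifier

X = ((0., 0.), (1., 1.))
y = (0, 1)
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)

# Layer widths: input features, each hidden layer, then the output units
layer_sizes = [clf.n_features_in_, *clf.hidden_layer_sizes, clf.n_outputs_]
print(layer_sizes)  # [2, 5, 2, 1]

for i, (W, b) in enumerate(zip(clf.coefs_, clf.intercepts_)):
    # coefs_[i] maps layer i to layer i + 1; intercepts_[i] is the bias of layer i + 1
    assert W.shape == (layer_sizes[i], layer_sizes[i + 1])
    assert b.shape == (layer_sizes[i + 1],)
    print(i, W.shape, b.shape)
```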

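Next, try to reproduce the network's output by hand. The cell below applies a ReLU activation and then multiplies by each layer's weight matrix, starting from the raw input. This naive pass ignores the bias vectors in `clf.intercepts_`, applies ReLU to the input rather than only to the hidden-layer outputs, and omits the logistic output activation, so every output collapses to zero and does not match `predict_proba`.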

In [None]:
def ReLU(x):
    # rectified linear unit, applied element-wise: max(0, x)
    return np.maximum(0, x)

In [None]:
predictions = [np.array(X), np.array(testX)]
for idx in range(len(predictions)):
    for coef in clf.coefs_:
        # apply the activation function (note: this also clips the raw input)
        predictions[idx] = ReLU(predictions[idx])
        # multiply by the weights; the bias terms in clf.intercepts_ are ignored
        predictions[idx] = np.dot(predictions[idx], coef)
print(predictions)


[array([[0.],
       [0.]]), array([[0.],
       [0.]])]
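For comparison, a corrected manual forward pass adds each layer's bias, applies ReLU only to the hidden layers, and passes the final output through the logistic function. This is a sketch assuming the fitted model uses `activation='relu'` (the default) and, for binary targets, `out_activation_ == 'logistic'`; `forward` is a hypothetical helper name, and `scipy.special.expit` (SciPy is a scikit-learn dependency) computes the logistic sigmoid:

```python
import numpy as np
from scipy.special import expit
from sklearn.neural_network import MLPClassifier

X = ((0., 0.), (1., 1.))
testX = ((2., 2.), (-1., -2.))
y = (0, 1)
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)
assert clf.activation == 'relu' and clf.out_activation_ == 'logistic'

def forward(clf, samples):
    """Manual forward pass, assuming ReLU hidden units and one logistic output unit."""
    a = np.asarray(samples, dtype=float)
    n_layers = len(clf.coefs_)
    for i, (W, b) in enumerate(zip(clf.coefs_, clf.intercepts_)):
        # affine step: weights plus the bias vector of layer i + 1
        a = a @ W + b
        if i < n_layers - 1:
            # ReLU on hidden layers only, applied after the affine step
            a = np.maximum(0, a)
    # binary case: the single logistic output is P(class 1)
    p = expit(a).ravel()
    return np.column_stack([1 - p, p])

manual = forward(clf, testX)
print(np.allclose(manual, clf.predict_proba(testX)))  # True
```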


In [None]:
clf.predict_proba(testX)

array([[1.96718015e-004, 9.99803282e-001],
       [1.00000000e+000, 4.67017947e-144]])
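`predict_proba` returns one column per class, ordered as in `clf.classes_`, and each row sums to 1. As a minimal sketch, taking the most probable class in each row recovers the labels returned by `predict`:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = ((0., 0.), (1., 1.))
testX = ((2., 2.), (-1., -2.))
y = (0, 1)
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)

proba = clf.predict_proba(testX)
# one column per class, in the order of clf.classes_; each row sums to 1
assert proba.shape == (len(testX), len(clf.classes_))
assert np.allclose(proba.sum(axis=1), 1.0)

# the most probable class in each row matches predict()
labels = clf.classes_[np.argmax(proba, axis=1)]
print(labels, clf.predict(testX))
```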