# DM2: "Connectionism: backpropagation algorithm"


--------------------------------------------

_Eole Cervenka, Nov 13th 2017_

+ Python version: 3.6
+ libraries: sklearn, numpy, pandas
+ dependencies:

    + `Eole_Cervenka_DM2_preparation.ipynb`
    + `Eole_Cervenka_DM2_exploration.ipynb`
    + `Eole_Cervenka_DM2_MLP.ipynb`
        
+ Data:
    + `DM2_attr_val_encoded.json` (cf. section I, "Breast cancer data")

--------------------------------------------

## XOR dataset

We are tasked with generating an XOR dataset that mimics the breast cancer data.

_"An XOR gate implements an exclusive or; that is, a true output results if one, and only one, of the inputs to the gate is true"_ ([Wikipedia](https://en.wikipedia.org/wiki/XOR_gate))

We will look at both the case where XOR takes two input attributes and the case where it takes an arbitrary number of attributes greater than 2.
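To make the "one, and only one" definition concrete for more than two inputs, here is a minimal sketch. The function names `xor_exactly_one` and `parity` are illustrative and are not part of the notebook's helpers. It also shows that, with three or more inputs, this definition is not the same as chaining two-input XOR gates (parity).

```python
from itertools import product


def xor_exactly_one(bits):
    """Return 1 if exactly one input is truthy, else 0."""
    return int(sum(bool(b) for b in bits) == 1)


def parity(bits):
    """Return 1 if an odd number of inputs are truthy (chained two-input XOR)."""
    return sum(bool(b) for b in bits) % 2


# Two inputs: both definitions give the usual XOR truth table
for bits in product([0, 1], repeat=2):
    print(bits, xor_exactly_one(bits), parity(bits))

# Three inputs: the two definitions disagree when all inputs are true
print(xor_exactly_one((1, 1, 1)), parity((1, 1, 1)))
```

The rest of this section uses the "exactly one" reading, which matches the quoted definition.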

I implemented a general function that generates a random dataset from the attributes of the original dataset and labels each record according to the XOR function.

We choose two attributes, say `age` and `menopause`.  
For each attribute we pick one target value; conveniently, we can use the encoded value `1` for both. A record is labelled positive (_i.e._ its encoded label equals `1`) if and only if exactly one of the two attributes takes its target value.
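The labelling rule above can be sketched as follows. This is a hypothetical stand-in, not the notebook's `generate_XOR_dataset`, whose source is not shown here: the name `make_xor_dataset`, its parameters, the uniform sampling, and the `[0, 1]` encodings are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def make_xor_dataset(attr_values, n_examples=1000, target_value=1, seed=0):
    """Hypothetical sketch of an XOR dataset generator (not the author's code).

    attr_values maps each attribute name to the list of its encoded values.
    """
    rng = np.random.default_rng(seed)
    # Draw every attribute independently and uniformly from its encoded values
    data = {name: rng.choice(values, size=n_examples)
            for name, values in attr_values.items()}
    df = pd.DataFrame(data)
    # Positive label iff exactly one attribute takes the chosen target value
    hits = (df == target_value).sum(axis=1)
    df.insert(0, "Class", (hits == 1).astype(int))
    return df


# Assumed binary encodings, for illustration only
demo = make_xor_dataset({"age": [0, 1], "menopause": [0, 1]}, n_examples=8)
print(demo)
```

Because the label is computed from a count of matching attributes, the same function covers more than two attributes without changes.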

### Load helper functions

In [1]:
%run utils/helper_functions.ipynb

In [2]:
%run utils/preparation.ipynb

In [3]:
%run utils/exploration.ipynb

In [4]:
%run utils/MLP_utils.ipynb

### Generate XOR dataset

In [5]:
# Load encoded attribute values from the Breast Cancer dataset
fpath = "/tmp/DM2_attr_val_encoded.json"
attr_val_breast_dataset_encoded = load_json(fpath)

# Generate an XOR dataset from two attributes of the encoded data (e.g. 'age' and 'menopause')
XOR_df = generate_XOR_dataset(
    n_examples=1000,
    n_attributes=2,
    no_duplicate=False,
    attr_dict=attr_val_breast_dataset_encoded)
    
XOR_df.head()

Dataset size: 1000


|   | Class | breast_quad | node_caps |
|---|-------|-------------|-----------|
| 0 | 1     | 1           | 1         |
| 1 | 0     | 1           | 0         |
| 2 | 1     | 1           | 1         |
| 3 | 1     | 1           | 1         |
| 4 | 0     | 0           | 1         |


### Run MLP on XOR dataset

In [6]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = get_nn_inputs(XOR_df)

steps = [
    ('scaler', StandardScaler()), # data scaling
    ('clf', MLPClassifier()) # Multilayer Perceptron
]

pipeline = Pipeline(steps)

param_grid = get_param_grid(max_layers=1,
                            max_neurons=3)

random_search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_grid,
    n_iter=400, # number of parameter settings sampled
    n_jobs=8, # parallel jobs
    refit=True,
    cv=10, # 10-fold cross-validation
    verbose=0,
    random_state=None
)

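# fit() returns the fitted RandomizedSearchCV object itself;
# the refitted pipeline is available as best_model.best_estimator_
best_model = random_search.fit(X, y)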

pprint_best_model(random_search)

best params [score=0.952]:
{
  "activation": "tanh",
  "alpha": 0.0077573469387755105,
  "hidden_layer_sizes": [
    3
  ],
  "learning_rate": "constant",
  "learning_rate_init": 0.004140816326530613,
  "max_iter": 1600,
  "momentum": 0.6,
  "solver": "lbfgs"
}
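For readers without the helper notebooks, here is a small self-contained sketch of the same kind of search on synthetic two-input XOR data. It is an illustration under stated assumptions, not the notebook's configuration: the `demo_*` names, the candidate values, and the reduced `n_iter` and `cv` are invented here, and the `clf__` key prefix is what scikit-learn's `Pipeline` expects for a step named `clf` (whether `get_param_grid` builds its keys this way is an assumption). With no `scoring` argument, the reported score is the mean cross-validated accuracy.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-input XOR data (illustrative, not the notebook's dataset)
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(200, 2))
y_demo = X_demo[:, 0] ^ X_demo[:, 1]

demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", MLPClassifier(solver="lbfgs", max_iter=500, random_state=0)),
])

# Keys follow the "<step name>__<parameter>" convention of Pipeline
demo_distributions = {
    "clf__activation": ["tanh", "relu", "logistic"],
    "clf__hidden_layer_sizes": [(1,), (2,), (3,)],
    "clf__alpha": [1e-4, 1e-3, 1e-2],
}

demo_search = RandomizedSearchCV(
    demo_pipeline,
    param_distributions=demo_distributions,
    n_iter=5,        # number of parameter settings sampled
    cv=3,            # 3-fold cross-validation
    random_state=0,  # fixed seed so the sampled settings are repeatable
)
demo_search.fit(X_demo, y_demo)

# Strip the step prefix to get plain MLPClassifier parameter names
best = {k.split("__", 1)[1]: v for k, v in demo_search.best_params_.items()}
print(f"best params [score={demo_search.best_score_:.3f}]:", best)
```

When reading the best parameters reported above, note that `learning_rate`, `learning_rate_init` and `momentum` only affect the `sgd` solver (and `adam`, for `learning_rate_init`), so with `solver="lbfgs"` their sampled values have no influence on the score.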
