# DM2: "Connexionism: backpropagation algorithm"

_Eole Cervenka, Nov 13th 2017_

+ Python version: 3.6
+ libraries: sklean, numpy, pandas
+ dependencies:

    + `Eole_Cervenka_DM2_preparation.ipynb`
    + `Eole_Cervenka_DM2_exploration.ipynb`
    + `Eole_Cervenka_DM2_MLP.ipynb`
        
+ Data:
    + `DM2_attr_val_encoded.json` (cf section I --"Breast cancer data")

-------------------------------------------------
## XOR dataset

We are tasked with generating a XOR dataset that mimics the breast_cancer data.  

_"An XOR gate implements an exclusive or; that is, a true output results if one, and only one, of the inputs to the gate is true"_ ([Wikipedia](https://en.wikipedia.org/wiki/XOR_gate))

**We assume that we want to generate a dataset for a function that takes two inputs and returns a boolean output.**

There exists interpretation of [XOR function with more than 2 inputs](https://en.wikipedia.org/wiki/XOR_gate#More_than_two_inputs).

However, we can process the first two inputs with a regular XOR function and _"regard subsequent inputs as being applied through a cascade of binary exclusive-or operations: the first two signals are fed into an XOR gate, then the output of that gate is fed into a second XOR gate together with the third signal, and so on for any remaining signals. The result is a circuit that outputs a 1 when the number of 1s at its inputs is odd, and a 0 when the number of incoming 1s is even. This makes it practically useful as a parity generator or a modulo-2 adder"_ (source: wikipedia). We will have a closer look at this generalized scenario in the next section.

Therefore, we choose two attributes, say `age` and `menopause`.  
For each attribute we choose a single possible value, conveniently we can use the encoded value `1`for both attributes. We put that the label of a given record is positive (_ie_ encoded label value equals to `1`) if and only if a single of the attribute value equals `1`, the chosen value for each attribute.

### Load helper functions

In [28]:
%run Eole_Cervenka_DM2_preparation.ipynb

In [29]:
%run Eole_Cervenka_DM2_exploration.ipynb

In [30]:
%run Eole_Cervenka_DM2_MLP.ipynb

### Generate XOR dataset

In [31]:
import random

def generate_positive_XOR(
    attr_val,
    attr_keep = ['age', 'menopause'] ):
    """
    Given an attribute dictionnary
    """
    
    neg_attr = random.choice(attr_keep) # choose the neg attribute randomly
    pos_attr = attributes[0] if neg_attr == attr_keep[1] else attr_keep[1]
    neg_attr_values = [v for v in attr_val[neg_attr] if v != 1] # list its possible val
    neg_attr_value_actual = random.choice(neg_attr_values) # pick a random possible negative val
    
    new_record = {
            pos_attr : 1,
            neg_attr : neg_attr_value_actual,
            'Class' : 1
        }
    
    return new_record

    
def generate_negative_XOR(
    attr_val,
    attr_keep = ['age', 'menopause'] ):
    
    if random.choice([1, 0]):
        
        new_rec = {
            attr_keep[0] : 1,
            attr_keep[1] : 1,
            'Class' : 0
        }
    else:
        neg_attr_values_0 = [v for v in attr_val[attr_keep[0]] if v != 1]
        neg_attr_value_actual_0 = random.choice(neg_attr_values_0)
        
        neg_attr_values_1 = [v for v in attr_val[attr_keep[1]] if v != 1]
        neg_attr_value_actual_1 = random.choice(neg_attr_values_1)
        
        new_rec = {
            attr_keep[0] : neg_attr_value_actual_0,
            attr_keep[1] : neg_attr_value_actual_1,
            'Class' : 0
        }
    return new_rec


def generate_dataset_XOR(attr_val,
                         neg_count=2000,
                         pos_count=1000):
    import pandas as pd
    
    records = []
    for i in range(neg_count):
        records.append(generate_negative_XOR(attr_val))
    for j in range(pos_count):
        records.append(generate_negative_XOR(attr_val))
        
    dataset = pd.DataFrame(records)
    dataset_shuffled = dataset.sample(frac=1)
    
    return dataset_shuffled

In [32]:
# Load encoded attribute values from the Breast Cancer dataset
fpath = "/tmp/DM2_attr_val_encoded.json"
attr_val_breast_dataset_encoded = load_json(fpath)

# Generate XOR dataset based on encoded data for attributes similar to 'age' and 'menopause'
XOR_df = generate_dataset_XOR(attr_val_breast_dataset_encoded)
XOR_df.head()

Unnamed: 0,Class,age,menopause
2378,0,2,2
604,0,1,1
2590,0,1,1
553,0,1,1
1613,0,3,2


### Run MLP on XOR dataset

In [33]:
# matrix input X and label vector y
X, y = get_nn_inputs(XOR_df)

# Preview 
print(X[:10])
print()
print(y[:10])

[(2, 2), (1, 1), (1, 1), (1, 1), (3, 2), (1, 1), (4, 0), (5, 0), (4, 0), (0, 2)]

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


In [34]:
from sklearn.neural_network import MLPClassifier
clf = MLPClassifier(solver='sgd', max_iter=200) # default parameters

from sklearn.model_selection import RandomizedSearchCV
random_search = RandomizedSearchCV(
    clf,
    param_distributions=param_grid,
    n_iter=30, # 30 (random) search iteration
    n_jobs=4, # 4 parallel jobs
    refit=True,
    cv=10, # 10-fold cross-validation
    verbose=0,
    random_state=None
)

random_search.fit(X, y)
print("best params:\n{}".format(random_search.best_params_))
print("best score :\n{}".format(random_search.best_score_))



best params:
{'learning_rate': 'constant', 'hidden_layer_sizes': (73, 38)}
best score :
1.0
