## Binray classification
Binary classification using Logistic Regression machine learning technique.
Here in this example we will try to predict the new observation can be benign or malignant. 
Let us import some useful modules.
Sklearn has breast cancer dataset that we will use.
Pandas dataframe to load the dataset

In [34]:
import mxnet as mx
from mxnet import gluon, autograd, ndarray
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns

 sklearn.dataset contains the set of open source data sets
 pandas dataframe to hold the dataset
 

In [35]:
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X = data.data
df.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 30 columns):
mean radius                569 non-null float64
mean texture               569 non-null float64
mean perimeter             569 non-null float64
mean area                  569 non-null float64
mean smoothness            569 non-null float64
mean compactness           569 non-null float64
mean concavity             569 non-null float64
mean concave points        569 non-null float64
mean symmetry              569 non-null float64
mean fractal dimension     569 non-null float64
radius error               569 non-null float64
texture error              569 non-null float64
perimeter error            569 non-null float64
area error                 569 non-null float64
smoothness error           569 non-null float64
compactness error          569 non-null float64
concavity error            569 non-null float64
concave points error       569 non-null float64
symmetry error             569 

In [37]:
df.shape

(569, 30)

In [38]:
df.ndim

2

Now data is available but this data is human readable format and to train neural network it wont be useful. Before start train our neural network we need to normalise the data. To normalise the data we are using pandas. We can also use gluon to normalise the dataset.

In [3]:
df_norm = (df - df.mean()) / (df.max() - df.min())

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=12345) 


In [None]:
BATCH_SIZE = 32
LEARNING_R = 0.001
EPOCHS = 150

In [None]:
train_dataset = mx.gluon.data.ArrayDataset(X_train,y_train)
test_dataset = mx.gluon.data.ArrayDataset(X_test,y_test)
train_data = mx.gluon.data.DataLoader(train_dataset,
                                      batch_size=BATCH_SIZE, shuffle=True)

test_data = mx.gluon.data.DataLoader(test_dataset,
                                     batch_size=BATCH_SIZE, shuffle=False)

In [47]:
net = gluon.nn.Sequential()

# Define the model architecture
with net.name_scope():
    net.add(gluon.nn.Dense(64, activation="relu"))
    net.add(gluon.nn.Dense(32, activation="relu") ) 
    net.add(gluon.nn.BatchNorm())    
    net.add(gluon.nn.Dense(1, activation="sigmoid"))

# Intitalize parametes of the model
net.collect_params().initialize(mx.init.Uniform())

# Add binary loss function
binary_cross_entropy = gluon.loss.SigmoidBinaryCrossEntropyLoss()

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': LEARNING_R})

In [50]:

for e in range(EPOCHS):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(mx.cpu()).astype('float32')
        label = label.as_in_context(mx.cpu()).astype('float32')
        with autograd.record(): # Start recording the derivatives
            output = net(data) # the forward iteration
            loss = binary_cross_entropy(output, label)
            loss.backward()
        trainer.step(data.shape[0])
        # Provide stats on the improvement of the model over each epoch
        curr_loss = ndarray.mean(loss).asscalar()
    if e % 20 == 0:
        print("Epoch {}. Current Loss: {}.".format(e, curr_loss))


Epoch 0. Current Loss: 0.5109495520591736.
Epoch 20. Current Loss: 0.532987117767334.
Epoch 40. Current Loss: 0.5951952934265137.
Epoch 60. Current Loss: 0.6340570449829102.
Epoch 80. Current Loss: 0.5661656260490417.
Epoch 100. Current Loss: 0.7367585301399231.
Epoch 120. Current Loss: 0.7390211224555969.
Epoch 140. Current Loss: 0.6360472440719604.


In [51]:
y_pred = np.array([])
for data,label in test_data:
        data = data.as_in_context(mx.cpu()).astype('float32')
        label = label.as_in_context(mx.cpu()).astype('float32')
        output = net(data)
        y_pred = np.append(y_pred, output.asnumpy())

y_pred_labels = np.where(y_pred > 0.45, 1, 0)

In [52]:
print(accuracy_score(y_test, y_pred_labels))

0.9035087719298246
