# Breast Cancer Detection

[[Notebook](https://github.com/fastestimator/fastestimator/blob/master/apphub/tabular/dnn/dnn.ipynb)] [[TF Implementation](https://github.com/fastestimator/fastestimator/blob/master/apphub/tabular/dnn/dnn_tf.py)] [[Torch Implementation](https://github.com/fastestimator/fastestimator/blob/master/apphub/tabular/dnn/dnn_torch.py)]

## Import the required libraries

In [1]:
import tempfile

import fastestimator as fe
from fastestimator.dataset.data import breast_cancer
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp
from fastestimator.trace.io import BestModelSaver
from fastestimator.trace.metric import Accuracy

In [None]:
# We import these in a separate block to avoid an import error in Jupyter when running on Linux CPU-only machines
import pandas as pd
import tensorflow as tf

from sklearn.preprocessing import StandardScaler

In [2]:
# Training parameters
batch_size = 4
epochs = 10
save_dir = tempfile.mkdtemp()
train_steps_per_epoch = None
eval_steps_per_epoch = None

# Download data

This downloads the breast cancer dataset: tabular data in which each sample is a vector of 30 numerical features (`x`) with a binary label (`y`). We then split the data into training, evaluation, and test sets.

In [3]:
train_data, eval_data = breast_cancer.load_data()
test_data = eval_data.split(0.5)
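
To make the effect of `eval_data.split(0.5)` concrete, here is a minimal sketch of a fractional split in plain Python. It is not FastEstimator's implementation: the helper `split_fraction`, its `seed` argument, and the toy sample list are hypothetical, and random selection is an assumption. The idea it illustrates is that half of the evaluation samples are moved into a new test set, while the remainder stays in the evaluation set.

```python
import random


def split_fraction(samples, fraction, seed=0):
    """Move `fraction` of `samples` (chosen at random) into a second list.

    Hypothetical helper for illustration only; returns (kept, taken).
    """
    rng = random.Random(seed)
    n_take = int(len(samples) * fraction)
    picked = set(rng.sample(range(len(samples)), n_take))
    taken = [s for i, s in enumerate(samples) if i in picked]
    kept = [s for i, s in enumerate(samples) if i not in picked]
    return kept, taken


# Toy stand-in for an evaluation set with ten samples
eval_samples = list(range(10))
eval_samples, test_samples = split_fraction(eval_samples, 0.5)
print(len(eval_samples), len(test_samples))  # 5 5
```

The two resulting lists are disjoint, so the test samples are never seen during evaluation.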

This is what the raw data looks like:

In [4]:
df = pd.DataFrame.from_dict(train_data.data, orient='index')
df.head()

|   | x | y |
|---|---|---|
| 0 | [9.029, 17.33, 58.79, 250.5, 0.1066, 0.1413, 0... | 1 |
| 1 | [21.09, 26.57, 142.7, 1311.0, 0.1141, 0.2832, ... | 0 |
| 2 | [9.173, 13.86, 59.2, 260.9, 0.07721, 0.08751, ... | 1 |
| 3 | [10.65, 25.22, 68.01, 347.0, 0.09657, 0.07234,... | 1 |
| 4 | [10.17, 14.88, 64.55, 311.9, 0.1134, 0.08061, ... | 1 |


In [5]:
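# Fit the scaler on the training features only, then reuse its statistics for the
# evaluation and test features so that no information leaks from held-out data
scaler = StandardScaler()
train_data["x"] = scaler.fit_transform(train_data["x"])
eval_data["x"] = scaler.transform(eval_data["x"])
test_data["x"] = scaler.transform(test_data["x"])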
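
The scaling step can be sketched for a single feature column using only the standard library. This is an illustration of the arithmetic, not scikit-learn's code: the helpers `fit_standardizer` and `standardize` are hypothetical, the training values are the first feature of the five rows shown above, and the evaluation values are made up. Each value is shifted by the training mean and divided by the training population standard deviation.

```python
import statistics


def fit_standardizer(column):
    """Return the mean and population standard deviation (ddof=0) of a column."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    # Guard against constant columns to avoid dividing by zero
    if std == 0:
        std = 1.0
    return mean, std


def standardize(column, mean, std):
    """Apply z = (x - mean) / std with statistics computed elsewhere."""
    return [(value - mean) / std for value in column]


train_col = [9.029, 21.09, 9.173, 10.65, 10.17]  # first feature of the rows above
eval_col = [12.0, 18.5]                          # hypothetical held-out values

mean, std = fit_standardizer(train_col)          # statistics come from training data only
train_scaled = standardize(train_col, mean, std)
eval_scaled = standardize(eval_col, mean, std)   # reuses the training statistics

print(round(statistics.fmean(train_scaled), 6))   # approximately 0
print(round(statistics.pstdev(train_scaled), 6))  # approximately 1
```

The held-out column is not guaranteed to end up with zero mean and unit variance, which is expected: it is scaled with the training statistics so that all three sets share the same transformation.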

# Building Components

## Step 1: Create `Pipeline`

We create the `Pipeline` with the usual train, eval, and test data along with the batch size:

In [6]:
pipeline = fe.Pipeline(train_data=train_data, eval_data=eval_data, test_data=test_data, batch_size=batch_size)
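
The batch size determines how many steps make up one epoch when `train_steps_per_epoch` is left as `None`. The sketch below shows the arithmetic under two assumptions that are not stated in this notebook: the training set size of 455 is a hypothetical figure, and the final partial batch is assumed to be kept rather than dropped.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to visit every sample once, keeping a partial last batch."""
    return math.ceil(num_samples / batch_size)


num_train_samples = 455  # hypothetical training set size (assumption)
batch_size = 4

steps = steps_per_epoch(num_train_samples, batch_size)
print(steps)       # 114
print(steps * 10)  # 1140 steps over 10 epochs
```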

## Step 2: Create `Network`

We first define the neural network in a function. `fe.build` then combines that function with an optimizer to produce the model, which the ops in the FastEstimator `Network` use:

In [7]:
def create_dnn():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(32, activation="relu", input_shape=(30, )))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(16, activation="relu"))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(8, activation="relu"))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    return model
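
As a sanity check on the architecture above, the number of trainable parameters can be worked out by hand. The sketch below assumes standard fully connected layers with a bias term; the helper `dense_params` is hypothetical. Dropout layers have no weights, so only the four `Dense` layers contribute.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


# Input width followed by the width of each Dense layer in create_dnn
layer_sizes = [30, 32, 16, 8, 1]

per_layer = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [992, 528, 136, 9]
print(sum(per_layer))  # 1665
```

A model this small is a reasonable match for a dataset with only a few hundred samples.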

In [8]:
model = fe.build(model_fn=create_dnn, optimizer_fn="adam")
network = fe.Network(ops=[
    ModelOp(inputs="x", model=model, outputs="y_pred"),
    CrossEntropy(inputs=("y_pred", "y"), outputs="ce"),
    UpdateOp(model=model, loss_name="ce", mode="!infer")
])
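
Because the model ends in a single sigmoid unit and the labels are 0 or 1, the loss here is presumably binary cross-entropy. The sketch below writes that formula out in plain Python; it is an illustration under that assumption, not FastEstimator's `CrossEntropy` implementation, and the function name, the `eps` clipping value, and the example predictions are hypothetical.

```python
import math


def binary_cross_entropy(y_pred, y_true, eps=1e-7):
    """Mean of -[y * log(p) + (1 - y) * log(1 - p)] over a batch."""
    total = 0.0
    for p, y in zip(y_pred, y_true):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_pred)


labels = [1, 0, 1, 1]  # one batch of four, matching batch_size = 4

# An undecided model that outputs 0.5 everywhere scores ln(2)
print(binary_cross_entropy([0.5, 0.5, 0.5, 0.5], labels))  # about 0.6931

# Confident, correct predictions give a much lower loss
print(binary_cross_entropy([0.9, 0.1, 0.8, 0.95], labels))
```

The value ln(2) ≈ 0.693 is a useful reference point: a loss near it suggests the model is still close to guessing.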

## Step 3: Create `Estimator`

We define two traces, one that computes accuracy and one that saves the model whenever accuracy reaches a new maximum, and pass them to the `Estimator` together with the `Pipeline` and `Network`:

In [9]:
traces = [
    Accuracy(true_key="y", pred_key="y_pred"),
    BestModelSaver(model=model, save_dir=save_dir, metric="accuracy", save_best_mode="max")
]
estimator = fe.Estimator(pipeline=pipeline,
                         network=network,
                         epochs=epochs,
                         log_steps=10,
                         traces=traces,
                         train_steps_per_epoch=train_steps_per_epoch,
                         eval_steps_per_epoch=eval_steps_per_epoch)
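
The `save_best_mode="max"` setting means a higher metric counts as an improvement. The bookkeeping behind a "keep the best" rule can be sketched as follows; the `BestTracker` class and the accuracy history are hypothetical and only illustrate the comparison logic, not how `BestModelSaver` is actually implemented.

```python
class BestTracker:
    """Remember the best metric value seen so far (hypothetical helper)."""

    def __init__(self, mode="max"):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.mode = mode
        self.best = None

    def update(self, value):
        """Return True when `value` beats the previous best, which is when a save would happen."""
        if self.best is None:
            improved = True
        elif self.mode == "max":
            improved = value > self.best
        else:
            improved = value < self.best
        if improved:
            self.best = value
        return improved


tracker = BestTracker(mode="max")
accuracy_per_epoch = [0.91, 0.95, 0.93, 0.96]  # made-up values for illustration

saved_epochs = [epoch for epoch, acc in enumerate(accuracy_per_epoch, start=1) if tracker.update(acc)]
print(saved_epochs)  # [1, 2, 4]
```

With a loss-like metric, `"min"` would be the appropriate mode, since lower values are better there.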

# Training

In [10]:
estimator.fit()

    ______           __  ______     __  _                 __            
   / ____/___ ______/ /_/ ____/____/ /_(_)___ ___  ____ _/ /_____  _____
  / /_  / __ `/ ___/ __/ __/ / ___/ __/ / __ `__ \/ __ `/ __/ __ \/ ___/
 / __/ / /_/ (__  ) /_/ /___(__  ) /_/ / / / / / / /_/ / /_/ /_/ / /    
/_/    \__,_/____/\__/_____/____/\__/_/_/ /_/ /_/\__,_/\__/\____/_/     
                                                                        

FastEstimator-Start: step: 1; model_lr: 0.001; 
FastEstimator-Train: step: 1; ce: 0.58930933; 
FastEstimator-Train: step: 10; ce: 1.2191963; steps/sec: 342.02; 
FastEstimator-Train: step: 20; ce: 0.6330318; steps/sec: 422.95; 
FastEstimator-Train: step: 30; ce: 0.68403095; steps/sec: 400.86; 
FastEstimator-Train: step: 40; ce: 0.70622563; steps/sec: 277.93; 
FastEstimator-Train: step: 50; ce: 0.7649698; steps/sec: 443.68; 
FastEstimator-Train: step: 60; ce: 0.70189; steps/sec: 455.18; 
FastEstimator-Train: step: 70; ce: 0.6120157; steps/sec: 486.19; 
...

## Model testing
`Estimator.test` triggers model testing with the test dataset that was specified in our `Pipeline`. We can use this to evaluate our model's accuracy on previously unseen data:

In [11]:
estimator.test()

FastEstimator-Test: epoch: 10; accuracy: 0.9649122807017544; 
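
Accuracy is the fraction of test samples classified correctly, so the reported value can be turned back into counts. The sketch below searches for (correct, total) pairs that reproduce it; the helper `consistent_counts`, the tolerance, and the search bound of 100 samples are assumptions made for illustration.

```python
def consistent_counts(accuracy, max_total, tol=1e-9):
    """List (correct, total) pairs whose ratio matches `accuracy` within `tol`."""
    matches = []
    for total in range(1, max_total + 1):
        correct = round(accuracy * total)
        if abs(correct / total - accuracy) < tol:
            matches.append((correct, total))
    return matches


print(consistent_counts(0.9649122807017544, 100))  # [(55, 57)]
```

Under that bound, the result is consistent with 55 correct predictions out of 57 test samples, that is, two misclassifications. Multiples such as 110 out of 114 would match as well, so the counts are only determined up to the assumed test set size.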
