**PREDICTING BREAST CANCER USING ARTIFICIAL NEURAL NETWORKS.**

* The features are computed from digitized images of cell nuclei and can be used to build a model that predicts whether a tumor is benign or malignant.
* 1 = Malignant (Cancerous) - Present (M)
* 0 = Benign (Not Cancerous) - Absent (B)
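
The 0/1 coding above is what alphabetical label encoding produces, because 'B' sorts before 'M' (scikit-learn's `LabelEncoder` assigns integers in sorted class order). A minimal sketch in plain Python, using a made-up list of labels rather than the real dataset:

```python
# Made-up diagnosis labels, for illustration only (not rows from the dataset).
labels = ["M", "B", "B", "M", "B"]

# Sort the distinct classes and number them in that order.
classes = sorted(set(labels))
mapping = {label: index for index, label in enumerate(classes)}

# Apply the mapping to every label.
encoded = [mapping[label] for label in labels]

print(mapping)
print(encoded)
```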

**I HAVE ALSO APPLIED SOME TECHNIQUES TO AVOID OVERFITTING (DROPOUT AND EARLY STOPPING)**

In [None]:
#importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf

In [None]:
#importing dataset
ds = pd.read_csv('../input/breast-cancer-wisconsin-data/data.csv')

In [None]:
#reviewing dataset
pd.set_option('display.max_columns',None)
ds.head()

In [None]:
#dropping unnecessary features
ds.drop(['id', 'Unnamed: 32'], axis = 1, inplace = True)

In [None]:
#checking type of features
ds.info()

In [None]:
#dataset has 569 rows and 31 columns
ds.shape

In [None]:
#checking for null values
ds.isnull().sum()

**NO MISSING VALUES**

In [None]:
#taking care of categorical values
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
ds['diagnosis']=le.fit_transform(ds['diagnosis'])

In [None]:
plt.figure(figsize = (8,6))
sns.countplot(x = 'diagnosis', data = ds)

**AROUND 350 OF THE 569 SAMPLES ARE BENIGN**

In [None]:
plt.figure(figsize=(16,14))
sns.heatmap(ds.corr(), cmap='Blues', annot = True)
plt.title("Correlation Map", fontweight = "bold", fontsize=16)

**SOME FEATURES ARE HIGHLY CORRELATED WITH EACH OTHER**

**WE CAN EITHER REMOVE THE HIGHLY CORRELATED FEATURES OR USE ALL THE FEATURES. I AM USING ALL FEATURES.**

* **REMOVING CORRELATED FEATURES MAY INCREASE ACCURACY**
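
If you want to try the first option, one common approach is to drop one feature from every pair whose absolute correlation exceeds a threshold. The sketch below is only an illustration of that idea: the three synthetic columns, their names, and the 0.95 threshold are all assumptions, not values taken from this dataset.

```python
import numpy as np

# Synthetic stand-in data (assumption): perimeter is almost a linear function of radius.
rng = np.random.default_rng(0)
radius = rng.normal(14.0, 3.5, size=200)
perimeter = 2 * np.pi * radius + rng.normal(0.0, 0.5, size=200)
texture = rng.normal(19.0, 4.0, size=200)

names = ["radius", "perimeter", "texture"]
features = np.column_stack([radius, perimeter, texture])

# Absolute pairwise correlation between columns.
corr = np.abs(np.corrcoef(features, rowvar=False))

threshold = 0.95  # assumed cut-off, tune for your own data
to_drop = []
for j in range(corr.shape[1]):
    # Drop column j if it is highly correlated with an earlier column that is still kept.
    if any(corr[i, j] > threshold for i in range(j) if names[i] not in to_drop):
        to_drop.append(names[j])

kept = [name for name in names if name not in to_drop]
print("dropped:", to_drop)
print("kept:", kept)
```

Whether this helps accuracy depends on the model; it is worth comparing both variants on the test set.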

In [None]:
#defining dependent and independent variables
x = ds.drop('diagnosis', axis=1)
y = ds['diagnosis']

In [None]:
#splitting data into training and testing set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)

**FEATURE SCALING IS IMPORTANT FOR AN ANN, BECAUSE GRADIENT-BASED TRAINING CONVERGES POORLY WHEN FEATURES HAVE VERY DIFFERENT RANGES**
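
As a rough illustration of what standardization does, the NumPy sketch below computes the mean and standard deviation on a tiny made-up training array and reuses those same statistics for the test row. The numbers are invented for illustration. `StandardScaler` applies the same formula, `(x - mean) / std`, with the population standard deviation.

```python
import numpy as np

# Made-up data: two features on very different scales.
train = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

# Statistics are learned from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics, never refit on test data

print(train_scaled)
print(test_scaled)
```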

In [None]:
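#standardizing features; the scaler is fitted on the training set only so no test information leaks into training
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)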

In [None]:
from tensorflow.keras.layers import Dropout


#CREATING THE ANN AS SEQUENCE OF LAYERS
ann = tf.keras.models.Sequential()

#ADDING FIRST HIDDEN LAYER WITH 30 NEURONS AND 50% DROPOUT; THE INPUT LAYER WILL BE ADDED AUTOMATICALLY
ann.add(tf.keras.layers.Dense(units = 30,activation = 'relu'))
ann.add(Dropout(0.5))

#ADDING 2ND HIDDEN LAYER WITH 30 NEURONS
ann.add(tf.keras.layers.Dense(units = 30,activation = 'relu'))
ann.add(Dropout(0.5))

#ADDING OUTPUT LAYER WITH 1 NEURON , AS THIS IS A BINARY CLASSIFICATION
ann.add(tf.keras.layers.Dense(units = 1,activation = 'sigmoid'))

#COMPILING THE ANN WITH THE ADAM OPTIMIZER (AN ADAPTIVE VARIANT OF STOCHASTIC GRADIENT DESCENT)
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
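
The `Dropout(0.5)` layers are one of the overfitting safeguards in this network. As a rough illustration of what dropout does during training, the NumPy sketch below zeroes each activation with probability `rate` and rescales the survivors by `1 / (1 - rate)`, which is the behaviour the Keras documentation describes. The helper name `inverted_dropout` and the all-ones input are assumptions for illustration. This is not Keras's implementation.

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Hypothetical helper: zero a random subset of activations and rescale the rest."""
    keep_prob = 1.0 - rate
    # True where the unit is kept, False where it is dropped.
    mask = rng.random(activations.shape) < keep_prob
    # Rescaling keeps the expected value of each activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
activations = np.ones((1000, 30))  # made-up activations of a 30-unit layer
dropped = inverted_dropout(activations, rate=0.5, rng=rng)

print(np.unique(dropped))   # surviving units are scaled up, dropped units are zero
print(dropped.mean())       # stays close to the original mean of 1.0
```

At prediction time dropout is switched off, so `ann.predict` uses all 30 units in each hidden layer.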


In [None]:
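#early stopping: stop training once the training accuracy has not improved for 25 epochs
#note: this monitors accuracy on the training data, not on held-out data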
from tensorflow.keras.callbacks import EarlyStopping
early = EarlyStopping(monitor = 'accuracy', mode = 'max',patience = 25)
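
Early stopping is usually tied to a held-out metric such as `val_loss` (in Keras, by passing `validation_split` or `validation_data` to `fit` and setting `monitor = 'val_loss'`), because training accuracy tends to keep rising even when the model overfits. The plain-Python sketch below only illustrates how the `patience` rule behaves. The function name and the loss values are made up, and it is not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Hypothetical helper: return (epoch where training stops, epoch of the best loss)."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the waiting period.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop here.
            return epoch, best_epoch
    # Ran out of epochs without triggering the rule.
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: they improve until epoch 3, then get worse.
val_losses = [0.60, 0.45, 0.38, 0.35, 0.36, 0.37, 0.39, 0.40]
stopped_at, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stopped_at, best_epoch)
```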

In [None]:
#TRAINING THE ANN WITH A BATCH SIZE OF 32 (MINI-BATCH LEARNING) FOR UP TO 400 EPOCHS
ann.fit(x_train, y_train,batch_size = 32, epochs = 400,callbacks = [early])
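
With a batch size of 32 the weights are updated several times per epoch, once per mini-batch. The arithmetic below sketches how many updates that is for this split, assuming scikit-learn rounds the test set size up, as its documentation describes for a fractional `test_size`.

```python
import math

n_samples = 569      # rows in the dataset
test_size = 0.2
batch_size = 32

# Fractional test size is rounded up; the remainder is used for training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# The last mini-batch is smaller than 32 but still counts as one update.
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_test, steps_per_epoch)
```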

In [None]:
#plotting training loss and accuracy per epoch
losses = pd.DataFrame(ann.history.history)
losses.plot()

In [None]:
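#GETTING ACCURACY AND CONFUSION MATRIX
from sklearn.metrics import confusion_matrix,accuracy_score
y_pred = ann.predict(x_test)
#CONVERTING PREDICTED PROBABILITIES TO CLASS LABELS WITH A 0.5 THRESHOLD
y_pred = (y_pred > 0.5).astype(int).ravel()
#SKLEARN EXPECTS (y_true, y_pred): ROWS ARE TRUE CLASSES, COLUMNS ARE PREDICTED CLASSES
cm = confusion_matrix(y_test,y_pred)
ac = accuracy_score(y_test,y_pred)

In [None]:
sns.heatmap(cm,annot = True,fmt = 'd')
plt.title('CONFUSION MATRIX')
plt.xlabel('PREDICTED')
plt.ylabel('TRUE')
print('Accuracy - {:.2f}%'.format(ac*100))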
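
Accuracy alone can hide missed malignant cases, which matter most in this task. Precision and recall for the malignant class can be read from a 2x2 confusion matrix laid out the way scikit-learn documents it (rows are true classes, columns are predicted classes). The counts below are made up for illustration and are not results from this notebook.

```python
import numpy as np

# Made-up confusion matrix: rows = true class (0, 1), columns = predicted class (0, 1).
example_cm = np.array([[65, 2],
                       [3, 44]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = example_cm.ravel()

accuracy = (tp + tn) / example_cm.sum()
precision = tp / (tp + fp)   # of the tumors predicted malignant, how many really are
recall = tp / (tp + fn)      # of the truly malignant tumors, how many were caught

print('Accuracy  - {:.2f}%'.format(accuracy * 100))
print('Precision - {:.2f}%'.format(precision * 100))
print('Recall    - {:.2f}%'.format(recall * 100))
```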