<a href="https://www.kaggle.com/code/manishkr1754/day21-project21-dl-breast-cancer-classication?scriptVersionId=144993019" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

---
<center><h1>Deep Learning - Breast Cancer Classification with Neural Network</h1></center>
<center><h3>Part of 30 Days 30 ML Projects Challenge</h3></center>

---

## 1) Understanding Problem Statement
---

Breast cancer is a widespread and potentially life-threatening medical condition that affects a significant portion of the population, predominantly women. Timely and precise diagnosis of breast cancer plays a crucial role in determining treatment options and improving patient outcomes. In this context, the application of machine learning offers a promising avenue to tackle this healthcare challenge.

This project belongs to the domain of **Medical Diagnosis and Classification using Machine Learning**. The primary goal is **to develop a predictive model for the classification of breast cancer by analyzing a comprehensive dataset that includes various clinical attributes, mammography findings and patient demographics**.

## 2) Understanding Data
---

The project uses **Breast Cancer Data**, a CSV file containing an `id` column, 30 numerical features (the independent variables) and the outcome, or dependent, variable `diagnosis`, which labels each tumor as malignant (`M`) or benign (`B`).

## 3) Getting System Ready
---
Importing required libraries


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 4) Data Eyeballing
---

### Loading Data

In [None]:
breast_cancer_data = pd.read_csv('/kaggle/input/day21-dl-breast-cancer-data/Day21_DL_Breast_Cancer_Data.csv') 

In [None]:
breast_cancer_data

In [None]:
print('The size of Dataframe is: ', breast_cancer_data.shape)
print('-'*100)
print('The Column Name, Record Count and Data Types are as follows: ')
breast_cancer_data.info()
print('-'*100)

In [None]:
# Defining numerical & categorical columns
numeric_features = [feature for feature in breast_cancer_data.columns if breast_cancer_data[feature].dtype != 'O']
categorical_features = [feature for feature in breast_cancer_data.columns if breast_cancer_data[feature].dtype == 'O']

# print columns
print('We have {} numerical features : {}'.format(len(numeric_features), numeric_features))
print('\nWe have {} categorical features : {}'.format(len(categorical_features), categorical_features))

In [None]:
print('Missing Value Presence in different columns of DataFrame are as follows : ')
print('-'*100)
total=breast_cancer_data.isnull().sum().sort_values(ascending=False)
percent=(breast_cancer_data.isnull().sum()/breast_cancer_data.isnull().count()*100).sort_values(ascending=False)
pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])

In [None]:
print('Summary Statistics of numerical features for DataFrame are as follows:')
print('-'*100)
breast_cancer_data.describe()

In [None]:
print('Summary Statistics of categorical features for DataFrame are as follows:')
print('-'*100)
breast_cancer_data.describe(include='object')

In [None]:
breast_cancer_data['diagnosis'].value_counts() # diagnosis is the target variable

## 5) Data Cleaning and Preprocessing
---

### Dropping unwanted columns

In [None]:
# `columns=` already identifies the axis, so `axis=1` is not needed
breast_cancer_data = breast_cancer_data.drop(columns=['Unnamed: 32'])

In [None]:
breast_cancer_data

### Encoding 'M'(Malignant) as 0 and 'B'(Benign) as 1

In [None]:
breast_cancer_data['diagnosis'] = breast_cancer_data['diagnosis'].map({'M':0,'B':1})

In [None]:
breast_cancer_data
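
`Series.map` silently returns `NaN` for any label that is missing from the mapping dictionary, so it can be worth confirming that every row was encoded. The snippet below is a minimal sketch of such a check on a hypothetical toy frame (`toy` is a stand-in for `breast_cancer_data`, not the real data):

```python
import pandas as pd

# Hypothetical toy frame standing in for breast_cancer_data
toy = pd.DataFrame({'diagnosis': ['M', 'B', 'B', 'M', 'B']})

# Same encoding as above: 'M' (Malignant) -> 0, 'B' (Benign) -> 1
encoded = toy['diagnosis'].map({'M': 0, 'B': 1})

# Labels absent from the dictionary become NaN, so a count of zero
# means every row received a valid class id
n_unmapped = int(encoded.isna().sum())
class_counts = encoded.value_counts().to_dict()

print('Unmapped rows:', n_unmapped)
print('Class counts :', class_counts)
```

Running the same two checks on `breast_cancer_data['diagnosis']` right after the mapping would flag unexpected labels before they reach the model.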

## 6) Model Building
---

### Creating Feature Matrix (Independent Variables) & Target Variable (Dependent Variable)

In [None]:
# separating the data and labels
X = breast_cancer_data.drop(columns=['id', 'diagnosis']) # Feature matrix (id is an identifier, not a predictor)
y = breast_cancer_data['diagnosis'] # Target variable

In [None]:
X

In [None]:
y

### Data Standardization

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

In [None]:
scaler.fit(X)

In [None]:
standardized_data = scaler.transform(X)

In [None]:
standardized_data

In [None]:
X = standardized_data

In [None]:
X

### Train-Test Split

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=45)

In [None]:
print(X.shape, X_train.shape, X_test.shape)

In [None]:
print(y.shape, y_train.shape, y_test.shape)
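
In the cells above, the scaler is fit on the full feature matrix before the split, so the test rows contribute to the mean and standard deviation used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The snippet below is a minimal sketch of that ordering on synthetic data; `X_demo`, `y_demo` and `demo_scaler` are hypothetical names, and the 30-column shape is an assumption chosen to mirror this notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and labels (not the real data)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 30))
y_demo = rng.integers(0, 2, size=200)

# Split first, using the same settings as the notebook
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=45
)

# Fit the scaler on the training rows only, then reuse its statistics
# to transform the test rows
demo_scaler = StandardScaler()
X_tr_std = demo_scaler.fit_transform(X_tr)
X_te_std = demo_scaler.transform(X_te)

print(X_tr_std.shape, X_te_std.shape)
```

With this ordering the training columns have zero mean and unit variance by construction, while the test columns are only approximately standardized, which reflects how unseen data would be treated.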

### Building Neural Network

#### Importing tensorflow and Keras

In [None]:
import tensorflow as tf 
tf.random.set_seed(3)
from tensorflow import keras

#### Setting up the layers of Neural Network

In [None]:
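model = keras.Sequential([
    # Flatten is a no-op on a 1-D input; here it only declares the 30-feature input shape
    keras.layers.Flatten(input_shape=(30,)),
    # Hidden layer: 20 units with ReLU activation
    keras.layers.Dense(20, activation='relu'),
    # Output layer: one unit per class (0 = malignant, 1 = benign)
    keras.layers.Dense(2, activation='sigmoid')
])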

#### Compiling the Neural Network

In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
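
To make the loss choice concrete, the snippet below is a minimal NumPy sketch of the textbook definition of sparse categorical cross-entropy: for each sample, take the predicted probability of the true class and average the negative logarithms. The function and the `*_demo` arrays are hypothetical illustrations; the Keras implementation may differ in details such as clipping or renormalization of the outputs.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class.

    y_true: integer class ids, shape (n,)
    probs:  predicted class probabilities, shape (n, n_classes)
    """
    # Clip to avoid log(0)
    probs = np.clip(probs, eps, 1.0 - eps)
    # Pick, for every row, the probability of its true class
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(picked)))

# Toy example: integer labels (0 = malignant, 1 = benign) and two-column outputs
y_true_demo = np.array([0, 1, 1])
probs_demo = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
])

loss_demo = sparse_categorical_crossentropy(y_true_demo, probs_demo)
print(loss_demo)
```

The "sparse" variant takes integer class ids directly, which is why `y_train` can be passed to the model without one-hot encoding.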

#### Training the Neural Network

In [None]:
history = model.fit(X_train, y_train, validation_split=0.1, epochs=10)

#### Visualization: Model Accuracy vs. Epoch

In [None]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])

plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')

plt.legend(['training data', 'validation data'], loc = 'lower right')

#### Visualization: Model Loss vs. Epoch

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])

plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')

plt.legend(['training data', 'validation data'], loc = 'upper right')

#### Accuracy of Model on Test Data

In [None]:
loss, accuracy = model.evaluate(X_test, y_test)
print(accuracy)

#### Inference

After training, the model is evaluated on the held-out test split (20% of the data), which was not used to fit the network. The printed accuracy of approximately 94.74% means the model classified about 95% of the test tumors correctly as benign or malignant. Accuracy alone does not establish clinical usefulness, though. False negatives (malignant tumors predicted as benign) and false positives (benign tumors predicted as malignant) carry very different costs in cancer diagnosis, so the confusion matrix, precision and recall should be examined as well. Note also that the scaler was fit on the full dataset before the split, so the test score may be slightly optimistic. Further validation on additional, more diverse datasets is needed to assess the model's generalization and its readiness for real-world clinical use.
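
One way to look at false positives and false negatives separately is a confusion matrix. The snippet below is a minimal sketch using stand-in arrays: `y_true_demo` and `probs_demo` are hypothetical values, and in this notebook they would correspond to `y_test` and the two-column output of `model.predict(X_test)`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Stand-in labels and predicted class scores (0 = malignant, 1 = benign)
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 1])
probs_demo = np.array([
    [0.90, 0.10],
    [0.70, 0.30],
    [0.40, 0.60],  # malignant tumor predicted as benign
    [0.20, 0.80],
    [0.10, 0.90],
    [0.30, 0.70],
    [0.60, 0.40],  # benign tumor predicted as malignant
    [0.05, 0.95],
])

# The predicted class is the column with the highest score
y_pred_demo = np.argmax(probs_demo, axis=1)

# Rows are true classes, columns are predicted classes, in the order [0, 1]
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])

# Recall for the malignant class: share of malignant tumors that were caught
malignant_recall = recall_score(y_true_demo, y_pred_demo, pos_label=0)

print(cm)
print('Malignant recall:', malignant_recall)
```

Because malignant is encoded as 0 here, missed cancers appear in `cm[0, 1]`, and `pos_label=0` is needed to make recall refer to the malignant class.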