<a href="https://colab.research.google.com/github/rameez16/Deep-Learning-Projects/blob/main/Breast_Cancer_Prediction_With_ANN(Classification).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



## 📌 Project Title:
**Breast Cancer Classification using Artificial Neural Network**

---

## 🎯 Objective:
To build and evaluate an Artificial Neural Network (ANN) that can accurately predict whether a tumor is **malignant** or **benign** based on various numerical features extracted from breast mass cell images.

---

## 📊 Dataset Overview:
- **Source**: Breast Cancer Wisconsin (Diagnostic) Dataset
- **Entries**: 569 samples
- **Features**: 30 numerical features (e.g., mean radius, texture, area, smoothness)
- **Target**: Tumor diagnosis label  
  - `Malignant = 0`  
  - `Benign = 1`

---

## 🔍 Features Used:
Statistics including:
- **Mean values**: `mean radius`, `mean texture`, etc.
- **Standard errors**: `radius error`, `texture error`, etc.
- **Worst-case values**: `worst perimeter`, `worst symmetry`, etc.

---

## 🧪 Steps Involved:

### 1. Data Preprocessing
- Load and explore the dataset
- Normalize/scale features
- Encode labels if necessary
- Train-test split (e.g., 80-20)

### 2. Model Building
- Define a Sequential ANN using TensorFlow/Keras
- Input layer: 30 neurons (one per feature)
- Hidden layers: 1–2 layers with **ReLU** activation
- Output layer: 1 neuron with **sigmoid** activation (binary classification)

### 3. Model Compilation
- **Loss function**: `binary_crossentropy`
- **Optimizer**: `adam`
- **Metrics**: `accuracy`

### 4. Training
- Fit model with training data
- Validate with test data
- Use callbacks like `EarlyStopping` (optional)

### 5. Evaluation
- Accuracy
- Confusion matrix
- Precision, Recall, F1-score
- ROC Curve and AUC

### 6. Conclusion
- Summarize model performance
- Suggest improvements and future work

---

## 🧰 Tools & Libraries:
- **Python**
- **Pandas**, **NumPy**
- **Scikit-learn** (for preprocessing and evaluation)
- **TensorFlow / Keras** (for ANN model)
- **Matplotlib**, **Seaborn** (for visualization)

---

## ✅ Expected Outcome:
- A trained ANN model that classifies breast tumors as **malignant** or **benign** with high accuracy (typically >90%)
- Insights into which features contribute most to the diagnosis


## Domain Knowledge



## 🔬 Benign Tumors
- **Definition**: Non-cancerous growths.
- **Growth Rate**: Usually slow.
- **Spread**: Do **not** spread to other parts of the body (non-invasive).
- **Appearance**: Cells look relatively normal under a microscope.
- **Danger Level**: Often not life-threatening unless they press on vital organs.
- **Treatment**: Can often be removed surgically and rarely come back.

**Examples**: Fibroids in the uterus, lipomas (fatty lumps), adenomas.

---

## 🧬 Malignant Tumors (Cancer)
- **Definition**: Cancerous growths that can invade and destroy surrounding tissue.
- **Growth Rate**: Often grow quickly.
- **Spread**: Can spread (metastasize) to distant parts of the body via blood or lymph.
- **Appearance**: Cells are irregular and look abnormal.
- **Danger Level**: Potentially life-threatening if not treated.
- **Treatment**: May require surgery, chemotherapy, radiation, or targeted therapy.

**Examples**: Breast cancer, lung cancer, leukemia, lymphoma.

---

## 🔍 Key Differences Summary

| Feature              | Benign Tumor           | Malignant Tumor         |
|----------------------|------------------------|--------------------------|
| **Growth**           | Slow                   | Rapid                    |
| **Spread**           | No                     | Yes (metastasis)         |
| **Cell appearance**  | Normal-like            | Abnormal and varied      |
| **Treatment**        | Usually surgery        | Surgery + chemo/radiation|
| **Recurrence**       | Rare                   | Possible/likely          |
| **Life-threatening** | Rarely                 | Often                    |

# 1. Data Preprocessing

In [41]:
import numpy as np
import pandas as pd
import sklearn.datasets
from sklearn.model_selection import train_test_split

In [2]:
breast_cancer = sklearn.datasets.load_breast_cancer()


In [3]:
print(breast_cancer)

{'data': array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
        1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
        8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
        8.758e-02],
       ...,
       [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
        7.820e-02],
       [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
        1.240e-01],
       [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
        7.039e-02]]), 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
 

In [4]:
df=pd.DataFrame(breast_cancer.data,columns=breast_cancer.feature_names)

In [5]:
df.head(5)

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [6]:
df.describe()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,...,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,14.127292,19.289649,91.969033,654.889104,0.09636,0.104341,0.088799,0.048919,0.181162,0.062798,...,16.26919,25.677223,107.261213,880.583128,0.132369,0.254265,0.272188,0.114606,0.290076,0.083946
std,3.524049,4.301036,24.298981,351.914129,0.014064,0.052813,0.07972,0.038803,0.027414,0.00706,...,4.833242,6.146258,33.602542,569.356993,0.022832,0.157336,0.208624,0.065732,0.061867,0.018061
min,6.981,9.71,43.79,143.5,0.05263,0.01938,0.0,0.0,0.106,0.04996,...,7.93,12.02,50.41,185.2,0.07117,0.02729,0.0,0.0,0.1565,0.05504
25%,11.7,16.17,75.17,420.3,0.08637,0.06492,0.02956,0.02031,0.1619,0.0577,...,13.01,21.08,84.11,515.3,0.1166,0.1472,0.1145,0.06493,0.2504,0.07146
50%,13.37,18.84,86.24,551.1,0.09587,0.09263,0.06154,0.0335,0.1792,0.06154,...,14.97,25.41,97.66,686.5,0.1313,0.2119,0.2267,0.09993,0.2822,0.08004
75%,15.78,21.8,104.1,782.7,0.1053,0.1304,0.1307,0.074,0.1957,0.06612,...,18.79,29.72,125.4,1084.0,0.146,0.3391,0.3829,0.1614,0.3179,0.09208
max,28.11,39.28,188.5,2501.0,0.1634,0.3454,0.4268,0.2012,0.304,0.09744,...,36.04,49.54,251.2,4254.0,0.2226,1.058,1.252,0.291,0.6638,0.2075


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 30 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

In [8]:
df.isnull().sum()

Unnamed: 0,0
mean radius,0
mean texture,0
mean perimeter,0
mean area,0
mean smoothness,0
mean compactness,0
mean concavity,0
mean concave points,0
mean symmetry,0
mean fractal dimension,0


In [9]:
df['label']=breast_cancer.target

In [10]:
df['label'].value_counts()

Unnamed: 0_level_0,count
label,Unnamed: 1_level_1
1,357
0,212


**The classes are moderately imbalanced** (357 benign vs. 212 malignant, roughly 63% / 37%).

* 1 --> Benign tumor

* 0 --> Malignant tumor (cancer)

In [11]:
df.shape

(569, 31)

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

### **Separating the features and target**

In [13]:
x=df.drop(columns='label',axis=1)
y=df['label']

### Splitting the data

In [14]:
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.2,random_state=2)
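
Because the two classes are not evenly represented, one option (not used in the cell above) is to pass `stratify=y`, so that both splits keep roughly the same benign/malignant ratio. A minimal sketch, assuming scikit-learn's bundled copy of the dataset; the `_s`-suffixed names are illustrative and are not used elsewhere in this notebook:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load features and labels directly as arrays (0 = malignant, 1 = benign).
X, y = load_breast_cancer(return_X_y=True)

# Same split size and seed as above, but stratified on the label.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y
)

# Fraction of benign samples overall and in each split.
print(f"benign fraction, full data: {y.mean():.3f}")
print(f"benign fraction, train:     {y_train_s.mean():.3f}")
print(f"benign fraction, test:      {y_test_s.mean():.3f}")
```

A stratified split selects different rows than the unstratified one, so metrics computed on it would not match the numbers reported later in this notebook.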

In [15]:
X_train.shape

(455, 30)

In [16]:
Y_train.shape

(455,)

### Standardise the Data

In [17]:
from sklearn.preprocessing import StandardScaler

scalar=StandardScaler()

X_train=scalar.fit_transform(X_train)
X_test=scalar.transform(X_test)
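
`StandardScaler` learns each feature's mean and standard deviation from the training data only and then applies the same shift and scale to the test data. That is why the cell above calls `fit_transform` on the training set and plain `transform` on the test set. A small sketch of the arithmetic with NumPy, using made-up two-feature arrays (`X_train_demo` and `X_test_demo` are illustrative, not notebook variables):

```python
import numpy as np

# Toy data: 3 training rows, 1 test row, 2 features on very different scales.
X_train_demo = np.array([[10.0, 200.0], [12.0, 240.0], [14.0, 280.0]])
X_test_demo = np.array([[11.0, 260.0]])

# Statistics come from the training rows only.
mu = X_train_demo.mean(axis=0)
sigma = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# The same mu and sigma are applied to both splits.
X_train_scaled = (X_train_demo - mu) / sigma
X_test_scaled = (X_test_demo - mu) / sigma

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled)
```

Fitting the scaler on the full dataset instead would leak test-set statistics into training.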

# 2. Model Building (Neural Network)


Define a Sequential ANN using TensorFlow/Keras
- Input layer: 30 inputs (one per feature)
- Hidden layers: three Dense layers (20, 30, and 20 units) with **ReLU** activation
- Output layer: 1 neuron with **sigmoid** activation (binary classification)

In [18]:
import tensorflow as tf
from tensorflow import keras

In [31]:
model=keras.Sequential([keras.layers.Flatten(input_shape=(30,)),
                        keras.layers.Dense(20,activation='relu'),
                        keras.layers.Dense(30,activation='relu'),
                        keras.layers.Dense(20,activation='relu'),
                        keras.layers.Dense(1,activation='sigmoid')])



In [32]:
model.summary()
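
The output of `model.summary()` is not preserved in this export. As a rough check, the trainable parameter count of each `Dense` layer can be worked out by hand as `inputs * units + units` (weights plus biases); `Flatten` adds no parameters. A small sketch, with `dense_params` as an illustrative helper:

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer widths taken from the model definition: 30 inputs -> 20 -> 30 -> 20 -> 1.
layer_sizes = [30, 20, 30, 20, 1]

per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(per_layer)

print(per_layer)      # [620, 630, 620, 21]
print(total_params)   # 1891
```

With fewer than 2,000 parameters the network is small, which suits a dataset of only 569 samples.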

# 3. Model Compilation
- **Loss function**: `binary_crossentropy`
- **Optimizer**: `adam`
- **Metrics**: `accuracy`

In [33]:
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
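
`binary_crossentropy` averages, over all samples, the negative log-probability the model assigned to the true label: `-[y*log(p) + (1-y)*log(1-p)]`. A minimal pure-Python sketch of that formula (the function name `binary_cross_entropy` and the clipping constant `eps` are illustrative choices, not Keras internals):

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# A confident correct prediction costs little; a confident wrong one costs a lot.
print(binary_cross_entropy([1], [0.9]))
print(binary_cross_entropy([1], [0.1]))
```

A model that always outputs 0.5 scores `log(2) ≈ 0.693`, which is a useful baseline when reading the loss values in the training log.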

# 4. Training and Prediction
- Fit model with training data
- Validate on a 10% hold-out of the training data (`validation_split=0.1`); the test set is kept for the final evaluation

In [34]:
model.fit(X_train,Y_train,validation_split=.1,epochs=10)

Epoch 1/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 2s 23ms/step - accuracy: 0.3092 - loss: 0.7650 - val_accuracy: 0.8696 - val_loss: 0.5816
Epoch 2/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8001 - loss: 0.5630 - val_accuracy: 0.9348 - val_loss: 0.4363
Epoch 3/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9286 - loss: 0.4364 - val_accuracy: 0.9565 - val_loss: 0.3239
Epoch 4/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9331 - loss: 0.3151 - val_accuracy: 0.9565 - val_loss: 0.2374
Epoch 5/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9716 - loss: 0.2267 - val_accuracy: 0.9348 - val_loss: 0.1774
Epoch 6/10
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9687 - loss: 0.1753 - val_accuracy: 0.9348 - val_loss: 0.1380
Epoch 7/10
13/13 ━━━━━━━━

<keras.src.callbacks.history.History at 0x780f08aca3d0>
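
The `13/13` in the training log is the number of mini-batches per epoch. Assuming Keras's default batch size of 32 (the `fit` call above does not set `batch_size`) and that `validation_split=0.1` holds out the last 10% of the 455 training rows, the numbers can be reproduced as follows:

```python
import math

n_rows = 455            # rows in X_train
validation_split = 0.1
batch_size = 32         # assumed: Keras default when batch_size is not passed

n_fit = math.floor(n_rows * (1 - validation_split))  # rows used for weight updates
n_val = n_rows - n_fit                                # rows held out for validation
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)  # 409 46 13
```

With only 46 validation rows, a single sample moves `val_accuracy` by about 2 percentage points (for example, 0.9565 ≈ 44/46), so small epoch-to-epoch changes in that column should not be over-interpreted.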

Prediction on Test Data

In [49]:
test_predict=model.predict(X_test)

4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step


In [50]:
test_predict[0]

array([0.8268795], dtype=float32)

In [51]:
test_predict = (test_predict >= 0.5).astype(int)

In [52]:
test_predict[0]

array([1])

Prediction on Train data

In [45]:
train_predict=model.predict(X_train)

15/15 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step


In [53]:
train_predict = (train_predict >= 0.5).astype(int)

In [54]:
train_predict[0]

array([1])

# 5. Evaluation

In [55]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, f1_score, precision_score, recall_score

In [56]:
print('Training model accuracy :- ', accuracy_score(Y_train, train_predict))
print('training model Confusion Matrix :- \n',confusion_matrix(Y_train, train_predict))
print('training model precision_score Report :- ',precision_score(Y_train, train_predict))
print('training model recall_score Report :- ',recall_score(Y_train, train_predict))
print(' ')
print('*'*80)
print(' ')
print('Testing model accuracy :- ', accuracy_score(Y_test, test_predict))
print('Testing model Confusion Matrix :- \n',confusion_matrix(Y_test, test_predict))
print('Testing model precision_score Report :- ',precision_score(Y_test, test_predict))
print('Testing model recall_score Report :- ',recall_score(Y_test, test_predict))

Training model accuracy :-  0.9824175824175824
training model Confusion Matrix :- 
 [[162   5]
 [  3 285]]
training model precision_score Report :-  0.9827586206896551
training model recall_score Report :-  0.9895833333333334
 
********************************************************************************
 
Testing model accuracy :-  0.9473684210526315
Testing model Confusion Matrix :- 
 [[41  4]
 [ 2 67]]
Testing model precision_score Report :-  0.9436619718309859
Testing model recall_score Report :-  0.9710144927536232
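
The reported scores follow directly from the two confusion matrices. A short sketch that recomputes them and adds F1 and the recall of the malignant class (label 0), which `precision_score` and `recall_score` do not report with their default settings; `summarize` is an illustrative helper name:

```python
def summarize(cm):
    """cm = [[TN, FP], [FN, TP]] with label 1 (benign) as the positive class."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "malignant_recall": tn / (tn + fp),  # share of malignant cases caught
    }

# Matrices copied from the printed output above.
train_metrics = summarize([[162, 5], [3, 285]])
test_metrics = summarize([[41, 4], [2, 67]])

for name, metrics in (("train", train_metrics), ("test", test_metrics)):
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

The `malignant_recall` entry is the one to watch in a diagnostic setting, since it counts how many malignant tumors the model did not mislabel as benign.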


##  Training Performance

- **Accuracy**: 98.24%
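- **Confusion Matrix** (rows = true class, columns = predicted class, in label order 0 = malignant, 1 = benign):
[[162 5]
[ 3 285]]

scikit-learn treats label `1` (benign) as the positive class by default, so:
- **True Negatives (TN)**: 162 (malignant predicted as malignant)
- **False Positives (FP)**: 5 (malignant predicted as benign)
- **False Negatives (FN)**: 3 (benign predicted as malignant)
- **True Positives (TP)**: 285 (benign predicted as benign)

- **Precision**: 98.28%  
*(Of the tumors predicted benign, very few were actually malignant)*
- **Recall**: 98.96%  
*(Most benign cases were correctly identified; recall for the malignant class is 162/167 ≈ 97.0%)*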

---

##  Testing Performance

- **Accuracy**: 94.74%
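- **Confusion Matrix** (rows = true class, columns = predicted class; label `1`, benign, is the positive class):
[[41 4]
[ 2 67]]

- **True Negatives (TN)**: 41 (malignant predicted as malignant)
- **False Positives (FP)**: 4 (malignant predicted as benign)
- **False Negatives (FN)**: 2 (benign predicted as malignant)
- **True Positives (TP)**: 67 (benign predicted as benign)

- **Precision**: 94.37%  
- **Recall**: 97.10% (benign class; recall for the malignant class is 41/45 ≈ 91.1%)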

---

## 📈 Summary

| Metric       | Training Set | Testing Set |
|--------------|--------------|-------------|
| **Accuracy** | 98.24%       | 94.74%      |
| **Precision**| 98.28%       | 94.37%      |
| **Recall**   | 98.96%       | 97.10%      |

---

## 🔍 Conclusion

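- The model performs well on the training set (98.24% accuracy), indicating effective learning.
- Test accuracy is about 3.5 percentage points lower (94.74%). A modest gap like this is expected and suggests the model generalizes reasonably well, although a single 114-sample test split is too small to rule out some overfitting.
- Precision and recall are both high for the benign (positive) class. In a diagnostic setting the costlier error is a malignant tumor predicted as benign: 4 of the 45 malignant test cases (about 9%) fall into this category, so recall on the malignant class is the metric to watch.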

---
