# Classification Problem on Breast Cancer Dataset

## Problem Statement
Breast cancer is one of the most common cancers worldwide, and early detection plays a vital role in improving patient survival. In this task, we build and compare several machine learning classification models that predict whether a tumor is **malignant** or **benign** from features computed from digitized images of fine needle aspirates (FNA) of breast masses.

## Dataset
We will use the **Breast Cancer Wisconsin (Diagnostic) dataset**, available in `scikit-learn` through `load_breast_cancer`.  
- Input: 569 samples with 30 numeric features each (e.g., radius, texture, smoothness, compactness).  
- Target:  
  - `0` → Malignant  
  - `1` → Benign  
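
The label encoding above is easy to get backwards, so it can be worth confirming before training. The following is a minimal sketch, assuming `scikit-learn` is installed; the names `label_names` and `class_counts` are introduced here only for illustration.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Feature matrix: one row per sample, one column per feature.
print("Shape:", data.data.shape)

# target_names is indexed by the integer label, so position 0 names class 0.
label_names = [str(name) for name in data.target_names]
print("Label names:", label_names)

# Count how many samples fall into each class.
class_counts = Counter(label_names[int(label)] for label in data.target)
print("Class counts:", dict(class_counts))
```

Because the classes are not evenly balanced, accuracy alone can be misleading, which is one reason the per-class metrics below matter.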

## Objectives
1. Train and evaluate the following classification models:
   - Logistic Regression  
   - Naive Bayes  
   - Decision Tree  
   - Random Forest  
   - Support Vector Machine (SVM)  
   - Multi-Layer Perceptron (MLP) Classifier with **2 hidden layers** (trained with the Adam optimizer)  

2. Compute and compare the following evaluation metrics for each model:
   - Accuracy  
   - Precision  
   - Recall  
   - F1-score  
   - Classification Report  
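
The four scalar metrics above can all be derived from the confusion-matrix counts for the class treated as positive. The following is a minimal sketch in plain Python; the helper name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration, not taken from the dataset.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 for the positive class.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Undefined ratios (zero denominator) are returned as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=8, fp=2, fn=1, tn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision penalizes false positives and recall penalizes false negatives; F1 is their harmonic mean, so it drops when either one is low.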

## Expected Outcome
By the end of this task, we will:
- Identify which classifier performs best on the Breast Cancer dataset.  
- Understand the trade-offs between different algorithms in terms of **precision, recall, and F1-score**, which are crucial in medical decision-making.  


In [8]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

# models
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

In [3]:
data = load_breast_cancer()
x = data.data
y = data.target

In [4]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [5]:
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

In [6]:
models = {
    'Logistic Regression': LogisticRegression(max_iter=500, random_state=42),
    'Naive Bayes': GaussianNB(),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'SVM': SVC(kernel='rbf', probability=True, random_state=42),
    'MLPClassifier': MLPClassifier(
        hidden_layer_sizes=(64, 32), solver='adam', max_iter=500,
        learning_rate_init=0.001, random_state=41,
    ),
}

In [9]:
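results = {}
for name, model in models.items():
    # Fit on the scaled training split and predict on the held-out test split
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)

    # precision/recall/F1 use the default pos_label=1, so these scores
    # describe the benign class; the classification report covers both classes
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred)
    rec = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)

    results[name] = {'Accuracy': acc, 'Precision': prec, 'Recall': rec, 'F1 Score': f1}
    print(f'\n{name} Classification Report:\n {classification_report(y_test, y_pred)}')

# One row per model, one column per metric
results_df = pd.DataFrame(results).T
results_df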


Logistic Regression Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114


Naive Bayes Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.93      0.95        43
           1       0.96      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114


Decision Tree Classification Report:
               precision    recall  f1-score   support

           0       0.93      0.93      0.93        43
           1       0.96      0.96      0.96        71

    accuracy                           0.95       114
   macr

| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Logistic Regression | 0.973684 | 0.972222 | 0.985915 | 0.979021 |
| Naive Bayes | 0.964912 | 0.958904 | 0.985915 | 0.972222 |
| Decision Tree | 0.947368 | 0.957746 | 0.957746 | 0.957746 |
| Random Forest | 0.964912 | 0.958904 | 0.985915 | 0.972222 |
| SVM | 0.982456 | 0.972603 | 1.0 | 0.986111 |
| MLPClassifier | 0.973684 | 0.972222 | 0.985915 | 0.979021 |
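
The precision, recall, and F1 values in the summary table come from `scikit-learn`'s default `pos_label=1`, which in this dataset is the benign class. For a screening task, the malignant class (`0`) is often the one of interest. The sketch below shows how the same functions could report malignant-class scores by passing `pos_label=0`; the toy label arrays are hypothetical and are not taken from the experiment above.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical toy labels: 0 = malignant, 1 = benign.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_hat = [0, 0, 1, 1, 1, 1, 1, 0]

# Default behaviour: class 1 (benign) is treated as the positive class.
benign_recall = recall_score(y_true, y_hat)

# Treat class 0 (malignant) as the positive class instead.
malignant_precision = precision_score(y_true, y_hat, pos_label=0)
malignant_recall = recall_score(y_true, y_hat, pos_label=0)
malignant_f1 = f1_score(y_true, y_hat, pos_label=0)

print(f"Benign recall:       {benign_recall:.3f}")
print(f"Malignant precision: {malignant_precision:.3f}")
print(f"Malignant recall:    {malignant_recall:.3f}")
print(f"Malignant F1:        {malignant_f1:.3f}")
```

Malignant recall is the share of malignant tumors the model actually flags, so a missed cancer lowers this number rather than the benign-class recall shown in the table.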
