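The Breast Cancer Wisconsin (Diagnostic) dataset is a widely used benchmark for binary classification. The task is to predict whether a breast tumor is malignant (cancerous) or benign (non-cancerous) from measurements of its cell nuclei. The version bundled with scikit-learn (load_breast_cancer) has 569 samples: 212 malignant and 357 benign.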
Target Classes:
0 → Malignant (cancerous)
1 → Benign (non-cancerous)
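Note that the encoding is easy to get backwards: the cancerous class is label 0, not 1. A minimal sketch, using only scikit-learn's `load_breast_cancer` and NumPy's `bincount`, that prints the label-to-name mapping and the number of samples per class:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# target_names is indexed by label value: position 0 is label 0, position 1 is label 1
for label, name in enumerate(data.target_names):
    print(label, "->", str(name))

# Count the samples carrying each label (index 0 = malignant, index 1 = benign)
counts = np.bincount(data.target)
print({str(name): int(n) for name, n in zip(data.target_names, counts)})
```

The counts should match the 212 malignant and 357 benign samples stated above.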
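The dataset has 30 numerical features. Ten properties of the cell nuclei (radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension) are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass, and each property is summarized in three ways:
Mean Features (10) – Mean value of each property across the nuclei in the image.
Standard Error Features (10) – Standard error of each property, a measure of its variation.
Worst Features (10) – "Worst" (largest) value of each property, computed as the mean of the three largest values.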
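The three groups are visible in scikit-learn's column names, which follow the patterns `mean <property>`, `<property> error`, and `worst <property>`. A small sketch that splits the 30 names into those groups by string matching (the variable names here are illustrative, not part of the original notebook):

```python
from sklearn.datasets import load_breast_cancer

feature_names = [str(f) for f in load_breast_cancer().feature_names]

# Group columns by their naming pattern
mean_features = [f for f in feature_names if f.startswith("mean ")]
se_features = [f for f in feature_names if f.endswith(" error")]
worst_features = [f for f in feature_names if f.startswith("worst ")]

# Recover the ten underlying nucleus properties from the "mean ..." names
properties = [f[len("mean "):] for f in mean_features]

print(len(mean_features), len(se_features), len(worst_features))  # 10 10 10
print(properties)
```

Each group lines up with the others by property, so `df[mean_features]`, `df[se_features]`, and `df[worst_features]` give three 10-column views of the same measurements.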

In [11]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [13]:
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop(columns='target')
y = df['target']
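Newer scikit-learn releases (0.23 and later) can return pandas objects directly, which skips building the DataFrame by hand. A sketch of an equivalent to the cell above, assuming such a version is installed:

```python
from sklearn.datasets import load_breast_cancer

# return_X_y=True with as_frame=True gives a feature DataFrame and a label Series
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

print(X.shape)  # (569, 30)
print(y.value_counts())
```

The columns of `X` carry the same names as `data.feature_names`, so the rest of the notebook works unchanged with either version.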

In [15]:
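# Hold out 30% for testing; stratify=y keeps the malignant/benign ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Fit a logistic regression classifier on the raw (unscaled) features
log_reg = LogisticRegression(max_iter=10000, random_state=42)
log_reg.fit(X_train, y_train)

# Predict class labels (0 = malignant, 1 = benign) for the held-out set
y_pred = log_reg.predict(X_test)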
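The features span very different scales (for example, `mean area` runs into the hundreds and thousands while `mean smoothness` is around 0.1), which is likely why the model above is given `max_iter=10000`. A common alternative, sketched here rather than taken from the original notebook, is to standardize the features inside a scikit-learn pipeline so the scaler is fit on the training split only, and to add 5-fold cross-validation so the estimate does not rest on a single split:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Same split settings as the notebook: 30% test, stratified by class, fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The scaler's mean/std are learned from X_train only and then applied to X_test
scaled_log_reg = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000, random_state=42)
)
scaled_log_reg.fit(X_train, y_train)
scaled_acc = accuracy_score(y_test, scaled_log_reg.predict(X_test))
print("Test accuracy (scaled):", scaled_acc)

# For classifiers, an integer cv uses stratified folds; scaling is refit inside each fold
cv_scores = cross_val_score(scaled_log_reg, X_train, y_train, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))
```

Putting the scaler inside the pipeline (rather than scaling `X` before splitting) keeps test-set statistics out of training, which matters most when the pipeline is cross-validated.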

In [17]:
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)

Accuracy: 0.9473684210526315

Confusion Matrix:
 [[ 57   7]
 [  2 105]]

Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.89      0.93        64
           1       0.94      0.98      0.96       107

    accuracy                           0.95       171
   macro avg       0.95      0.94      0.94       171
weighted avg       0.95      0.95      0.95       171
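Every figure in the report can be recomputed from the confusion matrix. In scikit-learn's convention, rows are true labels and columns are predicted labels, so row 0 holds the 64 malignant test tumors: 57 were classified correctly and 7 were predicted benign. The sketch below copies the matrix printed above and rederives accuracy, recall, precision, and F1 with NumPy:

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class
# (class 0 = malignant, class 1 = benign)
cm = np.array([[57, 7],
               [2, 105]])

correct = np.diag(cm)                  # correctly classified samples per class
accuracy = correct.sum() / cm.sum()    # (57 + 105) / 171
recall = correct / cm.sum(axis=1)      # divide by row totals (true class counts)
precision = correct / cm.sum(axis=0)   # divide by column totals (predicted class counts)
f1 = 2 * precision * recall / (precision + recall)

print("accuracy:", accuracy)
print("recall:", recall.round(2))
print("precision:", precision.round(2))
print("f1:", f1.round(2))
```

For example, malignant recall is 57/64 ≈ 0.89 and malignant precision is 57/59 ≈ 0.97, matching the class 0 row of the report. Because malignant is label 0 while scikit-learn's binary metrics default to `pos_label=1`, scoring the malignant class directly needs an explicit argument, as in `recall_score(y_test, y_pred, pos_label=0)`.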

