# Assignment 3: Breast Cancer Classification

This notebook trains and evaluates a Random Forest classifier on scikit-learn's built-in Breast Cancer dataset.

Authors: Malek Sibai, Mahri Kadyrova

## 1. Load the Breast Cancer Dataset

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
import numpy as np

# Load the dataset
data = load_breast_cancer()
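
Before going further, it can help to look at what `load_breast_cancer()` returns. The sketch below is not part of the original assignment; it only prints the shape of the feature matrix, the class names, and a few feature names.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Feature matrix: one row per sample, one column per feature
print("Feature matrix shape:", data.data.shape)  # (569, 30)

# Class names, indexed by label value (0 first, then 1)
print("Class names:", list(data.target_names))

# A few of the 30 feature names
print("First features:", list(data.feature_names[:5]))
```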

## 2. Assign Features and Labels

In [2]:
# Assign features to X and labels to y
X = data.data
y = data.target
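
The labels in `y` are integers, so it is worth checking which integer stands for which class and how balanced the classes are. This is an illustrative check, not part of the original notebook; the variable `counts` is introduced here for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

# np.bincount counts how often each integer label occurs:
# index 0 -> malignant, index 1 -> benign
counts = np.bincount(y)
for label, (name, count) in enumerate(zip(data.target_names, counts)):
    print(f"label {label} = {name}: {count} samples ({count / len(y):.1%})")
```

Because benign is encoded as 1, scikit-learn's binary metrics treat benign as the positive class by default.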

## 3. Split Data into Train and Test Sets

In [3]:
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
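
A quick sanity check on the split, assuming the same `test_size` and `random_state` as above: with 569 samples, a 20% test fraction is rounded up to 114 test samples, leaving 455 for training.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Number of samples and features in each partition
print("Train:", X_train.shape, "Test:", X_test.shape)

# Class counts per partition, ordered [malignant, benign]
print("Train class counts:", np.bincount(y_train))
print("Test class counts: ", np.bincount(y_test))
```

`train_test_split` also accepts `stratify=y` to keep class proportions equal across both partitions. The notebook does not use it, and enabling it would change the split and therefore the reported results.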

## 4. Train a Classifier

A Random Forest classifier is trained to distinguish malignant from benign cases.

In [4]:
# Create and train the Random Forest classifier
classifier = RandomForestClassifier(n_estimators=100, random_state=42)
classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = classifier.predict(X_test)

# Calculate accuracy
accuracy = classifier.score(X_test, y_test)
print(f"Accuracy: {accuracy:.4f} ({accuracy*100:.2f}%)")

Accuracy: 0.9649 (96.49%)
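
For a classifier, `score` returns plain accuracy: the fraction of test samples whose predicted label matches the true label. The sketch below recomputes it by hand to make that explicit; `manual_accuracy` and `score_accuracy` are names introduced for this example. The reported 0.9649 corresponds to 110 correct predictions out of 114.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

classifier = RandomForestClassifier(n_estimators=100, random_state=42)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Accuracy by hand: mean of the element-wise "prediction is correct" flags
manual_accuracy = np.mean(y_pred == y_test)

# Accuracy as reported by the estimator's score method
score_accuracy = classifier.score(X_test, y_test)

print(f"Manual: {manual_accuracy:.4f}  score(): {score_accuracy:.4f}")
```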


## 5. Compute Confusion Matrix, Precision, and Recall

In [5]:
# Compute confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

Confusion Matrix:
[[40  3]
 [ 1 70]]

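Interpretation (rows are true labels, columns are predicted labels; scikit-learn encodes malignant as 0 and benign as 1, so benign is the positive class here):
True Negatives (malignant correctly classified as malignant): 40
False Positives (malignant incorrectly classified as benign): 3
False Negatives (benign incorrectly classified as malignant): 1
True Positives (benign correctly classified as benign): 70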
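
The four cells can be unpacked directly from the matrix and turned into precision, recall, and accuracy. This is a minimal sketch that hard-codes the matrix printed above rather than retraining the model; benign (label 1) is the positive class, following scikit-learn's default.

```python
import numpy as np

# Confusion matrix from the output above: rows = true label, columns = predicted label
cm = np.array([[40, 3],
               [1, 70]])

# For a 2x2 matrix, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = cm.ravel()

precision_benign = tp / (tp + fp)  # of all predicted benign, how many are benign
recall_benign = tp / (tp + fn)     # of all truly benign, how many were found
accuracy = (tp + tn) / cm.sum()    # correct predictions over all predictions

print(f"Precision (benign): {precision_benign:.2f}")
print(f"Recall (benign):    {recall_benign:.2f}")
print(f"Accuracy:           {accuracy:.4f}")
```

In a screening context the 3 malignant cases predicted as benign are usually the more costly error, even though they are labelled "false positives" under this convention.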


In [6]:
# Compute precision, recall, and other metrics
report = classification_report(y_test, y_pred, target_names=data.target_names)
print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

   malignant       0.98      0.93      0.95        43
      benign       0.96      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114
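
To score malignant as the positive class, `precision_score` and `recall_score` accept a `pos_label` argument. The sketch below rebuilds label arrays that reproduce the confusion matrix counts reported earlier (an assumption made so the example runs without retraining) and recovers the malignant row of the report.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Synthetic labels matching the reported counts: 43 malignant (0), 71 benign (1)
y_true = np.array([0] * 43 + [1] * 71)
# 40 malignant correct, 3 malignant predicted benign, 1 benign predicted malignant, 70 benign correct
y_hat = np.array([0] * 40 + [1] * 3 + [0] * 1 + [1] * 70)

print(confusion_matrix(y_true, y_hat))

# pos_label=0 makes malignant the class of interest
precision_malignant = precision_score(y_true, y_hat, pos_label=0)
recall_malignant = recall_score(y_true, y_hat, pos_label=0)

print(f"Precision (malignant): {precision_malignant:.2f}")
print(f"Recall (malignant):    {recall_malignant:.2f}")
```

Recall for malignant (40 of 43) is the share of actual cancers the model detects, which is often the figure of most interest for this dataset.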

