# Logistic Regression on the Breast Cancer Dataset
This notebook demonstrates a binary classification task using logistic regression from **scikit-learn**.
We will step through loading the data, training the model, and evaluating the results using standard metrics.

## Dataset
We use the Breast Cancer Wisconsin dataset included with scikit-learn.
Each sample contains 30 numeric features describing cell nuclei from a digitized image.
In scikit-learn's encoding, the target label is **0** for malignant tumors and **1** for benign.

In [None]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, ConfusionMatrixDisplay, RocCurveDisplay
import matplotlib.pyplot as plt


In [None]:
# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
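
The split above is purely random, so the class proportions in the test set can drift slightly from those of the full dataset. As a minimal sketch of one common alternative, the cell below passes `stratify=y` to `train_test_split`. The names `X_tr`, `X_te`, `y_tr`, and `y_te` are hypothetical and chosen only to avoid overwriting the notebook's own variables.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# stratify=y asks for (approximately) the same class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# y is 0/1, so its mean is the fraction of samples carrying label 1
print(f"Label-1 fraction, full data: {y.mean():.3f}")
print(f"Label-1 fraction, train:     {y_tr.mean():.3f}")
print(f"Label-1 fraction, test:      {y_te.mean():.3f}")
```

Whether stratification matters depends on dataset size and class balance. With several hundred samples the effect is likely small, but it makes the evaluation less sensitive to the random seed.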


### Dataset shape
Next, we inspect the shapes of the full feature matrix and target vector, before the split.

In [None]:
print(f"Features shape: {X.shape}")
print(f"Labels shape: {y.shape}")
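
Shapes alone do not show how the labels are encoded or how balanced the classes are. The following sketch reads the label names from `data.target_names` and counts the samples per class with `np.bincount`. The variable names `label_names` and `counts` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# target_names is ordered by label index: position 0 names label 0, and so on
label_names = {int(i): str(name) for i, name in enumerate(data.target_names)}
print(label_names)

# Count how many samples carry each label
counts = np.bincount(data.target)
for label, count in enumerate(counts):
    print(f"{label} ({label_names[label]}): {count} samples")
```

Checking the mapping this way avoids relying on memory for which class is coded as 1, which matters when reading precision, recall, and the ROC curve later on.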

## Training the Model
With the data split, we fit a logistic regression classifier on the training set and measure its accuracy on the held-out test set.

In [None]:
# Train logistic regression model
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.3f}")
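
The 30 features span very different numeric ranges, which can slow convergence of the default solver. As a hedged sketch rather than a change to the model above, the cell below standardizes the features inside a pipeline before fitting. The names `scaled_clf` and `scaled_acc` are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fit on the training data only, then reused at prediction time
scaled_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_clf.fit(X_train, y_train)

# Pipeline.score reports mean accuracy for classifiers
scaled_acc = scaled_clf.score(X_test, y_test)
print(f"Accuracy with scaling: {scaled_acc:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps test-set statistics out of the preprocessing step, which would otherwise leak information into training.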


## Evaluation Metrics
Accuracy alone can hide class-specific errors, so we also print a per-class classification report and plot the confusion matrix.

In [None]:
# Display classification report and confusion matrix
print(classification_report(y_test, y_pred, target_names=data.target_names))
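# display_labels puts class names on the axes instead of the raw 0/1 labels
ConfusionMatrixDisplay.from_predictions(y_test, y_pred, display_labels=data.target_names)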
plt.show()
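
To connect the plot with the numbers in the report, the sketch below recomputes precision and recall for label 1 directly from the confusion matrix counts. It refits the same model so that it runs on its own. `manual_precision` and `manual_recall` are illustrative names.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true labels, columns are predicted labels, both in label order (0, 1)
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()  # "positive" here means label 1

manual_precision = tp / (tp + fp)  # of the samples predicted as 1, how many are 1
manual_recall = tp / (tp + fn)     # of the samples that are 1, how many were found
print(cm)
print(f"Precision for '{data.target_names[1]}': {manual_precision:.3f}")
print(f"Recall for '{data.target_names[1]}':    {manual_recall:.3f}")
```

These values should match the label-1 row of the classification report, since `precision_score` and `recall_score` use label 1 as the positive class by default.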


### ROC Curve
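Finally, let's visualize the trade-off between the true positive rate and the false positive rate as the decision threshold varies. By default, `RocCurveDisplay.from_estimator` treats label 1 (`clf.classes_[1]`) as the positive class.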

In [None]:
# ROC curve
RocCurveDisplay.from_estimator(clf, X_test, y_test)
plt.show()
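
The curve can also be summarized as a single number, the area under the ROC curve (AUC). The sketch below computes it with `roc_auc_score` from the predicted probability of label 1 and retrieves the underlying curve points with `roc_curve`. It refits the same model so that it runs standalone. The names `proba` and `auc` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Column 1 of predict_proba holds the estimated probability of label 1
proba = clf.predict_proba(X_test)[:, 1]

# AUC is computed from scores, not from the hard 0/1 predictions
auc = roc_auc_score(y_test, proba)
print(f"ROC AUC: {auc:.3f}")

# Each threshold yields one (false positive rate, true positive rate) point
fpr, tpr, thresholds = roc_curve(y_test, proba)
print(f"Curve points: {len(thresholds)}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the value gives a threshold-independent complement to the accuracy reported earlier.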
