# Comparison of Classification Methods

This notebook compares four approaches to binary classification:
1. **k-Nearest Neighbors (kNN)**
2. **Support Vector Machine (SVM)**
3. **Logistic Regression**
4. **Linear Regression** (continuous predictions thresholded at 0.5)

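We use the Breast Cancer Wisconsin dataset bundled with scikit-learn, a binary classification problem (**malignant** vs. **benign**) with 569 samples and 30 numeric features. In scikit-learn's encoding, class `0` is malignant and class `1` is benign, which is how the classes are labeled in the reports below.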

## 1. Setup and Data Preparation

In [32]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import accuracy_score, classification_report

In [33]:
# Load the dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features (mean=0, variance=1).
# The scaler is fit on the training split only, so no test-set statistics leak into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
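
The fit-on-train, transform-both pattern used above can be sketched with plain NumPy. This is a minimal illustration on made-up numbers, not the notebook's data: the arrays `train` and `test` are hypothetical, and the sketch assumes the default behaviour of `StandardScaler` (per-feature mean and population standard deviation).

```python
import numpy as np

# Hypothetical toy data: 4 training rows and 2 test rows, 2 features each.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
test = np.array([[2.5, 25.0], [5.0, 50.0]])

# "fit": learn per-feature statistics from the training split only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the same training statistics to both splits.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # approximately 0 per feature
print(train_scaled.std(axis=0))   # approximately 1 per feature
print(test_scaled)                # generally not mean 0 / std 1
```

The test split is scaled with the training statistics, so it will not in general have exactly zero mean and unit variance. That is expected.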

## 2. k-Nearest Neighbors (kNN)

**Theory:**  
kNN is a non-parametric method that classifies a new data point by majority vote among its `k` nearest neighbors in the training set. There is no explicit model fitting: `fit` essentially stores the training data, and the distance computations happen at prediction time. Prediction can therefore be computationally expensive on large datasets. The results are also sensitive to the choice of `k` and to feature scaling, because features with large ranges dominate the distance.
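
The majority-vote rule can be sketched in a few lines of NumPy. This is a simplified illustration, assuming Euclidean distance and unweighted votes; `knn_predict` and the toy arrays are hypothetical names and data, not part of scikit-learn or of this notebook.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify one point by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbours' labels (ties go to the smaller label).
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))

# Hypothetical toy data: two small clusters in 2D.
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.05, 0.05]), k=3))  # 0
print(knn_predict(X_toy, y_toy, np.array([1.0, 0.95]), k=3))   # 1
```

All the work happens inside `knn_predict`, which is why prediction cost grows with the size of the training set.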

Let's apply kNN with `k = 5`.

In [34]:
# kNN Classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
y_pred_knn = knn.predict(X_test_scaled)
accuracy_knn = accuracy_score(y_test, y_pred_knn)

print("### kNN Results ###")
print("Accuracy: {:.2f}%".format(accuracy_knn * 100))
print("\nClassification Report:")
print(classification_report(y_test, y_pred_knn))

### kNN Results ###
Accuracy: 95.91%

Classification Report:
              precision    recall  f1-score   support

           0       0.95      0.94      0.94        63
           1       0.96      0.97      0.97       108

    accuracy                           0.96       171
   macro avg       0.96      0.95      0.96       171
weighted avg       0.96      0.96      0.96       171
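
The columns of the classification report can be recomputed from confusion counts. The sketch below shows the arithmetic. The counts are an assumption: they were chosen to be consistent with the rounded kNN report above and are not taken from the notebook's output.

```python
# Hypothetical confusion counts for 171 test samples
# (rows: true class, columns: predicted class). Assumed for illustration.
confusion = [[59, 4],    # true class 0: 59 predicted as 0, 4 predicted as 1
             [3, 105]]   # true class 1: 3 predicted as 0, 105 predicted as 1

def precision_recall_f1(confusion, cls):
    tp = confusion[cls][cls]
    predicted = sum(row[cls] for row in confusion)  # column total
    actual = sum(confusion[cls])                    # row total (the "support")
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for cls in (0, 1):
    p, r, f1 = precision_recall_f1(confusion, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

total = sum(sum(row) for row in confusion)
accuracy = (confusion[0][0] + confusion[1][1]) / total
print(f"accuracy={accuracy:.4f}")
```

Precision is computed over a column (everything predicted as the class) and recall over a row (everything that truly belongs to the class), so the two can differ for the same class.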



## 3. Support Vector Machine (SVM)

**Theory:**  
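An SVM looks for the hyperplane that separates the classes with the largest margin, that is, the largest distance to the closest training points (the support vectors). With a linear kernel, the decision boundary is a straight line in two dimensions and a hyperplane in higher dimensions. SVMs tend to work well in high-dimensional feature spaces, and the regularization parameter `C` controls the trade-off between a wide margin and few training errors.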
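
The geometry can be sketched for a given hyperplane. The weight vector `w` and offset `b` below are hypothetical values picked for easy arithmetic, not parameters learned from the dataset; for a fitted linear-kernel `SVC` the corresponding quantities are exposed as `coef_` and `intercept_`.

```python
import numpy as np

# Hypothetical separating hyperplane w·x + b = 0 in 2D.
w = np.array([3.0, 4.0])
b = -5.0

def decision_function(X):
    """Signed score; its sign says on which side of the hyperplane a point lies."""
    return X @ w + b

# Distance between the margin boundaries w·x + b = +1 and w·x + b = -1.
margin_width = 2.0 / np.linalg.norm(w)

X_toy = np.array([[2.0, 1.0], [0.0, 0.0], [1.0, 1.0]])
scores = decision_function(X_toy)
labels = (scores > 0).astype(int)

print(margin_width)  # 0.4
print(scores)
print(labels)
```

Since the margin width is `2 / ||w||`, maximizing the margin is the same as minimizing the norm of `w`.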

We use an SVM with a linear kernel and regularization parameter `C = 1.0`.

In [35]:
# SVM Classifier
svm = SVC(kernel='linear', C=1.0, random_state=42)
svm.fit(X_train_scaled, y_train)
y_pred_svm = svm.predict(X_test_scaled)
accuracy_svm = accuracy_score(y_test, y_pred_svm)

print("### SVM Results ###")
print("Accuracy: {:.2f}%".format(accuracy_svm * 100))
print("\nClassification Report:")
print(classification_report(y_test, y_pred_svm))

### SVM Results ###
Accuracy: 97.66%

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.97      0.97        63
           1       0.98      0.98      0.98       108

    accuracy                           0.98       171
   macro avg       0.97      0.97      0.97       171
weighted avg       0.98      0.98      0.98       171



## 4. Logistic Regression

**Theory:**  
Logistic Regression models the probability of class membership by applying the logistic (sigmoid) function to a linear combination of the features. It outputs probabilities, which can be thresholded (commonly at 0.5) to obtain class predictions.
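
The mapping from a linear score to a probability and then to a class can be sketched as follows. This is a minimal illustration, not scikit-learn's implementation, and the scores in `z` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores z = w·x + b for four samples.
z = np.array([-4.0, -0.5, 0.0, 3.0])
probabilities = sigmoid(z)

# Thresholding the probability at 0.5 is equivalent to checking z >= 0.
predictions = (probabilities >= 0.5).astype(int)

print(probabilities.round(3))
print(predictions)
```

Because the sigmoid is monotonic, moving the threshold away from 0.5 only shifts the cut-off on the linear score; the decision boundary stays linear in the features.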

In [36]:
# Logistic Regression Classifier
logreg = LogisticRegression(max_iter=10000, random_state=42)
logreg.fit(X_train_scaled, y_train)
y_pred_logreg = logreg.predict(X_test_scaled)
accuracy_logreg = accuracy_score(y_test, y_pred_logreg)

print("### Logistic Regression Results ###")
print("Accuracy: {:.2f}%".format(accuracy_logreg * 100))
print("\nClassification Report:")
print(classification_report(y_test, y_pred_logreg))

### Logistic Regression Results ###
Accuracy: 98.25%

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.98      0.98        63
           1       0.99      0.98      0.99       108

    accuracy                           0.98       171
   macro avg       0.98      0.98      0.98       171
weighted avg       0.98      0.98      0.98       171



## 5. Linear Regression for Classification

**Theory:**  
Linear Regression is inherently designed for continuous outcomes. However, one can use it for classification by fitting a linear model to predict a continuous value and then thresholding the predictions (e.g., at 0.5) to decide class membership.  
**Note:** This approach is not ideal because the model's outputs are not constrained to lie between 0 and 1, so they cannot be interpreted as probabilities. It still serves as an instructive baseline.
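
A tiny least-squares fit makes the issue visible. The sketch below uses hypothetical one-dimensional toy data and `np.linalg.lstsq` rather than the notebook's `LinearRegression` model, but the fitted line is the same ordinary least-squares solution.

```python
import numpy as np

# Hypothetical 1D toy data with binary labels stored as floats.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Ordinary least squares fit of y ≈ slope * x + intercept.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

continuous = slope * x + intercept
labels = (continuous >= 0.5).astype(int)

print(continuous)  # roughly [-0.1, 0.3, 0.7, 1.1]
print(labels)      # [0 0 1 1]
```

The thresholded labels are all correct here, yet the raw predictions at the two ends fall below 0 and above 1.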

In [37]:
# Linear Regression Classifier (thresholded at 0.5)
linreg = LinearRegression()
linreg.fit(X_train_scaled, y_train)
y_pred_linreg_cont = linreg.predict(X_test_scaled)
y_pred_linreg = (y_pred_linreg_cont >= 0.5).astype(int)
accuracy_linreg = accuracy_score(y_test, y_pred_linreg)

print("### Linear Regression (Thresholded) Results ###")
print("Accuracy: {:.2f}%".format(accuracy_linreg * 100))
print("\nClassification Report:")
print(classification_report(y_test, y_pred_linreg))

### Linear Regression (Thresholded) Results ###
Accuracy: 95.32%

Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.90      0.93        63
           1       0.95      0.98      0.96       108

    accuracy                           0.95       171
   macro avg       0.96      0.94      0.95       171
weighted avg       0.95      0.95      0.95       171



## 6. Results

Classification Accuracies:
1. kNN Accuracy: 95.91%
2. SVM Accuracy: 97.66%
3. Logistic Regression Accuracy: 98.25%
4. Linear Regression Accuracy (thresholded): 95.32%
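
To put these percentages in perspective, they can be converted back to approximate error counts on the 171-sample test set. The sketch below uses only the accuracies listed above; the rounding to whole samples is an assumption about how the percentages were produced.

```python
# Accuracies reported above, in percent, on the 171-sample test set.
n_test = 171
accuracies = {
    "kNN": 95.91,
    "SVM": 97.66,
    "Logistic Regression": 98.25,
    "Linear Regression (thresholded)": 95.32,
}

# Convert each accuracy back to an approximate number of misclassified samples.
errors = {name: round(n_test * (1 - acc / 100)) for name, acc in accuracies.items()}

# Print the methods from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<32} {acc:6.2f}%  ~{errors[name]} errors")
```

The gap between the best and worst method comes to about five test samples on this single split, so the ranking should be read with some caution.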