# Case Study: SVM Classification using the Breast Cancer Dataset

In this case study, we will classify the Breast Cancer dataset using a Support Vector Machine (SVM).
We will go through the following steps:

1. **Importing Libraries**: Required libraries for data preprocessing, model training, and evaluation.
2. **Data Loading and Preprocessing**: Loading the dataset, splitting the data into training and test sets, and standardizing the features.
3. **Model Training**: Training an SVM model with a linear kernel.
4. **Visualization**: Visualizing the decision boundary of the trained model.
5. **Evaluation**: Making predictions and evaluating the model using a confusion matrix and classification report.

We will use the Breast Cancer dataset that ships with `scikit-learn`. It poses a binary classification problem: predict whether a tumor is malignant or benign.

In [None]:
# Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

### Step 2: Load and Preprocess the Breast Cancer Dataset

In this step, we will:
- Load the Breast Cancer dataset.
- Extract the features (X) and labels (y).
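- Select only the first two features (`mean radius` and `mean texture`) so that the decision boundary can be plotted in two dimensions.
- Split the dataset into training and test sets.
- Standardize the features using `StandardScaler`, fitted on the training set only, so that the training features have zero mean and unit variance and the test set is scaled with the same statistics.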

In [None]:
# Load dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target  # Features & labels
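
Before preprocessing, it can help to check what was loaded. The sketch below is a self-contained sanity check, not part of the original pipeline; the name `class_counts` is introduced here purely for illustration. Note that `scikit-learn` encodes `malignant` as 0 and `benign` as 1 in this dataset.

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# 569 samples, each described by 30 numeric features
print(cancer.data.shape)

# Label encoding: index 0 -> 'malignant', index 1 -> 'benign'
print(cancer.target_names)

# The two features kept later for the 2-D visualization
print(cancer.feature_names[:2])

# Number of samples per class, keyed by class name (illustrative helper)
class_counts = {
    str(name): int(count)
    for name, count in zip(cancer.target_names, np.bincount(cancer.target))
}
print(class_counts)
```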


In [None]:
# Select first two features for visualization
X = X[:, :2]

# Split dataset into train & test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display dataset shapes
X_train.shape, X_test.shape, y_train.shape, y_test.shape

In [None]:
# Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
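
To make the standardization step concrete, the sketch below repeats the split and scaling in a self-contained form and compares `StandardScaler` against the formula `(x - mean) / std` computed by hand on the training set. The names `X_train_std`, `X_test_std`, and `manual` are illustrative and do not appear in the cells above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_breast_cancer(return_X_y=True)
X = X[:, :2]  # same two features as above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set only, then reuse the same statistics for the test set
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Manual standardization with the training mean and (population) standard deviation
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(manual, X_train_std))

# Training features end up with mean ~0 and standard deviation ~1;
# the test features are only approximately centered, which is expected
print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
print(X_test_std.mean(axis=0), X_test_std.std(axis=0))
```

Fitting the scaler on the training data alone keeps information about the test set out of the preprocessing step.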

### Step 3: Train the SVM Model

Next, we train an SVM with a linear kernel on the standardized training data. Evaluation on the test set follows in Step 5.

In [None]:
# Train an SVM model with a linear kernel
svm_model = SVC(kernel='linear', C=1.0)  # C sets the regularization strength (smaller C = stronger regularization)
svm_model.fit(X_train, y_train)
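
With a linear kernel, the fitted model reduces to a weight vector `w` and an intercept `b`, and a sample `x` is scored as `w · x + b`. The following self-contained sketch rebuilds the pipeline and checks that this hand-computed score matches `decision_function`. The names `w`, `b`, `manual_scores`, and `manual_pred` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_breast_cancer(return_X_y=True)
X = X[:, :2]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

svm_model = SVC(kernel='linear', C=1.0).fit(X_train, y_train)

# coef_ and intercept_ are available because the kernel is linear
w = svm_model.coef_[0]
b = svm_model.intercept_[0]

# Score each test sample by hand and compare with the library's scores
manual_scores = X_test @ w + b
print(np.allclose(manual_scores, svm_model.decision_function(X_test)))

# A positive score maps to class 1 (benign), a negative score to class 0 (malignant)
manual_pred = (manual_scores > 0).astype(int)
print((manual_pred == svm_model.predict(X_test)).all())

# Number of support vectors per class
print(svm_model.n_support_)
```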

### Step 4: Visualize the Decision Boundary

Next, we will visualize the decision boundary learned by the SVM model. The decision boundary will help us understand how the model distinguishes between benign and malignant tumors.
We will plot the decision boundary along with the training data points.

In [None]:
def plot_decision_boundary(model, X, y):
    plt.figure(figsize=(8, 6))

    # Define grid for decision boundary plot
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))

    # Predict across the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot contour
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)

    # Scatter plot of training points
    scatter = plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
    plt.xlabel('Mean radius (standardized)')
    plt.ylabel('Mean texture (standardized)')
    plt.title('SVM Decision Boundary')

    # Add legend (scikit-learn encodes malignant as 0 and benign as 1)
    legend_labels = ['Malignant (0)', 'Benign (1)']
    handles, _ = scatter.legend_elements()
    plt.legend(handles, legend_labels, loc='upper right')

    plt.show()

# Plot decision boundary
plot_decision_boundary(svm_model, X_train, y_train)
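
The filled contour above shows the boundary implicitly. For a linear kernel in two dimensions it is the line `w[0]*x1 + w[1]*x2 + b = 0`, and the margin width is `2 / ||w||`. The sketch below, a self-contained illustration that assumes `w[1]` is nonzero, computes points on that line and confirms that the model scores them as approximately zero. The names `x1_line`, `x2_line`, `boundary_points`, and `margin_width` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_breast_cancer(return_X_y=True)
X = X[:, :2]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train = StandardScaler().fit_transform(X_train)

svm_model = SVC(kernel='linear', C=1.0).fit(X_train, y_train)
w = svm_model.coef_[0]
b = svm_model.intercept_[0]

# Solve w[0]*x1 + w[1]*x2 + b = 0 for x2 at a few x1 values
x1_line = np.linspace(X_train[:, 0].min(), X_train[:, 0].max(), 5)
x2_line = -(w[0] * x1_line + b) / w[1]
boundary_points = np.column_stack([x1_line, x2_line])

# Points on the boundary should receive a decision score of (numerically) zero
boundary_scores = svm_model.decision_function(boundary_points)
print(np.round(boundary_scores, 6))

# Distance between the two margin lines (score = -1 and score = +1)
margin_width = 2.0 / np.linalg.norm(w)
print(margin_width)
```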

### Step 5: Model Predictions and Evaluation

Finally, we will make predictions on the test set and evaluate the performance of the trained SVM model using:
- **Confusion Matrix**: To visualize the model's classification results.
- **Classification Report**: To show detailed metrics such as precision, recall, and F1-score.

In [None]:
# Make predictions on the test set
y_pred = svm_model.predict(X_test)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot confusion matrix (rows and columns follow the label order: 0 = malignant, 1 = benign)
plt.figure(figsize=(5, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Malignant', 'Benign'], yticklabels=['Malignant', 'Benign'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

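# Classification report (target_names follow the label order: 0 = malignant, 1 = benign)
print('Classification Report:\n', classification_report(y_test, y_pred, target_names=['Malignant', 'Benign']))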
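
The figures in the classification report can be derived directly from the confusion matrix. The self-contained sketch below recomputes a few of them by hand and compares them with `precision_score` and `recall_score`. Because `scikit-learn` treats label 1 (benign) as the positive class by default, recall for malignant tumors is computed separately. The names `benign_precision`, `benign_recall`, and `malignant_recall` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_breast_cancer(return_X_y=True)
X = X[:, :2]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

svm_model = SVC(kernel='linear', C=1.0).fit(X_train, y_train)
y_pred = svm_model.predict(X_test)

# For binary labels {0, 1}, ravel() yields counts with class 1 as "positive":
# tn = malignant predicted malignant, fp = malignant predicted benign,
# fn = benign predicted malignant,   tp = benign predicted benign
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

benign_precision = tp / (tp + fp)
benign_recall = tp / (tp + fn)
malignant_recall = tn / (tn + fp)  # share of malignant tumors that were caught

print(np.isclose(benign_precision, precision_score(y_test, y_pred)))
print(np.isclose(benign_recall, recall_score(y_test, y_pred)))
print(np.isclose(malignant_recall, recall_score(y_test, y_pred, pos_label=0)))
```

In a screening setting, recall for the malignant class is often the number to watch, since a missed malignant tumor is usually costlier than a false alarm.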

### Summary

In this notebook, we demonstrated how to:
1. Load and preprocess the Breast Cancer dataset.
2. Train an SVM model to classify the tumors as benign or malignant.
3. Visualize the decision boundary learned by the SVM.
4. Evaluate the model's performance using a confusion matrix and classification report.

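This example shows how an SVM can be applied to a binary classification task and how plotting the decision boundary helps explain the model's behavior. Because only two of the 30 features were used to make the plot possible, the reported metrics describe this simplified model rather than the best achievable performance on the dataset.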