# Hands-On Activity: Support Vector Machines (SVM) for Breast Cancer Classification

---

### Introduction:
Today, we'll dive into Support Vector Machines (SVMs) and apply them to classifying breast tumors as malignant or benign using the Breast Cancer Wisconsin (Diagnostic) dataset that ships with scikit-learn. We'll use Python and the scikit-learn library throughout this hands-on activity.

### Instructions:
Follow the prompts below to complete each step of the activity. Make sure to understand the significance of the C value in SVM and how it influences the model's performance.

### Step 1: Importing Libraries

In this step, we import essential libraries. Familiarize yourself with the libraries we'll be using throughout the activity.

```python
# Importing required libraries
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
```

### Step 2: Load and Prepare the Data

In this step, we load the Breast Cancer dataset and prepare it for training. Examine the structure of the dataset and understand how we split it into features (X) and target (y). Standardizing the data is crucial for SVMs. Why do you think we do this?

```python
# Load the Breast Cancer dataset into a Pandas DataFrame
cancer = datasets.load_breast_cancer()
data = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
data['target'] = cancer.target

# Split the data into features (X) and target (y)
X = data.drop('target', axis=1)
y = data['target']

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
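To help with examining the dataset's structure, the sketch below prints its shape and class balance, compares the raw scale of two features, and checks what `StandardScaler` does to them. It reloads the data with `load_breast_cancer(as_frame=True)` (available in recent scikit-learn versions) so the cell runs on its own; the two features shown are an arbitrary pick for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# as_frame=True returns the features as a DataFrame and the target as a Series
cancer = load_breast_cancer(as_frame=True)
X = cancer.data
y = cancer.target

print("Feature matrix shape:", X.shape)
print("Class names (label 0, label 1):", list(cancer.target_names))
print("Samples per class:\n", y.value_counts().sort_index())

# Raw features sit on very different numeric scales
print(X[["mean area", "mean smoothness"]].describe().loc[["mean", "std"]])

# After standardization, every feature has mean ~0 and standard deviation ~1
X_scaled = StandardScaler().fit_transform(X)
print("Per-feature means (first 3):", X_scaled.mean(axis=0)[:3].round(6))
print("Per-feature stds (first 3):", X_scaled.std(axis=0)[:3].round(6))
```

Note that label 0 is malignant and label 1 is benign, which matters when reading the classification report in Step 5.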

### Step 3: Understanding the C value in SVM

Take a moment to read the explanation about the C value. What does a higher C value emphasize? And what about a lower C value? How does the C value affect the trade-off between training error and margin?

#### C value in SVM

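The C value in SVM is a hyperparameter that controls the trade-off between the training error and the width of the margin. It sets the penalty the SVM pays for each training point that falls inside the margin or on the wrong side of the hyperplane: the smaller the penalty, the more the algorithm is allowed to deviate from the training data in order to find a hyperplane with a wider margin. In scikit-learn's terms, C acts as an inverse regularization strength.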
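For reference, C enters the standard soft-margin SVM objective, which scikit-learn's `SVC` solves in its dual (kernelized) form. Here $\xi_i$ is a slack variable measuring how far training point $i$ violates the margin, and the labels are written as $y_i \in \{-1, +1\}$:

$$
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
$$

The first term is small when the margin, whose width is $2/\lVert \mathbf{w} \rVert$, is wide; the second charges $C$ for every unit of margin violation. A large $C$ makes violations expensive, while a small $C$ tolerates them in exchange for a wider margin.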

* A higher C value places more emphasis on minimizing the training error, potentially resulting in a narrower margin. 
  * If the C value is too high, the model may overfit the training data and perform poorly on new, unseen data.
* A lower C value places more emphasis on maximizing the margin, even if it means allowing more misclassifications during training.
  * If the C value is too low, the model may not be able to learn the underlying pattern in the data and may perform poorly on both the training and test data.
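To see this trade-off in practice, the sketch below fits a linear-kernel `SVC` at several C values and reports training accuracy, accuracy on a held-out split, and the number of support vectors (points on or inside the margin, including misclassified ones). The 75/25 split, `random_state=42`, and the list of C values are arbitrary choices for illustration, and the scaler sits in a pipeline so it is fit on the training split only. Smaller C values should use more support vectors, reflecting a wider, more tolerant margin; the exact numbers depend on your scikit-learn version and split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the data; stratify keeps the class ratio equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

results = {}
for C in [0.01, 0.1, 1, 10, 100]:
    # The scaler lives inside the pipeline, so it is fit on the training split only
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C))
    model.fit(X_train, y_train)
    # n_support_ holds the support-vector count per class; sum them
    n_sv = int(model.named_steps["svc"].n_support_.sum())
    results[C] = {
        "train_acc": model.score(X_train, y_train),
        "test_acc": model.score(X_test, y_test),
        "n_sv": n_sv,
    }
    print(f"C={C:<6} train acc={results[C]['train_acc']:.3f}  "
          f"test acc={results[C]['test_acc']:.3f}  support vectors={n_sv}")
```

Watching how the gap between training and test accuracy changes as C grows is a direct way to spot the overfitting and underfitting described above.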

### Step 4: Build an SVM Classifier with GridSearchCV

Here, we start building our SVM classifier. Explore the hyperparameters used in GridSearchCV. What values are we trying for the C parameter, and what are the available kernel options? What does GridSearchCV do, and why is it valuable in hyperparameter tuning?

```python
# Build an SVM classifier with GridSearchCV
param_grid = {'C': [0.1, 1, 10, 100],
              'kernel': ['linear', 'rbf']}

svm_classifier = SVC()

grid_search = GridSearchCV(svm_classifier, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_scaled, y)

# Print the best parameters found by GridSearchCV
print("Best Parameters:", grid_search.best_params_)
```

```
Best Parameters: {'C': 10, 'kernel': 'rbf'}
```
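To make concrete what GridSearchCV automates, the sketch below performs the same search by hand: it enumerates every combination in the grid with `ParameterGrid`, scores each one with 5-fold cross-validation via `cross_val_score`, and keeps the best, then runs `GridSearchCV` for comparison. This illustrates the idea rather than GridSearchCV's actual internals, which also handle parallelism, refitting, and failed fits. Unlike the cell above, it wraps the scaler in a pipeline so each fold's scaler is fit only on that fold's training portion, so its scores and chosen parameters may differ slightly from the output shown above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"C": [0.1, 1, 10, 100], "kernel": ["linear", "rbf"]}

# Manual search: score every combination with 5-fold cross-validation
best_score, best_params = -np.inf, None
for params in ParameterGrid(param_grid):
    model = make_pipeline(StandardScaler(), SVC(**params))
    mean_acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(params, f"mean CV accuracy = {mean_acc:.4f}")
    if mean_acc > best_score:  # strict ">" keeps the first of any tied candidates
        best_score, best_params = mean_acc, params

# GridSearchCV runs the same loop, then refits the winner on all of X, y.
# Pipeline parameters are addressed as "<step name>__<parameter>".
grid = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    {"svc__" + name: values for name, values in param_grid.items()},
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)
print("Manual best:      ", best_params, round(best_score, 4))
print("GridSearchCV best:", grid.best_params_, round(grid.best_score_, 4))
```

The value of GridSearchCV is that it does this bookkeeping for you and exposes every candidate's scores in `grid.cv_results_`.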


### Step 5: Make Predictions and Evaluate the Model

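In the final step, we make predictions using our trained model and evaluate its performance. What metrics are we using to evaluate the model? How accurate is our model, and what insights can you draw from the classification report? Keep in mind that the predictions here are made on the same samples the model was tuned and trained on, so the reported accuracy is a training-set score and is likely more optimistic than the model's performance on unseen data. In the report, class 0 is malignant and class 1 is benign.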

```python
# Make predictions using the best model.
# GridSearchCV refits the best parameters on all of X_scaled, so these
# predictions are on the training data itself.
y_pred = grid_search.predict(X_scaled)

# Evaluate the model
accuracy = accuracy_score(y, y_pred)
report = classification_report(y, y_pred)

print(f"Accuracy: {accuracy}")
print("Classification Report:\n", report)
```

```
Accuracy: 0.9912126537785588
Classification Report:
               precision    recall  f1-score   support

           0       1.00      0.98      0.99       212
           1       0.99      1.00      0.99       357

    accuracy                           0.99       569
   macro avg       0.99      0.99      0.99       569
weighted avg       0.99      0.99      0.99       569
```
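A fairer estimate holds out a test set before any scaling or tuning, so neither the scaler nor the grid search ever sees it. The sketch below assumes an 80/20 stratified split with an arbitrary `random_state=0`, and puts `StandardScaler` and `SVC` in a `Pipeline` so the scaler is re-fit inside each cross-validation fold. The step names `"scaler"` and `"svc"` are illustrative choices that determine the `svc__` prefix in the grid keys.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, stratify=cancer.target, random_state=0
)

# Scaling is part of the model, so each CV fold fits its own scaler
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__kernel": ["linear", "rbf"]}

grid_search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)  # tuning sees only the training split

y_pred = grid_search.predict(X_test)  # evaluate on samples the model never saw
test_accuracy = accuracy_score(y_test, y_pred)

print("Best parameters:", grid_search.best_params_)
print(f"Test accuracy: {test_accuracy:.3f}")
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=list(cancer.target_names)))
```

In the confusion matrix, rows are true classes and columns are predicted classes, both in the order malignant, benign. Comparing the test accuracy with the 0.99 training accuracy above shows how much of that figure carries over to unseen data.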

