# **<center> Solution Manual For Tutors </center>**

# <center><strong>Important:</strong> Make a Copy of this Google Colab Notebook!
</center>

<p>Please refrain from using or modifying this current Google Colab notebook directly. Instead, follow these instructions to create your own copy:</p>

<ol>
  <li>Go to the "File" menu at the top of the Colab interface.</li>
  <li>Select "Save a copy in Drive" to create a duplicate of this notebook.</li>
  <li>You can now work on your own copy without affecting the original.</li>
</ol>

<p>This ensures that you have a personalized version to work on and make changes according to your needs. Remember to save your progress as you go. Enjoy working on your own copy of the Google Colab notebook!</p>

# Module 21: Programming the SVM Algorithm in Python Using sklearn

## **Getting Started**

Run the provided code sections and follow the instructions. Implement your own code where indicated.

<h2>Recap: Support Vector Machines (SVM)</h2>
<p> A Support Vector Machine (SVM) is a powerful supervised learning algorithm used for classification and regression tasks. For classification, it finds the hyperplane that maximizes the margin between classes, which separates the data points as cleanly as possible. </p>

<h2>Mathematical Principles of SVM</h2>
<p> SVM relies on key mathematical principles:
<ul>
<li><b>Margin:</b> The distance between the decision boundary and the nearest data points.</li>
<li><b>Support Vectors:</b> The data points closest to the decision boundary.</li>
<li><b>Kernel Trick:</b> A method that implicitly maps the data into a higher-dimensional space, by computing inner products with a kernel function, so that the classes become easier to separate.</li>
</ul></p>
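To make the margin and support vector ideas above concrete, here is a minimal sketch on a hypothetical toy dataset (the six points and the large value of `C` are assumptions chosen for illustration, not part of this module's tasks). For a linear SVM with weight vector `w`, the margin width is `2 / ||w||`.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two linearly separable clusters in 2-D
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C penalizes margin violations heavily (approximates a hard margin)
clf = svm.SVC(kernel='linear', C=1e3)
clf.fit(X, y)

# For a linear kernel, coef_ holds the weight vector w of the hyperplane
w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", margin_width)
```

Only the points nearest the boundary appear in `support_vectors_`; moving any of the other points slightly would not change the fitted hyperplane.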

<h2>Advantages of SVM</h2>
<p>SVM has several advantages:
<ul>
<li>Effective in high-dimensional spaces and with complex datasets.</li>
<li>Works well with both linearly separable and non-linearly separable data. (We focus on linearly separable data.)</li>
<li>Memory efficient, because the decision function depends only on the support vectors rather than on the full training set.</li>
<li>Offers different kernel functions for flexibility in capturing complex relationships.</li>
<li>Robust against overfitting when the regularization parameter (C) is properly tuned.</li>
</ul>
</p>
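Although this module focuses on linearly separable data, the point about kernel functions can be illustrated with a short sketch. It assumes a synthetic dataset from `make_circles` (one class forms a ring around the other), which is not used elsewhere in this notebook, and compares a linear kernel with an RBF kernel.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data (an assumption for illustration): an inner disc surrounded by a ring
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit the same classifier with two different kernels and compare test accuracy
scores = {}
for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
    print(kernel, "kernel test accuracy:", scores[kernel])
```

No straight line can separate a ring from its center, so the linear kernel should do noticeably worse here than the RBF kernel.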



## **Importing Python Packages**
The first step is to import your necessary Python packages.

In [None]:
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Task 3: Load the Dataset
# ------------------------
# Load the Iris dataset from sklearn
iris = load_iris()

# Task 4: Split the Dataset
# -------------------------
# Split the dataset into features (X) and target variable (y)
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Task 5: Preprocess the Data
# ---------------------------
# Perform feature scaling on the training and testing data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Task 6: Create and Train the SVM Model
# -------------------------------------
# Create an SVM model object
model = svm.SVC(kernel='linear')

# Train the SVM model on the scaled training data
model.fit(X_train_scaled, y_train)

# Task 7: Make Predictions and Evaluate the Model
# -----------------------------------------------
# Make predictions on the scaled testing data
y_pred = model.predict(X_test_scaled)

# Calculate accuracy
accuracy = np.mean(y_pred == y_test)
print("Accuracy:", accuracy)

# Task 8: Visualize the Results
# -----------------------------
# Visualize the decision boundary
def plot_decision_boundary(X, y, model):
    h = 0.02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('SVM Decision Boundary')
    plt.show()

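# Plot the decision boundary.
# The model above was trained on all four features, so it cannot predict on a
# two-feature mesh. Fit a separate SVM on the first two scaled features instead.
model_2d = svm.SVC(kernel='linear')
model_2d.fit(X_train_scaled[:, :2], y_train)
plot_decision_boundary(X_test_scaled[:, :2], y_test, model_2d)


Accuracy: 0.9666666666666667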
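The accuracy in Task 7 is the fraction of predictions that match the true labels. As a minimal sketch, the snippet below uses hypothetical label arrays (not the Iris results) to show that the manual calculation agrees with scikit-learn's `accuracy_score`, and that `confusion_matrix` shows which classes are confused with each other.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels, for illustration only
y_true = np.array([0, 1, 2, 2, 1, 0])
y_hat = np.array([0, 1, 2, 1, 1, 0])

# Manual accuracy: mean of element-wise matches
manual = np.mean(y_hat == y_true)
# Library accuracy: the same quantity computed by scikit-learn
library = accuracy_score(y_true, y_hat)
# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)

print("Manual accuracy:", manual)
print("accuracy_score:", library)
print("Confusion matrix:\n", cm)
```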

# **Support Vector Machine for Image Classification**
In this example, we'll be using the concepts we learned through the Linear Algebra crash course to solve a binary classification problem. The application we are investigating is determining if a tumor is malignant (cancerous) or benign (non-cancerous) based on features from digitized images of a fine needle aspirate (FNA) that describe the characteristics of the cell nuclei in the image.

## **Importing Python Packages**
The first step is to import your necessary Python packages. For this example, we'll be using scikit-learn and numpy to implement a support vector machine and classify if a tumor is malignant or benign.

In [None]:
import numpy as np # Importing the numpy library as np - this is a common practice for Python coding

from sklearn import datasets # Importing the datasets available from scikit-learn
from sklearn import svm, metrics # Importing the SVM models and evaluation metrics from scikit-learn
from sklearn.model_selection import train_test_split # Importing the train/test splitting function

## **Load and Explore the Dataset**

For this example, we'll load the publicly available breast cancer dataset that ships with the scikit-learn library.

In [None]:
cancer = datasets.load_breast_cancer()
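One possible way to explore the loaded dataset is sketched below. It prints the size of the feature matrix, the class names, and how many samples fall in each class. Which attributes to inspect is a choice made here for illustration.

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# Rows are tumor samples, columns are features computed from the FNA images
print("Feature matrix shape:", cancer.data.shape)
# Label 0 corresponds to the first class name, label 1 to the second
print("Class names:", cancer.target_names)
# Count how many samples carry each label
print("Samples per class:", np.bincount(cancer.target))
print("First five features:", cancer.feature_names[:5])
```

Checking the class balance first is useful because accuracy alone can be misleading when one class is much more common than the other.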


### Finding your own dataset
Use the built-in datasets offered by scikit-learn to develop your own question to analyze using support vector machines. Follow the code below to learn how to import these datasets and display descriptive information to choose your favorite dataset. Use this time to get creative!

In [None]:
from sklearn import datasets

# List the available datasets:
dir(datasets)

['__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__getattr__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_arff_parser',
 '_base',
 '_california_housing',
 '_covtype',
 '_kddcup99',
 '_lfw',
 '_olivetti_faces',
 '_openml',
 '_rcv1',
 '_samples_generator',
 '_species_distributions',
 '_svmlight_format_fast',
 '_svmlight_format_io',
 '_twenty_newsgroups',
 'clear_data_home',
 'dump_svmlight_file',
 'fetch_20newsgroups',
 'fetch_20newsgroups_vectorized',
 'fetch_california_housing',
 'fetch_covtype',
 'fetch_kddcup99',
 'fetch_lfw_pairs',
 'fetch_lfw_people',
 'fetch_olivetti_faces',
 'fetch_openml',
 'fetch_rcv1',
 'fetch_species_distributions',
 'get_data_home',
 'load_breast_cancer',
 'load_diabetes',
 'load_digits',
 'load_files',
 'load_iris',
 'load_linnerud',
 'load_sample_image',
 'load_sample_images',
 'load_svmlight_file',
 'load_svmlight_files',
 'load_wine',
 'make_biclusters',
 'make_blobs',
 'make_checkerboard',
 'make_circl
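
Because `dir(datasets)` also lists private modules and helper functions, it may help to filter the names. The sketch below keeps only names that start with `load_`; this prefix filter is a convenience assumed here, not an official scikit-learn listing.

```python
from sklearn import datasets

# Keep names starting with 'load_'. Most are bundled datasets, though a few
# (e.g. load_files, load_svmlight_file) are helpers that read files from disk.
loaders = [name for name in dir(datasets) if name.startswith('load_')]
print(loaders)

# Example: load one of the bundled datasets and check its size
wine = datasets.load_wine()
print("Wine data shape:", wine.data.shape)
```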

In [None]:
# Output a description of the dataset
print(datasets.load_digits().DESCR)

.. _digits_dataset:

Optical recognition of handwritten digits dataset
--------------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 1797
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.

Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each blo