# Support Vector Machine

Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression. For classification, it looks for the hyperplane that separates the classes with the largest possible margin, meaning the largest distance to the nearest training points of either class. Those nearest points are the support vectors: the decision boundary depends only on them, not on the rest of the training data. Kernel functions let SVMs learn nonlinear boundaries by implicitly mapping the data into a higher-dimensional space, and SVMs remain effective when the number of features is large relative to the number of samples. Maximizing the margin acts as a form of regularization, which helps SVMs generalize well on small datasets, although the regularization parameter C and the choice of kernel still need tuning to avoid overfitting. SVMs are used in fields such as image recognition, bioinformatics, and text classification.
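To make the hyperplane, margin, and support vectors concrete, here is a minimal sketch on a tiny hand-made 2-D dataset. The six points are hypothetical and were chosen so that the answer can be checked by hand. A very large `C` is assumed so that scikit-learn's soft-margin `SVC` behaves approximately like a hard-margin SVM. For a linear kernel, `coef_` holds the normal vector w of the hyperplane w·x + b = 0, and the margin width is 2/‖w‖.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two linearly separable clusters in 2-D
X = np.array([
    [0.0, 0.0], [1.0, 1.0], [0.0, 1.0],  # class 0
    [3.0, 3.0], [4.0, 3.0], [3.0, 4.0],  # class 1
])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM (almost no slack is allowed)
clf = svm.SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]       # normal vector of the separating hyperplane w.x + b = 0
b = clf.intercept_[0]  # offset b
margin_width = 2.0 / np.linalg.norm(w)

print("support vectors:\n", clf.support_vectors_)
print("w =", w, " b =", b)
print("margin width =", margin_width)
```

Only the two closest points, (1, 1) and (3, 3), end up as support vectors, and the hyperplane is x + y = 4. The other four points could move (without entering the margin) and the boundary would stay the same. The margin width comes out close to 2√2 ≈ 2.83, the distance between the two support vectors.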

## Importing the necessary modules and loading the data

In [1]:
from sklearn import datasets
cancer = datasets.load_breast_cancer()

## Introduction to the problem
The scikit-learn Breast Cancer dataset (a copy of the Breast Cancer Wisconsin (Diagnostic) dataset) is a standard benchmark for binary classification. Each of its 569 samples describes the cell nuclei in a digitized image of a fine needle aspirate (FNA) of a breast mass, and the task is to predict whether the tumor is malignant or benign. The 30 numeric features are derived from ten nuclear measurements, such as radius, texture, perimeter, and smoothness, each summarized as a mean, a standard error, and a "worst" value (the mean of the three largest values). Because the dataset ships with scikit-learn and is small and clean, it is convenient for evaluating classification algorithms such as the SVM used here.

In [2]:
print("features: ", cancer.feature_names)
print("labels: ", cancer.target_names)

features:  ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
labels:  ['malignant' 'benign']
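The printed feature names follow a regular pattern: ten base measurements of the cell nuclei, each reported three times, as a mean, a standard error, and a "worst" value. The short sketch below regroups the names so that this structure is visible. The slicing positions (0-9, 10-19, 20-29) are read off the printed list above rather than taken from any documented API.

```python
from sklearn import datasets

cancer = datasets.load_breast_cancer()
# Convert the NumPy string entries to plain Python strings
names = [str(n) for n in cancer.feature_names]

# Columns 0-9: mean, 10-19: standard error, 20-29: "worst" value of each base measurement
means, errors, worsts = names[:10], names[10:20], names[20:]
base = [m[len("mean "):] for m in means]  # strip the "mean " prefix

for b, m, e, w in zip(base, means, errors, worsts):
    print(f"{b:<18} | {m:<24} | {e:<25} | {w}")
```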


In [3]:
print(f"Shape of the dataset: {cancer.data.shape}")

Shape of the dataset: (569, 30)


In [4]:
print(cancer.data[0:5])

[[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
  1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
  6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
  1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
  4.601e-01 1.189e-01]
 [2.057e+01 1.777e+01 1.329e+02 1.326e+03 8.474e-02 7.864e-02 8.690e-02
  7.017e-02 1.812e-01 5.667e-02 5.435e-01 7.339e-01 3.398e+00 7.408e+01
  5.225e-03 1.308e-02 1.860e-02 1.340e-02 1.389e-02 3.532e-03 2.499e+01
  2.341e+01 1.588e+02 1.956e+03 1.238e-01 1.866e-01 2.416e-01 1.860e-01
  2.750e-01 8.902e-02]
 [1.969e+01 2.125e+01 1.300e+02 1.203e+03 1.096e-01 1.599e-01 1.974e-01
  1.279e-01 2.069e-01 5.999e-02 7.456e-01 7.869e-01 4.585e+00 9.403e+01
  6.150e-03 4.006e-02 3.832e-02 2.058e-02 2.250e-02 4.571e-03 2.357e+01
  2.553e+01 1.525e+02 1.709e+03 1.444e-01 4.245e-01 4.504e-01 2.430e-01
  3.613e-01 8.758e-02]
 [1.142e+01 2.038e+01 7.758e+01 3.861e+02 1.425e-01 2.839e-01 2.414
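The raw NumPy rows above are hard to read without column names. One option, assuming scikit-learn 0.23 or newer with pandas installed, is to load the data with `as_frame=True`, which returns the features as a pandas DataFrame with named columns. The second print is relevant for SVMs: the features live on very different scales, with `mean area` in the hundreds to thousands and `mean smoothness` below 1.

```python
from sklearn import datasets

# as_frame=True returns the features as a pandas DataFrame with named columns
cancer = datasets.load_breast_cancer(as_frame=True)
df = cancer.data

# Named columns make individual rows much easier to read than the raw array
print(df.iloc[:5, :4])

# Feature ranges differ by orders of magnitude (area vs. smoothness)
print(df[["mean area", "mean smoothness"]].describe().loc[["min", "max"]])
```

Because an SVM measures the margin with Euclidean distance, large-valued features such as area can dominate the geometry unless the features are rescaled.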

In [5]:
print(cancer.target)  # 0 = malignant, 1 = benign

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 1 1 1 0 1 0 0
 1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 1 0 1 1 1 1 0 1
 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 0 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 1 0
 1 0 1 1 1 0 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 1 1 0 0 1 1
 1 0 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 0 0 0 1 1
 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0
 0 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1
 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 1
 0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1 1 1 0 1 0 1 1 0 
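The label vector is easier to summarize than to read. A minimal sketch using `np.bincount`, which counts how often each integer label occurs, shows the class balance:

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# np.bincount counts occurrences of each integer label: index 0 = malignant, 1 = benign
counts = np.bincount(cancer.target)
for name, count in zip(cancer.target_names, counts):
    print(f"{name}: {count} ({count / counts.sum():.1%})")
```

This reports 212 malignant and 357 benign samples, so the classes are moderately imbalanced (about 37% vs. 63%). That is worth keeping in mind when reading accuracy, since always predicting "benign" would already be right about 63% of the time.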

## Splitting the data into training and test sets

In [6]:
from sklearn.model_selection import train_test_split
# Hold out 30% of the samples for testing; random_state fixes the shuffle for reproducibility
x_train, x_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.3, random_state=3)

In [7]:
print(f"Shape of x_train: {x_train.shape}")
print(f"Shape of y_train: {y_train.shape}")
print(f"Shape of x_test:  {x_test.shape}")
print(f"Shape of y_test:  {y_test.shape}")

Shape of x_train: (398, 30)
Shape of y_train: (398,)
Shape of x_test:  (171, 30)
Shape of y_test:  (171,)
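The split above is a plain random split. A common variant, shown here as an alternative rather than what this notebook uses, is a stratified split: passing `stratify=cancer.target` to `train_test_split` keeps the malignant/benign ratio the same in the training and test sets. The sketch below keeps the notebook's other arguments unchanged.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()

# stratify= preserves the class proportions (about 37% malignant) in both splits
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=3, stratify=cancer.target
)

for name, labels in [("full", cancer.target), ("train", y_train), ("test", y_test)]:
    print(f"{name:5s}: malignant fraction = {np.mean(labels == 0):.3f}")
```

The split sizes stay at 398 and 171. With a dataset this size an unstratified split is usually close anyway, but stratifying removes one source of variation when comparing models across different seeds.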


In [8]:
from sklearn import svm
# Support vector classifier with a linear kernel (default regularization C=1.0)
model = svm.SVC(kernel="linear")
model.fit(x_train, y_train)
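The model above is trained on the raw features. Because SVMs are sensitive to feature scale, a common preprocessing step, not used in this notebook, is to standardize each feature to zero mean and unit variance. A minimal sketch, assuming the same split arguments, wraps `StandardScaler` and the linear `SVC` in a pipeline so that the scaler is fit on the training data only:

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=3
)

# The pipeline fits StandardScaler on the training data only, then applies the
# same transformation to the test data before the SVM sees it
scaled_model = make_pipeline(StandardScaler(), svm.SVC(kernel="linear"))
scaled_model.fit(x_train, y_train)

scaled_acc = metrics.accuracy_score(y_test, scaled_model.predict(x_test))
print(f"accuracy with standardized features: {scaled_acc:.4f}")
```

Whether scaling improves accuracy on this particular split is something to check rather than assume; it also tends to make the solver converge faster.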

In [9]:
import pandas as pd
y_pred = model.predict(x_test)
y_pred = pd.DataFrame(y_pred)
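As the introduction notes, the fitted decision boundary depends only on the support vectors. The sketch below refits the same model (same split arguments, so it runs on its own) and inspects them through `n_support_` and `support_`. It also shows how `predict` relates to `decision_function`: for a binary `SVC`, a positive score corresponds to `classes_[1]` (benign) and a negative one to `classes_[0]` (malignant).

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=3
)
model = svm.SVC(kernel="linear")
model.fit(x_train, y_train)

# Only the support vectors define the decision boundary
print("support vectors per class:", model.n_support_)
print("total:", len(model.support_), "of", len(x_train), "training samples")

# decision_function returns the signed score w.x + b for each test sample;
# thresholding it at 0 reproduces the class predictions
scores = model.decision_function(x_test)
manual_pred = np.where(scores > 0, model.classes_[1], model.classes_[0])
print("matches predict():", np.array_equal(manual_pred, model.predict(x_test)))
```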

In [10]:
y_pred

| | 0 |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |
| ... | ... |
| 166 | 1 |
| 167 | 0 |
| 168 | 1 |
| 169 | 1 |


In [11]:
from sklearn import metrics
metrics.confusion_matrix(y_test, y_pred)

array([[ 58,   4],
       [  4, 105]], dtype=int64)

## Conclusion
In scikit-learn's confusion matrix, rows are the actual classes and columns are the predicted classes, both in label order (0 = malignant, 1 = benign). Treating malignant as the positive class, the matrix [[58, 4], [4, 105]] for the 171 test instances breaks down as follows:

- 58 malignant tumors were correctly classified as malignant (true positives).
- 4 benign tumors were incorrectly classified as malignant (false positives).
- 105 benign tumors were correctly classified as benign (true negatives).
- 4 malignant tumors were incorrectly classified as benign (false negatives).

In total, 163 of the 171 test instances (58 + 105) are classified correctly. The errors are split evenly between the two directions, but they are not equally costly: a false negative means a malignant tumor is missed, which in a diagnostic setting is usually the more serious mistake.
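scikit-learn's binary convention treats label 1 (here, benign) as the positive class, while the breakdown above treats malignant as positive. One way to get the counts under the malignant-positive convention is to pass `labels=[1, 0]` to `confusion_matrix`, so that `ravel()` returns tn, fp, fn, tp with malignant as the positive class. The label vectors below are hypothetical, constructed only to reproduce the reported matrix; they are not the notebook's actual test predictions.

```python
import numpy as np
from sklearn import metrics

# Hypothetical label vectors that reproduce the reported matrix [[58, 4], [4, 105]]
y_true = np.array([0] * 62 + [1] * 109)
y_pred = np.array([0] * 58 + [1] * 4 + [0] * 4 + [1] * 105)

# Default order: rows = actual label (0, 1), columns = predicted label (0, 1)
print(metrics.confusion_matrix(y_true, y_pred))

# labels=[1, 0] puts the negative class (benign) first and the positive class
# (malignant) second, so ravel() yields tn, fp, fn, tp with malignant as positive
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred, labels=[1, 0]).ravel()
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
```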

In [12]:
print(f"accuracy of the model {metrics.accuracy_score(y_test, y_pred)}")

accuracy of the model 0.9532163742690059
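The printed accuracy can be checked by hand from the confusion matrix, since the correctly classified instances lie on its diagonal (taking malignant as the positive class):

$$
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{58 + 105}{58 + 105 + 4 + 4} = \frac{163}{171} \approx 0.9532
$$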


In [13]:
print(metrics.classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.94      0.94      0.94        62
           1       0.96      0.96      0.96       109

    accuracy                           0.95       171
   macro avg       0.95      0.95      0.95       171
weighted avg       0.95      0.95      0.95       171



The classification report summarizes the model's performance on the 171-instance test set:

Precision (of all instances predicted as a class, the fraction that truly belong to it):

- Class 0 (malignant): 0.94, so 94% of the tumors predicted malignant were actually malignant.
- Class 1 (benign): 0.96, so 96% of the tumors predicted benign were actually benign.

Recall (of all instances that truly belong to a class, the fraction the model identified):

- Class 0: 0.94, so 94% of the actual malignant tumors were classified as malignant.
- Class 1: 0.96, so 96% of the actual benign tumors were classified as benign.

F1-score (the harmonic mean of precision and recall):

- Class 0: 0.94.
- Class 1: 0.96.

Precision and recall coincide for each class here because the number of false positives equals the number of false negatives (4 each).

Support:

- The test set contains 62 malignant and 109 benign tumors.

Accuracy:

- Overall accuracy is 0.95, meaning 95% of the test instances were classified correctly.

The macro average (the unweighted mean over the two classes) and the weighted average (the mean weighted by support) are both 0.95 for precision, recall, and F1-score. These results indicate that the linear SVM separates malignant from benign tumors well on this split, but they come from a single 70/30 split with random_state=3; a different split, or cross-validation, could give somewhat different numbers.
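The per-class numbers in the report follow directly from the confusion matrix: for a given class, precision = TP / (TP + FP) and recall = TP / (TP + FN), computed with that class treated as positive. The sketch below recomputes them with `metrics.precision_recall_fscore_support`. Because the notebook's predictions are not reproduced here, it uses hypothetical label vectors constructed only so that they yield the reported confusion matrix [[58, 4], [4, 105]].

```python
import numpy as np
from sklearn import metrics

# Hypothetical label vectors that reproduce the reported matrix [[58, 4], [4, 105]];
# they are not the notebook's actual test predictions
y_true = np.array([0] * 62 + [1] * 109)
y_pred = np.array([0] * 58 + [1] * 4 + [0] * 4 + [1] * 105)

# average=None (the default) returns one value per class, in label order 0, 1
precision, recall, f1, support = metrics.precision_recall_fscore_support(y_true, y_pred)
for label, p, r, f, s in zip(["malignant (0)", "benign (1)"], precision, recall, f1, support):
    print(f"{label:14s} precision={p:.4f} recall={r:.4f} f1={f:.4f} support={s}")

# Macro average: unweighted mean over the two classes
print(f"macro F1 = {f1.mean():.4f}")
```

Class 0 comes out at 58/62 ≈ 0.9355 for precision, recall, and F1, and class 1 at 105/109 ≈ 0.9633, which round to the 0.94 and 0.96 in the report; the macro F1 of about 0.9494 rounds to 0.95.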