
# Optimizing SVM with GridSearchCV on the Breast Cancer Dataset
This project demonstrates how to optimize a Support Vector Machine (SVM) model using GridSearchCV on the Breast Cancer Dataset from scikit-learn. The goal is to find the best hyperparameters for the SVM model using cross-validation to improve classification accuracy.

In [37]:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

In [7]:
x, y = datasets.load_breast_cancer(return_X_y=True)  # returns the features and labels as NumPy arrays, ready for scikit-learn
print(x.shape)
print(y.shape)

(569, 30)
(569,)


In [41]:
scaler = StandardScaler()
x = scaler.fit_transform(x)  # standardize each feature to zero mean and unit variance

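## cross_validate & k-fold
Here we evaluate the model with the cross_validate function.
Instead of a single train-test split, we use k-fold cross-validation: the data is divided into k folds and the model is trained k times, each time holding out a different fold for testing. Because every sample is used for testing exactly once, this usually gives a more reliable accuracy estimate than a single split.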
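
To make the k-fold idea concrete, here is a minimal sketch that performs the splitting by hand. It assumes `StratifiedKFold` with 5 splits, which is what scikit-learn uses when an integer `cv` is passed together with a classifier. The names `X_demo`, `y_demo`, and `fold_scores` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative copies of the data, standardized as in the notebook
X_demo, y_demo = datasets.load_breast_cancer(return_X_y=True)
X_demo = StandardScaler().fit_transform(X_demo)

skf = StratifiedKFold(n_splits=5)  # 5 folds, class proportions preserved in each
fold_scores = []
for train_idx, test_idx in skf.split(X_demo, y_demo):
    model = SVC()  # a fresh model for every fold
    model.fit(X_demo[train_idx], y_demo[train_idx])
    # accuracy on the fold that was held out of training
    fold_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))

print(fold_scores)           # one accuracy per fold
print(np.mean(fold_scores))  # mean accuracy across folds
```

Each sample lands in the test fold exactly once, so the mean summarizes performance over the whole dataset.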

In [43]:
# SVC() is the model, cv is the number of folds, scoring is the evaluation metric
result1 = cross_validate(SVC(), x, y, cv=5, scoring='accuracy')
print(result1['test_score'])  # accuracy on each fold
print(np.mean(result1['test_score']))  # mean accuracy across folds

[0.97368421 0.95614035 1.         0.96491228 0.97345133]
0.9736376339077782
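
One caveat: the scaler above was fitted on the full dataset before cross-validation, so every test fold has influenced the scaling statistics. A minimal sketch of one way to avoid this is to wrap the scaler and the model in a pipeline, so that scaling is refitted on the training folds only. This is an alternative setup rather than the notebook's original method, and the names `X_raw`, `y_raw`, `pipe`, and `result_pipe` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Reload the raw (unscaled) features; scaling now happens inside the pipeline
X_raw, y_raw = datasets.load_breast_cancer(return_X_y=True)

# In each fold, StandardScaler is fitted on the training part only
pipe = make_pipeline(StandardScaler(), SVC())
result_pipe = cross_validate(pipe, X_raw, y_raw, cv=5, scoring='accuracy')

print(result_pipe['test_score'])           # accuracy on each fold
print(np.mean(result_pipe['test_score']))  # mean accuracy across folds
```

On a dataset of this size the difference from the cell above is likely to be small, but the pipeline keeps the evaluation free of information from the test folds.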


## GridSearchCV & hyperparameters & kernel
Kernel choice:
- 'rbf' (the default; a good starting point for data with complex or unknown structure)
- 'poly' (for data with polynomial relationships between features)
- 'linear' (for data that is linearly, or almost linearly, separable)
- 'sigmoid' (a tanh-based kernel that behaves somewhat like a neural network activation)
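
As a quick way to compare the four kernels on this dataset, the sketch below scores each one with 5-fold cross-validation, leaving all other settings at their defaults. It is only an illustration: the names `X_k`, `y_k`, and `kernel_scores` are made up for this example, and the scaler is placed inside a pipeline, which differs slightly from the preprocessing cell above.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_k, y_k = datasets.load_breast_cancer(return_X_y=True)

kernel_scores = {}
for kernel in ['rbf', 'poly', 'linear', 'sigmoid']:
    # Scale inside the pipeline so each fold is standardized independently
    pipe = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(pipe, X_k, y_k, cv=5, scoring='accuracy')
    kernel_scores[kernel] = scores.mean()

for kernel, score in kernel_scores.items():
    print(f"{kernel:8s} mean accuracy = {score:.4f}")
```

A comparison with default settings is only a first impression, since each kernel can respond quite differently once its hyperparameters are tuned.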


Hyperparameters:
(SVC has other hyperparameters as well; these are the ones we examine here.)
- 'C' (regularization parameter; controls the trade-off between a wide margin and few misclassifications: a small C gives a softer margin, a large C approaches a hard margin)
- 'degree' (polynomial degree; used only by the 'poly' kernel and ignored by the others)
- 'gamma' (kernel coefficient for 'rbf', 'poly', and 'sigmoid'; controls how far the influence of a single data point reaches)
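
To see how one of these hyperparameters changes the model, the following sketch uses `validation_curve` to record training and validation accuracy of an RBF SVM over a range of `gamma` values. The fixed `C=1.0`, the gamma range, and the names `X_g`, `y_g`, `train_scores`, and `val_scores` are assumptions chosen for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_g, y_g = datasets.load_breast_cancer(return_X_y=True)

# make_pipeline names the SVC step 'svc', hence the 'svc__gamma' parameter name
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
gamma_range = [0.001, 0.01, 0.1, 1, 10]

# Both result arrays have shape (number of gamma values, number of folds)
train_scores, val_scores = validation_curve(
    pipe, X_g, y_g,
    param_name='svc__gamma', param_range=gamma_range,
    cv=5, scoring='accuracy',
)

for g, tr, va in zip(gamma_range, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"gamma={g:<6} train={tr:.4f}  validation={va:.4f}")
```

A large gap between training and validation accuracy at some value of gamma would suggest overfitting there, while low accuracy on both would suggest underfitting. The same pattern works for `C` by passing `param_name='svc__C'`.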

GridSearchCV:
Works like cross_validate, except that it evaluates every combination of the supplied hyperparameter values with cross-validation and selects the combination with the best mean score.
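
What the search does can be sketched by hand: enumerate every combination, cross-validate each one, and keep the best. The example below does this with `ParameterGrid` on a deliberately small grid. The grid values and the names `X_m`, `y_m`, `small_grid`, `best_score`, and `best_params` are illustrative and not the settings used in this notebook.

```python
from sklearn import datasets
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_m, y_m = datasets.load_breast_cancer(return_X_y=True)
X_m = StandardScaler().fit_transform(X_m)  # same preprocessing as the notebook

# Small illustrative grid: 3 values of C times 3 values of gamma
small_grid = {'C': [1, 10, 100], 'gamma': [0.001, 0.01, 0.1]}

best_score, best_params = -1.0, None
for params in ParameterGrid(small_grid):  # yields one dict per combination
    score = cross_val_score(SVC(kernel='rbf', **params), X_m, y_m, cv=5).mean()
    if score > best_score:  # keep the best mean accuracy seen so far
        best_score, best_params = score, params

print(len(ParameterGrid(small_grid)))  # number of combinations tried
print(best_params, best_score)
```

Beyond this loop, GridSearchCV also refits the best combination on the full data by default (`refit=True`) and stores the per-combination results in `cv_results_`.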
  




In [None]:
# 'degree' is only used by the 'poly' kernel and is ignored for 'rbf'
hyperparameters = {'C': [30, 60, 90, 100, 120], 'degree': [1, 2, 3, 4], 'gamma': [0.001, 0.01, 0.1, 1, 10], 'kernel': ['rbf', 'poly']}
result2 = GridSearchCV(SVC(), hyperparameters, cv=5, verbose=3)

result2.fit(x,y)
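
Because `degree` has no effect on the 'rbf' kernel, the grid above evaluates each 'rbf' candidate four times with identical results. One possible refinement, sketched here as an alternative rather than as the notebook's method, is to pass a list of dictionaries so that `degree` is varied only for 'poly'. The names `full_grid`, `conditional_grid`, and `search` are illustrative.

```python
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.svm import SVC

C_values = [30, 60, 90, 100, 120]
gamma_values = [0.001, 0.01, 0.1, 1, 10]

# The single-dictionary grid used in the notebook
full_grid = {'C': C_values, 'degree': [1, 2, 3, 4],
             'gamma': gamma_values, 'kernel': ['rbf', 'poly']}

# One sub-grid per kernel; 'degree' appears only where it matters
conditional_grid = [
    {'kernel': ['rbf'], 'C': C_values, 'gamma': gamma_values},
    {'kernel': ['poly'], 'C': C_values, 'gamma': gamma_values,
     'degree': [1, 2, 3, 4]},
]

print(len(ParameterGrid(full_grid)))         # 200 combinations
print(len(ParameterGrid(conditional_grid)))  # 125 combinations

# The list of dictionaries is passed exactly like a single dictionary;
# calling search.fit(x, y) would then run the reduced search
search = GridSearchCV(SVC(), conditional_grid, cv=5)
```

With `cv=5`, this reduces the work from 1000 cross-validation fits to 625 without removing any distinct model from the search.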

In [53]:
print(result2.best_params_)  # best hyperparameter combination found
print(result2.best_score_)  # its mean cross-validated accuracy

{'C': 30, 'degree': 1, 'gamma': 0.01, 'kernel': 'rbf'}
0.9807017543859649
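
The best score above is computed on the same data that was used to choose the hyperparameters, so it can be slightly optimistic. A common complement, sketched below under assumed settings, is to hold out a test set, run the search on the training part only, and report accuracy on the held-out part. The 80/20 split, `random_state=0`, the reduced RBF-only grid, and all variable names are assumptions made for this example.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_all, y_all = datasets.load_breast_cancer(return_X_y=True)

# Hold out 20% of the data; stratify keeps the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.2, stratify=y_all, random_state=0)

# Scaling lives inside the pipeline, so it is refitted within every CV fold
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
param_grid = {'svc__C': [1, 10, 30, 100], 'svc__gamma': [0.001, 0.01, 0.1]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train)  # the test set is never seen during the search

test_accuracy = search.score(X_test, y_test)  # uses the refitted best model
print(search.best_params_)
print(test_accuracy)

# Rank all candidates by mean cross-validated accuracy
results = pd.DataFrame(search.cv_results_)
cols = ['param_svc__C', 'param_svc__gamma', 'mean_test_score',
        'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head())
```

Looking at the top rows of `cv_results_` also shows whether several combinations perform almost equally well, in which case the exact "best" values matter less than the general region of the grid.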
