In [1]:
%matplotlib widget
import pandas as pd
import numpy as np
from sklearn import svm,datasets
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Kernel

In machine learning, the kernel trick is a technique from the family of kernel methods used in pattern analysis.
It lets a linear classifier learn a nonlinear decision boundary.

Some data cannot be separated by a straight line (or a flat hyperplane) in its original, lower-dimensional space.
A kernel maps that data into a higher-dimensional space where a linear separator may exist; seen from the original space, that separator is a nonlinear boundary.
In effect the mapping adds some new columns (features); it does not classify anything by itself.
It only gives the classifier a representation in which better accuracy is possible.

- In a 3-dimensional space, the boundary of a linear classifier is a 2-dimensional plane. 
When the classes cannot be separated in the current dimension, a linear classifier such as 
logistic regression does not work well. By projecting the data from the lower-dimensional space 
into a higher-dimensional space, we can solve this issue and use logistic regression after all. 
This is the advantage of the kernel trick.

- It is sometimes very difficult to find the right kernel for a given dataset.
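
The idea of "adding new columns" can be sketched with an explicit feature map. This is only an illustration under stated assumptions: the ring-shaped dataset from `make_circles`, the extra column `x1^2 + x2^2`, and all variable names are chosen here and are not part of the original notebook. A real kernel (such as RBF) performs the mapping implicitly instead of building the columns.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression

# Two concentric rings: not separable by a straight line in 2-D.
X_rings, y_rings = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Logistic regression on the original two columns.
acc_2d = LogisticRegression().fit(X_rings, y_rings).score(X_rings, y_rings)

# Add one new column, x1^2 + x2^2 (squared distance from the origin).
# In this 3-D space the rings sit at different heights, so a plane can separate them.
X_lifted = np.c_[X_rings, (X_rings ** 2).sum(axis=1)]
acc_3d = LogisticRegression().fit(X_lifted, y_rings).score(X_lifted, y_rings)

print(f"training accuracy with 2 columns: {acc_2d:.2f}")
print(f"training accuracy with 3 columns: {acc_3d:.2f}")
```

The same linear model is used in both fits; only the representation of the data changes.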

# C parameter

### What is the Significance of C value in Support Vector Machine?




As we know, a Support Vector Machine always looks for 2 things:

    Setting a larger margin
    Lowering the misclassification rate (how often the model misclassifies a data point)

The problem is that these 2 goals pull against each other.
If we increase the margin, we end up with a higher misclassification rate; on the other 
hand, if we decrease the margin, we end up with a lower misclassification rate.

You may wonder why we want a larger margin at all, when our priority 
should be a lower misclassification rate. The reason is that the statements 
above apply to the training dataset. Lower misclassification 
on the training dataset doesn’t mean lower misclassification on validation/testing data.
To get a better result on testing data, SVM looks for a larger margin.

So how do we balance these 2 contradictory goals? 
The answer is the parameter C.

- Parameter C:

- Large value of parameter C => small margin

- Small value of parameter C => large margin
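
The relation between C and the margin can be checked numerically. The sketch below is an illustration, not part of the original notebook: the synthetic overlapping clusters, the two C values, and the variable names are assumptions. It relies on the fact that for a linear SVM the margin width equals 2 / ||w||.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so some training mistakes are unavoidable.
X_blobs, y_blobs = make_blobs(n_samples=100, centers=[[0, 0], [3, 3]],
                              cluster_std=1.5, random_state=0)

margins = {}
for C in (0.01, 100):
    clf = SVC(kernel="linear", C=C).fit(X_blobs, y_blobs)
    # For a linear SVM, the margin width is 2 / ||w||.
    margins[C] = 2 / np.linalg.norm(clf.coef_)
    print(f"C={C}: margin width = {margins[C]:.2f}")
```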


### How should you choose the value of C?

There is no rule of thumb for choosing a C value; it depends 
on your data. The practical option is to try a range of 
different values and choose the one that gives the lowest 
misclassification rate on held-out validation data (the final test set 
should be kept for the last evaluation only). I would suggest using 
scikit-learn's GridSearchCV: you give it a list of candidate 
parameter values and it reports which one performs best under cross-validation.
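
A minimal sketch of such a search, assuming the full iris feature set, an RBF kernel, 5-fold cross-validation, and an arbitrary list of candidate C values (none of these choices come from the original notebook):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_iris, y_iris, test_size=0.3, random_state=0)

# Candidate values are an arbitrary choice for illustration.
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

# 5-fold cross-validation on the training split only.
search = GridSearchCV(SVC(kernel="rbf", gamma="scale"), param_grid, cv=5)
search.fit(X_tr, y_tr)

print("best C:", search.best_params_["C"])
print("cross-validated accuracy:", round(search.best_score_, 3))
print("held-out test accuracy:", round(search.score(X_te, y_te), 3))
```

The test split is touched only once, after C has been chosen, so the reported test accuracy is not biased by the search itself.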

### C controls the cost of misclassification on the training data.

The goal of SVM is to find a hyperplane that would leave the widest
possible "cushion" between input points from two classes. 
There is a tradeoff between "narrow cushion, little / no mistakes" 
and "wide cushion, quite a few mistakes".

Learning algorithms are about generalizing from input data, not 
explaining it. On top of that, "thanks to" the curse of 
dimensionality, in a large number of dimensions training data can often
be explained quite well by overfitting the model.

Therefore, it is often desirable to deliberately allow some 
training points to be misclassified in order to have an
"overall better" position of the separating hyperplane.

    Mathematically, "better" translates to "optimizing cost function valuing mistakes with certain coefficient".
    Intuitively, "better" implies "wider cushion, a few mistakes allowed".
    Practically "better" is to be understood as "performs well on real data". 
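
The "cost function valuing mistakes with a certain coefficient" is usually written as the soft-margin objective below. The notation is an assumption introduced here, since the notebook does not define symbols: $w$ and $b$ are the hyperplane's weights and offset, $x_i$ and $y_i \in \{-1, +1\}$ are the $n$ training points and their labels, and $\xi_i$ is the slack (the size of the mistake) allowed for point $i$.

```latex
\min_{w,\, b,\, \xi} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n
```

The first term rewards a wide cushion (the margin width is $2 / \lVert w \rVert$), and the second term charges C for every unit of slack.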


Small C makes the cost of misclassification low ("soft margin"), 
thus allowing more mistakes for the sake of a wider "cushion".

Large C makes the cost of misclassification high ("hard margin"),
thus forcing the algorithm to explain the input data more strictly and potentially overfit.

The goal is to find the balance between "not too strict" and "not too loose". 
Cross-validation and resampling, along with grid search, 
are good ways to find the best C.

C is a parameter of the SVC learner and is the penalty for misclassifying a data point.

When C is small, the classifier is okay with misclassified data points (high bias, low
variance). When C is large, the classifier is heavily penalized for misclassified data and
therefore bends over backwards to avoid any misclassified data points (low bias, high
variance).

# gamma

## Gamma
- In the four charts below, we apply the same SVC-RBF classifier to the same data while
holding C constant. The only difference between the charts is that each time we
increase the value of gamma. By doing so, we can visually see the effect of gamma on the
decision boundary.

## Gamma = 0.01
- In the case of our SVC classifier and data, when using a low gamma like 0.01, the decision
boundary is not very ‘curvy’; rather, it is just one big sweeping arch.
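
The effect of gamma can also be seen in numbers rather than charts. In this sketch only gamma = 0.01 comes from the text above; the other gamma values, C = 1.0, fitting on the full dataset, and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# First two iris features (sepal length and width), as used in this notebook.
X_sepal, y_species = load_iris(return_X_y=True)
X_sepal = X_sepal[:, :2]

train_acc = {}
for gamma in (0.01, 1.0, 10.0, 100.0):
    # Same classifier and same C each time; only gamma changes.
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_sepal, y_species)
    train_acc[gamma] = clf.score(X_sepal, y_species)
    print(f"gamma={gamma:>6}: training accuracy = {train_acc[gamma]:.3f}, "
          f"support vectors = {clf.n_support_.sum()}")
```

A larger gamma makes each training point's influence more local, so the boundary can bend around individual points. Higher training accuracy at large gamma is therefore a sign of a tighter fit, not necessarily of a better model.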

In [9]:
iris = datasets.load_iris()  # built-in iris dataset

# Prepare the features.
# iris.data is an array with four columns per sample, e.g. [5.1, 3.5, 1.4, 0.2].
# iris.feature_names holds the column names:
#   'sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'
# [:, :2] keeps only the first two columns (sepal length and sepal width).
x = pd.DataFrame(iris.data[:, :2], columns=iris.feature_names[:2])

# Prepare the target.
# iris.target is an array of class labels, e.g. array([0, 0, ...]).
# Store it in a DataFrame with a single categorical column named "species".
y = pd.DataFrame(iris.target, columns=["species"], dtype='category')

## SUPPORT VECTOR MACHINES(SVM): 

A support vector machine computes the margin between 
the two classes of data and maximizes it. It draws a 
line (in general, a hyperplane) that keeps exactly the same distance from the 
closest points on both sides. This is known as the best (maximum) margin.

- Every point that lies on the edge of the margin is called a support vector.
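
The support vectors of a fitted model can be inspected directly. The sketch below is an illustration under assumptions: two hand-placed, well-separated clusters, a linear kernel, and a large C to approximate a hard margin. It uses scikit-learn's `support_vectors_`, `n_support_`, and `decision_function`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (centers chosen by hand for illustration).
X_sep, y_sep = make_blobs(n_samples=40, centers=[[-3, -3], [3, 3]],
                          cluster_std=0.8, random_state=0)

# A large C approximates a hard margin on separable data.
clf = SVC(kernel="linear", C=1000).fit(X_sep, y_sep)

print("support vectors per class:", clf.n_support_)
print("support vectors:\n", clf.support_vectors_)

# Support vectors sit on the margin edge, where |decision_function| is about 1;
# every other training point lies farther from the boundary.
sv_scores = np.abs(clf.decision_function(clf.support_vectors_))
all_scores = np.abs(clf.decision_function(X_sep))
print("|f(x)| at support vectors:", sv_scores.round(3))
print("smallest |f(x)| over all points:", all_scores.min().round(3))
```

Only the handful of support vectors determines the position of the line; removing any other training point would leave the fitted boundary unchanged.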

### Example: 
- Classification of male and female samples (diagram): we use SVM for better understanding.

- After applying the kernel trick, we can use SVM and logistic regression on data that is not linearly separable.

Different datasets call for different methods and algorithms. 
“One size does not fit all” means there is no single algorithm 
that works best for every problem. We have to choose 
algorithms according to our requirements. It’s all a matter of choice.


In [3]:
# Kernel used here: radial basis function (rbf)
svc=svm.SVC(kernel="rbf",C=2,gamma="auto")

In [10]:
# Kernel used here: linear. Note that gamma only affects the rbf, poly and
# sigmoid kernels, so gamma=0.01 is ignored by the linear kernel.
svc=svm.SVC(kernel="linear",C=1,gamma=0.01)

In [18]:
svc=svm.SVC(kernel="rbf",C=10,gamma=1)

In [11]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=0)



In [12]:
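# Pass the labels as a 1-D Series; a one-column DataFrame triggers a DataConversionWarning.
svc.fit(x_train, y_train["species"])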

SVC(C=1, gamma=0.01, kernel='linear')

In [13]:
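# Create a mesh to plot in: pad each feature's range by 1 on both sides
x_min, x_max = x.iloc[:, 0].min() - 1, x.iloc[:, 0].max() + 1
y_min, y_max = x.iloc[:, 1].min() - 1, x.iloc[:, 1].max() + 1
# Grid step: 1/100 of the x-range (the original used the ratio x_max/x_min)
h = (x_max - x_min) / 100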


In [14]:
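# Build the grid of (x, y) points covering the plotting area
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Predict a class for every grid point, then reshape back to the grid's shape
z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)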


In [15]:
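fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)
# Color each grid cell by its predicted class (the decision regions)
ax.pcolormesh(xx, yy, z, alpha=0.8, shading="auto")
# Overlay the samples, colored by their true species
ax.scatter(x.iloc[:, 0], x.iloc[:, 1], c=y['species'].cat.codes, cmap=plt.cm.Paired)
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
plt.show()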

