In [1]:
#Support vector machine (SVM) is a supervised machine learning algorithm used for both classification and regression tasks.
#It is particularly effective on complex, high-dimensional datasets.
#The fundamental principle of SVM is to find the optimal hyperplane: the one that separates the classes in the input space with the largest possible margin.
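#A minimal sketch of the maximum-margin idea, assuming scikit-learn's SVC (the library used later in this notebook).
#Two hypothetical toy points sit at (-1, 0) and (1, 0). The widest separating margin is the vertical line x1 = 0,
#so the weight vector should come out close to (1, 0), the bias close to 0, and the margin width 2/||w|| close to 2.
#A very large C is used to approximate a hard margin (no violations allowed).

```python
import numpy as np
from sklearn.svm import SVC

# Two toy points, one per class, placed symmetrically about the origin.
X_toy = np.array([[-1.0, 0.0], [1.0, 0.0]])
y_toy = np.array([0, 1])

# A very large C approximates a hard-margin SVM.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X_toy, y_toy)

w = clf.coef_[0]          # normal vector of the separating hyperplane
b = clf.intercept_[0]     # bias term
margin_width = 2 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors:", clf.support_vectors_)
```

#Both points end up as support vectors: they are the closest points to the boundary, so they alone fix its position.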

In [3]:
#How SVM works:
#1. SVM requires labelled training data consisting of input features and the corresponding class labels.
#2. Each data point is represented as a feature vector, where each feature describes a particular characteristic of that data point.
#3. The data should then be preprocessed and scaled so that all features are on a similar scale, for example between 0 and 1 (min-max scaling)
# or to zero mean and unit variance (standardization).
#4. Hyperplane and margin: SVM aims to find a hyperplane that best separates the classes in the feature space. In a binary classification problem,
# the hyperplane is a line in 2D space, a plane in 3D space, and a hyperplane in higher-dimensional spaces.
#5. SVM seeks to maximize the margin, which is the distance between the hyperplane and the nearest data points from each class. These nearest points
# are known as support vectors, because they determine where the decision boundary lies.
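#Steps 3 and 5 above can be sketched with scikit-learn, assuming a Pipeline that applies MinMaxScaler (scaling to [0, 1])
#before a linear SVC. The data is the same synthetic make_classification setup used later in this notebook.
#After fitting, n_support_ reports how many support vectors each class contributes.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

# Scale every feature to [0, 1] before fitting (step 3).
model = make_pipeline(MinMaxScaler(), SVC(kernel="linear"))
model.fit(X, y)

# make_pipeline names each step after its lowercased class name.
svc = model.named_steps["svc"]
print("support vectors per class:", svc.n_support_)
print("total support vectors:", len(svc.support_vectors_))
```

#Keeping the scaler inside the pipeline means the same scaling is reapplied automatically at prediction time.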

#Linear kernel: A linear SVM finds a linear hyperplane that separates the classes. The goal is to find the hyperplane that maximizes the margin while
# minimizing the misclassification of training examples. This can be formulated as an optimization problem: minimize the norm of the hyperplane's weight vector
# (a smaller norm means a wider margin) subject to the constraint that all training examples lie on the correct side of the hyperplane (hard margin).
# A soft margin relaxes this constraint and penalizes violations through the regularization parameter C.
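#The optimization problem described above is usually written in the following soft-margin form, with labels y_i in {-1, +1},
#slack variables xi_i measuring margin violations, and C the regularization parameter. Forcing every xi_i = 0 recovers the hard-margin
#version in which all training examples must lie on the correct side; the resulting margin width is 2/||w||.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```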

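#Non-linear SVM: In cases where the data is not linearly separable, SVM uses a technique called the kernel trick. The kernel trick implicitly maps the original input
# space into a higher-dimensional space where the data points can become linearly separable. It never builds the mapped vectors explicitly: a kernel function
# computes their inner products directly from the original features.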
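#A small numeric check of the kernel trick, using NumPy only. For 2-D inputs, the homogeneous degree-2 polynomial kernel
#K(a, b) = (a . b)**2 equals the inner product of the explicit feature map phi(v) = (v1**2, sqrt(2)*v1*v2, v2**2).
#phi and poly2_kernel are hypothetical helpers written for this illustration; the two results should agree (both 25 here).

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (hypothetical helper)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel K(a, b) = (a . b)**2."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 1.0])

explicit = np.dot(phi(a), phi(b))   # inner product computed in the 3-D feature space
implicit = poly2_kernel(a, b)       # same value computed directly in the 2-D input space
print(explicit, implicit)
```

#The kernel side needs one 2-D dot product, while the explicit side builds 3-D vectors first; for the RBF kernel the feature space is infinite-dimensional,
#so only the kernel side is computable at all.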

#Training: SVM training involves finding the optimal hyperplane (the decision boundary) that separates the classes. The optimization problem is typically
# solved using methods such as quadratic programming or sequential minimal optimization (SMO). The process solves for the weights of the hyperplane and
# the bias term, which together define the decision boundary. The objective is to minimize the regularization term while ensuring that the training examples
# are correctly classified (with a soft margin, violations are allowed but penalized).
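#A sketch of what training produces, assuming scikit-learn's SVC (which is built on LIBSVM and its SMO-type solver).
#dual_coef_ holds alpha_i * y_i for each support vector, so for a linear kernel the weights are w = sum_i alpha_i y_i x_i,
#and intercept_ is the bias b. The check below rebuilds w . x + b by hand and compares it with decision_function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Hyperplane weights as the weighted sum of the support vectors.
w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
b = clf.intercept_[0]   # bias term

# The decision function is w . x + b; compare with scikit-learn's own values.
manual_scores = X @ w + b
print("w:", w, "b:", b)
print("matches decision_function:", np.allclose(manual_scores, clf.decision_function(X)))
```

#Only support vectors have non-zero alpha_i, which is why the boundary depends on them alone.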

#Prediction: Once the SVM model is trained, it can be used to predict the class labels of new, unseen data points. The algorithm computes a score proportional to the
# signed distance from each test point to the decision boundary, and the predicted class label is determined by which side of the boundary the point lies on.
# The magnitude of this score can serve as a confidence score, and probability estimates can also be obtained (in scikit-learn, by setting probability=True).
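#A sketch of the prediction step with scikit-learn's SVC, using the same split as later in this notebook.
#decision_function returns w . x + b (divide by ||w|| to get the geometric distance), predict returns the class on the positive or negative side,
#and predict_proba (enabled by probability=True, which fits an extra Platt-scaling step with internal cross-validation) returns class probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = SVC(kernel="linear", probability=True, random_state=0).fit(X_train, y_train)

scores = clf.decision_function(X_test)   # signed score, proportional to distance
labels = clf.predict(X_test)             # predicted class labels
proba = clf.predict_proba(X_test)        # per-class probability estimates

# For a binary SVC, a positive score corresponds to clf.classes_[1].
from_sign = np.where(scores > 0, clf.classes_[1], clf.classes_[0])
print("sign of score matches predict:", np.array_equal(from_sign, labels))
print("first three probability rows:", proba[:3])
```

#Because the probabilities come from a separate calibration step, predict_proba can occasionally disagree with predict near the boundary.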

#Advantages of SVM:
#1. It is effective in high-dimensional spaces.
#2. It is relatively robust against overfitting due to the margin-maximization principle.
#3. It is versatile through the use of different kernel functions.
#4. It can handle both linear and non-linear classification tasks.

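#Limitations:
#1. It is computationally expensive for large datasets (for kernel SVMs, training time grows at least quadratically with the number of samples).
#2. It requires proper selection of hyperparameters such as the regularization parameter C and the kernel parameters (e.g. gamma, degree).
#3. Large models, especially with non-linear kernels, are difficult to interpret compared to simpler algorithms like logistic regression.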
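#Limitation 2 is usually handled with a cross-validated search. A minimal sketch with scikit-learn's GridSearchCV, assuming an RBF kernel
#and standardized features; the candidate values for C and gamma are illustrative assumptions, not recommended defaults.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Parameter names use the "<step>__<param>" convention of pipelines.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.01, 0.1, 1, "scale"],
}

search = GridSearchCV(pipe, param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```

#Each of the 16 combinations is fitted 5 times, which also illustrates limitation 1: the search cost multiplies the training cost.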

In [5]:
#How do I select which kernel to use for which data in the SVM algorithm?
#The choice of kernel depends on the data and the problem we are trying to solve.


#1. Linear kernel: suitable for linearly separable data. It works well when there is a clear linear boundary between classes.
#2. Polynomial kernel: maps the data into a higher-dimensional space using polynomial functions. It is useful when the decision boundary is curved or
# has higher-degree complexity. The degree of the polynomial, which determines the complexity, can be specified.
#3. RBF kernel (Gaussian kernel): maps the data into an infinite-dimensional space. It is suitable for non-linearly separable data and
# works well when the decision boundary is complex. It is a popular choice due to its flexibility and ability to capture intricate patterns.
#4. Sigmoid kernel: maps the data into a higher-dimensional space using the sigmoid (tanh) function, similar to a neural-network activation.
# It is sometimes suggested when the decision boundary is S-shaped.
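#For reference, these are the four kernel functions as scikit-learn defines them for SVC, where gamma, r (the coef0 parameter)
#and d (the degree parameter) are the corresponding SVC arguments.

```latex
\begin{aligned}
\text{linear:}     \quad & K(\mathbf{x}, \mathbf{x}') = \langle \mathbf{x}, \mathbf{x}' \rangle \\
\text{polynomial:} \quad & K(\mathbf{x}, \mathbf{x}') = \left(\gamma \langle \mathbf{x}, \mathbf{x}' \rangle + r\right)^{d} \\
\text{RBF:}        \quad & K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right) \\
\text{sigmoid:}    \quad & K(\mathbf{x}, \mathbf{x}') = \tanh\!\left(\gamma \langle \mathbf{x}, \mathbf{x}' \rangle + r\right)
\end{aligned}
```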


In [9]:
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [15]:
#Generate synthetic classification data
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                          n_redundant=0, random_state=42)

#make_classification (from sklearn.datasets) is a synthetic dataset generator.
# Purpose of make_classification
# To quickly create artificial datasets for testing machine learning algorithms.
# Instead of using a real-world dataset (which may be large, noisy, or hard to get), you can generate clean, controlled data.
# Great for demonstrations, debugging, and visualization.
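#A quick sanity check of the generated data, using only NumPy and the same make_classification call as above.
#X should have one row per sample and one column per feature, and y should contain the two class labels 0 and 1.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

print("X shape:", X.shape)                  # (samples, features)
print("classes:", np.unique(y))             # distinct labels
print("samples per class:", np.bincount(y))  # class balance
```

#Fixing random_state makes the dataset reproducible, so the accuracies printed below can be rerun exactly.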

In [17]:
#Splitting data into training and test sets
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2, random_state=42)
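#With test_size=0.2, 20 of the 100 samples are held out for evaluation. The sketch below confirms the shapes and also shows
#the optional stratify=y argument of train_test_split, which keeps the class proportions similar in both splits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("train:", X_train.shape, "test:", X_test.shape)

# Stratified variant: class counts in each split follow the overall class balance.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print("stratified class counts (train, test):", np.bincount(ys_train), np.bincount(ys_test))
```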

In [21]:
#Linear kernel
linear_svc=svm.SVC(kernel="linear")
linear_svc.fit(X_train,y_train)
linear_predictions=linear_svc.predict(X_test)
linear_accuracy=accuracy_score(y_test,linear_predictions)
print("linear kernel accuracy:",linear_accuracy)

linear kernel accuracy: 0.95
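#Accuracy alone hides which class the errors come from. A sketch using scikit-learn's confusion_matrix and classification_report
#on the same linear-kernel predictions (rows of the matrix are true classes, columns are predicted classes).

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

linear_svc = svm.SVC(kernel="linear").fit(X_train, y_train)
linear_predictions = linear_svc.predict(X_test)

cm = confusion_matrix(y_test, linear_predictions)
print(cm)
print(classification_report(y_test, linear_predictions))
```

#The diagonal counts correct predictions, so the diagonal sum divided by 20 gives back the accuracy printed above.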


In [25]:
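#degree=3 ==> K(x, x') = (gamma * <x, x'> + coef0)**3
#with sklearn's default coef0=0 only degree-3 feature products appear; coef0 > 0 also adds lower-degree terms
poly_svc=svm.SVC(kernel="poly",degree=3)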
poly_svc.fit(X_train,y_train)
poly_predictions=poly_svc.predict(X_test)
poly_accuracy=accuracy_score(y_test,poly_predictions)
print("polynomial kernel accuracy:",poly_accuracy)

polynomial kernel accuracy: 0.9
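#The same pattern extends to all four kernels. A sketch that loops over them with the same split; note that with only 20 test samples,
#one misclassification changes accuracy by 0.05, so the gap between 0.95 and 0.9 above is a single point and cross-validation
#gives a more reliable comparison.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    # degree is only used by the "poly" kernel; the other kernels ignore it.
    model = svm.SVC(kernel=kernel, degree=3)
    model.fit(X_train, y_train)
    results[kernel] = accuracy_score(y_test, model.predict(X_test))

for kernel, acc in results.items():
    print(f"{kernel:>8} kernel accuracy: {acc:.2f}")
```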
