# Assignment 20 Solutions

##### 1. What is the underlying concept of Support Vector Machines ?

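**Ans** Support Vector Machines (SVMs) are supervised learning algorithms used for classification and regression. The underlying idea is to find the hyperplane that separates the classes with the widest possible margin, that is, the largest distance between the decision boundary and the closest training instances. Soft-margin classification trades a wide margin against a small number of margin violations, and kernels let the model handle data that is not linearly separable in the original feature space.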

##### 2. What is the concept of a support vector ?

**Ans** A support vector is a data point that is closest to the decision boundary (hyperplane) between different classes in a support vector machine (SVM). These are the critical data points that help define the position and orientation of the hyperplane.

Support vectors are essential because they directly influence the placement of the decision boundary. In fact, the decision boundary is entirely determined by these support vectors in an SVM. Other data points, which are not support vectors, do not affect the decision boundary and are irrelevant for defining it.

Support vectors are also the hardest instances to classify, because they lie on or inside the margin. If a support vector is moved or removed, the decision boundary will likely change, whereas moving or removing a point well outside the margin leaves it untouched. An SVM is therefore insensitive to points far from the boundary but sensitive to the few points near it.
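The claim that only the support vectors determine the boundary can be checked numerically. The sketch below is illustrative only: the toy data (two Gaussian blobs), the seed, and the variable names are assumptions, not part of the assignment. It fits a linear `SVC`, refits on the support vectors alone, and compares the two hyperplanes.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.6, size=(50, 2)),
               rng.normal(2.0, 0.6, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Fit a linear SVM; a tight tolerance keeps the two fits comparable.
full = SVC(kernel="linear", C=1.0, tol=1e-6).fit(X, y)
sv_idx = full.support_  # indices of the support vectors in X

# Refit using only the support vectors and discard every other point.
reduced = SVC(kernel="linear", C=1.0, tol=1e-6).fit(X[sv_idx], y[sv_idx])

print("support vectors:", len(sv_idx), "of", len(X))
print("full    w, b:", full.coef_, full.intercept_)
print("reduced w, b:", reduced.coef_, reduced.intercept_)
```

Both fits should print nearly the same weights and intercept, even though the second model saw only a handful of points.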

##### 3. When using SVMs, why is it necessary to scale the inputs ?

**Ans** Scaling the inputs is necessary when using Support Vector Machines (SVMs) for several reasons:

1. Sensitive to Feature Scales: SVMs are sensitive to the scale of input features. Since the objective of SVM is to find the optimal hyperplane that best separates classes, the scale of features can significantly affect the decision boundary. Features with larger scales can dominate those with smaller scales, leading to a biased model.
2. Euclidean Distance: Many SVM algorithms, especially those based on the kernel trick, rely on measuring distances between data points. If the features are not scaled, features with larger scales will have a larger impact on distance computations, potentially distorting the geometric relationships between data points.
3. Regularization: In SVM, regularization is often applied to control the trade-off between maximizing the margin and minimizing the classification error. The regularization parameter (C) can have different effects on features with different scales if they are not normalized. Scaling the inputs ensures that the regularization parameter has a similar effect on all features.
4. Convergence Speed: Scaling features can also help improve the convergence speed of optimization algorithms used to train SVM models. Features with larger scales may lead to slower convergence or numerical instability during optimization.
5. Kernel Functions: When using kernel functions in SVM, such as the radial basis function (RBF) kernel, scaling is particularly important. The RBF kernel computes the similarity between data points based on their Euclidean distances. If features are not scaled, the notion of similarity between data points may become distorted.
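The distance argument in points 2 and 5 can be made concrete with a tiny example. The three data points below are made up for illustration. The standardization is written by hand with NumPy and is the same zero-mean, unit-variance transform that `StandardScaler` applies.

```python
import numpy as np

# Hypothetical data: column 0 spans about 1 unit, column 1 spans thousands.
X = np.array([[1.0, 1000.0],
              [2.0, 1000.0],
              [1.0, 3000.0]])

def pairwise_dist(A, i, j):
    """Euclidean distance between rows i and j of A."""
    return float(np.linalg.norm(A[i] - A[j]))

# Standardize each column to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

raw_01, raw_02 = pairwise_dist(X, 0, 1), pairwise_dist(X, 0, 2)
std_01, std_02 = pairwise_dist(X_std, 0, 1), pairwise_dist(X_std, 0, 2)

print("raw distances:   ", raw_01, raw_02)   # 1.0 and 2000.0
print("scaled distances:", std_01, std_02)   # equal after scaling
```

Before scaling, the second feature alone decides which points count as "close". After scaling, a full-range change in either feature contributes equally to the distance, and so to an RBF kernel value.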

##### 4. When an SVM classifier classifies a case, can it output a confidence score? What about a percentage chance ? 

**Ans** Yes, SVM classifiers can provide a confidence score or a measure of certainty regarding their predictions, but they do not inherently produce probability estimates like some other classifiers such as logistic regression or decision trees. However, there are methods to approximate confidence scores or probabilities from SVM outputs.

Here are a couple of common approaches:

1. Distance from Hyperplane: SVM classifiers assign a class label based on which side of the decision boundary (hyperplane) the data point lies. The distance of a data point from the hyperplane can serve as a measure of confidence. Typically, the further away a point is from the decision boundary, the more confident the classifier is in its prediction. However, this distance alone does not directly translate into a percentage chance or probability.
2. Platt Scaling: Platt scaling is a technique used to calibrate the output of SVM classifiers to approximate probabilities. It involves fitting a logistic regression model to the SVM decision values (raw output scores) and using the logistic function to transform these values into probabilities. Platt scaling requires a separate calibration dataset or cross-validation procedure to estimate the parameters of the logistic regression model.
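Both approaches can be sketched with Scikit-Learn. The synthetic dataset and variable names below are assumptions chosen for illustration. `decision_function` returns the raw signed scores, and `CalibratedClassifierCV` with `method="sigmoid"` is used here as one way to apply Platt scaling through cross-validation.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical synthetic binary problem, used only for illustration.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Confidence score: signed score from the decision function.
#    The sign gives the predicted class, the magnitude the confidence.
svm = SVC(kernel="rbf").fit(X_train, y_train)
scores = svm.decision_function(X_test[:5])
labels = svm.predict(X_test[:5])

# 2. Platt scaling: fit a sigmoid on cross-validated decision values.
calibrated = CalibratedClassifierCV(SVC(kernel="rbf"), method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
probs = calibrated.predict_proba(X_test[:5])

print("decision scores:", scores)
print("calibrated probabilities:\n", probs)
```

The scores are unbounded real numbers, while each row of `probs` sums to 1 and can be read as an estimated class probability.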

##### 5. Should you train a model on a training set with millions of instances and hundreds of features using the primal or dual form of the SVM problem ?

**Ans** Use the primal form. The question only applies to linear SVMs, because kernelized SVMs can only be trained through the dual form.

1. The computational complexity of the primal form is roughly proportional to the number of training instances m, so it scales to millions of instances.
2. The computational complexity of the dual form is between m² and m³, which is impractical at this size. The dual form is faster only when there are fewer instances than features, and here the opposite holds (millions of instances, hundreds of features).

Implementation details matter too. In Scikit-Learn, LinearSVC and SGDClassifier train linear models in time roughly linear in m, whereas SVC relies on a dual solver and becomes very slow beyond a few tens of thousands of instances.
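One way to try the primal form in Scikit-Learn is `LinearSVC` with `dual=False`. The sketch below uses made-up data with many more instances than features. The sizes, the seed, and `w_true` are assumptions for illustration, and the dataset is far smaller than the one in the question so that it runs quickly.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical data: many more instances (m) than features (n).
rng = np.random.default_rng(42)
m, n = 20_000, 20
X = rng.normal(size=(m, n))
w_true = rng.normal(size=n)          # assumed ground-truth direction
y = (X @ w_true > 0).astype(int)     # linearly separable labels

# dual=False asks liblinear to solve the primal problem
# (supported with the default squared hinge loss and L2 penalty).
clf = LinearSVC(dual=False, C=1.0)
clf.fit(X, y)

train_accuracy = clf.score(X, y)
print(f"training accuracy: {train_accuracy:.3f}")
```

Timing the same fit with `dual=True` on progressively larger samples is a simple way to see how the two forms scale on a given machine.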

##### 6. Let's say you've used an RBF kernel to train an SVM classifier, but it appears to underfit the training set. Is it better to raise or lower γ (gamma)? What about C ?

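**Ans** Underfitting means the model is too heavily regularized, so you should raise gamma and/or raise C. A larger gamma narrows each instance's range of influence, which lets the decision boundary bend more closely around the data. A larger C penalizes margin violations more strongly, which also reduces regularization.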

However, it's crucial to perform proper validation using a separate validation set or cross-validation to assess the impact of these hyperparameter changes on the model's performance on unseen data and to avoid overfitting.
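A quick way to check the direction of each change is to compare training accuracy for a small and a large setting of both hyperparameters. This is a minimal sketch: the `make_moons` dataset and the specific values of `gamma` and `C` are assumptions chosen only to make the contrast visible.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Hypothetical non-linear toy problem.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

# Small gamma and small C: heavily regularized, tends to underfit.
underfit = SVC(kernel="rbf", gamma=0.01, C=0.01).fit(X, y)

# Larger gamma and larger C: a much more flexible boundary.
flexible = SVC(kernel="rbf", gamma=5.0, C=100.0).fit(X, y)

acc_underfit = underfit.score(X, y)
acc_flexible = flexible.score(X, y)
print(f"training accuracy, gamma=0.01, C=0.01: {acc_underfit:.3f}")
print(f"training accuracy, gamma=5,    C=100:  {acc_flexible:.3f}")
```

Training accuracy alone only shows which settings fit the data more closely. Choosing the final values still requires the validation step described above.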

##### 7. To solve the soft margin linear SVM classifier problem with an off-the-shelf QP solver, how should the QP parameters (H, f, A, and b) be set ?
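
**Ans** A sketch of one standard formulation, under an assumed solver convention, since the question does not fix one: the QP solver minimizes $\tfrac{1}{2}\mathbf{p}^\top\mathbf{H}\mathbf{p} + \mathbf{f}^\top\mathbf{p}$ subject to $\mathbf{A}\mathbf{p} \le \mathbf{b}$. The notation is also assumed: $m$ training instances, $n$ features, labels $t^{(i)} \in \{-1, +1\}$, slack variables $\zeta^{(i)}$, and a bias term written $b_0$ to avoid a clash with the QP vector $\mathbf{b}$.

The soft margin linear SVM problem is

$$
\min_{\mathbf{w},\,b_0,\,\boldsymbol{\zeta}}\ \tfrac{1}{2}\mathbf{w}^\top\mathbf{w} + C\sum_{i=1}^{m}\zeta^{(i)}
\quad\text{subject to}\quad
t^{(i)}\left(\mathbf{w}^\top\mathbf{x}^{(i)} + b_0\right) \ge 1 - \zeta^{(i)},\qquad \zeta^{(i)} \ge 0 .
$$

Stacking the unknowns into a single vector of $1 + n + m$ parameters, with $2m$ constraints, gives

$$
\mathbf{p} = \begin{pmatrix} b_0 \\ \mathbf{w} \\ \boldsymbol{\zeta} \end{pmatrix},\qquad
\mathbf{H} = \begin{pmatrix} 0 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_n & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0}_{m \times m} \end{pmatrix},\qquad
\mathbf{f} = \begin{pmatrix} 0 \\ \mathbf{0}_n \\ C\,\mathbf{1}_m \end{pmatrix},
$$

$$
\mathbf{A} = \begin{pmatrix} -\mathbf{t} & -\operatorname{diag}(\mathbf{t})\,\mathbf{X} & -\mathbf{I}_m \\ \mathbf{0}_m & \mathbf{0}_{m \times n} & -\mathbf{I}_m \end{pmatrix},\qquad
\mathbf{b} = \begin{pmatrix} -\mathbf{1}_m \\ \mathbf{0}_m \end{pmatrix}.
$$

Row $i$ of the top block reads $-t^{(i)}\left(\mathbf{w}^\top\mathbf{x}^{(i)} + b_0\right) - \zeta^{(i)} \le -1$, which is the margin constraint, and row $i$ of the bottom block reads $-\zeta^{(i)} \le 0$. In the objective, $\tfrac{1}{2}\mathbf{p}^\top\mathbf{H}\mathbf{p} = \tfrac{1}{2}\mathbf{w}^\top\mathbf{w}$ and $\mathbf{f}^\top\mathbf{p} = C\sum_i \zeta^{(i)}$, so the two problems coincide.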

##### 8. On a linearly separable dataset, train a LinearSVC. Then, using the same dataset, train an SVC and an SGDClassifier. See if you can get them to produce roughly the same model.

In [2]:
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

# Generate a two-feature binary classification dataset
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)

# Scale the features so that all three models see the same inputs
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Linear SVM trained by liblinear with the hinge loss (default C=1)
linear_svc = LinearSVC(loss='hinge', random_state=42)
linear_svc.fit(X_scaled, y)

# Linear-kernel SVM trained by libsvm (default C=1)
svc_linear = SVC(kernel='linear', random_state=42)
svc_linear.fit(X_scaled, y)

# Linear SVM trained by stochastic gradient descent (default alpha=0.0001)
sgd_clf = SGDClassifier(loss='hinge', random_state=42)
sgd_clf.fit(X_scaled, y)

# Compare the learned hyperplanes
print("LinearSVC coefficients:", linear_svc.coef_)
print("LinearSVC intercept:", linear_svc.intercept_)

print("SVC coefficients:", svc_linear.coef_)
print("SVC intercept:", svc_linear.intercept_)

print("SGDClassifier coefficients:", sgd_clf.coef_)
print("SGDClassifier intercept:", sgd_clf.intercept_)


LinearSVC coefficients: [[-0.8458694   2.72214092]]
LinearSVC intercept: [0.68229091]
SVC coefficients: [[-0.85436812  2.74083787]]
SVC intercept: [0.69382653]
SGDClassifier coefficients: [[0.14565935 3.51557756]]
SGDClassifier intercept: [0.93960535]
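
In the output above, LinearSVC and SVC agree closely while SGDClassifier does not. A likely reason is that its default `alpha=0.0001` does not correspond to `C=1`. Dividing the SVM objective by `m * C` shows that the matching regularization strength is `alpha = 1 / (m * C)`, where `m` is the number of training instances. The sketch below applies that setting on the same data. The `agreement` helper is hypothetical, and the exact coefficients will vary with the Scikit-Learn version.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=42)
X_scaled = StandardScaler().fit_transform(X)

C = 1.0
m = len(X_scaled)

linear_svc = LinearSVC(loss="hinge", C=C, dual=True, max_iter=10_000,
                       random_state=42).fit(X_scaled, y)
svc = SVC(kernel="linear", C=C).fit(X_scaled, y)

# alpha = 1 / (m * C) makes the SGD penalty match the SVM objective.
sgd = SGDClassifier(loss="hinge", alpha=1.0 / (m * C), max_iter=1000,
                    tol=1e-3, random_state=42).fit(X_scaled, y)

def agreement(model_a, model_b):
    """Fraction of training points on which two models predict the same class."""
    return float((model_a.predict(X_scaled) == model_b.predict(X_scaled)).mean())

for name, model in [("LinearSVC", linear_svc), ("SVC", svc), ("SGD", sgd)]:
    print(name, model.coef_, model.intercept_)
print("SVC vs LinearSVC agreement:", agreement(svc, linear_svc))
print("SVC vs SGD agreement:      ", agreement(svc, sgd))
```

SGD is stochastic, so its coefficients will not match the other two exactly. Lowering `tol` or raising `max_iter` usually brings them closer.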




##### 9. On the MNIST dataset, train an SVM classifier. You'll need to use one-versus-the-rest to classify all 10 digits because SVM classifiers are binary classifiers. To speed up the process, you might want to tune the hyperparameters using small validation sets. What level of precision can you achieve ?

In [None]:
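from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load MNIST as NumPy arrays (70,000 images, 784 pixel features)
mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=False)

X_train, X_test, y_train, y_test = train_test_split(
    mnist.data, mnist.target, test_size=0.2, random_state=42)

# Scale the pixel features; fit the scaler on the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Note: SVC trains one-vs-one classifiers internally;
# 'ovr' only sets the shape of the decision function
svm_model = SVC(kernel='rbf', decision_function_shape='ovr')

# With 784 scaled features, useful gamma values are around 1/784,
# so search near that order of magnitude
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.0001, 0.001, 0.01]}

# Tune on a small subset to keep the grid search fast
n_tune = 5000
grid_search = GridSearchCV(svm_model, param_grid, cv=3, scoring='precision_macro')
grid_search.fit(X_train_scaled[:n_tune], y_train[:n_tune])

best_params = grid_search.best_params_
print("Best parameters:", best_params)

# Retrain on the full training set with the best hyperparameters
best_svm_model = SVC(kernel='rbf', decision_function_shape='ovr', **best_params)
best_svm_model.fit(X_train_scaled, y_train)

# Per-class precision, recall and F1 on the held-out test set
y_pred = best_svm_model.predict(X_test_scaled)
report = classification_report(y_test, y_pred)

print("Classification report:")
print(report)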




##### 10. On the California housing dataset, train an SVM regressor.

In [1]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load the California housing dataset
california_housing = fetch_california_housing()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(california_housing.data, california_housing.target, test_size=0.2, random_state=42)

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Choose an SVM regressor model
svm_regressor = SVR(kernel='rbf')

# Train the SVM regressor
svm_regressor.fit(X_train_scaled, y_train)

# Make predictions on the test set
y_pred = svm_regressor.predict(X_test_scaled)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)


Mean Squared Error: 0.35700264267544685
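
The mean squared error is easier to interpret as a root mean squared error. The target in this dataset is the median house value in units of $100,000, so the short calculation below converts the printed MSE into dollars. It only reuses the number shown above and does not retrain the model.

```python
import math

mse = 0.35700264267544685        # value printed by the cell above
rmse = math.sqrt(mse)            # same units as the target ($100,000s)
rmse_dollars = rmse * 100_000    # typical prediction error in dollars

print(f"RMSE: {rmse:.4f} (about ${rmse_dollars:,.0f})")
```

A typical error of roughly $60,000 leaves room for improvement, for example by tuning `C`, `gamma`, and `epsilon` with a randomized search.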
