#### authors: Rafael Dousse, Eva Ray, Massimo Stefani

# Ex2 - Review questions

### a) What are the two fundamental ideas SVMs are built on? Summarize them in your own words.

- **Maximizing the margin**: The SVM tries to find the hyperplane that separates the classes with the largest possible margin. This means that it looks for the decision boundary that is as far away as possible from the nearest data points of any class, which helps improve the model's generalization to unseen data. Indeed, if we don't take into account the margin, like in logistic regression, the decision boundary will separate the classes but maybe not in the "best way" to generalize.
- **Kernel trick**: Sometimes, the data is not linearly separable. In these cases, SVMs can use the kernel trick to transform the data into a higher-dimensional space where it becomes linearly separable. For example, if the decision boundary is circular in 2D, we can map the data to a 3D space where the decision boundary becomes a plane. The kernel trick makes it possible to perform this transformation without explicitly computing the coordinates in the higher-dimensional space, because the kernel function directly returns the inner products needed there, which makes it less computationally expensive.
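
As a small illustration of the kernel trick, the sketch below compares an explicit 2D to 3D feature map with the corresponding kernel evaluated directly in 2D. The names `phi`, `dot` and `poly2_kernel` are ours, and the degree-2 polynomial kernel is just one convenient choice, not the only kernel an SVM can use. With this map, a circle `x1² + x2² = r²` in 2D becomes the plane `u1 + u3 = r²` in 3D.

```python
import math

def phi(x):
    """Explicit feature map from 2D to 3D for the kernel (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain inner product of two equally sized tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    """Same inner product as dot(phi(x), phi(z)), computed without leaving 2D."""
    return dot(x, z) ** 2

x = (1.0, 2.0)
z = (3.0, -1.0)

explicit = dot(phi(x), phi(z))   # goes through the 3D space
implicit = poly2_kernel(x, z)    # stays in the original 2D space

print(explicit, implicit)
```

Both values agree up to floating-point rounding, which is why the SVM never has to build the 3D coordinates itself.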

### b) With the hinge loss, training points can fall into three cases. Re-explain these cases in your own words.

1. **Correctly classified and outside the margin**: These are the points that are on the correct side of the decision boundary and are also beyond the margin. For these points, the hinge loss is zero because they are correctly classified with a good margin, which means they are safe from misclassification.
2. **Correctly classified but inside the margin**: These points are on the correct side of the decision boundary but are within the margin. For these points, the hinge loss is positive because they are correctly classified but too close to the decision boundary, which means it would be better to push them further away from the boundary to improve the model's robustness.
3. **Misclassified points**: These points are on the wrong side of the decision boundary. For these points, the hinge loss is also positive (greater than 1), and it increases linearly as the point gets further away from the correct side of the decision boundary. These points contribute the most to the loss and indicate that the model needs to adjust the decision boundary to correctly classify them.
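
The three cases can be checked numerically with the usual hinge loss formula `max(0, 1 - y * f(x))`, where `y` is the label in `{-1, +1}` and `f(x)` is the signed score of the classifier. The function name `hinge_loss` and the example scores below are ours, chosen only for illustration.

```python
def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a signed score f(x) = w . x + b."""
    return max(0.0, 1.0 - y * score)

# One positive example (y = +1) in each of the three cases.
outside_margin = hinge_loss(+1, 2.0)   # correct side, beyond the margin
inside_margin = hinge_loss(+1, 0.5)    # correct side, but inside the margin
misclassified = hinge_loss(+1, -1.0)   # wrong side of the boundary

print(outside_margin, inside_margin, misclassified)  # 0.0 0.5 2.0
```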

### c) What are the two implementations of SVMs available in scikit-learn? Which one would you take if you have a system that needs to incorporate incremental learning?

The two implementations of SVMs available in scikit-learn are:
- **`svm.SVC`**: Based on a numerical solver (libsvm) that solves the underlying optimization problem on the whole training set. This implementation is suitable for small to medium-sized datasets and provides a wide range of kernel options. It is not designed for incremental learning, as it requires retraining the model from scratch when new data is added.
- **`linear_model.SGDClassifier()`**: Based on stochastic gradient descent. With the hinge loss, it trains a linear SVM. This implementation is more scalable and can handle larger datasets. It is also more suitable for incremental learning, as its `partial_fit` method allows updating the model with new data without needing to retrain from scratch.

Thus, if you have a system that needs to incorporate incremental learning, you would choose the `linear_model.SGDClassifier()` implementation.
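
A minimal sketch of incremental learning with `SGDClassifier` is given below. The data stream is simulated by a hypothetical helper `make_batch` that draws two Gaussian blobs; in a real system the batches would come from the application.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def make_batch(n):
    """Hypothetical data source: n points per class from two Gaussian blobs."""
    X0 = rng.normal(loc=-2.0, scale=1.0, size=(n, 2))
    X1 = rng.normal(loc=2.0, scale=1.0, size=(n, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)
    return X, y

# loss="hinge" makes the model a linear SVM trained by SGD.
clf = SGDClassifier(loss="hinge", random_state=0)
classes = np.array([0, 1])  # every possible label must be given on the first call

# Each call updates the existing weights instead of retraining from scratch.
for _ in range(5):
    X_batch, y_batch = make_batch(50)
    clf.partial_fit(X_batch, y_batch, classes=classes)

X_test, y_test = make_batch(100)
accuracy = clf.score(X_test, y_test)
print(f"accuracy after 5 batches: {accuracy:.2f}")
```

`svm.SVC` has no `partial_fit` method, so the same loop would require calling `fit` on all the data seen so far at every step.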

### d) An SVM can classify between 2 classes. Cite and explain in your own words the 2 strategies we have to build a multi-class (with K classes) system with SVM.

- **One vs All**: For each class, we train a separate SVM classifier that distinguishes that class from all the other classes combined. This means that for K classes, we will have K different classifiers. During prediction, we run the input through all K classifiers and choose the class corresponding to the classifier that gives the highest score.
- **One vs One**: We train a separate SVM classifier for every pair of classes, which gives K(K-1)/2 classifiers. During prediction, we run the input through all the classifiers, each of which predicts which of its two classes the input belongs to. Each classifier votes for one class, and the class with the most votes is chosen as the final prediction.
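
Both strategies are available as wrappers in scikit-learn (`OneVsRestClassifier` for One vs All and `OneVsOneClassifier` for One vs One). The sketch below applies them to a synthetic dataset with `K = 4` classes generated by `make_blobs`; the dataset and the linear kernel are arbitrary choices made for the illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

K = 4
X, y = make_blobs(n_samples=200, centers=K, random_state=0)

# One vs All: one binary SVM per class.
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
# One vs One: one binary SVM per pair of classes.
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

n_ova = len(ova.estimators_)  # K = 4 classifiers
n_ovo = len(ovo.estimators_)  # K * (K - 1) / 2 = 6 classifiers

sample = X[:1]
print(n_ova, n_ovo, ova.predict(sample), ovo.predict(sample))
```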

### e) Are the strategies of point d) equal in terms of CPU? (elaborate your answer considering training and testing times)

No, the strategies of point d) are not equal in terms of CPU usage, both for training and testing times.

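- **Training time**: The One vs All strategy requires training K classifiers, while the One vs One strategy requires training K(K-1)/2 classifiers. Therefore, the One vs One strategy requires training more classifiers, especially as the number of classes increases. However, each One vs One classifier is trained only on the samples of its two classes, whereas each One vs All classifier is trained on the whole dataset. The gap in actual training time is therefore smaller than the number of classifiers suggests, and it can even reverse for kernel SVMs, whose training cost grows faster than linearly with the number of samples.
- **Testing time**: During testing, the One vs All strategy requires running the input through K classifiers, while the One vs One strategy requires running the input through K(K-1)/2 classifiers. Thus, the One vs One strategy tends to have a higher testing time compared to the One vs All strategy.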
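
To give an order of magnitude, the sketch below counts the classifiers and the training samples seen by each classifier for both strategies. It assumes perfectly balanced classes, and the helper name `multiclass_cost` is ours.

```python
def multiclass_cost(k, n):
    """Return (number of classifiers, samples per classifier) for each strategy.

    Assumes n training samples spread evenly over k classes.
    """
    one_vs_all = (k, n)                            # each classifier sees all samples
    one_vs_one = (k * (k - 1) // 2, 2 * n // k)    # each classifier sees two classes only
    return {"one_vs_all": one_vs_all, "one_vs_one": one_vs_one}

costs = multiclass_cost(k=10, n=1000)
print(costs)  # {'one_vs_all': (10, 1000), 'one_vs_one': (45, 200)}
```

With 10 classes, One vs One needs 45 classifiers instead of 10 at prediction time, but each of them is trained on 200 samples instead of 1000.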

### f) Describe a machine learning task for which SVM would be a better choice than any of the algorithms previously studied. Explain why.

Amongst other tasks, SVMs are good at tasks linked to bioinformatics. For example, SVMs perform very well in cancer genomics, where the goal is to classify different types of cancer based on gene expression data. Indeed, SVMs can handle high-dimensional, low-sample-size data well and generalize robustly. SVMs maximize the margin between classes rather than fitting every point, which reduces overfitting when the number of features (genes) far exceeds the number of samples. In practice, combining SVMs with feature selection identifies the most discriminant genes and improves performance compared with simpler models. As we saw before, SVMs are adaptable to multiclass problems via one-vs-rest or one-vs-one strategies and can be applied across multiple biological data types, and advanced variants enable multi-omic integration. Empirical studies report very high accuracy and low error rates, demonstrating SVMs’ practical effectiveness for diagnostic and subtype classification in cancer.

Source: Huang, S., Cai, N., Pacheco, P. P., Narandes, S., Wang, Y., & Xu, W. (2017). *Applications of Support Vector Machine (SVM) learning in cancer genomics*. **Cancer Genomics & Proteomics, 15**(1), 41–51. https://pmc.ncbi.nlm.nih.gov/articles/PMC5822181/#sec2
