label: colab_R_link
https://colab.research.google.com/github/slds-lmu/lecture_sl/blob/main/exercises/svm-quarto/inserted/sol_svm_1_R.ipynb

label: colab_python_link
https://colab.research.google.com/github/slds-lmu/lecture_sl/blob/main/exercises/svm-quarto/inserted/sol_svm_1_py.ipynb

label: exercise
# Exercise
Write your own stochastic subgradient descent routine to solve the soft-margin SVM in the primal formulation.

Hints:

- Use the regularized-empirical-risk-minimization formulation, i.e., an optimization criterion without constraints.
- No kernels, just a linear SVM.
- Compare your implementation with an existing implementation (e.g., `kernlab` in R or `sklearn.svm.SVC` in Python). Are your results similar? Note that you might have to switch off the automatic data scaling in the already existing implementation.
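
One common way to write the unconstrained criterion from the first hint is shown below. The notation is an assumption, not taken from the exercise: labels $y^{(i)} \in \{-1, +1\}$, regularization strength $\lambda > 0$, and a linear score function $f$ with weights $\theta$ and intercept $\theta_0$.

$$
\min_{\theta, \theta_0} \; \frac{\lambda}{2} \|\theta\|_2^2 + \frac{1}{n} \sum_{i=1}^{n} \max\left\{0,\; 1 - y^{(i)} f\left(x^{(i)}\right)\right\},
\qquad f(x) = \theta^\top x + \theta_0
$$

The second term is the average hinge loss. It is not differentiable at $y^{(i)} f(x^{(i)}) = 1$, which is why a subgradient method is used instead of plain gradient descent.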

label: import_and_globals
## Imports and global variables

label: algorithm_explanation

# PEGASOS Algorithm Explanation

The PEGASOS algorithm is a stochastic subgradient descent method for training linear SVMs (the hinge loss is not differentiable everywhere, hence "subgradient"). It works as follows:

1. **Random Sampling**: At each iteration, randomly select one training example.
2. **Weight Decay**: Apply regularization by shrinking the weight vector: `θ ← (1 - λα)θ`, where `λ` is the regularization strength and `α` is the step size.
3. **Margin Check**: If the selected example violates the margin (i.e., `y_i * f(x_i) < 1`, so it lies inside the margin or is misclassified), update the weights: `θ ← θ + α * y_i * x_i`.
4. **Repeat**: Continue until convergence or until the maximum number of iterations is reached.

More details can be found in the [i2ml chapter on linear SVMs](https://slds-lmu.github.io/i2ml/chapters/16_linear_svm/16-05-optimization).
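
The four steps above can be sketched in Python as follows. This is a minimal illustration rather than the notebook's implementation: the function name `pegasos`, the constant step size `alpha`, the default values, and the unregularized intercept `theta0` are all assumptions. The margin is evaluated with the parameters from before the shrinking step, and labels are assumed to be coded as -1/+1.

```python
import numpy as np

def pegasos(X, y, lam=0.01, alpha=0.01, n_iter=5000, seed=0):
    """Stochastic subgradient descent for a linear soft-margin SVM (sketch).

    X: (n, p) feature matrix, y: labels in {-1, +1}.
    lam, alpha, n_iter and seed are illustrative defaults (assumptions).
    Returns the weight vector theta of shape (p,) and the intercept theta0.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros(p)
    theta0 = 0.0
    for _ in range(n_iter):
        i = rng.integers(n)                      # 1. random sampling
        margin = y[i] * (X[i] @ theta + theta0)  # y_i * f(x_i) with the current parameters
        theta *= 1.0 - lam * alpha               # 2. weight decay
        if margin < 1:                           # 3. margin check
            theta += alpha * y[i] * X[i]
            theta0 += alpha * y[i]               # intercept is not regularized (assumption)
    return theta, theta0                         # 4. stop after n_iter iterations
```

A constant step size keeps the sketch close to the update rules listed above; the original Pegasos paper instead uses a decaying step size of `1 / (λt)` at iteration `t`.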


label: data_setup

## Data Generation and Setup

For R, we'll use `mlbench.twonorm` from the `mlbench` package, which generates a two-class problem; here we use it with two features. This is a classic benchmark problem for binary classification.

For Python, we can use `sklearn.datasets.make_classification` to generate a similar dataset.
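
A possible Python setup is sketched below. The sample size, `class_sep`, and `random_state` are arbitrary illustrative choices, not the values used in the notebook. `make_classification` returns labels in {0, 1}, so they are recoded to {-1, +1} to match the margin condition `y_i * f(x_i) < 1`.

```python
import numpy as np
from sklearn.datasets import make_classification

# Two informative features, no redundant ones, one cluster per class.
X, y01 = make_classification(
    n_samples=200,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=1.5,      # illustrative choice
    random_state=42,    # illustrative choice
)

# Recode labels from {0, 1} to {-1, +1}.
y = 2 * y01 - 1
```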

label: visual_inspection
# Visual Inspection of the Data

We can see that the data are mostly linearly separable, although some points lie close to where a separating boundary would run.

label: train_pegasos
# Training the PEGASOS Algorithm

label: decision_boundaries
# Decision Boundaries Visualization

Now we'll use the trained model to visualize the decision boundary. For comparison, we will also fit a logistic regression model and visualize its decision boundary.

label: eval_logistic_regression

We can see that the decision boundaries are quite similar. Let's also compare the predictive performance of logistic regression and PEGASOS.

label: evaluation

# Evaluating the PEGASOS Model

We'll compute the accuracy and the confusion matrix. In practice, other metrics should also be considered, e.g., precision, recall, and F1-score (threshold-dependent), as well as ROC AUC and PR AUC (threshold-independent).
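
As a small illustration of both quantities, the sketch below uses `sklearn.metrics` on a made-up vector of six labels and predictions (toy values, not results from the notebook). In the confusion matrix, rows correspond to true classes and columns to predicted classes, in the order given by `labels`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels and predictions, coded as -1 / +1.
y_true = np.array([1, 1, -1, -1, 1, -1])
y_pred = np.array([1, -1, -1, -1, 1, 1])

acc = accuracy_score(y_true, y_pred)                   # 4 of 6 correct
cm = confusion_matrix(y_true, y_pred, labels=[-1, 1])  # rows: true, columns: predicted
```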

label: kernellab_or_sklearn
# Using `kernlab` (R) or `sklearn.svm.SVC` (Python) for Comparison

The accuracy is identical to that of our PEGASOS implementation.
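
For a fair comparison, the regularization settings have to match. If the PEGASOS objective is written as `λ/2 * ||θ||² + (1/n) * Σ hinge`, dividing it by `λ` gives the `SVC` form `1/2 * ||θ||² + C * Σ hinge` with `C = 1 / (n * λ)`. The sketch below assumes that parameterization; the toy data and the value of `lam` are illustrative only. Note that `SVC` does not rescale the features itself.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (illustrative only): two Gaussian blobs with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.5, 1.0, (100, 2)), rng.normal(-1.5, 1.0, (100, 2))])
y = np.repeat([1, -1], 100)

lam = 0.01                   # hypothetical regularization strength used for PEGASOS
C = 1.0 / (len(y) * lam)     # matching cost parameter for SVC

svc = SVC(kernel="linear", C=C)
svc.fit(X, y)

theta_svc = svc.coef_.ravel()    # weight vector (coef_ exists only for the linear kernel)
theta0_svc = svc.intercept_[0]   # intercept
acc_svc = np.mean(svc.predict(X) == y)
```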

label: comparison_coefs
## Comparison of Model Coefficients

We can see that both the predictions and coefficients are almost identical.

label: emp_risk_comparions
## Comparison of empirical risks
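
One way to compare the fitted models is to evaluate the same risk function for each coefficient vector. The helper below is a sketch under assumptions: the name `svm_risk` is hypothetical, labels are coded as -1/+1, and the risk is the average hinge loss plus an optional L2 penalty (set `lam=0` for the unregularized empirical risk).

```python
import numpy as np

def svm_risk(X, y, theta, theta0=0.0, lam=0.0):
    """Average hinge loss plus (lam / 2) * ||theta||^2 for a linear model."""
    scores = X @ theta + theta0
    hinge = np.maximum(0.0, 1.0 - y * scores)  # zero for points beyond the margin
    return 0.5 * lam * np.dot(theta, theta) + hinge.mean()

# Tiny made-up example: hinge losses are 0.5, 0 and 0.5.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
theta = np.array([0.5, 2.0])

risk_plain = svm_risk(X, y, theta)           # average hinge loss only
risk_reg = svm_risk(X, y, theta, lam=0.1)    # adds 0.05 * (0.25 + 4) as penalty
```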

label: plotting_everything
# Plotting

label: complete_visualization
## Complete visualization

- We can see that all models produce almost identical decision boundaries.
- The margins are plotted for the `kernlab` (R) / `sklearn.svm.SVC` (Python) model.
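
For two features, the decision boundary and the margins are the lines where the score `f(x) = θ_1 x_1 + θ_2 x_2 + θ_0` equals 0, +1, and -1, respectively. The helper below sketches how such lines can be computed for plotting; the name `boundary_line` and the coefficients are made up for illustration, and it assumes `theta[1]` is nonzero.

```python
import numpy as np

def boundary_line(theta, theta0, x1, level=0.0):
    """Solve theta[0] * x1 + theta[1] * x2 + theta0 = level for x2."""
    x1 = np.asarray(x1, dtype=float)
    return (level - theta0 - theta[0] * x1) / theta[1]

theta, theta0 = np.array([1.0, 2.0]), -1.0   # made-up coefficients
x1 = np.array([0.0, 1.0])

boundary = boundary_line(theta, theta0, x1)               # f(x) = 0
upper = boundary_line(theta, theta0, x1, level=1.0)       # f(x) = +1
lower = boundary_line(theta, theta0, x1, level=-1.0)      # f(x) = -1
```

The resulting `x1`/`x2` pairs can be passed to any line-plotting function, e.g. with a solid line for the boundary and dashed lines for the two margins.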