### Load Dataset & imports

In [105]:
import numpy as np
import sklearn
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score


# Load the data
X_train = np.load("../data/classification/X_train.npy")
y_train = np.load("../data/classification/y_train.npy")
X_test = np.load('../data/classification/X_test.npy')
y_test = np.load('../data/classification/y_test.npy')

## Grid search for each model

### Logistic Regression

During the grid search, scikit-learn reported that many `penalty` and `solver` combinations are incompatible: 550 of the 1050 fits failed. This is expected behavior, since failed fits are scored as NaN instead of aborting the search. Across the regularization strengths and penalties we tested (`"l1"`, `"l2"`, and `"elasticnet"`), the best configuration reached a test accuracy of **0.746**. This is well below the 0.85 target, which suggests that logistic regression is poorly suited to this dataset, most likely because its decision boundary is linear.
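
One way to avoid the failed fits is to pass `param_grid` as a list of dictionaries, each listing only settings that the solver accepts. The sketch below builds such a grid and counts its candidates in plain Python. The `l1_ratio` values are illustrative assumptions (they were not part of our search), and the compatibility groups reflect our reading of the scikit-learn documentation, so they should be checked against the installed version.

```python
from itertools import product

C_VALUES = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1, 2, 5, 10]

# Each dict is an independent sub-grid containing only compatible settings.
compatible_grid = [
    # Solvers that support only the L2 penalty.
    {"clf__solver": ["lbfgs", "newton-cg", "newton-cholesky", "sag"],
     "clf__penalty": ["l2"],
     "clf__C": C_VALUES},
    # Solvers that support both L1 and L2.
    {"clf__solver": ["liblinear", "saga"],
     "clf__penalty": ["l1", "l2"],
     "clf__C": C_VALUES},
    # Elastic net: "saga" only, and it needs an explicit l1_ratio.
    {"clf__solver": ["saga"],
     "clf__penalty": ["elasticnet"],
     "clf__l1_ratio": [0.25, 0.5, 0.75],  # assumed values, for illustration
     "clf__C": C_VALUES},
]


def count_candidates(grid):
    """Count the parameter combinations a list-of-dicts grid expands to."""
    return sum(len(list(product(*sub.values()))) for sub in grid)


n_candidates = count_candidates(compatible_grid)
print(n_candidates)  # 40 + 40 + 30 = 110
```

The same list can be passed as `param_grid` to `GridSearchCV`, for example `GridSearchCV(pipe, compatible_grid, cv=5, scoring='accuracy')`, in which case every candidate should be a valid configuration.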


In [158]:
# Model parameters
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=5000))
])
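param_grid = {
    "clf__C": [0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1, 2, 5, 10],
    "clf__penalty": ["l2", "l1", "elasticnet"],
    # Note: "saga" is listed twice, so its candidates are evaluated twice;
    # kept as-is so the fit counts in the output below still match.
    "clf__solver": ["lbfgs", "liblinear", "saga", "newton-cg", "newton-cholesky", "sag", "saga"],
}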

# Execute GridSearchCV
grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X_train, y_train)
print(f"Logistic Regression:\n  Best CV Accuracy: {grid.best_score_:.4f}")
print(f"  Best Params: {grid.best_params_}")
best_model = grid.best_estimator_

# Evaluate on test set
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(f"  Test set accuracy: {test_accuracy}\n")
    

Logistic Regression:
  Best CV Accuracy: 0.7165
  Best Params: {'clf__C': 0.05, 'clf__penalty': 'l1', 'clf__solver': 'saga'}
  Test set accuracy: 0.746



550 fits failed out of a total of 1050.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
50 fits failed with the following error:
Traceback (most recent call last):
  File "/Users/leo/Documents/s8/ftml_2025/.venv/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 859, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/leo/Documents/s8/ftml_2025/.venv/lib/python3.10/site-packages/sklearn/base.py", line 1363, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/Users/leo/Documents/s8/ftml_2025/.venv/lib/python3.10/site-packages/sklearn/pipeline.py", line 661, in fit
    self._final_estimator.fit(Xt, y, **last_step_params["fit"])
  File "/Users/leo/Documents/s8/ftml_202

### SVC

We initially explored various kernels using a basic set of hyperparameters and observed that the most performant one was `"rbf"`, reaching a test accuracy of approximately **0.80**. However, further improvements with this kernel were limited, as `"rbf"` does not leverage additional tunable parameters like `coef0` or `degree`. While we could have attempted finer tuning of `C` or `gamma`, the performance plateau suggested diminishing returns.

We therefore shifted focus to the `"poly"` kernel, the second-best performer in early experiments. This let us tune two additional hyperparameters: `degree`, which only the polynomial kernel uses, and `coef0`, which is used by the `"poly"` and `"sigmoid"` kernels. The optimal `degree` consistently remained at the default of **3**, so we did not tune it further.

In our refined grid search, we observed that the best models systematically preferred `gamma="scale"` over numeric values, allowing us to reduce the search space and computational cost. Additionally, the most promising results were obtained with `C` values around **3** and `coef0` values near **0.03**.
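
To make the roles of `gamma`, `coef0`, and `degree` concrete, here is a minimal pure-Python sketch of the polynomial kernel and of the `gamma="scale"` rule as described in the scikit-learn documentation. The helper names and the toy data are hypothetical and only serve as illustration; this is not the library's implementation.

```python
from statistics import pvariance


def gamma_scale(X):
    """gamma="scale": 1 / (n_features * variance of all entries of X)."""
    n_features = len(X[0])
    flat = [value for row in X for value in row]
    return 1.0 / (n_features * pvariance(flat))


def poly_kernel(x, z, gamma, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


# Toy data (hypothetical, not the project dataset).
X_toy = [[0.0, 2.0], [2.0, 0.0]]
gamma = gamma_scale(X_toy)  # variance of the entries is 1.0, so gamma = 1 / 2
k_value = poly_kernel([1.0, 2.0], [3.0, 4.0], gamma, coef0=1.0, degree=3)
print(gamma, k_value)  # 0.5 274.625
```

Because `gamma="scale"` depends on the variance of the inputs, it coincides with `gamma="auto"` (which is `1 / n_features`) once the features are standardized to zero mean and unit variance. Without a scaler the two settings generally differ, which may be part of why the kernel behaves differently on raw and scaled inputs.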

One of the most impactful discoveries was that **removing the `StandardScaler`** from the pipeline significantly improved performance. This change alone boosted our test accuracy from **0.822** to a peak of **0.8895**, surpassing the 0.85 target and confirming the sensitivity of SVMs to input scaling when using certain kernels.


In [None]:
# Model parameters

# First step (all kernels)
# pipe = Pipeline([
#     ('scaler', StandardScaler()),
#     ('clf', SVC())
# ])
# param_grid = {"clf__C": [0.05, 0.1, 0.5, 1.0, 3.0, 5.0, 10.0], "clf__kernel": ["poly", "rbf", "linear", "sigmoid"], "clf__gamma": ["scale", "auto", 0, 0.5, 1, 5, 10]}

# Focus on "poly" kernel
pipe = Pipeline([
    ('clf', SVC())
])
param_grid = {"clf__C": [2.3, 2.5, 3.0, 3.5, 3.7, 4.0], "clf__kernel": ["poly"], "clf__gamma": ["scale", "auto"], "clf__coef0": [0.00, 0.005, 0.01, 0.02, 0.03, 0.05, 0.07]}


# Execute GridSearchCV
grid = GridSearchCV(pipe, param_grid, cv=7, scoring='accuracy', n_jobs=-1, verbose=1)
grid.fit(X_train, y_train)

# Retrieve top 5 models
cv_results = grid.cv_results_
top_5_indices = np.argsort(cv_results['mean_test_score'])[::-1][:5]

print("=== Top 5 models through cross validation ===\n")

for rank, idx in enumerate(top_5_indices):
    params = cv_results['params'][idx]
    clean_params = {k.replace("clf__", ""): v for k, v in params.items()}
    print(f"Model #{rank+1} - Params: {params}")

    # Refit a fresh model with this candidate's parameters
    model = SVC(**clean_params)
    model.fit(X_train, y_train)

    # Predict and evaluate
    y_pred = model.predict(X_test)
    test_acc = accuracy_score(y_test, y_pred)
    mean_cv_score = cv_results['mean_test_score'][idx]

    print(f"           CV Accuracy:    {mean_cv_score:.4f}")
    print(f"           Test Accuracy: {test_acc:.4f}\n")


Fitting 7 folds for each of 84 candidates, totalling 588 fits
=== Top 5 models through cross validation ===

Model #1 - Params: {'clf__C': 3.5, 'clf__coef0': 0.03, 'clf__gamma': 'scale', 'clf__kernel': 'poly'}
           CV Accuracy:    0.7990
           Test Accuracy: 0.8895

Model #2 - Params: {'clf__C': 2.5, 'clf__coef0': 0.03, 'clf__gamma': 'scale', 'clf__kernel': 'poly'}
           CV Accuracy:    0.7975
           Test Accuracy: 0.8710

Model #3 - Params: {'clf__C': 3.7, 'clf__coef0': 0.03, 'clf__gamma': 'scale', 'clf__kernel': 'poly'}
           CV Accuracy:    0.7975
           Test Accuracy: 0.8870

Model #4 - Params: {'clf__C': 3.0, 'clf__coef0': 0.05, 'clf__gamma': 'scale', 'clf__kernel': 'poly'}
           CV Accuracy:    0.7965
           Test Accuracy: 0.8735

Model #5 - Params: {'clf__C': 2.3, 'clf__coef0': 0.03, 'clf__gamma': 'scale', 'clf__kernel': 'poly'}
           CV Accuracy:    0.7955
           Test Accuracy: 0.8690



### KNeighbors

We searched over the distance `metric` names supported by `KNeighborsClassifier` without encountering any warnings. Note that `"manhattan"`, `"cityblock"`, and `"l1"` name the same distance, as do `"euclidean"` and `"l2"`, so the grid effectively compares two metrics. Throughout the grid search, test accuracy improved steadily as the number of neighbors increased. The best performance was achieved with approximately **30 neighbors**, yielding a test accuracy of around **0.786**. This suggests that smoother, more global decision boundaries work better on this dataset than local ones defined by only a few neighbors.
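
The effect of `n_neighbors`, `weights`, and the distance metric can be illustrated with a small pure-Python k-nearest-neighbors vote. This is a simplified sketch on hypothetical toy data, not scikit-learn's implementation; the helper names `minkowski` and `knn_predict` are our own.

```python
from collections import defaultdict


def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan (l1), p=2 is Euclidean (l2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(X_train, y_train, query, k, weights="uniform", p=2):
    """Predict one label by voting among the k nearest training points."""
    by_distance = sorted(
        (minkowski(row, query, p), label) for row, label in zip(X_train, y_train)
    )
    votes = defaultdict(float)
    for dist, label in by_distance[:k]:
        if weights == "uniform":
            votes[label] += 1.0  # every neighbor counts equally
        else:  # "distance": closer neighbors count more
            votes[label] += 1.0 / dist if dist > 0 else float("inf")
    return max(votes, key=votes.get)


# Toy 1-D data (hypothetical): one point of class 0, two of class 1.
X_toy = [[0.0], [2.0], [2.5]]
y_toy = [0, 1, 1]
pred_uniform = knn_predict(X_toy, y_toy, [0.5], k=3, weights="uniform")
pred_distance = knn_predict(X_toy, y_toy, [0.5], k=3, weights="distance")
print(pred_uniform, pred_distance)  # 1 0
```

With uniform weights the two farther class-1 points outvote the single nearby class-0 point, while distance weighting lets the closest point win. Larger values of `k` average over more points in the same way, which smooths the decision boundary.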


In [160]:
# Model parameters
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', KNeighborsClassifier())
])
param_grid = {"clf__n_neighbors": [5, 15, 25, 27, 29, 30, 35, 50], "clf__weights": ["uniform", "distance"], "clf__metric": ["euclidean", "manhattan", "cityblock", "l1", "l2"], "clf__algorithm": ["auto", "ball_tree", "kd_tree", "brute"]}

# Execute GridSearchCV
grid = GridSearchCV(pipe, param_grid, cv=7, scoring='accuracy', n_jobs=-1, verbose=3)
grid.fit(X_train, y_train)
print(f"KNeighbors:\n  Best CV Accuracy: {grid.best_score_:.4f}")
print(f"  Best Params: {grid.best_params_}")
best_model = grid.best_estimator_

# Evaluation on the test set
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(f"  Test set accuracy: {test_accuracy}\n")


Fitting 7 folds for each of 320 candidates, totalling 2240 fits
[CV 1/7] END clf__algorithm=auto, clf__metric=euclidean, clf__n_neighbors=5, clf__weights=uniform;, score=0.738 total time=   0.0s
[CV 2/7] END clf__algorithm=auto, clf__metric=euclidean, clf__n_neighbors=5, clf__weights=uniform;, score=0.710 total time=   0.0s
[CV 6/7] END clf__algorithm=auto, clf__metric=euclidean, clf__n_neighbors=5, clf__weights=uniform;, score=0.705 total time=   0.0s
[CV 4/7] END clf__algorithm=auto, clf__metric=euclidean, clf__n_neighbors=5, clf__weights=uniform;, score=0.689 total time=   0.0s
[CV 3/7] END clf__algorithm=auto, clf__metric=euclidean, clf__n_neighbors=5, clf__weights=uniform;, score=0.710 total time=   0.0s
[CV 7/7] END clf__algorithm=auto, clf__metric=euclidean, clf__n_neighbors=5, clf__weights=uniform;, score=0.688 total time=   0.0s
[CV 2/7] END clf__algorithm=auto, clf__metric=euclidean, clf__n_neighbors=5, clf__weights=distance;, score=0.710 total time=   0.0s
[CV 3/7] END clf__

### MLPClassifier

The `MLPClassifier` reached a peak test accuracy of **0.7725**, below the performance achieved with `SVC`. The model is more complex and significantly slower to train, and convergence sometimes required careful tuning of learning-related parameters. One notable difficulty was choosing `hidden_layer_sizes`, which can heavily impact performance but is hard to tune systematically. Despite trying several configurations, we could not match the accuracy obtained with support vector machines.
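
To give a rough sense of how `hidden_layer_sizes` changes model complexity, the sketch below counts the weights and biases of a fully connected network for the three architectures in our grid. The input dimension `N_FEATURES = 20` is a hypothetical placeholder (the real value is `X_train.shape[1]`), and a single output unit is assumed, as for a binary problem.

```python
def count_mlp_parameters(n_features, hidden_layer_sizes, n_outputs=1):
    """Count weights and biases of a fully connected network."""
    layer_sizes = [n_features, *hidden_layer_sizes, n_outputs]
    return sum(
        fan_in * fan_out + fan_out  # weight matrix plus bias vector
        for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )


N_FEATURES = 20  # hypothetical; replace with X_train.shape[1]
for sizes in [(50,), (100,), (100, 50)]:
    print(sizes, count_mlp_parameters(N_FEATURES, sizes))
```

Under these assumptions the two-layer `(100, 50)` network has several times more parameters than `(50,)`, which is consistent with the longer training times we observed for larger architectures.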


In [None]:
# Model parameters
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', MLPClassifier(max_iter=5000))
])
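param_grid = {
    "clf__hidden_layer_sizes": [(50,), (100,), (100, 50)],
    "clf__alpha": [0.1, 1, 2, 5],
    # Note: learning_rate schedules only apply when solver="sgd"; with the
    # default "adam" solver this axis does not change the model.
    "clf__learning_rate": ["invscaling", "adaptive"],
    "clf__activation": ["relu", "tanh", "logistic"],
}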

# Execute GridSearchCV
grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1, verbose=3)
grid.fit(X_train, y_train)
print(f"MLPClassifier:\n  Best CV Accuracy: {grid.best_score_:.4f}")
print(f"  Best Params: {grid.best_params_}")
best_model = grid.best_estimator_

# Evaluation on the test set
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(f"  Test set accuracy: {test_accuracy}\n")


Fitting 5 folds for each of 72 candidates, totalling 360 fits
[CV 2/5] END clf__activation=relu, clf__alpha=0.1, clf__hidden_layer_sizes=(50,), clf__learning_rate=adaptive;, score=0.723 total time=   4.0s
[CV 3/5] END clf__activation=relu, clf__alpha=0.1, clf__hidden_layer_sizes=(50,), clf__learning_rate=adaptive;, score=0.728 total time=   4.1s
[CV 5/5] END clf__activation=relu, clf__alpha=0.1, clf__hidden_layer_sizes=(50,), clf__learning_rate=invscaling;, score=0.645 total time=   4.2s
[CV 3/5] END clf__activation=relu, clf__alpha=0.1, clf__hidden_layer_sizes=(50,), clf__learning_rate=invscaling;, score=0.720 total time=   4.2s
[CV 2/5] END clf__activation=relu, clf__alpha=0.1, clf__hidden_layer_sizes=(50,), clf__learning_rate=invscaling;, score=0.713 total time=   4.3s
[CV 1/5] END clf__activation=relu, clf__alpha=0.1, clf__hidden_layer_sizes=(50,), clf__learning_rate=invscaling;, score=0.710 total time=   4.4s
[CV 4/5] END clf__activation=relu, clf__alpha=0.1, clf__hidden_layer_siz

### AdaBoostClassifier

The `AdaBoostClassifier` reached a peak test accuracy of **0.7475**, well below `SVC`. It only marginally outperformed logistic regression (0.746) and fell short of the other non-linear models, `KNeighborsClassifier` (0.786) and `MLPClassifier` (0.7725). Training was also relatively slow, because boosting fits its estimators sequentially and the cost grows with `n_estimators`. Overall, AdaBoost did not reach the 0.85 target and was not competitive with the best-performing models.
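
The interaction between `learning_rate` and `n_estimators` can be seen in a single boosting round. The sketch below follows the discrete SAMME update as we understand it (estimator weight scaled by `learning_rate`, misclassified samples up-weighted); it is a simplified illustration on hypothetical weights, not scikit-learn's code, and it assumes the weighted error lies strictly between 0 and 1.

```python
import math


def samme_round(sample_weights, misclassified, learning_rate, n_classes=2):
    """One boosting round: return the estimator weight and new sample weights."""
    total = sum(sample_weights)
    # Weighted error rate of the current weak learner.
    err = sum(w for w, m in zip(sample_weights, misclassified) if m) / total
    # Estimator weight, shrunk by the learning rate.
    alpha = learning_rate * (math.log((1.0 - err) / err) + math.log(n_classes - 1.0))
    # Up-weight the misclassified samples, then renormalize.
    new_weights = [
        w * math.exp(alpha) if m else w
        for w, m in zip(sample_weights, misclassified)
    ]
    norm = sum(new_weights)
    return alpha, [w / norm for w in new_weights]


weights = [0.25, 0.25, 0.25, 0.25]      # hypothetical uniform start
missed = [True, False, False, False]    # the weak learner misses sample 0
alpha_full, w_full = samme_round(weights, missed, learning_rate=1.0)
alpha_small, w_small = samme_round(weights, missed, learning_rate=0.1)
print(alpha_full, w_full[0])
print(alpha_small, w_small[0])
```

With `learning_rate=1.0` the missed sample's weight doubles to 0.5 in one round, while with `learning_rate=0.1` it barely moves. Small learning rates therefore need many more sequential estimators to have the same effect, which matches the slow training we observed.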


In [162]:
# Model parameters
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', AdaBoostClassifier())
])
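param_grid = {
    "clf__n_estimators": [100, 200, 300, 500, 700],
    # Note: 0.05 is listed twice (possibly a typo for 0.5); kept as-is so the
    # candidate count in the output below still matches.
    "clf__learning_rate": [0.01, 0.02, 0.05, 0.1, 0.05, 1],
}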

# Execute GridSearchCV
grid = GridSearchCV(pipe, param_grid, cv=7, scoring='accuracy', n_jobs=-1, verbose=3)
grid.fit(X_train, y_train)
print(f"AdaBoost:\n  Best CV Accuracy: {grid.best_score_:.4f}")
print(f"  Best Params: {grid.best_params_}")
best_model = grid.best_estimator_

# Evaluation on the test set
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(f"  Test set accuracy: {test_accuracy}\n")

Fitting 7 folds for each of 30 candidates, totalling 210 fits
[CV 2/7] END clf__learning_rate=0.01, clf__n_estimators=100;, score=0.692 total time=   0.9s
[CV 1/7] END clf__learning_rate=0.01, clf__n_estimators=100;, score=0.647 total time=   0.9s
[CV 6/7] END clf__learning_rate=0.01, clf__n_estimators=100;, score=0.653 total time=   0.9s
[CV 5/7] END clf__learning_rate=0.01, clf__n_estimators=100;, score=0.710 total time=   0.9s
[CV 3/7] END clf__learning_rate=0.01, clf__n_estimators=100;, score=0.650 total time=   0.9s
[CV 4/7] END clf__learning_rate=0.01, clf__n_estimators=100;, score=0.668 total time=   1.0s
[CV 7/7] END clf__learning_rate=0.01, clf__n_estimators=100;, score=0.698 total time=   1.0s
[CV 1/7] END clf__learning_rate=0.01, clf__n_estimators=200;, score=0.650 total time=   2.3s
[CV 2/7] END clf__learning_rate=0.01, clf__n_estimators=200;, score=0.717 total time=   2.5s
[CV 4/7] END clf__learning_rate=0.01, clf__n_estimators=200;, score=0.664 total time=   2.5s
[CV 5/7]