## Hyperparameter tuning

### **1. Lasso Regression (L1 Regularization)**
- `alpha`: Regularization strength (higher values increase regularization)
- `fit_intercept`: Whether to calculate the intercept (default: True)
- `max_iter`: Maximum number of iterations (default: 1000)
- `tol`: Tolerance for stopping criteria (default: 1e-4)
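- `selection`: Order in which coefficients are updated during coordinate descent (`"cyclic"` loops over features in order, `"random"` picks a random coefficient each iteration; default: `"cyclic"`)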

---

### **2. Ridge Regression (L2 Regularization)**
- `alpha`: Regularization strength
- `fit_intercept`: Whether to calculate the intercept
- `solver`: Solver for optimization (`"auto"`, `"saga"`, `"lsqr"`, `"sparse_cg"`, `"sag"`, etc.)
- `max_iter`: Maximum iterations (used for some solvers)
- `tol`: Tolerance for stopping criteria

---

### **3. Elastic Net (L1 + L2 Regularization)**
- `alpha`: Regularization strength
- `l1_ratio`: Mixing parameter (0 = Ridge, 1 = Lasso, default: 0.5)
- `max_iter`: Maximum number of iterations
- `tol`: Tolerance for stopping criteria
- `fit_intercept`: Whether to calculate the intercept
- `selection`: Whether to use `"random"` or `"cyclic"` coordinate descent
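As a quick check of how `alpha` and `l1_ratio` interact, the sketch below fits `ElasticNet` with `l1_ratio=1.0` and `Lasso` with the same `alpha` and compares the coefficients. The synthetic data, the true weights `w_true`, and the value `alpha=0.1` are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic regression problem (illustrative assumption, not the notebook's data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
w_true = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
y_demo = X_demo @ w_true + 0.1 * rng.normal(size=200)

# Same regularization strength, pure L1 penalty in both models
lasso = Lasso(alpha=0.1).fit(X_demo, y_demo)
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X_demo, y_demo)

# A mixed penalty for comparison
enet_mixed = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_demo, y_demo)

print("Lasso:             ", lasso.coef_)
print("ElasticNet (l1=1): ", enet_l1.coef_)
print("ElasticNet (l1=.5):", enet_mixed.coef_)
```

With `l1_ratio=1.0` the two models should agree, which is why tuning `l1_ratio` over a range that includes 1 already covers the Lasso case.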

---

### **4. Support Vector Machine (SVM)**
#### **For SVC (Support Vector Classification)**
- `C`: Regularization parameter (higher = less regularization)
- `kernel`: Kernel function (`"linear"`, `"poly"`, `"rbf"`, `"sigmoid"`, `"precomputed"`)
- `degree`: Degree of polynomial kernel function (used when `kernel="poly"`)
- `gamma`: Kernel coefficient (`"scale"`, `"auto"`, or a float)
- `coef0`: Independent term in polynomial and sigmoid kernels
- `tol`: Stopping criterion tolerance
- `max_iter`: Maximum number of iterations

#### **For SVR (Support Vector Regression)**
- `C`: Regularization parameter
- `epsilon`: Half-width of the tube around the prediction inside which errors incur no penalty in the loss function
- `kernel`: Kernel function (`"linear"`, `"poly"`, `"rbf"`, `"sigmoid"`, `"precomputed"`)
- `degree`: Degree of polynomial kernel (used when `kernel="poly"`)
- `gamma`: Kernel coefficient
- `coef0`: Independent term in polynomial and sigmoid kernels
- `tol`: Stopping criterion tolerance
- `max_iter`: Maximum number of iterations
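To make `gamma`, `coef0`, and `degree` more concrete, here is a minimal pure-Python sketch of the RBF and polynomial kernels in their usual textbook form. The helper names `rbf_kernel` and `poly_kernel` and the toy vectors are hypothetical and written only for this illustration.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, y, gamma, coef0, degree):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

# Two toy points (illustrative assumption)
p, q = [1.0, 2.0], [2.0, 0.0]

# Larger gamma makes similarity fall off faster with distance
for gamma in (0.1, 1.0, 10.0):
    print(f"gamma={gamma}: rbf={rbf_kernel(p, q, gamma):.6f}")

print("poly:", poly_kernel(p, q, gamma=1.0, coef0=1.0, degree=3))
```

A large `gamma` makes each training point influence only its close neighbours, which tends to produce more complex decision boundaries.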

---

### **5. Decision Tree**
- `criterion`: Function to measure the quality of a split (`"gini"`, `"entropy"` for classification, `"squared_error"`, `"friedman_mse"` for regression)
- `max_depth`: Maximum depth of the tree
- `min_samples_split`: Minimum number of samples required to split a node
- `min_samples_leaf`: Minimum number of samples required to be at a leaf node
- `max_features`: Number of features to consider for best split
- `splitter`: Strategy for choosing the split at each node (`"best"`, `"random"`)
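The two classification criteria can be sketched in a few lines of pure Python. The functions `gini` and `entropy` below are hypothetical helpers that compute the impurity of the labels in a single node, using the standard definitions.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: sum of -p * log2(p) over classes."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

mixed_node = [0, 0, 1, 1]   # evenly mixed classes: maximum impurity for two classes
pure_node = [1, 1, 1, 1]    # one class only: zero impurity

print("mixed:", gini(mixed_node), entropy(mixed_node))
print("pure: ", gini(pure_node), entropy(pure_node))
```

A split is considered good when it lowers the weighted impurity of the child nodes relative to the parent.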

---

### **6. Random Forest**
- `n_estimators`: Number of trees in the forest
- `criterion`: Function to measure split quality (`"gini"`, `"entropy"` for classification, `"squared_error"` for regression)
- `max_depth`: Maximum depth of each tree
- `min_samples_split`: Minimum samples required to split a node
- `min_samples_leaf`: Minimum samples required at a leaf node
- `max_features`: Number of features to consider when looking for the best split
- `bootstrap`: Whether bootstrap samples are used
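What `bootstrap` means can be illustrated without any library: each tree is trained on a sample of the training rows drawn with replacement, so some rows appear several times and others are left out. The sketch below uses a hypothetical helper `bootstrap_indices` and an assumed sample size of 1000.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

rng = random.Random(42)
n = 1000
idx = bootstrap_indices(n, rng)

# Rows never drawn are "out-of-bag" for this tree
out_of_bag = set(range(n)) - set(idx)
oob_fraction = len(out_of_bag) / n

print(f"unique rows drawn: {len(set(idx))}")
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

For large samples the out-of-bag fraction is close to 1/e (about 0.37), which is why each tree sees a noticeably different subset of the data.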


Let's recall the cost functions of some models:

### **1. Ridge Regression (L2 Regularization)**
The cost function for **Ridge Regression** is:

$$
J(w) = \frac{1}{2m} \sum_{i=1}^{m} (y_i - X_i w)^2 + \lambda \sum_{j=1}^{n} w_j^2
$$

where:
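- \( m \) is the number of training samples,
- \( n \) is the number of features,
- \( X \) is the feature matrix and \( X_i \) is its \( i \)-th row,
- \( y \) is the target vector,
- \( w \) represents the weight vector,
- \( \lambda \) is the regularization parameter (the role played by `alpha` in scikit-learn, up to scaling conventions).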

This adds an **L2 penalty** that shrinks weights but does not lead to sparsity.
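The Ridge cost can be evaluated directly from the formula. The function `ridge_cost` and the toy data below are assumptions made up for illustration; the code follows the \( \frac{1}{2m} \) scaling used in this section.

```python
def ridge_cost(X, y, w, lam):
    """J(w) = 1/(2m) * sum of squared residuals + lam * sum of squared weights."""
    m = len(y)
    residuals = [
        y_i - sum(x_ij * w_j for x_ij, w_j in zip(x_i, w))
        for x_i, y_i in zip(X, y)
    ]
    data_term = sum(r ** 2 for r in residuals) / (2 * m)
    penalty = lam * sum(w_j ** 2 for w_j in w)
    return data_term + penalty

# Toy data that w = [1, 2] fits exactly (illustrative assumption)
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_toy = [1.0, 2.0, 3.0]

print(ridge_cost(X_toy, y_toy, [1.0, 2.0], lam=0.0))  # perfect fit, no penalty
print(ridge_cost(X_toy, y_toy, [1.0, 2.0], lam=0.1))  # perfect fit, penalty only
print(ridge_cost(X_toy, y_toy, [0.0, 0.0], lam=0.1))  # no penalty, data term only
```

Even a perfect fit has a non-zero cost once \( \lambda > 0 \), which is what pulls the weights toward zero.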

---

### **2. Lasso Regression (L1 Regularization)**
The cost function for **Lasso Regression** is:

$$
J(w) = \frac{1}{2m} \sum_{i=1}^{m} (y_i - X_i w)^2 + \lambda \sum_{j=1}^{n} |w_j|
$$

The **L1 penalty** enforces sparsity: it can drive some weights to exactly zero, which effectively removes the corresponding features from the model.
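Why L1 gives exact zeros while L2 only shrinks can be seen in a one-dimensional sketch. Assuming a single weight \( w \) and the simplified objectives \( \frac{1}{2}(w - z)^2 + \lambda |w| \) and \( \frac{1}{2}(w - z)^2 + \lambda w^2 \), the minimizers have closed forms. The helper names are hypothetical.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (w - z)**2 + lam * |w| (L1 case)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small inputs are set exactly to zero

def ridge_shrink(z, lam):
    """Minimizer of 0.5 * (w - z)**2 + lam * w**2 (L2 case)."""
    return z / (1.0 + 2.0 * lam)

lam = 0.5
for z in (2.0, 0.3, -2.0):
    print(f"z={z:+.1f}  L1 -> {soft_threshold(z, lam):+.3f}  L2 -> {ridge_shrink(z, lam):+.3f}")
```

The L1 solution is exactly zero whenever \( |z| \leq \lambda \), while the L2 solution only rescales \( z \) and is zero only when \( z \) itself is zero.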

---

### **3. Elastic Net (Combination of Lasso and Ridge)**
The cost function for **Elastic Net** is:

$$
J(w) = \frac{1}{2m} \sum_{i=1}^{m} (y_i - X_i w)^2 + \lambda_1 \sum_{j=1}^{n} |w_j| + \lambda_2 \sum_{j=1}^{n} w_j^2
$$


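where:
- \( \lambda_1 \) and \( \lambda_2 \) control the L1 and L2 penalties separately.
- Equivalently, the penalty can be written with one overall strength \( \lambda \) and a **mixing parameter** \( \alpha \) (where \( 0 \leq \alpha \leq 1 \)) by setting \( \lambda_1 = \lambda \alpha \) and \( \lambda_2 = \lambda (1 - \alpha) \), so \( \alpha = 1 \) gives Lasso and \( \alpha = 0 \) gives Ridge. scikit-learn's `ElasticNet` follows this pattern (with an extra factor of 1/2 on the L2 term), but names the overall strength `alpha` and the mixing parameter `l1_ratio`.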
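The two ways of writing the Elastic Net penalty can be converted into each other. The sketch below assumes the common convention \( \lambda_1 = \lambda \alpha \), \( \lambda_2 = \lambda (1 - \alpha) \); the function names are hypothetical, and `alpha` here is the mixing parameter from the text, not scikit-learn's `alpha`.

```python
def to_separate(lam, alpha):
    """(overall strength, mixing parameter) -> (lambda_1, lambda_2)."""
    return lam * alpha, lam * (1.0 - alpha)

def to_mixed(lam1, lam2):
    """(lambda_1, lambda_2) -> (overall strength, mixing parameter)."""
    lam = lam1 + lam2
    return lam, lam1 / lam

lam1, lam2 = to_separate(lam=2.0, alpha=0.25)
print(lam1, lam2)            # L1 and L2 strengths
print(to_mixed(lam1, lam2))  # back to (strength, mixing)

print(to_separate(2.0, 1.0))  # alpha = 1: pure L1 (Lasso)
print(to_separate(2.0, 0.0))  # alpha = 0: pure L2 (Ridge)
```

Tuning one strength and one mixing ratio is usually more convenient than tuning two independent penalties, because the two quantities have separate meanings: how much to regularize, and in what style.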

In [1]:
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


In [2]:
dataset = fetch_openml('Cardiovascular-Disease-dataset', as_frame=True)  # Load the Cardiovascular Disease dataset from OpenML by name

In [3]:
df = dataset.frame

In [4]:
df.head()

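| Unnamed: 0 | age | gender | height | weight | ap_hi | ap_lo | cholesterol | gluc | smoke | alco | active | cardio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18393 | 2 | 168 | 62.0 | 110.0 | 80.0 | 1 | 1 | 0 | 0 | 1 | 0 |
| 1 | 20228 | 1 | 156 | 85.0 | 140.0 | 90.0 | 3 | 1 | 0 | 0 | 1 | 1 |
| 2 | 18857 | 1 | 165 | 64.0 | 130.0 | 70.0 | 3 | 1 | 0 | 0 | 0 | 1 |
| 3 | 17623 | 2 | 169 | 82.0 | 150.0 | 100.0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 4 | 17474 | 1 | 156 | 56.0 | 100.0 | 60.0 | 1 | 1 | 0 | 0 | 0 | 0 |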


In [5]:
X = dataset.data
y = dataset.target

In [6]:
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

In [7]:
# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)
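The scaler is fitted on the training split only and then reused on the validation and test splits, so no information from held-out data leaks into preprocessing. A minimal pure-Python sketch of the same idea for one column, using a few `height` values from the preview above as toy data (the helper names are hypothetical):

```python
import statistics

def fit_scaler(column):
    """Learn mean and population standard deviation from training data only."""
    return statistics.fmean(column), statistics.pstdev(column)

def transform(column, mean, std):
    """Apply previously learned statistics to any split."""
    return [(v - mean) / std for v in column]

train_heights = [168.0, 156.0, 165.0, 169.0]  # toy training split
val_heights = [156.0]                         # toy validation split

mean, std = fit_scaler(train_heights)
scaled_train = transform(train_heights, mean, std)
scaled_val = transform(val_heights, mean, std)  # reuses training statistics

print(mean, std)
print(scaled_train)
print(scaled_val)
```

Regularized models benefit from this step because the penalty treats all weights alike, so features on very different scales would otherwise be penalized unevenly.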

In [8]:
# Initialize the Logistic Regression model
log_reg = LogisticRegression()

# Fit the model to the training data
log_reg.fit(X_train, y_train)

# Predict on the validation set
y_val_pred = log_reg.predict(X_val)

# Calculate the accuracy on the validation set
accuracy = accuracy_score(y_val, y_val_pred)
print(f'Accuracy on validation set: {accuracy}')

Accuracy on validation set: 0.717047619047619


In [10]:
# Define a range of values for the C parameter and l1_ratio
C_values = [0.01, 10, 100]
l1_ratio_values = [0.5, 1]

# Initialize a dictionary to store the accuracy for each combination of C and l1_ratio
accuracy_dict = {}

for C in C_values:
    for l1_ratio in l1_ratio_values:
        # Initialize the Logistic Regression model with the current C value and l1_ratio
        log_reg = LogisticRegression(C=C, penalty='elasticnet', solver='saga', l1_ratio=l1_ratio, max_iter=10000)
        
        # Fit the model to the training data
        log_reg.fit(X_train, y_train)
        
        # Predict on the validation set
        y_val_pred = log_reg.predict(X_val)
        
        # Calculate the accuracy on the validation set
        accuracy = accuracy_score(y_val, y_val_pred)
        
        # Store the accuracy in the dictionary
        accuracy_dict[(C, l1_ratio)] = accuracy

# Print the accuracy for each combination of C and l1_ratio
for (C, l1_ratio), acc in accuracy_dict.items():
    print(f'C: {C}, l1_ratio: {l1_ratio}, Accuracy: {acc}')

C: 0.01, l1_ratio: 0.5, Accuracy: 0.6999047619047619
C: 0.01, l1_ratio: 1, Accuracy: 0.711047619047619
C: 10, l1_ratio: 0.5, Accuracy: 0.7172380952380952
C: 10, l1_ratio: 1, Accuracy: 0.7172380952380952
C: 100, l1_ratio: 0.5, Accuracy: 0.7173333333333334
C: 100, l1_ratio: 1, Accuracy: 0.7173333333333334


In [11]:
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'l1_ratio': [0.5, 1],
}

# Initialize the Logistic Regression model
log_reg = LogisticRegression(penalty='elasticnet', solver='saga', max_iter=10000)


# Initialize GridSearchCV
grid_search = GridSearchCV(log_reg, param_grid, cv=2, scoring='accuracy')

# Fit GridSearchCV to the training data
grid_search.fit(X_train, y_train)

In [12]:
# Get the best parameters and the best score
best_params = grid_search.best_params_
best_score = grid_search.best_score_

print(f'Best parameters: {best_params}')
print(f'Best cross-validation accuracy: {best_score}')

Best parameters: {'C': 10, 'l1_ratio': 1}
Best cross-validation accuracy: 0.7163673469387755
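The cross-validation score is computed on folds of the training data, so it is worth confirming the selected model on a split that the search never saw. The sketch below is self-contained and uses synthetic data from `make_classification` as a stand-in for the notebook's dataset (an assumption); the `_demo` variable names are hypothetical. By default `GridSearchCV` refits the best configuration on the full training set, so the fitted search object can be scored directly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption, not the cardiovascular dataset)
X_demo, y_demo = make_classification(n_samples=600, n_features=8, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

search_demo = GridSearchCV(
    LogisticRegression(penalty='elasticnet', solver='saga', max_iter=10000),
    {'C': [0.01, 1, 100], 'l1_ratio': [0.5, 1]},
    cv=2,
    scoring='accuracy',
)
search_demo.fit(X_tr, y_tr)

# score() uses the refitted best estimator and the search's scoring metric
val_accuracy = search_demo.score(X_va, y_va)

print(f'Best parameters: {search_demo.best_params_}')
print(f'Cross-validation accuracy: {search_demo.best_score_:.4f}')
print(f'Held-out validation accuracy: {val_accuracy:.4f}')
```

In the notebook itself, the equivalent call would be `grid_search.score(X_val, y_val)`, keeping `X_test` untouched until the final model is chosen.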


In [13]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform, loguniform

# Define the parameter distribution
param_dist = {
    'C': loguniform(0.01, 100),
    'l1_ratio': uniform(0, 1),
}

# Initialize the Logistic Regression model
log_reg = LogisticRegression(penalty='elasticnet', solver='saga', max_iter=10000)

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(log_reg, param_distributions=param_dist, n_iter=5, cv=2, scoring='accuracy', random_state=42)

# Fit RandomizedSearchCV to the training data
random_search.fit(X_train, y_train)

# Get the best parameters and the best score
best_params = random_search.best_params_
best_score = random_search.best_score_

print(f'Best parameters: {best_params}')
print(f'Best cross-validation accuracy: {best_score}')

Best parameters: {'C': 8.471801418819974, 'l1_ratio': 0.5986584841970366}
Best cross-validation accuracy: 0.7163061224489796


In [18]:
values = loguniform(0.01, 100).rvs(10)
print(values)

[3.10610211e+00 5.55170664e+01 2.33438161e+01 8.25357512e+00
 1.21951717e+00 2.21068212e-01 8.59137959e+01 1.87385816e+00
 7.93945329e-02 6.80463584e-02]
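The samples above span several orders of magnitude, which is the point of a log-uniform distribution: the logarithm of the value is uniform, so each decade between 0.01 and 100 is equally likely. A minimal pure-Python sketch of that idea follows (the helper `sample_loguniform` is hypothetical and is not how SciPy implements it):

```python
import math
import random

def sample_loguniform(low, high, rng):
    """Draw a value whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

rng = random.Random(0)
samples = [sample_loguniform(0.01, 100, rng) for _ in range(1000)]

# Count samples per decade: [0.01, 0.1), [0.1, 1), [1, 10), [10, 100]
decade_counts = [0, 0, 0, 0]
for s in samples:
    decade = int(math.floor(math.log10(s))) + 2
    decade_counts[min(max(decade, 0), 3)] += 1

print(decade_counts)
```

A plain uniform distribution on the same range would place about 90% of the draws between 10 and 100, leaving small values of `C` almost unexplored.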
