# Randomized Search
Efficient Hyperparameter Optimization at Scale
## Objective

This notebook introduces Randomized Search as a computationally efficient alternative to grid search. It focuses on:

- When randomized search is superior to grid search

- Designing probability distributions for hyperparameters

- Controlling compute via n_iter

- Interpreting results under stochastic search

- Comparing randomized search to grid search

It answers:

    How do we explore large hyperparameter spaces without exploding compute cost?

## Why Randomized Search Matters

Grid search:

- Explodes combinatorially

- Wastes effort on unimportant parameters

- Scales poorly

Randomized search:

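- Samples candidates at random from distributions you specify

- Often finds strong configurations with far fewer fits

- Scales to high-dimensional spaces, because cost is set by `n_iter` rather than by the number of parameters

In practice:

For a fixed compute budget, randomized search often matches or beats grid search, because every draw tries a new value of each parameter instead of revisiting the same few grid points.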
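
To make the scaling argument concrete, the sketch below counts grid candidates as parameters are added and estimates the chance that a random search lands in the best 5% of configurations at least once. It assumes independent draws spread uniformly over the search space; the function names and the 5% threshold are illustrative choices, not part of this notebook's pipeline.

```python
import math


def grid_size(values_per_param: int, n_params: int) -> int:
    """Number of candidates in a full grid with the same number of values per parameter."""
    return values_per_param ** n_params


def prob_hit_top_fraction(n_iter: int, top_fraction: float = 0.05) -> float:
    """Probability that at least one of n_iter independent uniform draws
    falls in the best `top_fraction` of the search space."""
    return 1.0 - (1.0 - top_fraction) ** n_iter


def draws_needed(top_fraction: float = 0.05, confidence: float = 0.95) -> int:
    """Smallest n_iter whose hit probability reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))


# Grid cost grows exponentially with the number of tuned parameters.
for n_params in (2, 4, 6):
    print(n_params, "parameters x 5 values each ->", grid_size(5, n_params), "candidates")

# Random search cost is fixed by n_iter, whatever the dimensionality.
print("P(hit top 5%) with n_iter=30:", round(prob_hit_top_fraction(30), 3))
print("Draws needed for 95% confidence:", draws_needed())
```

The hit probability depends only on the number of draws, not on how many parameters are tuned, which is why a fixed `n_iter` stays useful as the space grows.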

## When to Use Randomized Search

- ✔ Large hyperparameter spaces
- ✔ Continuous parameters
- ✔ Limited compute budgets
- ✔ Ensemble models
- ✔ Baseline optimization

- ❌ Very small grids
- ❌ Regulatory environments that require an exhaustive, auditable search (a fixed `random_state` makes runs repeatable, but not exhaustive)

## Imports and dataset

In [1]:
import numpy as np
import pandas as pd

from scipy.stats import loguniform, randint

from sklearn.model_selection import (
    RandomizedSearchCV,
    train_test_split
)

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

In [19]:
df = pd.read_csv("D:/GitHub/Data-Science-Techniques/datasets/Supervised-classification/synthetic_credit_default_classification.csv")

X = df.drop(columns=["default", "customer_id"])
y = df["default"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    stratify=y,
    random_state=2010
)

# MODEL
## Leakage-Safe Pipeline

In [4]:
pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(
        max_iter=1000,
        solver="liblinear"
    ))
])

## Define Parameter Distributions

Unlike grid search, parameters are sampled from distributions.

In [9]:

param_distributions = {
    "model__C": loguniform(1e-3, 1e2),
    "model__penalty": ["l1", "l2"],
    "model__class_weight": [None, "balanced"]
}

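Why This Works

- C is sampled on a __log scale__, so each order of magnitude between 1e-3 and 1e2 is equally likely

- Every draw tries a new value of C, so more of the budget goes to the continuous, impactful parameter

- There is no need to evaluate every `penalty` × `class_weight` combination for each value of C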
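
The effect of the log scale can be checked directly. The sketch below compares how much probability mass a log-uniform and a plain uniform distribution place on C < 1 over the same bounds as `model__C`. The uniform alternative and the variable names are assumptions added for illustration; only `loguniform(1e-3, 1e2)` comes from the notebook.

```python
from scipy.stats import loguniform, uniform

low, high = 1e-3, 1e2  # same bounds as model__C above

log_dist = loguniform(low, high)
# scipy's uniform is parameterized as (loc, scale) and covers [loc, loc + scale].
lin_dist = uniform(loc=low, scale=high - low)

# Share of probability mass on the stronger-regularization range (C < 1).
share_log = log_dist.cdf(1.0)
share_lin = lin_dist.cdf(1.0)

print(f"log-uniform: P(C < 1) = {share_log:.3f}")
print(f"uniform:     P(C < 1) = {share_lin:.3f}")

# A few reproducible draws, to see the spread across orders of magnitude.
print(log_dist.rvs(size=5, random_state=0))
```

Three of the five decades between 1e-3 and 1e2 lie below 1, so the log-uniform distribution puts 60% of its draws there, while a uniform distribution would put about 1% there and spend almost all of the budget on large values of C.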

## Configure RandomizedSearchCV

In [10]:
random_search = RandomizedSearchCV(
    estimator=pipeline,
    param_distributions=param_distributions,
    n_iter=30,
    scoring="roc_auc",
    cv=5,
    n_jobs=-1,
    random_state=42,
    verbose=1,
    return_train_score=True
)

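`n_iter` directly controls compute cost: the search runs `n_iter × cv` fits (here 30 × 5 = 150), plus one final refit of the best candidate on the full training set.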

## Fit Randomized Search

In [11]:
random_search.fit(X_train, y_train)

Fitting 5 folds for each of 30 candidates, totalling 150 fits


## Best Parameters and Score

In [13]:
random_search.best_params_

{'model__C': np.float64(0.14445251022763064),
 'model__class_weight': None,
 'model__penalty': 'l1'}

In [14]:
random_search.best_score_

np.float64(0.9157140129667354)

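- ✔ Strong cross-validated ROC AUC (about 0.916) from only 30 sampled candidates
- ✔ Cost is fixed by `n_iter`, whereas a grid fine enough to cover C at this resolution would need many more fits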

## Evaluate on Test Set

In [15]:
best_model = random_search.best_estimator_

In [16]:
y_test_prob = best_model.predict_proba(X_test)[:, 1]

roc_auc_score(y_test, y_test_prob)

np.float64(0.9138236949058834)

## Analyze Search Results

In [17]:
results = pd.DataFrame(random_search.cv_results_)

results[
    [
        "mean_test_score",
        "mean_train_score",
        "std_test_score",
        "params"
    ]
].sort_values("mean_test_score", ascending=False)

|    | mean_test_score | mean_train_score | std_test_score | params |
| -- | --------------- | ---------------- | -------------- | ------ |
| 9  | 0.915714 | 0.917549 | 0.01057  | {'model__C': 0.14445251022763064, 'model__clas... |
| 0  | 0.915674 | 0.917474 | 0.010483 | {'model__C': 0.0745934328572655, 'model__class... |
| 23 | 0.915622 | 0.917543 | 0.01049  | {'model__C': 0.03618723330959624, 'model__clas... |
| 24 | 0.91562  | 0.917553 | 0.010583 | {'model__C': 0.5414413211338525, 'model__class... |
| 14 | 0.915617 | 0.91755  | 0.010614 | {'model__C': 0.9163741808778786, 'model__class... |
| 27 | 0.915606 | 0.917562 | 0.010566 | {'model__C': 0.9761125443110458, 'model__class... |
| 15 | 0.915604 | 0.91755  | 0.010637 | {'model__C': 1.0907475835157696, 'model__class... |
| 4  | 0.915602 | 0.91756  | 0.010568 | {'model__C': 1.0129197956845732, 'model__class... |
| 12 | 0.915601 | 0.917537 | 0.010616 | {'model__C': 0.19069966103000435, 'model__clas... |
| 1  | 0.915599 | 0.917548 | 0.010628 | {'model__C': 4.5705630998014515, 'model__class... |
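
The top candidates differ by less than 0.0002 in mean ROC AUC, while the fold-to-fold standard deviation is about 0.0106, so the ranking among them is mostly noise. One way to act on that is a "within one standard deviation of the best" filter, a looser relative of the one-standard-error rule. The sketch below applies it to a few rows copied from the table; the helper name `within_one_std_of_best` is hypothetical, and preferring the smallest C is a design choice rather than something the notebook prescribes.

```python
import pandas as pd

# A few rows copied from the table above (C rounded to 4 significant digits);
# in the notebook you would pass `results` directly.
demo = pd.DataFrame(
    {
        "param_model__C": [0.1445, 0.07459, 0.03619, 0.5414, 4.571],
        "mean_test_score": [0.915714, 0.915674, 0.915622, 0.915620, 0.915599],
        "std_test_score": [0.010570, 0.010483, 0.010490, 0.010583, 0.010628],
    },
    index=[9, 0, 23, 24, 1],
)


def within_one_std_of_best(cv_results: pd.DataFrame) -> pd.DataFrame:
    """Return candidates whose mean score is within one std of the best mean."""
    best = cv_results.loc[cv_results["mean_test_score"].idxmax()]
    threshold = best["mean_test_score"] - best["std_test_score"]
    return cv_results[cv_results["mean_test_score"] >= threshold]


candidates = within_one_std_of_best(demo)
# Among statistically similar candidates, prefer the most regularized (smallest C).
simplest = candidates.sort_values("param_model__C").iloc[0]

print(len(candidates), "of", len(demo), "candidates are within one std of the best")
print("Most regularized among them: C =", simplest["param_model__C"])
```

All copied rows pass the filter, which suggests the score is flat across a wide range of C on this dataset and that `best_params_` should be read as one of several equally good choices.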


## Bias–Variance Diagnostics

In [18]:
results["overfit_gap"] = (
    results["mean_train_score"] -
    results["mean_test_score"]
)

results.sort_values("overfit_gap", ascending=False).head()

|    | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_model__C | param_model__class_weight | param_model__penalty | params | split0_test_score | split1_test_score | ... | std_test_score | rank_test_score | split0_train_score | split1_train_score | split2_train_score | split3_train_score | split4_train_score | mean_train_score | std_train_score | overfit_gap |
| -- | ------------- | ------------ | --------------- | -------------- | -------------- | ------------------------- | -------------------- | ------ | ----------------- | ----------------- | --- | -------------- | --------------- | ------------------ | ------------------ | ------------------ | ------------------ | ------------------ | ---------------- | --------------- | ----------- |
| 16 | 0.01121 | 0.000511 | 0.005399 | 0.00049 | 0.002115 | balanced | l1 | {'model__C': 0.0021147447960615704, 'model__cl... | 0.741803 | 0.77287 | ... | 0.01555 | 27 | 0.732147 | 0.780912 | 0.743786 | 0.772332 | 0.803077 | 0.766451 | 0.025595 | 0.004279 |
| 5  | 0.013318 | 0.001328 | 0.006315 | 0.001166 | 0.001267 | balanced | l2 | {'model__C': 0.001267425589893723, 'model__cla... | 0.925593 | 0.90318 | ... | 0.010614 | 24 | 0.914722 | 0.920369 | 0.914241 | 0.916487 | 0.920275 | 0.917219 | 0.002642 | 0.002195 |
| 28 | 0.011019 | 0.000623 | 0.005508 | 0.000626 | 0.00277 | None | l2 | {'model__C': 0.0027698899227562817, 'model__cl... | 0.925314 | 0.903594 | ... | 0.010342 | 22 | 0.914754 | 0.920354 | 0.914278 | 0.916616 | 0.920278 | 0.917256 | 0.002618 | 0.002147 |
| 7  | 0.012601 | 0.001018 | 0.006599 | 0.000491 | 0.008112 | None | l1 | {'model__C': 0.008111941985431923, 'model__cla... | 0.913832 | 0.890663 | ... | 0.011519 | 25 | 0.903618 | 0.908681 | 0.901488 | 0.903114 | 0.908295 | 0.905039 | 0.002905 | 0.00211 |
| 20 | 0.011405 | 0.000489 | 0.006312 | 0.000921 | 0.004076 | None | l2 | {'model__C': 0.00407559644007287, 'model__clas... | 0.925359 | 0.903882 | ... | 0.01026 | 21 | 0.914846 | 0.920475 | 0.914311 | 0.916677 | 0.920336 | 0.917329 | 0.002632 | 0.002067 |


## Randomized vs Grid Search

| Aspect            | Grid Search | Randomized Search |
| ----------------- | ----------- | ----------------- |
| Coverage          | Exhaustive  | Stochastic        |
| Compute           | High        | Controlled        |
| Continuous Params | Poor        | Excellent         |
| Reproducibility   | High        | Medium            |
| Practical Default | ❌           | ✔                 |
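
The comparison in the table can be reproduced on a small scale. The sketch below runs both searches over the same range of C on synthetic data from `make_classification`, which is a stand-in and not the credit-default dataset used in this notebook. The grid values, `n_iter=4` and `cv=3` are arbitrary choices made to keep the example fast.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data; NOT the credit-default dataset used in this notebook.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

model = LogisticRegression(max_iter=1000)

# Grid search: every listed value of C is evaluated.
grid = GridSearchCV(
    model,
    param_grid={"C": [0.001, 0.01, 0.1, 1, 10, 100]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X_demo, y_demo)

# Randomized search: n_iter draws from a continuous log-uniform range.
rand = RandomizedSearchCV(
    model,
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=4,
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
rand.fit(X_demo, y_demo)

for name, search in [("grid", grid), ("random", rand)]:
    n_candidates = len(search.cv_results_["params"])
    print(f"{name:>6}: {n_candidates} candidates, best AUC = {search.best_score_:.3f}")
```

The grid's cost is set by how many values are listed, while the randomized search's cost is set by `n_iter` alone. Which one scores higher on a given dataset is an empirical question.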


## Reproducibility Considerations

In [20]:
random_state=2010

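- ✔ Fixes the random draws: this notebook uses `random_state=2010` for the train/test split and `random_state=42` for the search
- ✔ Makes runs comparable, since the same candidates are sampled every time
- ❌ The search is still non-exhaustive, so a different seed can return a different best configuration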
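
The effect of the seed can be seen without fitting any model. The sketch below draws candidates with scikit-learn's `ParameterSampler`, which to my understanding is the sampler `RandomizedSearchCV` uses internally. The `space` dictionary mirrors `param_distributions` from this notebook, and the seed value 7 is an arbitrary choice for contrast.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

space = {
    "model__C": loguniform(1e-3, 1e2),
    "model__penalty": ["l1", "l2"],
    "model__class_weight": [None, "balanced"],
}

# Same seed -> identical candidate list; different seed -> different candidates.
run_a = list(ParameterSampler(space, n_iter=5, random_state=42))
run_b = list(ParameterSampler(space, n_iter=5, random_state=42))
run_c = list(ParameterSampler(space, n_iter=5, random_state=7))

print("same seed, same candidates:     ", run_a == run_b)
print("different seed, same candidates:", run_a == run_c)
```

Reporting the seed alongside `best_params_` lets someone else regenerate exactly the candidates that were evaluated.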

## Common Mistakes (Avoided)

- ❌ Using a uniform distribution for log-scale parameters
- ❌ Leaving `n_iter` at its default (10) instead of sizing it to the compute budget
- ❌ Tuning on the test set
- ❌ Ignoring the train–validation gap
- ❌ Spending draws on parameters that barely affect the score

## Key Takeaways

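- Randomized search is a sensible default for tuning large or continuous hyperparameter spaces

- `n_iter` controls cost directly

- Use log-uniform distributions for scale parameters such as the regularization strength `C`

- Pipelines prevent leakage, because preprocessing is refit inside each cross-validation fold

- Grid search is still useful for final refinement around a promising region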

## Next Notebook
07_Model_Tuning_and_Optimization/

└── [03_bayesian_optimization.ipynb](03_bayesian_optimization.ipynb)