# Hyperparameter Tuning 
---

## 🔹 1. **Gaussian Naive Bayes (`GaussianNB`)**

* Used when features are **continuous** and, within each class, follow (roughly) a **normal distribution**.

**Hyperparameters:**

* `var_smoothing`:

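  * Adds a small value to every feature variance so that near-zero variances do not cause division by zero (numerical stability). In scikit-learn the added value is `var_smoothing` times the largest feature variance in the training data.
  * Default: $10^{-9}$.
  * Tuning: Test values on a logarithmic scale, such as $10^{-9}, 10^{-8}, 10^{-7}, …$.
  * Effect: Too small → unstable estimates when a feature has near-zero variance within a class; too large → overly smoothed probabilities.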
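
To make the role of `var_smoothing` concrete, the sketch below fits `GaussianNB` on the iris data with three settings and reads back the fitted `epsilon_` attribute, which holds the absolute amount added to each variance. The three values tried are illustrative choices, not recommendations, and training accuracy is printed only as a rough indicator.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Largest per-feature variance in the training data
max_feature_variance = X.var(axis=0).max()

epsilons = {}
for var_smoothing in (1e-9, 1e-2, 1.0):
    model = GaussianNB(var_smoothing=var_smoothing).fit(X, y)
    # epsilon_ is the absolute value added to every variance
    epsilons[var_smoothing] = model.epsilon_
    print(
        f"var_smoothing={var_smoothing:g}  "
        f"epsilon_={model.epsilon_:.3g}  "
        f"train accuracy={model.score(X, y):.3f}"
    )
```

Because the added amount scales with the largest feature variance, the same `var_smoothing` value can mean quite different absolute smoothing on differently scaled datasets.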


In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Load dataset (continuous features)
X, y = load_iris(return_X_y=True)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define parameter grid for var_smoothing
param_grid = {'var_smoothing': np.logspace(-9, -2, 8)}

# Grid search
grid = GridSearchCV(GaussianNB(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best CV Accuracy:", grid.best_score_)
print("Test Accuracy:", grid.score(X_test, y_test))


Best Parameters: {'var_smoothing': np.float64(1e-09)}
Best CV Accuracy: 0.9333333333333333
Test Accuracy: 0.9777777777777777



---

## 🔹 2. **Multinomial Naive Bayes (`MultinomialNB`)**

* Used for **discrete features** (e.g., word counts in text classification).

**Hyperparameters:**

* `alpha` (Laplace/Lidstone smoothing):

  * Controls how much smoothing is applied to avoid zero probabilities.
  * Default: `1.0`.
  * Tuning: Typically searched between `0.01` and `10`, often on a logarithmic scale.
  * Effect:

    * Small `alpha` (close to 0): Model relies heavily on observed frequencies (risk of overfitting).
    * Large `alpha`: More smoothing, may underfit.

* `fit_prior`:

  * Whether to learn class priors from the class frequencies in the training data.
  * `True` (default) learns them; `False` uses a uniform prior over the classes.
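
The effect of both hyperparameters can be checked by hand on a tiny example. The sketch below uses a made-up count matrix (the data and the helper `smoothed_probs` are hypothetical, for illustration only) and compares the additive-smoothing formula $(N_{ci} + \alpha) / (N_c + \alpha n)$ with what `MultinomialNB` stores in `feature_log_prob_`, then contrasts learned and uniform class priors.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count data (hypothetical): 4 documents, 3 vocabulary terms
X = np.array([[2, 1, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 2]])
y = np.array([0, 0, 0, 1])

def smoothed_probs(counts, alpha):
    """Additive smoothing: (count + alpha) / (total + alpha * n_features)."""
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

class0_counts = X[y == 0].sum(axis=0)  # the last term never occurs in class 0

matches = []
for alpha in (0.01, 1.0, 10.0):
    manual = smoothed_probs(class0_counts, alpha)
    fitted = np.exp(MultinomialNB(alpha=alpha).fit(X, y).feature_log_prob_[0])
    matches.append(np.allclose(manual, fitted))
    print(f"alpha={alpha:<5} P(term | class 0) = {manual.round(3)}")

# fit_prior: learned class frequencies versus a uniform prior
learned_prior = np.exp(MultinomialNB(fit_prior=True).fit(X, y).class_log_prior_)
uniform_prior = np.exp(MultinomialNB(fit_prior=False).fit(X, y).class_log_prior_)
print("learned prior:", learned_prior)
print("uniform prior:", uniform_prior)
```

As `alpha` grows, the three term probabilities move toward each other, which is the smoothing described above; the unseen term never gets probability zero for any positive `alpha`.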

---



In [2]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Load subset of text dataset
categories = ['alt.atheism', 'sci.space']
newsgroups = fetch_20newsgroups(subset='train', categories=categories, remove=('headers','footers','quotes'))
X, y = newsgroups.data, newsgroups.target

# Pipeline: vectorizer + NB
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('nb', MultinomialNB())
])

# Define grid
param_grid = {
    'nb__alpha': [0.01, 0.1, 1, 5, 10],
    'nb__fit_prior': [True, False]
}

# Grid search
grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)

print("Best Parameters:", grid.best_params_)
print("Best CV Accuracy:", grid.best_score_)




## 🔹 3. **Bernoulli Naive Bayes (`BernoulliNB`)**

* Used for **binary/boolean features** (e.g., whether a word appears in a document).

**Hyperparameters:**

* `alpha`: additive smoothing, with the same role as in `MultinomialNB`.
* `fit_prior`: same role as in `MultinomialNB`.
* `binarize`:

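  * Threshold for mapping feature values to binary: values greater than the threshold become 1, all others become 0.
  * Default: `0.0` (every positive value is set to 1). Use `None` when the input is already binary.
  * Tuning: Try different thresholds based on the data distribution.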
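
A small sketch of what the threshold does, using a made-up count matrix (hypothetical data, for illustration only). It fits `BernoulliNB` with two `binarize` values and inspects `feature_count_`, which records how many samples of each class have a given feature switched on after binarization.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy count data (hypothetical): 4 documents, 3 vocabulary terms
X = np.array([[0, 1, 3],
              [2, 0, 1],
              [0, 4, 0],
              [1, 1, 2]], dtype=float)
y = np.array([0, 0, 1, 1])

feature_counts = {}
for threshold in (0.0, 1.0):
    model = BernoulliNB(binarize=threshold).fit(X, y)
    # feature_count_[c, j]: class-c samples where feature j is 1 after binarizing
    feature_counts[threshold] = model.feature_count_
    print(f"binarize={threshold}")
    print(model.feature_count_)
```

With `binarize=0.0` a term counts as present if it occurs at all; with `binarize=1.0` it must occur at least twice. Note that on input that is already 0/1, any threshold of 1 or more would switch every feature off.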



In [None]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Load binary classification dataset
categories = ['rec.sport.baseball', 'sci.space']
newsgroups = fetch_20newsgroups(subset='train', categories=categories, remove=('headers','footers','quotes'))
X, y = newsgroups.data, newsgroups.target

# Pipeline: vectorizer + BernoulliNB
pipeline = Pipeline([
    ('vect', CountVectorizer()),   # keep raw counts so that nb__binarize has an effect
    ('nb', BernoulliNB())
])

# Define parameter grid
# binarize thresholds on counts: 0.0 = word present, 1.0 = word appears at least twice, ...
param_grid = {
    'nb__alpha': [0.01, 0.1, 1, 5, 10],
    'nb__binarize': [0.0, 1.0, 2.0],
    'nb__fit_prior': [True, False]
}

# Grid Search
grid = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)

print("Best Parameters:", grid.best_params_)
print("Best CV Accuracy:", grid.best_score_)



---

## 🔹 4. **Hyperparameter Tuning Process**

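A typical workflow combines:

* **GridSearchCV** → exhaustively evaluates every combination in the parameter grid.
* **RandomizedSearchCV** → evaluates a fixed number of randomly sampled combinations (faster on large grids).
* **Cross-validation (StratifiedKFold)** → keeps the class proportions the same in every fold, which gives a fairer evaluation, especially on imbalanced datasets.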
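
As a complement to the grid searches, here is a minimal sketch of a randomized search with an explicit stratified splitter. The log-uniform range for `var_smoothing`, the choice of `n_iter=10`, and the seeds are assumptions made for illustration, not tuned recommendations.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Stratified folds keep the class proportions equal across splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Sample 10 candidate values of var_smoothing on a log scale
search = RandomizedSearchCV(
    GaussianNB(),
    param_distributions={'var_smoothing': loguniform(1e-9, 1e-2)},
    n_iter=10,
    cv=cv,
    scoring='accuracy',
    random_state=42,
)
search.fit(X, y)

print("Best Parameters:", search.best_params_)
print("Best CV Accuracy:", search.best_score_)
```

For classifiers, passing an integer such as `cv=5` already gives stratified folds in scikit-learn; building the `StratifiedKFold` object explicitly mainly makes the shuffling and the seed visible.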

---