## Gaussian Naive Bayes
Use case: Continuous data (e.g., financial ratios, age, height)

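Scaling: Optional. GaussianNB fits a separate mean and variance per feature, so StandardScaler or MinMaxScaler rarely changes predictions; scaling mainly helps when feature scales differ by many orders of magnitude, because var_smoothing is tied to the largest variance.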

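Loss Function: None is optimized iteratively; the per-class means and variances are closed-form maximum-likelihood estimates. Evaluate with accuracy or log loss (the average negative log-likelihood of the true labels).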

Key Parameters:

var_smoothing: Fraction of the largest feature variance that is added to every variance for numerical stability (default 1e-9).
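To make the var_smoothing description concrete, here is a minimal sketch on synthetic data (the feature scales and labels are made up for illustration). It compares the additive term scikit-learn stores in `epsilon_` with `var_smoothing` times the largest feature variance.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two synthetic features on very different scales (illustrative values only).
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 100, 200)])
y = (X[:, 0] > 0).astype(int)

var_smoothing = 1e-9
gnb = GaussianNB(var_smoothing=var_smoothing).fit(X, y)

# The additive term is var_smoothing times the largest per-feature variance,
# computed over the whole training set.
expected_epsilon = var_smoothing * X.var(axis=0).max()

print(f"epsilon_ from the fitted model: {gnb.epsilon_:.3e}")
print(f"var_smoothing * max variance:   {expected_epsilon:.3e}")
```

Because the term follows the largest variance, a feature with a much smaller scale receives proportionally more smoothing than a large-scale one.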

In [None]:
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split

# Example dataset (replace with your own)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

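# Split first so the scaler only sees training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling (fit on the training set, then apply to the test set)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)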

# Model
gnb = GaussianNB(var_smoothing=1e-9)
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
y_proba = gnb.predict_proba(X_test)

# Evaluation
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"Log Loss: {log_loss(y_test, y_proba):.4f}")
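The log loss printed above is the mean negative log of the probability the model assigns to each sample's true class. A small sketch with hypothetical `predict_proba` output (the numbers are invented, not model results) shows that the manual formula and `sklearn.metrics.log_loss` should agree.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 1, 1, 0])
# Hypothetical predict_proba output: columns are P(class 0), P(class 1).
proba = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.4, 0.6],
                  [0.7, 0.3]])

# Probability assigned to the true class of each sample.
p_true = proba[np.arange(len(y_true)), y_true]

# Mean negative log-likelihood of the true labels.
manual = -np.mean(np.log(p_true))

print(f"manual:  {manual:.4f}")
print(f"sklearn: {log_loss(y_true, proba):.4f}")
```

Lower is better, and a confident wrong prediction is penalized far more heavily than an uncertain one.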


## Multinomial Naive Bayes
Use case: Count/frequency data (e.g., word counts, bag-of-words, number of clicks)

Scaling: Do not use StandardScaler; it centers the features and produces negative values, which MultinomialNB rejects. Use raw counts or a non-negative weighting such as TF-IDF.

Loss Function: None optimized explicitly; evaluate with accuracy or log loss

Key Parameters:

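alpha: Additive (Laplace/Lidstone) smoothing parameter; the default 1.0 is Laplace smoothing

fit_prior: Whether to learn class priors from the training data; if False, a uniform prior is used

class_prior: Manually specified prior probabilities for the classes; when given, they replace the learned priors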
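The parameters above can be checked on a toy count matrix. The sketch below uses made-up word counts and the standard additive-smoothing formula, (count + alpha) / (class total + alpha * number of terms), and compares it with what the fitted model stores; it also prints the priors produced by each prior setting.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word-count matrix: 4 documents, 3 vocabulary terms (values are made up).
X = np.array([[2, 1, 0],
              [3, 0, 0],
              [1, 1, 0],
              [0, 2, 3]])
y = np.array([0, 0, 0, 1])

alpha = 1.0
mnb = MultinomialNB(alpha=alpha).fit(X, y)

# Smoothed estimate of P(term | class 0):
# (count of term in class + alpha) / (total count in class + alpha * n_terms)
counts_c0 = X[y == 0].sum(axis=0)
manual = (counts_c0 + alpha) / (counts_c0.sum() + alpha * X.shape[1])
print("manual: ", manual)
print("sklearn:", np.exp(mnb.feature_log_prob_[0]))

# Priors: learned from class frequencies, uniform, or supplied by hand.
learned = np.exp(mnb.class_log_prior_)
uniform = np.exp(MultinomialNB(fit_prior=False).fit(X, y).class_log_prior_)
by_hand = np.exp(MultinomialNB(class_prior=[0.9, 0.1]).fit(X, y).class_log_prior_)
print("learned:", learned, "uniform:", uniform, "by hand:", by_hand)
```

The third term never occurs in class 0, yet smoothing gives it a small non-zero probability, so a single unseen word cannot zero out the whole class likelihood.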

In [None]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.datasets import fetch_20newsgroups_vectorized

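# Use pre-vectorized 20 Newsgroups text data (downloaded on first use).
# By default the token counts are normalized per document; values stay non-negative.
data = fetch_20newsgroups_vectorized()
X, y = data.data, data.target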

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model
mnb = MultinomialNB(alpha=1.0)
mnb.fit(X_train, y_train)
y_pred = mnb.predict(X_test)
y_proba = mnb.predict_proba(X_test)

# Evaluation
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"Log Loss: {log_loss(y_test, y_proba):.4f}")
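The scaling note for Multinomial Naive Bayes can be verified with a short sketch. The count matrix is invented for illustration; the point is that standardization introduces negative entries, which the estimator is expected to reject with a `ValueError`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import StandardScaler

# Made-up count features for four samples.
X_counts = np.array([[3, 0, 1],
                     [0, 2, 0],
                     [1, 1, 4],
                     [0, 0, 2]])
y = np.array([0, 1, 0, 1])

# Centering each column around zero produces negative values.
X_std = StandardScaler().fit_transform(X_counts)
has_negatives = bool((X_std < 0).any())
print("negative entries after StandardScaler:", has_negatives)

try:
    MultinomialNB().fit(X_std, y)
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)

# The raw counts are accepted without complaint.
model = MultinomialNB().fit(X_counts, y)
```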


## Bernoulli Naive Bayes
Use case: Binary features (e.g., yes/no, 0/1 presence)

Scaling: Not needed; input should be binary (0/1)

Loss Function: None optimized explicitly; evaluate with accuracy or log loss

Key Parameters:

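alpha: Additive (Laplace/Lidstone) smoothing parameter

binarize: Threshold for mapping feature values to 0/1 (values above it become 1); None assumes the input is already binary

fit_prior: Whether to learn class priors from the training data; if False, a uniform prior is used

class_prior: Manually specified prior probabilities for the classes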
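As a quick illustration of the binarize parameter, the sketch below (synthetic data, arbitrary labeling rule) fits one model that thresholds internally and one that receives pre-binarized input with `binarize=None`. Under the same threshold the two should behave identically.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import Binarizer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # synthetic continuous features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # arbitrary labels for the demo

# Option A: let the estimator threshold the inputs itself.
bnb_internal = BernoulliNB(binarize=0.0).fit(X, y)

# Option B: binarize up front and tell the estimator the data is already 0/1.
X_bin = Binarizer(threshold=0.0).fit_transform(X)
bnb_external = BernoulliNB(binarize=None).fit(X_bin, y)

same_predictions = np.array_equal(bnb_internal.predict(X),
                                  bnb_external.predict(X_bin))
print("identical predictions:", same_predictions)
```

Option A keeps the threshold inside the model, so new data passed to `predict` is binarized the same way without a separate preprocessing step.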

In [None]:
from sklearn.datasets import make_classification
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import Binarizer

# Generate data
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Binarize features
binarizer = Binarizer(threshold=0.0)
X_binary = binarizer.fit_transform(X)

# Split
X_train, X_test, y_train, y_test = train_test_split(X_binary, y, test_size=0.2, random_state=42)

# Model
bnb = BernoulliNB(alpha=1.0, binarize=None)
bnb.fit(X_train, y_train)
y_pred = bnb.predict(X_test)
y_proba = bnb.predict_proba(X_test)

# Evaluation
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"Log Loss: {log_loss(y_test, y_proba):.4f}")
