Let's go step by step  🚀  

---

# **🔵 Step 1: Import Required Libraries**
First, we need to **import** all necessary libraries that help in **data handling, machine learning, and evaluation**.

```python
import numpy as np  # For numerical operations
import pandas as pd  # For handling datasets

# Splitting dataset into training & testing
from sklearn.model_selection import train_test_split

# Importing different Naïve Bayes models
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB  

# Preprocessing utilities
from sklearn.preprocessing import StandardScaler, LabelEncoder  
from sklearn.feature_extraction.text import TfidfVectorizer  

# Model evaluation metrics
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, log_loss, roc_auc_score
```

### **📌 Why these libraries?**
- `numpy` → Helps in numerical computations.  
- `pandas` → Helps in handling tabular data.  
- `train_test_split` → Splits dataset into **training** and **testing** parts.  
- `sklearn.naive_bayes` → Provides the **4 Naïve Bayes models** used here (scikit-learn also includes `CategoricalNB`).  
- `StandardScaler` & `LabelEncoder` → Used for **data preprocessing**.  
- `TfidfVectorizer` → Converts **text into numerical data**.  
- `metrics` → Evaluates the **performance of models**.  

---

# **🟢 Step 2: Generate a Dummy Dataset**
Now, let’s **create a fake dataset** with a mix of **numerical, categorical, and text features**.

```python
# Setting seed for reproducibility
np.random.seed(42)  

# Creating the dataset
data = pd.DataFrame({
    'Age': np.random.randint(18, 60, 100),  # Random ages from 18 to 59 (upper bound is exclusive)
    'Salary': np.random.randint(20000, 100000, 100),  # Random salaries
    'Job Type': np.random.choice(['Engineer', 'Doctor', 'Teacher', 'Lawyer'], 100),  # Categorical column
    'City': np.random.choice(['New York', 'San Francisco', 'Chicago', 'Los Angeles'], 100),  # Another categorical
    'Review': np.random.choice(['Great service!', 'Terrible experience.', 'Okay, but not great.', 'Loved it!'], 100),  # Text column
    'Purchased': np.random.choice([0, 1], 100)  # Target (0 = No, 1 = Yes)
})

# Display first 5 rows of dataset
print(data.head())
```

### **📌 Understanding the dataset:**
Illustrative rows (the actual values depend on the random draw):

| Age | Salary | Job Type | City | Review | Purchased |
|-----|--------|----------|------|--------|-----------|
| 35 | 85000 | Doctor | New York | Great service! | 1 |
| 22 | 50000 | Lawyer | Los Angeles | Terrible experience. | 0 |
| 48 | 70000 | Engineer | San Francisco | Loved it! | 1 |
| 29 | 65000 | Teacher | Chicago | Okay, but not great. | 0 |

- **Age & Salary** → **Numerical features** (✅ Good for GaussianNB).  
- **Job Type & City** → **Categorical features** (🔄 Need encoding).  
- **Review** → **Text feature** (🔄 Needs vectorization).  
- **Purchased** → **Target column** (🚀 The variable we want to predict).  
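
A quick sanity check of column types and class balance can be sketched as follows. This is a minimal, self-contained example; the `demo` frame and the `rng` generator are assumptions made for illustration and are separate from the tutorial's `data`.

```python
import numpy as np
import pandas as pd

# Assumption: a small stand-in frame, built the same way as the tutorial's dataset
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'Age': rng.integers(18, 60, 100),  # upper bound is exclusive, so ages are 18..59
    'Job Type': rng.choice(['Engineer', 'Doctor', 'Teacher', 'Lawyer'], 100),
    'Purchased': rng.choice([0, 1], 100),
})

# Column types show which features still need encoding
print(demo.dtypes)

# Share of each target class; a heavily skewed split would matter for model choice
print(demo['Purchased'].value_counts(normalize=True))
```

Checking the balance of `Purchased` early is useful because some variants (such as ComplementNB) are aimed at skewed targets.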

---

# **🟠 Step 3: Data Preprocessing**
### **🔹 1. Encoding Categorical Features (Job Type, City)**
Since scikit-learn's Naïve Bayes models expect **numeric input**, we need to **convert categorical features into numbers**.

```python
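# Use one LabelEncoder per column so each mapping can be inspected or inverted later
le_job = LabelEncoder()
le_city = LabelEncoder()

# Encode categorical features as integer codes
data['Job Type'] = le_job.fit_transform(data['Job Type'])
data['City'] = le_city.fit_transform(data['City'])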

# Display updated dataset
print(data.head())
```

### **📌 What happened here?**
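- `Doctor`, `Engineer`, `Lawyer`, `Teacher` are **converted into numbers** (`0, 1, 2, 3`), assigned in alphabetical order.  
- `Chicago`, `Los Angeles`, `New York`, `San Francisco` are **converted the same way**.  
- ⚠️ Integer codes imply an order (`Teacher` > `Doctor`) that does not really exist; one-hot encoding or `CategoricalNB` avoids this.  

✅ Now, all categorical columns are **numerical**. The `Review` text column still needs processing before training.  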
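
To see exactly which number each category received, the encoder's `classes_` attribute can be inspected. The sketch below uses a tiny hypothetical `jobs` frame and also shows `OrdinalEncoder`, which scikit-learn provides for encoding feature columns (`LabelEncoder` is documented for target labels).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Assumption: a tiny stand-in column, not the tutorial's full dataset
jobs = pd.DataFrame({'Job Type': ['Engineer', 'Doctor', 'Teacher', 'Lawyer', 'Doctor']})

le_job = LabelEncoder()
codes = le_job.fit_transform(jobs['Job Type'])

# classes_ is sorted, so the position of each label is its integer code
mapping = {label: code for code, label in enumerate(le_job.classes_)}
print(mapping)

# OrdinalEncoder does the same job for one or more feature columns (2-D input)
oe = OrdinalEncoder()
encoded = oe.fit_transform(jobs[['Job Type']])
print(encoded.ravel())
```

Keeping a separate fitted encoder per column makes it possible to call `inverse_transform` later and recover the original labels.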

---

### **🔹 2. Convert Text Data into Numerical Features using TF-IDF**
For text features like `Review`, we use **TF-IDF Vectorization** to convert words into **numerical values**.

```python
# Initialize TF-IDF Vectorizer
vectorizer = TfidfVectorizer()

# Transform text data into numerical vectors
tfidf_matrix = vectorizer.fit_transform(data['Review']).toarray()

# Convert TF-IDF array to DataFrame
tfidf_df = pd.DataFrame(tfidf_matrix, columns=vectorizer.get_feature_names_out())

# Drop original text column and merge TF-IDF data
data = data.drop(columns=['Review']).reset_index(drop=True)
data = pd.concat([data, tfidf_df], axis=1)

# Display dataset after text processing
print(data.head())
```

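✅ Now, the `Review` column is **replaced with TF-IDF features**, making the dataset fully numerical! 🎯  
⚠️ The vectorizer here is fitted on **all rows** before the split. In a real project, fit it on the **training data only** so no test information leaks into the features.  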

---

# **🔵 Step 4: Splitting the Dataset**
We split our dataset into **80% training** and **20% testing**.

```python
X = data.drop(columns=['Purchased'])  # Features
y = data['Purchased']  # Target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

📌 **Why split the dataset?**  
- **Training data** → Used to teach the model.  
- **Testing data** → Used to check the model’s accuracy.  
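
When the target classes are uneven, passing `stratify` keeps the same class ratio in both parts. A small sketch with made-up arrays (`X_demo`, `y_demo` are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumption: 100 rows with a 70/30 class split
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 70 + [1] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)    # row counts of the two parts
print(y_tr.mean(), y_te.mean())  # share of class 1 in each part
```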

---

# **🟢 Step 5: Apply Different Naïve Bayes Models**
### **1️⃣ Gaussian Naïve Bayes (For Continuous Data)**
```python
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

gnb = GaussianNB()
gnb.fit(X_train_scaled, y_train)
y_pred_gnb = gnb.predict(X_test_scaled)

print("GaussianNB Accuracy:", accuracy_score(y_test, y_pred_gnb))
```
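✔ Used for **continuous numerical data**.  
✔ **Standardization is optional here**: GaussianNB fits a separate mean and variance for each feature, so scaling mainly affects results through `var_smoothing`.  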
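
If scaling is used, wrapping it in a pipeline with the model keeps the `fit` / `transform` steps in the right order during cross-validation. The sketch below is only an illustration on synthetic blobs (`X_demo`, `y_demo` are assumptions), not the tutorial's dataset.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: two well-separated Gaussian blobs as stand-in data
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_demo = np.array([0] * 50 + [1] * 50)

# The scaler is re-fitted on the training folds only, inside each CV split
pipe = make_pipeline(StandardScaler(), GaussianNB())
scores = cross_val_score(pipe, X_demo, y_demo, cv=5)

print(scores.mean())
```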

---

### **2️⃣ Multinomial Naïve Bayes (For Text Data)**
```python
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
y_pred_mnb = mnb.predict(X_test)

print("MultinomialNB Accuracy:", accuracy_score(y_test, y_pred_mnb))
```
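✔ Used for **text-based classification** (e.g., spam detection) with word counts or TF-IDF values.  
✔ Requires **non-negative features**. Here the raw `Age` and `Salary` columns are included too, so the large `Salary` values dominate the model.  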
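
MultinomialNB is more commonly applied to the text features alone. A minimal sketch, assuming hypothetical sentiment `labels` for the four review strings:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ['Great service!', 'Loved it!', 'Terrible experience.', 'Okay, but not great.']
labels = [1, 1, 0, 0]  # assumption: hand-assigned labels (1 = positive, 0 = negative)

# Vectorizer and classifier in one object, so raw strings go in directly
text_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
text_clf.fit(reviews, labels)

pred = text_clf.predict(['Loved the service!'])
print(pred)
```

The words `loved` and `service` occur only in the positive training reviews, which is what drives the prediction for the new sentence.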

---

### **3️⃣ Bernoulli Naïve Bayes (For Binary Data)**
```python
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
y_pred_bnb = bnb.predict(X_test)

print("BernoulliNB Accuracy:", accuracy_score(y_test, y_pred_bnb))
```
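✔ Used for **binary feature data** (e.g., word presence/absence).  
✔ By default (`binarize=0.0`) every value above 0 becomes 1, so `Age` and `Salary` turn into constant features here.  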
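
A setting closer to what BernoulliNB is meant for is a word presence/absence matrix. The sketch below builds one with `CountVectorizer(binary=True)`; the `labels` are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

reviews = ['Great service!', 'Loved it!', 'Terrible experience.', 'Okay, but not great.']
labels = [1, 1, 0, 0]  # assumption: hand-assigned labels (1 = positive, 0 = negative)

# binary=True records only whether a word occurs, not how often
vec = CountVectorizer(binary=True)
X_bin = vec.fit_transform(reviews)

bnb_demo = BernoulliNB()
bnb_demo.fit(X_bin, labels)

print(X_bin.toarray())
pred = bnb_demo.predict(vec.transform(['Terrible experience.']))
print(pred)
```

Unlike MultinomialNB, BernoulliNB also uses the *absence* of a word as evidence.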

---

### **4️⃣ Complement Naïve Bayes (For Imbalanced Data)**
```python
cnb = ComplementNB()
cnb.fit(X_train, y_train)
y_pred_cnb = cnb.predict(X_test)

print("ComplementNB Accuracy:", accuracy_score(y_test, y_pred_cnb))
```
✔ Designed for **imbalanced datasets**. It is a variant of MultinomialNB and also needs non-negative features.  
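
To check whether ComplementNB actually helps on a skewed target, both models can be scored with a metric that accounts for imbalance. This is a rough sketch on synthetic count data (all names and numbers below are assumptions); results will vary by dataset.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import ComplementNB, MultinomialNB

# Assumption: hypothetical word-count data, 180 majority rows and 20 minority rows
rng = np.random.default_rng(0)
X_major = rng.poisson(lam=[3, 1, 1], size=(180, 3))
X_minor = rng.poisson(lam=[1, 1, 3], size=(20, 3))
X_demo = np.vstack([X_major, X_minor])
y_demo = np.array([0] * 180 + [1] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

# Balanced accuracy averages recall over classes, so the minority class counts equally
results = {}
for name, model in [('MultinomialNB', MultinomialNB()), ('ComplementNB', ComplementNB())]:
    model.fit(X_tr, y_tr)
    results[name] = balanced_accuracy_score(y_te, model.predict(X_te))

print(results)
```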

---

# **🟠 Step 6: Model Evaluation**
```python
print("Classification Report:\n", classification_report(y_test, y_pred_gnb))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_gnb))
print("Log Loss:", log_loss(y_test, gnb.predict_proba(X_test_scaled)))
print("ROC-AUC Score:", roc_auc_score(y_test, gnb.predict_proba(X_test_scaled)[:, 1]))
```

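These metrics are computed for the **GaussianNB** model only.  
✅ **Accuracy** → Fraction of correct predictions.  
✅ **Log Loss** → Penalizes confident but wrong probabilities; lower is better.  
✅ **ROC-AUC** → Measures how well the predicted probabilities rank class 1 above class 0 (0.5 = random, 1.0 = perfect).  
📌 Because `Purchased` was generated at random, expect scores close to chance (accuracy and ROC-AUC near 0.5).  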
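
For binary targets, log loss is the average of `-[y·log(p) + (1-y)·log(1-p)]` over all rows. The sketch below computes it by hand and compares it with `log_loss`; the probabilities in `p_pos` are made-up values for illustration.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 0])
p_pos = np.array([0.9, 0.2, 0.6, 0.4])  # assumption: hypothetical predicted P(class = 1)

# Average negative log-probability assigned to the true class
manual = -np.mean(y_true * np.log(p_pos) + (1 - y_true) * np.log(1 - p_pos))

print(manual)
print(log_loss(y_true, p_pos))
```

Both lines print the same value, which shows why a confident wrong prediction (for example `p = 0.99` for a true 0) is penalized so heavily.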

---

### **🎯 Summary**
🔥 We applied **four Naïve Bayes models** (Gaussian, Multinomial, Bernoulli, Complement) step by step!  
🔥 Converted **categorical & text data** into numerical form!  
🔥 Evaluated models using **accuracy, log loss, and ROC-AUC**!  
