# Naive Bayes
You should build a machine learning pipeline using a Naive Bayes model. In particular, you should do the following:
- Load the `mnist` dataset using [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). You can find this dataset in the datasets folder.
- Split the dataset into training and test sets using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html). 
- Train and test a Naive Bayes model using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html).
- Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.

In [1]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler



### Step 1: Load the MNIST dataset using Pandas

In [2]:
# Read the MNIST CSV file (adjust the path if the file lives in the datasets folder)
mnist_data = pd.read_csv("mnist.csv")

# Display the first 5 rows of the dataset
print("MNIST Dataset:")
mnist_data.head()

MNIST Dataset:


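| Unnamed: 0 | id | class | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31953 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 34452 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 60897 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 36953 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1981 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |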


### Step 2: Split the dataset into training and test sets

In [3]:
# Separate the target (`class`) from the features.
# Note: this keeps the `id` column in X; it is a row identifier, not a pixel feature.
X = mnist_data.drop("class", axis=1)
y = mnist_data["class"]
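The preview above shows an `id` column next to the pixel columns, and dropping only `class` leaves that identifier among the features. A minimal sketch of removing it, using a tiny hypothetical DataFrame (`toy`, with made-up pixel values) rather than the real file:

```python
import pandas as pd

# Tiny hypothetical frame that mimics the column layout of the preview
toy = pd.DataFrame(
    {
        "id": [31953, 34452, 60897],
        "class": [5, 8, 5],
        "pixel1": [0, 0, 0],
        "pixel2": [0, 12, 255],
    }
)

# Drop both the target and the identifier so only pixel columns remain
X_toy = toy.drop(columns=["id", "class"])
y_toy = toy["class"]

print(list(X_toy.columns))
```

Whether the identifier changes the score on the real data is not measured here; the sketch only shows the column selection.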

In [4]:
# Split the dataset into 70% training and 30% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
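A variation worth knowing is the `stratify` argument of `train_test_split`, which keeps the class proportions roughly equal in both subsets. The sketch below assumes scikit-learn's bundled `load_digits` data (8x8 digit images) as a stand-in for `mnist.csv`; the `_demo` names are illustrative only.

```python
from collections import Counter

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Stand-in data: 1797 small digit images bundled with scikit-learn
X_demo, y_demo = load_digits(return_X_y=True)

# stratify=y_demo preserves the per-class proportions in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print(len(X_tr), len(X_te))
print(sorted(Counter(y_te).items()))
```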

### Step 3: Preprocess the data (optional, but often beneficial)

In [5]:
# Standardize the feature values to have mean=0 and variance=1
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
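The scaler is fitted on the training set only and then reused on the test set, so no test statistics leak into training. One way to keep that order fixed is to wrap both steps in a scikit-learn pipeline. This is a sketch under the assumption that `load_digits` stands in for the MNIST file; it is not the notebook's own code.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data bundled with scikit-learn
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# fit() learns the scaling from X_tr only; score() reuses it on X_te
pipe = make_pipeline(StandardScaler(), GaussianNB())
pipe.fit(X_tr, y_tr)
acc = pipe.score(X_te, y_te)

print(f"Pipeline accuracy on the stand-in data: {acc:.3f}")
```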

### Step 4: Train a Gaussian Naive Bayes model

In [6]:
# Create a Gaussian Naive Bayes classifier
naive_bayes_classifier = GaussianNB()
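`GaussianNB` exposes two hyperparameters: `priors` (fixed class probabilities) and `var_smoothing` (a fraction of the largest feature variance added to all variances for numerical stability). A possible way to tune `var_smoothing` with cross-validation is sketched below; the grid values are an arbitrary choice, and `load_digits` is assumed as stand-in data.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Stand-in data bundled with scikit-learn
X_demo, y_demo = load_digits(return_X_y=True)

# Try ten values between 1e-9 (the default) and 1
param_grid = {"var_smoothing": np.logspace(-9, 0, 10)}
search = GridSearchCV(GaussianNB(), param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 3))
```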

In [7]:
# Train the model using the training data
naive_bayes_classifier.fit(X_train_scaled, y_train)
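After fitting, the model stores what it learned in attributes ending with an underscore. The sketch below prints a few of them on the assumed `load_digits` stand-in data; on the real dataset the shapes would follow the number of classes and feature columns in `X_train`.

```python
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB

# Stand-in data bundled with scikit-learn (10 classes, 64 features)
X_demo, y_demo = load_digits(return_X_y=True)
model = GaussianNB().fit(X_demo, y_demo)

print(model.classes_)      # class labels seen during fit
print(model.class_count_)  # number of training samples per class
print(model.class_prior_)  # estimated probability of each class
print(model.theta_.shape)  # per-class feature means: (n_classes, n_features)
print(model.epsilon_)      # absolute value added to the variances
```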

### Step 5: Test the Naive Bayes model

In [8]:
# Make predictions on the test data
y_pred = naive_bayes_classifier.predict(X_test_scaled)
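Besides `predict`, the model offers `predict_proba`, which returns one probability per class for each sample. A short sketch, again assuming `load_digits` as stand-in data:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in data bundled with scikit-learn
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)
model = GaussianNB().fit(X_tr, y_tr)

# One row per sample, one column per class (ordered like model.classes_)
proba = model.predict_proba(X_te[:5])
print(np.round(proba, 3))

# The most probable class for each of the five samples
print(model.classes_[proba.argmax(axis=1)])
```

Naive Bayes probabilities tend to sit very close to 0 or 1, so they are better read as a ranking than as calibrated confidence.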

### Step 6: Evaluate the model performance

In [9]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

Accuracy: 57.42%
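A single accuracy number hides which digits get confused with each other. A confusion matrix and a per-class report give that breakdown. The sketch below runs on the assumed `load_digits` stand-in data, so its numbers are not comparable to the accuracy reported above.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in data bundled with scikit-learn
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)
y_hat = GaussianNB().fit(X_tr, y_tr).predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_hat)
print(cm)

# Precision, recall, and F1 for each digit
print(classification_report(y_te, y_hat))
```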


In [10]:
# Save the trained model if needed (requires `import joblib`)
# joblib.dump(naive_bayes_classifier, 'naive_bayes_model.joblib')
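If the model is saved, the fitted scaler has to be saved as well, otherwise new data cannot be transformed the same way. One option is to persist a pipeline that contains both. The sketch below assumes `load_digits` as stand-in data and writes to a temporary directory; the file name is taken from the commented line above.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data bundled with scikit-learn
X_demo, y_demo = load_digits(return_X_y=True)
pipe = make_pipeline(StandardScaler(), GaussianNB()).fit(X_demo, y_demo)

# Save scaler and classifier together, then load them back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "naive_bayes_model.joblib")
    joblib.dump(pipe, model_path)
    restored = joblib.load(model_path)

# The restored pipeline should reproduce the original predictions
same_predictions = bool((restored.predict(X_demo) == pipe.predict(X_demo)).all())
print(same_predictions)
```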