# Naive Bayes
Naive Bayes is a popular supervised learning method used for classification tasks. It is based on the application of [Bayes' Theorem](https://salvadorz.atlassian.net/wiki/spaces/SYS/pages/33882128/Bayes+Theorem "Bayes' Theorem") with the assumption that the features are conditionally independent given the class, known as the "naive" assumption. Despite its simplicity, Naive Bayes can achieve good performance and is particularly effective in text classification and spam filtering tasks.

The concept behind Naive Bayes is to calculate the probability of a given sample belonging to each class and then classify the sample based on the highest probability. It utilizes Bayes' theorem, which calculates the posterior probability of a class given the observed features.
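
As a rough illustration of the "pick the most probable class" idea, the sketch below applies Bayes' theorem to a single binary feature. The priors and likelihoods are made-up numbers chosen only for illustration, and the function name `posterior` is hypothetical.

```python
# Hypothetical numbers, chosen only for illustration.
priors = {"spam": 0.4, "not_spam": 0.6}        # P(class)
likelihoods = {"spam": 0.7, "not_spam": 0.1}   # P(word "buy" present | class)


def posterior(priors, likelihoods):
    """Return P(class | feature) for every class via Bayes' theorem."""
    # Unnormalized scores: P(feature | class) * P(class)
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())             # P(feature)
    return {c: joint[c] / evidence for c in joint}


post = posterior(priors, likelihoods)
predicted = max(post, key=post.get)            # class with the highest posterior
print(post, predicted)
```

With these assumed numbers the posterior favors "spam", because the word is far more likely to appear in spam even though spam has the lower prior.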

Mathematically, the Naive Bayes classifier can be represented as:
$$ \large P(class|features)= \frac{P(features|class)}{P(features)}P(class) $$
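
Under the naive assumption, the likelihood term factorizes over the individual features. Writing $x_i$ for a single feature and $n$ for the number of features (notation introduced here for illustration), this can be sketched as:

$$ \large P(features|class)= \prod_{i=1}^{n} P(x_i|class) $$

$$ \large \hat{y} = \arg\max_{class} \; P(class)\prod_{i=1}^{n} P(x_i|class) $$

Since $P(features)$ is the same for every class, it can be dropped when only the most probable class is needed.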

To estimate these probabilities, different approaches are used based on the type of data and the distribution assumptions. Some common approaches include:

* **Multinomial Naive Bayes**:
    This approach is suitable for discrete features, often used in text classification. It assumes that the features follow a multinomial distribution, and the probabilities are calculated based on the frequency of each feature.

* **Gaussian Naive Bayes**:
    This approach assumes that the features follow a Gaussian (normal) distribution. It is suitable for continuous or real-valued features. The probabilities are calculated based on the mean and variance of each feature in each class.

* **Bernoulli Naive Bayes**:
    This approach is similar to Multinomial Naive Bayes but assumes that the features are binary variables. It is commonly used for document classification tasks, where the presence or absence of specific words determines the class probabilities.
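
To make the difference between the three variants more concrete, the sketch below estimates the per-class likelihood parameters each one relies on, using NumPy on tiny made-up arrays. The helper names and the smoothing constant `alpha=1.0` are assumptions for illustration; this is not meant to mirror how any particular library implements them.

```python
import numpy as np


def multinomial_params(counts, alpha=1.0):
    """P(feature i | class) from summed counts, with additive smoothing."""
    totals = counts.sum(axis=0) + alpha
    return totals / totals.sum()


def gaussian_params(values):
    """Per-feature mean and variance for one class."""
    return values.mean(axis=0), values.var(axis=0)


def bernoulli_params(binary, alpha=1.0):
    """P(feature i present | class) from presence/absence data, smoothed."""
    return (binary.sum(axis=0) + alpha) / (binary.shape[0] + 2 * alpha)


# Made-up samples that all belong to the same class (rows = samples).
word_counts = np.array([[2, 0, 1], [1, 1, 0]])       # discrete counts
measurements = np.array([[1.0, 10.0], [3.0, 14.0]])  # real-valued features
word_present = np.array([[1, 0, 1], [1, 1, 0]])      # binary features

theta = multinomial_params(word_counts)
mean, var = gaussian_params(measurements)
p = bernoulli_params(word_present)
print(theta, mean, var, p)
```

The multinomial estimate is a distribution over features (it sums to 1), whereas the Bernoulli estimate gives an independent presence probability for each feature.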

# Naive Bayes, Bernoulli version. Bernoulli uses binary feature vectors,
# see: http://scikit-learn.org/stable/modules/naive_bayes.html
# from CS 229, Stanford University.
# Classifier : Spam filter example

import numpy as np
import sklearn



# Each feature vector's length is equal to the number of words in "the dictionary".
# The dictionary is incredibly short, just 16 words
# The number of rows in the array equals the number m of training emails:
# 8 non-spam emails and 8 spam emails.
# Dictionary words:
# hello goodbye dave bill ted rufus vacation dinner restaurant eating drinking sleeping equal the price buy
#   0     1       2   3    4   5        6      7        8        9      10       11      12    13  14   15

# These are word occurrence vectors represented by 1=present, 0=absent in the positions shown above
#   ===>>> where we translate features into values: present or absent
X = np.array(
    [
    # non-spam emails
    # These are our x's for P(x|y=0), NOT SPAM
    [1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0], # 0, hello dave ted rufus vacation restaurant drinking equal the
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0], # 1
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0], # 2
    [0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0], # 3
    [1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0], # 4
    [0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0], # 5
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0], # 6
    [1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0], # 7
    # spam emails, each contains the word price, buy, or both
    # These are our x's for P(x|y=1), SPAM
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0], # 8
    [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1], # 9
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1], # 10
    [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1], # 11
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0], # 12
    [0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1], # 13
    [0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1], # 14
    [1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1]  # 15
    ]
) 

# Outcomes (the y's). First 8 are not spam, second 8 are spam
#   ===>>> where we translate outcomes into values: not-spam or spam
Y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]



# Naive Bayes uses X and Y to calculate P(y|x)
#    What is the probability that email "x" is or is not spam?
# test feature vectors representing new emails
TEST_X = np.array(
    [
    # 5 new emails
    [0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], # 0
    [1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0], # 1
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0], # 2 price
    [1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1], # 3 buy
    [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1]  # 4 price+buy
    ]
) 



from sklearn.naive_bayes import BernoulliNB

# Create the classifier, spelling out the (default) parameters explicitly
bernoulli_nb = BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)

# Train
bernoulli_nb.fit(X, Y)


m, n = TEST_X.shape
for index in range(m):

    test = TEST_X[index]       # extract one row as a 1-D array of length n
    test = test.reshape(1, n)  # reshape to 2-D (1 sample, n features), what the .predict() method expects

    # Predict
    y_prediction = bernoulli_nb.predict(test)
    if y_prediction[0] == 0:
        print("Email", index, "is not spam")
    else:
        print("Email", index, "is spam")

   

Email 0 is not spam
Email 1 is not spam
Email 2 is not spam
Email 3 is spam
Email 4 is spam


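In this example the classifier fails in only one case: email 2 contains the word "price" but is classified as not spam. The result may change with different parameter settings (for example `alpha` or `class_prior`).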
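
One parameter that influences borderline predictions is the smoothing term `alpha`. The sketch below is a hand-written NumPy version of a Bernoulli Naive Bayes posterior with additive smoothing, run on a tiny made-up data set (not the emails above). The function name `bernoulli_nb_posterior` is hypothetical, and the code is intended to show the effect of `alpha`, not to reproduce scikit-learn's internals.

```python
import numpy as np


def bernoulli_nb_posterior(X, y, x_new, alpha=1.0):
    """Return P(class | x_new) for classes 0 and 1 under a Bernoulli model."""
    X, y, x_new = np.asarray(X), np.asarray(y), np.asarray(x_new)
    log_scores = []
    for c in (0, 1):
        Xc = X[y == c]
        # Smoothed estimate of P(feature present | class c)
        p = (Xc.sum(axis=0) + alpha) / (Xc.shape[0] + 2 * alpha)
        log_prior = np.log(Xc.shape[0] / X.shape[0])
        # Present features contribute log p, absent ones log(1 - p)
        log_lik = np.sum(x_new * np.log(p) + (1 - x_new) * np.log(1 - p))
        log_scores.append(log_prior + log_lik)
    log_scores = np.array(log_scores)
    # Normalize in a numerically stable way
    scores = np.exp(log_scores - log_scores.max())
    return scores / scores.sum()


# Made-up data: 3 binary features, 2 samples per class.
X = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
y = [0, 0, 1, 1]
x_new = [1, 1, 0]

for alpha in (0.1, 1.0, 10.0):
    print(alpha, bernoulli_nb_posterior(X, y, x_new, alpha=alpha))
```

On this toy data, larger values of `alpha` pull the estimated feature probabilities toward 0.5, so the posterior moves closer to the class prior. Smaller values make the classifier follow the training counts more closely.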