# Q2 Naive-Bayes Classification (30 Points)
## Definition
Naive Bayes is a relatively simple probabilistic classification algorithm. It applies Bayes' Theorem together with the assumption that the features in the data are independent of one another given the class. The fundamental idea is to compute, for every candidate class, the probability of that class given the observed feature values, and to predict the class with the highest probability.

Naive Bayes assumes that the features are conditionally independent given the class: once the class is known, the value of one feature tells us nothing more about the value of another. For example, suppose we are classifying cars and have data such as:

| buying   | maint    | doors    | persons  | lug-boot | safety   | class    |
| :------- | :------- | :------- | :------- | :------- | :------- | :------- |
| vhigh    | vhigh    | 2        | 2        | small    | low      | unacc    |

**Description of dataset:**
* CAR                      car acceptability
    * PRICE                  overall price
        * _buying_               buying price
        * _maint_                price of the maintenance
* TECH                   technical characteristics
    * COMFORT              comfort
        * _doors_              number of doors
        * _persons_            capacity in terms of persons to carry
        * _lug-boot_           the size of luggage boot
    * _safety_               estimated safety of the car
   
Naive Bayes assumes that the features listed above are independent of each other given the class.

In machine learning, Naive Bayes has several advantages over other commonly used classification algorithms. It is simple and fast, it often performs well on small datasets, and it can still classify an instance when some feature values are missing. It is a supervised learning algorithm because it must be trained on a labeled dataset.

## Bayes Theorem
Consider two events, $A$ and $B$. For example, $A$ could be the event that a car has the feature values $(\text{vhigh}, \text{vhigh}, 2, 2, \text{small}, \text{low})$ shown in the table above, and $B$ the event that the car belongs to a particular class from $\{ \text{unacc}, \text{acc}, \text{good}, \text{vgood} \}$.

* $A \cap B$ denotes the intersection of $A$ and $B$, i.e. both events occur.
* $P(A \mid B)$ is read as "the probability of $A$ given $B$".

When we know that $B$ has occurred, the sample space shrinks to $B$, as shown in the right-hand figure. We now want the probability that $A$ also occurs (the conditional probability of $A$), which is the probability of $A \cap B$ measured within the space of $B$:

\begin{equation}
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
\end{equation}

We can also write $P(A \cap B)$ as $P(A, B)$; both denote the probability that $A$ and $B$ occur together. The equation becomes:

\begin{equation}
P(A \mid B) = \frac{P(A, B)}{P(B)}
\end{equation}

From the figure above (equivalently, by applying the same definition with the roles of $A$ and $B$ swapped), we can deduce two factorizations of the joint probability:

\begin{align}
& P(A, B) = P(B, A) = P(A \mid B)P(B) \\
& P(A, B) = P(B, A) = P(B \mid A)P(A)
\end{align}

Substituting the second factorization, $P(A, B) = P(B \mid A)P(A)$, into the numerator gives:

\begin{equation}
P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}
\end{equation}

This equation is known as **Bayes' Theorem**. Its terms are:
* $P(A \mid B)$: the posterior, i.e. the probability of $A$ given that $B$ has occurred
* $P(B)$: the evidence, i.e. the marginal probability of $B$
* $P(B \mid A)$: the likelihood, i.e. the probability of $B$ given $A$
* $P(A)$: the prior, i.e. the marginal probability of $A$

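To make the terms concrete, the sketch below checks Bayes' Theorem on a small table of counts. The counts are hypothetical and not taken from car_eval.csv. $A$ stands for "safety is high" and $B$ for "class is acc". `fractions.Fraction` keeps the arithmetic exact, so both sides can be compared directly.

```python
from fractions import Fraction

# Hypothetical toy counts (NOT from car_eval.csv) for
# A = "safety is high" and B = "class is acc"
counts = {
    (True, True): 12,    # A and B
    (True, False): 18,   # A, not B
    (False, True): 8,    # B, not A
    (False, False): 62,  # neither
}
total = sum(counts.values())


def prob(pred):
    """Exact probability of the event described by pred(a, b)."""
    return Fraction(sum(n for (a, b), n in counts.items() if pred(a, b)), total)


p_a = prob(lambda a, b: a)          # prior P(A)
p_b = prob(lambda a, b: b)          # evidence P(B)
p_ab = prob(lambda a, b: a and b)   # joint P(A, B)

# Definition of conditional probability: P(A | B) = P(A, B) / P(B)
p_a_given_b = p_ab / p_b
# Likelihood: P(B | A) = P(A, B) / P(A)
p_b_given_a = p_ab / p_a
# Bayes' Theorem: P(A | B) = P(B | A) P(A) / P(B)
bayes = p_b_given_a * p_a / p_b

print(p_a_given_b, bayes)
```

Both routes give the same posterior (3/5 here). This is what the factorization $P(A, B) = P(B \mid A)P(A)$ guarantees.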
## Naive-Bayes Formulation
Suppose we have a dataset in which each observation belongs to a class from the finite set $C = \{ c_1, c_2, \dots, c_n \}$ and is described by $b$ features $F = \{ f_1, f_2, \dots, f_b \}$. If we could compute the probabilities $P(c_1 \mid F), P(c_2 \mid F), \dots, P(c_n \mid F)$, we could predict the class of a new observation as the one with the highest probability.

To compute these conditional probabilities, we use Bayes' Theorem:

\begin{equation}
P(c_i \mid f_1, f_2, \dots ,f_b) = \frac{P(f_1, f_2, \dots ,f_b \mid c_i)P(c_i)}{P(f_1, f_2, \dots ,f_b)} 
\end{equation}

Because Naive Bayes assumes that the features are conditionally independent given the class, the likelihood factorizes as $P(f_1, f_2, \dots, f_b \mid c_i) = \prod_{j=1}^b P(f_j \mid c_i)$, and the equation becomes:

\begin{equation}
P(c_i \mid f_1, f_2, \dots ,f_b) = \frac{P(f_1 \mid c_i)P(f_2 \mid c_i) \dots P(f_b \mid c_i)P(c_i)}{P(f_1, f_2, \dots ,f_b)} 
\end{equation}

Writing the product compactly, the final form of the equation is:

\begin{align}
& \text{for} \; i = 1, 2, \dots , n \\
& P(c_i \mid f_1, f_2, \dots ,f_b) = P(c_i) \frac{\prod_{j=1}^b P(f_j \mid c_i)}{P(f_1, f_2, \dots ,f_b)} 
\end{align}

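The denominator $P(f_1, f_2, \dots ,f_b)$ does not depend on the class $c_i$. It is therefore the same for every class and can be dropped when comparing classes. This gives the classification rule below: the predicted class $\hat{c}$ is the one with the largest product of the prior and the likelihoods.

\begin{align}
& P(c_i \mid f_1, f_2, \dots ,f_b) \propto P(c_i) \prod_{j=1}^b P(f_j \mid c_i) \\
& \hat{c} = \arg\max_{c_i \in C} \; P(c_i) \prod_{j=1}^b P(f_j \mid c_i)
\end{align}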

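The classification rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's implementation. The names `fit` and `predict` are hypothetical, and the probabilities are raw relative frequencies with no smoothing, so an unseen value yields a factor of 0. As an illustration of the missing-information point made earlier, a feature given as `None` is skipped. Under conditional independence, marginalizing out an unobserved feature simply drops its factor.

```python
from collections import Counter, defaultdict


def fit(rows, labels):
    """Estimate P(c) and P(f_j = v | c) as relative frequencies (no smoothing)."""
    class_counts = Counter(labels)
    n = len(labels)
    priors = {c: k / n for c, k in class_counts.items()}
    # value_counts[c][j][v] = number of rows of class c whose j-th feature equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
    likelihoods = {
        c: {j: {v: k / class_counts[c] for v, k in counter.items()}
            for j, counter in per_feature.items()}
        for c, per_feature in value_counts.items()
    }
    return priors, likelihoods


def predict(row, priors, likelihoods):
    """Return (argmax_c P(c) * prod_j P(f_j | c), all scores); None features are skipped."""
    scores = {}
    for c, prior in priors.items():
        score = prior  # P(c) enters exactly once
        for j, v in enumerate(row):
            if v is None:
                continue  # missing value: its factor is dropped
            score *= likelihoods[c].get(j, {}).get(v, 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (buying, safety) -> class
toy_rows = [("vhigh", "low"), ("vhigh", "high"), ("med", "high"), ("low", "high"), ("med", "low")]
toy_labels = ["unacc", "unacc", "acc", "acc", "unacc"]
toy_priors, toy_likelihoods = fit(toy_rows, toy_labels)
print(predict(("med", "high"), toy_priors, toy_likelihoods))
print(predict(("vhigh", None), toy_priors, toy_likelihoods))
```

The evidence term is never computed, because it would divide every class score by the same number. For `("med", "high")` the scores are 0.6 · 1/3 · 1/3 for "unacc" and 0.4 · 1/2 · 1 for "acc", so "acc" is predicted.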


# Task. 
- Use the 'car_eval.csv' data set. Train a Naive Bayes model using the odd-indexed rows, and test the accuracy of your model using the even-indexed rows of the dataset.
- Display and explain the likelihoods (class conditional probabilities) for each input variable.
- Discuss one misclassified case for each category in terms of the class conditional probabilities. Why do you think it was misclassified?



```python
import numpy as np
import pandas as pd


# Returns a dictionary with the probability of each class, P(c)
def getClassProbabilities(data):
    return dict(data["clazz"].value_counts(dropna=False, normalize=True))


# Returns a dictionary with the probability of each value of each input variable, P(f_j = v)
def getFeatureValueProbabilities(data):
    # Exclude the "clazz" column (the last column)
    features = data.iloc[:, :-1]
    output = {}
    for col in features.columns:
        output[col] = dict(features[col].value_counts(dropna=False, normalize=True))
    return output


# Returns a dictionary with the joint probabilities P(c, f_j = v) of every class and input value
def getJointProbabilities(data):
    output = {}
    # For each class in the "clazz" column
    for className, classProb in data["clazz"].value_counts(dropna=False, normalize=True).items():
        # Keep only the rows of this class, without the "clazz" column
        ip = data.loc[data["clazz"] == className].iloc[:, :-1]
        output[className] = {}
        for ipCol in ip.columns:
            output[className][ipCol] = {}
            # P(c, v) = P(v | c) * P(c)
            for key, val in ip[ipCol].value_counts(normalize=True).items():
                output[className][ipCol][key] = val * classProb
    return output


# Calculate the class conditional probabilities (likelihoods) P(f_j = v | c)
def getLikelihoods(classProbabilities, jointProbabilities):
    output = {}
    for clazz, pClass in classProbabilities.items():
        output[clazz] = {}
        # Divide every joint probability P(c, v) by P(c)
        for featureName, values in jointProbabilities[clazz].items():
            output[clazz][featureName] = {
                featureValue: jProb / pClass for featureValue, jProb in values.items()
            }
    return output


# Display the class conditional probabilities for each input variable
def printLikelihoods(likelihoods):
    for _class in likelihoods:
        print("class: " + str(_class))
        for _colName in likelihoods[_class]:
            print("   " + str(_colName))
            for _variable in likelihoods[_class][_colName]:
                prob = likelihoods[_class][_colName][_variable]
                print("      " + str(_variable) + ": " + str(round(prob, 4)))


# Calculate the per-feature posteriors P(c | f_j = v) = P(f_j = v | c) * P(c) / P(f_j = v)
def getPosteriors(classProbs, featureProbs, likelihoods):
    output = {}
    for clazz, pClass in classProbs.items():
        output[clazz] = {}
        for featureName, values in likelihoods[clazz].items():
            output[clazz][featureName] = {}
            for featureValue, likelihood in values.items():
                pFeature = featureProbs[featureName][featureValue]
                output[clazz][featureName][featureValue] = likelihood * pClass / pFeature
    return output


# Classify one instance: return the class with the largest product of the per-feature scores
# in `scores`. Note: the call below passes `posteriors`, so each factor is P(c | f_j) and the
# prior enters once per feature; `classProbs` only supplies the class names.
# Returns None if every class scores 0.
def classifyInstance(testInstance, scores, classProbs):
    # Drop the "clazz" entry so only the input features remain
    data = testInstance.drop(index=["clazz"])
    best_score = 0
    prediction = None
    for className in classProbs:
        product = 1
        for featureName, value in data.items():
            # A value never seen with this class contributes 0
            product *= scores[className][featureName].get(value, 0)
        # Keep the class with the highest product so far
        if product > best_score:
            prediction = className
            best_score = product
    return prediction


# Classify every row; returns a copy of testData with "clazz" replaced by the prediction
def classifyTestData(testData, posteriors, classProbs):
    output = testData.copy()
    for index, row in testData.iterrows():
        output.at[index, "clazz"] = classifyInstance(row, posteriors, classProbs)
    return output


# Fraction of rows whose predicted class matches the actual class
def getModelAccuracy(actualData, predictions):
    actual = actualData["clazz"].to_numpy()
    predicted = predictions["clazz"].to_numpy()
    return (actual == predicted).mean()


# Returns a DataFrame of the misclassified rows with an extra "p_class" column.
# Note: np.insert(..., -1, ...) inserts the prediction *before* the last (actual) column,
# so under these column names "clazz" holds the predicted class and "p_class" the actual one.
def getMisclassifiedInstances(predictions, actuals):
    columns = predictions.columns.tolist() + ["p_class"]
    p = predictions.to_numpy()
    a = actuals.to_numpy()
    wrong = p[:, -1] != a[:, -1]
    z = np.insert(a[wrong], -1, p[wrong, -1], axis=1)
    return pd.DataFrame(columns=columns, data=z)


# Read the data from .csv and split it into test (even indexes) and training (odd indexes) sets
dataset = pd.read_csv('car-eval.csv')
test_dataset = dataset[dataset.index % 2 == 0].reset_index(drop=True)
train_dataset = dataset[dataset.index % 2 != 0].reset_index(drop=True)

# P(c) for each class
classProbs = getClassProbabilities(train_dataset)
# P(f_j = v) for each feature value
featureProbs = getFeatureValueProbabilities(train_dataset)
# Joint probabilities P(c, f_j = v)
jointProbs = getJointProbabilities(train_dataset)
# Class conditional probabilities (likelihoods) P(f_j = v | c)
likelihoods = getLikelihoods(classProbs, jointProbs)
# Posteriors P(c | f_j = v), one per feature value
posteriors = getPosteriors(classProbs, featureProbs, likelihoods)

# Make predictions (scored with the per-feature posteriors)
dataPredictions = classifyTestData(test_dataset, posteriors, classProbs)
# Misclassified instances
misClassified = getMisclassifiedInstances(dataPredictions, test_dataset)
# Model accuracy
modelAccuracy = getModelAccuracy(test_dataset, dataPredictions)
print(f"Accuracy = {modelAccuracy*100:.2f}%")
```

```
Accuracy = 70.95%
```

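The classification rule derived in the formulation section multiplies the likelihoods by the prior exactly once. The cell above, however, passes `posteriors` to `classifyTestData`. As a result, `classifyInstance` multiplies the per-feature posteriors $P(c \mid f_j)$ and never uses `classP`. Below is a minimal sketch of the textbook rule, assuming the dictionary shapes built above. `classifyWithPrior` is a hypothetical name, and its accuracy on car-eval.csv has not been measured here. The toy model at the bottom is made up.

```python
import pandas as pd


def classifyWithPrior(testInstance, likelihoods, classProbs):
    """Hypothetical helper: return argmax_c P(c) * prod_j P(f_j | c) for one test row.

    Expects classProbs[c] = P(c) and likelihoods[c][feature][value] = P(feature = value | c),
    i.e. the shapes produced by getClassProbabilities and getLikelihoods.
    """
    # Keep only the input features (ignore "clazz" if the row has it)
    features = testInstance.drop(labels=["clazz"], errors="ignore")
    best_class, best_score = None, -1.0
    for className, prior in classProbs.items():
        score = prior  # the prior enters exactly once
        for featureName, value in features.items():
            # A value never seen with this class contributes probability 0
            score *= likelihoods[className][featureName].get(value, 0.0)
        if score > best_score:
            best_class, best_score = className, score
    return best_class


# Hypothetical toy model with a single feature, "safety"
toyClassProbs = {"unacc": 0.7, "acc": 0.3}
toyLikelihoods = {
    "unacc": {"safety": {"low": 0.5, "med": 0.3, "high": 0.2}},
    "acc": {"safety": {"med": 0.45, "high": 0.55}},
}
toyRow = pd.Series({"safety": "high", "clazz": "acc"})
print(classifyWithPrior(toyRow, toyLikelihoods, toyClassProbs))  # 0.165 for acc vs 0.14 for unacc
```

To try it on the real data, one could call `classifyWithPrior(row, likelihoods, classProbs)` in place of `classifyInstance(row, posteriors, classProbs)`. One design difference: starting from `best_score = -1.0` means a row where every class scores 0 gets the first class instead of `None`.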

```python
# The class conditional probabilities are printed below. For example, the entry
# "vhigh: 0.2982" under "class: unacc" / "buying" means: given that the class is "unacc",
# the probability that the "buying" feature is "vhigh" is 0.2982.
printLikelihoods(likelihoods)
```

```
class: unacc
   buying
      vhigh: 0.2982
      high: 0.2663
      med: 0.2211
      low: 0.2144
   maint
      vhigh: 0.2982
      high: 0.2596
      med: 0.2211
      low: 0.2211
   doors
      3: 0.258
      5more: 0.258
      2: 0.2529
      4: 0.2312
   persons
      2: 0.4824
      4: 0.2613
      more: 0.2563
   lug_boot
      small: 0.3702
      med: 0.3216
      big: 0.3082
   safety
      low: 0.4824
      med: 0.2965
      high: 0.2211
class: acc
   buying
      med: 0.298
      high: 0.2879
      low: 0.2222
      vhigh: 0.1919
   maint
      med: 0.298
      high: 0.2778
      low: 0.2323
      vhigh: 0.1919
   doors
      3: 0.2677
      4: 0.2576
      5more: 0.2576
      2: 0.2172
   persons
      4: 0.5
      more: 0.5
   lug_boot
      big: 0.3636
      med: 0.3586
      small: 0.2778
   safety
      high: 0.5455
      med: 0.4545
class: good
   buying
      low: 0.6667
      med: 0.3333
   maint
      low: 0.6667
      med: 0.3333
   doors
      3: 0.3077
      2: 0
```

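The likelihood table above can be cross-checked with pandas. `pd.crosstab(..., normalize="index")` divides each row of a class-by-value count table by its row total, which gives $P(f_j = v \mid c)$ directly. A minimal sketch on hypothetical toy rows follows. The column names mirror car-eval.csv, but the values are made up.

```python
import pandas as pd

# Hypothetical toy rows using the notebook's column names
df = pd.DataFrame({
    "safety": ["low", "high", "med", "high", "low", "high"],
    "clazz": ["unacc", "acc", "acc", "acc", "unacc", "unacc"],
})

# Rows are classes, columns are feature values; normalize="index" makes each row
# sum to 1, so table.loc[c, v] is P(safety = v | clazz = c).
table = pd.crosstab(df["clazz"], df["safety"], normalize="index")
print(table.round(4))
```

For plain string columns, `value_counts` on the rows of one class omits values that never occur with that class. `crosstab` instead keeps them as explicit zeros, and those zeros are exactly where a zero factor in the product of likelihoods comes from.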
```python
# Discuss one misclassified case for each class and why it may have been misclassified.
# Note: getMisclassifiedInstances inserts the prediction before the actual label,
# so the "clazz" column of misClassified holds the predicted class.

# For each class, print the number of misclassified rows whose "clazz" entry is that class
def printNumMiscategorized(misclassified, classProbs):
    for category in classProbs.keys():
        examples = misclassified[misclassified["clazz"] == category]
        print("For class", category, "->", len(examples), "examples were predicted incorrectly")


# For each class, print one example of a misclassification
def printMiscategorizedExamples(misclassified, classProbs):
    for category in classProbs.keys():
        examples = misclassified[misclassified["clazz"] == category]
        if len(examples) > 0:
            print(examples.iloc[0, :])


printNumMiscategorized(misClassified, classProbs)

printMiscategorizedExamples(misClassified, classProbs)
```

```
For class unacc -> 251 examples were predicted incorrectly
For class acc -> 0 examples were predicted incorrectly
For class good -> 0 examples were predicted incorrectly
For class vgood -> 0 examples were predicted incorrectly
buying      vhigh
maint         med
doors           2
persons         4
lug_boot      med
safety       high
clazz       unacc
p_class       acc
Name: 0, dtype: object
```

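The column layout of the table returned by `getMisclassifiedInstances` can be checked on a tiny array. `np.insert(arr, -1, values, axis=1)` inserts `values` before the last column, so the prediction lands in front of the actual class and the column labeled "clazz" ends up holding the predicted label. The arrays below are hypothetical. `np.column_stack` is shown as one possible way to append the prediction as the last column instead, so that column names ending in `["clazz", "p_class"]` would match the data.

```python
import numpy as np

# Two hypothetical rows: [feature value, actual class]
actual = np.array([["vhigh", "acc"],
                   ["low", "good"]], dtype=object)
predicted = np.array(["unacc", "unacc"], dtype=object)

# Same call pattern as getMisclassifiedInstances: insert *before* index -1,
# giving columns [feature, predicted, actual]
inserted = np.insert(actual, -1, predicted, axis=1)

# Alternative: append the prediction as the last column -> [feature, actual, predicted]
appended = np.column_stack([actual, predicted])

print(inserted)
print(appended)
```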

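- The counts above come from the column labeled "clazz" in `misClassified`. In `getMisclassifiedInstances`, `np.insert(a_, -1, p_, axis=1)` inserts the predicted label *before* the last (actual) column, while the column names put "p_class" last. In that table, "clazz" therefore holds the **predicted** class and "p_class" the **actual** class. Read this way, all 251 errors are cars whose true class is not "unacc" but which were predicted as "unacc". With 251 errors and an accuracy of 70.95%, 613 predictions were correct (613 / 864). That is exactly the number of "unacc" instances in the test data, so the model predicted "unacc" for every test instance.
- The class conditional probabilities for "unacc" are close to uniform for several features. For example, the likelihoods of the "buying", "maint", and "doors" values below are all roughly 1/4. These features therefore provide little evidence for or against "unacc", and its score for them is driven mainly by its prior.
    buying
      vhigh: 0.2982
      high: 0.2663
      med: 0.2211
      low: 0.2144
    maint
      vhigh: 0.2982
      high: 0.2596
      med: 0.2211
      low: 0.2211
    doors
      3: 0.258
      5more: 0.258
      2: 0.2529
      4: 0.2312

Take the following misclassified instance (actual class "acc", predicted class "unacc"):
        buying      vhigh
        maint         med
        doors           2
        persons         4
        lug_boot      med
        safety       high
        actual class       acc
        predicted class    unacc
Its class conditional probabilities for each feature value are:
    buying = vhigh:
      "unacc" = 0.2982
      "acc" = 0.1919
      "good" = 0
      "vgood" = 0
    maint = med:
      "unacc" = 0.2211
      "acc" = 0.298
      "good" = 0.3333
      "vgood" = 0.4
    doors = 2:
      "unacc" = 0.2529
      "acc" = 0.2172
      "good" = 0.2308
      "vgood" = 0.1667
    persons = 4:
      "unacc" = 0.2613
      "acc" = 0.5
      "good" = 0.4615
      "vgood" = 0.5
    lug_boot = med:
      "unacc" = 0.3216
      "acc" = 0.3586
      "good" = 0.3846
      "vgood" = 0.3333
    safety = high:
      "unacc" = 0.2211
      "acc" = 0.5455
      "good" = 0.4615
      "vgood" = 1.0
    Result: product of likelihoods across all features:
      "unacc" ≈ 0.000310
      "acc" = 0.00121486
      "good" = 0
      "vgood" = 0

Judged by the likelihoods alone, the true class "acc" scores about four times higher than "unacc". "good" and "vgood" score 0 because "buying = vhigh" never occurs with those classes in the training data. The instance is nevertheless predicted as "unacc" because the notebook does not score classes with the rule from the formulation section. `classifyTestData` is called with `posteriors`, so each factor is the per-feature posterior $P(c \mid f_j) = P(f_j \mid c)P(c)/P(f_j)$, and `classifyInstance` never uses `classP`. The prior $P(c)$ therefore enters the score once per feature (six times) instead of once. This strongly favors the majority class "unacc" and is consistent with every test instance being predicted as "unacc".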
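The arithmetic behind this example can be reproduced in a few lines. The sketch below recomputes the likelihood products from the printed likelihood table, including 0.2529 for doors = 2 under "unacc". It then derives how large the prior ratio $P(\text{unacc})/P(\text{acc})$ must be for a product of per-feature posteriors to favor "unacc". Because $\prod_j P(c \mid f_j) = P(c)^b \prod_j P(f_j \mid c) / \prod_j P(f_j)$ and the last factor is the same for every class, "unacc" wins whenever $(P(\text{unacc})/P(\text{acc}))^b$ exceeds the likelihood ratio. The priors themselves are not printed in the notebook, so this sketch only gives the threshold.

```python
import math

# Likelihoods P(f_j | c) for the example instance, copied from the printed table
# (order: buying, maint, doors, persons, lug_boot, safety)
example_likelihoods = {
    "unacc": [0.2982, 0.2211, 0.2529, 0.2613, 0.3216, 0.2211],
    "acc": [0.1919, 0.2980, 0.2172, 0.5000, 0.3586, 0.5455],
}
products = {c: math.prod(v) for c, v in example_likelihoods.items()}
ratio = products["acc"] / products["unacc"]

# With the prior raised to the power b (one factor per feature), "unacc" wins when
# (P(unacc) / P(acc)) ** b > ratio, i.e. when the prior ratio exceeds ratio ** (1 / b).
b = len(example_likelihoods["acc"])
threshold = ratio ** (1 / b)
print(f"{products['unacc']:.6f} {products['acc']:.6f} {ratio:.2f} {threshold:.3f}")
```

With these numbers the likelihood ratio is about 3.9. A prior ratio above roughly 1.26 is therefore enough for "unacc" to win under the posterior product. The single-prior rule $P(c)\prod_j P(f_j \mid c)$ would need a prior ratio above about 3.9.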