#Naive Bayes
**Naive Bayes** is a classification algorithm for *supervised* machine learning problems.  An example of such a problem is determining who wrote a piece of text based on prior information.  Before going into the details of the algorithm, we need to clarify some terminology.  First, let's discuss what **features** and **labels** are.

##Features
**Features** are the details or pieces of information that you consider important for classifying, or drawing some conclusion about, a particular item.  For example, if I wanted to classify cars, I might consider the number of wheels, number of doors, interior, and engine size as features for determining what type of car it is.  Features are the inputs that *supervised learning* algorithms learn from.

##Labels
**Labels** are the classes we assign to different objects.  In a supervised learning problem, we label our training and test data so that our machine learning algorithm can fit a model to the training inputs and so that we can check its predictions against the test labels.  By training our model, we can make better predictions about unseen data in the future.
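
As a rough illustration of the two terms, the car example could be encoded as a feature matrix with one row per car, plus a label for each row. The columns, values, and car types below are made up for illustration only.

```python
import numpy as np

# Hypothetical feature matrix: one row per car, one column per feature.
# Columns (assumed for illustration): wheels, doors, engine size in litres.
features = np.array([
    [4, 2, 3.0],
    [4, 4, 1.6],
    [4, 4, 2.0],
    [4, 2, 5.0],
])

# Hypothetical labels: one entry per row of `features`.
labels = np.array(["coupe", "sedan", "sedan", "coupe"])

n_samples, n_features = features.shape
print(n_samples, "cars,", n_features, "features each")
print("labels:", labels)
```

A supervised learning algorithm is trained on pairs like these: a row of features together with its label.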

#The Naive Bayes Algorithm
The Naive Bayes algorithm can be used to determine the **decision surface** within a scatterplot of data points.  The decision surface is the boundary that separates the regions assigned to each class, so it tells us how future data points should be labeled based on where they lie in the graph.  In Python, Naive Bayes is available in the *scikit-learn* package (`sklearn.naive_bayes`).  Let's look at a small example:


In [4]:
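import numpy as np
from sklearn.naive_bayes import GaussianNB

# Six 2-D training points: three per class
features_train = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
labels_train = np.array([1, 1, 1, 2, 2, 2])

# Fit a Gaussian Naive Bayes classifier
clf = GaussianNB()
clf.fit(features_train, labels_train)

# predict expects a 2-D array with one row per sample
print(clf.predict([[-0.8, -1]]))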

[1]
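
To get a feel for the decision surface, one option is to evaluate the classifier on a grid of points covering the plot area. The sketch below retrains the same model so that it runs on its own. The grid range and resolution are arbitrary choices for illustration, not something prescribed by scikit-learn.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

features_train = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
labels_train = np.array([1, 1, 1, 2, 2, 2])
clf = GaussianNB().fit(features_train, labels_train)

# Build a coarse grid over the region containing the training points.
xs = np.linspace(-3, 3, 7)
ys = np.linspace(-3, 3, 7)
grid_x, grid_y = np.meshgrid(xs, ys)
grid_points = np.column_stack([grid_x.ravel(), grid_y.ravel()])

# Predict a label for every grid point and reshape back to the grid.
grid_labels = clf.predict(grid_points).reshape(grid_x.shape)

# Print the grid with the largest y at the top, like a plot.
for row in grid_labels[::-1]:
    print(" ".join(str(label) for label in row))
```

The line along which the printed labels switch from 1 to 2 is a coarse view of the decision surface.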


We can check the accuracy of our model by running the code below:

In [5]:
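from sklearn.metrics import accuracy_score

labels_test = np.array([1])
pred = clf.predict([[-0.8, -1]])  # 2-D input: one row per sample
# accuracy_score takes the true labels first, then the predictions
accuracy = accuracy_score(labels_test, pred, normalize=True)

accuracy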

1.0

Of course, with a single test point our accuracy here is 100%, but that is not always the case.
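
With more than one test point, the accuracy is the fraction of test labels that the model predicts correctly. In the sketch below, the test points and their "true" labels are invented for illustration, with one label deliberately chosen to disagree with the model.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

features_train = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
labels_train = np.array([1, 1, 1, 2, 2, 2])
clf = GaussianNB().fit(features_train, labels_train)

# Hypothetical held-out points and labels (made up for this sketch).
features_test = np.array([[-0.8, -1], [1.5, 1], [-0.5, 0.2], [0.5, -0.1]])
labels_test = np.array([1, 2, 2, 2])

# Predict all test points at once, then compare against the true labels.
pred = clf.predict(features_test)
accuracy = accuracy_score(labels_test, pred)
print("predictions:", pred)
print("accuracy:", accuracy)
```

Because one of the invented labels disagrees with the model's prediction, the accuracy comes out below 100%.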

#Bayes Rule
**Bayes Rule** defines probabilistic inference and is heavily used in statistics and artificial intelligence.  Bayes Rule starts with a *prior probability*, the probability of a condition before seeing any test evidence.  The **sensitivity** of the test is the probability that the test is positive given that the condition is true.  The **specificity** of the test is the probability that the test is negative given that the condition is false.  Bayes Rule can be summarized as:

> Prior probability + test evidence -> posterior probability

The algorithm to compute the posterior probability is as follows:
- Calculate the joint probabilities.  That is, multiply the prior by the probability of the observed evidence given that the condition is true, and multiply the probability of the negation of the prior by the probability of the observed evidence given that the condition is false.
- Calculate the normalizer.  The two joint probabilities generally do not sum to one, so they need to be normalized; the normalizer is the sum of the two joint probabilities above.
- Divide each of the two joint probabilities by the normalizer to get the posterior probabilities.

To make it more concrete, suppose you were given:

1. P(C) - prior
2. P(Pos | C) - sensitivity
3. P(Neg | ~C) - specificity

Then the Bayes algorithm would go as follows:

- Multiply P(C) and P(Pos | C), which gives the joint probability P(Pos,C)
- Multiply P(~C) = 1 - P(C) and P(Pos | ~C) = 1 - P(Neg | ~C), which gives the joint probability P(Pos,~C)
- Sum the two products above to get P(Pos), which does not depend on C
- Divide the first product by P(Pos), which gives the posterior probability P(C | Pos)
- Divide the second product by P(Pos), which gives the posterior probability P(~C | Pos)
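
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a library routine. The function name and the example numbers are assumptions, and specificity is taken to mean P(Neg | ~C), so that P(Pos | ~C) = 1 - specificity.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return (P(C | Pos), P(~C | Pos)) for a positive test result."""
    # Joint probabilities: P(Pos, C) and P(Pos, ~C)
    joint_pos_c = prior * sensitivity
    joint_pos_not_c = (1 - prior) * (1 - specificity)
    # Normalizer: P(Pos)
    p_pos = joint_pos_c + joint_pos_not_c
    # Posteriors: each joint probability divided by the normalizer
    return joint_pos_c / p_pos, joint_pos_not_c / p_pos


# Illustrative numbers (assumed, not taken from the text above)
p_c, p_not_c = posterior_given_positive(prior=0.01, sensitivity=0.9, specificity=0.9)
print("P(C | Pos)  =", round(p_c, 4))
print("P(~C | Pos) =", round(p_not_c, 4))
```

With a small prior, the posterior P(C | Pos) stays fairly small even for a reasonably accurate test, because most positive results come from the much larger ~C group.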

Naive Bayes is commonly used for text learning: given the probability (or frequency) with which two people use certain words, determine who wrote some unseen text.  Naive Bayes is "naive" because it treats the features as independent of one another given the label.  In text learning, this means we decide who wrote some text based only on how frequently they use each word, and we naively ignore word order.  That is what makes Naive Bayes "naive".  Naive Bayes works very well on large feature spaces in text learning, but it can give inaccurate results for phrases whose words mean one thing when paired together and something different when separated.
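
As a small sketch of text learning, the example below counts words with scikit-learn's `CountVectorizer` and fits `MultinomialNB`, a Naive Bayes variant suited to word counts, rather than the `GaussianNB` used earlier. The sentences and the author names "alice" and "bob" are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training texts and their authors (made up for this sketch).
texts = [
    "the ship sailed across the sea",
    "the sea was calm and the ship was fast",
    "the market closed higher today",
    "stocks and bonds rallied in the market",
]
authors = ["alice", "alice", "bob", "bob"]

# Turn each text into a vector of word counts (word order is discarded).
vectorizer = CountVectorizer()
word_counts = vectorizer.fit_transform(texts)
clf = MultinomialNB().fit(word_counts, authors)

# Shuffling the words leaves the count vector unchanged.
original = vectorizer.transform(["the ship sailed across the sea"])
shuffled = vectorizer.transform(["sea the across sailed ship the"])
same_counts = bool((original.toarray() == shuffled.toarray()).all())
print("same word counts after shuffling:", same_counts)

# Classify an unseen sentence by its word frequencies alone.
predicted = clf.predict(vectorizer.transform(["a fast ship on the calm sea"]))
print("predicted author:", predicted[0])
```

Because only the counts reach the classifier, any two texts with the same words in a different order are treated identically, which is the word-order simplification described above.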
