# Classification: Naive Bayes

## Bayesian classification

* finding $P(L~|~{\rm features})$, the **probability of a label given some observed features**

* relies on **Bayes' theorem**
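
Written in the notation above, Bayes' theorem reads:

$$
P(L~|~{\rm features}) = \frac{P({\rm features}~|~L)\,P(L)}{P({\rm features})}
$$

Here $P(L)$ is the prior, $P({\rm features}~|~L)$ is the likelihood, and $P({\rm features})$ is the evidence. The evidence is the same for every label, so it can be ignored when only comparing labels.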

## Naive Bayes model:

* generative model for each label
* assumes the predictors are independent of each other given the class (the "NAIVE" part), also called "class conditional independence"

    * i.e. the presence/effect of a particular feature in a class is unrelated to the presence of any other feature
    
    * e.g. a fruit may be classified as an apple if it is red, round, and approx 5 cm in diameter
        * these features may depend on each other or upon the existence of the other features
        * nevertheless, all of these properties are treated as contributing independently to the probability that this fruit is an apple => NAIVE

    * e.g. a loan applicant is desirable or not depending on his/her income, previous loan and transaction history, age, and location
        * even if these features are interdependent, they are still considered independently of one another
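
Writing the individual features as $x_1, \ldots, x_n$ (a notation assumed here for illustration, not used elsewhere in these notes), the naive assumption says that the class-conditional likelihood factorizes into one term per feature:

$$
P(x_1, \ldots, x_n~|~L) = \prod_{i=1}^{n} P(x_i~|~L)
\quad\Rightarrow\quad
P(L~|~x_1, \ldots, x_n) \propto P(L) \prod_{i=1}^{n} P(x_i~|~L)
$$

The second expression follows from Bayes' theorem after dropping the evidence term, which does not depend on $L$.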

### Steps:

1. Calculate the prior probability of each class label.
2. Find the likelihood of each attribute for each class (e.g. for continuous attributes, via the per-class mean and standard deviation).
3. Put these values into Bayes' formula and calculate the posterior probability of each class.
4. Compare the posteriors: the input is assigned to the class with the highest probability.
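
The steps above can be sketched in plain Python for the Gaussian case. This is a minimal illustration, not a reference implementation: the function names (`fit_gaussian_nb`, `gaussian_log_pdf`, `predict`) and the toy fruit measurements are made up for this example. It compares unnormalized log-posteriors instead of computing the full posterior, which is enough to pick the most probable class because the evidence term is identical for every class.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Steps 1 and 2: estimate priors and per-class feature means/std devs."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        prior = n / len(X)
        means = [sum(col) / n for col in zip(*rows)]
        # Population standard deviation; a small floor avoids division by zero.
        stds = [
            max(math.sqrt(sum((v - m) ** 2 for v in col) / n), 1e-9)
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (prior, means, stds)
    return model


def gaussian_log_pdf(x, mean, std):
    """Log of the Gaussian density, used as the per-feature likelihood."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def predict(model, row):
    """Steps 3 and 4: score each class and return the most probable one."""
    scores = {}
    for label, (prior, means, stds) in model.items():
        # Sum of logs instead of a product of probabilities, to avoid underflow.
        scores[label] = math.log(prior) + sum(
            gaussian_log_pdf(x, m, s) for x, m, s in zip(row, means, stds)
        )
    return max(scores, key=scores.get)


# Toy data, made up for illustration: [diameter_cm, weight_g]
X = [[5.0, 150.0], [5.5, 160.0], [4.8, 140.0], [2.0, 10.0], [2.2, 12.0], [1.8, 9.0]]
y = ["apple", "apple", "apple", "cherry", "cherry", "cherry"]

model = fit_gaussian_nb(X, y)
print(predict(model, [5.1, 155.0]))  # apple
print(predict(model, [2.1, 11.0]))   # cherry
```

Working in log space is a common design choice here: multiplying many small likelihoods quickly underflows, while adding their logarithms preserves the ranking of the classes.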

### Benefits/Advantages: 

* **simple** generative model 
* extremely **fast** and simple classification algorithm 
* suitable for very **high-dimensional** datasets
* few tunable parameters, which makes it very useful as a "quick-and-dirty" baseline for a classification problem

### Disadvantages:

* In practice, it is almost impossible to get a set of predictors that are entirely independent, so the model's core assumption is rarely satisfied exactly.

### Applications

* spam filtering
* text classification
* sentiment analysis
* recommender systems

**Multinomial Naive Bayes**

This is mostly used for document classification problems, i.e. deciding whether a document belongs to a category such as sports, politics, or technology. The features/predictors used by the classifier are the frequencies of the words present in the document.
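
A rough sketch of the multinomial variant on word counts, in plain Python. The function names (`fit_multinomial_nb`, `classify_text`), the `alpha` smoothing parameter, and the four toy documents are assumptions made for this example. Add-alpha (Laplace) smoothing is not mentioned in the notes above; it is included here because, without it, a word never seen in a class would force that class's probability to zero.

```python
import math
from collections import Counter, defaultdict


def fit_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log-priors and smoothed per-class log word probabilities."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())

    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        # Add-alpha smoothing keeps unseen words from zeroing the product.
        denom = total + alpha * len(vocab)
        log_probs = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
        log_prior = math.log(class_counts[label] / len(docs))
        model[label] = (log_prior, log_probs)
    return model, vocab


def classify_text(model, vocab, doc):
    """Return the label with the highest (unnormalized) log-posterior."""
    # Words outside the training vocabulary are ignored.
    words = [w for w in doc.lower().split() if w in vocab]
    scores = {
        label: log_prior + sum(log_probs[w] for w in words)
        for label, (log_prior, log_probs) in model.items()
    }
    return max(scores, key=scores.get)


# Toy corpus, made up for illustration.
docs = [
    "the team won the match",
    "the striker scored a goal",
    "parliament passed the bill",
    "the minister won the election",
]
labels = ["sports", "sports", "politics", "politics"]

model, vocab = fit_multinomial_nb(docs, labels)
print(classify_text(model, vocab, "the team scored"))               # sports
print(classify_text(model, vocab, "the minister passed the bill"))  # politics
```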

**Bernoulli Naive Bayes**

This is similar to multinomial naive Bayes, but the predictors are boolean variables. Each feature used to predict the class variable takes only the values yes or no, for example whether a word occurs in the text or not.
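
The Bernoulli variant can be sketched the same way on 0/1 feature vectors. The names `fit_bernoulli_nb` and `classify_binary`, the smoothing parameter `alpha`, and the toy spam/ham data are hypothetical choices for this example. The point to notice is that an absent feature also contributes to the score, through $1 - p$.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """X holds 0/1 feature vectors; returns priors and P(feature = 1 | class)."""
    model = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        n = len(rows)
        # Smoothed fraction of class examples in which each feature is present.
        probs = [(sum(col) + alpha) / (n + 2 * alpha) for col in zip(*rows)]
        model[label] = (n / len(X), probs)
    return model


def classify_binary(model, row):
    """Return the label with the highest (unnormalized) log-posterior."""
    scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for x, p in zip(row, probs):
            # Present features add log(p); absent features add log(1 - p).
            score += math.log(p) if x else math.log(1 - p)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, made up for illustration.
# Columns: contains "free", contains "meeting", contains "winner"
X = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
y = ["spam", "spam", "ham", "ham"]

model = fit_bernoulli_nb(X, y)
print(classify_binary(model, [1, 0, 1]))  # spam
print(classify_binary(model, [0, 1, 0]))  # ham
```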

**Gaussian Naive Bayes**

When the predictors take continuous rather than discrete values, we assume that, within each class, these values are sampled from a Gaussian distribution.

* further reading: https://scikit-learn.org/stable/modules/naive_bayes.html