# **Naive Bayes Algorithm**
> Naive Bayes is a `classification` algorithm based on `Bayes Theorem`. It is called naive because it assumes that the features in a dataset are `independent of each other` given the class. This assumption rarely holds in real data, but it simplifies the computation and often gives good results in practice.

It is a probabilistic machine learning model used for classification tasks. It relies on Bayes Theorem, which describes the probability of an event based on prior knowledge of conditions related to the event.

### **Bayes Theorem:**
Bayes Theorem is a mathematical formula used for calculating `conditional probability`. It is defined as:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

where A and B are events and P(B) ≠ 0:
- `P(A|B)` is the posterior: the probability of event A given that event B has occurred.
- `P(B|A)` is the likelihood: the probability of event B given that event A has occurred.
- `P(A)` is the prior: the probability of event A before B is observed.
- `P(B)` is the evidence: the overall (marginal) probability of event B.

### **Bayes Theorem Example 1:**
> Suppose we have a dataset of emails and we want to classify them as `spam` or `not spam`. We can use Bayes Theorem to calculate the probability of an email being spam given that it contains certain words.
- We can calculate the probability of an email being spam given that it contains the words "free" and "money" as:
- `P(spam|free, money) = P(free, money|spam)P(spam)/P(free, money)`
- Assuming the words are independent given the class: `P(spam|free, money) = P(free|spam)P(money|spam)P(spam)/P(free, money)`
- The denominator is the same for every class, so it can be dropped when comparing classes: `P(spam|free, money) ∝ P(free|spam)P(money|spam)P(spam)`
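The spam calculation can be sketched numerically. All probabilities below are hypothetical values chosen only for illustration; they do not come from a real dataset. The sketch computes the unnormalized score for each class and then divides by their sum, which plays the role of `P(free, money)` under the naive independence assumption.

```python
# Hypothetical probabilities, chosen only for illustration.
p_spam = 0.4
p_not_spam = 1 - p_spam
p_word_given_spam = {"free": 0.30, "money": 0.20}
p_word_given_not_spam = {"free": 0.05, "money": 0.02}


def unnormalized_score(prior, likelihoods, words):
    """Return P(class) * product of P(word | class) over the given words."""
    score = prior
    for word in words:
        score *= likelihoods[word]
    return score


words = ["free", "money"]
spam_score = unnormalized_score(p_spam, p_word_given_spam, words)
not_spam_score = unnormalized_score(p_not_spam, p_word_given_not_spam, words)

# Normalizing by the sum of the class scores recovers a proper posterior.
p_spam_given_words = spam_score / (spam_score + not_spam_score)
print(round(p_spam_given_words, 4))  # 0.9756
```

With these made-up numbers the email is far more likely to be spam, and the same conclusion follows from comparing the two unnormalized scores directly.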

### **Bayes Theorem Example 2:**
> Imagine you're a teacher with a class of students, and you know the following information:
- 60% of the students own a bicycle.
- Of the students who own a bicycle, 30% bring it to school.
- Of the students who do not own a bicycle, 10% bring a bicycle to school (maybe they borrow one from a friend).

Now, if you see a student riding a bicycle to school, what is the probability that they own a bicycle?
- Let's use Bayes' Theorem to solve this problem. We'll use the following notation:
- A is the event "the student owns a bicycle".
- B is the event "the student brings a bicycle to school".
- A' is the complement of A: "the student does not own a bicycle".

We know:
- P(A) = 0.6 (probability of owning a bicycle)
- P(B|A) = 0.3 (probability of bringing a bicycle to school given that they own a bicycle)
- P(B|A') = 0.1 (probability of bringing a bicycle to school given that they don't own a bicycle)

We want to find P(A|B) (probability of owning a bicycle given that they bring a bicycle to school).
- We can use the formula:
`P(A|B) = P(B|A)P(A)/P(B)`
- The tricky part is finding P(B). We can use the law of total probability to find it:
`P(B) = P(B|A)P(A) + P(B|A')P(A')`
- P(B) = 0.3 * 0.6 + 0.1 * 0.4 = 0.18 + 0.04 = 0.22

Now we can find P(A|B):
- P(A|B) = P(B|A)P(A)/P(B) = 0.3 * 0.6 / 0.22 = `0.8182`

So, if you see a student riding a bicycle to school, there's an 81.82% chance that they own a bicycle.
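The arithmetic above can be checked with a few lines of Python. The function name `posterior` and its parameter names are illustrative choices, not part of any library.

```python
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) using Bayes Theorem and the law of total probability."""
    p_not_a = 1 - p_a
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|A')P(A')
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b


p_own_given_bring = posterior(p_b_given_a=0.3, p_a=0.6, p_b_given_not_a=0.1)
print(f"{p_own_given_bring:.4f}")  # 0.8182
```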

### **Naive Bayes Algorithm:**

The Naive Bayes algorithm applies Bayes Theorem to a class variable and a set of features:

$$P(y|x_1,x_2,...,x_n) = \frac{P(x_1,x_2,...,x_n|y)P(y)}{P(x_1,x_2,...,x_n)}$$

where y is the class variable and x1, x2, ..., xn are the features.

The algorithm assumes that the features are conditionally independent of each other given the class y. So, the above equation can be written as:

$$P(y|x_1,x_2,...,x_n) = \frac{P(x_1|y)P(x_2|y)...P(x_n|y)P(y)}{P(x_1,x_2,...,x_n)}$$

The denominator is constant for a given input. So, the equation can be written as:

$$P(y|x_1,x_2,...,x_n) \propto P(x_1|y)P(x_2|y)...P(x_n|y)P(y)$$

The class with the highest posterior probability is the output of the algorithm:

$$\hat{y} = \arg\max_{y} P(y)\prod_{i=1}^{n} P(x_i|y)$$
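As a minimal sketch of this procedure for categorical features, the code below estimates `P(y)` and `P(x_i|y)` by counting and then picks the class with the largest product. The function names `train` and `predict` and the toy weather data are hypothetical. The sketch uses no smoothing and multiplies raw probabilities; practical implementations usually sum log-probabilities instead to avoid numerical underflow.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Estimate class counts and per-class feature-value counts."""
    class_counts = Counter(labels)
    # feature_counts[(i, y)][value] = number of class-y rows whose feature i equals value
    feature_counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, y)][value] += 1
    return class_counts, feature_counts


def predict(row, class_counts, feature_counts):
    """Return the class maximizing P(y) * prod_i P(x_i | y), plus all scores."""
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / total  # prior P(y)
        for i, value in enumerate(row):
            score *= feature_counts[(i, y)][value] / n_y  # likelihood P(x_i | y)
        scores[y] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "no"]

class_counts, feature_counts = train(rows, labels)
label, scores = predict(("rainy", "mild"), class_counts, feature_counts)
print(label)  # yes
```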

### **Types of Naive Bayes Algorithm:**
1. **Gaussian Naive Bayes:** It is used in classification when features are `continuous` and assumed to be `normally distributed` within each class.
2. **Multinomial Naive Bayes:** It is used in text/document classification where the features are `frequencies of words` or tokens in a document.
3. **Bernoulli Naive Bayes:** It is used in text/document classification when features are `binary values` (0 or 1).
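The variants differ mainly in how they model `P(x_i|y)`. As an illustration of the Gaussian case only, the sketch below fits a mean and variance per class for a single continuous feature and uses the normal density as the likelihood. The height data and the names `gaussian_pdf` and `classify` are hypothetical.

```python
import math
from statistics import mean, pvariance


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical one-feature training data: height in cm per class.
heights = {"child": [110.0, 120.0, 130.0], "adult": [160.0, 170.0, 180.0]}

# Per-class Gaussian parameters (mean, population variance).
params = {label: (mean(values), pvariance(values)) for label, values in heights.items()}
n_total = sum(len(values) for values in heights.values())


def classify(x):
    """Return the class with the largest prior * Gaussian likelihood."""
    scores = {}
    for label, (mu, var) in params.items():
        prior = len(heights[label]) / n_total
        scores[label] = prior * gaussian_pdf(x, mu, var)
    return max(scores, key=scores.get)


print(classify(125.0))  # child
print(classify(165.0))  # adult
```

In practice, libraries such as scikit-learn provide these variants ready-made as `GaussianNB`, `MultinomialNB`, and `BernoulliNB` in `sklearn.naive_bayes`.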

### **Uses of Naive Bayes Algorithm:**
- Email spam detection
- Sentiment analysis
- Document categorization
- It is used in `recommendation systems`.
- It is used in `medical diagnosis`.
- It is used in `weather prediction`.
- It is used in `face recognition`.
- It is used in `credit scoring`.

### **Advantages of Naive Bayes Algorithm:**
- It is `simple` and `easy to implement`.
- It is `fast` and `scalable`.
- It can be used for `multi-class` classification.
- It can make `real-time predictions`.
- It is `robust` to `irrelevant features`.
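- It can be made `robust` to `missing data`, since a missing feature can be left out of the product.
- It is `less prone` to `overfitting` than more flexible models, especially on small datasets.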

### **Limitations of Naive Bayes Algorithm:**
- It assumes that the features are `independent` of each other given the class, which rarely holds in real data.
- It is `sensitive` to `imbalanced` datasets.
- Data scarcity can affect the performance of the algorithm.
  - If a categorical variable has a category in the test dataset that was not observed in the training dataset, the model assigns it a `zero probability`. Because the likelihoods are multiplied, the score for every class becomes zero and the model cannot make a meaningful prediction.
- The algorithm might not perform well if the features are highly `correlated`, because correlated evidence is effectively counted more than once.
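
A common remedy for the zero-probability problem is additive (Laplace) smoothing, which adds a small pseudo-count `alpha` to every category. The sketch below is illustrative; the function name `smoothed_likelihood` and the counts are hypothetical.

```python
def smoothed_likelihood(count, class_total, n_categories, alpha=1.0):
    """Estimate P(x = value | y) with additive (Laplace) smoothing."""
    return (count + alpha) / (class_total + alpha * n_categories)


# Hypothetical counts for one feature within one class:
# 3 training rows, 3 possible categories, "cloudy" never observed.
counts = {"sunny": 2, "rainy": 1, "cloudy": 0}
class_total = sum(counts.values())

unsmoothed = counts["cloudy"] / class_total
smoothed = smoothed_likelihood(counts["cloudy"], class_total, len(counts))
print(unsmoothed)          # 0.0
print(round(smoothed, 4))  # 0.1667
```

The smoothed estimates still sum to 1 across the categories, so they remain a valid probability distribution while avoiding hard zeros.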