![image.png](attachment:b21f6323-cd9f-4cb6-b7e4-0436bfd89e65.png)

Prior probability, in Bayesian statistical inference, is the probability of an event before new data is collected. This is the best rational assessment of the probability of an outcome based on the current knowledge before an experiment is performed.

Suppose that we observe a number of independent realizations of a Bernoulli random variable (i.e., a variable that equals 1 if a certain experiment succeeds and 0 otherwise). In this case, the set of data-generating distributions is the set of all Bernoulli distributions, which are indexed by a single parameter: the probability of success of the experiment. The distribution assigned to this parameter before the outcomes of the experiments are observed is the prior distribution. A common choice is a Beta distribution, because it is the conjugate prior of the Bernoulli distribution, so the posterior is again a Beta distribution.
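As a minimal sketch of this prior-to-posterior update, the snippet below assumes a uniform Beta(1, 1) prior and a hypothetical sequence of seven outcomes. With a Beta prior, each success adds 1 to the first parameter and each failure adds 1 to the second. The function names `beta_bernoulli_update` and `beta_mean` are illustrative, not from any library.

```python
from fractions import Fraction

def beta_bernoulli_update(alpha, beta, observations):
    """Return the Beta posterior parameters after observing 0/1 outcomes.

    With a Beta(alpha, beta) prior on the success probability, each success
    adds 1 to alpha and each failure adds 1 to beta (conjugacy).
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return Fraction(alpha, alpha + beta)

# Uniform prior Beta(1, 1): every success probability in [0, 1] equally plausible.
prior = (1, 1)
data = [1, 0, 1, 1, 0, 1, 1]   # hypothetical outcomes: 5 successes, 2 failures
posterior = beta_bernoulli_update(*prior, data)

print("prior mean:", beta_mean(*prior))          # 1/2
print("posterior:", posterior)                   # (6, 3)
print("posterior mean:", beta_mean(*posterior))  # 2/3
```

The data pull the estimated success probability from the prior mean of 1/2 towards the observed frequency of 5/7.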

![image.png](attachment:9620222e-2b81-4823-98a5-76d4d84d8432.png)

Posterior probability is a revised probability that takes newly available information into account. For example, let there be two urns: urn A contains 5 black balls and 10 red balls, and urn B contains 10 black balls and 5 red balls. If an urn is selected at random, the probability that urn A is chosen is 0.5.

This is the a priori (prior) probability. Now suppose we are given an additional piece of information: a ball was drawn at random from the selected urn, and that ball was black. What is the probability that the chosen urn is urn A? The posterior probability takes this additional information into account and, according to Bayes' theorem, revises the probability downward from 0.5 to 1/3 ≈ 0.333, because a black ball is more likely to come from urn B (probability 2/3) than from urn A (probability 1/3).
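The urn calculation can be checked with exact fractions. This sketch simply applies Bayes' theorem to the numbers in the example; the variable names are illustrative.

```python
from fractions import Fraction

# Urn contents from the example: (black, red)
urns = {"A": (5, 10), "B": (10, 5)}
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}   # urn chosen at random

# Likelihood of drawing a black ball from each urn: P(black | urn)
likelihood = {u: Fraction(black, black + red) for u, (black, red) in urns.items()}

# Evidence P(black) = sum over urns of P(black | urn) * P(urn)
evidence = sum(likelihood[u] * prior[u] for u in urns)

# Bayes' theorem: P(urn | black) = P(black | urn) * P(urn) / P(black)
posterior = {u: likelihood[u] * prior[u] / evidence for u in urns}

print(posterior["A"], float(posterior["A"]))   # 1/3 0.3333333333333333
print(posterior["B"])                          # 2/3
```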

![image.png](attachment:ad39aff2-2f61-43c1-9bd3-736381503576.png)

- In probability, "likely" describes how possible it is that an event will occur. Probability is the branch of mathematics that deals with the occurrence of random events. In everyday language, probability is sometimes also called likelihood or possibility; in statistics, however, the likelihood usually refers to the probability of the observed data viewed as a function of the model's parameters.
- The higher the probability (as a number or percentage) of an event, the more likely it is that the event will occur. When all possible outcomes of an experiment are equally likely, the probability of an event is the number of outcomes in which it occurs divided by the total number of possible outcomes. If an experiment has only one possible outcome, the probability of that outcome is always 1 (or 100 percent).
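A small sketch of the equally-likely-outcomes rule, assuming a fair six-sided die as the (hypothetical) experiment; `classical_probability` is an illustrative helper, not a library function.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(event) = favourable outcomes / total outcomes, assuming every
    outcome in sample_space is equally likely."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

die = [1, 2, 3, 4, 5, 6]   # a fair six-sided die

print(classical_probability(lambda x: x % 2 == 0, die))  # even number: 1/2
print(classical_probability(lambda x: x == 6, die))      # a six: 1/6
print(classical_probability(lambda x: True, [1]))        # only one possible outcome: 1
```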

![image.png](attachment:198955c9-9cc7-43a3-9098-f261ec838d03.png)

Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem. Naive Bayes is not a single algorithm but a family of algorithms that share a common principle: every pair of features is assumed to be conditionally independent given the class.

Naive Bayes is called naive because it assumes that the input variables are independent of each other once the class is known. This is a strong assumption and usually unrealistic for real data; nevertheless, the technique works well in practice on a wide range of complex problems.
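To make the independence assumption concrete, here is a minimal from-scratch sketch on a hypothetical two-feature dataset (outlook and windy, predicting whether to play). The score of each class is the prior P(c) multiplied by one factor P(x_i | c) per feature, which is exactly where the naive assumption enters. No smoothing is applied, so a feature value never seen with a class drives that class's score to zero. The function names `fit` and `naive_bayes_posterior` are illustrative.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical toy data: ((outlook, windy), play?)
train = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("rainy", "no"), "yes"),
    (("sunny", "no"), "yes"),
    (("rainy", "yes"), "no"),
    (("sunny", "no"), "no"),
    (("rainy", "yes"), "yes"),
]

def fit(rows):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(label for _, label in rows)
    # feature_counts[label][i][value] = times feature i took `value` in class `label`
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in rows:
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts

def naive_bayes_posterior(x, class_counts, feature_counts):
    """P(c | x) proportional to P(c) * prod_i P(x_i | c), normalised over classes."""
    n = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = Fraction(count, n)                 # prior P(c)
        for i, value in enumerate(x):              # naive step: one factor per feature
            score *= Fraction(feature_counts[label][i][value], count)
        scores[label] = score
    evidence = sum(scores.values())                # P(x)
    return {label: s / evidence for label, s in scores.items()}

post = naive_bayes_posterior(("sunny", "no"), *fit(train))
print(post)   # yes: 3/4, no: 1/4
```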

![image.png](attachment:cb620225-989f-490d-a01f-1c0a23dcc4c4.png)

The Bayes Optimal Classifier is a probabilistic model that makes the most probable prediction for a new example.

It is described using Bayes' theorem, which provides a principled way of calculating a conditional probability. It is also closely related to Maximum a Posteriori (MAP) estimation, a probabilistic framework that finds the most probable hypothesis given a training dataset.

In practice, the Bayes Optimal Classifier is computationally expensive, if not intractable, to calculate; instead, simplifications such as the Gibbs algorithm and Naive Bayes can be used to approximate the outcome.

- Bayes' theorem provides a principled way of calculating conditional probabilities, called posterior probabilities.
- Maximum a Posteriori (MAP) is a probabilistic framework that finds the most probable hypothesis that describes the training dataset.
- The Bayes Optimal Classifier is a probabilistic model that finds the most probable prediction for a new data instance, using the training data and the space of hypotheses.
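The difference between MAP and the Bayes Optimal Classifier shows up when several less probable hypotheses agree. In this sketch the posteriors P(h | D) and the prediction of each hypothesis are hypothetical numbers: MAP follows the single best hypothesis, while the Bayes optimal prediction sums the posterior weight behind each class.

```python
from fractions import Fraction

# Hypothetical hypothesis space: posterior P(h | D) of each hypothesis and
# the class each hypothesis predicts for one new instance x.
posterior = {"h1": Fraction(4, 10), "h2": Fraction(3, 10), "h3": Fraction(3, 10)}
prediction = {"h1": "+", "h2": "-", "h3": "-"}

# MAP: use the prediction of the single most probable hypothesis.
h_map = max(posterior, key=posterior.get)
map_label = prediction[h_map]

# Bayes optimal: P(v | D) = sum_h P(v | h) P(h | D), where P(v | h) is 1 if h predicts v, else 0.
votes = {}
for h, p in posterior.items():
    votes[prediction[h]] = votes.get(prediction[h], 0) + p
bayes_optimal_label = max(votes, key=votes.get)

print(h_map, map_label)        # h1 +
print(bayes_optimal_label)     # -
```

With these numbers MAP predicts `+`, but the weighted vote gives `-` a total posterior of 3/5 against 2/5, so the Bayes optimal prediction is `-`.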

![image.png](attachment:092dbdbf-b449-4080-a946-e0713b8e01ab.png)

- Each observed training example can incrementally increase or decrease the estimated probability that a hypothesis is correct.
- This provides a more flexible approach to learning than algorithms that completely eliminate a hypothesis as soon as it is found to be inconsistent with any single example.
- Prior knowledge can be combined with observed data to determine the final probability of a hypothesis. In Bayesian learning, prior knowledge is provided by asserting:
  - a prior probability for each candidate hypothesis, and
  - a probability distribution over observed data for each possible hypothesis.
- Bayesian methods can accommodate hypotheses that make probabilistic predictions.
- New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.
- Even in cases where Bayesian methods prove computationally intractable, they can provide a standard of optimal decision making against which other practical methods can be measured.
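The incremental updating described in the first bullets can be sketched with a hypothetical hypothesis space of three coin biases under a uniform prior. Each observation multiplies the probability of every hypothesis by the likelihood of that observation and renormalises, so probabilities rise and fall gradually instead of hypotheses being eliminated outright. The `update` helper and the observed outcomes are illustrative.

```python
from fractions import Fraction

# Hypothetical hypothesis space: three possible success probabilities for a coin,
# each given an equal prior probability.
beliefs = {Fraction(1, 4): Fraction(1, 3),
           Fraction(1, 2): Fraction(1, 3),
           Fraction(3, 4): Fraction(1, 3)}

def update(beliefs, outcome):
    """One Bayesian update: multiply each P(h) by P(outcome | h), then renormalise."""
    unnormalised = {h: p * (h if outcome == 1 else 1 - h) for h, p in beliefs.items()}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}

for outcome in [1, 1, 0, 1]:          # hypothetical observed outcomes
    beliefs = update(beliefs, outcome)
    print(outcome, {str(h): round(float(p), 3) for h, p in beliefs.items()})
```

After the four observations every hypothesis still has non-zero probability, but the bias 3/4 has become the most probable (27/46).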

![image.png](attachment:96474a85-d57a-403e-bedf-2d2a824972c3.png)

- A learner L using a hypothesis space H and training data D is said to be a consistent learner if it always outputs a hypothesis with zero error on D whenever H contains such a hypothesis.
- By definition, a consistent learner must produce a hypothesis in the version space for H given D, i.e. the set of hypotheses in H that classify every example in D correctly.
- Therefore, to bound the number of examples needed by a consistent learner, we just need to bound the number of examples needed to ensure that the version space contains no hypotheses with unacceptably high true error.
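A minimal sketch of these definitions, assuming a hypothetical hypothesis space of integer thresholds (h_t labels x positive iff x >= t) and a few hypothetical labelled examples. `version_space` keeps the hypotheses with zero training error, and `consistent_learner` returns one of them. `sample_bound` uses the standard result for a finite hypothesis space, m ≥ (ln|H| + ln(1/δ))/ε, which guarantees that, with probability at least 1 − δ, the version space contains no hypothesis with true error above ε.

```python
import math

def predict(t, x):
    """Hypothesis h_t labels x positive iff x >= t."""
    return x >= t

def version_space(thresholds, data):
    """All hypotheses with zero error on the training data D."""
    return [t for t in thresholds if all(predict(t, x) == y for x, y in data)]

def consistent_learner(thresholds, data):
    """Output some hypothesis from the version space (here: the smallest threshold)."""
    vs = version_space(thresholds, data)
    return vs[0] if vs else None

def sample_bound(h_size, epsilon, delta):
    """Examples sufficient for any consistent learner over a finite H:
    m >= (ln|H| + ln(1/delta)) / epsilon."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)

H = list(range(11))                                   # hypothetical thresholds 0..10
D = [(2, False), (4, False), (7, True), (9, True)]    # hypothetical labelled examples

print(version_space(H, D))                            # [5, 6, 7]
print(consistent_learner(H, D))                       # 5
print(sample_bound(len(H), epsilon=0.1, delta=0.05))  # 54
```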

![image.png](attachment:a8759c56-4762-45eb-9f62-94e1471b3641.png)

- It is simple and easy to implement.
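- It requires relatively little training data to estimate its parameters.
- It handles both continuous and discrete data.
- It scales well with the number of predictors and data points, since training needs only a single pass over the data to collect per-class counts or statistics.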
- It is fast and can be used to make real-time predictions.
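To illustrate the claim about continuous data, here is a minimal Gaussian Naive Bayes sketch using only the Python standard library. It assumes each continuous feature is normally distributed within each class; fitting is a single pass that records a prior, a mean, and a variance per class and feature. The fruit measurements and the function names are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(X, y):
    """Per class: prior, and per-feature mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, c in zip(X, y) if c == label]
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [mean(col) for col in columns],
            # small constant keeps each variance strictly positive
            "var": [pvariance(col) + 1e-9 for col in columns],
        }
    return model

def log_gaussian(x, mu, var):
    """Log of the normal density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict_gaussian_nb(model, x):
    """argmax over classes of log P(c) + sum_i log N(x_i; mean_ci, var_ci)."""
    def score(label):
        p = model[label]
        return math.log(p["prior"]) + sum(
            log_gaussian(xi, mu, var) for xi, mu, var in zip(x, p["mean"], p["var"]))
    return max(model, key=score)

# Hypothetical continuous data: (length_cm, weight_g)
X = [(18.0, 120.0), (20.0, 130.0), (19.0, 118.0),
     (7.0, 150.0), (8.0, 170.0), (7.5, 160.0)]
y = ["banana", "banana", "banana", "apple", "apple", "apple"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (19.5, 125.0)))   # banana
print(predict_gaussian_nb(model, (7.2, 165.0)))    # apple
```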

![image.png](attachment:17e4444e-4baf-435e-839d-929a25bc1ea7.png)

- If a categorical feature in the test data set takes a value that never appeared (for a given class) in the training data set, the Naive Bayes model assigns that value zero probability. Because the per-feature probabilities are multiplied together, the whole posterior for that class becomes zero, and the model cannot make a meaningful prediction. This phenomenon is called "zero frequency", and you will need a smoothing technique, such as Laplace (add-one) smoothing, to solve it.
- The algorithm is also known to be a poor probability estimator, so you should not take the probability outputs of `predict_proba` too seriously.
- It assumes that all features are independent given the class. While this might sound reasonable in theory, in real life you will hardly ever find a set of fully independent features.
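The zero-frequency problem and its fix can be seen in a few lines. This sketch assumes a single hypothetical categorical feature (colour) observed for one class; with `alpha=0` an unseen value gets probability zero, while `alpha=1` (Laplace smoothing) gives it a small positive probability. The helper `conditional_prob` is illustrative. In scikit-learn, the discrete naive Bayes estimators such as `MultinomialNB` and `CategoricalNB` expose the same idea through their `alpha` parameter.

```python
from collections import Counter

def conditional_prob(value, observed, vocabulary, alpha=0.0):
    """Estimate P(feature = value | class) from the values observed for that class.

    alpha = 0 gives the raw relative frequency; alpha = 1 is Laplace (add-one)
    smoothing, which adds alpha pseudo-counts to every possible value.
    """
    counts = Counter(observed)
    return (counts[value] + alpha) / (len(observed) + alpha * len(vocabulary))

colours_seen_for_class = ["red", "red", "green"]   # hypothetical training values for one class
vocabulary = {"red", "green", "yellow"}            # all values the feature can take

print(conditional_prob("yellow", colours_seen_for_class, vocabulary))           # 0.0
print(conditional_prob("yellow", colours_seen_for_class, vocabulary, alpha=1))  # 1/6
print(conditional_prob("red", colours_seen_for_class, vocabulary, alpha=1))     # 0.5
```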

![image.png](attachment:46df24c5-2890-4db6-af61-807d37d75875.png)

**Text Classification**

Naive Bayes is a simple classification algorithm that uses the probabilities of events to make predictions. It is based on Bayes' theorem combined with the assumption that the features do not depend on one another within a class. For example, a fruit may be considered a banana if it is yellow/green in colour, banana-shaped, and 1-2 cm in radius. Each of these properties contributes independently to the probability that the fruit is a banana, regardless of the others, and this independence assumption is why the algorithm is called "naive". Because of this simplifying assumption, a Naive Bayes classifier can be trained with relatively little training data and can cope with some mislabeled data.

Bayes' theorem is expressed by the following formula:

$$P(A \mid B) = \frac{P(A)\, P(B \mid A)}{P(B)}$$

Here we calculate the posterior probability of class A given predictor B, i.e. $P(A \mid B)$. $P(A)$ is the prior probability of the class, $P(B \mid A)$ is the likelihood of predictor B given class A, and $P(B)$ is the prior probability of predictor B (also called the evidence). In text classification the predictors are the words of a document, so estimating these probabilities from word statistics lets us compute the probability of each class for a given text.

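For text classification with scikit-learn, two components are central:
 - `make_pipeline`, which chains the preprocessing step and the classifier into a single estimator;
 - `TfidfVectorizer`, which converts the raw text data into TF-IDF feature vectors.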
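A minimal sketch of such a pipeline, assuming scikit-learn is installed and using `MultinomialNB` as the classifier (a common pairing with TF-IDF features, though the text does not name a specific Naive Bayes variant). The four training sentences and their labels are hypothetical toy data; a real task would use many labelled documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus
train_texts = ["free money now", "win cash prize", "meeting at noon", "lunch with team"]
train_labels = ["spam", "spam", "ham", "ham"]

# make_pipeline chains the steps: raw text -> TF-IDF vectors -> multinomial naive Bayes.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["free cash", "team meeting"]))   # ['spam' 'ham']
```

Because the vectorizer is part of the pipeline, `predict` accepts raw strings and applies the vocabulary learned during `fit`.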

**Spam Filtering**

- Naive Bayes classifiers are a popular statistical technique for e-mail filtering. They typically use bag-of-words features to identify spam e-mail, an approach commonly used in text classification.

- Naive Bayes classifiers work by correlating the use of tokens (typically words, or sometimes other things), with spam and non-spam e-mails and then using Bayes' theorem to calculate a probability that an email is or is not spam.

- Naive Bayes spam filtering is a baseline technique for dealing with spam that can tailor itself to the email needs of individual users and give low false positive spam detection rates that are generally acceptable to users. It is one of the oldest ways of doing spam filtering.
- https://medium.com/secure-and-private-ai-math-blogging-competition/spam-detection-and-filtering-with-naive-bayes-algorithm-f6c2ac181174
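To show how token statistics and Bayes' theorem combine in a spam filter, here is a from-scratch sketch of a multinomial Naive Bayes filter with add-one smoothing, working in log space to avoid numerical underflow on long e-mails. The whitespace tokenizer, the function names, and the four training e-mails are simplifying, hypothetical choices.

```python
import math
from collections import Counter

def train_spam_filter(emails, labels, alpha=1.0):
    """Multinomial naive Bayes on bag-of-words token counts, with add-alpha smoothing."""
    vocab = {tok for e in emails for tok in e.lower().split()}
    model = {}
    for label in set(labels):
        docs = [e for e, l in zip(emails, labels) if l == label]
        counts = Counter(tok for d in docs for tok in d.lower().split())
        total = sum(counts.values())
        model[label] = {
            "log_prior": math.log(len(docs) / len(emails)),
            "log_lik": {t: math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
                        for t in vocab},
        }
    return model

def spam_probability(model, email):
    """P(spam | email) via Bayes' theorem; tokens outside the vocabulary are ignored."""
    log_scores = {}
    for label, p in model.items():
        log_scores[label] = p["log_prior"] + sum(
            p["log_lik"][t] for t in email.lower().split() if t in p["log_lik"])
    # normalise in log space (log-sum-exp) to avoid underflow
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return math.exp(log_scores["spam"] - m) / z

emails = ["win free money now", "free prize claim now",
          "project meeting tomorrow", "lunch tomorrow with the team"]
labels = ["spam", "spam", "ham", "ham"]

spam_model = train_spam_filter(emails, labels)
print(round(spam_probability(spam_model, "claim your free prize"), 3))   # 0.923
print(round(spam_probability(spam_model, "team meeting tomorrow"), 3))   # 0.077
```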

**Market Sentiment Analysis**

- https://towardsdatascience.com/sentiment-analysis-introduction-to-naive-bayes-algorithm-96831d77ac91