## 1. What is prior probability? Give an example.

#### Prior probability:

The prior probability of an event is its probability before any new evidence is taken into account. It is revised as new data or information becomes available, to produce a more accurate measure of a potential outcome. The revised probability is the posterior probability, and it is calculated using Bayes' theorem. In statistical terms, the posterior probability is the probability of event A occurring given that event B has occurred.


#### Example:

For example, suppose three acres of land are labeled A, B, and C. One acre has oil reserves below its surface, while the other two do not. The prior probability of oil being found on acre C is one third, or about 0.333. If a drilling test on acre B then shows that no oil is present there, the posterior probability of oil being found on acre A, and likewise on acre C, becomes 0.5, because the oil must be under one of the two remaining acres.

Bayes' Theorem is expressed as:

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

Where:

- $P(A|B)$ = the posterior probability of A occurring given that B has occurred
- $P(A)$ = the prior probability of A occurring
- $P(B|A)$ = the conditional probability of B given that A occurs
- $P(B)$ = the probability of B occurring
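
The oil example can be checked by applying Bayes' theorem directly. The sketch below assumes the drilling test is perfectly reliable, so the evidence "no oil at B" has probability 1 if the oil is under A or C and probability 0 if it is under B. Exact fractions are used so the result is not blurred by rounding.

```python
from fractions import Fraction

# Prior: before any drilling, the oil is equally likely to be under each acre.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# Likelihood P(test on B finds no oil | oil under acre).
# Assumption: the test never gives a false result.
likelihood = {"A": Fraction(1), "B": Fraction(0), "C": Fraction(1)}

# P(evidence) via the law of total probability.
evidence = sum(likelihood[acre] * prior[acre] for acre in prior)

# Bayes' theorem: P(acre | evidence) = P(evidence | acre) * P(acre) / P(evidence).
posterior = {acre: likelihood[acre] * prior[acre] / evidence for acre in prior}

for acre, p in posterior.items():
    print(acre, p)
```

The posterior for B drops to 0 and the remaining probability mass is split evenly, giving 1/2 for both A and C.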


## 2. What is posterior probability? Give an example.

#### Posterior probability:

A posterior probability, in Bayesian statistics, is the revised or updated probability of an event occurring after taking into consideration new information. The posterior probability is calculated by updating the prior probability using Bayes' theorem.

#### Example:

For example, a doctor can re-evaluate the probability that a patient has a disease after a test result comes back, combining the prior probability (how common the disease is) with the test's accuracy (how often it gives true and false positives).
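
A minimal sketch of this update is shown below. The prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen for illustration, and `posterior_positive` is a made-up helper name, not a library function.

```python
def posterior_positive(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(disease | positive test) computed with Bayes' theorem."""
    # P(positive) = P(pos | disease) P(disease) + P(pos | no disease) P(no disease)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Hypothetical inputs: 1% prevalence, 99% sensitivity, 5% false-positive rate.
p = posterior_positive(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(disease | positive) = {p:.3f}")
```

With these inputs the posterior is about 0.167 (one in six): even an accurate test leaves substantial doubt when the prior probability of the disease is low.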

## 3. What is likelihood probability? Give an example.

#### Likelihood:

Likelihood is a measure of how well a statistical model or hypothesis explains observed data. It is the probability of observing the data given a specific model or hypothesis, viewed as a function of that model or of its parameters. Choosing the parameters that maximize it is called maximum likelihood estimation.

#### Example:

Suppose you are conducting a coin-flipping experiment. You have two hypotheses or models: Model A suggests the coin is fair (50% chance of heads or tails), and Model B suggests the coin is biased (60% chance of heads, 40% chance of tails).

Now, you perform 10 coin flips, and you observe 8 heads and 2 tails. To calculate the likelihood of the data under each model:

- Likelihood under Model A: You calculate the probability of getting 8 heads and 2 tails if the coin is fair. This probability is given by the binomial probability formula.
- Likelihood under Model B: You calculate the probability of getting 8 heads and 2 tails if the coin is biased as per Model B.

In this case, you are assessing how well each model explains the observed data, and the likelihood probability quantifies this explanation. It does not include prior beliefs or probabilities, as Bayesian analysis does; it focuses solely on the observed data and its compatibility with a particular model. The model with the higher likelihood given the data is considered more plausible.
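
The two likelihoods in the coin example can be computed with the binomial formula $\binom{n}{k} p^k (1-p)^{n-k}$. The sketch below uses `math.comb` from the standard library (Python 3.8+); `binomial_likelihood` is a hypothetical helper name.

```python
from math import comb


def binomial_likelihood(heads: int, flips: int, p_heads: float) -> float:
    """Probability of exactly `heads` heads in `flips` independent flips."""
    tails = flips - heads
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** tails


heads, flips = 8, 10
likelihood_a = binomial_likelihood(heads, flips, 0.5)  # Model A: fair coin
likelihood_b = binomial_likelihood(heads, flips, 0.6)  # Model B: biased coin

print(f"L(A) = {likelihood_a:.4f}")
print(f"L(B) = {likelihood_b:.4f}")
print(f"Likelihood ratio B/A = {likelihood_b / likelihood_a:.2f}")
```

Model B explains 8 heads out of 10 about 2.75 times better than Model A (0.1209 versus 0.0439). Neither model is the best possible fit: the likelihood would be highest at $p = 0.8$, the maximum likelihood estimate.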

**Likelihood measures how well a model or hypothesis fits the observed data, and it plays a fundamental role in statistical modeling and hypothesis testing.**

## 4. What is Naïve Bayes classifier? Why is it named so?

#### Naïve Bayes classifier:

In statistics, naive Bayes classifiers are a family of simple probabilistic classifiers that apply Bayes' theorem, which gives the probability of a hypothesis (here, a class) given the data and some prior knowledge. The naive Bayes classifier assumes that all features in the input data are conditionally independent of each other given the class, which is often not true in real-world scenarios. Despite this simplifying assumption, the naive Bayes classifier is widely used because of its efficiency and good performance in many real-world applications.

Bayes' theorem provides a way of computing the posterior probability P(c|x) from P(c), P(x) and P(x|c), as in the equation below:

![Bayes theorem](https://av-eks-blogoptimized.s3.amazonaws.com/Bayes_rule-300x172-300x172-111664.png)

In this equation,

- P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
- P(c) is the prior probability of class.
- P(x|c) is the likelihood which is the probability of the predictor given class.
- P(x) is the prior probability of the predictor.
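
These four quantities can be computed directly from counts. The sketch below is a from-scratch categorical naive Bayes on a hypothetical toy dataset (`outlook`, `windy` → `play`). It adds Laplace (add-one) smoothing, which is an extra assumption not described above, so that a feature value never seen with a class does not force that class's probability to zero.

```python
from collections import Counter

# Hypothetical toy training data: ((outlook, windy), play?)
data = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("overcast", "no"), "yes"),
    (("rainy", "no"), "yes"),
    (("sunny", "no"), "no"),
]
ALPHA = 1.0  # Laplace smoothing strength (an assumption for this sketch)

n = len(data)
class_counts = Counter(label for _, label in data)
# How often each (feature index, value, class) combination occurs.
feature_counts = Counter(
    (i, value, label) for features, label in data for i, value in enumerate(features)
)
# Number of distinct values seen for each feature, needed by the smoothing.
n_values = [len({features[i] for features, _ in data}) for i in range(len(data[0][0]))]


def posterior(x):
    """Return P(c | x) for every class c, treating features as independent given c."""
    joint = {}
    for c, n_c in class_counts.items():
        score = n_c / n  # prior P(c)
        for i, value in enumerate(x):
            # Smoothed likelihood P(x_i | c); the naive assumption lets us multiply these.
            score *= (feature_counts[(i, value, c)] + ALPHA) / (n_c + ALPHA * n_values[i])
        joint[c] = score  # proportional to P(x | c) P(c)
    evidence = sum(joint.values())  # P(x): makes the posteriors sum to 1
    return {c: s / evidence for c, s in joint.items()}


print(posterior(("sunny", "no")))
```

For `("sunny", "no")` the sketch gives P(yes | x) = 4/7 ≈ 0.571 and P(no | x) = 3/7. Dividing by P(x) only rescales the scores, so the predicted class would be the same without it.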

#### Why is it named so?

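**It is named "naïve" because it assumes that all features in the input data are conditionally independent of each other given the class, an assumption that rarely holds exactly in practice. It is named "Bayes" because it is based on Bayes' theorem.**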

## 5. What is optimal Bayes classifier?

The Bayes optimal classifier is a probabilistic model that makes the most probable prediction for a new example, given the training dataset.

1. **Modeling the Problem:** To use the Optimal Bayes Classifier, you first need to build a probabilistic model of the data. You estimate the prior probabilities (the probability of each class occurring without any data), and the conditional probabilities (the likelihood of observing particular data for each class).

2. **Decision Rule:** The classifier then applies Bayes' theorem to calculate the posterior probability of each class given the data, and selects the class with the highest posterior probability as the predicted class.

Mathematically, the decision rule for the Optimal Bayes Classifier is:

Choose class $C_k$ if:

$$P(C_k | \mathbf{x}) = \frac{P(\mathbf{x} | C_k) \cdot P(C_k)}{P(\mathbf{x})}$$

is maximized over all classes $k$.
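
The decision rule can be sketched for a case where the true distributions are assumed to be known. Below, two hypothetical classes have made-up priors and one-dimensional Gaussian class-conditional densities. Because $P(\mathbf{x})$ is the same for every class, the sketch maximizes $P(\mathbf{x} | C_k) \cdot P(C_k)$ without computing it.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


# Hypothetical "true" model: priors P(C_k) and class-conditional densities p(x | C_k).
classes = {
    "C1": {"prior": 0.7, "mean": 0.0, "std": 1.0},
    "C2": {"prior": 0.3, "mean": 2.0, "std": 1.0},
}


def bayes_optimal(x: float) -> str:
    """Return the class with the highest posterior; P(x) is a shared constant, so it is dropped."""
    scores = {
        k: gaussian_pdf(x, c["mean"], c["std"]) * c["prior"] for k, c in classes.items()
    }
    return max(scores, key=scores.get)


for x in (0.5, 1.0, 1.5, 2.0):
    print(x, bayes_optimal(x))
```

With equal standard deviations the decision boundary sits at $x = 1 + \tfrac{1}{2}\ln(0.7/0.3) \approx 1.42$ rather than at the midpoint 1.0, because the larger prior of C1 pushes the boundary toward C2.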


**Challenge with Optimal Bayes classifier:**


The main challenge with the Optimal Bayes Classifier is that it requires knowing the true underlying probability distributions, which is rarely possible in practical machine learning applications. Instead, classification algorithms such as Naive Bayes, Support Vector Machines, or Decision Trees are used to make predictions from the available data. These algorithms try to approximate the Optimal Bayes Classifier, whose error rate (the Bayes error rate) is the lowest achievable for the given problem.

## 6. Write any two features of Bayesian learning methods.

**1. Probabilistic Framework:** 

Bayesian methods provide a probabilistic framework for modeling uncertainty. They assign probabilities to different hypotheses or model parameters, allowing for quantification of uncertainty and updating beliefs as new data becomes available.

**2. Incorporation of Prior Information:** 

Bayesian learning allows the incorporation of prior knowledge or beliefs into the modeling process through prior probability distributions. This prior information can influence the model's predictions and is particularly useful when dealing with limited data.
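
Both features can be illustrated with a conjugate Beta-Binomial update, a standard textbook case chosen here for the sketch. The Beta(2, 2) prior and the flip sequence are hypothetical. Each observation updates the belief, and the prior pulls the estimate toward 0.5 when the data are few.

```python
# Hypothetical prior belief about a coin's P(heads): Beta(2, 2), i.e. "probably near fair".
a, b = 2, 2

# Hypothetical sequence of observed flips (8 heads, 2 tails).
flips = "HHTHHHHTHH"

# The Beta prior is conjugate to the Bernoulli likelihood, so each observation just
# increments one parameter; the posterior after one flip is the prior for the next.
for flip in flips:
    if flip == "H":
        a += 1
    else:
        b += 1

heads = flips.count("H")
mle = heads / len(flips)       # uses the data only
posterior_mean = a / (a + b)   # combines the prior with the data

print(f"posterior = Beta({a}, {b})")
print(f"MLE = {mle:.3f}, posterior mean = {posterior_mean:.3f}")
```

After ten flips the posterior is Beta(10, 4): its mean of about 0.714 sits between the prior mean 0.5 and the data-only estimate 0.8. With many more flips the data would dominate and the two estimates would converge.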

## 7. Define the concept of consistent learners.

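**Consistent Learners:**


The term "consistent learner" is used in two related senses. In Bayesian concept learning, a consistent learner is one that outputs a hypothesis that makes zero errors on the training examples; if the prior over a finite hypothesis space is uniform and the training data are noise-free, every hypothesis such a learner outputs is a maximum a posteriori (MAP) hypothesis. In the statistical sense discussed in the rest of this answer, a consistent learner is one whose model converges to the "true" or "population" model as the size of the training dataset grows without bound. This second sense is closely tied to statistical consistency.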

Here's a more detailed explanation:

1. **True Model:** In a typical machine learning problem, the data are assumed to be generated by a true underlying model. For example, in a regression problem there is assumed to be a true function relating the input variables to the output. Consistency is concerned with how closely the learner's model approaches this true model.

2. **Convergence:** A consistent learner converges to the true model: as you feed more and more data into the learner, the model it produces approaches the true model. This is a desirable property because it means that, given enough data, the learner's model will closely match the actual data-generating process.

3. **Asymptotic Behavior:** Consistency often relates to the asymptotic behavior of the learner. Asymptotics involve studying what happens as the sample size approaches infinity.

Consistency is a theoretical concept and may not always apply to practical machine learning algorithms. Whether a specific algorithm is consistent depends on various factors, including the learning algorithm itself, the problem's complexity, and the quality and quantity of the data.

One common example is the Maximum Likelihood Estimator (MLE). Under standard regularity conditions, MLEs are consistent estimators: as the sample size grows, they converge (in probability) to the true parameters of the underlying probability distribution.
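
The consistency of the MLE can be seen in a small simulation. The sketch below assumes Bernoulli data with a hypothetical true success probability of 0.3; for this model the MLE is simply the sample proportion of successes. The seed only makes the run repeatable.

```python
import random

random.seed(0)
TRUE_P = 0.3  # hypothetical true parameter of the data-generating process


def bernoulli_mle(n: int) -> float:
    """MLE of p from n Bernoulli(TRUE_P) draws: the sample proportion of successes."""
    successes = sum(random.random() < TRUE_P for _ in range(n))
    return successes / n


for n in (10, 100, 10_000, 100_000):
    estimate = bernoulli_mle(n)
    print(f"n={n:>7}: estimate={estimate:.4f}, error={abs(estimate - TRUE_P):.4f}")
```

Individual runs fluctuate, but the typical error shrinks roughly like $1/\sqrt{n}$, which is the convergence that consistency describes.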

Consistency is a desirable property because it suggests that, given enough data, the learner will make predictions that are arbitrarily close to the best possible predictions, based on the true model. However, in practice, this theoretical idea may be limited by factors like model complexity and the quality of the data.

## 8. Write any two strengths of Bayes classifier.

1. Efficient and Scalable: Naïve Bayes is a computationally efficient algorithm that scales well with high-dimensional datasets. It can handle a large number of features, making it suitable for text classification, document categorization, and other tasks involving a multitude of attributes.


2. Simple and Effective for Categorical Data: Naïve Bayes is particularly effective for categorical and discrete data. It often performs well even when the feature independence assumption does not strictly hold, for example in spam detection, sentiment analysis, and recommendation systems.

## 9. Write any two weaknesses of Bayes classifier.



1. Strong Independence Assumption: The classifier assumes that features are conditionally independent given the class, which may not hold in many real-world scenarios. This can lead to suboptimal performance when features are correlated.

2. Limited Expressiveness: Naïve Bayes is not well suited for tasks that require modeling complex relationships and dependencies between features. It may struggle to capture nuanced patterns in data, especially when the independence assumption is violated.
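
The effect of correlated features can be shown with an extreme case: a feature that is an exact copy of another. The copy adds no information, so the true posterior should not change, but naive Bayes multiplies its likelihood in a second time. The word probabilities and the helper `naive_posterior` are hypothetical.

```python
def naive_posterior(prior_spam, likelihoods):
    """P(spam | features) when each feature's likelihoods are multiplied independently."""
    p_spam, p_ham = prior_spam, 1 - prior_spam
    for p_given_spam, p_given_ham in likelihoods:
        p_spam *= p_given_spam
        p_ham *= p_given_ham
    return p_spam / (p_spam + p_ham)


# Hypothetical feature: the word "free" appears in 80% of spam and 10% of ham.
free = (0.8, 0.1)

one_copy = naive_posterior(0.5, [free])
two_copies = naive_posterior(0.5, [free, free])  # perfectly correlated duplicate

print(f"feature counted once:  {one_copy:.3f}")
print(f"feature counted twice: {two_copies:.3f}")
```

The correct posterior stays at 8/9 ≈ 0.889, yet counting the duplicate pushes the naive estimate to 64/65 ≈ 0.985. This is why naive Bayes tends to be overconfident when features are correlated.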

## 10. Explain how Naïve Bayes classifier is used for:

1. Text classification
2. Spam filtering
3. Market sentiment analysis



### Text Classification:
   - In text classification, the goal is to categorize text documents into predefined classes or categories, such as news articles, customer reviews, or social media posts.
   - The Naïve Bayes classifier is employed to determine the probability of a document belonging to a specific category based on the words or features present in the document.
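   - The classifier estimates, from labeled training documents, the conditional probability of each word occurring given each category. For a new document, these per-word probabilities are multiplied together with the category's prior probability to score each category (in practice, their logarithms are summed to avoid numerical underflow), and the highest-scoring category is chosen.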
   - This method is particularly effective for text data because it can efficiently handle a large number of features (words) and is well-suited for high-dimensional data. It's commonly used in applications like spam detection, topic classification, and sentiment analysis.
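
The steps above can be sketched as a small multinomial naive Bayes written from scratch. The training documents and category names are hypothetical, and the sketch makes two assumptions not stated above: Laplace smoothing with `alpha=1.0`, and simply skipping words that never appeared in training. Scores are summed in log space to avoid underflow.

```python
import math
from collections import Counter

# Hypothetical labeled training documents.
train = [
    ("the team won the match", "sports"),
    ("a late goal won the game", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stocks fell as rates rose", "finance"),
]


def fit(docs, alpha=1.0):
    """Estimate log P(c) and smoothed log P(word | c) from word counts."""
    vocab = {w for text, _ in docs for w in text.split()}
    labels = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in labels}
    for text, label in docs:
        word_counts[label].update(text.split())
    log_prior = {c: math.log(n / len(docs)) for c, n in labels.items()}
    log_lik = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_lik


def predict(text, log_prior, log_lik):
    """Pick the class maximizing log P(c) + sum of log P(word | c)."""
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped (a simplifying assumption).
        scores[c] = log_prior[c] + sum(
            log_lik[c][w] for w in text.split() if w in log_lik[c]
        )
    return max(scores, key=scores.get)


model = fit(train)
print(predict("the match was won late", *model))
print(predict("bank rates rose", *model))
```

Training only counts words, and prediction does one dictionary lookup per word and class, which is why the method stays cheap even with very large vocabularies.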

### Spam Filtering:
   - Spam filtering aims to identify and filter out unwanted or unsolicited emails from a user's inbox.
   - Naïve Bayes is used to assess the probability of an email being spam or not, based on the words, phrases, and patterns in the email content.
   - The classifier is trained on a dataset of labeled emails (spam and non-spam) to estimate the conditional probabilities of certain words or features occurring in spam or legitimate emails.
   - When a new email arrives, the classifier calculates the probability that it is spam and compares it to the probability of it being legitimate. The email is then classified as spam or not based on this comparison.
   - Naïve Bayes is effective for spam filtering because it can handle a large number of email features efficiently, making it suitable for real-time email processing.
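
A minimal sketch of the spam-versus-legitimate comparison is given below. It uses the Bernoulli variant of naive Bayes, which models whether each vocabulary word is present or absent; this variant, the smoothing, and the tiny hypothetical email set are choices made for illustration only. Real filters train on far larger corpora and usually use richer features.

```python
import math

# Hypothetical labeled emails: (text, is_spam).
emails = [
    ("win a free prize now", True),
    ("free money click now", True),
    ("claim your free prize", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
    ("project update and agenda", False),
]


def train(emails, alpha=1.0):
    """Estimate log P(label) and smoothed P(word present | label) for each label."""
    vocab = {w for text, _ in emails for w in text.split()}
    model = {}
    for label in (True, False):
        docs = [set(text.split()) for text, y in emails if y == label]
        p_word = {
            w: (sum(w in d for d in docs) + alpha) / (len(docs) + 2 * alpha)
            for w in vocab
        }
        model[label] = (math.log(len(docs) / len(emails)), p_word)
    return vocab, model


def log_score(text, label, vocab, model):
    """log P(label) + sum over the vocabulary of log P(word present/absent | label)."""
    log_prior, p_word = model[label]
    present = set(text.split())
    return log_prior + sum(
        math.log(p_word[w] if w in present else 1 - p_word[w]) for w in vocab
    )


def is_spam(text, vocab, model):
    """Classify as spam if the spam score beats the legitimate ("ham") score."""
    return log_score(text, True, vocab, model) > log_score(text, False, vocab, model)


vocab, model = train(emails)
print(is_spam("free prize inside", vocab, model))
print(is_spam("agenda for the monday meeting", vocab, model))
```

Words outside the training vocabulary (such as "inside" or "the" here) are ignored by this sketch. Comparing log scores instead of raw probabilities gives the same decision while avoiding underflow on long emails.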

### Market Sentiment Analysis:
   - Market sentiment analysis involves assessing the sentiment or emotional tone of market-related content, such as news articles, social media posts, and financial reports.
   - The Naïve Bayes classifier can be used to determine whether the overall sentiment is positive, negative, or neutral based on the sentiment expressed in the text.
   - To do this, the classifier is trained on labeled data with examples of positive, negative, and neutral sentiment expressions. It estimates the conditional probabilities of certain words or phrases being associated with each sentiment category.
   - When analyzing new market-related text, the classifier calculates the probabilities of it falling into different sentiment categories. This information can be valuable for traders, investors, and financial analysts in making informed decisions.
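
The last step, turning per-category scores into probabilities, can be sketched on its own. A trained classifier would produce one log score, $\log P(c) + \sum \log P(\text{word} | c)$, per sentiment category. The three scores below are hypothetical, and `normalize_log_scores` is a made-up helper that normalizes them with the log-sum-exp trick.

```python
import math


def normalize_log_scores(log_scores):
    """Turn per-class log P(c) + log P(text | c) into posteriors P(c | text) that sum to 1."""
    # Subtracting the max before exponentiating (log-sum-exp trick) avoids underflow.
    m = max(log_scores.values())
    exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}


# Hypothetical log joint scores a trained classifier might produce for one headline.
scores = {"positive": -42.1, "negative": -44.3, "neutral": -43.0}
probs = normalize_log_scores(scores)
for label, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {p:.3f}")
```

Reporting normalized probabilities, rather than only the winning label, lets an analyst see how confident the classifier is, for example whether "positive" clearly dominates or narrowly beats "neutral".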
