## 1. Provide an example of the concepts of Prior, Posterior, and Likelihood.

**Ans:**

Let's use a medical diagnosis example to illustrate the concepts of Prior, Posterior, and Likelihood:

Imagine a patient undergoing a medical test to determine whether they have a rare disease, let's call it "Disease X." We want to calculate the probability of the patient having the disease based on the test results.

1. Prior Probability (Prior):
   - Prior probability is our initial belief about the probability that the patient has Disease X before any test results are known.
   - Let's say, based on historical data and known risk factors, the prior probability of a random person having Disease X is 2%.

2. Likelihood (Likelihood):
   - Likelihood is the probability of observing the test results (evidence) given the patient's disease status.
   - In our case, the likelihood describes the probability of getting positive test results if the patient actually has Disease X and the probability of getting negative test results if the patient doesn't have the disease.
   - Let's assume that the test has a sensitivity of 95% (it correctly identifies the disease in 95% of cases) and a specificity of 90% (it correctly identifies the absence of the disease in 90% of cases).

3. Posterior Probability (Posterior):
   - Posterior probability is the updated probability of the patient having Disease X after taking the test results into account. It's calculated using Bayes' Theorem.
   - We want to find P(Disease X | Positive Test), which is the probability of the patient having the disease given that they received a positive test result.

Now, let's calculate the Posterior Probability (P(Disease X | Positive Test)):

Using Bayes' Theorem:
\[P(Disease X | Positive Test) = \frac{P(Positive Test | Disease X) * P(Disease X)}{P(Positive Test)}\]

- Prior Probability (P(Disease X)): 2% or 0.02 (our initial belief).
- Likelihood (P(Positive Test | Disease X)): 95% or 0.95 (the probability of a positive test result when having the disease).
- Complement of Prior Probability (P(Not Disease X)): 1 - 0.02 = 0.98 (the probability of not having the disease).
- Likelihood (P(Positive Test | Not Disease X)): 1 - specificity = 1 - 0.90 = 0.10 (the probability of a positive test result when not having the disease, i.e. the false-positive rate).

Now, let's calculate P(Positive Test) using the law of total probability:
\[P(Positive Test) = P(Positive Test | Disease X) * P(Disease X) + P(Positive Test | Not Disease X) * P(Not Disease X)\]
\[P(Positive Test) = (0.95 * 0.02) + (0.10 * 0.98) = 0.019 + 0.098 = 0.117\]

Now, we can calculate the Posterior Probability using Bayes' Theorem:
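\[P(Disease X | Positive Test) = \frac{0.95 * 0.02}{0.117} = \frac{0.019}{0.117} \approx 0.162\]

So, the Posterior Probability of the patient having Disease X after receiving a positive test result is approximately 16.2%. Even with a positive result the disease remains unlikely, because the low prior (2%) means false positives outnumber true positives.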

In this example:
- Prior Probability (Prior) represents our initial belief.
- Likelihood (Likelihood) describes how probable the observed test result is under each disease status (the test's sensitivity and false-positive rate).
- Posterior Probability (Posterior) is the updated probability after considering the test results.
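
The calculation above can be checked with a few lines of Python. This is a minimal sketch; the function name `posterior_positive` and its parameter names are illustrative choices, not part of the original answer.

```python
def posterior_positive(prior, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem."""
    false_positive_rate = 1.0 - specificity
    # Law of total probability: P(positive test)
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


p_disease_given_positive = posterior_positive(
    prior=0.02, sensitivity=0.95, specificity=0.90
)
print(f"P(Disease X | Positive Test) = {p_disease_given_positive:.3f}")  # 0.162
```

Changing `prior` while keeping the test characteristics fixed shows how strongly the posterior depends on the base rate.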

## 2. What role does Bayes' theorem play in the concept learning principle?

**Ans:**

Bayes' theorem gives concept learning a probabilistic framework for updating beliefs about hypotheses as data arrive. Each candidate hypothesis $h$ starts with a prior probability $P(h)$, and after observing the training data $D$ its posterior probability is $P(h|D) = P(D|h)P(h)/P(D)$. The learner can therefore rank hypotheses by how probable they are given the evidence and select the most probable one (the maximum a posteriori, or MAP, hypothesis), refining the learned concept as more data are observed.
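
As an illustration, here is a small sketch of Bayesian concept learning over a toy hypothesis space. Everything in it is an assumption made for the example: the hypotheses are threshold concepts of the form "x >= t", the prior is uniform, and the training data are noise-free, so the likelihood of the data is 1 for a consistent hypothesis and 0 otherwise.

```python
def posterior_over_hypotheses(hypotheses, prior, data):
    """Return P(h | data) for every hypothesis, assuming noise-free labels."""
    unnormalised = {}
    for name, h in hypotheses.items():
        # Likelihood is 1 if h labels every example correctly, else 0.
        consistent = all(h(x) == label for x, label in data)
        unnormalised[name] = prior[name] * (1.0 if consistent else 0.0)
    total = sum(unnormalised.values())  # P(data); assumed non-zero here
    return {name: value / total for name, value in unnormalised.items()}


# Hypothetical hypothesis space: "x is a positive example if x >= t"
thresholds = [1, 2, 3, 4, 5]
hypotheses = {f"x >= {t}": (lambda x, t=t: x >= t) for t in thresholds}
prior = {name: 1.0 / len(hypotheses) for name in hypotheses}

# Two labeled training examples: 4 is positive, 1 is negative
data = [(4, True), (1, False)]
post = posterior_over_hypotheses(hypotheses, prior, data)
for name, p in post.items():
    print(f"P({name} | data) = {p:.3f}")
```

After the two examples, only the thresholds 2, 3, and 4 remain consistent, so the posterior mass is shared equally among them and the other hypotheses drop to zero.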

## 3. Offer an example of how the Naïve Bayes classifier is used in real life.

**Ans:**

**Email Spam Filtering:**

- **Problem**: The goal is to automatically classify incoming emails as either spam (unwanted or unsolicited emails) or non-spam (legitimate emails) to protect users from unwanted content and clutter in their inboxes.

- **How Naïve Bayes is Used**:
  - **Training Phase**: During the training phase, the classifier is provided with a labeled dataset of emails. Each email is tagged as spam or non-spam.
  - **Feature Extraction**: The contents of each email are processed to extract relevant features, such as words or phrases. These features are used to build a vocabulary of terms.
  - **Calculating Conditional Probabilities**: For each term in the vocabulary, the classifier estimates the conditional probability of that term occurring in spam emails and in non-spam emails from the term frequencies in the training data, usually with smoothing so that unseen terms do not receive zero probability.
  - **Building a Model**: The classifier constructs a probabilistic model that assigns a probability to an incoming email being spam or non-spam based on the presence and frequency of terms from the vocabulary in the email.
  - **Classification**: When a new email arrives, the Naïve Bayes classifier applies Bayes' theorem, combining the class priors with the conditional probabilities of the observed terms, to calculate the probability that the email is spam or non-spam.
  - **Thresholding**: The classifier compares the calculated probabilities to a threshold (e.g., 0.5). If the probability of being spam is above the threshold, the email is classified as spam; otherwise, it's classified as non-spam.
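
The steps above can be sketched in plain Python. This is a toy illustration rather than a production filter: the four training emails are invented, tokenisation is a simple whitespace split, and add-one (Laplace) smoothing is one common choice among several.

```python
import math
from collections import Counter


def train(emails):
    """emails: list of (text, label) pairs with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in emails:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def spam_probability(text, model):
    """Return P(spam | text) under the naive independence assumption."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_score = {}
    for label in ("spam", "ham"):
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothed estimate of P(word | label)
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        log_score[label] = score
    # Normalise in log space to avoid underflow
    top = max(log_score.values())
    weights = {k: math.exp(v - top) for k, v in log_score.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


# Hypothetical labeled training set
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
p_spam = spam_probability("free money", model)
is_spam = p_spam > 0.5  # thresholding step
print(f"P(spam) = {p_spam:.2f}, classified as spam: {is_spam}")
```

Working with log probabilities keeps the product of many small term probabilities numerically stable for longer emails.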

## 4. Can the Naïve Bayes classifier be used on continuous numeric data? If so, how can you go about doing it?

**Ans:**

Yes, the Naïve Bayes classifier can be used on continuous numeric data, but it typically requires discretization or the use of probability density functions. 

Here's how you can go about using Naïve Bayes with continuous data:

1. **Discretization**:
   - One common approach is to discretize the continuous data into discrete bins or intervals. This allows you to treat the continuous data as categorical data.
   - For example, if you have a dataset of people's ages, you can create age groups like "under 20," "20-30," "30-40," and so on.
   - After discretization, you can apply the standard Naïve Bayes classifier for categorical data.

2. **Probability Density Functions (PDFs)**:
   - Instead of discretization, you can use probability density functions to model the continuous data directly.
   - You assume a probability distribution for the data, such as a Gaussian (normal) distribution for continuous data.
   - You estimate the parameters (mean and variance) of the distribution from the training data for each class (e.g., spam and non-spam).
   - When a new data point is presented, you calculate the likelihood of it belonging to each class using the PDF of the distribution.
   - Bayes' theorem is then applied to compute the posterior probabilities for each class.
   - The class with the highest posterior probability is the predicted class.

3. **Kernel Density Estimation (KDE)**:
   - KDE is a non-parametric approach that estimates the probability density function of the data without assuming a specific distribution.
   - It's especially useful when the underlying data distribution is not known or when data is multimodal (has multiple peaks).
   - KDE estimates the likelihood of a data point belonging to a class based on its density.

4. **Naïve Bayes Variants**:
   - The variant designed for continuous data is Gaussian Naïve Bayes, which is the PDF approach described above with a Gaussian distribution assumed for each feature within each class.
   - Other variants, such as Multinomial Naïve Bayes (for count-based data) and Bernoulli Naïve Bayes (for binary data), target discrete features, so continuous values must be discretized before using them.
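
To make the PDF approach concrete, here is a minimal from-scratch sketch of Gaussian Naïve Bayes. The class labels, feature values, and function names are all hypothetical, and a small constant is added to each variance only to avoid division by zero.

```python
import math
from statistics import mean, pvariance


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit(samples):
    """samples: dict mapping class label -> list of feature tuples."""
    total = sum(len(rows) for rows in samples.values())
    model = {}
    for label, rows in samples.items():
        columns = list(zip(*rows))  # one tuple of values per feature
        model[label] = {
            "prior": len(rows) / total,
            "params": [(mean(col), pvariance(col) + 1e-9) for col in columns],
        }
    return model


def predict_proba(x, model):
    """Return the posterior probability of each class for feature tuple x."""
    joint = {}
    for label, m in model.items():
        p = m["prior"]
        for value, (mu, var) in zip(x, m["params"]):
            p *= gaussian_pdf(value, mu, var)  # naive independence assumption
        joint[label] = p
    evidence = sum(joint.values())
    return {label: p / evidence for label, p in joint.items()}


# Hypothetical training data: (height in cm, weight in kg) per class
samples = {
    "short": [(150.0, 50.0), (155.0, 55.0), (160.0, 58.0)],
    "tall": [(180.0, 80.0), (185.0, 85.0), (190.0, 90.0)],
}
model = fit(samples)
probs = predict_proba((158.0, 56.0), model)
predicted = max(probs, key=probs.get)
print(predicted, probs)
```

For many features or very small densities, summing log densities instead of multiplying densities is the safer design.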


## 5. What are Bayesian Belief Networks, and how do they work? What are their applications? Are they capable of resolving a wide range of issues?

**Ans:**

Bayesian Belief Networks (BBNs), also known as Bayesian Networks or Bayes Nets, are graphical models that represent probabilistic relationships among a set of variables. BBNs are used to encode and reason about uncertainty, making them a powerful tool in various fields. 

Here's an overview of BBNs:

**How Bayesian Belief Networks Work:**

1. **Graphical Structure:** BBNs consist of two main components: a directed acyclic graph (DAG) and conditional probability tables (CPTs). In the graph:
   - Nodes represent random variables or events.
   - Directed edges indicate probabilistic dependencies between variables. An arrow from node A to node B implies that A influences B.

2. **Conditional Probability Tables (CPTs):** Each node has an associated CPT that specifies the conditional probabilities of the node given its parent nodes in the graph. These tables capture the probabilistic relationships among variables.

3. **Propagation:** BBNs allow for efficient probabilistic inference. Given evidence (observed values) for some variables, BBNs can propagate this evidence to calculate the probabilities of other variables in the network.

4. **Updating Beliefs:** BBNs can update beliefs about a variable as new evidence becomes available. This makes them suitable for dynamic scenarios where information evolves over time.
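
A tiny network makes these mechanics concrete. The sketch below assumes a hypothetical three-node network (Rain and Sprinkler are both parents of Wet Grass) with made-up probabilities, and answers a query by enumerating the joint distribution, which is only practical for very small networks.

```python
# Hypothetical priors for the two root nodes
P_RAIN = {1: 0.2, 0: 0.8}
P_SPRINKLER = {1: 0.1, 0: 0.9}

# CPT for Wet Grass: P(wet = 1 | rain, sprinkler), keyed by (rain, sprinkler)
P_WET = {(1, 1): 0.99, (1, 0): 0.90, (0, 1): 0.80, (0, 0): 0.05}


def joint(rain, sprinkler, wet):
    """Joint probability as the product of each node's CPT entry."""
    p_wet = P_WET[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)


def rain_given_wet(wet=1):
    """P(rain = 1 | wet), summing out the unobserved sprinkler node."""
    scores = {r: sum(joint(r, s, wet) for s in (0, 1)) for r in (0, 1)}
    return scores[1] / (scores[0] + scores[1])


p_rain = rain_given_wet(1)
print(f"P(Rain | Wet Grass) = {p_rain:.3f}")
```

Observing wet grass raises the belief in rain from the prior of 0.2 to roughly 0.65 under these assumed numbers.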

**Applications of Bayesian Belief Networks:**

BBNs have a wide range of applications, including:

1. **Medical Diagnosis:** BBNs are used to assist doctors in diagnosing diseases and medical conditions by combining patient symptoms and test results to provide probabilistic diagnoses.

2. **Risk Assessment:** BBNs help assess risks in various domains, such as finance, insurance, and project management, by modeling uncertainties and identifying potential risks and their impacts.

3. **Natural Language Processing:** BBNs can model language structures, semantics, and disambiguation for applications like speech recognition, machine translation, and sentiment analysis.

4. **Criminal Justice:** BBNs aid in criminal profiling, decision support for law enforcement, and predicting criminal activities by considering various pieces of evidence and indicators.

5. **Environmental Modeling:** BBNs are used to model complex ecosystems, climate systems, and pollution propagation to assess environmental risks and make informed decisions.

6. **Quality Control:** BBNs can help improve quality control processes by modeling production defects, identifying root causes, and optimizing manufacturing processes.

**Capability and Limitations:**

BBNs are versatile and can handle complex, real-world problems by explicitly modeling and reasoning about uncertainty. They provide a structured way to combine domain knowledge with data to make informed decisions. However, they do have some limitations:

1. **Scalability:** As the number of variables and dependencies increases, BBNs can become computationally expensive.

2. **Data Requirements:** BBNs require substantial data for parameter estimation and may not perform well with limited data.

3. **Model Assumptions:** A BBN assumes that each node is conditionally independent of its non-descendants given its parents. If the graph structure omits a real dependency, this assumption does not hold and the inferred probabilities can be inaccurate.

4. **Complexity:** Building BBNs can be complex and requires expertise in modeling and domain knowledge.

In summary, Bayesian Belief Networks are powerful tools for modeling and reasoning about uncertainty, making them suitable for a wide range of applications. However, their performance depends on the quality of data and the modeling process. They are capable of addressing diverse problems but should be used thoughtfully to address the specific requirements of each problem.

## 6. Passengers are checked in an airport screening system to see if there is an intruder. Let I be the random variable that indicates whether someone is an intruder (I = 1) or not (I = 0), and A be the variable that indicates whether the alarm is triggered (A = 1) or not (A = 0). An alarm is triggered for an intruder with probability P(A = 1|I = 1) = 0.98 and, in error, for a non-intruder with probability P(A = 1|I = 0) = 0.001. The probability of an intruder in the passenger population is P(I = 1) = 0.00001. What is the probability that an individual is actually an intruder, given that an alarm is triggered?

### Solution:


By Bayes' theorem:

$P(I = 1 | A = 1) = \frac{P(A = 1 | I = 1) \cdot P(I = 1)}{P(A = 1)}$

Expanding the denominator with the law of total probability:

$P(I = 1 | A = 1) = \frac{P(A = 1 | I = 1) \cdot P(I = 1)}{P(A = 1 | I = 1) \cdot P(I = 1) + P(A = 1 | I = 0) \cdot P(I = 0)}$

$P(I = 1 | A = 1) = \frac{0.98 \cdot 0.00001}{0.98 \cdot 0.00001 + 0.001 \cdot (1 - 0.00001)}$

$P(I = 1 | A = 1) \approx 0.0097$

So, the probability that an individual is actually an intruder (I = 1) given that the alarm was triggered (A = 1) is approximately 0.0097, or about 1%.
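
The same posterior can be checked with exact rational arithmetic, which avoids rounding concerns when the prior is this small. This is a sketch; the variable names are illustrative.

```python
from fractions import Fraction

p_intruder = Fraction(1, 100000)            # P(I = 1)
p_alarm_given_intruder = Fraction(98, 100)  # P(A = 1 | I = 1)
p_alarm_given_innocent = Fraction(1, 1000)  # P(A = 1 | I = 0)

# Law of total probability: P(A = 1)
p_alarm = (
    p_alarm_given_intruder * p_intruder
    + p_alarm_given_innocent * (1 - p_intruder)
)

# Bayes' theorem: probability of an intruder given an alarm
p_intruder_given_alarm = p_alarm_given_intruder * p_intruder / p_alarm
print(p_intruder_given_alarm)                   # 980/100979
print(round(float(p_intruder_given_alarm), 4))  # 0.0097
```

Roughly 99 out of every 100 alarms are therefore false alarms, because intruders are so rare in the passenger population.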

## 7. An antibiotic resistance test (random variable T) has 1% false positives (i.e., 1% of those who are not resistant to an antibiotic show a positive result in the test) and 5% false negatives (i.e., 5% of those actually resistant to an antibiotic test negative). Assume that 2% of those who were screened were antibiotic-resistant. Calculate the probability that a person who tests positive is actually resistant (random variable D).

### Solution:

- T = p means the test is positive,

- T = n means the test is negative,

- D = p means the person is resistant to the antibiotic,

- D = n means the person is not resistant to the antibiotic.

We know:

$$P(T=p|D=n) = 0.01 \quad \text{(false positives)}$$

$$P(T=n|D=p) = 0.05 \quad \text{(false negatives)} \;\Rightarrow\; P(T=p|D=p) = 0.95 \quad \text{(true positives)}$$

$$P(D=p) = 0.02 \;\Rightarrow\; P(D=n) = 0.98$$


We want to know the probability that somebody who tests positive is actually resistant:


$$P(D=p|T=p) = \frac{P(T=p|D=p)P(D=p)}{P(T = p)}$$ (Bayes' theorem)

We do not know $P(T = p)$:

$$P(T=p)= P(T=p|D=p)P(D = p) + P(T=p|D=n)P(D=n)$$


We get:


$$P(D=p|T=p)=\frac{P(T = p|D = p)P(D = p)}{P(T = p)}$$


 $$=\frac{P(T=p|D=p)P(D = p)}{P(T = p|D = p)P(D = p)+P(T=p|D=n)P(D = n)}$$

$$=\frac{0.95 \cdot 0.02}{0.95 \cdot 0.02 + 0.01 \cdot 0.98}$$

$$= \frac{0.019}{0.0288} \approx 0.66$$


There is only about a two-thirds chance that someone with a positive test is actually resistant.
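
One way to sanity-check this result is to restate it as natural frequencies, that is, as head counts in an imagined screening group. The group size of 10,000 below is an arbitrary assumption chosen so that every count is a whole number.

```python
population = 10_000  # hypothetical number of people screened

resistant = population * 2 // 100            # 2% are resistant -> 200
not_resistant = population - resistant       # 9800

true_positives = resistant * 95 // 100       # 95% of resistant test positive -> 190
false_positives = not_resistant * 1 // 100   # 1% of non-resistant test positive -> 98

# Among everyone who tests positive, the fraction that is actually resistant
p_resistant_given_positive = true_positives / (true_positives + false_positives)
print(f"{true_positives} of {true_positives + false_positives} positives "
      f"are resistant: {p_resistant_given_positive:.2f}")
```

Counting people instead of multiplying probabilities gives the same ratio, 190 / 288, and makes it easier to see where the false positives come from.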

## 8. In order to prepare for the test, a student knows that there will be one question in the exam that is of type A, B, or C. The probabilities that the question is of type A, B, or C are 30%, 20%, and 50%, respectively. During preparation, the student solved 9 of 10 type A problems, 2 of 10 type B problems, and 6 of 10 type C problems.
### 1. What is the probability that the student can solve the exam problem?
### 2. Given that the student solved the problem, what is the probability that it was of type A?

### Solution:

### 1.


$P(solved) = P(solved|A)P(A) + P(solved|B)P(B) + P(solved|C)P(C)$

$= 9/10 \cdot 30\% + 2/10 \cdot 20\% + 6/10 \cdot 50\%$

$= 27/100 + 4/100 + 30/100 = 61/100 = 0.61$


### 2.

$P(A|solved) = P(solved|A)P(A)/P(solved)$

$= (9/10 \cdot 30\%)/(61/100) = (27/100)/(61/100) = 27/61 \approx 0.443$
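
Both parts can be verified with exact fractions. The sketch below also computes the posterior for types B and C, which the question does not ask for but which come out of the same calculation; the dictionary names are illustrative.

```python
from fractions import Fraction

# P(type) for each problem type
prior = {"A": Fraction(30, 100), "B": Fraction(20, 100), "C": Fraction(50, 100)}
# P(solved | type), estimated from the practice problems
p_solve = {"A": Fraction(9, 10), "B": Fraction(2, 10), "C": Fraction(6, 10)}

# Part 1: law of total probability
p_solved = sum(p_solve[t] * prior[t] for t in prior)

# Part 2: Bayes' theorem for every type
posterior = {t: p_solve[t] * prior[t] / p_solved for t in prior}

print(p_solved, float(p_solved))  # 61/100 0.61
print(posterior["A"])             # 27/61
```

Although type C is the most likely question a priori, type A accounts for a comparable share of the solved cases because the student solves type A problems much more reliably.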


## 9. A bank installs a CCTV system to track and photograph incoming customers. Although customers arrive continuously, we divide the timeline into 5-minute bins. In each 5-minute period there is a 5% chance that a customer comes into the bank and a 95% chance that there is none (for simplicity, we assume that there is either 1 customer or none, never multiple customers). If there is a customer, the CCTV detects them with 99% probability. If there is no customer, the camera takes a false photograph with 10% probability, triggered by movement from other objects.
### 1. How many customers come into the bank on an average day (10 hours)?
### 2. On a daily basis, how many fake photographs (photographs taken when there is no customer) and how many missed photographs (no photograph taken when there is a customer) are there?
### 3. What is the probability that there is a customer if there is a photograph?

### Solution:

### 1.


There are 10×12 = 120 five-minute periods per day.

In each period a customer is present with a probability of 5%.

Thus the average number of customers per day is 120×5% = 120×0.05 = 6.0

### 2.

On average there is no customer in (120 − 6) of the five-minute periods.

Multiplying by the 10% probability per period of a fake photograph yields (120 − 6) × 10% = 114 × 0.1 = 11.4 fake photographs.


On average there are 6 customers, each of whom has a probability of 1% of being missed. Thus the number of missed photographs is

6 × 1% = 6 × 0.01 = 0.06.

### 3.


For this question we need Bayes' theorem.


$$P(\text{customer}|\text{photograph}) = \frac{P(\text{photograph}|\text{customer})P(\text{customer})}{P(\text{photograph})}$$

$$=\frac{P(\text{photograph}|\text{customer})P(\text{customer})}{P(\text{photograph}|\text{customer})P(\text{customer}) + P(\text{photograph}|\text{no customer})P(\text{no customer})}$$

$$=\frac{(0.99)(0.05)}{(0.99)(0.05) + (0.1)(1 - 0.05)}$$

$$= \frac{0.0495}{0.1445} \approx 0.343$$
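
All three parts can be reproduced in a short script. This is a sketch that works with expected values only (no simulation), and the variable names are illustrative.

```python
periods_per_day = 10 * 60 // 5   # 120 five-minute periods in 10 hours
p_customer = 0.05                # P(customer) per period
p_photo_given_customer = 0.99    # detection probability
p_photo_given_empty = 0.10       # false-photograph probability

# Part 1: expected customers per day
expected_customers = periods_per_day * p_customer

# Part 2: expected fake and missed photographs per day
expected_fake = periods_per_day * (1 - p_customer) * p_photo_given_empty
expected_missed = expected_customers * (1 - p_photo_given_customer)

# Part 3: Bayes' theorem for P(customer | photograph)
p_photo = (
    p_photo_given_customer * p_customer
    + p_photo_given_empty * (1 - p_customer)
)
p_customer_given_photo = p_photo_given_customer * p_customer / p_photo

print(f"customers/day: {expected_customers:.1f}")
print(f"fake photos/day: {expected_fake:.1f}, missed/day: {expected_missed:.2f}")
print(f"P(customer | photograph) = {p_customer_given_photo:.3f}")
```

Only about one photograph in three shows a customer, since empty periods are far more common than occupied ones.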

## 10. Create the conditional probability table associated with the node Won Toss in the Bayesian Belief network to represent the conditional independence assumptions of the Naïve Bayes classifier for the match winning prediction problem in Section 6.4.4.

### Solution:

```python
# Section 6.4.4 is not available here, so the probabilities below are
# assumed (hypothetical) values rather than figures from the textbook.

import pandas as pd

# Conditional probability table P(Won Toss | Match Outcome) in long form;
# in the Naïve Bayes network, Match Outcome is the parent of Won Toss.
data = {
    'Match Outcome': ['Yes (Win)', 'Yes (Win)', 'No (Lose)', 'No (Lose)'],
    'Won Toss': [1, 0, 1, 0],
    'Probability': [0.6, 0.4, 0.3, 0.7]  # Hypothetical probabilities
}

df = pd.DataFrame(data)
df
```

| | Match Outcome | Won Toss | Probability |
|---|---|---|---|
| 0 | Yes (Win) | 1 | 0.6 |
| 1 | Yes (Win) | 0 | 0.4 |
| 2 | No (Lose) | 1 | 0.3 |
| 3 | No (Lose) | 0 | 0.7 |


```python
# Pivot the table so each row is a Match Outcome and each column a Won Toss value
cpt = df.pivot(index='Match Outcome', columns='Won Toss', values='Probability')

# Safeguard only: every combination is present here, so nothing is actually filled
cpt = cpt.fillna(0)

print(cpt)
```

```
Won Toss         0    1
Match Outcome          
No (Lose)      0.7  0.3
Yes (Win)      0.4  0.6
```

Each row is the distribution P(Won Toss | Match Outcome) and sums to 1.
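
To show how such a table would be used for prediction, here is a small sketch in plain Python. It reuses the same hypothetical CPT values and additionally assumes a prior of 0.5 for each match outcome; that prior is an invented value, since the data from Section 6.4.4 is not available.

```python
# Hypothetical CPT: P(Won Toss | Match Outcome), same values as the table above
cpt = {
    "Yes (Win)": {1: 0.6, 0: 0.4},
    "No (Lose)": {1: 0.3, 0: 0.7},
}

# Assumed prior over the class node (not taken from the textbook)
prior = {"Yes (Win)": 0.5, "No (Lose)": 0.5}

# Every row of a valid CPT must sum to 1
for outcome, row in cpt.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9, outcome


def posterior_outcome(won_toss):
    """Return P(Match Outcome | Won Toss = won_toss) via Bayes' theorem."""
    joint = {o: cpt[o][won_toss] * prior[o] for o in cpt}
    total = sum(joint.values())
    return {o: p / total for o, p in joint.items()}


result = posterior_outcome(won_toss=1)
print(result)
```

In a full Naïve Bayes classifier, the same multiplication would include one CPT per feature node, all conditioned on the match outcome.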
