(Maximum_Likelihood)=
# Chapter 2 -- Maximum Likelihood

## Introduction
Maximum Likelihood is a fundamental concept in statistics and machine learning used to estimate the parameters of a model. This chapter provides a mathematical foundation for understanding maximum likelihood and its applications in evaluating the performance of predictive models. Maximum Likelihood Estimation (MLE) is a method used to determine the parameters of a statistical model that maximize the likelihood of the observed data. In other words, it finds the parameters that make the observed data most probable.

Given a statistical model parameterized by $\theta$, and a set of observed data $X = \{ x_1, x_2, \ldots, x_n \}$, the likelihood function $L(\theta)$ is defined as the probability of the observed data given the parameters $\theta$:

$$
L(\theta) = P(X|\theta)
$$


The goal of MLE is to find the parameter $\theta$ that maximizes this likelihood function.
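
In symbols, this goal can be written as an optimization problem. The hat notation $\hat{\theta}_{\text{MLE}}$ is a common convention assumed here for illustration; it is not defined elsewhere in this chapter.

$$
\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} P(X|\theta)
$$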

## Mathematical Formulation

For a set of independent and identically distributed (i.i.d.) data points, the likelihood function can be expressed as the product of the individual probabilities:

$$
L(\theta) = \prod_{i=1}^{n} P(x_i|\theta)
$$


To simplify the computation, we often work with the log-likelihood, which is the natural logarithm of the likelihood function:

$$
\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log P(x_i|\theta)
$$


Maximizing the log-likelihood $\ell(\theta)$ is equivalent to maximizing the likelihood $L(\theta)$.
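
The equivalence can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are ten made-up coin flips modeled as i.i.d. Bernoulli draws with parameter $\theta$, and the names `flips` and `thetas` are hypothetical. It evaluates both $L(\theta)$ and $\ell(\theta)$ on a grid of candidate values and picks the maximizer of each.

```python
import numpy as np

# Hypothetical data: 10 coin flips (1 = heads, 0 = tails),
# assumed to be i.i.d. Bernoulli(theta).
flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# Candidate values of theta; 0 and 1 are excluded so the log stays finite.
thetas = np.linspace(0.01, 0.99, 99)

# P(x_i | theta) for every (theta, x_i) pair:
# theta when x_i == 1, otherwise 1 - theta. Shape: (99, 10).
probs = np.where(flips == 1, thetas[:, None], 1.0 - thetas[:, None])

likelihood = probs.prod(axis=1)             # L(theta): product over data points
log_likelihood = np.log(probs).sum(axis=1)  # l(theta): sum of log-probabilities

theta_hat_from_L = thetas[np.argmax(likelihood)]
theta_hat_from_l = thetas[np.argmax(log_likelihood)]

print(theta_hat_from_L, theta_hat_from_l, flips.mean())
```

Both searches select the same grid point, which here coincides with the sample mean of the flips (7 heads out of 10). Because the logarithm is strictly increasing, it changes the height of the curve but not the location of its maximum, and the sum is numerically safer than a product of many small probabilities.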

Given the probabilities that a model assigns to its predictions, we need to decide what makes a model good. Every model, good or bad, can assign a probability to an event, but some models assign higher probability to what actually happens, and those are the better models.

For example, if an event actually happened, a model that considered it highly likely is better than a model that considered it unlikely. Let's look at the admission example again. The first AI model, coded by Jianou, makes the following predictions:

|   | Student A | Student B | Student C | Student D |
|------|------|------|------|------|
| Prob(Accepted)  | 0.7 | 0.8 | 0.2 | 0.4 |
| Actual Result  | Y | N | Y | N |
| Prob(Event)  | 0.7 | 0.2 | 0.2 | 0.6 |

The AI above predicts that student A has a 70% chance of being accepted by the university. Student A was in fact accepted (Y), so the AI assigned a probability of 70% to the event that actually occurred. For student B, the AI predicts a high chance of acceptance (80%), but the student was rejected (N). The AI therefore assigned only a 20% probability to the event that actually occurred (student B being rejected). The same logic gives 0.2 for student C and 0.6 for student D. To evaluate the overall performance of the AI, we multiply the four Prob(Event) values:

$$
 0.7 \times 0.2 \times 0.2 \times 0.6 = 0.0168 = 1.68\%
$$ (eq2_1)
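
The Prob(Event) row can also be written compactly. As a sketch, using notation that does not appear in the tables, let $p_i$ be the model's Prob(Accepted) for student $i$, and let $y_i = 1$ if the student was accepted and $y_i = 0$ otherwise:

$$
\text{Prob(Event)}_i = p_i^{\,y_i} (1 - p_i)^{1 - y_i},
\qquad
L = \prod_{i=1}^{4} p_i^{\,y_i} (1 - p_i)^{1 - y_i}
$$

For student B in the table above, $p = 0.8$ and $y = 0$, so Prob(Event) $= 0.8^{0} \times 0.2^{1} = 0.2$, which matches the table.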

The second AI model's predictions (shown below) are far closer to the actual results.

|   | Student A | Student B | Student C | Student D |
|------|------|------|------|------|
| Prob(Accepted)  | 0.7 | 0.3 | 0.8 | 0.2 |
| Actual Result  | Y | N | Y | N |
| Prob(Event)  | 0.7 | 0.7 | 0.8 | 0.8 |

So Prob(Event) is high when a prediction is accurate. Conversely, if Jianou's AI keeps making poor predictions, as it did for student B, then Prob(Event) will be very low. Again, we can evaluate the performance of the AI quantitatively by multiplying the Prob(Event) values together.

$$
 0.7 \times 0.7 \times 0.8 \times 0.8 = 0.3136 = 31.36\%
$$ (eq2_2)

This quantitative comparison between the two models shows that the second AI, Cambridge Analytica, is the better prediction model. Choosing the model with the larger product of Prob(Event) values is known as maximum likelihood.

## Example code 

```python
# Function to calculate the likelihood of a model
def calculate_likelihood(predictions, actual_results):
    likelihood = 1.0
    for prediction, actual in zip(predictions, actual_results):
        if actual == 'Y':
            # Event "accepted" happened: the model gave it probability `prediction`
            likelihood *= prediction
        else:
            # Event "rejected" happened: the model gave it probability 1 - `prediction`
            likelihood *= (1 - prediction)
    return likelihood

# Model predictions and actual results
model_1_predictions = [0.7, 0.8, 0.2, 0.4]
model_1_actual_results = ['Y', 'N', 'Y', 'N']

model_2_predictions = [0.7, 0.3, 0.8, 0.2]
model_2_actual_results = ['Y', 'N', 'Y', 'N']

# Calculate likelihoods for both models
likelihood_model_1 = calculate_likelihood(model_1_predictions, model_1_actual_results)
likelihood_model_2 = calculate_likelihood(model_2_predictions, model_2_actual_results)

# Print the likelihoods
print(f"Likelihood of Model 1: {likelihood_model_1:.4f} ({likelihood_model_1 * 100:.2f}%)")
print(f"Likelihood of Model 2: {likelihood_model_2:.4f} ({likelihood_model_2 * 100:.2f}%)")

# Determine which model is better
if likelihood_model_1 > likelihood_model_2:
    print("Model 1 is better based on Maximum Likelihood.")
else:
    print("Model 2 is better based on Maximum Likelihood.")
```

```text
Likelihood of Model 1: 0.0168 (1.68%)
Likelihood of Model 2: 0.3136 (31.36%)
Model 2 is better based on Maximum Likelihood.
```
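
The same comparison can be carried out in log space, following the log-likelihood defined in the Mathematical Formulation section. The sketch below is one possible way to do it, not a prescribed implementation: the helper name `log_likelihood` is hypothetical, and it uses NumPy to sum the logarithms of the Prob(Event) values instead of multiplying them.

```python
import numpy as np

def log_likelihood(predictions, actual_results):
    """Sum of log Prob(Event) over all students (hypothetical helper)."""
    p = np.asarray(predictions, dtype=float)
    accepted = np.asarray(actual_results) == "Y"
    # Prob(Event) is p for an acceptance and 1 - p for a rejection.
    prob_event = np.where(accepted, p, 1.0 - p)
    return float(np.log(prob_event).sum())

actual = ["Y", "N", "Y", "N"]
ll_model_1 = log_likelihood([0.7, 0.8, 0.2, 0.4], actual)
ll_model_2 = log_likelihood([0.7, 0.3, 0.8, 0.2], actual)

print(f"Log-likelihood of Model 1: {ll_model_1:.4f}")
print(f"Log-likelihood of Model 2: {ll_model_2:.4f}")
```

Both log-likelihoods are negative because each Prob(Event) is below 1, and the model with the value closer to zero is the better one. The ranking is the same as with the raw products, since exponentiating each log-likelihood recovers 0.0168 and 0.3136.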


## Conclusion
The Maximum Likelihood method provides a quantitative way to evaluate and compare predictive models. By calculating the product of the probabilities that a model assigned to the observed events, we can determine which model gives higher probability to what actually happened. This approach is widely used in statistics, machine learning, and data science to estimate model parameters and improve predictive performance.