<div style="background-color: #00008B; padding: 20px;">
    <h1 style="font-size: 100px; color: #ffffff;">Maximum Likelihood</h1>
</div>

<div style="border: 2px solid purple; border-radius: 10px; padding: 15px;">
    <h2 style="color: black;">Maximum Likelihood Estimation (MLE):</h2>
    <p style="font-size: 16px; color: black;">
        <b style="color: purple;">Maximum Likelihood Estimation (MLE)</b> is a statistical method used to estimate the parameters of a probability distribution by maximizing a likelihood function. This technique is widely used in various fields, including statistics, machine learning, and data analysis.
    </p>
    <h3 style="color: black;">What is MLE?</h3>
    <p style="font-size: 16px; color: black;">
        MLE seeks the parameter values that make the observed data most probable under the assumed model. By maximizing the likelihood function, we identify the set of parameters that best explains the data. Because it applies to any model with a well-defined likelihood and has well-understood large-sample behavior, it is a cornerstone of statistical inference.
    </p>
    <h3 style="color: black;">Why is MLE Important?</h3>
    <p style="font-size: 16px; color: black;">
        MLE is important because, under standard regularity conditions, it yields estimators with well-understood large-sample behavior. Its main properties are:
    </p>
    <ul style="font-size: 16px; color: black;">
        <li style="margin-bottom: 10px;"><b style="color: purple;">Consistency:</b> As the sample size increases, the MLE estimates converge to the true parameter values.</li>
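        <li style="margin-bottom: 10px;"><b style="color: purple;">Efficiency:</b> As the sample size grows, the variance of the MLE approaches the Cramér-Rao lower bound, the smallest variance an unbiased estimator can have. In finite samples the MLE can be biased and need not have minimum variance.</li>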
        <li style="margin-bottom: 10px;"><b style="color: purple;">Invariance:</b> The MLE of a function of a parameter is the function of the MLE of the parameter.</li>
    </ul>
    <h3 style="color: black;">Applications of MLE</h3>
    <p style="font-size: 16px; color: black;">
        MLE is used extensively in various domains:
    </p>
    <ul style="font-size: 16px; color: black;">
        <li style="margin-bottom: 10px;"><b style="color: purple;">Machine Learning:</b> MLE underlies the training of many models, especially in supervised learning; for example, minimizing the cross-entropy loss in logistic regression is equivalent to maximizing the likelihood.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Econometrics:</b> Used for estimating economic models and understanding relationships between economic variables.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Biostatistics:</b> Helps in the estimation of parameters for various biological processes and medical research.</li>
    </ul>
    <h3 style="color: black;">Challenges and Considerations</h3>
    <p style="font-size: 16px; color: black;">
        While MLE is powerful, it has limitations:
    </p>
    <ul style="font-size: 16px; color: black;">
        <li style="margin-bottom: 10px;"><b style="color: purple;">Computational Complexity:</b> For complex models or large datasets, finding the MLE can be computationally intensive.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Sensitivity to Initial Values:</b> The optimization process may be sensitive to the starting values, potentially leading to local maxima.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Assumptions:</b> MLE assumes that the model is correctly specified; if the model is misspecified, the estimates may be biased and need not converge to the quantities of interest.</li>
    </ul>
    <p style="font-size: 16px; color: black;">
        Despite these challenges, MLE remains a fundamental and widely used method in statistical inference, providing a robust framework for parameter estimation.
    </p>
</div>


<div style="border: 2px solid purple; border-radius: 10px; padding: 15px;">
    <h2 style="color: black;">Principles of Maximum Likelihood Estimation (MLE):</h2>
    <p style="font-size: 16px; color: black;">
        The <b style="color: purple;">Maximum Likelihood Estimation (MLE)</b> method relies on fundamental principles to estimate the parameters of a statistical model. These principles are rooted in the idea of maximizing the likelihood function, which measures how well the model explains the observed data.
    </p>
    <h3 style="color: black;">Likelihood Function</h3>
    <p style="font-size: 16px; color: black;">
        Given a set of independent and identically distributed (i.i.d.) data points \( X_1, X_2, \ldots, X_n \) drawn from a probability distribution with a parameter \( \theta \), the likelihood function \( L(\theta) \) is defined as the joint probability of the observed data (or the joint density, for continuous data), viewed as a function of \( \theta \):
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        L(\theta) = P(X_1, X_2, \ldots, X_n \mid \theta) = \prod_{i=1}^{n} P(X_i \mid \theta)
        \]
    </p>
    <h3 style="color: black;">Log-Likelihood Function</h3>
    <p style="font-size: 16px; color: black;">
        For convenience, the log-likelihood function \( \ell(\theta) \) is often used because it transforms the product of probabilities into a sum, which is easier to differentiate and optimize:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        \ell(\theta) = \log L(\theta) = \log \left( \prod_{i=1}^{n} P(X_i \mid \theta) \right) = \sum_{i=1}^{n} \log P(X_i \mid \theta)
        \]
    </p>
    <h3 style="color: black;">Maximizing the Log-Likelihood</h3>
    <p style="font-size: 16px; color: black;">
        The MLE method estimates the parameter \( \theta \) by finding the value that maximizes the log-likelihood function. This is achieved by solving the following optimization problem:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        \hat{\theta} = \arg \max_{\theta} \ell(\theta)
        \]
    </p>
    <p style="font-size: 16px; color: black;">
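        When the log-likelihood is differentiable and the maximum lies in the interior of the parameter space, we find it by taking the derivative of the log-likelihood function with respect to \( \theta \), setting it to zero, and solving for \( \theta \):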
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        \frac{\partial \ell(\theta)}{\partial \theta} = 0
        \]
    </p>
    <h3 style="color: black;">Second-Order Conditions</h3>
    <p style="font-size: 16px; color: black;">
        To ensure that the solution \( \hat{\theta} \) is indeed a maximum, we also check that the second derivative of the log-likelihood function is negative at \( \hat{\theta} \) (for a vector parameter, that the Hessian is negative definite):
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        \left. \frac{\partial^2 \ell(\theta)}{\partial \theta^2} \right|_{\theta = \hat{\theta}} < 0
        \]
    </p>
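    <p style="font-size: 16px; color: black;">
        As a worked illustration of the first- and second-order conditions, assume (purely for illustration) that the data are Bernoulli, so \( X_i \in \{0, 1\} \) with \( P(X_i = 1 \mid p) = p \), and write \( s = \sum_{i=1}^{n} X_i \) for the number of successes. The log-likelihood and its first-order condition are then:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        \ell(p) = s \log p + (n - s) \log(1 - p), \qquad
        \frac{\partial \ell(p)}{\partial p} = \frac{s}{p} - \frac{n - s}{1 - p} = 0
        \;\Longrightarrow\; \hat{p} = \frac{s}{n}
        \]
    </p>
    <p style="font-size: 16px; color: black;">
        The second derivative confirms that this stationary point is a maximum:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        \frac{\partial^2 \ell(p)}{\partial p^2} = -\frac{s}{p^2} - \frac{n - s}{(1 - p)^2} \lt 0
        \]
    </p>
    <p style="font-size: 16px; color: black;">
        This holds for every \( p \in (0, 1) \) whenever \( 0 \lt s \lt n \), so in this example the sample proportion \( \hat{p} = s / n \) is the unique maximizer.
    </p>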
    <h3 style="color: black;">Properties of MLE</h3>
    <p style="font-size: 16px; color: black;">
        MLE has several desirable properties:
    </p>
    <ul style="font-size: 16px; color: black;">
        <li style="margin-bottom: 10px;"><b style="color: purple;">Consistency:</b> As the sample size increases, the MLE converges to the true parameter value.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Efficiency:</b> Asymptotically, the MLE attains the Cramér-Rao bound, the lowest variance achievable by an unbiased estimator. In finite samples it may be biased and need not attain the bound.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Invariance:</b> If \( \hat{\theta} \) is the MLE for \( \theta \), then \( g(\hat{\theta}) \) is the MLE for \( g(\theta) \) for any function \( g \).</li>
    </ul>
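    <p style="font-size: 16px; color: black;">
        As a small illustration of invariance, assume Bernoulli data with success probability \( p \), for which the MLE is the sample proportion \( \hat{p} = \bar{X} \). The MLE of the odds \( p / (1 - p) \) then follows without any further maximization:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        \widehat{\left( \frac{p}{1 - p} \right)} = \frac{\hat{p}}{1 - \hat{p}} = \frac{\bar{X}}{1 - \bar{X}}
        \]
    </p>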
</div>
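
<p style="font-size: 16px; color: black;">
    The steps above (write the log-likelihood, then maximize it) can be sketched numerically. The following is a minimal illustration, not a general-purpose routine: it assumes Bernoulli data, uses a hypothetical sample of ten observations, and maximizes over a simple grid rather than with a proper optimizer. The function names <code>bernoulli_log_likelihood</code> and <code>grid_mle</code> are invented for this example.
</p>

```python
import math


def bernoulli_log_likelihood(p, data):
    """Log-likelihood: the sum of log P(x_i | p) over 0/1 observations."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)


def grid_mle(data, num_points=999):
    """Return the grid value of p in (0, 1) with the highest log-likelihood."""
    # Grid 0.001, 0.002, ..., 0.999 (endpoints excluded so the logs stay finite).
    grid = [(k + 1) / (num_points + 1) for k in range(num_points)]
    return max(grid, key=lambda p: bernoulli_log_likelihood(p, data))


# Hypothetical sample: 7 successes in 10 trials.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

p_hat_grid = grid_mle(data)
p_hat_closed_form = sum(data) / len(data)  # analytic MLE: the sample proportion

print(p_hat_grid, p_hat_closed_form)
```

<p style="font-size: 16px; color: black;">
    Both values agree with the analytic Bernoulli MLE, the sample proportion. For models without a closed-form solution the grid would normally be replaced by a numerical optimizer.
</p>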


<div style="border: 2px solid purple; border-radius: 10px; padding: 15px;">
    <h2 style="color: black;">Prerequisites for Understanding Maximum Likelihood Estimation (MLE)</h2>
    <p style="font-size: 16px; color: black;">
        To fully grasp the principles of <b style="color: purple;">Maximum Likelihood Estimation (MLE)</b>, it is beneficial for the reader to be familiar with several key concepts in statistics. These concepts provide the foundation upon which MLE is built and help in understanding its theoretical properties and practical applications.
    </p>
    <h3 style="color: black;">1. Cramér-Rao Bound</h3>
    <p style="font-size: 16px; color: black;">
        The <b style="color: purple;">Cramér-Rao Bound (CRB)</b> sets a lower limit on the variance of unbiased estimators, providing a benchmark for the efficiency of an estimator. It is crucial for understanding the optimality of MLEs, which often achieve the CRB asymptotically.
    </p>
    <h3 style="color: black;">2. Hessian</h3>
    <p style="font-size: 16px; color: black;">
        The <b style="color: purple;">Hessian</b> matrix is the matrix of second-order partial derivatives of a function. In the context of MLE, the Hessian of the log-likelihood function is used to assess the curvature of the likelihood surface and plays a role in estimating the variance of the MLE.
    </p>
    <h3 style="color: black;">3. Unbiased Estimators</h3>
    <p style="font-size: 16px; color: black;">
        An <b style="color: purple;">unbiased estimator</b> is an estimator whose expected value equals the true value of the parameter being estimated. Understanding unbiasedness is fundamental in statistics, and while MLEs are not always unbiased, they are often preferred due to their other desirable properties.
    </p>
    <h3 style="color: black;">4. Completeness (Statistics)</h3>
    <p style="font-size: 16px; color: black;">
        <b style="color: purple;">Completeness</b> is a property of a statistic (more precisely, of its family of distributions) stating that no non-trivial function of the statistic has expected value zero for every value of the parameter. It is an advanced concept that, while not used in the definition of MLE, combines with sufficiency to identify minimum-variance unbiased estimators.
    </p>
</div>


<div style="border: 2px dashed black; border-radius: 10px; padding: 15px; background-color: #E5E4E2;">
    <h2 style="color: blue;">Cramér-Rao Bound</h2>
    <p style="font-size: 16px; color: black;">
        The <b style="color: purple;">Cramér-Rao bound (CRB)</b> provides a lower bound on the variance of an unbiased estimator of a parameter. It is a fundamental result in estimation theory, giving us a benchmark to assess the efficiency of an estimator.
    </p>
    <h3 style="color: black;">Mathematical Formulation</h3>
    <p style="font-size: 16px; color: black;">
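        Let \( \theta \) be the parameter to be estimated, and let \( \hat{\theta} \) be an unbiased estimator of \( \theta \). Under standard regularity conditions (for example, a support that does not depend on \( \theta \)), the variance of \( \hat{\theta} \) is bounded below by the inverse of the Fisher information \( I(\theta) \) contained in the observed data: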
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        \text{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}
        \]
    </p>
    <h3 style="color: black;">Fisher Information</h3>
    <p style="font-size: 16px; color: black;">
        The Fisher information \( I(\theta) \) measures the amount of information that an observable random variable \( X \) carries about an unknown parameter \( \theta \). It is defined as:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        I(\theta) = \mathbb{E} \left[ \left( \frac{\partial}{\partial \theta} \log f(X; \theta) \right)^2 \right]
        \]
    </p>
    <p style="font-size: 16px; color: black;">
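        where \( f(X; \theta) \) is the probability density function (pdf) of \( X \) given \( \theta \), or the probability mass function for discrete data, and \( \mathbb{E} \) denotes the expectation with respect to the distribution of \( X \). For an i.i.d. sample of size \( n \), the information in the whole sample is \( n \) times the information in a single observation, so the bound for the sample is \( 1 / (n I(\theta)) \) with \( I(\theta) \) computed for one observation.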
    </p>
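    <p style="font-size: 16px; color: black;">
        As a worked illustration, assume a single Bernoulli observation \( X \in \{0, 1\} \) with \( f(X; p) = p^{X} (1 - p)^{1 - X} \). The score and the Fisher information are:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        \frac{\partial}{\partial p} \log f(X; p) = \frac{X}{p} - \frac{1 - X}{1 - p} = \frac{X - p}{p (1 - p)}, \qquad
        I(p) = \frac{\mathbb{E}\left[ (X - p)^2 \right]}{p^2 (1 - p)^2} = \frac{1}{p (1 - p)}
        \]
    </p>
    <p style="font-size: 16px; color: black;">
        For \( n \) i.i.d. observations the information is \( n / (p (1 - p)) \), so the bound on the variance of an unbiased estimator of \( p \) is \( p (1 - p) / n \). The sample proportion \( \bar{X} \) is unbiased with exactly this variance, so in this example it attains the bound.
    </p>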
    <h3 style="color: black;">Implications of the Cramér-Rao Bound</h3>
    <p style="font-size: 16px; color: black;">
        The CRB has several important implications:
    </p>
    <ul style="font-size: 16px; color: black;">
        <li style="margin-bottom: 10px;"><b style="color: purple;">Efficiency:</b> An estimator that achieves the CRB is considered efficient, as it has the smallest possible variance among all unbiased estimators.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Benchmarking:</b> The CRB provides a benchmark to compare the performance of different estimators.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Optimality:</b> Maximum likelihood estimators (MLEs) often achieve the CRB asymptotically, meaning as the sample size increases, their variance approaches the bound.</li>
    </ul>
</div>
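
<p style="font-size: 16px; color: black;">
    The benchmarking role of the bound can be checked by simulation. The sketch below assumes Bernoulli data, for which the bound on an unbiased estimator of \( p \) from \( n \) observations is \( p(1 - p)/n \), and compares it with the empirical variance of the sample proportion. The settings (\( p = 0.3 \), \( n = 50 \), 5000 replications, a fixed seed) and the function name are arbitrary choices made for illustration.
</p>

```python
import random
import statistics


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` Bernoulli(p) samples of size n; return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# Hypothetical settings chosen only for illustration.
p, n, reps = 0.3, 50, 5000

estimates = simulate_sample_proportions(p, n, reps)
empirical_variance = statistics.pvariance(estimates)

# Cramér-Rao bound 1 / (n * I(p)) with I(p) = 1 / (p * (1 - p)) per observation.
cramer_rao_bound = p * (1 - p) / n

print(empirical_variance, cramer_rao_bound)
```

<p style="font-size: 16px; color: black;">
    The two printed numbers should be close, with the gap shrinking as the number of replications grows, because the sample proportion attains the bound in this model.
</p>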


<div style="border: 2px dashed black; border-radius: 10px; padding: 15px; background-color: #E5E4E2;">
    <h2 style="color: blue;">Hessian Matrix</h2>
    <p style="font-size: 16px; color: black;">
        The <b style="color: purple;">Hessian matrix</b> is a square matrix of second-order partial derivatives of a scalar-valued function. It describes the local curvature of the function and is used extensively in optimization problems, including Maximum Likelihood Estimation (MLE).
    </p>
    <h3 style="color: black;">Mathematical Definition</h3>
    <p style="font-size: 16px; color: black;">
        For a scalar-valued function \( f(\theta) \) where \( \theta \) is a vector of parameters \( (\theta_1, \theta_2, \ldots, \theta_n) \), the Hessian matrix \( H \) is defined as:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        H(f(\theta)) = \begin{bmatrix}
        \frac{\partial^2 f}{\partial \theta_1^2} & \frac{\partial^2 f}{\partial \theta_1 \partial \theta_2} & \cdots & \frac{\partial^2 f}{\partial \theta_1 \partial \theta_n} \\
        \frac{\partial^2 f}{\partial \theta_2 \partial \theta_1} & \frac{\partial^2 f}{\partial \theta_2^2} & \cdots & \frac{\partial^2 f}{\partial \theta_2 \partial \theta_n} \\
        \vdots & \vdots & \ddots & \vdots \\
        \frac{\partial^2 f}{\partial \theta_n \partial \theta_1} & \frac{\partial^2 f}{\partial \theta_n \partial \theta_2} & \cdots & \frac{\partial^2 f}{\partial \theta_n^2}
        \end{bmatrix}
        \]
    </p>
    <h3 style="color: black;">Use of Hessian in Maximum Likelihood Estimation (MLE)</h3>
    <p style="font-size: 16px; color: black;">
        In the context of MLE, we aim to find the parameter vector \( \theta \) that maximizes the log-likelihood function \( \ell(\theta; \mathbf{x}) \). The Hessian matrix of the log-likelihood function provides valuable information for this optimization process.
    </p>
    <p style="font-size: 16px; color: black;">
        Specifically, the Hessian is used to:
    </p>
    <ul style="font-size: 16px; color: black;">
        <li style="margin-bottom: 10px;"><b style="color: purple;">Determine the curvature:</b> The Hessian helps us understand the curvature of the log-likelihood function around the maximum likelihood estimate. A negative definite Hessian at a stationary point indicates a local maximum.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Estimate the variance:</b> The inverse of the negative Hessian (the observed information matrix) at the maximum likelihood estimate provides an estimate of the covariance matrix of the parameter estimates. This is crucial for constructing confidence intervals and hypothesis tests.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Optimize the log-likelihood function:</b> Numerical optimization algorithms, such as Newton-Raphson, use the Hessian to iteratively find the parameter values that maximize the log-likelihood function.</li>
    </ul>
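    <p style="font-size: 16px; color: black;">
        As a worked illustration, assume an i.i.d. sample from a normal distribution with parameters \( \theta = (\mu, \sigma^2) \). The MLEs are \( \hat{\mu} = \bar{x} \) and \( \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \), and the Hessian of the log-likelihood evaluated there is:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        H(\hat{\mu}, \hat{\sigma}^2) = \begin{bmatrix}
        -\dfrac{n}{\hat{\sigma}^2} & 0 \\
        0 & -\dfrac{n}{2 \hat{\sigma}^4}
        \end{bmatrix},
        \qquad
        \left( -H \right)^{-1} = \begin{bmatrix}
        \dfrac{\hat{\sigma}^2}{n} & 0 \\
        0 & \dfrac{2 \hat{\sigma}^4}{n}
        \end{bmatrix}
        \]
    </p>
    <p style="font-size: 16px; color: black;">
        Both diagonal entries of \( H \) are negative, so the Hessian is negative definite and the stationary point is a maximum. The inverse of \( -H \) gives the estimated variances \( \hat{\sigma}^2 / n \) for \( \hat{\mu} \) and \( 2 \hat{\sigma}^4 / n \) for \( \hat{\sigma}^2 \).
    </p>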
</div>
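
<p style="font-size: 16px; color: black;">
    The Newton-Raphson update and the Hessian-based variance estimate can be sketched for a one-parameter model, where the Hessian reduces to the second derivative. The example assumes exponential data with rate \( \lambda \), so that \( \ell(\lambda) = n \log \lambda - \lambda \sum x_i \). The data, the starting value, and the function name are hypothetical and chosen only so that the result can be checked against the closed form \( \hat{\lambda} = 1 / \bar{x} \).
</p>

```python
import math


def newton_raphson_exponential_rate(data, lam0, tol=1e-12, max_iter=100):
    """Maximize the exponential log-likelihood in the rate via Newton-Raphson."""
    n = len(data)
    total = sum(data)
    lam = lam0
    for _ in range(max_iter):
        score = n / lam - total      # first derivative of the log-likelihood
        hessian = -n / lam ** 2      # second derivative (1x1 Hessian)
        step = score / hessian
        lam -= step                  # Newton update: lam - H^{-1} * score
        if abs(step) < tol:
            break
    return lam


# Hypothetical data: n = 5 and sum = 6, so the closed-form MLE is 5/6.
data = [0.5, 1.5, 1.0, 2.0, 1.0]
lam_hat = newton_raphson_exponential_rate(data, lam0=0.5)

# Observed information = negative Hessian at the MLE; its inverse estimates Var(lam_hat).
observed_information = len(data) / lam_hat ** 2
standard_error = math.sqrt(1.0 / observed_information)

print(lam_hat, standard_error)
```

<p style="font-size: 16px; color: black;">
    For this particular model the iteration converges only when the starting value lies between 0 and twice the MLE, which is a small instance of the sensitivity to initial values that Newton-type methods can show.
</p>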


<div style="border: 2px dashed black; border-radius: 10px; padding: 15px; background-color: #E5E4E2;">
    <h2 style="color: blue;">Unbiased Estimators</h2>
    <p style="font-size: 16px; color: black;">
        An <b style="color: purple;">unbiased estimator</b> is a statistical estimator whose expected value is equal to the true value of the parameter being estimated. This property ensures that the estimator does not systematically overestimate or underestimate the parameter.
    </p>
    <h3 style="color: black;">Mathematical Definition</h3>
    <p style="font-size: 16px; color: black;">
        Let \( \theta \) be the true value of a parameter and let \( \hat{\theta} \) be an estimator of \( \theta \). The estimator \( \hat{\theta} \) is unbiased if:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        \[
        \mathbb{E}[\hat{\theta}] = \theta
        \]
    </p>
    <p style="font-size: 16px; color: black;">
        where \( \mathbb{E}[\hat{\theta}] \) denotes the expected value of \( \hat{\theta} \).
    </p>
    <h3 style="color: black;">Use of Unbiased Estimators in Maximum Likelihood Estimation (MLE)</h3>
    <p style="font-size: 16px; color: black;">
        While unbiasedness is a desirable property in an estimator, MLEs are not always unbiased. However, MLEs have several important properties that make them widely used in practice:
    </p>
    <ul style="font-size: 16px; color: black;">
        <li style="margin-bottom: 10px;"><b style="color: purple;">Asymptotic Unbiasedness:</b> As the sample size increases, the bias of the MLE typically diminishes, making the MLE asymptotically unbiased.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Efficiency:</b> MLEs often achieve the Cramér-Rao bound asymptotically, meaning they have the lowest possible variance among all unbiased estimators for large sample sizes.</li>
        <li style="margin-bottom: 10px;"><b style="color: purple;">Consistency:</b> MLEs are consistent estimators, meaning they converge in probability to the true parameter value as the sample size tends to infinity.</li>
    </ul>
    <p style="font-size: 16px; color: black;">
        Understanding the concept of unbiased estimators helps in appreciating the properties and performance of MLEs in statistical inference.
    </p>
</div>
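
<p style="font-size: 16px; color: black;">
    A standard example of a biased MLE is the variance of a normal distribution: the MLE divides the sum of squared deviations by \( n \), and its expected value is \( \frac{n-1}{n} \sigma^2 \) rather than \( \sigma^2 \). The simulation below is a sketch under assumed settings (\( n = 5 \), \( \sigma = 2 \), 20000 replications, a fixed seed) that compares the average of the MLE with the average of the unbiased estimator that divides by \( n - 1 \). The function name is invented for this example.
</p>

```python
import random


def average_variance_estimates(n, reps, mu=0.0, sigma=2.0, seed=0):
    """Average the MLE (divide by n) and unbiased (divide by n - 1) variance estimates."""
    rng = random.Random(seed)
    sum_mle = 0.0
    sum_unbiased = 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        squared_deviations = sum((x - mean) ** 2 for x in sample)
        sum_mle += squared_deviations / n
        sum_unbiased += squared_deviations / (n - 1)
    return sum_mle / reps, sum_unbiased / reps


# Hypothetical settings: true variance is sigma**2 = 4.
n, reps = 5, 20000
mean_mle, mean_unbiased = average_variance_estimates(n, reps)

print(mean_mle, mean_unbiased)
```

<p style="font-size: 16px; color: black;">
    With these settings the average of the MLE should be near \( \frac{4}{5} \times 4 = 3.2 \) and the average of the unbiased estimator near 4. The ratio \( \frac{n-1}{n} \) tends to 1 as \( n \) grows, which is why the MLE is asymptotically unbiased here.
</p>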


<div style="border: 2px dashed black; border-radius: 10px; padding: 15px;  background-color: #E5E4E2;">
    <h2 style="color: blue;">Completeness (Statistics)</h2>
    <p style="font-size: 16px; color: black;">
        <b style="color: purple;">Completeness</b> is a concept in statistical theory describing a statistic whose family of distributions is rich enough that the only function of the statistic with expected value zero for every parameter value is the zero function. Completeness is distinct from sufficiency, which is the property of retaining all the information about the parameter contained in the data; the two properties are most useful together.
    </p>
    <h3 style="color: black;">Mathematical Definition</h3>
    <p style="font-size: 16px; color: black;">
        Let \( T(\mathbf{X}) \) be a statistic, where \( \mathbf{X} = (X_1, X_2, \ldots, X_n) \) is a random sample from a probability distribution with parameter \( \theta \). The statistic \( T(\mathbf{X}) \) is said to be complete if, for any measurable function \( g \) that does not depend on \( \theta \), the following implication holds:
    </p>
    <p style="font-size: 16px; color: black; margin-left: 40px;">
        If \( \mathbb{E}_\theta[g(T)] = 0 \) for all \( \theta \), then \( P_\theta(g(T) = 0) = 1 \) for all \( \theta \).
    </p>
    <p style="font-size: 16px; color: black;">
        In simpler terms, if the expected value of any function of the statistic is zero for all possible values of the parameter, then that function must be zero almost surely.
    </p>
    <h3 style="color: black;">Importance and Example</h3>
    <p style="font-size: 16px; color: black;">
        Completeness matters mainly in combination with sufficiency. By the Lehmann-Scheffé theorem, an unbiased estimator that is a function of a complete sufficient statistic is the unique minimum-variance unbiased estimator of its expected value. Completeness by itself does not make an estimator unbiased or efficient.
    </p>
    <p style="font-size: 16px; color: black;">
        An example of a complete statistic is the sample mean \( \bar{X} \) for a random sample from a normal distribution with known variance \( \sigma^2 \). The sample mean is also sufficient for the mean parameter \( \mu \), and because it is both complete and sufficient, \( \bar{X} \) is the unique minimum-variance unbiased estimator of \( \mu \).
    </p>
    <p style="font-size: 16px; color: black;">
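        On the other hand, the sample range \( R = \max(X) - \min(X) \) for a random sample from a normal distribution is not a sufficient statistic for the standard deviation \( \sigma \). For samples of more than two observations, it uses only the two extreme values and discards information about \( \sigma \) held in the rest of the sample. This is a failure of sufficiency rather than of completeness, and it makes range-based estimators of \( \sigma \) less efficient than those based on the sample variance.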
    </p>
    <p style="font-size: 16px; color: black;">
        Understanding completeness helps in selecting appropriate statistics for parameter estimation and assessing the efficiency and reliability of estimators.
    </p>
    <h3 style="color: black;">Example Violating Completeness</h3>
    <p style="font-size: 16px; color: black;">
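        Consider two independent observations \( X_1, X_2 \) from a normal distribution with unknown mean \( \theta \) and variance 1, and take the whole sample \( T(\mathbf{X}) = (X_1, X_2) \) as the statistic, which is trivially sufficient. Let \( g(T) = X_1 - X_2 \). Then \( \mathbb{E}_\theta[g(T)] = 0 \) for all values of \( \theta \), yet \( g(T) \) follows a normal distribution with mean 0 and variance 2 and is therefore non-zero almost surely. Hence \( T \) is not complete. Note that \( g \) must be a function of \( T \) alone; an expression such as \( T^2 - \mathbb{E}_\theta[T^2] \) is not a valid choice, because \( \mathbb{E}_\theta[T^2] \) generally depends on \( \theta \).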
    </p>
</div>
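
<p style="font-size: 16px; color: black;">
    A simple failure of completeness can be checked by simulation. Assume two independent observations \( X_1, X_2 \) from a normal distribution with mean \( \theta \) and variance 1, and take \( g(X_1, X_2) = X_1 - X_2 \). Its expected value is zero for every \( \theta \), yet it is almost never zero, so the statistic \( (X_1, X_2) \) is not complete. The sketch below uses arbitrary values of \( \theta \), a fixed seed, and a function name invented for illustration.
</p>

```python
import random


def difference_summary(theta, reps=20000, seed=0):
    """Return the mean and mean absolute value of X1 - X2 for X1, X2 ~ N(theta, 1)."""
    rng = random.Random(seed)
    diffs = [rng.gauss(theta, 1.0) - rng.gauss(theta, 1.0) for _ in range(reps)]
    mean_diff = sum(diffs) / reps
    mean_abs_diff = sum(abs(d) for d in diffs) / reps
    return mean_diff, mean_abs_diff


# Hypothetical parameter values chosen only for illustration.
results = {theta: difference_summary(theta) for theta in (-2.0, 0.0, 3.0)}

for theta, (mean_diff, mean_abs_diff) in results.items():
    print(theta, mean_diff, mean_abs_diff)
```

<p style="font-size: 16px; color: black;">
    For every \( \theta \) the average difference should be near zero while the average absolute difference stays near \( 2 / \sqrt{\pi} \approx 1.13 \), which shows a non-zero function of the statistic whose expectation vanishes for all parameter values.
</p>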
