Maximum Likelihood Estimation (MLE) and Least Squares Estimation (LSE) are two fundamental approaches to statistical estimation and inference, each with its own principles and applications. Both estimate the parameters of a model from a set of observations, but they differ in their methodologies, their assumptions, and the contexts in which they are most effectively applied.

### Maximum Likelihood Estimation (MLE)

1. **Principle**: MLE seeks the parameter values that maximize the likelihood function: the probability (or, for continuous data, the probability density) of the observed data under a specified statistical model, viewed as a function of the model's parameters.

2. **Assumptions**: It requires assumptions about the statistical distribution that the data follows. The choice of distribution (e.g., normal, binomial) is crucial for the formulation of the likelihood function.

3. **Application**: MLE is widely used in various fields for parameter estimation when the underlying distribution of the data is known or assumed. It is especially powerful in complex models, including those where parameters are not linearly related to the predictors.

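4. **Properties**: Under certain regularity conditions, maximum likelihood estimators are consistent, asymptotically normal, and asymptotically efficient: as the sample size grows, their variance approaches the Cramér-Rao lower bound. In finite samples they can be biased; for example, the MLE of a normal variance divides by $n$ rather than $n - 1$.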
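
As a minimal sketch of the principle, the snippet below maximizes a Bernoulli log-likelihood by brute-force grid search. The coin-flip data, the `loglik` helper, and the grid resolution are all assumptions made for illustration; in practice a closed form or a numerical optimizer would normally replace the grid.

```julia
# Hypothetical data: 10 coin flips, 1 = heads, 0 = tails (illustrative only)
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Bernoulli log-likelihood of the sample for success probability p
loglik(p, x) = sum(xi == 1 ? log(p) : log(1 - p) for xi in x)

# Evaluate the log-likelihood on a grid of candidate values and keep the maximizer
grid = 0.01:0.01:0.99
p_hat = grid[argmax([loglik(p, flips) for p in grid])]

println("Grid-search MLE of p: ", p_hat)
println("Closed-form MLE (sample proportion): ", sum(flips) / length(flips))
```

For this model the maximizer should agree with the sample proportion of heads, which is the known closed-form MLE for a Bernoulli parameter.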

### Least Squares Estimation (LSE)

1. **Principle**: LSE seeks the parameter values that minimize the sum of the squared differences (residuals) between the observed values and the values predicted by the model. It focuses on minimizing the error in terms of the Euclidean distance between observed and predicted values.

2. **Assumptions**: In its basic (ordinary) form, LSE assumes that the model is linear in its parameters and that the errors are independent with constant variance (homoscedasticity). Nonlinear least squares methods relax the linearity assumption.

3. **Application**: LSE is the foundation of linear regression and is extensively used in fitting linear models. It is preferred for its simplicity and computational efficiency in problems where the relationship between variables is expected to be linear or approximated as linear.

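4. **Properties**: In a linear regression model, least squares estimators are unbiased whenever the errors have zero mean given the predictors; normality is not required. Under the Gauss-Markov conditions (zero-mean, uncorrelated, homoscedastic errors), they are the best linear unbiased estimators (BLUE), meaning they have the smallest variance among all linear unbiased estimators.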
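
The minimization described above has a closed-form solution for linear models. The following sketch, using made-up observations and variable names chosen here for illustration, solves the normal equations $(X^\top X)\beta = X^\top y$ directly and compares the result with Julia's backslash operator, which solves the same least squares problem for a tall matrix.

```julia
using LinearAlgebra

# Hypothetical noisy observations, for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]

# Design matrix: a column of ones (intercept) next to the predictor
X = hcat(ones(length(x)), x)

# Solve the normal equations (X'X) β = X'y
β_normal = (X' * X) \ (X' * y)

# Backslash on a tall matrix returns the least squares solution,
# typically with better numerical conditioning than the normal equations
β_ls = X \ y

println("Normal equations: ", β_normal)
println("Backslash solve:  ", β_ls)

# At the minimum, the residuals are orthogonal to the columns of X
println("Norm of X' * residuals: ", norm(X' * (y - X * β_ls)))
```

The last line checks the defining property of the least squares fit: the residual vector is orthogonal to every column of the design matrix, so the printed norm should be close to zero.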

### Key Differences

- **Foundation**: MLE is based on probability and focuses on maximizing the likelihood of observing the data given the parameters, while LSE is based on geometry/minimization of error and focuses on minimizing the discrepancy between observed values and those predicted by the model.
- **Assumptions**: MLE requires assumptions about the entire probability distribution of the data, whereas LSE typically revolves around the assumption of a linear relationship and homoscedasticity.
- **Application Contexts**: MLE is more versatile in handling complex models and distributions, making it suitable for a wide range of statistical models beyond linear regression. LSE is predominantly used in linear models and for problems where minimizing the error sum of squares is directly relevant.
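
The two criteria can be connected by a short derivation. As a sketch, assume a model $y_i = f(x_i; \theta) + \varepsilon_i$ with independent errors $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$; the symbols $f$, $\theta$, and $\varepsilon_i$ are notation introduced here for illustration. The log-likelihood is then

$$
\log L(\theta, \sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i; \theta)\bigr)^2 .
$$

For any fixed $\sigma^2$, only the last term depends on $\theta$, so maximizing the likelihood over $\theta$ is the same as minimizing the sum of squared residuals:

$$
\hat{\theta}_{\text{MLE}} = \arg\min_{\theta} \sum_{i=1}^{n}\bigl(y_i - f(x_i; \theta)\bigr)^2 = \hat{\theta}_{\text{LSE}} .
$$

Under this Gaussian error assumption the two estimators of $\theta$ coincide; with a different error distribution they generally differ.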

Both methods have their strengths and are chosen based on the specifics of the statistical problem at hand, including the underlying assumptions about the data and the model.

To illustrate both Maximum Likelihood Estimation (MLE) and Least Squares Estimation (LSE) in Julia, let's consider simple examples: estimating the parameters of a normal distribution using MLE and fitting a linear model using LSE.

### Example 1: Maximum Likelihood Estimation (MLE) for a Normal Distribution

Suppose we have a sample from a normal distribution, and we want to estimate its mean ($\mu$) and standard deviation ($\sigma$) using MLE.



In [1]:
using Statistics

# Sample data
data = [1.2, 2.3, 2.1, 1.8, 2.5, 1.9]

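# MLE estimates for a normal distribution
μ_hat = mean(data)
σ_hat = std(data; corrected=false)  # MLE divides by n; the default divides by n - 1

println("MLE Estimates:")
println("Mean (μ) estimate: ", round(μ_hat, digits=4))
println("Standard Deviation (σ) estimate: ", round(σ_hat, digits=4))


MLE Estimates:
Mean (μ) estimate: 1.9667
Standard Deviation (σ) estimate: 0.415


This code computes the MLE estimates directly, because they have closed forms for a normal distribution: the MLE of $\mu$ is the sample mean, and the MLE of $\sigma$ is the square root of the average squared deviation from that mean, dividing by $n$. Julia's `std` divides by $n - 1$ by default (Bessel's correction), which is not the MLE; passing `corrected=false` gives the maximum likelihood value. On this sample the default `std(data)` returns about 0.4546 rather than 0.415.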
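
The choice of denominator can be checked against the likelihood itself. The sketch below defines the negative log-likelihood of an i.i.d. normal sample (the `nll` helper and the names `σ_mle` and `σ_bessel` are introduced here for illustration) and evaluates it at both candidate values of $\sigma$; the $n$-denominator estimate should give the lower value, since it is the maximizer of the likelihood.

```julia
using Statistics

data = [1.2, 2.3, 2.1, 1.8, 2.5, 1.9]
n = length(data)

# Negative log-likelihood of an i.i.d. normal sample with mean μ and standard deviation σ
nll(μ, σ) = n * log(σ) + n / 2 * log(2π) + sum((data .- μ) .^ 2) / (2σ^2)

μ_hat = mean(data)
σ_mle = std(data; corrected=false)  # divides by n
σ_bessel = std(data)                # divides by n - 1 (Julia's default)

println("NLL at (μ_hat, σ_mle):    ", nll(μ_hat, σ_mle))
println("NLL at (μ_hat, σ_bessel): ", nll(μ_hat, σ_bessel))
```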

### Example 2: Least Squares Estimation (LSE) for Linear Regression

For LSE, we'll fit a simple linear model to a handful of data points. We'll use Julia's `GLM` package, whose `lm` function estimates the regression coefficients by ordinary least squares.


In [9]:
using GLM
using DataFrames

# Sample data
data = DataFrame(X = [1, 2, 3, 4, 5], Y = [2, 4, 5, 4, 5])

# Fit a linear model
model = lm(@formula(Y ~ X), data)

println("LSE Linear Regression Coefficients:")
println(coef(model))


LSE Linear Regression Coefficients:
[2.1999999999999993, 0.6000000000000003]


Before running this code, ensure you have the necessary packages installed:

```julia
using Pkg
Pkg.add("GLM")
Pkg.add("DataFrames")
```

This example fits a linear regression model to the `data` DataFrame using the `lm` function from the `GLM` package, which estimates the coefficients by ordinary least squares, that is, by minimizing the sum of squared residuals between the observed and predicted `Y` values. `coef(model)` returns the intercept first and then the slope for `X`, so the fitted line is approximately $Y = 2.2 + 0.6X$.
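
The same coefficients can be reproduced without `GLM`, which makes the least squares criterion explicit. This sketch uses only the `Statistics` standard library and the textbook closed form for simple regression (slope equals covariance over variance); the `ssr` helper is a name introduced here for illustration.

```julia
using Statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Closed-form simple-regression estimates
slope = cov(x, y) / var(x)
intercept = mean(y) - slope * mean(x)

# Sum of squared residuals for a candidate line b0 + b1 * x
ssr(b0, b1) = sum((y .- (b0 .+ b1 .* x)) .^ 2)

println("Intercept ≈ ", round(intercept, digits=4), ", slope ≈ ", round(slope, digits=4))
println("SSR at the least squares fit: ", ssr(intercept, slope))
println("SSR after nudging the slope:  ", ssr(intercept, slope + 0.1))
```

Moving either coefficient away from the fitted values should increase the sum of squared residuals, which is what it means for the fit to be the least squares solution.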