This notebook demonstrates how to use Python, NumPy, and SciPy to estimate the parameters of a normal distribution from a randomly generated set of data.  It uses the maximum likelihood estimation (MLE) approach to find the parameters.

Created on August 24, 2022 by Kevin Spradlin, Jr.

First, import some Python modules.

In [6]:
import typing
import math
import numpy as np
import scipy.stats as stats
import scipy.optimize as optimize

Next, define a function for generating random standard normal variates.

In [4]:
def std_normal_leva() -> float:
    """
    Return one random variate from a standard normal distribution.
    It uses the Leva ratio of uniforms method.

    J. L. Leva, "A Fast Normal Random Number Generator", ACM Transactions on
    Mathematical Software, 18, 449-455, 1992.

    Created on August 24, 2022
    """

    # constants from Leva's paper
    param_s: float = 0.449871
    param_t: float = -0.386595
    param_a: float = 0.19600
    param_b: float = 0.25472
    param_r1: float = 0.27597
    param_r2: float = 0.27846
    param_ratio: float = 0.857764    # sqrt(2.0 / e)

    while True:
        # draw two uniforms on [0, 1); the second is rescaled to a symmetric range
        uniform_variates: np.ndarray = np.random.rand(2)
        interm_v2: float = (2.0 * uniform_variates[1] - 1.0) * param_ratio

        interm_x: float = uniform_variates[0] - param_s
        interm_y: float = math.fabs(interm_v2) - param_t

        # quadratic form q = x^2 + y * (a*y - b*x)
        interm_q: float = (param_a * interm_y) - (param_b * interm_x)
        interm_q = (interm_q * interm_y) + (interm_x * interm_x)

        # reject values too close to zero for log() and for the final division
        if uniform_variates[0] > 0.00000001:
            if interm_q < param_r1:
                # below the inner threshold: accept without the exact test
                break
            elif interm_q < param_r2:
                # between the thresholds: exact test v^2 < -4 * u^2 * ln(u)
                test_value: float = math.log(uniform_variates[0])
                test_value = (test_value * -4.0 * uniform_variates[0] * uniform_variates[0]) - (interm_v2 * interm_v2)

                if test_value > 0.0:
                    break

    return float(interm_v2 / uniform_variates[0])
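
A quick way to gain confidence in a sampler like this is a Kolmogorov-Smirnov test against the standard normal CDF. The sketch below is an illustration, not part of the original notebook: `leva_sketch`, the seed, and the sample size are assumptions, and the function is a condensed restatement of the same accept/reject logic so that the block runs on its own.

```python
import math

import numpy as np
from scipy import stats


def leva_sketch(rng: np.random.Generator) -> float:
    """Condensed restatement of the ratio-of-uniforms sampler (hypothetical helper)."""
    while True:
        u, v = rng.random(2)
        v = (2.0 * v - 1.0) * 0.857764           # rescale by sqrt(2 / e)
        x = u - 0.449871
        y = abs(v) + 0.386595
        q = x * x + y * (0.19600 * y - 0.25472 * x)
        if u <= 1e-8:                            # too small for log() and division
            continue
        if q < 0.27597:                          # quick accept
            return float(v / u)
        if q < 0.27846 and v * v < -4.0 * u * u * math.log(u):
            return float(v / u)                  # accepted by the exact test


rng = np.random.default_rng(seed=2022)           # arbitrary seed, for repeatability
ks_sample = np.array([leva_sketch(rng) for _ in range(5000)])

# compare the empirical distribution with the standard normal CDF
ks_result = stats.kstest(ks_sample, "norm")
print(f"KS statistic: {ks_result.statistic:.4f}, p-value: {ks_result.pvalue:.4f}")
```

A small KS statistic (and a p-value that is not tiny) is consistent with the variates being standard normal; it does not prove the generator is correct.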

Now, create a set of random normal variates using the ***std_normal_leva*** function.

In [26]:
# let the user define the mean and standard deviation of the normal distribution
mean: float = -0.0145
std_dev: float = 0.245
    

# generate 10,000 random normal variates
random_sample: np.ndarray = np.asarray([mean + std_dev * std_normal_leva() for __ in range(10000)], dtype=np.float32)


Before we use the MLE approach, let's calculate the basic statistics of the sample of random normal variates.  The sample mean and standard deviation should be close to the values you defined.  They should also nearly match the MLE results, since for a normal distribution the parameters that maximize the likelihood function are the sample mean and the biased (divide-by-n) standard deviation.  Note that stats.describe reports the unbiased (divide-by-(n-1)) variance by default; with 10,000 variates the difference is negligible.  The sample statistics should also show that the skewness and excess kurtosis are close to zero.

In [27]:
results = stats.describe(random_sample)
# use the named fields of the result instead of positional indexes
print(f"Sample mean: {results.mean:12.4f}")
print(f"Sample standard deviation: {results.variance**0.5:12.4f}")
print(f"Sample skewness: {results.skewness:12.4f}")
print(f"Sample excess kurtosis: {results.kurtosis:12.4f}")

Sample mean:      -0.0155
Sample standard deviation:       0.2454
Sample skewness:      -0.0164
Sample excess kurtosis:      -0.0346
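
The difference between the biased and unbiased standard deviation can be checked directly with NumPy's `ddof` argument. This is a minimal sketch on an independently generated sample; the seed and the variable names are assumptions made for illustration, and NumPy's built-in generator stands in for the Leva function so the block is self-contained.

```python
import numpy as np

rng = np.random.default_rng(seed=12345)      # arbitrary seed, for repeatability
demo_sample = rng.normal(loc=-0.0145, scale=0.245, size=10_000)

closed_form_mean = demo_sample.mean()
biased_std = demo_sample.std(ddof=0)         # divides by n (the normal MLE)
unbiased_std = demo_sample.std(ddof=1)       # divides by n - 1

print(f"Mean: {closed_form_mean:12.6f}")
print(f"Biased standard deviation: {biased_std:12.6f}")
print(f"Unbiased standard deviation: {unbiased_std:12.6f}")
```

The two standard deviations differ only by the factor sqrt((n - 1) / n), which is about 0.99995 for n = 10,000.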


Now, we'll use the MLE approach.
First define a function that calculates the negative of the log-likelihood function.  Then find the parameters that minimize it, which is the same as finding the parameters that maximize the log-likelihood function.
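
For reference, the log-likelihood of a normal sample can be written out explicitly. The symbols here are introduced for illustration and do not appear in the notebook: $x_1, \dots, x_n$ are the variates, $\mu$ is the mean, and $\sigma$ is the standard deviation.

```latex
\ell(\mu, \sigma)
  = \sum_{i=1}^{n} \ln f(x_i \mid \mu, \sigma)
  = -\frac{n}{2}\ln(2\pi) - n \ln \sigma
    - \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} (x_i - \mu)^{2}

\frac{\partial \ell}{\partial \mu} = 0
  \;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
\frac{\partial \ell}{\partial \sigma} = 0
  \;\Rightarrow\; \hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})^{2}
```

The numerical optimizer should therefore land on the sample mean and the biased standard deviation, up to its convergence tolerance.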

In [28]:
def log_likelihood_norm(parameters: np.ndarray, variates: np.ndarray) -> float:
    """
    Returns the negative of the log-likelihood function.  The scipy.optimize module
    provides a 'minimize' function but no 'maximize' counterpart, so you minimize the
    negative of the log-likelihood function in order to maximize the log-likelihood
    function itself.

    Expect parameters[0] to be the mean and parameters[1] to be the standard deviation.

    Created on August 24, 2022
    """

    # logpdf gives the log-density of each variate; sum them and flip the sign
    temp_sum: float = -np.sum(stats.norm.logpdf(variates, loc=parameters[0], scale=parameters[1]))

    return temp_sum
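
One caveat with an unconstrained search is that the optimizer is free to try a zero or negative standard deviation, for which `stats.norm.logpdf` returns `nan`. A common workaround, sketched here as an assumption rather than as the notebook's method, is to optimize the logarithm of the standard deviation so that the scale is always positive. The function name, seed, and starting point below are hypothetical.

```python
import numpy as np
import scipy.optimize as optimize
import scipy.stats as stats


def neg_log_likelihood_log_scale(parameters: np.ndarray, variates: np.ndarray) -> float:
    """Negative normal log-likelihood; parameters = (mean, log of standard deviation)."""
    mu, log_sigma = parameters
    # exp() maps any real number to a positive scale
    return float(-np.sum(stats.norm.logpdf(variates, loc=mu, scale=np.exp(log_sigma))))


rng = np.random.default_rng(seed=2022)       # arbitrary seed, for repeatability
demo_sample = rng.normal(loc=-0.0145, scale=0.245, size=10_000)

# start at mean = 1 and log(std dev) = 0, that is, std dev = 1
demo_fit = optimize.minimize(neg_log_likelihood_log_scale, x0=np.array([1.0, 0.0]),
                             args=(demo_sample,), method="Nelder-Mead")

fitted_mean = demo_fit.x[0]
fitted_std = np.exp(demo_fit.x[1])           # transform back to the original scale
print(f"Fitted mean: {fitted_mean:12.4f}")
print(f"Fitted standard deviation: {fitted_std:12.4f}")
```

The trade-off is that the optimizer's tolerances now apply to the log of the standard deviation, so the fitted value must be transformed back with `np.exp` before it is reported.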

In [29]:
init_parameters: np.ndarray = np.ones(2, dtype=np.float32)

# args must be a tuple, so a single extra argument needs a trailing comma
opt_results = optimize.minimize(log_likelihood_norm, x0=init_parameters, args=(random_sample,), method='Nelder-Mead')

print(f"MLE - mean: {opt_results.x[0]:12.4f}")
print(f"MLE - standard deviation: {opt_results.x[1]:12.4f}")

MLE - mean:      -0.0155
MLE - standard deviation:       0.2454
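
As a cross-check, SciPy can produce the same estimates without a hand-written likelihood: `stats.norm.fit` returns the maximum likelihood estimates of `loc` and `scale`. The sketch below uses an independently generated sample; the seed and variable names are assumptions made for illustration.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(seed=7)          # arbitrary seed, for repeatability
check_sample = rng.normal(loc=-0.0145, scale=0.245, size=10_000)

# norm.fit returns (loc, scale), the MLE of the mean and standard deviation
fit_mean, fit_std = stats.norm.fit(check_sample)

print(f"norm.fit - mean: {fit_mean:12.4f}")
print(f"norm.fit - standard deviation: {fit_std:12.4f}")
print(f"Sample mean: {check_sample.mean():12.4f}")
print(f"Biased sample standard deviation: {check_sample.std(ddof=0):12.4f}")
```

The fitted values should agree with the sample mean and the biased standard deviation, which is the same relationship the Nelder-Mead results above show to four decimal places.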
