### Lilliefors Test for Normality

The Lilliefors test assesses whether a dataset is consistent with a normal distribution. It is closely related to the Kolmogorov-Smirnov test, except that the Lilliefors test does not require the population mean and standard deviation to be known in advance; both are estimated from the sample.

#### How it works
The test first estimates the mean and standard deviation from the sample and computes the empirical cumulative distribution function (ECDF) of the data. It then constructs a theoretical normal distribution with the estimated mean and standard deviation and compares the ECDF with that distribution's cumulative distribution function (CDF). The maximum vertical difference between the ECDF and the CDF is the test statistic (D). Because the parameters are estimated from the same data, the standard Kolmogorov-Smirnov critical values do not apply; the p-value is instead obtained by simulation or from a table. If the p-value is small (for example, < 0.05), the null hypothesis of normality is rejected. If the p-value is large, the data are consistent with a normal distribution, although this does not prove that they are normal.

#### Mathematical Definition of the Lilliefors Test

Let:

- $ X_1, X_2, \dots, X_n $ be your sample data  
- $ \bar{X} $ = sample mean  
- $ s $ = sample standard deviation  



**1. Compute the Empirical Cumulative Distribution Function (ECDF):**

$$
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{X_i \leq x}
$$
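
As a minimal sketch of this step, the ECDF can be evaluated by counting how many observations fall at or below a given point. The helper name `ecdf` and the toy sample are illustrative assumptions, not part of `BIOM480Tests`.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F_n(x) = (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right gives the count of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return F_n


# Toy sample with one tied value (3.5 appears twice)
F_n = ecdf([2.0, 3.5, 1.0, 3.5])
print(F_n(0.5), F_n(1.0), F_n(3.5), F_n(10.0))  # 0.0 0.25 1.0 1.0
```

The function is a step curve: it jumps by $1/n$ at each observation (or by $k/n$ where $k$ observations are tied).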



**2. Construct the theoretical CDF of a normal distribution using the sample mean and standard deviation:**

$$
F(x) = \Phi\left( \frac{x - \bar{X}}{s} \right)
$$

where $\Phi$ is the standard normal CDF.
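
One possible way to build this fitted CDF with only the Python standard library is sketched below. The name `fitted_normal_cdf` and the sample values are hypothetical, and the sketch assumes $s$ is the usual sample standard deviation with an $n - 1$ denominator.

```python
from statistics import NormalDist, mean, stdev


def fitted_normal_cdf(sample):
    """Return F(x) = Phi((x - x_bar) / s) for the given sample."""
    x_bar = mean(sample)
    s = stdev(sample)               # sample standard deviation (n - 1 denominator)
    standard_normal = NormalDist()  # mean 0, standard deviation 1

    def F(x):
        # Standardize x, then evaluate the standard normal CDF
        return standard_normal.cdf((x - x_bar) / s)

    return F


sample = [4.1, 5.0, 5.9, 6.2, 4.8]  # made-up values for illustration
F = fitted_normal_cdf(sample)
print(F(4.0), F(5.0), F(6.0))
```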



**3. Compute the Lilliefors test statistic $D$ as:**

$$
D = \sup_x \left| F_n(x) - F(x) \right|
$$
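
Because $F_n$ is a step function and $F$ is continuous and increasing, the supremum is always reached at one of the data points, either just at or just before a jump. The sketch below uses that fact to compute $D$ directly. The function name `lilliefors_statistic` is an illustrative assumption and is not claimed to match the implementation in `BIOM480Tests`.

```python
from statistics import NormalDist, mean, stdev


def lilliefors_statistic(sample):
    """Largest gap between the ECDF and the normal CDF fitted to the sample."""
    ordered = sorted(sample)
    n = len(ordered)
    # Normal distribution with the sample mean and sample standard deviation
    fitted = NormalDist(mean(ordered), stdev(ordered))

    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = fitted.cdf(x)
        # ECDF equals i/n at x and (i - 1)/n just before x; check both gaps
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


d = lilliefors_statistic([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"D = {d:.4f}")
```

Checking both sides of each jump matters: looking only at $F_n(x_i) - F(x_i)$ would miss cases where the fitted CDF lies above the ECDF.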



**4. The p-value** is obtained via Monte Carlo simulation or a lookup table for the statistic $D$.
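
The Monte Carlo route can be sketched as follows: draw many normal samples of the same size, compute $D$ for each, and report the fraction that is at least as large as the observed $D$. The function names, the number of simulations, and the seeds are all illustrative assumptions. Library implementations may instead use tables or analytic approximations, so their p-values can differ slightly from this estimate.

```python
import random
from statistics import NormalDist, mean, stdev


def lilliefors_statistic(sample):
    """Largest gap between the ECDF and the normal CDF fitted to the sample."""
    ordered = sorted(sample)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = fitted.cdf(x)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def monte_carlo_p_value(sample, n_sim=1000, seed=0):
    """Estimate P(D >= observed D) under the null hypothesis of normality."""
    rng = random.Random(seed)
    n = len(sample)
    d_obs = lilliefors_statistic(sample)
    exceed = 0
    for _ in range(n_sim):
        # D is unchanged by shifting or rescaling the data, so simulating
        # from the standard normal covers every normal distribution.
        simulated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if lilliefors_statistic(simulated) >= d_obs:
            exceed += 1
    # Add-one correction avoids reporting a p-value of exactly zero
    return d_obs, (exceed + 1) / (n_sim + 1)


data_rng = random.Random(42)
normal_sample = [data_rng.gauss(10.0, 2.0) for _ in range(50)]
skewed_sample = [data_rng.expovariate(1.0) for _ in range(50)]

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    d, p = monte_carlo_p_value(sample)
    print(f"{name}: D = {d:.3f}, p = {p:.3f}")
```

The precision of the estimate is limited by `n_sim`: with 1000 simulations the smallest reportable p-value is about 0.001.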

#### Real-world use case:
The Lilliefors test is used when a researcher wants to know whether a dataset is plausibly normally distributed but does not know the population mean or standard deviation. Psychology experiments and clinical trials are typical examples: only the sample data are available, and the researcher needs to check normality. For instance, a bioengineer who collects patient reaction-time data might use the Lilliefors test to assess normality before running further analyses, since many statistical methods (such as t-tests and ANOVA) assume that the data are normally distributed.

![Illustration of the Lilliefors test comparing ECDF and normal CDF](Lynd_Figure_D3.png)

Figure 1: This figure shows how the Lilliefors test compares the sample data (blue step curve) to a normal distribution (black curve). The vertical line labeled D marks the largest difference between the two curves. This difference helps determine if the sample data is likely to come from a normal distribution.

In [7]:
# %pip install statsmodels


import importlib.util
import sys
import os

# Dynamically load BIOM480Tests.py from current directory
module_path = os.path.abspath("BIOM480Tests.py")
spec = importlib.util.spec_from_file_location("BIOM480Tests", module_path)
BIOM480Tests = importlib.util.module_from_spec(spec)
sys.modules["BIOM480Tests"] = BIOM480Tests
spec.loader.exec_module(BIOM480Tests)


import numpy as np
from BIOM480Tests import lilliefors

# Generate sample data
data = np.random.normal(loc=0, scale=1, size=100)

# Run the test
stat, p_value = lilliefors(data)
print("Test Statistic:", stat)
print("P-Value:", p_value)

Test Statistic: 0.03999058396053001
P-Value: 0.9609794795752092
