# D’Agostino and Pearson’s test
D’Agostino and Pearson’s test is a statistical normality test that evaluates whether a dataset follows a Gaussian (normal) distribution. Rather than checking a single aspect of shape, it combines skewness and kurtosis into one statistic that measures deviation from normality. Skewness and kurtosis are statistical measures that describe different aspects of a distribution's shape. Skewness represents the asymmetry of a distribution: zero (symmetric, as in a normal distribution), positive (longer tail on the right side), and negative (longer tail on the left side). Kurtosis represents the tailedness and peak sharpness of a distribution: mesokurtic (= 3, normal), leptokurtic (> 3, heavy tails and sharp peak), and platykurtic (< 3, light tails and flatter peak). The mathematical definitions follow.

**Skewness ($g_1$):**
$$\text{Skewness } (g_1) = \frac{\frac{1}{n} \sum_{i=1}^{n}(x_i - \bar x)^3}{\left(\frac{1}{n} \sum_{i=1}^{n}(x_i - \bar x)^2\right)^{3/2}}$$

**Kurtosis ($g_2$) and Excess Kurtosis:**
$$\text{Kurtosis } (g_2) = \frac{\frac{1}{n} \sum_{i=1}^{n}(x_i - \bar x)^4}{\left(\frac{1}{n} \sum_{i=1}^{n}(x_i - \bar x)^2\right)^2}$$
$$\text{Excess Kurtosis} = g_2 - 3$$
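The moment definitions above can be sketched directly in NumPy. The helper name `sample_shape` and the five-point sample are illustrative assumptions, not part of the original material; the function simply evaluates the biased (1/n) central moments that appear in the formulas.

```python
import numpy as np

def sample_shape(x):
    """Return (g1, g2, excess kurtosis) from the 1/n moment definitions."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()            # deviations from the sample mean
    m2 = np.mean(dev**2)          # second central moment (biased variance)
    m3 = np.mean(dev**3)          # third central moment
    m4 = np.mean(dev**4)          # fourth central moment
    g1 = m3 / m2**1.5             # skewness
    g2 = m4 / m2**2               # kurtosis (equals 3 for a normal distribution)
    return g1, g2, g2 - 3.0

# Hypothetical toy sample with one large value on the right
g1, g2, excess = sample_shape([1.0, 2.0, 3.0, 4.0, 10.0])
print(f"g1={g1:.4f}, g2={g2:.4f}, excess={excess:.4f}")
# → g1=1.1384, g2=2.7880, excess=-0.2120
```

With their default settings, `scipy.stats.skew` and `scipy.stats.kurtosis` use these same biased moments; `kurtosis` returns the excess value unless `fisher=False` is passed.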

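**Transform Skewness and Kurtosis to Approximately Standard Normal Variables ($Z_1$ and $Z_2$):**
$$Z_1 = \frac{\sqrt{n(n-1)}}{n-2} \cdot \frac{g_1}{\sqrt{6/n}}$$
$$Z_2 = \frac{g_2 - 3}{\sqrt{24/n}}$$
These are simplified large-sample approximations: the kurtosis is centred at 3 so that $Z_2$ is close to zero for normal data. SciPy's `skewtest` and `kurtosistest`, used in the code later in this document, apply more refined transformations, so their Z-scores differ from these values, especially for small samples.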
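As a rough illustration of this standardization step, the sketch below evaluates the large-sample approximations in NumPy. It assumes the kurtosis term is centred at 3 (that is, excess kurtosis is standardized), so that a normal sample gives a value near zero; the function name `approx_z_scores` is hypothetical. This is not what SciPy computes internally.

```python
import numpy as np

def approx_z_scores(x):
    """Large-sample Z approximations for skewness and excess kurtosis."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    m2 = np.mean(dev**2)
    g1 = np.mean(dev**3) / m2**1.5          # sample skewness
    g2 = np.mean(dev**4) / m2**2            # sample kurtosis
    # Skewness: small-sample correction factor, then divide by approx. SE sqrt(6/n)
    z1 = np.sqrt(n * (n - 1)) / (n - 2) * g1 / np.sqrt(6.0 / n)
    # Kurtosis: centre at 3 (assumption stated above), approx. SE sqrt(24/n)
    z2 = (g2 - 3.0) / np.sqrt(24.0 / n)
    return z1, z2

z1, z2 = approx_z_scores([1.0, 2.0, 3.0, 4.0, 10.0])
print(f"Z1={z1:.4f}, Z2={z2:.4f}")
```

Because the approximate standard errors are only accurate for large n, values from a five-point sample like this one are for illustration only.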

**Compute Test Statistic ($K^2$):**
$$K^2 = Z_1^2 + Z_2^2$$

**Compute the p-value:**
$$p = P(\chi^2_2 > K^2)$$
Under the null hypothesis of normality, $K^2$ approximately follows a $\chi^2$ distribution with 2 degrees of freedom, and the p-value is its upper-tail probability. If the p-value is below the chosen significance level (commonly 0.05), we reject the null hypothesis and conclude that the data are not normally distributed.\
The figure below shows how D'Agostino and Pearson's test evaluates the normality of data distributions by analyzing both skewness and kurtosis. The distribution of each feature is evaluated, and the resulting p-values reveal whether the data deviate significantly from normality. Features marked in red are those for which the null hypothesis of normality is rejected, illustrating how the test flags non-normal distributions.

<img src="https://www.biorxiv.org/content/biorxiv/early/2022/12/01/2022.12.01.518717/F9.large.jpg" alt="Histograms of color features annotated with p-values from D'Agostino and Pearson's normality test"> \
Figure 1: Histogram of color features annotated with p-values for D'Agostino and Pearson's normality test. Features with statistically significant results (indicating non-normality) are highlighted in red. [1]
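To make the p-value step concrete, here is a minimal sketch that converts a $K^2$ value into a p-value using the chi-square distribution with 2 degrees of freedom. The helper name `k2_p_value` is an assumption for illustration; the input 1.5939 is the statistic reported in the example output later in this document.

```python
import math
from scipy.stats import chi2

def k2_p_value(k2):
    """Upper-tail probability P(chi2 with 2 d.o.f. > k2)."""
    return chi2.sf(k2, df=2)

k2 = 1.5939  # K^2 from the example output later in this document
print(f"{k2_p_value(k2):.4f}")     # → 0.4507
# With 2 degrees of freedom the survival function has the closed form exp(-k2 / 2)
print(f"{math.exp(-k2 / 2):.4f}")  # → 0.4507
```

Since 0.4507 is well above 0.05, the null hypothesis of normality is not rejected for that sample.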

**When might someone want to perform this test?** \
D’Agostino and Pearson’s test is most useful when a researcher needs to determine whether a dataset follows a normal distribution. Many common methods, such as the t-test, ANOVA, and linear regression, assume normality, so checking that assumption matters before applying them. The test is particularly well suited to moderate-to-large sample sizes. In biological research, it is useful for determining whether experimental measurements, such as protein expression levels or fluorescence intensity, follow a normal distribution before applying parametric statistical models. Running the test early in the analysis helps ensure the validity and reliability of subsequent inferences.
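One way to run this check before choosing a parametric model is to screen every measured feature and record which ones reject normality. The sketch below is an assumption about how such a screen might look: the function name `screen_normality`, the feature names, and the simulated data are all hypothetical, and only `scipy.stats.normaltest` is an actual library call.

```python
import numpy as np
from scipy.stats import normaltest

def screen_normality(features, alpha=0.05):
    """Map each feature name to (p_value, rejected) using normaltest."""
    report = {}
    for name, values in features.items():
        _, p = normaltest(np.asarray(values, dtype=float))
        report[name] = (float(p), bool(p < alpha))
    return report

# Simulated stand-ins for experimental measurements (hypothetical data)
rng = np.random.default_rng(0)
features = {
    "gaussian_like": rng.normal(loc=10.0, scale=2.0, size=500),
    "right_skewed": rng.exponential(scale=2.0, size=500),
}
for name, (p, rejected) in screen_normality(features).items():
    print(f"{name}: p={p:.3g}, reject normality: {rejected}")
```

Features flagged as rejected are candidates for a transformation or a non-parametric alternative, though that decision also depends on sample size and on the downstream method.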

References: \
[1] Radulescu, A., van Opheusden, B., Callaway, F., & Hillis, J. M. (2022). Modeling human eye movements during immersive visual search. doi: 10.1101/2022.12.01.518717 \
[2] Weisstein, Eric W. "D'Agostino-Pearson Test." From MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/search/?q=DAgostino+Pearson+Test \
[3] D'Agostino, R. B., & Pearson, E. S. (1973). Tests for Departure from Normality: Empirical Results for the Distributions of b2 and √b1. Biometrika, 60(3), 613–622. https://doi.org/10.1093/biomet/60.3.613 \
[4] SciPy Community. (n.d.). scipy.stats.normaltest. SciPy v1.15.2 Manual. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.normaltest.html

```python
import numpy as np
from scipy.stats import skew, kurtosis, skewtest, kurtosistest, normaltest

def dagostino(data):
    '''
    Perform D'Agostino and Pearson's test for normality and return detailed diagnostics.

    Parameters:
    data : array_like
        A one-dimensional array of sample data.

    Returns:
    results : dict
        Dictionary containing:
        - 'K2_statistic': combined test statistic
        - 'p_value': p-value of the test
        - 'Skewness': sample skewness
        - 'Excess Kurtosis': sample excess kurtosis
        - 'Z-score (skewness)': skewness Z-score
        - 'Z-score (kurtosis)': kurtosis Z-score

    Example:
    >>> data = [-1.83, -0.58, 0.70, -1.35, 0.89, -0.18, 2.88, 0.94, -0.43, 0.27]
    >>> result = dagostino(data)
    >>> for key, val in result.items():
    ...     print(f"{key}: {val:.4f}")

    Notes:
    This function evaluates both skewness and kurtosis to test for normality using the
    D'Agostino and Pearson test. It provides individual shape measures as well
    as the combined test statistic and p-value. SciPy's skewtest needs at least
    8 observations, and kurtosistest warns when there are fewer than 20.
    '''
    # Work on a float array so lists and other array-likes behave the same
    data = np.asarray(data, dtype=float)

    # Calculate skewness and kurtosis
    skew_val = skew(data)
    kurt_val = kurtosis(data)  # excess kurtosis by default

    # Get Z-scores for skewness and kurtosis
    z_skew, _ = skewtest(data)
    z_kurt, _ = kurtosistest(data)

    # Combined K² test
    k2_stat, p_value = normaltest(data)

    # Return all results as a dictionary
    return {
        'K2_statistic': k2_stat,
        'p_value': p_value,
        'Skewness': skew_val,
        'Excess Kurtosis': kurt_val,
        'Z-score (skewness)': z_skew,
        'Z-score (kurtosis)': z_kurt
    }

# Example: how to use this function
data = [-1.83938409, -0.58388602, 0.70308452, -1.35034035, 0.89174939,
        -0.18414155, 2.88128236, 0.94245998, -0.42950917, 0.27018622]
result = dagostino(data)
for key, val in result.items():
    print(f"{key}: {val:.4f}")
```

```text
K2_statistic: 1.5939
p_value: 0.4507
Skewness: 0.5134
Excess Kurtosis: 0.0056
Z-score (skewness): 0.9098
Z-score (kurtosis): 0.8753
```
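The output above is consistent with the definition of the test statistic: 0.9098² + 0.8753² ≈ 1.5939. The short check below, a sketch that reuses the same ten data points, confirms that the statistic from `normaltest` equals the sum of the squared Z-scores from `skewtest` and `kurtosistest`, and that the p-value matches the chi-square tail with 2 degrees of freedom. The warning filter is an assumption made to keep the output clean, since SciPy may warn that the sample is small for `kurtosistest`.

```python
import warnings
import numpy as np
from scipy.stats import skewtest, kurtosistest, normaltest

data = [-1.83938409, -0.58388602, 0.70308452, -1.35034035, 0.89174939,
        -0.18414155, 2.88128236, 0.94245998, -0.42950917, 0.27018622]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the small-sample warning
    z_skew = skewtest(data).statistic
    z_kurt = kurtosistest(data).statistic
    k2, p = normaltest(data)

# K^2 is the sum of the two squared Z-scores
print(np.isclose(k2, z_skew**2 + z_kurt**2))  # → True
# For 2 degrees of freedom the chi-square tail is exp(-K^2 / 2)
print(np.isclose(p, np.exp(-k2 / 2)))         # → True
```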
