In [1]:
import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import levene

pd.options.display.float_format = "{:.2f}".format

In [2]:
# Read case data
df = pd.read_excel("4. Case 3 - Two-sample t-test.xlsx")

In [3]:
df.head()

Unnamed: 0,Store Id,Display Type,Sales Volume
0,3846186,Old_display_type,2038.31
1,1083410,Old_display_type,2017.29
2,4278951,Old_display_type,1746.47
3,6670048,Old_display_type,2061.78
4,5054220,Old_display_type,2253.76


In [4]:
# Create arrays with the sales volume for 'Old_display_type' and 'New_display_type'
data_o = df[df["Display Type"] == "Old_display_type"]["Sales Volume"].values
data_n = df[df["Display Type"] == "New_display_type"]["Sales Volume"].values

The **scipy.stats.levene()** function conducts Levene's test, which assesses whether two or more groups have equal variances. Equal variance is an assumption of several common tests, such as the pooled two-sample t-test and ANOVA.

The following is a breakdown of the main parameter and how to interpret the test.

**Main Parameter**

**center**: This parameter selects the measure of central tendency from which each group's deviations are computed. The available options are 'median' (the default), 'mean', and 'trimmed':

* **median**: Recommended for skewed (non-normal) distributions. The median is less sensitive to outliers, which makes the test more robust. This variant is also known as the Brown-Forsythe test.
* **mean**: Recommended for symmetric, moderate-tailed distributions, that is, when the data are approximately normal.
* **trimmed**: Recommended for heavy-tailed distributions. It uses a trimmed mean, which discards a proportion of the observations at each end (set by the `proportiontocut` parameter, 0.05 by default) to lessen the impact of extreme values.
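
To see how the choice of `center` affects the result, the three options can be run side by side. The following is a minimal sketch on synthetic data: the arrays `group_a` and `group_b` are made up for illustration and are not the case data.

```python
import numpy as np
from scipy import stats

# Synthetic, illustrative samples (assumption: NOT the case data)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=2000, scale=150, size=50)
group_b = rng.normal(loc=2050, scale=150, size=50)

results = {}
for center in ("median", "mean", "trimmed"):
    # proportiontocut only has an effect when center="trimmed"
    res = stats.levene(group_a, group_b, center=center, proportiontocut=0.05)
    results[center] = res
    print(f"center={center!r}: statistic={res.statistic:.4f}, pvalue={res.pvalue:.4f}")
```

On well-behaved data the three variants usually lead to the same conclusion; they tend to diverge when the samples are skewed or contain outliers.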


**Null Hypothesis (H0)**: All groups have equal population variances.

If p-value > alpha: We do not reject H0. The data provide no evidence that the group variances differ, although this does not prove that they are equal.

If p-value < alpha: We reject H0, which indicates that at least one group's variance differs significantly from the others.

Additionally, you can calculate the variances of individual samples directly using the **.var()** method on DataFrame columns to get a preliminary sense of the data's dispersion before conducting Levene's test.
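
A minimal sketch of that preliminary check follows. The small `toy` DataFrame is made up for illustration and only mirrors the column names of the case data. Note that pandas' `.var()` returns the sample variance (`ddof=1`), whereas NumPy's `np.var()` defaults to `ddof=0`, so `ddof=1` has to be passed explicitly when working with arrays.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration (assumption: NOT the case data)
toy = pd.DataFrame({
    "Display Type": ["Old_display_type"] * 4 + ["New_display_type"] * 4,
    "Sales Volume": [2038.0, 2017.0, 1746.0, 2062.0, 2150.0, 2301.0, 1990.0, 2210.0],
})

# Sample variance (ddof=1, the pandas default) per display type
group_var = toy.groupby("Display Type")["Sales Volume"].var()
print(group_var)

# The same quantity from a NumPy array requires ddof=1 explicitly
old_values = toy.loc[toy["Display Type"] == "Old_display_type", "Sales Volume"].values
old_var_np = np.var(old_values, ddof=1)
print(old_var_np)
```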

For this analysis, we'll apply a significance level (alpha) of 0.05 to determine the statistical significance of our findings.
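
The decision rule described above can be sketched as a small helper. The function name `interpret_levene` is hypothetical and not part of SciPy, and treating the boundary case p-value = alpha as "do not reject" is an assumption made here for simplicity.

```python
def interpret_levene(pvalue, alpha=0.05):
    """Hypothetical helper: map a Levene p-value to a decision string."""
    if pvalue < alpha:
        return "reject H0: evidence of unequal variances"
    # Assumption: p-value == alpha is treated as "do not reject"
    return "fail to reject H0: no evidence of unequal variances"


print(interpret_levene(0.01))
print(interpret_levene(0.20))
```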

In [5]:
# Levene's test centered at the mean
print(stats.levene(data_o, data_n, center='mean'))

LeveneResult(statistic=0.402449870750646, pvalue=0.5273048920979523)


The p-value is 0.5273, which is greater than 0.05, so we do not reject the null hypothesis: the data provide no evidence that the variances of the two sales samples differ.

In other words, the variances in sales volume for the old and new display types are not significantly different at the 0.05 significance level. Consequently, we can proceed with the subsequent statistical tests, such as the two-sample t-test, under the assumption of equal variances.
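
As a sketch of how this result might feed into the next step, the outcome of Levene's test can set the `equal_var` argument of `scipy.stats.ttest_ind`. The samples below are synthetic and purely illustrative, and using the Levene outcome to choose between the pooled and Welch t-tests is one common approach, not necessarily the one this case follows.

```python
import numpy as np
from scipy import stats

# Synthetic, illustrative samples (assumption: NOT the case data)
rng = np.random.default_rng(42)
sample_old = rng.normal(loc=2000, scale=150, size=40)
sample_new = rng.normal(loc=2100, scale=150, size=40)

alpha = 0.05
levene_res = stats.levene(sample_old, sample_new, center="mean")

# If H0 of equal variances is not rejected, use the pooled t-test;
# otherwise fall back to Welch's t-test (equal_var=False)
equal_var = bool(levene_res.pvalue >= alpha)
t_res = stats.ttest_ind(sample_old, sample_new, equal_var=equal_var)

print(f"Levene p-value: {levene_res.pvalue:.4f} -> equal_var={equal_var}")
print(f"t statistic: {t_res.statistic:.4f}, p-value: {t_res.pvalue:.4f}")
```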