In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.stats import shapiro

pd.options.display.float_format = "{:.2f}".format

In [2]:
df = pd.read_excel("3. Case 2 - One-sample t-test.xlsx")

In [3]:
# Print data head
df.head()

Unnamed: 0,Product ID,Product Name,Price,Volume,Volume_in_liters,Price_per_EQ
0,5113468,Apple Pulse,4.15,500,0.5,8.3
1,1906188,Banana Pulse,4.75,500,0.5,9.5
2,5709623,Banana Rumble,4.3,500,0.5,8.6
3,2642672,Berry Ride,4.55,500,0.5,9.1
4,1997170,Blackberry Sizzle,4.2,500,0.5,8.4


In [4]:
df["Price_per_EQ"].describe()

count   100.00
mean      6.43
std       1.37
min       3.02
25%       5.42
50%       6.24
75%       7.43
max       9.50
Name: Price_per_EQ, dtype: float64

In [5]:
# Extract the Price_per_EQ column as a NumPy array
data = df["Price_per_EQ"].values

The `scipy.stats.ttest_1samp()` function from SciPy performs a one-sample t-test. This test assesses whether the mean of a sample differs significantly from a known or hypothesized population mean. The main parameters of the function are:

- **a**: This parameter represents the sample data as an array-like object (list, tuple, NumPy array, etc.). It’s the set of observations for which you want to perform the one-sample t-test. The data should be a random sample from the population for comparison.

- **popmean**: This is the known or hypothesized population mean value against which the sample's mean is evaluated. The t-test determines if there is a significant discrepancy between the sample's mean and this population mean.

- **alternative**: This parameter specifies the alternative hypothesis to be tested. The default value is 'two-sided', which means the test checks for a significant difference in either direction, that is, whether the sample mean is greater or smaller than the population mean. The other options, 'greater' and 'less', test whether the sample mean is significantly greater or significantly smaller than the population mean, respectively.

We defined the alternative hypothesis as: "The sample mean price per volume (PPV) is different from the population mean (6.21)." We can, therefore, omit the parameter or set it to 'two-sided'.
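To make the effect of `alternative` concrete, the sketch below runs all three options on a synthetic sample. The sample is an assumption: it is drawn from a normal distribution with an arbitrary seed, loosely matching the summary statistics above, and only stands in for the real `Price_per_EQ` column, so its p-values will not match the notebook's.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the Price_per_EQ column (assumption, not the real data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=6.43, scale=1.37, size=100)

popmean = 6.21

# Run the same test under each alternative hypothesis
p_two_sided = stats.ttest_1samp(sample, popmean, alternative="two-sided").pvalue
p_greater = stats.ttest_1samp(sample, popmean, alternative="greater").pvalue
p_less = stats.ttest_1samp(sample, popmean, alternative="less").pvalue

print(f"two-sided: {p_two_sided:.4f}")
print(f"greater:   {p_greater:.4f}")
print(f"less:      {p_less:.4f}")
```

The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them. This is why the alternative should be chosen before looking at the data.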

Before running the test, set the significance level alpha to 0.05.

In [6]:
# Perform one-sample t-test
stats.ttest_1samp(a=data, popmean=6.21, alternative="two-sided")

TtestResult(statistic=1.5858764248796753, pvalue=0.11595645568615834, df=99)

The output contains three values: the t-statistic, the p-value, and the degrees of freedom (`df`). The first two are interpreted as follows:

- **statistic (t-statistic)**: The t-statistic measures how far the sample mean is from the population mean, expressed in units of the standard error of the mean. A large absolute value of the t-statistic indicates that the sample mean is far from the population mean, suggesting that there may be a significant difference between the two.

- **pvalue (p-value)**: The p-value is the probability of observing a test statistic at least as extreme as the one obtained, assuming that the null hypothesis (no difference between the sample mean and the population mean) is true. A small p-value indicates that the observed difference is unlikely to have occurred by chance alone. When the p-value falls below the chosen significance level, we reject the null hypothesis in favor of the alternative hypothesis.
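The definition of the t-statistic can be checked by hand. The sketch below is a rough verification that uses only the rounded summary statistics from the `describe()` output, not the raw data, so it approximates the value SciPy reports rather than reproducing it exactly.

```python
import math

# Rounded summary statistics taken from the describe() output above
n = 100
sample_mean = 6.43
sample_std = 1.37  # describe() reports the sample standard deviation (ddof=1)
popmean = 6.21

# Standard error of the mean, then the distance from popmean in those units
standard_error = sample_std / math.sqrt(n)
t_stat = (sample_mean - popmean) / standard_error

print(f"standard error = {standard_error:.3f}")
print(f"t-statistic    = {t_stat:.2f}")
```

This gives a t-statistic of roughly 1.61, slightly above the 1.586 reported by SciPy, because the mean and standard deviation used here are rounded to two decimals.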

The p-value of 0.1160 is the probability of observing a t-statistic at least as extreme as 1.5859 in either direction (above 1.5859 or below -1.5859), assuming that the population mean is 6.21. Since the p-value is above the significance level of 0.05, it doesn't provide strong evidence against the null hypothesis.
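As a cross-check, the two-sided p-value can be recovered from the reported t-statistic and degrees of freedom using the t-distribution's survival function. This is a minimal sketch; the variable names `dof` and `alpha` are chosen here for illustration.

```python
from scipy import stats

# Values copied from the TtestResult shown above
t_stat = 1.5858764248796753
dof = 99
alpha = 0.05

# Two-sided p-value: probability mass in both tails beyond |t|
p_value = 2 * stats.t.sf(abs(t_stat), dof)

print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```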

Therefore, based on this one-sample t-test and at a 0.05 significance level, we do not have enough evidence to reject the null hypothesis. We cannot conclude that the data sample mean significantly differs from the population mean of 6.21.

In other words, the mean price per volume of the products in the sample (6.43), although slightly greater than the population mean (6.21), is not *significantly* different from it at the 0.05 level.

Notice that for the one-sample t-test, the formula to determine degrees of freedom is: 

Df = N-1

where:
- Df = degrees of freedom
- N = sample size

Therefore, in this case, Df = 99
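The degrees of freedom also determine the critical value of the t-distribution, which offers a second way to reach the same decision. The sketch below assumes the two-sided test and alpha of 0.05 used above; the names `dof` and `t_crit` are illustrative.

```python
from scipy import stats

n = 100        # sample size from the describe() output
dof = n - 1    # degrees of freedom for a one-sample t-test
alpha = 0.05
t_stat = 1.5858764248796753  # statistic reported by ttest_1samp above

# Two-sided critical value: the point leaving alpha/2 in each tail
t_crit = stats.t.ppf(1 - alpha / 2, dof)

print(f"df = {dof}, critical value = {t_crit:.2f}")
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```

With 99 degrees of freedom the critical value is about 1.98. The observed statistic of 1.586 falls below it, so this approach agrees with the p-value comparison.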