## t-test
This section runs a two-sample t-test and reports a one-sided p-value.
It compares session times between two web pages (Page A and Page B), first with SciPy's stats module and then with statsmodels.

A t-test is a statistical test that helps us determine if there's a significant difference between the means of two groups. In this case, we're comparing the time users spend on Page A versus Page B.
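
To make the definition concrete, here is a minimal sketch of the arithmetic behind Welch's version of the test, using only the standard library. The sample values and the helper name `welch_t` are hypothetical and chosen for illustration; they are not taken from web_page_data.csv, and this is not how SciPy or statsmodels implement the test internally.

```python
import math
import statistics

# Hypothetical session times for illustration only (not the real data set).
page_a = [0.21, 0.35, 0.67, 1.10, 0.95, 2.10]
page_b = [2.53, 0.71, 1.47, 0.85, 3.20, 1.95]


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_x, n_y = len(x), len(y)
    # Squared standard error of each sample mean: sample variance / n.
    se2_x = statistics.variance(x) / n_x
    se2_y = statistics.variance(y) / n_y
    # Difference in means, scaled by the combined standard error.
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se2_x + se2_y)
    # Approximate degrees of freedom when the variances are not assumed equal.
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (n_x - 1) + se2_y ** 2 / (n_y - 1)
    )
    return t, df


t_stat, dof = welch_t(page_a, page_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A negative t statistic here means the first group's mean is below the second's. The p-value is then read from a t distribution with `dof` degrees of freedom.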

In [1]:
from pathlib import Path
import random

import pandas as pd
import numpy as np

from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats import power

import matplotlib.pylab as plt

In [13]:
session_times = pd.read_csv('web_page_data.csv')
session_times.head()

Unnamed: 0,Page,Time
0,Page A,0.21
1,Page B,2.53
2,Page A,0.35
3,Page B,0.71
4,Page A,0.67


In [29]:

# Split the data into two groups by the "Page" column, storing the "Time" values for each page in a pandas Series:

group_a = session_times[session_times.Page == 'Page A'].Time  # Time values for Page A
group_b = session_times[session_times.Page == 'Page B'].Time  # Time values for Page B


# With the SciPy module

# Perform the t-test.
# The "ttest_ind" function performs an independent two-sample t-test, with these key parameters:
#  - The first two arguments are the time values for the two groups being compared
#  - equal_var=False selects Welch's t-test, which does not assume equal variances between groups
#    (generally a safer choice than the standard pooled-variance t-test)

res = stats.ttest_ind(group_a, group_b, equal_var=False)

    
# Convert the two-sided p-value to a one-sided p-value by dividing by 2.
# A two-sided test looks for a difference in either direction (A > B or A < B);
# a one-sided test looks in only one direction (here, whether Page A times are smaller than Page B times).
# Halving is valid only when the observed difference lies in the hypothesized direction
# (here, res.statistic < 0); otherwise the one-sided p-value is 1 - res.pvalue / 2.

# The p-value is the probability of seeing a difference at least this extreme between the groups
# if there were truly no underlying difference.
# A small p-value (conventionally < 0.05) suggests the difference between pages is statistically significant.

print(f'p-value for single sided test: {res.pvalue / 2:.4f}')


p-value for single sided test: 0.1408
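
As a hedged illustration of the halving rule, the sketch below checks it against SciPy's own one-sided option. It assumes SciPy 1.6 or later, where `ttest_ind` accepts an `alternative` argument, and it uses randomly generated stand-in data rather than the values in web_page_data.csv.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in data; the group sizes and distributions are assumptions.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=1.3, scale=0.9, size=21)
group_b = rng.normal(loc=1.6, scale=1.0, size=15)

# Welch's t-test under each alternative hypothesis.
two_sided = stats.ttest_ind(group_a, group_b, equal_var=False)
less = stats.ttest_ind(group_a, group_b, equal_var=False, alternative='less')
greater = stats.ttest_ind(group_a, group_b, equal_var=False, alternative='greater')

# Halving the two-sided p-value reproduces only the one-sided test whose
# direction agrees with the sign of the observed t statistic.
half = two_sided.pvalue / 2
matching = less.pvalue if two_sided.statistic < 0 else greater.pvalue

print(f"two-sided / 2:         {half:.4f}")
print(f"alternative='less':    {less.pvalue:.4f}")
print(f"alternative='greater': {greater.pvalue:.4f}")
```

The two one-sided p-values sum to 1, so halving in the wrong direction would report a small p-value where the correct one is large.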


In [33]:
# With the statsmodels module (ttest_ind returns three values: the t statistic, the p-value, and the degrees of freedom)

# The main difference is how the one-sided test is handled.
# In the SciPy version above, we divided the two-sided p-value by 2 manually to get a one-sided result.
# Here, the alternative='smaller' parameter states the direction explicitly:
# it tests whether Page A times are smaller than Page B times.

# The usevar='unequal' parameter serves the same purpose as equal_var=False in the SciPy version:
# it selects Welch's t-test, which does not assume equal variances between the groups.
# This is generally a safer choice, since real-world groups often have different variances.

# Stating the direction through the alternative parameter makes the code more self-documenting
# and less error-prone than remembering to halve the p-value.

tstat, pvalue, df = sm.stats.ttest_ind(
    group_a,  # Time values for Page A (defined in the SciPy cell above)
    group_b,  # Time values for Page B
    usevar='unequal', alternative='smaller')
print(f'p-value: {pvalue:.4f}')

p-value: 0.1408

