In [10]:
import pandas as pd
import numpy as np

# import custom module
import sys
sys.path.append('../Modules')
import inference as inf

import warnings
warnings.filterwarnings("ignore")

**Scenario**: A manager wants to evaluate the average response time to customer emails. The company has historically recorded a variance of 10 minutes² in response times. Using a recent sample of response times, the manager tests whether the average response time meets the company's target of 30 minutes.

**Practical Use**: Ensuring that customer response times meet service level agreements and identifying areas for improvement.

In [11]:
# Inference on the mean when the variance is known
n = 50 
df = pd.DataFrame({
    'Response Time (mins)': np.random.normal(loc=32, scale=np.sqrt(10), size=n)
})
display(df.head())

Unnamed: 0,Response Time (mins)
0,27.231444
1,35.366582
2,30.969213
3,34.928073
4,33.896805


In [12]:
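# NOTE: these are sample estimates, not the scenario's stated values
# (target mean of 30 minutes, historical variance of 10 minutes²).
mean = df["Response Time (mins)"].mean()
var = df["Response Time (mins)"].var()

In [13]:
# Testing the sample against its own mean gives z = 0 and p = 1 by construction.
# To test the scenario's hypothesis, pass population_mean=30 and known_variance=10.
z_score, p_value = inf.z_test_known_variance(df, population_mean=mean, known_variance=var)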
print("Z Score:", z_score)
print("p-value:", p_value)

Z Score: 0.0
p-value: 1.0
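
The `inference` module is not shown in this notebook, so the following is only a minimal sketch, assuming `z_test_known_variance` computes the standard two-sided z statistic. The helper name `z_test_mean` and the seed are hypothetical. It tests a freshly simulated sample against the scenario's 30-minute target using the historical variance of 10 minutes².

```python
import numpy as np
from scipy import stats

def z_test_mean(sample, mu0, sigma2):
    """Two-sided z-test for a mean when the population variance is known."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # The standard error uses the known variance, not the sample variance
    z = (sample.mean() - mu0) / np.sqrt(sigma2 / n)
    p = 2 * stats.norm.sf(abs(z))  # two-sided p-value
    return z, p

# Hypothetical seed, chosen only so the sketch is reproducible
rng = np.random.default_rng(0)
times = rng.normal(loc=32, scale=np.sqrt(10), size=50)

# H0: mean response time = 30 minutes, known variance = 10 minutes²
z, p = z_test_mean(times, mu0=30, sigma2=10)
print("Z Score:", z)
print("p-value:", p)
```

A small p-value here would suggest that the average response time differs from the 30-minute target.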


**Scenario**: A project manager assesses the average number of tasks completed by employees per week. Because the variance in task completion is not well documented, the manager samples current data to infer the population mean and determine whether productivity has changed under a new workflow system.

**Practical Use**: Evaluating the effectiveness of new processes and setting realistic productivity expectations.

In [14]:
# Inference on the mean when the variance is unknown
n = 30
df = pd.DataFrame({
    'Tasks Completed': np.random.normal(loc=28, scale=4, size=n) # Simulated with true mean 28 and SD 4; the SD is treated as unknown in the test
})
display(df.head())

Unnamed: 0,Tasks Completed
0,26.050377
1,25.757384
2,28.801116
3,26.58348
4,21.654541


In [15]:
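# Sample mean, used below as the hypothesized mean
mean = df["Tasks Completed"].mean()

In [16]:
# Testing the sample against its own mean gives t = 0 and p = 1 by construction.
# To detect a change, pass the pre-change baseline as population_mean instead.
t_score, p_value = inf.t_test_unknown_variance(df, population_mean=mean)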
print("t Score:", t_score)
print("p-value:", p_value)

t Score: Tasks Completed    0.0
dtype: float64
p-value: 1.0
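
As a rough illustration of a one-sample t-test against a fixed baseline, the sketch below assumes `t_test_unknown_variance` follows the textbook formula with the sample standard deviation and n - 1 degrees of freedom. The helper `t_test_mean`, the seed, and the baseline of 26 tasks per week are all hypothetical; the scenario does not state a pre-change average.

```python
import numpy as np
from scipy import stats

def t_test_mean(sample, mu0):
    """Two-sided one-sample t-test; the SD is estimated from the sample."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    s = sample.std(ddof=1)  # sample standard deviation
    t = (sample.mean() - mu0) / (s / np.sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    return t, p

# Hypothetical seed, chosen only so the sketch is reproducible
rng = np.random.default_rng(0)
tasks = rng.normal(loc=28, scale=4, size=30)

baseline = 26  # hypothetical pre-change weekly average, not from the source
t, p = t_test_mean(tasks, mu0=baseline)
print("t Score:", t)
print("p-value:", p)

# Cross-check against SciPy's built-in one-sample t-test
check = stats.ttest_1samp(tasks, popmean=baseline)
print("SciPy t:", check.statistic, "SciPy p:", check.pvalue)
```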


**Scenario**: An operations analyst examines the consistency of daily report generation times. If the variance in report generation time increases beyond acceptable levels, it may indicate inefficiencies or technical issues.

**Practical Use**: Identifying process inconsistencies and potential operational bottlenecks requiring attention.

In [17]:
# Inference on the variance or standard deviation
n = 40
df = pd.DataFrame({
    'Report Generation Time (mins)': np.random.normal(loc=10, scale=3, size=n)
})
display(df.head())

Unnamed: 0,Report Generation Time (mins)
0,7.8184
1,9.546044
2,9.278505
3,9.502205
4,11.887771


In [18]:
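# H0: variance = 0.25 min² (SD = 0.5 min). The data above were simulated with SD = 3
# (variance 9), so a very large statistic and a near-zero p-value are expected.
chi_square_stat, p_value = inf.chi_square_variance_test(df, hypothesized_variance=0.25)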
print("Chi-squared Statistic:", chi_square_stat)
print("p-value:", p_value)

Chi-squared Statistic: Report Generation Time (mins)    923.487416
dtype: float64
p-value: [0.]
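
For reference, the textbook statistic for a single variance is (n - 1)s²/σ₀² with n - 1 degrees of freedom. The sketch below assumes that `chi_square_variance_test` follows this form; the helper `chi_square_variance`, the seed, and the choice of an upper-tailed p-value (matching the scenario's concern about variance increasing) are assumptions, not the module's confirmed behavior.

```python
import numpy as np
from scipy import stats

def chi_square_variance(sample, sigma0_sq):
    """Upper-tailed chi-squared test of H0: variance = sigma0_sq."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    stat = (n - 1) * sample.var(ddof=1) / sigma0_sq
    # P(chi2 >= stat): small when the sample variance exceeds sigma0_sq
    p_upper = stats.chi2.sf(stat, df=n - 1)
    return stat, p_upper

# Hypothetical seed, chosen only so the sketch is reproducible
rng = np.random.default_rng(0)
report_times = rng.normal(loc=10, scale=3, size=40)

stat, p = chi_square_variance(report_times, sigma0_sq=0.25)
print("Chi-squared Statistic:", stat)
print("p-value:", p)
```

If the module does use this formula, the recorded statistic of 923.487416 with n = 40 would correspond to a sample variance of about 5.92.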


**Scenario**: A human resources coordinator compares the average hours worked per week between two departments. Assuming similar variability in work hours, this comparison helps determine if one department is potentially overburdened.

**Practical Use**: Ensuring fair workload distribution across departments and maintaining employee satisfaction.

In [19]:
# Inference on the means of two samples when their standard deviations are unknown
# The Common Variance Case

n = 35
df = pd.DataFrame({
    'Dept A': np.random.normal(loc=40, scale=5, size=n),
    'Dept B': np.random.normal(loc=42, scale=5, size=n)
})
display(df.head())

Unnamed: 0,Dept A,Dept B
0,42.052414,35.575648
1,37.226937,38.146633
2,45.811075,38.050858
3,37.720732,46.219098
4,34.23383,39.741047
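
The notebook does not run a test on this data. One way to compare the two departments under the common-variance assumption is a pooled two-sample t-test, sketched here with a hand-written helper and cross-checked against `scipy.stats.ttest_ind` with `equal_var=True`. The helper `pooled_t_test` and the seed are hypothetical, and this is not necessarily how the `inference` module handles the case.

```python
import numpy as np
from scipy import stats

def pooled_t_test(a, b):
    """Two-sided two-sample t-test assuming a common variance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    p = 2 * stats.t.sf(abs(t), df=n1 + n2 - 2)
    return t, p

# Hypothetical seed, chosen only so the sketch is reproducible
rng = np.random.default_rng(0)
dept_a = rng.normal(loc=40, scale=5, size=35)
dept_b = rng.normal(loc=42, scale=5, size=35)

t, p = pooled_t_test(dept_a, dept_b)
print("t Score:", t)
print("p-value:", p)

# Cross-check against SciPy's pooled-variance t-test
check = stats.ttest_ind(dept_a, dept_b, equal_var=True)
print("SciPy t:", check.statistic, "SciPy p:", check.pvalue)
```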


**Scenario**: A marketer measures employee engagement levels before and after implementing a new office layout intended to promote collaboration. Each employee's engagement score is recorded before and after the change.

**Practical Use**: Evaluating the effectiveness of workplace environment changes on employee engagement and morale.

In [None]:
# Inference on the means of two samples
# Paired Observations Case

n = 20
df = pd.DataFrame({
    'Before Layout Change': np.random.normal(loc=75, scale=10, size=n),
    'After Layout Change': np.random.normal(loc=78, scale=10, size=n)
})
display(df.head())

Unnamed: 0,Before Layout Change,After Layout Change
0,69.206718,78.871085
1,74.711407,65.198257
2,60.262622,82.841091
3,59.358189,71.525173
4,87.397679,71.153638
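
No test is run on this data in the notebook. A paired t-test reduces to a one-sample t-test on the per-employee differences, as the sketch below illustrates. The helper `paired_t_test` and the seed are hypothetical. Note that, as in the cell above, the two columns are simulated independently, so they lack the within-employee correlation that real before/after data would typically show.

```python
import numpy as np
from scipy import stats

def paired_t_test(before, after):
    """Two-sided paired t-test on the differences after - before."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    d = after - before  # one difference per employee
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    return t, p

# Hypothetical seed, chosen only so the sketch is reproducible
rng = np.random.default_rng(0)
before = rng.normal(loc=75, scale=10, size=20)
after = rng.normal(loc=78, scale=10, size=20)

t, p = paired_t_test(before, after)
print("t Score:", t)
print("p-value:", p)

# Cross-check against SciPy's paired t-test (same ordering: after - before)
check = stats.ttest_rel(after, before)
print("SciPy t:", check.statistic, "SciPy p:", check.pvalue)
```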


**Scenario**: A financial analyst examines the variability in expenses between two budgeting periods to assess the impact of fiscal policy changes. Variability can indicate financial control effectiveness or lapses.

**Practical Use**: Understanding financial fluctuations and implementing strategic budgeting practices.

In [None]:
# Inference on the variance or standard deviations of two samples

n = 50
df = pd.DataFrame({
    'Period 1': np.random.normal(loc=1000, scale=100, size=n),
    'Period 2': np.random.normal(loc=1000, scale=120, size=n)
})
display(df.head())

Unnamed: 0,Period 1,Period 2
0,1054.437654,1106.163331
1,981.756747,1001.071999
2,781.728279,946.348358
3,1195.428291,870.393958
4,1046.048175,911.395148
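
The notebook stops before testing these two samples. A common approach for comparing two variances is the F-test on the ratio of sample variances, sketched below under the assumption that both periods are approximately normal (the test is sensitive to departures from normality). The helper `f_test_variances` and the seed are hypothetical and not part of the `inference` module as shown.

```python
import numpy as np
from scipy import stats

def f_test_variances(x, y):
    """Two-sided F-test of H0: var(x) = var(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f = x.var(ddof=1) / y.var(ddof=1)
    dfn, dfd = x.size - 1, y.size - 1
    # Two-sided p-value: double the smaller tail, capped at 1
    p = 2 * min(stats.f.cdf(f, dfn, dfd), stats.f.sf(f, dfn, dfd))
    return f, min(p, 1.0)

# Hypothetical seed, chosen only so the sketch is reproducible
rng = np.random.default_rng(0)
period_1 = rng.normal(loc=1000, scale=100, size=50)
period_2 = rng.normal(loc=1000, scale=120, size=50)

f_stat, p = f_test_variances(period_1, period_2)
print("F Statistic:", f_stat)
print("p-value:", p)
```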
