Use Case:
Analyzing the average salary of employees in a company. We have a sample of 20 employees, and we want to make inferences about the average salary for the entire company.

Key Inferential Statistics Concepts:
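Confidence Interval: A range of values, computed from the sample, that is expected to contain the population parameter (e.g., the population mean) at a stated confidence level. A 95% level means that, over many repeated samples, about 95% of the intervals built this way would contain the true value.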

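Hypothesis Testing: A procedure for deciding whether the sample data provide enough evidence to reject an assumption (the null hypothesis) about a population parameter. A test either rejects the null hypothesis or fails to reject it; it never proves the assumption true.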

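T-test: A test based on the t distribution, used to compare means when the population standard deviation is unknown and must be estimated from the sample (the usual situation with small samples). A one-sample t-test compares a sample mean with a hypothesized value, as in Step 2; a two-sample t-test compares the means of two groups.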
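The "level of confidence" describes the procedure, not any single interval: if we drew many samples and built a 95% interval from each, roughly 95% of those intervals should contain the true mean. The sketch below checks this by simulation. It assumes a hypothetical, normally distributed salary population with mean 58 and standard deviation 6 (in thousands); these parameters are illustrative assumptions, not properties of the company in this use case.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical population parameters (illustrative assumptions, in thousands)
TRUE_MEAN = 58.0
TRUE_STD = 6.0
SAMPLE_SIZE = 20
N_TRIALS = 10_000
CONFIDENCE = 0.95

rng = np.random.default_rng(seed=0)
t_critical = stats.t.ppf(1 - (1 - CONFIDENCE) / 2, df=SAMPLE_SIZE - 1)

# One row per simulated sample of 20 salaries
samples = rng.normal(TRUE_MEAN, TRUE_STD, size=(N_TRIALS, SAMPLE_SIZE))
means = samples.mean(axis=1)
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(SAMPLE_SIZE)

# An interval covers the true mean if |sample mean - true mean| <= t* x SE
covered = np.abs(means - TRUE_MEAN) <= t_critical * std_errors
coverage = covered.mean()

print(f"Share of 95% intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95; it will not be exactly 0.95 because the simulation itself is random.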

Step 1: Confidence Interval
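A confidence interval is a range that likely contains the true population mean. We calculate it from the sample mean, the standard error (the sample standard deviation divided by √n), and a critical value from the t distribution with n - 1 degrees of freedom.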
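Written out, the interval is the sample mean plus or minus the t critical value times the standard error. With the 20 salaries used below (in thousands, intermediate values rounded), the arithmetic works out as:

```latex
\text{CI} = \bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad
\bar{x} = 58.25,\; s \approx 5.674,\; n = 20,\; t_{0.975,\,19} \approx 2.093

58.25 \pm 2.093 \times \frac{5.674}{\sqrt{20}}
  \approx 58.25 \pm 2.093 \times 1.269
  \approx 58.25 \pm 2.656
  \approx (55.59,\; 60.91)
```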


In [1]:
import numpy as np
import scipy.stats as stats

# Sample data (salaries in thousands)
salaries = [50, 55, 48, 60, 62, 51, 58, 57, 63, 54,
            52, 59, 56, 65, 64, 67, 60, 68, 61, 55]

# Calculate sample mean and sample standard deviation
sample_mean = np.mean(salaries)
sample_std = np.std(salaries, ddof=1)  # Sample standard deviation (ddof=1 for sample)
sample_size = len(salaries)

# Confidence level (95%)
confidence_level = 0.95
alpha = 1 - confidence_level
t_critical = stats.t.ppf(1 - alpha/2, df=sample_size - 1)  # Two-tailed t critical value, n - 1 degrees of freedom

# Calculate margin of error
margin_of_error = t_critical * (sample_std / np.sqrt(sample_size))

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

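# Print the bounds explicitly (printing the tuple directly shows np.float64(...) wrappers under NumPy 2.x)
print(f"95% Confidence Interval for the mean salary: ({confidence_interval[0]}, {confidence_interval[1]})")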


95% Confidence Interval for the mean salary: (55.594358723355, 60.905641276645)
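As a cross-check, SciPy can build the same interval directly from the t distribution. This sketch uses `scipy.stats.sem` for the standard error and `scipy.stats.t.interval`; the confidence level is passed positionally because its keyword name changed (from `alpha` to `confidence`) between SciPy releases.

```python
import numpy as np
import scipy.stats as stats

# Same sample as above (salaries in thousands)
salaries = [50, 55, 48, 60, 62, 51, 58, 57, 63, 54,
            52, 59, 56, 65, 64, 67, 60, 68, 61, 55]

sample_mean = np.mean(salaries)
standard_error = stats.sem(salaries)  # ddof=1 by default, i.e. s / sqrt(n)

# Interval = loc +/- t.ppf(0.975, df) * scale
low, high = stats.t.interval(0.95, len(salaries) - 1,
                             loc=sample_mean, scale=standard_error)

print(f"95% CI via scipy.stats.t.interval: ({low:.4f}, {high:.4f})")
```

It should reproduce the manually computed interval above, printing `(55.5944, 60.9056)`.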


Step 2: Hypothesis Testing (One-Sample t-test)
Next, we test whether the company's average salary (the population mean) equals 55k, i.e. 55 in the units of the data.

Hypotheses:
Null Hypothesis (H₀): The average salary is 55k (i.e., population mean = 55k).

Alternative Hypothesis (H₁): The average salary is not 55k (i.e., population mean ≠ 55k).
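Both hypotheses are judged through the one-sample t statistic, t = (x̄ - μ₀) / (s / √n), compared against a t distribution with n - 1 degrees of freedom. For the two-sided alternative, the p-value is twice the upper-tail probability of |t|. The sketch below computes both directly from these definitions; the name `mu_0` is our own label for the hypothesized mean. The results should agree with `stats.ttest_1samp` in the next cell.

```python
import numpy as np
import scipy.stats as stats

# Same sample as above (salaries in thousands)
salaries = [50, 55, 48, 60, 62, 51, 58, 57, 63, 54,
            52, 59, 56, 65, 64, 67, 60, 68, 61, 55]
mu_0 = 55  # hypothesized mean under H0 (in thousands)

n = len(salaries)
sample_mean = np.mean(salaries)
standard_error = np.std(salaries, ddof=1) / np.sqrt(n)

# Distance of the sample mean from mu_0, measured in standard errors
t_manual = (sample_mean - mu_0) / standard_error

# Two-sided p-value: probability of |T| >= |t| under H0, T ~ t(n - 1)
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print(f"t = {t_manual:.4f}, two-sided p = {p_manual:.4f}")
```

This prints `t = 2.5615, two-sided p = 0.0191`.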

In [3]:
# Hypothesized population mean under H₀ (an assumption to test, not a known value)
hypothesized_mean = 55

# Perform a two-sided one-sample t-test
t_statistic, p_value = stats.ttest_1samp(salaries, hypothesized_mean)

# Significance level (alpha = 0.05 matches the 95% confidence level in Step 1)
alpha = 0.05

# Determine if we reject the null hypothesis
if p_value < alpha:
    result = "Reject the null hypothesis: The sample mean is significantly different from 55k."
else:
    result = "Fail to reject the null hypothesis: The sample mean is not significantly different from 55k."

print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")
print(result)


T-statistic: 2.5614634915678702
P-value: 0.019088847933522755
Reject the null hypothesis: The sample mean is significantly different from 55k.
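The two steps agree, as they should: a two-sided t-test at α = 0.05 rejects H₀: μ = μ₀ exactly when μ₀ falls outside the 95% t confidence interval built from the same sample. Here 55 lies just below the lower bound of about 55.59. The sketch below checks this correspondence for a handful of hypothesized means; the candidate values are arbitrary choices for illustration.

```python
import numpy as np
import scipy.stats as stats

# Same sample as above (salaries in thousands)
salaries = [50, 55, 48, 60, 62, 51, 58, 57, 63, 54,
            52, 59, 56, 65, 64, 67, 60, 68, 61, 55]
alpha = 0.05

n = len(salaries)
mean = np.mean(salaries)
sem = stats.sem(salaries)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem

# Arbitrary hypothesized means to try (in thousands)
candidates = [54, 55, 56, 58, 60, 61, 62]
results = []  # (mu_0, rejected by the test, outside the CI)
for mu_0 in candidates:
    p = stats.ttest_1samp(salaries, mu_0).pvalue
    rejected = bool(p < alpha)
    outside_ci = not (ci_low <= mu_0 <= ci_high)
    results.append((mu_0, rejected, outside_ci))
    print(f"mu_0={mu_0}: p={p:.4f}, reject={rejected}, outside CI={outside_ci}")
```

In every row the `reject` and `outside CI` columns should match: 54, 55, 61, and 62 are rejected and lie outside (55.59, 60.91), while 56, 58, and 60 are not rejected and lie inside it.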
