# Lab | Hypothesis Testing

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_1samp, t


## Data Loading and Exploration

In [None]:

# Load the provided dataset
employee_data = pd.read_csv('/mnt/data/Current_Employee_Names__Salaries__and_Position_Titles.csv')

# Display the first few rows of the dataset to understand its structure
employee_data.head()



## Hypothesis Formulation

**Research Question**:
"Is the average annual salary of employees greater than $75,000?"

**Hypotheses**:
- \( H_0 \): The average annual salary of employees is $75,000 (null hypothesis).
- \( H_a \): The average annual salary of employees is greater than $75,000 (alternative hypothesis).


## Hypothesis Testing

In [None]:

# Filter out null values from the 'Annual Salary' column
annual_salaries = employee_data['Annual Salary'].dropna()

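# Conduct a one-sample, right-tailed t-test (H_a: mean > 75,000).
# alternative='greater' (SciPy >= 1.6) returns the one-sided p-value directly;
# halving the two-sided p-value is only valid when t_stat is positive.
t_stat, p_value = ttest_1samp(annual_salaries, 75000, alternative='greater')

t_stat, p_value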
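
To make the test statistic less of a black box, the sketch below recomputes it by hand and compares it with SciPy's result. It runs on synthetic data: the `sample` array, its seed, and its parameters are assumptions chosen for illustration, not values from the employee dataset.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Synthetic stand-in for the salary column (hypothetical values, not the real data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=80_000, scale=20_000, size=500)
mu_0 = 75_000  # mean under the null hypothesis

# t = (sample mean - mu_0) / (s / sqrt(n)), with s the sample standard deviation (ddof=1)
n = sample.size
manual_t = (sample.mean() - mu_0) / (sample.std(ddof=1) / np.sqrt(n))

# Right-tailed p-value: P(T >= t) for a t distribution with n - 1 degrees of freedom
manual_p = t.sf(manual_t, n - 1)

# Cross-check against SciPy's one-sided test (requires SciPy >= 1.6)
scipy_t, scipy_p = ttest_1samp(sample, mu_0, alternative='greater')

print(manual_t, scipy_t)
print(manual_p, scipy_p)
```

Using the survival function `t.sf` gives the right-tail probability directly, so the result stays correct even when the statistic is negative.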


## Confidence Intervals Construction

In [None]:

# Calculate the sample mean and standard error
mean_salary = annual_salaries.mean()
stderr = annual_salaries.sem()

# Degrees of freedom
df = len(annual_salaries) - 1

# Confidence level and alpha
confidence_level = 0.95
alpha = 1 - confidence_level

# Calculate the t critical value for a 95% confidence interval
t_critical = t.ppf(1 - alpha/2, df)

# Calculate the margin of error
margin_error = t_critical * stderr

# Calculate the confidence interval
conf_interval = (mean_salary - margin_error, mean_salary + margin_error)

conf_interval
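
As a sanity check on the manual construction, SciPy's `t.interval` should return the same bounds. The sketch below compares the two on synthetic data. The `sample` array and its parameters are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import t, sem

# Synthetic stand-in for the salary column (hypothetical values, not the real data)
rng = np.random.default_rng(1)
sample = rng.normal(loc=80_000, scale=20_000, size=500)

mean = sample.mean()
stderr = sem(sample)      # standard error of the mean, ddof=1 by default
dof = sample.size - 1     # degrees of freedom

# Manual two-sided 95% interval: mean +/- t_critical * standard error
t_critical = t.ppf(0.975, dof)
manual_ci = (mean - t_critical * stderr, mean + t_critical * stderr)

# Same interval from SciPy; the confidence level is passed positionally
builtin_ci = t.interval(0.95, dof, loc=mean, scale=stderr)

print(manual_ci)
print(builtin_ci)
```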



## Interpretation and Conclusions

The one-tailed p-value from the t-test is effectively zero, far below any conventional significance level (for example, 0.05), so we reject the null hypothesis. The data provide strong evidence that the average annual salary, among employees with a recorded annual salary, is greater than $75,000.

The 95% confidence interval for the average annual salary runs from approximately $88,802.56 to $89,321.03. Intervals constructed this way capture the true mean in about 95% of repeated samples. Because the entire interval lies above the $75,000 value stated in the null hypothesis, it is consistent with the result of the t-test.
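
The interval above is two-sided, while the test is one-tailed. The exact counterpart of a right-tailed test at level 0.05 is a one-sided 95% lower confidence bound: the test rejects the null hypothesis exactly when the hypothesized mean falls below that bound. The sketch below illustrates this on synthetic data. The `sample` array and its parameters are assumptions for illustration, not the employee data.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Synthetic stand-in for the salary column (hypothetical values, not the real data)
rng = np.random.default_rng(2)
sample = rng.normal(loc=80_000, scale=20_000, size=500)
mu_0 = 75_000   # mean under the null hypothesis
alpha = 0.05

mean = sample.mean()
stderr = sample.std(ddof=1) / np.sqrt(sample.size)
dof = sample.size - 1

# One-sided 95% lower confidence bound: all of alpha sits in a single tail
lower_bound = mean - t.ppf(1 - alpha, dof) * stderr

# Right-tailed test of the same hypothesis (requires SciPy >= 1.6)
_, p_one_sided = ttest_1samp(sample, mu_0, alternative='greater')

# The two decision rules agree
reject_by_p = p_one_sided < alpha
reject_by_bound = mu_0 < lower_bound
print(reject_by_p, reject_by_bound)
```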
