# **HYPOTHESIS TESTING**

## ***Introduction***
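This notebook runs hypothesis tests and an A/B-style comparison on job market data from `cleaned_jobs.csv`.
We test for salary differences by location and job title, differences in skill demand, and the relationship between experience and salary, using t-tests, a chi-square test, and Pearson correlation.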

In [1]:
!conda install --yes --file requirements.txt

Retrieving notices: done


CondaFileIOError: 'requirements.txt'. [Errno 2] No such file or directory: 'requirements.txt'

## **Importing Libraries**

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

## **Step-1: *Load Data***

> Load & Prepare Data

We will load the cleaned dataset and ensure it is ready for hypothesis testing.

In [4]:
df = pd.read_csv('../Data/cleaned_jobs.csv')
df.head()

| Unnamed: 0 | Job Title | Company | Location | Skills | Experience Required | Salary | Date Posted |
|---|---|---|---|---|---|---|---|
| 0 | Data Scientist | Amazon | Mumbai, India | Tableau, Excel, R | 6 | ₹10L per annum | Posted 9 days ago |
| 1 | Data Scientist | Google | Chennai, India | Data Wrangling, Pandas, Numpy | 6 | ₹17L per annum | Posted 13 days ago |
| 2 | Data Scientist | Flipkart | Chennai, India | Machine Learning, Deep Learning | 9 | ₹9L per annum | Posted 7 days ago |
| 3 | Machine Learning Engineer | Infosys | Pune, India | Machine Learning, Deep Learning | 4 | ₹19L per annum | Posted 5 days ago |
| 4 | Machine Learning Engineer | Deloitte | Pune, India | Python, Sql, Power Bi | 3 | ₹6L per annum | Posted 9 days ago |
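The `Salary` column in the preview holds text such as `₹10L per annum`, so it needs to be numeric before any test can use it. The following is a minimal sketch of one way to do that; the helper name `salary_to_lakhs` is hypothetical, and it assumes every salary follows the "number followed by L" pattern seen in the five rows above.

```python
import pandas as pd


def salary_to_lakhs(salary: pd.Series) -> pd.Series:
    """Extract the first number from strings like '₹10L per annum'.

    Assumption: the number is the salary in lakhs per annum.
    Values without a number become NaN.
    """
    number = salary.astype(str).str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    return pd.to_numeric(number, errors="coerce")
```

With the notebook's data loaded, `df["Salary"]` could be passed through this helper and the result stored in a new numeric column.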


## **Step-2: *Hypothesis Testing:***

### **Hypothesis 1: *Salary Differences by Location (T-test)***

> ***Question:*** Are Data Scientist salaries in Bangalore higher than in Mumbai?

We compare salaries between Bangalore and Mumbai using an independent t-test.

- ***Test Used:*** Independent T-test
- ***Null Hypothesis (H₀):*** Mean Data Scientist salaries in Bangalore and Mumbai are equal.
- ***Alternative Hypothesis (H₁):*** Mean Data Scientist salaries in Bangalore are higher than in Mumbai.

In [4]:
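# Convert salary text such as "₹10L per annum" to a number of lakhs
# (assumes every salary follows that pattern; unparseable values become NaN)
df["Salary_Lakhs"] = pd.to_numeric(
    df["Salary"].astype(str).str.extract(r"(\d+(?:\.\d+)?)", expand=False), errors="coerce"
)

# Filter Data Scientist listings for Bangalore and Mumbai
ds = df[df["Job Title"].str.contains("Data Scientist", case=False, na=False)]
bangalore_salaries = ds[ds["Location"].str.contains("Bangalore", case=False, na=False)]["Salary_Lakhs"].dropna()
mumbai_salaries = ds[ds["Location"].str.contains("Mumbai", case=False, na=False)]["Salary_Lakhs"].dropna()

# Perform a one-sided independent t-test (H1: Bangalore mean > Mumbai mean);
# the `alternative` argument requires SciPy 1.6 or later
t_stat, p_value = stats.ttest_ind(bangalore_salaries, mumbai_salaries, alternative="greater")

print(f"T-Statistic: {t_stat:.2f}")
print(f"P-Value: {p_value:.5f}")

# Interpret results
alpha = 0.05
if p_value < alpha:
    print("✅ Reject Null Hypothesis: Salaries in Bangalore are significantly higher than in Mumbai.")
else:
    print("❌ Fail to Reject Null Hypothesis: No evidence that Bangalore salaries are higher than Mumbai salaries.")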


NameError: name 'df' is not defined

### **Hypothesis 2: *Skill Demand (Chi-Square Test)***

> ***Question:*** Are Python and SQL equally in demand?

We check if Python and SQL have the same demand in job listings.

- ***Test Used:*** Chi-Square Test
- ***Null Hypothesis (H₀):*** Python and SQL appear equally in job listings.
- ***Alternative Hypothesis (H₁):*** One skill is significantly more in demand.
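
One possible way to run this test is a chi-square goodness-of-fit test on the number of listings that mention each skill. The sketch below is an illustration, not the notebook's own implementation: the function name `skill_demand_chi_square` is hypothetical, and it assumes skills are stored as comma-separated text as in the `Skills` column preview. A listing that names both skills is counted once for each, so the two counts are not strictly independent and the p-value should be read as approximate.

```python
import pandas as pd
from scipy import stats


def skill_demand_chi_square(skills: pd.Series, skill_a: str = "python", skill_b: str = "sql"):
    """Chi-square goodness-of-fit test on how many listings mention each skill."""
    # Turn "Python, Sql, Power Bi" into {"python", "sql", "power bi"}
    tokens = skills.fillna("").astype(str).apply(
        lambda text: {part.strip().lower() for part in text.split(",")}
    )
    count_a = int(tokens.apply(lambda t: skill_a in t).sum())
    count_b = int(tokens.apply(lambda t: skill_b in t).sum())
    # With no expected frequencies given, chisquare tests against equal counts (H0)
    chi2, p_value = stats.chisquare([count_a, count_b])
    return count_a, count_b, float(chi2), float(p_value)
```

Calling `skill_demand_chi_square(df["Skills"])` on the loaded data would return both counts, the chi-square statistic, and the p-value to compare against the chosen significance level. If neither skill appears at all, the statistic is undefined and comes back as NaN.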

### **Hypothesis 3: *Experience vs. Salary Correlation (Pearson Correlation)***
> ***Question:*** Does more experience lead to higher salaries?

We test whether years of required experience and salary are positively correlated.
- ***Test Used:*** Pearson Correlation
- ***Null Hypothesis (H₀):*** There is no correlation between experience and salary.
- ***Alternative Hypothesis (H₁):*** There is a positive correlation.
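
A minimal sketch of this test with `scipy.stats.pearsonr` follows. The function name `experience_salary_correlation` is hypothetical, and it assumes salary is already numeric (the raw `Salary` values such as `₹10L per annum` would need converting first). Because H₁ is directional, the two-sided p-value that `pearsonr` reports by default is converted to a one-sided one.

```python
import pandas as pd
from scipy import stats


def experience_salary_correlation(experience, salary):
    """Pearson r and a one-sided p-value for H1: correlation > 0."""
    # Keep only rows where both values are present
    data = pd.DataFrame({"experience": experience, "salary": salary}).dropna()
    r, p_two_sided = stats.pearsonr(data["experience"], data["salary"])
    # Halve the two-sided p-value when r points in the hypothesized direction
    p_one_sided = p_two_sided / 2 if r > 0 else 1 - p_two_sided / 2
    return float(r), float(p_one_sided)
```

On the loaded data, the inputs would be `df["Experience Required"]` and a numeric salary Series. Pearson correlation measures linear association only, so a scatter plot is a useful companion check.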

### **Hypothesis 4: *A/B Testing on Job Salaries***
> ***Question:*** Do Data Scientists earn more than ML Engineers?

We compare salaries between Data Scientists and ML Engineers using A/B testing.
- ***Test Used:*** Two-Sample T-test (A/B Testing)
- ***Null Hypothesis (H₀):*** Data Scientists and ML Engineers have similar salaries.
- ***Alternative Hypothesis (H₁):*** Data Scientists earn significantly more.
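
The comparison could be sketched as a one-sided two-sample t-test, as below. The function name `compare_roles` and its parameters are hypothetical, the salary column is assumed to be numeric already, and job titles are assumed to match the spellings in the data preview. This sketch uses Welch's variant (`equal_var=False`), which does not assume equal variances in the two groups; that is a choice made here, not something the notebook specifies. The `alternative` argument requires SciPy 1.6 or later.

```python
import pandas as pd
from scipy import stats


def compare_roles(df, salary_col, role_a="Data Scientist",
                  role_b="Machine Learning Engineer", alpha=0.05):
    """One-sided Welch t-test for H1: mean salary of role_a > role_b."""
    a = df.loc[df["Job Title"] == role_a, salary_col].dropna()
    b = df.loc[df["Job Title"] == role_b, salary_col].dropna()
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=False, alternative="greater")
    return {
        "mean_a": float(a.mean()),
        "mean_b": float(b.mean()),
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "reject_h0": bool(p_value < alpha),
    }
```

The returned dictionary keeps the group means next to the test result, which makes it easier to report the size of any difference alongside its significance.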

## **Conclusion & Insights**

Summarizing the key insights from our hypothesis testing results.