# Day 24: Statistics Basics for Data Analysis

---

## 📌 Objectives
- Understand why statistics is important in data analysis
- Learn descriptive statistics
- Calculate mean, median, mode
- Understand variance & standard deviation
- Analyze salary data statistically
- Detect outliers using IQR

---

## 📂 Dataset Used
- employee_salary.csv


In [None]:
# 1️⃣ Import Required Libraries
import pandas as pd
import numpy as np


In [None]:
# 2️⃣ Load Dataset
df = pd.read_csv("datasets/employee_salary.csv")

# Rename the experience column so its name has no spaces or parentheses
df = df.rename(columns={'Experience (Years)': 'Experience_Years'})

df.head()
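
Before computing any statistics, it can help to confirm that the columns are numeric and to see how many values are missing. The sketch below uses a small made-up DataFrame (`sample`) as a stand-in, since the rows of `employee_salary.csv` are not shown here; the values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; values are invented for illustration.
sample = pd.DataFrame({
    "Experience_Years": [1, 3, 5, 8, 12],
    "Salary": [30000, 42000, None, 61000, 90000],
})

# Column types: mean(), median(), std() and similar need numeric dtypes.
print(sample.dtypes)

# Missing values per column. pandas skips NaN by default in mean(), median(), std().
missing_counts = sample.isna().sum()
print(missing_counts)
```

The same two checks (`df.dtypes` and `df.isna().sum()`) can be run on the loaded `df`.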


## 3️⃣ What is Statistics?


Statistics helps us:
- Summarize data
- Identify patterns
- Make data-driven decisions
- Understand variability and distribution


## 4️⃣ Measures of Central Tendency
The mean (arithmetic average), median (middle value), and mode (most frequent value) each summarize a "typical" salary.


In [None]:
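# Mean: arithmetic average; sensitive to extreme salaries
mean_salary = df['Salary'].mean()
# Median: middle value; robust to extreme salaries
median_salary = df['Salary'].median()
# mode() returns every tied value in sorted order; [0] keeps only the smallest
mode_salary = df['Salary'].mode()[0]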

print("Mean Salary:", mean_salary)
print("Median Salary:", median_salary)
print("Mode Salary:", mode_salary)
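
To see why the mean and median can disagree, here is a minimal sketch on invented salaries (not taken from the dataset). It adds one extreme value and compares the two measures, and it also shows that `mode()` can return more than one value when there is a tie.

```python
import pandas as pd

# Made-up salaries for illustration only.
salaries = pd.Series([40000, 40000, 50000, 50000, 60000])

# Same data plus one very high salary.
with_outlier = pd.concat([salaries, pd.Series([500000])], ignore_index=True)

mean_before, median_before = salaries.mean(), salaries.median()
mean_after, median_after = with_outlier.mean(), with_outlier.median()

print("Mean:  ", mean_before, "->", mean_after)      # the mean shifts strongly
print("Median:", median_before, "->", median_after)  # the median does not move here

# Two values are tied for most frequent, so mode() returns both.
modes = salaries.mode()
print("Modes:", modes.tolist())
```

Because of ties like this, taking `mode()[0]` alone reports only the smallest of the tied values.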


## 5️⃣ Measures of Dispersion
The range, variance, and standard deviation describe how widely salaries are spread around the center.


In [None]:
salary_range = df['Salary'].max() - df['Salary'].min()
# pandas uses the sample formulas (ddof=1) for var() and std() by default
variance = df['Salary'].var()
std_dev = df['Salary'].std()

print("Salary Range:", salary_range)
print("Variance:", variance)
print("Standard Deviation:", std_dev)
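
One detail worth knowing: pandas and NumPy use different defaults for variance. pandas divides by `n - 1` (sample variance, `ddof=1`), while `np.var` divides by `n` (population variance, `ddof=0`). A small sketch on invented numbers:

```python
import numpy as np
import pandas as pd

# Invented values chosen so the arithmetic is easy to follow (mean is 5).
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_var = values.var()                # pandas default: ddof=1, divides by n - 1
population_var = values.var(ddof=0)      # divides by n
numpy_var = np.var(values.to_numpy())    # NumPy default: ddof=0

print("Sample variance (pandas default):", sample_var)
print("Population variance (ddof=0):    ", population_var)
print("NumPy default variance:          ", numpy_var)
```

The two versions are close for large datasets, but the gap is noticeable on small ones, so it helps to state which one is being reported.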


## 6️⃣ Salary Distribution Summary


In [None]:
df['Salary'].describe()


## 7️⃣ Experience Statistics


In [None]:
df['Experience_Years'].describe()


## 8️⃣ Salary Analysis by Gender (Statistics)


In [None]:
gender_stats = df.groupby('Gender')['Salary'].agg(
    Mean='mean',
    Median='median',
    Std_Dev='std',
    Min='min',
    Max='max'
)

gender_stats


## 9️⃣ Salary Analysis by Position (Statistics)


In [None]:
position_stats = df.groupby('Position')['Salary'].agg(
    Mean='mean',
    Median='median',
    Std_Dev='std'
).sort_values(by='Mean', ascending=False)

position_stats
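
Standard deviations are hard to compare across groups with very different average salaries. One possible extension, sketched here on invented data, is to add the group size and the coefficient of variation (standard deviation divided by mean). The position names and salaries below are hypothetical, and the `Count` and `CV` columns are additions for illustration, not part of the original analysis.

```python
import pandas as pd

# Hypothetical positions and salaries for illustration only.
toy = pd.DataFrame({
    "Position": ["Analyst", "Analyst", "Analyst", "Manager", "Manager", "Manager"],
    "Salary": [40000, 42000, 44000, 60000, 90000, 120000],
})

stats = toy.groupby("Position")["Salary"].agg(
    Count="count",   # group size: small groups give unstable statistics
    Mean="mean",
    Std_Dev="std",
)

# Coefficient of variation: spread relative to the group's own mean.
stats["CV"] = stats["Std_Dev"] / stats["Mean"]

print(stats.sort_values(by="CV", ascending=False))
```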


## 🔟 Correlation
The correlation coefficient measures the strength and direction of the linear relationship between experience and salary.


In [None]:
correlation = df['Experience_Years'].corr(df['Salary'])
print("Correlation between Experience and Salary:", correlation)


📌 Interpretation:
- Value close to +1 → strong positive linear relationship
- Value close to 0 → weak or no linear relationship
- Value close to -1 → strong negative linear relationship
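
To make the coefficient less of a black box, the sketch below computes Pearson's r by hand and compares it with `Series.corr`, which uses Pearson by default. It also computes a rank-based (Spearman) correlation as Pearson on ranks. The `toy` data is invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented experience/salary pairs for illustration only.
toy = pd.DataFrame({
    "Experience_Years": [1, 2, 4, 6, 9],
    "Salary": [32000, 35000, 47000, 52000, 70000],
})

x = toy["Experience_Years"].to_numpy(dtype=float)
y = toy["Salary"].to_numpy(dtype=float)

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from each variable's mean.
dx, dy = x - x.mean(), y - y.mean()
manual_r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

pandas_r = toy["Experience_Years"].corr(toy["Salary"])

# Spearman correlation equals Pearson correlation computed on the ranks.
spearman_r = toy["Experience_Years"].rank().corr(toy["Salary"].rank())

print("Manual Pearson r:", manual_r)
print("pandas Pearson r:", pandas_r)
print("Spearman (rank) r:", spearman_r)
```

Pearson captures only linear association, so a rank-based measure can be a useful cross-check when the relationship is monotonic but curved. Neither measure says anything about causation.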


## 1️⃣1️⃣ Outlier Detection (IQR Method)


In [None]:
Q1 = df['Salary'].quantile(0.25)
Q3 = df['Salary'].quantile(0.75)
IQR = Q3 - Q1

# Values more than 1.5 * IQR below Q1 or above Q3 are flagged as outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

outliers = df[(df['Salary'] < lower_bound) | (df['Salary'] > upper_bound)]
outliers
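
The same rule can be wrapped in a small helper so it is easy to reuse on other columns, such as experience. This is a minimal sketch: the function name `iqr_bounds` and the sample values are hypothetical, and the multiplier `k=1.5` is the conventional default rather than a fixed requirement.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Invented salaries with one obviously extreme value.
toy_salaries = pd.Series([30000, 32000, 35000, 36000, 38000, 40000, 41000, 150000])

low, high = iqr_bounds(toy_salaries)
flagged = toy_salaries[(toy_salaries < low) | (toy_salaries > high)]

print("Bounds:", low, high)
print("Flagged values:", flagged.tolist())
```

A flagged value is only a candidate for review. Whether it is an error or a genuine high salary depends on the data.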


## 1️⃣2️⃣ Z-Score (Optional Intro)


In [None]:
# Z-score: distance from the mean, in units of the (sample) standard deviation
df['Salary_Zscore'] = (df['Salary'] - mean_salary) / std_dev
df[['Salary', 'Salary_Zscore']].head()
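
Z-scores can also be used to flag unusual values with a threshold on `|z|`. The sketch below, on invented salaries, shows that standardized values have mean 0 and standard deviation 1. It also shows that the common `|z| > 3` cutoff can miss an obvious extreme in a small sample, because the extreme value inflates the standard deviation it is measured against.

```python
import pandas as pd

# Invented salaries with one obviously extreme value.
toy_salaries = pd.Series([30000, 32000, 35000, 36000, 38000, 40000, 41000, 150000])

# Standardize using the sample standard deviation (pandas default, ddof=1).
z = (toy_salaries - toy_salaries.mean()) / toy_salaries.std()

print("Mean of z:", round(z.mean(), 10))
print("Std of z: ", round(z.std(), 10))

# Count values beyond two common thresholds.
beyond_2 = int((z.abs() > 2).sum())
beyond_3 = int((z.abs() > 3).sum())
print("|z| > 2:", beyond_2)
print("|z| > 3:", beyond_3)
```

On small or heavily skewed data, the IQR rule is often the more robust of the two checks.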


## 1️⃣3️⃣ Key Statistical Insights


- Average salary is higher than the median → suggests a right-skewed distribution
- Salary tends to increase with experience (positive correlation)
- Some positions have high salary variance
- Outliers exist at very high salary levels
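
The skewness claim can be checked numerically rather than only by comparing mean and median. A minimal sketch on invented salaries, using pandas' built-in `Series.skew()`, where a positive value indicates a longer right tail:

```python
import pandas as pd

# Invented salaries with a long right tail, for illustration only.
toy_salaries = pd.Series([30000, 32000, 35000, 36000, 38000, 40000, 41000, 150000])

mean_value = toy_salaries.mean()
median_value = toy_salaries.median()
skewness = toy_salaries.skew()   # sample skewness; > 0 means right-skewed

print("Mean:", mean_value)
print("Median:", median_value)
print("Skewness:", skewness)
```

On the real data, `df['Salary'].skew()` would give the corresponding figure.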

## 1️⃣4️⃣ Practice Exercises

1. Calculate mean & median experience  
2. Find variance of experience years  
3. Detect outliers in experience  
4. Compare salary standard deviation by gender  
5. Interpret correlation value in words  



## ✅ End of Day 24: Statistics Basics
