# 📊 Project: HR Employee Attrition Analysis

**Objective:** Perform a complete HR analytics project to identify attrition patterns and key drivers, and to provide insights for employee retention.  
This is a solo project by a Computer Science graduate applying for a **Data Analyst** role in Tamheer.

**Files:**  
- `HR_Employee_Attrition.csv` → dataset  
- `HR_Employee_Attrition_Analysis.ipynb` → analysis notebook


## ✅ What you will do in this notebook

1. Load and explore the dataset.  
2. Perform data cleaning and feature engineering.  
3. Calculate descriptive statistics and KPIs (attrition rate, average income, average satisfaction, etc.).  
4. Explore attrition by department, role, and overtime.  
5. Visualize relationships (e.g., satisfaction vs attrition).  
6. Build a simple **Logistic Regression model** to predict attrition.  
7. Provide actionable **recommendations** to HR management.


```python
# === Setup ===
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Load data (path is relative to the notebook; adjust it if the CSV lives elsewhere)
csv_path = "HR_Employee_Attrition.csv"
df = pd.read_csv(csv_path)

print("Shape:", df.shape)
df.head()
```
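Later cells assume specific column names. A small sketch of failing early if any are missing follows. The `REQUIRED` list is taken from the columns this notebook's own code uses. The two-row `toy` frame is made up purely for illustration.

```python
import pandas as pd

# Columns used by later cells in this notebook
REQUIRED = ["Attrition", "Department", "OverTime", "MonthlyIncome", "JobSatisfaction"]

def missing_columns(df: pd.DataFrame, required=REQUIRED) -> list:
    """Return the required column names absent from df, in the order listed."""
    return [c for c in required if c not in df.columns]

# Toy frame for illustration only
toy = pd.DataFrame({"Attrition": ["Yes", "No"], "Department": ["Sales", "HR"]})
print(missing_columns(toy))  # ['OverTime', 'MonthlyIncome', 'JobSatisfaction']
```

On the real data, `assert not missing_columns(df), missing_columns(df)` right after loading stops the notebook with a readable message instead of a `KeyError` several cells later.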



## 🧼 Data Cleaning & Feature Engineering

- Check missing values and datatypes.  
- Encode categorical variables.  
- Create binary target column for attrition (1=Yes, 0=No).
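Before one-hot encoding, it can help to drop columns that carry no signal. These include columns with the same value for every employee and pure row identifiers, which a model would treat as a meaningless number. The sketch below assumes you pass the dataset's actual ID column name yourself. The names in the toy frame (`EmployeeNumber`, `StandardHours`) are hypothetical examples, not checked against the real CSV.

```python
import pandas as pd

def drop_uninformative(df: pd.DataFrame, id_like=()) -> pd.DataFrame:
    """Drop single-valued columns plus any caller-named ID columns that are present."""
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    ids = [c for c in id_like if c in df.columns]
    to_drop = list(dict.fromkeys(constant + ids))  # de-duplicate, keep order
    return df.drop(columns=to_drop)

# Toy frame; column names are hypothetical examples
toy = pd.DataFrame({
    "EmployeeNumber": [101, 102, 103],   # row identifier, no predictive meaning
    "StandardHours": [80, 80, 80],       # same value for everyone
    "MonthlyIncome": [3000, 4500, 5200],
})
print(drop_uninformative(toy, id_like=["EmployeeNumber"]).columns.tolist())  # ['MonthlyIncome']
```

Call it on `df` before `pd.get_dummies`. Check what it dropped, because a column can be constant only in a filtered subset of the data.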


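```python
print(df.isna().sum())
print(df.dtypes)

# Attrition should be 'Yes'/'No' text; convert only if the CSV stores it as 1/0
# (mapping text values with {1: 'Yes', 0: 'No'} would turn every row into NaN)
if pd.api.types.is_numeric_dtype(df['Attrition']):
    df['Attrition'] = df['Attrition'].map({1: 'Yes', 0: 'No'})

# One-hot encode categorical columns; Attrition becomes the 0/1 target column 'Attrition_Yes'
df_enc = pd.get_dummies(df, drop_first=True, dtype=int)

df_enc.head()
```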


## 📈 Key HR Metrics (KPIs)

- Overall attrition rate.  
- Average monthly income.  
- Average job satisfaction.  
- Attrition by department and overtime.


```python
# KPIs
attrition_rate = (df['Attrition']=='Yes').mean()
avg_income = df['MonthlyIncome'].mean()
avg_satisfaction = df['JobSatisfaction'].mean()

print(f"Attrition Rate: {attrition_rate:.2%}")
print(f"Average Monthly Income: {avg_income:,.0f}")
print(f"Average Job Satisfaction: {avg_satisfaction:.2f}")

# Attrition by department
dept_attrition = df.groupby('Department')['Attrition'].value_counts(normalize=True).unstack().fillna(0)

dept_attrition.plot(kind='bar', stacked=True)
plt.title("Attrition by Department")
plt.ylabel("Proportion")
plt.show()

# Attrition by overtime
ot_attrition = df.groupby('OverTime')['Attrition'].value_counts(normalize=True).unstack().fillna(0)
ot_attrition.plot(kind='bar', stacked=True)
plt.title("Attrition by OverTime")
plt.ylabel("Proportion")
plt.show()
```
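The stacked bars make the attrition share easy to see but hard to quote. A minimal sketch of the same per-group attrition rate as a sorted table follows. It assumes `Attrition` holds `'Yes'`/`'No'` strings, as the KPI cell does. The toy rows are invented for illustration and are not results from the real CSV.

```python
import pandas as pd

def attrition_rate_by(df: pd.DataFrame, col: str) -> pd.Series:
    """Share of rows with Attrition == 'Yes' within each value of `col`, highest first."""
    left = df["Attrition"].eq("Yes")  # True for employees who left
    return left.groupby(df[col]).mean().sort_values(ascending=False)

# Toy data for illustration only
toy = pd.DataFrame({
    "OverTime":  ["Yes", "Yes", "No", "No", "No"],
    "Attrition": ["Yes", "No",  "No", "No", "Yes"],
})
print(attrition_rate_by(toy, "OverTime").round(3).to_dict())  # {'Yes': 0.5, 'No': 0.333}
```

On the real data, `attrition_rate_by(df, "Department")` and `attrition_rate_by(df, "OverTime")` give the numbers behind the two charts.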


## 🤖 Predicting Attrition with Logistic Regression

We will train a simple Logistic Regression model using encoded features to predict attrition probability.
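The cell below calls `predict`, which returns hard 0/1 labels at a fixed 0.5 cutoff. To get the attrition *probability* instead, use `predict_proba`. Leavers are usually a minority, and scikit-learn's `class_weight="balanced"` is one way to account for that. The sketch below shows both on synthetic data from `make_classification`. The row count, feature count and roughly 84/16 class split are arbitrary assumptions. With the real notebook you would use the encoded train/test split instead.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded HR data (assumed sizes and class split)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.84, 0.16], random_state=42
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# class_weight="balanced" weights each class inversely to its frequency,
# so mistakes on the rare "left" class cost more during training
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
clf.fit(Xd_train, yd_train)

proba = clf.predict_proba(Xd_test)[:, 1]  # estimated P(class 1) for each test row
auc = roc_auc_score(yd_test, proba)       # threshold-free measure of ranking quality

# Flag rows above a risk cutoff; 0.5 is only a starting point to tune with HR
threshold = 0.5
at_risk = proba >= threshold
print(f"ROC AUC: {auc:.3f} | flagged {at_risk.sum()} of {len(at_risk)} test rows")
```

Lowering `threshold` flags more employees, catching more leavers at the cost of more false alarms. The right value depends on how many follow-ups HR can realistically handle.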


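```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features = every encoded column except the 0/1 target
X = df_enc.drop('Attrition_Yes', axis=1)
y = df_enc['Attrition_Yes']

# Stratify so train and test keep the same attrition ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize first: unscaled columns such as MonthlyIncome often keep the default
# lbfgs solver from converging within max_iter
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Classification Report:\n", classification_report(y_test, y_pred))
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```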
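To connect the model to the "key drivers" objective, one common approach is to rank its coefficients by magnitude. With standardized inputs, a positive coefficient means a higher value of that feature raises the predicted log-odds of leaving. These coefficients describe association, not cause. The sketch below accepts either a bare `LogisticRegression` or a `Pipeline` ending in one. It is demonstrated on synthetic data with hypothetical feature names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def top_coefficients(model, feature_names, k=10):
    """Largest-magnitude logistic-regression coefficients, signed, biggest first."""
    # Accept a Pipeline (use its last step) or a bare LogisticRegression
    lr = model.steps[-1][1] if hasattr(model, "steps") else model
    coefs = pd.Series(lr.coef_[0], index=list(feature_names))
    order = coefs.abs().sort_values(ascending=False).index
    return coefs.loc[order].head(k)

# Synthetic demo; in the notebook pass the fitted `model` and `X.columns` instead
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=0)
names = [f"feature_{i}" for i in range(X_demo.shape[1])]  # hypothetical names
demo_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_demo, y_demo)
print(top_coefficients(demo_model, names, k=3))
```

Without standardization, coefficient sizes depend on each feature's units, so comparing them across features would be misleading.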


## 🧠 Insights & Recommendations

- Attrition is strongly associated with **OverTime = Yes** → Review workload and overtime policies.  
- Certain departments (e.g., Sales/IT) show higher attrition → Implement targeted engagement and retention programs.  
- Employees with **lower job satisfaction** and **lower income** are more likely to leave → Run salary benchmarking and regular satisfaction surveys.  
- Commute time may also play a role, although this notebook does not analyze it → Consider flexible or remote work options.
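To back the income, satisfaction and commute points with numbers, a simple first check compares average values for leavers and stayers. The sketch below assumes `Attrition` holds `'Yes'`/`'No'` strings and skips any listed column the CSV does not have. `DistanceFromHome` as the commute column is an assumption, not a name confirmed by this notebook. The toy values are invented for illustration, and a difference in means on its own does not show cause.

```python
import pandas as pd

def compare_by_attrition(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Mean of each listed column for leavers ('Yes') vs stayers ('No'), one row per column."""
    present = [c for c in cols if c in df.columns]  # skip columns the data lacks
    return df.groupby("Attrition")[present].mean().T

# Toy data for illustration only
toy = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes"],
    "MonthlyIncome": [2500, 6000, 5000, 3500],
    "JobSatisfaction": [1, 4, 3, 2],
    "DistanceFromHome": [20, 5, 3, 15],  # assumed commute column name
})
print(compare_by_attrition(toy, ["MonthlyIncome", "JobSatisfaction", "DistanceFromHome"]))
```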


## 📂 Deliverables

- `HR_Employee_Attrition.csv` → dataset file  
- `HR_Employee_Attrition_Analysis.ipynb` → analysis notebook  
- Suggested: upload to GitHub together with a README and the exported visualizations.
