Welcome to the Employee Retention Analysis project repository. This project applies the data analysis process to understand and improve the retention rate of new employees within an organization, using both quantitative and qualitative survey data.
Many organizations face high turnover rates among new hires. This project uses people analytics to analyze employee satisfaction and identify key factors that influence retention.
Goal: To improve retention by identifying actionable insights from employee feedback and process evaluation.
This project follows the 6-step data analysis process (Ask, Prepare, Process, Analyze, Share, Act). The steps below are listed in that order:
- Define project scope and success criteria.
- Collaborate with stakeholders (leaders, managers).
- Example questions:
  - What do new hires need to succeed?
  - What causes dissatisfaction?
  - What’s the desired retention increase?
- Create a 3-month timeline and progress report plan.
- Design and deploy an employee survey.
- Define data access rules (e.g., only summarized data available to stakeholders).
- Plan for data visualization and potential issues.
- Collect data ethically with employee consent.
- Ensure transparency in data usage and storage.
- Process steps:
  - Restrict raw data access.
  - Clean data for accuracy and completeness.
  - Upload raw data securely to an internal data warehouse.
- Discover patterns and insights.
- Example key findings:
  - Long hiring process → Higher turnover.
  - Transparent evaluations → Higher retention.
- Use appropriate data analysis tools (Python, SQL, etc.).
- Share summarized reports with managers.
- Managers deliver results with context to teams.
- Encourage team discussions on improving engagement.
- Implement process improvements.
- Repeat survey annually for comparison.
- Measure success via retention rate increase.
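Success is measured by the increase in retention rate, so it helps to pin down how that number is computed. The sketch below assumes retention rate means the share of a new-hire cohort still employed at a chosen checkpoint; the source does not define it, and the function names and cohort numbers are hypothetical.

```python
def retention_rate(hired: int, still_employed: int) -> float:
    """Share of a new-hire cohort still employed at the checkpoint (0.0 to 1.0)."""
    if hired <= 0:
        raise ValueError("hired must be positive")
    if not 0 <= still_employed <= hired:
        raise ValueError("still_employed must be between 0 and hired")
    return still_employed / hired


def retention_change(before: float, after: float) -> float:
    """Change in retention rate between two survey cycles, in percentage points."""
    return (after - before) * 100


# Hypothetical cohorts: 40 hires with 28 retained, then 50 hires with 40 retained.
baseline = retention_rate(hired=40, still_employed=28)
follow_up = retention_rate(hired=50, still_employed=40)
print(f"Retention: {baseline:.0%} -> {follow_up:.0%} "
      f"({retention_change(baseline, follow_up):+.1f} percentage points)")
```

Reporting the change in percentage points rather than as a relative percentage keeps year-over-year comparisons unambiguous.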
Question | Type | Data Type |
---|---|---|
Hiring satisfaction (1-10) | Quantitative | Integer |
Hiring duration (weeks) | Quantitative | Float |
Onboarding rating (1-5) | Quantitative | Integer |
Recommend company (1-10) | Quantitative | Integer |
Current job satisfaction (1-10) | Quantitative | Integer |
Challenges during hiring | Qualitative | String |
Suggestions for onboarding | Qualitative | String |
Reason for leaving | Qualitative | String |
Improvements for satisfaction | Qualitative | String |
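
The ranges and data types in the table can double as cleaning rules. The following is a minimal sketch of per-response validation, assuming each response arrives as a dictionary; the field names (`hiring_satisfaction`, `hiring_duration_weeks`, and so on) are hypothetical and not taken from the actual survey export.

```python
# Allowed rating ranges from the survey table; the field names are hypothetical.
RATING_RANGES = {
    "hiring_satisfaction": (1, 10),
    "onboarding_rating": (1, 5),
    "recommend_company": (1, 10),
    "job_satisfaction": (1, 10),
}


def validate_response(row: dict) -> list[str]:
    """Return the problems found in one survey response (empty list if clean)."""
    problems = []
    for field, (low, high) in RATING_RANGES.items():
        value = row.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{field}: expected an integer, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} is outside {low}-{high}")

    weeks = row.get("hiring_duration_weeks")
    if isinstance(weeks, bool) or not isinstance(weeks, (int, float)):
        problems.append(f"hiring_duration_weeks: expected a number, got {weeks!r}")
    elif weeks < 0:
        problems.append(f"hiring_duration_weeks: {weeks} is negative")
    return problems


clean_row = {
    "hiring_satisfaction": 8,
    "hiring_duration_weeks": 3.5,
    "onboarding_rating": 4,
    "recommend_company": 9,
    "job_satisfaction": 7,
}
bad_row = {**clean_row, "onboarding_rating": 7}

print(validate_response(clean_row))
print(validate_response(bad_row))
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a response during cleaning.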
- Quantitative analysis tools: Python (pandas, matplotlib), Excel, SQL
- Quantitative techniques:
  - Descriptive statistics (mean, median)
  - Box plots of hiring duration vs. retention
  - Correlation matrices
  - Bar/line charts for trends across teams
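
As an illustration of the descriptive statistics and correlation techniques listed above, here is a dependency-free sketch using only the standard library (pandas would be the more natural choice on real data). The sample values and the `pearson` helper are hypothetical and do not come from the project's survey.

```python
from math import sqrt
from statistics import mean, median


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / (spread_x * spread_y)


# Made-up responses: hiring duration in weeks and job satisfaction (1-10).
hiring_weeks = [2.0, 3.5, 4.0, 6.0, 8.0]
job_satisfaction = [9, 8, 7, 5, 3]

print("Mean hiring duration:", mean(hiring_weeks))
print("Median hiring duration:", median(hiring_weeks))
print("Correlation (duration vs. satisfaction):",
      round(pearson(hiring_weeks, job_satisfaction), 3))
```

A strongly negative coefficient on real data would be consistent with the "long hiring process, higher turnover" pattern, though correlation alone does not establish cause.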
- Qualitative analysis tools: LLMs (e.g., GPT), spaCy, NLTK
- Qualitative techniques:
  - LLM-based categorization of open text (e.g., reasons for leaving: Compensation, Management)
  - Sentiment analysis
  - Word clouds and topic modeling for key themes
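
To show what categorizing open text produces, here is a deliberately simple keyword-matching stand-in. It is not the LLM-based approach named above, just a baseline that yields the same kind of output; the keyword lists and the `Other` fallback are assumptions, while the `Compensation` and `Management` categories come from the example above.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real project would tune these or use an LLM.
CATEGORY_KEYWORDS = {
    "Compensation": {"salary", "pay", "bonus", "compensation"},
    "Management": {"manager", "management", "leadership", "supervisor"},
}


def categorize(text: str) -> list[str]:
    """Return every category whose keywords appear in the text, else ['Other']."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    matches = [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if words & keywords
    ]
    return matches or ["Other"]


# Made-up "reason for leaving" answers.
reasons = [
    "The salary was below market rate.",
    "My manager never gave feedback and the pay was low.",
    "I relocated to another city.",
]

counts = Counter(category for reason in reasons for category in categorize(reason))
print(counts)
```

The resulting counts are the kind of input a chart of categorized reasons for leaving would need.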
- Box plot: Hiring duration vs retention
- Bar chart: Average onboarding score by department
- Pie chart: Categorized reasons for leaving
- Word cloud: Common suggestions from new hires
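
As one example from the list above, the bar chart of average onboarding score by department can be sketched as follows. The aggregation uses only the standard library; the plotting helper assumes matplotlib is installed, and the department names, ratings, and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by_department(responses):
    """Average onboarding rating per department from (department, rating) pairs."""
    scores = defaultdict(list)
    for department, rating in responses:
        scores[department].append(rating)
    return {dept: mean(values) for dept, values in sorted(scores.items())}


def plot_onboarding_by_department(averages, path):
    """Save a bar chart of the averages to `path` (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, so no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(averages.keys()), list(averages.values()))
    ax.set_xlabel("Department")
    ax.set_ylabel("Average onboarding rating (1-5)")
    ax.set_ylim(0, 5)
    fig.savefig(path)
    plt.close(fig)


# Made-up responses as (department, onboarding rating) pairs.
sample = [
    ("Engineering", 4), ("Engineering", 5),
    ("Sales", 3), ("Sales", 2),
    ("Support", 4),
]
averages = average_by_department(sample)
print(averages)
```

Calling `plot_onboarding_by_department(averages, "onboarding.png")` would write the chart to disk. Keeping aggregation separate from plotting also means only the summarized averages, not raw responses, need to reach the charting step.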
- Survey deployment: Month 1
- Data collection and processing: Month 2
- Analysis and reporting: Month 3
- Survey Tools: Google Forms
- Analysis: Python, SQL
- Visualization: Matplotlib, Tableau
- Storage: Internal Data Warehouse (SQL-based)
For questions or contributions, reach out to Harsh Indoria via GitHub Issues or email at harsh.ind.coder@gmail.com.