# PROJECT SUMMARY

### Project Title : Employee Performance Analysis

### Client : INX Future Inc.

### Objective :

To identify underlying factors contributing to declining employee performance and provide actionable insights while minimizing negative impacts on employee morale.

INX Future Inc., a leader in data analytics and automation solutions, has been facing a dip in employee performance and client satisfaction. The CEO initiated this project to analyze employee data, identify non-performing employees without broadly affecting morale, and recommend data-driven solutions to improve overall performance and productivity.


# 1. REQUIREMENT

### Data Source :

The data used for the analysis was provided by INX’s HR department. It contained 28 features, including employee demographics, work experience, satisfaction levels, and performance-related metrics. The dataset had 1200 records with no null values or duplicates.

### Numerical Features

- Age
- DistanceFromHome
- EmpHourlyRate
- NumCompaniesWorked
- EmpLastSalaryHikePercent
- TotalWorkExperienceInYears
- TrainingTimesLastYear
- ExperienceYearsAtThisCompany
- ExperienceYearsInCurrentRole
- YearsSinceLastPromotion
- YearsWithCurrManager

### Categorical Features

- EmpNumber
- Gender
- EducationBackground
- MaritalStatus
- EmpDepartment
- EmpJobRole
- BusinessTravelFrequency
- OverTime
- Attrition

### Discrete Features

- EmpEducationLevel
- EmpEnvironmentSatisfaction
- EmpJobInvolvement
- EmpJobLevel
- EmpJobSatisfaction
- EmpRelationshipSatisfaction
- EmpWorkLifeBalance
- PerformanceRating

# 2. ANALYSIS

The project workflow involved a step-by-step process of data preprocessing, exploratory data analysis (EDA), model selection, and actionable recommendations.


## 2.1 Data Processing Techniques

### Outlier Handling

- Outliers were identified using box plots and statistical thresholds.

- For most features, capping was applied to limit the influence of extreme values while preserving data integrity.

- For "YearsSinceLastPromotion," a log transformation was employed to address high skewness, effectively reducing it to 0.635, which improved the feature's suitability for modeling.
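The capping and log-transformation steps above can be sketched as follows. This is a minimal illustration, assuming IQR-based bounds with a 1.5 multiplier and `np.log1p` as the log transform; the source does not state the exact thresholds or log variant, and the helper `cap_outliers_iqr` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Toy values for illustration only, not taken from the INX dataset.
df = pd.DataFrame({
    "TotalWorkExperienceInYears": [1, 3, 5, 6, 8, 10, 12, 40],
    "YearsSinceLastPromotion": [0, 0, 1, 1, 2, 3, 7, 15],
})

# Capping: the extreme value is pulled back to the upper bound.
df["TotalWorkExperienceInYears"] = cap_outliers_iqr(df["TotalWorkExperienceInYears"])

# Log transform (assumed variant): log1p computes log(1 + x), so zeros stay defined.
df["YearsSinceLastPromotion_log"] = np.log1p(df["YearsSinceLastPromotion"])

skew_before = df["YearsSinceLastPromotion"].skew()
skew_after = df["YearsSinceLastPromotion_log"].skew()
print(f"skewness before: {skew_before:.3f}, after: {skew_after:.3f}")
```

Capping keeps every record in the dataset, which matters with only 1200 rows, while the log transform compresses the long right tail instead of flattening it to a single bound.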


### Encoding Categorical Features

- The dataset included 8 categorical columns that required transformation.

- Label encoding and manual encoding were applied.
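The two encoding approaches can be illustrated with a short sketch. It assumes scikit-learn's `LabelEncoder` for label encoding and a plain dictionary mapping for manual encoding; which columns received which treatment is not stated in the source, and the category values shown here are assumed for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows for illustration only; the category values are assumed.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "OverTime": ["Yes", "No", "No", "Yes"],
    "BusinessTravelFrequency": [
        "Travel_Rarely", "Non-Travel", "Travel_Frequently", "Travel_Rarely",
    ],
})

# Label encoding: classes are sorted and numbered from 0.
gender_encoder = LabelEncoder()
df["Gender"] = gender_encoder.fit_transform(df["Gender"])

# Manual encoding: an explicit mapping preserves a meaningful order.
travel_order = {"Non-Travel": 0, "Travel_Rarely": 1, "Travel_Frequently": 2}
df["BusinessTravelFrequency"] = df["BusinessTravelFrequency"].map(travel_order)
df["OverTime"] = df["OverTime"].map({"No": 0, "Yes": 1})

print(df)
```

Manual mapping is usually preferable for ordered categories, because label encoding assigns integers alphabetically rather than by meaning.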

### Scaling

- Continuous features like Age, EmpHourlyRate, and DistanceFromHome were standardized using StandardScaler to ensure uniform scaling and improve model performance.
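A minimal sketch of the standardization step is shown below. It assumes the scaler is fitted on the training split only and then reused on the test split, which is common practice but is not confirmed by the source; the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic values for illustration only.
df = pd.DataFrame({
    "Age": rng.integers(18, 61, size=200),
    "EmpHourlyRate": rng.integers(30, 101, size=200),
    "DistanceFromHome": rng.integers(1, 30, size=200),
}).astype(float)
continuous_cols = ["Age", "EmpHourlyRate", "DistanceFromHome"]

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
train_df, test_df = train_df.copy(), test_df.copy()

scaler = StandardScaler()
# Fit on the training split only (assumption), then apply the same
# mean and standard deviation to the test split to avoid leakage.
train_df[continuous_cols] = scaler.fit_transform(train_df[continuous_cols])
test_df[continuous_cols] = scaler.transform(test_df[continuous_cols])

print(train_df[continuous_cols].describe().loc[["mean", "std"]])
```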



### Balancing the Dataset

- The target variable, PerformanceRating, exhibited class imbalance, which could have biased the models.

- To address this, SMOTE (Synthetic Minority Oversampling Technique) was applied to the training dataset, ensuring equal representation of all performance categories.
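To show what SMOTE does, here is a simplified NumPy sketch of its core idea: each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest same-class neighbors. The function `smote_oversample` is a hypothetical illustration, not the implementation used in the project; in practice the `SMOTE` class from the imbalanced-learn package is the usual choice. The class labels and data are synthetic.

```python
import numpy as np


def smote_oversample(X, y, k=5, seed=0):
    """Oversample every minority class up to the majority class size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0 or count < 2:
            continue
        X_cls = X[y == cls]
        k_eff = min(k, count - 1)
        # Pairwise distances within the class; exclude self-matches.
        dists = np.linalg.norm(X_cls[:, None, :] - X_cls[None, :, :], axis=2)
        np.fill_diagonal(dists, np.inf)
        neighbors = np.argsort(dists, axis=1)[:, :k_eff]
        # Pick a base sample and one of its k nearest neighbors at random.
        base = rng.integers(0, count, size=n_new)
        picked = neighbors[base, rng.integers(0, k_eff, size=n_new)]
        gap = rng.random((n_new, 1))
        # The new point lies on the segment between the sample and its neighbor.
        synthetic = X_cls[base] + gap * (X_cls[picked] - X_cls[base])
        X_parts.append(synthetic)
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


rng = np.random.default_rng(42)
# Synthetic, imbalanced training data; the labels 2, 3, 4 are assumed.
X_train = rng.normal(size=(100, 3))
y_train = np.array([2] * 15 + [3] * 70 + [4] * 15)

X_res, y_res = smote_oversample(X_train, y_train)
print(dict(zip(*np.unique(y_res, return_counts=True))))
```

Resampling only the training data, as described above, keeps synthetic records out of the test set, so evaluation still reflects the real class distribution.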



### Correlation Check

- A correlation matrix was generated to examine relationships among features.

- No multicollinearity was detected, confirming that all features could be retained for modeling.
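One way to turn the correlation matrix into an explicit check is to list feature pairs whose absolute correlation exceeds a cutoff. The sketch below assumes a threshold of 0.8, which is not stated in the source, and uses synthetic data with one deliberately collinear pair so that the check has something to flag; `highly_correlated_pairs` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.8) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep the upper triangle so each pair is reported once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


rng = np.random.default_rng(0)
experience = rng.uniform(0, 20, size=300)

# Synthetic data; the second column is deliberately built from the first.
df = pd.DataFrame({
    "TotalWorkExperienceInYears": experience,
    "ExperienceYearsAtThisCompany": experience * 0.9 + rng.normal(0, 0.5, size=300),
    "DistanceFromHome": rng.uniform(1, 30, size=300),
})

flagged = highly_correlated_pairs(df)
print(flagged)
```

An empty result from such a check would correspond to the "no multicollinearity" finding reported above.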

## 2.2 Exploratory Data Analysis (EDA)

EDA was conducted to explore the dataset, identify patterns, and gain actionable insights:

##### Univariate Analysis

The primary goal of univariate analysis was to understand the distribution and composition of individual features:

###### 1. Continuous Features

- Histogram Plots were used to visualize the distribution of variables like Age, EmpHourlyRate, and DistanceFromHome.

- Most continuous features followed normal or slightly skewed distributions, indicating the need for transformation in some cases (e.g., YearsSinceLastPromotion).

###### 2. Categorical and Discrete Features

- Count Plots were used to analyze categorical and discrete variables like Gender, EducationBackground, and NumCompaniesWorked.

##### Bivariate Analysis

Bivariate analysis examined relationships between individual features and the target variable, **PerformanceRating**.

###### 1. Continuous Features

- Histograms with hue (grouped by PerformanceRating) revealed trends such as:

  - Employees with longer TotalWorkExperienceInYears generally had higher performance ratings.

  - DistanceFromHome showed no significant impact on performance, contrary to initial expectations.

###### 2. Categorical and Discrete Features

- Count plots with hue showed performance distributions across categories:

  - Employees working OverTime had lower performance ratings, suggesting burnout.

  - Higher EmpJobSatisfaction was associated with better performance across all departments.
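A count plot with hue has a direct tabular counterpart: a cross-tabulation of the category against PerformanceRating. The sketch below uses `pd.crosstab` with row normalization so that groups of different sizes can be compared; the rows and rating values are made up for illustration and do not reproduce the project's findings.

```python
import pandas as pd

# Toy rows for illustration only; ratings and values are assumed.
df = pd.DataFrame({
    "OverTime": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
    "PerformanceRating": [2, 3, 3, 4, 3, 2, 3, 4],
})

# Row-normalized shares: each row sums to 1, so groups are comparable.
share = pd.crosstab(df["OverTime"], df["PerformanceRating"], normalize="index")
print(share.round(2))
```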

##### Multivariate Analysis

Multivariate analysis was conducted to explore interactions between multiple continuous variables and their collective impact on performance.

Pair plots were generated for selected continuous features such as Age, EmpHourlyRate, YearsSinceLastPromotion, and TotalWorkExperienceInYears.

## 2.3 Model Selection and Evaluation

To build a predictive model for employee performance, several machine learning algorithms were tested and evaluated. The process included selecting models, tuning hyperparameters, and evaluating their performance using key metrics like accuracy, precision, recall and F1 score.

### Model Selection Process

###### 1. Algorithms Applied : 

A total of 12 algorithms were tested, including:

- Logistic Regression (Multinomial)

- Support Vector Machine (SVM)

- Decision Tree Classifier

- Random Forest Classifier

- Gradient Boosting Classifier

- K-Nearest Neighbors (KNN)

- AdaBoost Classifier

- CatBoost Classifier

- Multi-Layer Perceptron (MLP) Classifier

- Extra Trees Classifier

- Linear Discriminant Analysis (LDA)

- Quadratic Discriminant Analysis (QDA)
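Comparing several algorithms on the same split can be sketched as a simple loop. The example below covers only three of the twelve models and runs on synthetic three-class data from `make_classification`, standing in for the preprocessed HR features; the split ratio and random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the preprocessed HR features.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit every model on the same training split and score it on the same test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```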

###### 2. Evaluation Metrics :

Models were evaluated based on the following metrics:

- Accuracy: Overall percentage of correct predictions.

- Precision: Share of the predictions for a given class that are correct.

- Recall: Share of the actual members of a given class that are correctly identified.

- F1-Score: Harmonic mean of precision and recall.
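Because PerformanceRating has more than two classes, precision, recall, and F1 are computed per class and then averaged. The sketch below assumes macro averaging, which the source does not specify, and uses hypothetical labels and predictions.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical true labels and predictions, for illustration only.
y_true = [2, 3, 3, 3, 4, 4, 2, 3]
y_pred = [2, 3, 3, 4, 4, 3, 2, 3]

accuracy = accuracy_score(y_true, y_pred)

# Macro averaging (assumed): compute each metric per class, then take the
# unweighted mean so that rare classes count as much as common ones.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

With an imbalanced target, macro-averaged scores are a useful complement to accuracy, which can look high even when minority classes are predicted poorly.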


### Best-Performing Models

Among the 12 models, the following two algorithms demonstrated the best performance :

###### 1. Random Forest Classifier

- **Untuned Accuracy : 93.3%**
- **Tuned Accuracy : 91.6%**

###### 2. Gradient Boosting Classifier

- **Untuned Accuracy : 92.5%**
- **Tuned Accuracy : 92.5%** (consistent performance)
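The untuned versus tuned comparison can be reproduced in outline with a grid search. The tuning method and search space used in the project are not stated, so `GridSearchCV`, the parameter grid, and the synthetic data below are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic 3-class data standing in for the preprocessed HR features.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Untuned baseline with default hyperparameters.
baseline = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Hypothetical search space; the grid used in the project is not documented.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

untuned_acc = baseline.score(X_test, y_test)
tuned_acc = search.best_estimator_.score(X_test, y_test)
print(f"untuned={untuned_acc:.3f} tuned={tuned_acc:.3f} best={search.best_params_}")
```

A grid search selects parameters by cross-validated score on the training data, so the tuned model is not guaranteed to beat the untuned one on the held-out test set, which is consistent with the Random Forest figures above.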

### Final Model Selection

The **Random Forest Classifier** was chosen as the final model due to its superior performance, interpretability, and robustness in handling the dataset's features.

# 3. SUMMARY

The project focused on achieving four main goals:

### 1. Department Wise Performance

- Data Science and Development have the highest ratings, indicating strong performance in these areas.
- Research & Development and Human Resources also show good performance.
- Finance and Sales have the lowest ratings, suggesting potential areas for improvement.
- Overall, the company maintains a consistent level of performance across departments.
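A department-level comparison of this kind can be derived with a simple group-by. The sketch below assumes the mean PerformanceRating per department as the summary statistic; the rows are made up for illustration and do not reproduce the rankings above.

```python
import pandas as pd

# Toy rows for illustration only; the ratings are made up.
df = pd.DataFrame({
    "EmpDepartment": [
        "Sales", "Sales", "Development", "Development", "Finance", "Data Science",
    ],
    "PerformanceRating": [2, 3, 3, 4, 3, 3],
})

# Mean rating and headcount per department, best-performing first.
dept_summary = (
    df.groupby("EmpDepartment")["PerformanceRating"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(dept_summary)
```

Reporting the headcount next to the mean helps avoid over-reading averages from very small departments.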

### 2. Top 3 Factors Impacting Performance

- **EmpLastSalaryHikePercent**
- **EmpEnvironmentSatisfaction**
- **YearsSinceLastPromotion**

These features significantly influence the model's predictions. Other features such as EmpJobRole and ExperienceYearsInCurrentRole also contribute, but to a lesser extent. Many of the lowest-ranked features have negligible importance.
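A ranking like this is typically read from a tree ensemble's impurity-based importances. The sketch below assumes that approach via `feature_importances_` on a fitted `RandomForestClassifier`; the data is synthetic and the generic feature names are placeholders, so the output says nothing about the real HR features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder names; synthetic columns do not correspond to real HR features.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=500,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=1,
)

model = RandomForestClassifier(random_state=1).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)
top3 = importances.head(3)
print(top3)
```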

### 3. Developing a Predictive Model

Twelve machine learning models were built and evaluated to predict employee performance, with the **Random Forest Classifier** achieving the best accuracy of 93.3%.

### 4. Recommendations

###### Enhance Employee Satisfaction and Engagement :

- Regular Feedback and Recognition: Implement a system for frequent feedback and recognition to boost morale and motivation.
- Work-Life Balance Initiatives: Promote work-life balance through flexible work arrangements and wellness programs.
- Employee Development Programs: Invest in training and development opportunities to enhance skills and knowledge.
- Transparent Communication: Foster open communication channels to address concerns and build trust.

###### Optimize Performance Management :

- Clear Performance Expectations: Set clear and measurable performance goals.
- Regular Performance Reviews: Conduct regular performance reviews to provide feedback and identify areas for improvement.
- Performance-Based Incentives: Implement performance-based incentives to reward top performers.

###### Foster a Positive Work Culture :

- Strong Leadership: Provide strong leadership and mentorship to guide and inspire employees.
- Team Building Activities: Organize team-building activities to improve collaboration and teamwork.
- Positive Workplace Environment: Create a positive and supportive work environment.

By implementing these strategies, INX Future Inc. can significantly boost employee performance and maintain its reputation as a top employer while improving client satisfaction and overall organizational success.