Problem Overview

The garment industry is one of the key examples of industrial globalization in the modern era. It is a highly labour-intensive industry with many manual processes. Satisfying the huge global demand for garment products depends largely on the production and delivery performance of employees in garment manufacturing companies. A common problem is that employees fail to meet their productivity targets, which causes losses: fewer good products are produced than planned, orders are completed later than scheduled, and customer satisfaction falls. Decision makers in the garment industry therefore want to track, analyse, and predict the productivity of the working teams in their factories using a data-driven approach. With such data, a predictive model can be built using regression, a machine learning technique. With an employee productivity predictor, management can plan production more accurately and more cost-efficiently.

Objective & Method

Create a productivity predictor for garment employees

Create a predictive model using regression machine learning algorithms. In this project we use several regression methods, compare their performance, and choose the best model. The methods used are:
- Multiple Linear Regression
- Ridge Regression
- Lasso Regression
- Random Forest Regression
- Deep Learning
Determine effective and efficient standard values for the variables needed to achieve high productivity (above the targeted productivity)

Business Benefit

Predict garment employees' productivity with an accuracy of more than 90% by taking into account the various factors that determine employee productivity
Plan the variables that determine productivity effectively and efficiently, so that labor, overtime, and incentive costs can be reduced by more than 50%

Dataset

This project uses the Productivity Prediction of Garment Employees dataset from Kaggle (https://www.kaggle.com/datasets/ishadss/productivity-prediction-of-garment-employees). The dataset contains 1197 rows and 15 columns. It includes important attributes of the garment manufacturing process and the productivity of the employees, which were collected manually and validated by industry experts. The data come from a garment manufacturer in Bangladesh. The features are described below:

date : date in MM-DD-YYYY
quarter : a portion of the month (each month was divided into five quarters)
department : associated department with the instance
day : day of the week
team : associated team number with the instance
targeted_productivity : targeted productivity set by the authority for each team for each day
smv : standard minute value, the time allocated for a task
wip : work in progress, the number of unfinished items for products
over_time : represents the amount of overtime by each team in minutes
incentive : the amount of financial incentive (in BDT) that enables or motivates a particular course of action
idle_time : the amount of time when the production was interrupted due to several reasons
idle_men : the number of workers who were idle due to production interruption
no_of_style_change : number of changes in the style of a particular product
no_of_workers : number of workers in each team
actual_productivity : the actual productivity delivered by the workers, as a fraction ranging from 0 to 1

Required Packages

pandas, numpy, sklearn, statsmodels, seaborn, matplotlib

Analysis Flow

Data Profiling
- Categorical data consists of quarter, department, and day features
- Numerical data consists of team, targeted_productivity, smv, wip, over_time, incentive, idle_time, idle_men, no_of_style_change, no_of_workers, actual_productivity features
- The no_of_workers feature is stored as a float and contains some abnormal values (for example, 30.5)
- Minimum and maximum values for all columns seem reasonable because there are no negative values, but the maximum of actual_productivity is an outlier (> 1)
- The department column contains a typo, 'sweing', which should be corrected to 'sewing'
- The finishing department appears as two separate values in the value counts; these can be combined
- Extract the month from the date feature. All data were collected in 2015, so the year can be ignored, and the existing quarter and day features replace the day of the date
Feature Engineering
- Extract month from date feature
- Correct the typos in the department column
- Round up the value of no_of_workers and change the data type to integer
- Set values of actual_productivity greater than 1 to 1
Data Cleaning
- The wip feature has 42.27% missing values. We handle this by dropping the wip column
- There are no duplicated rows
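
The feature engineering and data cleaning steps above could be sketched in pandas roughly as follows. This is an illustration under assumptions, not the notebook's actual code: the function name `clean_garment_data` is hypothetical, the date format is left to pandas to infer, and stripping whitespace is only an assumed cause of the duplicated finishing category.

```python
import numpy as np
import pandas as pd


def clean_garment_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the feature-engineering and cleaning steps listed above (hypothetical helper)."""
    out = df.copy()

    # Extract the month from the date column; the year is constant, so the date is dropped.
    out["month"] = pd.to_datetime(out["date"]).dt.month
    out = out.drop(columns=["date"])

    # Fix the 'sweing' typo. Stripping whitespace is an ASSUMED fix for the
    # finishing department showing up as two separate values.
    out["department"] = out["department"].str.strip().replace({"sweing": "sewing"})

    # Round fractional worker counts up and store them as integers.
    out["no_of_workers"] = np.ceil(out["no_of_workers"]).astype(int)

    # Cap actual_productivity at 1.
    out["actual_productivity"] = out["actual_productivity"].clip(upper=1.0)

    # Drop wip because a large share of its values is missing.
    return out.drop(columns=["wip"])
```

Calling `clean_garment_data(raw).duplicated().sum()` afterwards is one way to confirm that no duplicated rows remain.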
Exploratory Data Analysis
Preprocessing Model
- Encoding categorical data
- Split data
- Multicollinearity
- Drop redundant features
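
A minimal sketch of these preprocessing steps is shown here, assuming one-hot encoding for the categorical columns and variance inflation factors (VIF) as the multicollinearity check. The helper names, the 80/20 split, and the choice of VIF are assumptions for illustration; the notebook may use different settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def encode_and_split(df, target="actual_productivity", test_size=0.2, seed=42):
    """One-hot encode the categorical columns, then split into train and test sets."""
    categorical = ["quarter", "department", "day"]
    # drop_first avoids the perfectly collinear "dummy variable trap".
    encoded = pd.get_dummies(df, columns=categorical, drop_first=True, dtype=float)
    X = encoded.drop(columns=[target])
    y = encoded[target]
    return train_test_split(X, y, test_size=test_size, random_state=seed)


def variance_inflation_factors(X: pd.DataFrame) -> pd.Series:
    """VIF per column, read from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=X.columns)
```

Columns with a very large VIF are nearly a linear combination of the others and are candidates for the "drop redundant features" step. The inverse does not exist when two columns are perfectly collinear, so exact duplicates should be removed first.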
Modelling
- Multiple linear regression
- Ridge regression with hyperparameter tuning
- Lasso regression with hyperparameter tuning
- Random forest regression
- Deep learning
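
As a rough illustration of the modelling step, the four scikit-learn regressors could be fitted and compared as below. This is a sketch under assumptions: the `fit_candidates` helper, the alpha grid, the forest size, and the synthetic stand-in data are all hypothetical, and the deep learning model is left out because the README does not name the library used for it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split


def fit_candidates(X_train, y_train, seed=42):
    """Fit the four regressors named above; the alpha grid is illustrative only."""
    alphas = {"alpha": [0.001, 0.01, 0.1, 1.0]}
    candidates = {
        "linear": LinearRegression(),
        # Ridge and Lasso are tuned with a small cross-validated grid search.
        "ridge": GridSearchCV(Ridge(), alphas, cv=5),
        "lasso": GridSearchCV(Lasso(max_iter=10_000), alphas, cv=5),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    return {name: model.fit(X_train, y_train) for name, model in candidates.items()}


# Synthetic stand-in data so the sketch runs without the Kaggle file.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + rng.normal(scale=0.05, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = fit_candidates(X_train, y_train)
# score() returns R2 on the held-out split for every model.
scores = {name: model.score(X_test, y_test) for name, model in models.items()}
```

Scoring every model on the same held-out split keeps the comparison fair; the tuned alpha values are available afterwards through `models["ridge"].best_params_` and `models["lasso"].best_params_`.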
Evaluate The Model
Choose The Best Model
Deep Dive Analysis

The objective is to determine effective and efficient standard values for the variables needed to achieve high productivity, based on smv. The steps are:
- Filter the data to rows with a positive margin value and actual productivity > 0.90
- Find the unique smv values; the standard values are determined per smv
- Filter the data by smv value and compare the rows that share the same smv value
- Choose the row with the smallest number of workers, overtime, and incentive
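
The deep dive steps above might be expressed in pandas as follows. Several details are assumptions rather than the notebook's confirmed logic: the margin is taken to be actual minus targeted productivity, rows are grouped per department as well as per smv, ties are broken by sorting on workers, then overtime, then incentive, and the function name `efficient_standards` is hypothetical.

```python
import pandas as pd


def efficient_standards(df: pd.DataFrame, min_productivity: float = 0.90) -> pd.DataFrame:
    """Pick the cheapest high-productivity row for each department and smv value."""
    # ASSUMPTION: margin = actual productivity minus targeted productivity.
    margin = df["actual_productivity"] - df["targeted_productivity"]
    good = df[(margin > 0) & (df["actual_productivity"] > min_productivity)]

    # Sort so the row with the fewest workers, then the least overtime,
    # then the lowest incentive comes first within each group.
    cost_cols = ["no_of_workers", "over_time", "incentive"]
    keys = ["department", "smv"]
    ranked = good.sort_values(keys + cost_cols)

    # Keep only the first (cheapest) row per department and smv.
    best = ranked.drop_duplicates(subset=keys, keep="first")
    return best[keys + cost_cols].reset_index(drop=True)
```

Sorting lexicographically is only one way to read "smallest number of workers, overtime, and incentive"; a weighted cost could be used instead if the three quantities should be traded off against each other.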

Project Result

The best model has the highest R2 and the smallest RMSE, MAE, and MAPE values. Of the regression methods used, multiple linear regression is the best model.

Metric	Multiple Linear Regression	Lasso Regression	Ridge Regression	Random Forest Regression	Deep Learning
R2	1	0.999	0.999	0.9983	-
RMSE	5.4517e-13	0.0001517	2.0053e-8	0.0173518	0.0085
MAE	4.9505e-13	0.0001108	1.4063e-8	0.0055825	0.0733
MAPE	7.1979e-13	0.0001787	2.2657e-8	0.0097527	11.4380
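
For reference, the four metrics in the table can be computed from true and predicted values as in the sketch below. The function name `regression_metrics` is hypothetical, and MAPE is returned as a fraction rather than a percentage, which is an assumption about how the table values were produced.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, and MAPE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(residuals ** 2))),
        "MAE": float(np.mean(np.abs(residuals))),
        # Fraction, not percent; undefined when y_true contains zeros.
        "MAPE": float(np.mean(np.abs(residuals / y_true))),
    }
```

Higher is better for R2, while lower is better for the three error metrics. RMSE can never be smaller than MAE on the same predictions, which is a quick sanity check on reported values.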

Effective and efficient standard variable values for achieving high productivity (> targeted productivity), based on smv

Sewing Department

SMV (minutes)	Number of Workers	Overtime (minutes)	Incentive (BDT)
2.90	8	960	0
3.90	8	960	0
3.94	8	960	0
4.08	9	1080	0
4.15	12	1440	0
4.60	8	960	0
5.13	8	960	0
18.79	52	6240	56
22.52	58	0	90
22.94	59	3060	113
25.90	57	10170	70
26.16	59	7080	98
26.82	59	7080	75

Finishing Department

SMV (minutes)	Number of Workers	Overtime (minutes)
2.90	9	1620
3.94	2	240
4.08	9	1080
4.15	8	960
4.30	10	1200

Conclusion & Recommendation

Compared to the other four models, the multiple linear regression method has the smallest error values and a perfect R2 score of 1. This means that 100% of the variability of actual_productivity is explained by the features in the model. The standard deviation of the prediction errors (RMSE) is 5.4517e-13, so the residuals mostly deviate from the regression line by about ±5.4517e-13. On average, predictions from the multiple linear regression model deviate from the true actual_productivity by 4.9505e-13 (MAE), which is equivalent to a 7.1979e-11% deviation relative to the true actual_productivity. I therefore recommend using this model as the prediction model.
We can define standard values for overtime, incentives, number of employees, idle time, idle men, and number of style changes that are effective, efficient, and yield high productivity for a given smv. The predictor model can then be used to revalidate these standard values and arrive at more optimal, effective, and efficient ones.

izzahlux/Employee_Productivity_Predictor
