# **STINTSY S11 Group 1 Major Course Output**
Members:<br>
Andres, Donielle<br>
Limjoco, Jared Ethan<br>
Quinzon, Christopher Josh<br>
Uy, Shane Owen

## **Section 1: Introduction to the problem/task and dataset**

The garment industry plays a crucial role in today's industrial globalization. Because this labor-intensive sector relies heavily on manual processes, it must ensure that the employees of garment manufacturing companies produce and deliver efficiently in order to meet global demand for clothing and textile products. Decision-makers in the garment industry make this happen by systematically monitoring, analyzing, and predicting the productivity of the different teams that work in their factories, which gives their companies a competitive advantage in the market. <br><br>

To support this monitoring, analysis, and prediction of team productivity, a dataset has been built that covers important attributes of the garment manufacturing process and the productivity of its employees. It lets researchers, data scientists, and industry professionals understand the industry more deeply by examining the factors that contribute to its productivity. <br><br>

This project's primary objective is to predict and analyze the productivity of working teams in garment manufacturing companies. Specifically, the goal is to develop models and generate insights that predict a team's productivity from the attributes provided in the dataset. Doing so would help decision-makers make data-driven decisions that improve the overall efficiency and competitiveness of their garment manufacturing operations.

<br><br>
Comments:
- be more specific about the main task
- can do "predict and analyze the relationship between the incentives given to each worker and the percent of productivity delivered in each department per quarter"; if we go with this, we can just use linear regression
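If the task is narrowed as the comment suggests (incentive versus delivered productivity, per department per quarter), the regression could look roughly like this. It is a minimal sketch, assuming scikit-learn's `LinearRegression` and using only the ten rows printed by `df.head(10)` in Section 4 as stand-in data; `fit_incentive_model` is a hypothetical helper name. On the cleaned data, the same `groupby(...).apply(...)` call would run over all rows.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rows copied from the df.head(10) output in Section 4 ("sweing" already corrected to "sewing").
# Ten rows are far too few for real conclusions; this only illustrates the mechanics.
sample = pd.DataFrame({
    "quarter": ["Quarter1"] * 10,
    "department": ["sewing", "finishing", "sewing", "sewing", "sewing",
                   "sewing", "finishing", "sewing", "sewing", "sewing"],
    "incentive": [98, 0, 50, 50, 50, 38, 0, 45, 34, 45],
    "actual_productivity": [0.940725, 0.8865, 0.80057, 0.80057, 0.800382,
                            0.800125, 0.755167, 0.753683, 0.753098, 0.750428],
})


def fit_incentive_model(group: pd.DataFrame) -> pd.Series:
    """Fit actual_productivity ~ incentive within one (department, quarter) group."""
    model = LinearRegression()
    model.fit(group[["incentive"]], group["actual_productivity"])
    return pd.Series({"slope": model.coef_[0], "intercept": model.intercept_, "n_rows": len(group)})


# One regression per (department, quarter) group; each returns slope, intercept, and row count.
results = (
    sample.groupby(["department", "quarter"])[["incentive", "actual_productivity"]]
    .apply(fit_incentive_model)
)
print(results)
```

A positive slope means higher incentives go with higher delivered productivity in that group. Groups with no variation in `incentive` (both finishing rows here have 0) cannot yield a meaningful slope, so such groups would need separate treatment.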

## **Section 2: Description of the dataset**

In this section of the notebook, you must fulfill the following: <br>

- State a brief description of the dataset.
- Provide a description of the collection process executed to build the dataset.
    - Discuss the implications of the data collection method on the generated conclusions and insights. Note that you may need to look at relevant sources related to the dataset to acquire necessary information for this part of the project.
- Describe the structure of the dataset file.
    - What does each row and column represent?
    - How many instances are there in the dataset?
    - How many features are there in the dataset?
    - If the dataset is composed of different files that you will combine in the succeeding steps, describe the structure and the contents of each file.
- Discuss the features in each dataset file. What does each feature represent? All features, even those which are not used for the study, should be described to the reader. The purpose of each feature in the dataset should be clear to the reader of the notebook without having to go through an external link.

**Garments Features** <br>
- date – Date in MM/DD/YYYY format.
- quarter – A portion of the month. A month was divided into four quarters (the raw data also contains a Quarter5 label, which Section 4 merges into Quarter4).
- department – Department associated with the instance (the raw file spells "sewing" as "sweing").
- day – Day of the week.
- team – Team number associated with the instance.
- targeted_productivity – Targeted productivity set by the authority for each team for each day.
- smv – Standard Minute Value, the allocated time for a task.
- wip – Work in progress, including the number of unfinished items for products.
- over_time – The amount of overtime by each team, in minutes.
- incentive – The amount of financial incentive that enables or motivates a particular course of action.
- idle_time – The amount of time when production was interrupted due to several reasons.
- idle_men – The number of workers who were idle due to production interruption.
- no_of_style_change – Number of changes in the style of a particular product.
- no_of_workers – Number of workers in each team.
- actual_productivity – The actual percentage of productivity delivered by the workers, expressed as a fraction from 0 to 1.
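The documented date format and the 0-1 range of `actual_productivity` can be checked directly against the data. The following is a minimal sketch, assuming the first three rows of the `df.head(10)` output in Section 4 as stand-in data; on the real `df`, `df.shape` gives the instance and column counts (1197 rows and 15 columns according to the `df.info()` output).

```python
import pandas as pd

# First three rows from the df.head(10) output in Section 4 (only the columns checked here).
raw = pd.DataFrame({
    "date": ["01/01/2015", "01/01/2015", "01/01/2015"],
    "day": ["Thursday", "Thursday", "Thursday"],
    "actual_productivity": [0.940725, 0.8865, 0.80057],
})

# Parse with the documented month/day/year layout; a value that does not fit raises an error.
parsed_dates = pd.to_datetime(raw["date"], format="%m/%d/%Y")

# Cross-check: the weekday implied by each date should match the `day` column.
day_matches = (parsed_dates.dt.day_name() == raw["day"]).all()

# Rows whose productivity falls outside the documented 0-1 range.
out_of_range = raw[~raw["actual_productivity"].between(0.0, 1.0)]

n_instances, n_columns = raw.shape
print(day_matches, len(out_of_range), n_instances, n_columns)
```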


## **Section 3: List of requirements**

List all the Python libraries and modules that you used.

In [1]:
```python
# pandas: DataFrame loading and manipulation
import pandas as pd

# numpy: numerical operations
import numpy as np

# sklearn.preprocessing: provides LabelEncoder
from sklearn import preprocessing
```
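Since this section asks for the libraries used, recording their versions as well makes the results easier to reproduce. A minimal sketch, assuming scikit-learn is installed as the `sklearn` package (which the `from sklearn import preprocessing` line already requires):

```python
import numpy as np
import pandas as pd
import sklearn

# Record library versions so the notebook's results can be reproduced later.
versions = {"pandas": pd.__version__, "numpy": np.__version__, "scikit-learn": sklearn.__version__}
for name, version in versions.items():
    print(f"{name}=={version}")
```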

## **Section 4: Data preprocessing and cleaning**

Perform the necessary steps before using the data. In this section of the notebook, please take note of the following: <br>

- If needed, perform preprocessing techniques to transform the data to the appropriate representation. This may include binning, log transformations, conversion to one-hot encoding, normalization, standardization, interpolation, truncation, and feature engineering, among others. There should be a correct and proper justification of the use of each preprocessing technique used in the project.
- Make sure that the data is clean, especially features that are used in the project. This may include checking for misrepresentations, checking the data type, dealing with missing data, dealing with duplicate data, and dealing with outliers, among others. There should be a correct and proper justification of the application (or non-application) of each data cleaning method used in the project. Clean only the variables utilized in the study.

Comments:
- needs data cleaning
- rename variables and correct misspellings
- check for missing values and correct data types
- apply one-hot encoding and feature engineering
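For the one-hot encoding item, one possible approach is sketched here, assuming `pd.get_dummies` for the nominal columns (`department`, `day`) and scikit-learn's `StandardScaler` for numeric ones. One-hot encoding suits these columns because their categories have no natural order, unlike the quarters, which the cleaning cell label-encodes. The stand-in data is the first four rows of the `df.head(10)` output; which columns to scale is an assumption for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# First four rows from the df.head(10) output, with "sweing" corrected to "sewing".
sample = pd.DataFrame({
    "department": ["sewing", "finishing", "sewing", "sewing"],
    "day": ["Thursday", "Thursday", "Thursday", "Thursday"],
    "incentive": [98, 0, 50, 50],
    "over_time": [7080, 960, 3660, 3660],
})

# One-hot encode the nominal columns: one 0/1 indicator column per category.
encoded = pd.get_dummies(sample, columns=["department", "day"], dtype=int)

# Standardize numeric columns to zero mean and unit variance.
# In a real pipeline, fit the scaler on the training split only to avoid leakage.
numeric_cols = ["incentive", "over_time"]
scaler = StandardScaler()
encoded[numeric_cols] = scaler.fit_transform(encoded[numeric_cols])

print(encoded)
```

Because standardization uses the column mean and standard deviation, the scaler should be fit on the training split and then applied to the test split; fitting it on all rows leaks test information into training.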

In [2]:
```python
df = pd.read_csv('garments.csv')

# df.info() prints its summary directly (and returns None), so it is not wrapped in print()
df.info()
df.head(10)
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1197 entries, 0 to 1196
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   date                   1197 non-null   object 
 1   quarter                1197 non-null   object 
 2   department             1197 non-null   object 
 3   day                    1197 non-null   object 
 4   team                   1197 non-null   int64  
 5   targeted_productivity  1197 non-null   float64
 6   smv                    1197 non-null   float64
 7   wip                    691 non-null    float64
 8   over_time              1197 non-null   int64  
 9   incentive              1197 non-null   int64  
 10  idle_time              1197 non-null   float64
 11  idle_men               1197 non-null   int64  
 12  no_of_style_change     1197 non-null   int64  
 13  no_of_workers          1197 non-null   float64
 14  actual_productivity    1197 non-null   float64
dtypes: f
```

| | date | quarter | department | day | team | targeted_productivity | smv | wip | over_time | incentive | idle_time | idle_men | no_of_style_change | no_of_workers | actual_productivity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01/01/2015 | Quarter1 | sweing | Thursday | 8 | 0.8 | 26.16 | 1108.0 | 7080 | 98 | 0.0 | 0 | 0 | 59.0 | 0.940725 |
| 1 | 01/01/2015 | Quarter1 | finishing | Thursday | 1 | 0.75 | 3.94 | NaN | 960 | 0 | 0.0 | 0 | 0 | 8.0 | 0.8865 |
| 2 | 01/01/2015 | Quarter1 | sweing | Thursday | 11 | 0.8 | 11.41 | 968.0 | 3660 | 50 | 0.0 | 0 | 0 | 30.5 | 0.80057 |
| 3 | 01/01/2015 | Quarter1 | sweing | Thursday | 12 | 0.8 | 11.41 | 968.0 | 3660 | 50 | 0.0 | 0 | 0 | 30.5 | 0.80057 |
| 4 | 01/01/2015 | Quarter1 | sweing | Thursday | 6 | 0.8 | 25.9 | 1170.0 | 1920 | 50 | 0.0 | 0 | 0 | 56.0 | 0.800382 |
| 5 | 01/01/2015 | Quarter1 | sweing | Thursday | 7 | 0.8 | 25.9 | 984.0 | 6720 | 38 | 0.0 | 0 | 0 | 56.0 | 0.800125 |
| 6 | 01/01/2015 | Quarter1 | finishing | Thursday | 2 | 0.75 | 3.94 | NaN | 960 | 0 | 0.0 | 0 | 0 | 8.0 | 0.755167 |
| 7 | 01/01/2015 | Quarter1 | sweing | Thursday | 3 | 0.75 | 28.08 | 795.0 | 6900 | 45 | 0.0 | 0 | 0 | 57.5 | 0.753683 |
| 8 | 01/01/2015 | Quarter1 | sweing | Thursday | 2 | 0.75 | 19.87 | 733.0 | 6000 | 34 | 0.0 | 0 | 0 | 55.0 | 0.753098 |
| 9 | 01/01/2015 | Quarter1 | sweing | Thursday | 1 | 0.75 | 28.08 | 681.0 | 6900 | 45 | 0.0 | 0 | 0 | 57.5 | 0.750428 |
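The `df.info()` output shows that `wip` is the only column with missing values: 691 non-null out of 1197 entries, so 506 rows lack a value. Before filling them with 0, it is worth checking where they occur. In the ten rows shown, the missing `wip` values appear only in finishing rows; this minimal sketch counts that pattern, assuming those rows as stand-in data. Whether the pattern holds for the full dataset would need to be confirmed by running the same code on `df`.

```python
import numpy as np
import pandas as pd

# department and wip columns from the df.head(10) output (raw spelling kept).
sample = pd.DataFrame({
    "department": ["sweing", "finishing", "sweing", "sweing", "sweing",
                   "sweing", "finishing", "sweing", "sweing", "sweing"],
    "wip": [1108.0, np.nan, 968.0, 968.0, 1170.0, 984.0, np.nan, 795.0, 733.0, 681.0],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Number of missing wip values within each department.
missing_wip_by_dept = sample["wip"].isna().groupby(sample["department"]).sum()

print(missing_per_column)
print(missing_wip_by_dept)
```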


In [7]:
```python
# work on a real copy so the raw df stays untouched
df_copy = df.copy()

# fill missing work-in-progress values with 0, then store wip as an integer count
df_copy['wip'] = df_copy['wip'].fillna(0).astype(np.int64)

# fix the "sweing" misspelling and strip stray whitespace from department names
df_copy['department'] = df_copy['department'].str.replace('sweing', 'sewing').str.strip()

# merge Quarter5 into Quarter4 so every month has four quarters
df_copy['quarter'] = df_copy['quarter'].str.replace('Quarter5', 'Quarter4')

# convert number of workers from float to int
# (astype truncates fractional values, e.g. 30.5 becomes 30)
df_copy['no_of_workers'] = df_copy['no_of_workers'].astype(np.int64)

# cap actual productivity values above 1.0 (100%) at 1.0
df_copy['actual_productivity'] = df_copy['actual_productivity'].clip(upper=1.0)

# drop the date column (the question only needs a per-quarter basis)
df_copy = df_copy.drop(columns=['date'])

# encode quarter into numerals
# (0: Quarter1, 1: Quarter2, 2: Quarter3, 3: Quarter4)
label_enc = preprocessing.LabelEncoder()
df_copy['quarter'] = label_enc.fit_transform(df_copy['quarter'])

df_copy.head(1000)

# changes so far
# - filled missing work-in-progress (wip) values with 0
# - replaced the "sweing" misspelling with "sewing"
# - merged "Quarter5" into "Quarter4"
# - converted number of workers from float to int
# - capped actual productivity values above 1.0 (100%) at 1.0
# - dropped the date column (since we are only looking at quarter and day-of-the-week productivity)
# - label encoded quarters into numerals
```

```
array([0, 1, 2, 3])
```
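The cleaning steps do not yet cover duplicate data or outliers, which the instructions for this section list. A minimal sketch of both checks, assuming the first five rows of the `df.head(10)` output as stand-in data, `DataFrame.duplicated` for duplicates, and the common 1.5 * IQR rule for outliers (`iqr_outliers` is a hypothetical helper). Rows 2 and 3 in that output match on every column except `team`, so they are not exact duplicates, but they show why a subset-based check can be informative.

```python
import pandas as pd

# First five rows of the df.head(10) output (a subset of columns).
sample = pd.DataFrame({
    "team": [8, 1, 11, 12, 6],
    "smv": [26.16, 3.94, 11.41, 11.41, 25.9],
    "over_time": [7080, 960, 3660, 3660, 1920],
    "actual_productivity": [0.940725, 0.8865, 0.80057, 0.80057, 0.800382],
})

# Exact duplicates: rows equal in every column.
n_exact_dupes = sample.duplicated().sum()

# Rows equal in every column except team: possible double entries worth inspecting.
n_dupes_ignoring_team = sample.duplicated(subset=[c for c in sample.columns if c != "team"]).sum()


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values more than k * IQR below the first quartile or above the third."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


over_time_flags = iqr_outliers(sample["over_time"])
print(n_exact_dupes, n_dupes_ignoring_team, over_time_flags.tolist())
```

With only five rows the IQR flag means little; on the full data, a flagged value would still need a domain judgment (for example, whether that much overtime is plausible) before it is removed or capped.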

## **Section 5: Exploratory data analysis**

Perform exploratory data analysis comprehensively to gain a good understanding of your dataset. <br>In this section of the notebook, you must present relevant numerical summaries and visualizations. Make sure that each code cell is accompanied by a brief explanation. The whole process should be supported with verbose textual descriptions of your procedures and findings.
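As a starting point for the numerical summaries, `describe()` gives per-column statistics and a grouped aggregation compares departments. A minimal sketch, assuming the ten rows from the `df.head(10)` output as stand-in data and a hypothetical `gap` column (actual minus targeted productivity):

```python
import pandas as pd

# Rows from the df.head(10) output, with "sweing" corrected to "sewing".
sample = pd.DataFrame({
    "department": ["sewing", "finishing", "sewing", "sewing", "sewing",
                   "sewing", "finishing", "sewing", "sewing", "sewing"],
    "targeted_productivity": [0.8, 0.75, 0.8, 0.8, 0.8, 0.8, 0.75, 0.75, 0.75, 0.75],
    "actual_productivity": [0.940725, 0.8865, 0.80057, 0.80057, 0.800382,
                            0.800125, 0.755167, 0.753683, 0.753098, 0.750428],
})

# Count, mean, std, min, quartiles, and max for each numeric column.
print(sample.describe())

# How far each row lands above (+) or below (-) its target.
sample["gap"] = sample["actual_productivity"] - sample["targeted_productivity"]

# Per-department row count, mean actual productivity, and mean gap to target.
by_dept = sample.groupby("department").agg(
    n_rows=("actual_productivity", "count"),
    mean_actual=("actual_productivity", "mean"),
    mean_gap=("gap", "mean"),
)
print(by_dept)
```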

## **Section 6: Model training**

Use machine learning models to accomplish your chosen task for the dataset. In this section of the notebook, please take note of the following:
- The project should train and evaluate at least 3 different kinds of machine learning models.
- Each model should be appropriate in accomplishing the chosen task for the dataset. There should be a clear and correct justification on the use of each machine learning model.
- Make sure that the values of the hyperparameters of each model are mentioned. At a minimum, the optimizer, the learning rate, and the learning rate schedule should be discussed per model.
- The report should show that the models are neither overfitting nor underfitting.
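A possible skeleton for training three kinds of models and checking for over- and underfitting follows. It assumes scikit-learn, synthetic stand-in data (not the garments data), and three illustrative choices: a linear model trained with stochastic gradient descent (`SGDRegressor`), a decision tree, and a random forest. Only the SGD model has an optimizer, learning rate, and learning rate schedule in the sense the instructions mention (here `learning_rate="invscaling"` with `eta0=0.01`, a decaying schedule); the tree-based models are not trained by gradient descent, so those hyperparameters do not apply to them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: 3 features, target loosely shaped like a 0-1 productivity score.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(300, 3))
y = 0.5 + 0.003 * X[:, 0] + rng.normal(0, 0.05, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    # SGD optimizer, initial learning rate eta0, inverse-scaling schedule eta = eta0 / t**power_t
    "sgd_linear": make_pipeline(
        StandardScaler(),
        SGDRegressor(learning_rate="invscaling", eta0=0.01, power_t=0.25, max_iter=1000, random_state=0),
    ),
    "decision_tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, max_depth=6, random_state=0),
}

# Compare train and test error for each model.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_mae = mean_absolute_error(y_train, model.predict(X_train))
    test_mae = mean_absolute_error(y_test, model.predict(X_test))
    scores[name] = (train_mae, test_mae)
    print(f"{name}: train MAE={train_mae:.4f}, test MAE={test_mae:.4f}")
```

A large gap between train and test error (low train, high test) suggests overfitting; high error on both suggests underfitting.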

## **Section 7: Hyperparameter tuning**

Perform grid search or random search to tune the hyperparameters of each model. In this section of the notebook, please take note of the following:<br>
- Make sure to elaborately explain the method of hyperparameter tuning.
- Explicitly mention the different hyperparameters and their range of values. Show the corresponding performance of each configuration.
- Report the performance of all models using appropriate evaluation metrics and visualizations.
- Properly interpret the result based on relevant evaluation metrics.
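For grid search, scikit-learn's `GridSearchCV` tries every combination in a parameter grid and scores each one with cross-validation; its `cv_results_` attribute holds the score for every configuration, which covers the requirement to show each one's performance. This sketch assumes a decision tree, an illustrative grid over `max_depth` and `min_samples_leaf`, mean absolute error as the metric, and synthetic stand-in data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (not the garments data).
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(200, 3))
y = 0.5 + 0.003 * X[:, 0] + rng.normal(0, 0.05, size=200)

# Every combination below is tried: 4 depths x 3 leaf sizes = 12 configurations.
param_grid = {"max_depth": [2, 4, 8, None], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

# One row per configuration; flip the sign so lower MAE is better.
cols = ["param_max_depth", "param_min_samples_leaf", "mean_test_score", "std_test_score"]
results = pd.DataFrame(search.cv_results_)[cols].copy()
results["mean_test_mae"] = -results["mean_test_score"]
print(results.sort_values("mean_test_mae").to_string(index=False))
print("best:", search.best_params_)
```

scikit-learn always maximizes the score, so error metrics are exposed as negatives (`neg_mean_absolute_error`); the sketch negates them back into ordinary MAE for reporting.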

## **Section 8: Model selection**

Present a summary of all model configurations. Include each algorithm and the best set of values for its hyperparameters. Identify the best model configuration and discuss its advantage over other configurations.
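One way to build this summary is to run one grid search per algorithm and collect each best configuration into a table. A minimal sketch, assuming three illustrative algorithms (`Ridge`, a decision tree, and a random forest), small example grids, 5-fold cross-validated MAE, and synthetic stand-in data:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (not the garments data).
rng = np.random.default_rng(7)
X = rng.uniform(0, 100, size=(200, 3))
y = 0.5 + 0.003 * X[:, 0] + rng.normal(0, 0.05, size=200)

# Each candidate: (estimator, grid of hyperparameter values to search).
candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "decision_tree": (DecisionTreeRegressor(random_state=0), {"max_depth": [2, 4, 8]}),
    "random_forest": (RandomForestRegressor(n_estimators=50, random_state=0), {"max_depth": [4, 8]}),
}

rows = []
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="neg_mean_absolute_error", cv=5)
    search.fit(X, y)
    rows.append({"model": name, "best_params": search.best_params_, "cv_mae": -search.best_score_})

# Lowest cross-validated MAE first.
summary = pd.DataFrame(rows).sort_values("cv_mae").reset_index(drop=True)
print(summary.to_string(index=False))
```

The top row is the best configuration by cross-validated error; its score on a held-out test set, untouched during tuning, is the fairer number to report as final.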

## **Section 9: Insights and conclusions**

Clearly state your insights and conclusions from training a model on the data. Why did some models produce better results? Summarize your conclusions to explain the performance of the models. Discuss recommendations to improve the performance of the model.

## **Section 10: References**