# Machine Learning Project  

## Course Information  
**Course:** STINTSY / S14  
**Professor:** Mr. Emerico Aguilar  
**University:** De La Salle University (DLSU)  
**Term:** AY 2024-2025, Term 2  

## Group Members  
- **Alfaro, Nathaniel Luis V.**  
- **Kabiling, Simon Gabriel M.**  
- **Naling, Sebastien M.**  
- **Santos, Montgomery**  

## Submission Details  
**Deadline:** March 28, 2025 (Friday) 6:00 PM  
**Demo Schedule:** April 2 to 11, 2025  
**Project Weight:** 20% of Final Grade  

## Project Title  
**Predicting Employment Status using the Labor Force Survey (LFS) 2016 Dataset**  

# **Section 1: Introduction to the Problem/Task and Dataset**  

### **Problem Statement**
The **Labor Force Survey (LFS) 2016** dataset provides insights into employment trends in the Philippines. This project aims to analyze labor force participation and employment patterns.  

Our goal is to **predict whether an individual was employed in the past week**, based on demographic, educational, and occupational features.  

### **Machine Learning Task: Classification**
This is a **classification problem**, where the target variable is:  

- **PUFC11_WORK** (`1` = Worked in the past week, `0` = Did not work in the past week).  
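
With a binary target, it helps to check how balanced the two classes are before training. The following is a minimal sketch, assuming the target is held in a pandas Series coded `1`/`0` as described above. The helper name `class_balance` and the toy values are hypothetical and only illustrate the idea.

```python
import pandas as pd


def class_balance(target: pd.Series) -> pd.Series:
    """Return the share of each class in the target, ignoring missing values."""
    return target.dropna().value_counts(normalize=True).sort_index()


# Toy stand-in for the PUFC11_WORK column (made-up values, not real LFS data).
toy_target = pd.Series([1, 0, 1, 1, None, 0, 1], name="PUFC11_WORK")

balance = class_balance(toy_target)
print(balance)
```

A strongly skewed split would suggest reporting metrics beyond plain accuracy, such as precision, recall, or F1.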

### **Use Cases and Importance** 
Understanding employment patterns can help:  
- Identify **factors influencing employment status**.  
- Recognize **demographic and economic employment trends**.  
- Provide **insights for government labor policies** to improve workforce conditions.  

### **Dataset Source**  
The **LFS 2016 dataset** is sourced from the **Philippine Statistics Authority (PSA)** and can be accessed at: [LFS 2016 Dataset](https://psada.psa.gov.ph/catalog/67/get-microdata)

# **Section 2: Description of the Dataset** 

### **Overview**  
The **Labor Force Survey (LFS) 2016** is conducted by the **Philippine Statistics Authority (PSA)** to assess employment and unemployment trends in the Philippines. The dataset includes information about:  
- **Demographics** (age, sex, marital status, education).  
- **Employment status** (current job, industry type, working hours).  
- **Job search activity** (looking for work, method used, weeks unemployed).  

### **Data Collection Process**  
Data for the **LFS 2016** were collected through **household surveys** conducted nationwide. Key details:  
- The survey is conducted **quarterly** on a **randomly selected**, representative sample of households.  
- Responses were gathered through **interviews** with household members.  
- The **self-reported nature** of responses may introduce biases, especially in informal employment sectors.  

### **Implications of Data Collection Method**  
- **Underreporting**: Informal or freelance workers may not report their actual working status.  
- **Sampling Bias**: Employment patterns differ between rural and urban areas, so a sample that over- or under-represents either one can skew the results.  
- **Response Validity**: Participants may misreport their employment status due to recall errors.  
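
One common way to reduce the effect of uneven representation in survey data is to weight each respondent when computing summary statistics. The sketch below assumes that `PUFPWGTFIN` (listed in the feature table as the final weight) can be used as a per-person weight. Whether this project applies weights is not stated, and the helper name `weighted_rate` and the toy values are hypothetical.

```python
import pandas as pd


def weighted_rate(df: pd.DataFrame,
                  target: str = "PUFC11_WORK",
                  weight: str = "PUFPWGTFIN") -> float:
    """Weighted share of rows with target == 1: sum(w * y) / sum(w)."""
    valid = df.dropna(subset=[target, weight])
    return float((valid[target] * valid[weight]).sum() / valid[weight].sum())


# Toy data (made-up values): respondents who did not work carry larger weights.
toy = pd.DataFrame({
    "PUFC11_WORK": [1, 0, 1, 0],
    "PUFPWGTFIN": [100.0, 300.0, 200.0, 400.0],
})

unweighted = toy["PUFC11_WORK"].mean()
weighted = weighted_rate(toy)
print(f"unweighted={unweighted:.2f}, weighted={weighted:.2f}")
```

In this toy case the weighted rate is lower than the unweighted one because the respondents who did not work stand in for more people.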

### **Structure of the Dataset**  
- **Each row (instance) represents**: A surveyed individual.  
- **Each column (feature) represents**: A demographic, employment, or job search characteristic.  
- **Number of Instances**: Not stated in the source documentation; it can be read from the row count once the dataset is loaded.  
- **Number of Features**: **50 columns**, including the target variable, as listed in the table below.  
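
Once the data is in a pandas DataFrame, the number of instances and features can be read from its shape. This is a small sketch on a made-up frame; the column subset and values are illustrative only and stand in for the full dataset.

```python
import pandas as pd

# Tiny stand-in for the loaded LFS data (made-up values, subset of columns).
toy = pd.DataFrame({
    "PUFREG": [1, 1, 2],
    "PUFC04_SEX": [1, 2, 1],
    "PUFC05_AGE": [34, 19, 52],
    "PUFC11_WORK": [1, 0, 1],
})

# DataFrame.shape is a (rows, columns) tuple.
n_instances, n_features = toy.shape
print(f"{n_instances} instances, {n_features} columns")
```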

### **Complete List of Features**  

| **Feature Name**       | **Description** |
|----------------------|----------------|
| **PUFREG**          | Region |
| **PUFPRV**          | Province code |
| **PUFPRRCD**        | Province recode |
| **PUFHHNUM**        | Household unique sequential number |
| **PUFURB2K10**      | Urban / Rural in FIES 2010 survey |
| **PUFPWGTFIN**      | Final weight based on projection |
| **PUFSVYMO**        | Survey month |
| **PUFSVYYR**        | Survey year |
| **PUFPSU**          | PSU number |
| **PUFRPL**          | Replicate |
| **PUFHHSIZE**       | Number of household members |
| **PUFC01_LNO**      | Line number identifying each member of the household in the survey |
| **PUFC03_REL**      | Relationship of the person to the household head |
| **PUFC04_SEX**      | Sex of the individual (Male/Female) |
| **PUFC05_AGE**      | Age of the person as of last birthday |
| **PUFC06_MSTAT**    | Marital status |
| **PUFC07_GRADE**    | Highest educational attainment |
| **PUFC08_CURSCH**   | Is currently attending school? (Yes/No) |
| **PUFC09_GRADTECH** | Graduate of a technical/vocational course? (Yes/No) |
| **PUFC10_CONWR**    | Category of Overseas Filipino Worker (OFW) |
| **PUFC11_WORK**     | **Target variable: Did the person work in the past week? (1 = Yes, 0 = No)** |
| **PUFC12_JOB**      | Did the person have a job or business during the past week? |
| **PUFC14_PROCC**    | Primary occupation during the past week |
| **PUFC16_PKB**      | Kind of business or industry of the person |
| **PUFC17_NATEM**    | Nature of employment (permanent, seasonal, etc.) |
| **PUFC18_PNWHRS**   | Normal working hours per day |
| **PUFC19_PHOURS**   | Total number of hours worked during the past week |
| **PUFC20_PWMORE**   | Did the person want more hours of work during the past week? |
| **PUFC21_PLADDW**   | Did the person look for additional work during the past week? |
| **PUFC22_PFWRK**    | Was this the person’s first time doing any work? |
| **PUFC23_PCLASS**   | Class of worker for primary occupation |
| **PUFC24_PBASIS**   | Basis of payment for primary occupation |
| **PUFC25_PBASIC**   | Basic pay per day for primary occupation |
| **PUFC26_OJOB**     | Did the person have another job/business during the past week? |
| **PUFC27_NJOBS**    | Number of jobs the person had during the past week |
| **PUFC28_THOURS**   | Total number of hours worked across all jobs in the past week |
| **PUFC29_WWM48H**   | Main reason for not working more than 48 hours in the past week |
| **PUFC30_LOOKW**    | Did the person look for work or try to establish a business in the past week? |
| **PUFC31_FLWRK**    | Was it the person’s first time looking for work? |
| **PUFC32_JOBSM**    | Job search method used |
| **PUFC33_WEEKS**    | Number of weeks spent looking for work |
| **PUFC34_WYNOT**    | Reason for not looking for work |
| **PUFC35_LTLOOKW**  | When was the last time the person looked for work? |
| **PUFC36_AVAIL**    | If work was available, would the person have accepted it? |
| **PUFC37_WILLING**  | Was the person willing to take up work during the past week or within 2 weeks? |
| **PUFC38_PREVJOB**  | Has the person worked at any time before? |
| **PUFC40_POCC**     | What was the person’s last occupation? |
| **PUFC41_WQTR**     | Did the person work or have a business in the past quarter? |
| **PUFC43_QKB**      | Kind of business for the past quarter |
| **PUFNEWEMPSTAT**   | New Employment Criteria |

### **Key Features in the Dataset**  

| **Feature Name**      | **Description** |
|----------------------|----------------|
| **PUFC04_SEX**       | Gender of the individual (Male/Female). |
| **PUFC05_AGE**       | Age in years. |
| **PUFC06_MSTAT**     | Marital status. |
| **PUFC07_GRADE**     | Highest educational attainment. |
| **PUFC08_CURSCH**    | Is currently attending school? (Yes/No). |
| **PUFC11_WORK**      | **Target variable: Worked in the past week?** (`1 = Yes, 0 = No`). |
| **PUFC14_PROCC**     | Primary occupation during the past week. |
| **PUFC16_PKB**       | Business/industry sector. |
| **PUFC17_NATEM**     | Nature of employment (permanent, seasonal, etc.). |
| **PUFC18_PNWHRS**    | Normal working hours per day. |
| **PUFC19_PHOURS**    | Total hours worked in the past week. |
| **PUFC30_LOOKW**     | Did the person look for work in the past week? |
| **PUFC34_WYNOT**     | Reason for not looking for work. |

### **Implications for the Study**  
- **Education level (`PUFC07_GRADE`)** may impact employment probability.  
- **Age (`PUFC05_AGE`)** could indicate trends in youth and elderly employment.  
- **Industry sector (`PUFC16_PKB`)** helps analyze job availability in different fields.  
- **Job search behavior (`PUFC30_LOOKW`)** may indicate long-term unemployment trends.  
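
A first look at these relationships is to compare the share of employed individuals across the levels of a feature. The sketch below assumes the target is coded `1`/`0` and uses made-up education codes. The helper name `employment_rate_by` is hypothetical.

```python
import pandas as pd


def employment_rate_by(df: pd.DataFrame, feature: str,
                       target: str = "PUFC11_WORK") -> pd.Series:
    """Share of rows with target == 1 within each level of `feature`."""
    return df.groupby(feature)[target].mean().sort_index()


# Toy data (made-up codes and outcomes, not real LFS records).
toy = pd.DataFrame({
    "PUFC07_GRADE": [1, 1, 2, 2, 2, 3],
    "PUFC11_WORK": [0, 1, 1, 1, 0, 1],
})

rates = employment_rate_by(toy, "PUFC07_GRADE")
print(rates)
```

The same helper can be reused for other categorical features such as `PUFC04_SEX` or `PUFC06_MSTAT`.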

This dataset provides a rich source of information for understanding employment trends and predicting labor force participation.  


# **Section 3: List of Requirements**

### **3.1 Importing the Libraries and Dataset**

Before cleaning the data, we first import the necessary libraries and load the dataset.
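
A minimal sketch of this step, assuming the dataset is available as a CSV file and that pandas is used for loading. The loader name `load_lfs` and the file name in the comment are hypothetical. An in-memory CSV with made-up values is used here so that the example runs without the real file.

```python
import io

import pandas as pd

TARGET = "PUFC11_WORK"


def load_lfs(source) -> pd.DataFrame:
    """Read the LFS CSV from a path or file-like object and check the target exists."""
    df = pd.read_csv(source)
    if TARGET not in df.columns:
        raise KeyError(f"expected target column {TARGET!r} in the dataset")
    return df


# With the real data this would be a file path, e.g. load_lfs("lfs_2016.csv")
# (hypothetical file name). The in-memory CSV below is a made-up stand-in.
demo_csv = io.StringIO("PUFC05_AGE,PUFC11_WORK\n34,1\n19,0\n")
df = load_lfs(demo_csv)
print(df.shape)
print(df.dtypes)
```

Checking for the target column right after loading gives an early, clear error if the wrong file or a renamed column is used.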

# **Section 4: Data Preprocessing and Cleaning**

# **Section 5: Exploratory Data Analysis**

# **Section 6: Initial Model Training**

# **Section 7: Error Analysis**

# **Section 8: Improving Model Performance**

# **Section 9: Model Performance Summary**

# **Section 10: Insights and Conclusions**

# **Section 11: References**