## Machine Learning Project 
#### Group 1 Members: 
* DIAZ, Rafael Inigo 
* HOMBREBUENO, Robin James 
* SILVA, Kyle Luis 
* TONGOL, Gabrielle Vincenza 

## Section 1. Introduction to the problem/task and dataset


#### 1.1 Problem Statement 
The primary goal of this project is to develop predictive models that accurately capture labor force participation and employment-related patterns in the ***Labor Force Survey (LFS) 2016 dataset***. The dataset contains socio-economic and demographic attributes of household members that describe the employment landscape in the Philippines.

#### 1.2 Task Definition 
The specific task to be addressed in this project is a ***Classification Task***:

* ***Predicting Employment Status*** (employed or unemployed) based on features such as age, education level, work experience, industry, and other relevant attributes.

This task will be accomplished using machine learning models, and their performance will be evaluated using appropriate metrics such as accuracy, precision, recall, and F1-score.
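As a dependency-free sketch of how these four metrics relate, the snippet below computes them from a confusion matrix for one chosen positive class. The label encoding (1 = employed, 2 = unemployed) and the choice of "unemployed" as the positive class are assumptions for illustration, and the toy labels are made up rather than taken from the LFS data.

```python
# Sketch of the evaluation metrics named above, computed from confusion-matrix counts.
# Assumptions: labels are 1 = employed, 2 = unemployed; "unemployed" (2) is the positive class.

def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when the positive class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (not LFS data).
y_true = [1, 1, 1, 1, 2, 2, 1, 2]
y_pred = [1, 1, 2, 1, 2, 1, 1, 2]
print(classification_metrics(y_true, y_pred, positive=2))
```

If one class is much more frequent than the other, accuracy alone can look high even for a model that rarely predicts the minority class, which is why precision, recall, and F1 on that class are worth reporting alongside it.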

## Section 2. Description of the dataset

```python
import pandas as pd

# Load the April 2016 LFS public use file (PUF).
# The "Unnamed: 0" column in the output appears to be a row index saved with the CSV.
dataset = pd.read_csv('dataset/LFS PUF April 2016.CSV')
dataset
```

```
Unnamed: 0,PUFREG,PUFPRV,PUFPRRCD,PUFHHNUM,PUFURB2K10,PUFPWGTFIN,PUFSVYMO,PUFSVYYR,PUFPSU,PUFRPL,...,PUFC33_WEEKS,PUFC34_WYNOT,PUFC35_LTLOOKW,PUFC36_AVAIL,PUFC37_WILLING,PUFC38_PREVJOB,PUFC40_POCC,PUFC41_WQTR,PUFC43_QKB,PUFNEWEMPSTAT
0,1,28,2800,1,2,405.2219,4,2016,217,1,...,,,,,,,,1,01,1
1,1,28,2800,1,2,388.8280,4,2016,217,1,...,,,,,,,,1,01,1
2,1,28,2800,1,2,406.1194,4,2016,217,1,...,,,,,,,,1,01,1
3,1,28,2800,2,2,405.2219,4,2016,217,1,...,,,,,,,,1,01,1
4,1,28,2800,2,2,384.3556,4,2016,217,1,...,,,,,,,,1,96,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
180857,17,59,5900,40880,2,239.4341,4,2016,258,1,...,,,,,,,,1,50,1
180858,17,59,5900,40880,2,189.8885,4,2016,258,1,...,,8,,,,2,,,,3
180859,17,59,5900,40880,2,207.7395,4,2016,258,1,...,,,,,,,,,,
180860,17,59,5900,40880,2,207.7395,4,2016,258,1,...,,,,,,,,,,
```


#### 2.1 Dataset Description 
The ***Labor Force Survey (LFS)*** is a nationwide survey of households conducted quarterly to gather data on the demographic and socio-economic characteristics of the population. It is primarily aimed at estimating the levels of employment and unemployment in the Philippines. One of the objectives of the LFS is to provide a quantitative framework for planning and policy-making affecting the labor market. The survey is designed to generate statistics on the levels and trends of employment, unemployment, and underemployment of the country, both nationally and regionally, covering all 17 administrative regions.
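For reference, the headline indicators such a survey estimates are conventionally defined as below. These are the standard definitions, with the labor force $LF$ taken as employed persons $E$ plus unemployed persons $U$, and $P_{15+}$ as the household population aged 15 years and over; the symbols are introduced here for illustration.

$$
\begin{aligned}
LF &= E + U \\
\text{Employment rate} &= \frac{E}{LF} \times 100 \\
\text{Unemployment rate} &= \frac{U}{LF} \times 100 \\
\text{Labor force participation rate} &= \frac{LF}{P_{15+}} \times 100
\end{aligned}
$$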

#### 2.2 Data Collection Process
The survey uses a total national sample of ***42,768 households*** (if including Batanes) or 42,576 households (if excluding Batanes) per survey round. This sample size is considered sufficient to provide precise and reliable estimates at both national and regional levels. The survey's reporting unit is the household, meaning the collected statistics pertain to the characteristics of individuals residing in private households. Persons belonging to the institutional population are not included within the scope of this survey.
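Because only a sample of households is interviewed, population-level estimates come from weighting each respondent rather than counting rows. The sketch below sums a per-person weight by employment status to estimate employed and unemployed totals and the unemployment rate. It assumes that `PUFPWGTFIN` ("Final weight based on projection") can be used as-is as a person-level expansion weight for a single survey round, and that `PUFNEWEMPSTAT` is coded 1 = employed, 2 = unemployed, 3 = not in the labor force; both should be checked against the PSA codebook. The function name is hypothetical and the toy records are made up.

```python
# Sketch: weighted (population-level) estimates from survey records.
# Assumptions: weight = PUFPWGTFIN used as-is; status codes 1 = employed, 2 = unemployed,
# 3 = not in the labor force; blank statuses (NaN) are outside the universe and skipped.

def weighted_labor_estimates(records):
    """records: iterable of (status, weight) pairs, one per person."""
    employed = unemployed = 0.0
    for status, weight in records:
        if status == 1:
            employed += weight
        elif status == 2:
            unemployed += weight
        # Any other value (3, NaN, None) is ignored: NaN compares unequal to 1 and 2.
    labor_force = employed + unemployed
    rate = 100 * unemployed / labor_force if labor_force else float("nan")
    return {"employed": employed, "unemployed": unemployed, "unemployment_rate": rate}


# Toy (status, weight) pairs for illustration only.
sample = [(1, 400.0), (1, 300.0), (2, 100.0), (3, 500.0), (float("nan"), 200.0)]
est = weighted_labor_estimates(sample)
print(est)
```

With the loaded DataFrame, the pairs could come from `zip(dataset["PUFNEWEMPSTAT"], dataset["PUFPWGTFIN"])`, assuming blank cells load as NaN (pandas then stores the status column as floats, and `1.0 == 1` still matches).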

#### 2.3 Dataset Structure 

* ***Rows*** - Each row represents a single individual from a surveyed household.
* ***Columns*** - Each column represents a specific feature or attribute describing various aspects of the individual.
* ***Instances*** - The number of instances (rows) can be read from the DataFrame's `shape` attribute, as in the code below.

```python
# shape returns a (rows, columns) tuple
num_rows, num_columns = dataset.shape

print(f"Number of instances: {num_rows}")
```

```
Number of instances: 180862
```


* ***Features*** - The dataset has 50 features, described in the table below:



| Feature Code     | Description                                                                                      |
|------------------|--------------------------------------------------------------------------------------------------|
| PUFREG           | Region                                                                                           |
| PUFPRV           | Province code                                                                                    |
| PUFPRRCD         | Province recode                                                                                  |
| PUFHHNUM         | Household unique sequential number                                                              |
| PUFURB2K10       | Urban / Rural in FIES 2010 survey                                                               |
| PUFPWGTFIN       | Final weight based on projection                                                                |
| PUFSVYMO         | Survey month                                                                                     |
| PUFSVYYR         | Survey year                                                                                      |
| PUFPSU           | PSU number                                                                                      |
| PUFRPL           | Replicate                                                                                       |
| PUFHHSIZE        | Number of household members                                                                     |
| PUFC01_LNO       | Line number used to identify each member of the household in the survey                         |
| PUFC03_REL       | Relationship of the person to the household head                                                |
| PUFC04_SEX       | Sex of the person                                                                                |
| PUFC05_AGE       | Age of the person as of last birthday                                                           |
| PUFC06_MSTAT     | Marital status of the person                                                                     |
| PUFC07_GRADE     | Highest grade completed of the person                                                           |
| PUFC08_CURSCH    | Is the person currently attending school?                                                      |
| PUFC09_GRADTECH  | Is the person a graduate of a technical / vocational course?                                    |
| PUFC10_CONWR     | Category of OFW                                                                                 |
| PUFC11_WORK      | Did the person do any work for at least one hour during the past week?                         |
| PUFC12_JOB       | Although the person did not work last week, did the person have a job or business?             |
| PUFC14_PROCC     | Primary occupation of the person during the past week                                          |
| PUFC16_PKB       | Kind of business or industry of the person                                                     |
| PUFC17_NATEM     | Nature of employment of the person (Permanence, regularity, seasonality)                       |
| PUFC18_PNWHRS    | Normal working hours per day                                                                    |
| PUFC19_PHOURS    | Total number of hours worked during the past week                                              |
| PUFC20_PWMORE    | Did the person want more hours of work during the past week?                                   |
| PUFC21_PLADDW    | Did the person look for additional work during the past week?                                  |
| PUFC22_PFWRK     | Was this the person’s first time doing any work?                                               |
| PUFC23_PCLASS    | Class of worker for primary occupation (relationship to establishment)                         |
| PUFC24_PBASIS    | Basis of payment for primary occupation                                                        |
| PUFC25_PBASIC    | Basic pay per day for primary occupation                                                       |
| PUFC26_OJOB      | Did the person have another job or business during the past week?                              |
| PUFC27_NJOBS     | Number of jobs the person had during the past week                                             |
| PUFC28_THOURS    | Total number of hours worked by the person for all of their jobs during the past week          |
| PUFC29_WWM48H    | Main reason for not working more than 48 hours in the past week                                |
| PUFC30_LOOKW     | Did the person look for work or try to establish a business in the past week?                  |
| PUFC31_FLWRK     | Was it the person’s first time looking for work or trying to establish a business?            |
| PUFC32_JOBSM     | Job search method                                                                               |
| PUFC33_WEEKS     | Number of weeks spent in looking for work                                                      |
| PUFC34_WYNOT     | Reason for not looking for work                                                                |
| PUFC35_LTLOOKW   | When was the last time the person looked for work?                                            |
| PUFC36_AVAIL     | Was the person available for work if an opportunity existed during the past week or within two weeks? |
| PUFC37_WILLING   | Was the person willing to take up work during the past week or within two weeks?              |
| PUFC38_PREVJOB   | Has the person worked at any time before?                                                     |
| PUFC40_POCC      | What was the person’s last occupation?                                                       |
| PUFC41_WQTR      | Did the person work or have a job/business during the past quarter?                          |
| PUFC43_QKB       | Kind of business for the past quarter                                                        |
| PUFNEWEMPSTAT    | Employment status under the new employment criteria                                          |
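Since the task is to predict whether a person is employed or unemployed, the label would most naturally be derived from `PUFNEWEMPSTAT`. The sketch below assumes the coding 1 = employed, 2 = unemployed, 3 = not in the labor force, with blank cells for persons outside the item's universe (this coding should be verified against the PSA codebook). Rows that are not in the labor force or are blank map to `None` so they can be dropped before training. `LABOR_FORCE_CODES` and `to_target` are hypothetical names.

```python
import math

# Assumed PUFNEWEMPSTAT coding: 1 = employed, 2 = unemployed, 3 = not in the labor force.
LABOR_FORCE_CODES = {1: "employed", 2: "unemployed"}  # hypothetical mapping


def to_target(raw_status):
    """Map a raw PUFNEWEMPSTAT value to a class label, or None if the row should be dropped."""
    if raw_status is None or (isinstance(raw_status, float) and math.isnan(raw_status)):
        return None  # blank: outside the universe for this item
    return LABOR_FORCE_CODES.get(int(raw_status))  # code 3 -> None (not in labor force)


# Raw values as they might appear after loading (floats when the column contains NaN).
raw = [1, 1.0, 2, 3, float("nan"), None]
labels = [to_target(v) for v in raw]
print(labels)
```

Once the label is built, `PUFNEWEMPSTAT` itself has to be excluded from the model inputs. Survey items used to determine employment status (for example, whether the person worked or looked for work in the past week) would likely leak the label as well and are worth reviewing before they are used as features.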

## Section 3. List of requirements

## Section 4. Data preprocessing and cleaning

## Section 5. Exploratory data analysis

## Section 6. Initial model training

## Section 7. Error analysis

## Section 8. Improving model performance

## Section 9. Model performance summary

## Section 10. Insights and conclusions

## Section 11. References