## **Introduction** 📌  

### **1. Project Overview**  
The Labor Force Survey (LFS) is a nationwide quarterly survey conducted to collect key employment statistics, such as employment status, hours worked, and wages. Understanding labor market trends is crucial for policymakers, businesses, and economists to make informed decisions.  

This project aims to analyze the **Labor Force Survey 2016** dataset and build predictive models to extract meaningful insights about employment patterns.  

### **2. Objective and Research Questions**  
The primary objective of this study is to develop machine learning models to analyze labor market trends. We aim to answer the following key questions:  
- **(For Classification)**: Can we predict whether a person is employed based on demographic and socioeconomic factors?  
- **(For Regression)**: Can we estimate a worker's daily wage based on their education, industry, and work hours?  

### **3. Dataset Overview**  
The **2016 Labor Force Survey (LFS)** dataset contains demographic, educational, and employment-related attributes of individuals. Each row represents a surveyed person, with features such as:  
- **Demographics:** Age, sex, marital status, household size.  
- **Education:** Highest grade completed, technical training.  
- **Employment Status:** Current job status, industry type, hours worked.  
- **Income:** Basic daily pay, payment basis (hourly/daily/monthly).  

This dataset is publicly available from the **Philippine Statistics Authority (PSA)** and was collected through household surveys. The data provides valuable insights into employment trends and workforce characteristics.  

### **4. Machine Learning Approach**  
To gain insights from the data, we will apply the following steps:  
1. **Data Preprocessing & Cleaning:** Handle missing values, encode categorical features, and normalize numerical values.  
2. **Exploratory Data Analysis (EDA):** Identify trends, distributions, and relationships between features.  
3. **Model Training & Evaluation:** Compare different machine learning models for both the classification and the regression task.  
4. **Hyperparameter Tuning:** Optimize models to improve performance.  
5. **Insights & Conclusions:** Interpret results and discuss real-world implications.  
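As a rough illustration of how steps 1, 3, and 4 above could fit together, here is a minimal sketch using scikit-learn. It is not the project's actual pipeline: the data is synthetic, the column names (`age`, `hours_worked`, `sex`) are hypothetical placeholders rather than the LFS schema, and the choice of logistic regression and of the `C` grid is an assumption made only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical, not the LFS schema.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "age": rng.integers(15, 65, size=n).astype(float),
    "hours_worked": rng.integers(0, 60, size=n).astype(float),
    "sex": rng.choice(["male", "female"], size=n),
})
toy.loc[::25, "age"] = np.nan  # inject a few missing values

# Artificial target, defined only so the sketch has something to fit.
employed = (toy["hours_worked"] > 20).astype(int)

numeric_cols = ["age", "hours_worked"]
categorical_cols = ["sex"]

# Step 1: impute and scale numeric features, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Step 3: chain preprocessing and a classifier into one estimator.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step 4: tune the regularization strength with 5-fold cross-validation.
search = GridSearchCV(model, {"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="accuracy")
search.fit(toy, employed)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping the preprocessing inside the `Pipeline` means the imputer and scaler are refit on each training fold, so no information from the validation fold leaks into them during cross-validation.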

### **5. Expected Outcomes**  
By the end of this project, we expect to:  
✅ Identify key factors influencing employment and wages.  
✅ Build predictive models that balance accuracy with interpretability.  
✅ Provide insights that can help policymakers address labor market challenges.  


## **Step 1: Data Cleaning 🧹**  

Before we can analyze and model the data, we need to ensure its quality by performing essential data cleaning steps. This involves handling missing values, correcting data types, dealing with duplicates, and addressing potential outliers.  
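The four cleaning tasks named above can be sketched on a small toy frame before they are applied to the real data. The column names (`age`, `sex`, `daily_pay`) and values below are hypothetical, and the specific choices (median imputation, an `"unknown"` category, the 1.5 × IQR rule for outliers) are assumptions for illustration, not necessarily the rules this project ends up using.

```python
import pandas as pd

# Toy frame with hypothetical column names; not the LFS schema.
raw = pd.DataFrame({
    "age": ["25", "40", "40", None, "33"],
    "sex": ["M", "F", "F", "M", None],
    "daily_pay": [350.0, 420.0, 420.0, 380.0, 9000.0],
})

# Duplicates: drop rows that are exact copies of an earlier row.
clean = raw.drop_duplicates().copy()

# Data types: age arrived as text, so convert it to a number.
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")

# Missing values: median for numeric, an explicit label for categorical.
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["sex"] = clean["sex"].fillna("unknown").astype("category")

# Outliers: flag values outside 1.5 * IQR of the quartiles instead of dropping them.
q1, q3 = clean["daily_pay"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["pay_outlier"] = ~clean["daily_pay"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(clean)
```

Flagging outliers in a separate column, rather than deleting the rows, leaves the decision about how to treat them open until the wage distribution has been inspected.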

### **1.1 Loading the Dataset**  
First, we load the dataset and inspect its structure. This includes:  
✅ Checking the number of rows and columns.  
✅ Displaying the first few rows to understand the format.  
✅ Checking column names and data types.  



```python
import pandas as pd

# Load the dataset
file_path = "path/to/LFS_PUF_April_2016.csv"  # Update with the actual file path
df = pd.read_csv(file_path)

# Display basic dataset information
df.info()
df.head()
```