# Source Code Notebook


## Background of the Dataset


The **Philippine Statistics Authority (PSA)** is the central statistical authority of the Philippines, responsible for collecting, compiling, and disseminating official data across various sectors. Established through the Philippine Statistical Act of 2013, the PSA integrates four previously separate agencies: the National Statistics Office (NSO), the National Statistical Coordination Board (NSCB), the Bureau of Agricultural Statistics (BAS), and the Bureau of Labor and Employment Statistics (BLES). Its primary mission is to provide timely, accurate, and relevant statistics for policy-making, planning, and research.

The **Family Income and Expenditure Survey (FIES)** is a nationwide household survey that has traditionally been conducted every three years. It is the main source of data on family income and expenditure, including levels of consumption by item of expenditure and sources of income in cash and in kind. Its results describe the levels of living, income disparities, and spending patterns of Filipino families across regions and socio-economic groups. The 2023 FIES, the dataset used in this project, is the first biennial round of the survey. According to an article by Dr. Mapa dated July 10, 2023, the change responds to the clamor for more frequent and timely income, expenditure, and poverty statistics. The results of the 2023 FIES will provide inputs to the 2023 Official Poverty Statistics, which will aid the government in planning, programming, policy formulation, and decision-making.

For the first time, the 2023 FIES uses the 2023 Geo-enabled Master Sample, with a sample size of about 180,000 households and separate domains for Maguindanao del Norte and Maguindanao del Sur. The 2023 FIES is also the second in the series to implement the Computer-Aided Personal Interviewing (CAPI) system, which replaces the traditional Paper and Pencil Interviewing method. CAPI eliminates the manual encoding of data from paper questionnaires and facilitates data cleaning, because consistency checks, skipping patterns, and error detection are embedded in the system. These features safeguard data quality and shorten the survey's timetable of operations. Visit 1 of the 2023 FIES was conducted from July 9 to 31, 2023, and Visit 2 from January 8 to 31, 2024. The FIES dataset, along with the metadata that contains the questionnaire, can be downloaded here: <https://psada.psa.gov.ph/catalog/FIES/about>

The dataset used for this project is hosted on the **PSADA Microdata Catalogue**, accessible at <https://psada.psa.gov.ph/home>. Along with the FIES database, the catalogue offers numerous other databases for public use, covering ICT, tourism, poverty indicators, wage rates, and more. Note that you need a registered account to access these resources and must agree to the PSADA Microdata Catalogue Terms and Conditions.

References:
1. <https://rsso01.psa.gov.ph/statistics/fies/about>
2. <https://www.psa.gov.ph/statistics/income-expenditure/fies/node/1684059988>

In [1]:
import pandas as pd
fies_dataset = pd.read_csv('datasets/fies_2023_volume1_494887610821.csv')



### Data Dictionary

In [2]:
fies_dataset.columns

Index(['RDMD_ID', 'Region', 'Province', 'Household ID', 'RECODED PROVINCE',
       'Family Size', 'Salaries/Wages from Regular Employment',
       'Salaries/Wages from Seasonal Employment',
       'Income from Salaries and Wages',
       'Net Share of Crops, Fruits, etc. (Tot. Net Value of Share)',
       'Cash Receipts, Support, etc. from Abroad',
       'Cash Receipts, Support, etc. from Domestic Source',
       'Rentals Received from Non-Agri Lands, etc.', 'Unnamed: 13',
       'Pension and Retirement Benefits', 'Dividends from Investment',
       'Other Sources of Income NEC', 'Family Sustenance Activities',
       'Total Received as Gifts', 'Crop Farming and Gardening',
       'Livestock and Poultry Raising', 'Fishing', 'Forestry and Hunting',
       'Wholesale and Retail', 'Manufacturing',
       'Transportation, Storage Services', 'Entrep. Activities NEC',
       'Entrep. Activities NEC.1', 'Entrep. Activities NEC.2',
       'Hhld, Income from Entrepreneurial Activities, Total',
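Two kinds of names in this listing, `Unnamed: 13` and the `.1`/`.2` suffixes on `Entrep. Activities NEC`, are consistent with how pandas handles blank and repeated header cells when it reads a CSV. The sketch below reproduces that behavior on a small in-memory CSV. The toy header and values are hypothetical and are not taken from the FIES file, so this only suggests, rather than confirms, what the source file's header row looks like.

```python
import io

import pandas as pd

# Hypothetical toy CSV: one blank header cell (third column) and one
# header label repeated three times. Not taken from the FIES file.
toy_csv = (
    "Region,Fishing,,Entrep. Activities NEC,"
    "Entrep. Activities NEC,Entrep. Activities NEC\n"
    "1,0,5,10,20,30\n"
)

toy = pd.read_csv(io.StringIO(toy_csv))

# pandas names a blank header cell 'Unnamed: <position>' and appends
# '.1', '.2', ... to repeated labels so that column names stay unique.
toy_columns = list(toy.columns)
print(toy_columns)
```

If the same mechanism is at work in the FIES file, the three `Entrep. Activities NEC` columns are distinct source columns that share a label, and the questionnaire metadata is the place to look up what each one measures.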

In [3]:
fies_column_descriptions = {
    'RDMD_ID': 'Unique identifier for the record',
    'Region': 'Region code',
    'Province': 'Province code',
    'Household ID': 'Unique household identifier',
    'RECODED PROVINCE': 'Recoded province information',
    'Family Size': 'Number of people in the household',
    'Salaries/Wages from Regular Employment': 'Income from regular employment',
    'Salaries/Wages from Seasonal Employment': 'Income from seasonal employment',
    'Income from Salaries and Wages': 'Total income from salaries and wages',
    'Net Share of Crops, Fruits, etc. (Tot. Net Value of Share)': 'Net value from crop and fruit share',
    'Cash Receipts, Support, etc. from Abroad': 'Cash support received from abroad',
    'Cash Receipts, Support, etc. from Domestic Source': 'Cash support received domestically',
    'Rentals Received from Non-Agri Lands, etc.': 'Income from land rentals (non-agricultural)',
    'Unnamed: 13': 'Unknown or unnamed column',
    'Pension and Retirement Benefits': 'Income from pensions and retirement',
    'Dividends from Investment': 'Income from dividends',
    'Other Sources of Income NEC': 'Other sources of income not elsewhere classified',
    'Family Sustenance Activities': 'Income from family sustenance activities',
    'Total Received as Gifts': 'Total gifts received by the household',
    'Crop Farming and Gardening': 'Income from crop farming and gardening',
    'Livestock and Poultry Raising': 'Income from livestock and poultry raising',
    'Fishing': 'Income from fishing activities',
    'Forestry and Hunting': 'Income from forestry and hunting',
    'Wholesale and Retail': 'Income from wholesale and retail business',
    'Manufacturing': 'Income from manufacturing activities',
    'Transportation, Storage Services': 'Income from transportation and storage services',
    'Entrep. Activities NEC': 'Income from entrepreneurial activities (not elsewhere classified)',
    'Entrep. Activities NEC.1': 'Income from entrepreneurial activities (additional category 1)',
    'Entrep. Activities NEC.2': 'Income from entrepreneurial activities (additional category 2)',
    'Hhld, Income from Entrepreneurial Activities, Total': 'Total household income from entrepreneurial activities',
    'Losses from EA': 'Losses from entrepreneurial activities',
    'Cereal and Cereal Preparations (Total)': 'Expenditure on cereals and cereal preparations',
    'Meat and Meat Preparations': 'Expenditure on meat and meat preparations',
    'Fish and Marine Products (Total)': 'Expenditure on fish and marine products',
    'Dairy Products and Eggs (Total)': 'Expenditure on dairy products and eggs',
    'Oils and Fats (Total)': 'Expenditure on oils and fats',
    'Fruits and Vegetables': 'Expenditure on fruits and vegetables',
    'Vegetables (Total)': 'Expenditure on vegetables',
    'Sugar, Jam and Honey (Total)': 'Expenditure on sugar, jam, and honey',
    'Food Not Elsewhere Classified (Total)': 'Expenditure on other food items',
    'Fruit and vegetable juices': 'Expenditure on fruit and vegetable juices',
    'Coffee, Cocoa and Tea (Total)': 'Expenditure on coffee, cocoa, and tea',
    'Tea (total)  expenditure': 'Expenditure on tea',
    'Cocoa (total)  expenditure': 'Expenditure on cocoa',
    'Main Source of Water Supply (2nd visit only)': 'Main source of water supply (second visit)',
    'Softdrinks': 'Expenditure on soft drinks',
    'Other Non Alcoholic Beverages': 'Expenditure on other non-alcoholic beverages',
    'Alcoholic Beverages (Total)': 'Expenditure on alcoholic beverages',
    'Tobacco (Total)': 'Expenditure on tobacco products',
    'Other Vegetables (Total)': 'Expenditure on other types of vegetables',
    'Services_Primary_Goods': 'Expenditure on services and primary goods',
    'Alcohol Procduction Services': 'Expenditure on alcohol production services',
    'Total Food Consumed at Home (Total)': 'Total food consumed at home',
    'Food Regularly Consumed Outside The Home (Total)': 'Food consumed outside the home',
    'Hhld, Food': 'Household expenditure on food',
    'Clothing, Footwear and Other Wear': 'Expenditure on clothing, footwear, and other wear',
    'Housing and water (Total)': 'Expenditure on housing and water',
    'Actual House Rent': 'Expenditure on actual house rent',
    'Imputed House Rental Value': 'Imputed value of house rental',
    'Imputed Housing Benefit Rental Value': 'Imputed value of housing benefit rental',
    'House Rent/Rental Value': 'Expenditure on house rent/rental value',
    'Furnishings, Household Equipment & Routine Household Mainte': 'Expenditure on furnishings and household equipment',
    'Health (Total)': 'Expenditure on health services and products',
    'Transportation (Total)': 'Expenditure on transportation',
    'Communication (Total)': 'Expenditure on communication services',
    'Recreation and Culture (Total)': 'Expenditure on recreation and culture',
    'Education (Total)': 'Expenditure on education',
    'Insurance': 'Expenditure on insurance',
    'Miscellaneous Goods and Services (Total)': 'Expenditure on miscellaneous goods and services',
    'Durable Furniture': 'Expenditure on durable furniture',
    'Special Family Occasion': 'Expenditure on special family occasions',
    'Other Expenditure (inc. Value Consumed, Losses)': 'Other expenditures including losses',
    'Other Disbursements': 'Other household disbursements',
    'Accomodation Services': 'Expenditure on accommodation services',
    'Total Non-Food Expenditure': 'Total non-food expenditure',
    'Hhld, Income, Total': 'Total household income',
    'Hhld, Expenditures, Total': 'Total household expenditures',
    'Total Household Disbursements': 'Total household disbursements',
    'Other Receipts': 'Other household receipts',
    'Total Receipts': 'Total receipts',
    'Psu (Recode)': 'Primary Sampling Unit (recoded)',
    'Raising Factor': 'Raising factor for survey results',
    'Final Population Weights': 'Final weights for population data',
    'Urban / Rural': 'Urban or rural classification',
    'Per Capita Income': 'Household per capita income',
    'NPCINC': 'National per capita income',
    'RPCINC': 'Regional per capita income',
    'Per Capita Income Decile (Province)': 'Per capita income decile in the province',
    'pPCINC': 'Provincial per capita income decile',
    'Per Capita Income Decile (Region with Negros Island Region (NIR))': 'Per capita income decile (region with NIR)',
    'Region (with NIR)': 'Region code including NIR'
}
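Because descriptions are looked up by exact column name, a key that differs from the dataset by a single character (for example, a double space or a misspelling carried over from the source file) will silently fail to match. A minimal sketch of a check for this is shown here. The helper name `compare_descriptions` and the toy inputs are hypothetical and not part of the original notebook.

```python
import pandas as pd


def compare_descriptions(columns, descriptions):
    """Return (columns lacking a description, description keys matching no column)."""
    column_set = set(columns)
    missing = [col for col in columns if col not in descriptions]
    unused = [key for key in descriptions if key not in column_set]
    return missing, unused


# Hypothetical toy inputs standing in for fies_dataset.columns and
# fies_column_descriptions.
toy_columns = pd.Index(["Region", "Family Size", "Fishing"])
toy_descriptions = {
    "Region": "Region code",
    "Fishing": "Income from fishing activities",
    "Fishng": "Misspelled key that matches no column",
}

missing, unused = compare_descriptions(toy_columns, toy_descriptions)
print("Columns without a description:", missing)
print("Description keys without a column:", unused)
```

In the notebook, the equivalent call would be `compare_descriptions(fies_dataset.columns, fies_column_descriptions)`. Both returned lists being empty would indicate that the dictionary and the dataset agree.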


In [4]:
fies_dataset_data_dict = pd.DataFrame({
    'Column Name': fies_dataset.columns,
    'Data Type': fies_dataset.dtypes,
    'Non-Null Count': fies_dataset.notnull().sum(),
    'Unique Values': fies_dataset.nunique(),
    'Description': [fies_column_descriptions.get(col, 'No description available') for col in fies_dataset.columns]
})

fies_dataset_data_dict.to_csv('fies_dataset_data_dict.csv', index=False)
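To illustrate what the saved table contains, the same construction can be run on a small frame. This is a sketch on hypothetical toy data, not on the FIES file. It shows that the per-column Series (`dtypes`, `notnull().sum()`, `nunique()`) share the column names as their index, so the resulting table has one row per column of the original frame.

```python
import pandas as pd

# Hypothetical toy frame with one missing value in 'Family Size'.
toy = pd.DataFrame({"Region": [1, 1, 2], "Family Size": [4, None, 6]})
toy_descriptions = {"Region": "Region code"}

toy_data_dict = pd.DataFrame({
    "Column Name": toy.columns,
    "Data Type": toy.dtypes,
    "Non-Null Count": toy.notnull().sum(),
    "Unique Values": toy.nunique(),  # nunique() ignores missing values by default
    "Description": [
        toy_descriptions.get(col, "No description available")
        for col in toy.columns
    ],
})

print(toy_data_dict)
```

The row labels of the result are the column names themselves, which is why writing the table with `index=False` loses nothing: the same names are already stored in the `Column Name` column.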

## Data Cleaning and Wrangling


## Preliminary Data Analysis


## Preliminary Visualization


## Preliminary Machine Learning Model


## Insights
