# MCO 1 - 2012 Family Income and Expenditure Survey (FIES)
In this notebook, we explore income and expenditure behavior across Filipino households using the 2012 Family Income and Expenditure Survey (FIES) dataset. We focus on statistical inference, particularly confidence intervals and hypothesis tests for means, and also apply unsupervised learning techniques such as clustering to reveal patterns in household spending.

We aim to understand how households from different income groups allocate their spending across essential categories like food, education, and utilities.

The dataset, provided in the file `FIES_PUF_2012_Vol.1.CSV`, comes from the Philippine Statistics Authority. It contains anonymized microdata on household income from various sources (such as salaries, businesses, and remittances) and categorized expenditures (including food, housing, education, health, and utilities), along with demographic and geographic variables such as region and urban/rural classification. Household characteristics such as household size and number of earners are also included.

## Research Questions

### General Research Question:
What are the key differences in expenditure allocation (e.g., food, education, utilities) across income groups?

#### Supporting Research Questions:
1. What are the average and median incomes in each income group?
2. Which expenditure category takes up the largest portion of total expenses for each group?
3. Do wealthier households spend a higher or lower percentage of their income on basic needs like food and utilities?
4. How does spending on discretionary items (e.g., entertainment, travel) change across income levels?
5. Are low-income households more likely to prioritize essential expenses over discretionary ones?
6. How does the ratio of education spending to income change as income increases?
7. Is there a statistically significant difference in food expenditure between the lowest and highest income groups?



# Import Libraries

For statistical functions, we use `scipy`, specifically its `stats` submodule. The [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html) module provides probability distributions, summary and frequency statistics, correlation functions, statistical tests, and more.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm, ttest_ind
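
As a rough illustration of how these two imports might be used for the inference tasks described in the introduction, the sketch below builds a normal-approximation confidence interval for a mean with `norm.ppf` and compares two group means with `ttest_ind`. The data are synthetic, the names `mean_confidence_interval`, `low_group`, and `high_group` are hypothetical, and the choice of Welch's variant (`equal_var=False`) is an assumption rather than the test this notebook necessarily uses.

```python
import numpy as np
from scipy.stats import norm, ttest_ind


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    sample_mean = values.mean()
    # Standard error of the mean, using the sample standard deviation (ddof=1).
    standard_error = values.std(ddof=1) / np.sqrt(values.size)
    # Two-sided critical value (roughly 1.96 for 95% confidence).
    z_critical = norm.ppf(1 - (1 - confidence) / 2)
    margin = z_critical * standard_error
    return sample_mean - margin, sample_mean + margin


# Synthetic stand-ins for food expenditure in two income groups (not FIES data).
rng = np.random.default_rng(0)
low_group = rng.normal(loc=40_000, scale=8_000, size=500)
high_group = rng.normal(loc=90_000, scale=20_000, size=500)

ci_low, ci_high = mean_confidence_interval(low_group)
print(f"95% CI for the low-group mean: ({ci_low:.2f}, {ci_high:.2f})")

# Welch's t-test: does not assume the two groups share a variance.
t_stat, p_value = ttest_ind(low_group, high_group, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

The normal approximation is reasonable for large samples such as this survey; for small groups, a t-based interval would be the more cautious choice.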

## Family Income and Expenditure Data

Load the dataset into a pandas `DataFrame` and preview the first five rows.

In [5]:
fies_df = pd.read_csv('./Dataset/FIES_PUF_2012_Vol.1.CSV')
fies_df.head()

Unnamed: 0,W_REGN,W_OID,W_SHSN,W_HCN,URB,RSTR,PSU,BWEIGHT,RFACT,FSIZE,...,PC_QTY,OVEN_QTY,MOTOR_BANCA_QTY,MOTORCYCLE_QTY,POP_ADJ,PCINC,NATPC,NATDC,REGDC,REGPC
0,14,101001000,2,25,2,21100,415052,138.25,200.6576,3.0,...,1.0,1.0,,,0.946172,108417.0,9,8,8,9
1,14,101001000,3,43,2,21100,415052,138.25,200.6576,12.5,...,,1.0,,1.0,0.946172,30631.6,5,9,9,4
2,14,101001000,4,62,2,21100,415052,138.25,200.6576,2.0,...,,1.0,,,0.946172,86992.5,9,6,6,8
3,14,101001000,5,79,2,21100,415052,138.25,200.6576,4.0,...,,1.0,,,0.946172,43325.75,6,6,6,6
4,14,101001000,10,165,2,21100,415052,138.25,200.6576,5.0,...,,,,1.0,0.946172,37481.8,6,6,6,5


Call the [`info()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) method to see the number of rows and columns and the column dtypes.

In [6]:
fies_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40171 entries, 0 to 40170
Columns: 119 entries, W_REGN to REGPC
dtypes: float64(5), int64(92), object(22)
memory usage: 36.5+ MB
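
The `info()` output reports 22 `object` columns. One way to inspect such columns and, where appropriate, convert them to numbers is sketched below on a toy frame. It assumes (not confirmed here) that the non-numeric columns hold numbers stored as text, with blank strings for missing entries. The column names `qty_a` and `qty_b` are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Toy frame standing in for the survey data (hypothetical columns and values).
toy_df = pd.DataFrame({
    "FSIZE": [3.0, 12.5, 2.0],
    "qty_a": ["1", " ", "2"],
    "qty_b": ["1", "1", ""],
})

# Columns pandas did not parse as numeric.
text_columns = [col for col in toy_df.columns if not is_numeric_dtype(toy_df[col])]

converted_df = toy_df.copy()
for col in text_columns:
    # Strip whitespace, then coerce anything unparseable (e.g. blanks) to NaN.
    converted_df[col] = pd.to_numeric(converted_df[col].str.strip(), errors="coerce")

print(converted_df.dtypes)
print(converted_df.isna().sum())
```

Counting the resulting missing values per column is a quick check on how much information the coercion discards.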


Call the [`describe()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html) method to compute summary statistics for the numeric columns.

In [7]:
fies_df.describe()

Unnamed: 0,W_REGN,W_OID,W_SHSN,W_HCN,URB,RSTR,PSU,BWEIGHT,RFACT,FSIZE,...,HSE_ALTERTN,TOILET,ELECTRIC,WATER,POP_ADJ,PCINC,NATPC,NATDC,REGDC,REGPC
count,40171.0,40171.0,40171.0,40171.0,40171.0,40171.0,40171.0,40171.0,40171.0,40171.0,...,40171.0,40171.0,40171.0,40171.0,40171.0,40171.0,40171.0,40171.0,40171.0,40171.0
mean,13.01989,4210536000.0,9.633666,1563.601753,1.617311,21547.277215,258123.702099,340.330363,533.363298,4.699223,...,1.94033,1.71813,1.131563,3.18603,0.942329,54324.33,5.233303,5.238306,5.445769,5.455129
std,11.995555,2285729000.0,6.198442,2977.363506,0.486049,3520.981146,112143.268816,112.377931,209.996517,2.19405,...,0.236877,1.539145,0.338019,2.405758,0.038631,73721.11,2.874581,2.856486,2.866703,2.864137
min,1.0,101001000.0,1.0,1.0,1.0,2475.0,100010.0,92.25,126.1643,1.0,...,1.0,0.0,1.0,1.0,0.876132,2979.2,1.0,1.0,1.0,1.0
25%,6.0,2239012000.0,4.0,95.0,1.0,21100.0,116384.0,271.5,399.615,3.0,...,2.0,1.0,1.0,1.0,0.92445,19968.03,3.0,3.0,3.0,3.0
50%,10.0,4112005000.0,9.0,204.0,2.0,22100.0,216212.0,329.75,509.8749,4.5,...,2.0,1.0,1.0,3.0,0.940724,33369.75,5.0,5.0,5.0,5.0
75%,14.0,6210006000.0,14.0,393.0,2.0,23200.0,316519.0,428.71,634.1608,6.0,...,2.0,2.0,1.0,4.0,0.961401,61758.67,8.0,8.0,8.0,8.0
max,42.0,9804035000.0,30.0,8026.0,2.0,29000.0,416581.0,1630.2,2895.8149,20.5,...,2.0,7.0,2.0,12.0,1.058416,3231120.0,10.0,10.0,10.0,10.0
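
In the summary, `PCINC` has a mean (about 54,324) well above its median (about 33,370), which points to a right-skewed income distribution, and `NATPC` ranges from 1 to 10. Assuming `NATPC` is a per-capita income decile code (an assumption based only on its range, not confirmed by the source), the mean and median income per group could be sketched as follows. Synthetic values stand in for the real data.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed incomes (lognormal); these are not FIES values.
rng = np.random.default_rng(42)
toy_df = pd.DataFrame({"PCINC": rng.lognormal(mean=10.5, sigma=0.8, size=1_000)})

# Assign decile codes 1..10 by income rank, mimicking the assumed meaning of NATPC.
toy_df["NATPC"] = pd.qcut(toy_df["PCINC"], q=10, labels=False) + 1

# Mean and median per-capita income within each decile.
income_by_group = toy_df.groupby("NATPC")["PCINC"].agg(["mean", "median"])
print(income_by_group.round(2))

# For a right-skewed variable the overall mean exceeds the overall median.
print(toy_df["PCINC"].mean() > toy_df["PCINC"].median())
```

Reporting the median alongside the mean matters here because a few very high incomes can pull the mean upward.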
