# <center> **Data Analysis on Filipino Family Income and Expenditure**
#### <center> By: Johann Sebastian E. Catalla, BSCS-I
#### <center> An output for the course CP102: Computer Programming 2
***

<div style="text-align: justify"> The dataset was sourced through Kaggle from the Philippine Statistics Authority's (PSA) nationwide Family Income and Expenditure Survey (FIES). The survey, which is undertaken every three (3) years, aims to provide data on family income and expenditure, including, among others, levels of consumption by item of expenditure, sources of income in cash, and related information affecting income and expenditure levels and patterns in the Philippines.
<br><br>
The dataset contains more than 40,000 observations and 60 variables, which primarily comprise each household's income and expenditures. For this analysis, I will be using Python's <b>pandas</b>, <b>numpy</b>, and <b>hvplot</b> libraries. </div>

In [324]:
import pandas as pd
import numpy as np
import hvplot.pandas
import holoviews as hv
from bokeh.models.formatters import NumeralTickFormatter
formatter = NumeralTickFormatter(format="0,0")

In [325]:
dataset = pd.read_csv("Family Income and Expenditure.csv")
dataset

Unnamed: 0,Total Household Income,Region,Total Food Expenditure,Main Source of Income,Agricultural Household indicator,Bread and Cereals Expenditure,Total Rice Expenditure,Meat Expenditure,Total Fish and marine products Expenditure,Fruit Expenditure,...,Number of Refrigerator/Freezer,Number of Washing Machine,Number of Airconditioner,"Number of Car, Jeep, Van",Number of Landline/wireless telephones,Number of Cellular phone,Number of Personal Computer,Number of Stove with Oven/Gas Range,Number of Motorized Banca,Number of Motorcycle/Tricycle
0,480332,CAR,117848,Wage/Salaries,0,42140,38300,24676,16806,3325,...,1,1,0,0,0,2,1,0,0,1
1,198235,CAR,67766,Wage/Salaries,0,17329,13008,17434,11073,2035,...,0,1,0,0,0,3,1,0,0,2
2,82785,CAR,61609,Wage/Salaries,1,34182,32001,7783,2590,1730,...,0,0,0,0,0,0,0,0,0,0
3,107589,CAR,78189,Wage/Salaries,0,34030,28659,10914,10812,690,...,0,0,0,0,0,1,0,0,0,0
4,189322,CAR,94625,Wage/Salaries,0,34820,30167,18391,11309,1395,...,1,0,0,0,0,3,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41539,119773,XII - SOCCSKSARGEN,44875,Enterpreneurial Activities,1,23675,21542,1476,6120,1632,...,0,0,0,0,0,1,0,0,0,0
41540,137320,XII - SOCCSKSARGEN,31157,Enterpreneurial Activities,1,2691,1273,1886,4386,1840,...,0,0,0,0,0,3,0,0,0,0
41541,133171,XII - SOCCSKSARGEN,45882,Enterpreneurial Activities,2,28646,27339,480,4796,1232,...,0,0,0,0,0,1,0,0,0,0
41542,129500,XII - SOCCSKSARGEN,81416,Enterpreneurial Activities,1,29996,26655,2359,17730,2923,...,0,0,0,0,0,2,0,0,0,0


## **QUESTIONS** ##

1. What are the average yearly food expenditure and average income per region?
2. What is the ratio of food expenditure to income per region?

In [368]:
# Add each household's food-expenditure-to-income ratio as a new column
dataset["Food Percentage to Income"] = dataset["Total Food Expenditure"] / dataset["Total Household Income"]
# Mean household-level ratio per region, sorted ascending
res = dataset.groupby(by="Region")[["Food Percentage to Income"]].mean().sort_values(by="Food Percentage to Income").round(2)

# Mean yearly food expenditure and mean income per region
total_food = dataset.groupby("Region")[["Total Food Expenditure", "Total Household Income"]].mean().round(2).sort_values(by=["Total Food Expenditure", "Total Household Income"])

# Left: stacked horizontal bars per region; right: heatmap of the mean ratio
total_food.hvplot.barh(title="Regions in the Philippines sorted by \nAverage Yearly Food Expenditure and Average Income", color=["pink", "darkgreen"],
                        fontscale=0.9, height=700, width=800, xformatter=formatter, line_color=None,
                        ylabel="Yearly Food Expenditure per Family", xlabel="Region in the Philippines",
                        fontsize={'xticks': 10, 'yticks': 10}, stacked=True, legend="top") + res.hvplot.heatmap(title="Ratio of Food Expenditure to Income", height=700, color="darkorange")
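
The ratio in the cell above is the mean of each household's own food-to-income ratio. A different reading of "ratio per region" is total food spending divided by total income, which weights high-income households more heavily. The sketch below contrasts the two on a small hypothetical table (the values and region names `A` and `B` are made up for illustration and are not from the FIES data):

```python
import pandas as pd

# Hypothetical toy data, not taken from the FIES dataset.
toy = pd.DataFrame({
    "Region": ["A", "A", "B", "B"],
    "Total Food Expenditure": [50_000, 100_000, 40_000, 60_000],
    "Total Household Income": [100_000, 400_000, 80_000, 240_000],
})

# Mean of per-household ratios: every household counts equally.
toy["Food Percentage to Income"] = toy["Total Food Expenditure"] / toy["Total Household Income"]
mean_of_ratios = toy.groupby("Region")["Food Percentage to Income"].mean()

# Ratio of regional totals: every peso counts equally.
totals = toy.groupby("Region")[["Total Food Expenditure", "Total Household Income"]].sum()
ratio_of_totals = totals["Total Food Expenditure"] / totals["Total Household Income"]

comparison = pd.DataFrame({"mean_of_ratios": mean_of_ratios, "ratio_of_totals": ratio_of_totals})
print(comparison)
```

In this toy table both regions have a mean household ratio of 0.375, while the ratio of totals is 0.3 for `A` and 0.3125 for `B`, so the two definitions can rank regions differently.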


3. What are the main sources of income nationwide and per region?

In [327]:
# Number of households per main source of income nationwide, largest first
country = dataset.groupby("Main Source of Income").size().to_frame().sort_values(0, ascending=False)
country.columns = ["Count"]
# Number of households per region and main source of income
per_region = dataset.groupby(["Region", "Main Source of Income"]).size().to_frame()
country.hvplot.bar(height=700) + per_region.hvplot.bar(width=800, height=700, stacked=True, rot=25, legend='top')
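
Raw counts per region depend on how many households were surveyed in each region, so shares can be easier to compare. As a minimal sketch on hypothetical rows (not the real survey data), `pd.crosstab` with `normalize="index"` turns the counts into within-region proportions:

```python
import pandas as pd

# Hypothetical toy rows; the category spellings follow the dataset's own labels.
toy = pd.DataFrame({
    "Region": ["A", "A", "A", "A", "B", "B"],
    "Main Source of Income": [
        "Wage/Salaries", "Wage/Salaries", "Wage/Salaries",
        "Enterpreneurial Activities",
        "Enterpreneurial Activities", "Wage/Salaries",
    ],
})

# normalize="index" divides each row by its row total,
# so every region's shares sum to 1.
shares = pd.crosstab(toy["Region"], toy["Main Source of Income"], normalize="index")
print(shares)
```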


5. How much does each region spend on its other, non-food expenses?

In [328]:
# List every column name in the dataset
lst = list(dataset.columns)

print(lst)

['Total Household Income', 'Region', 'Total Food Expenditure', 'Main Source of Income', 'Agricultural Household indicator', 'Bread and Cereals Expenditure', 'Total Rice Expenditure', 'Meat Expenditure', 'Total Fish and  marine products Expenditure', 'Fruit Expenditure', 'Vegetables Expenditure', 'Restaurant and hotels Expenditure', 'Alcoholic Beverages Expenditure', 'Tobacco Expenditure', 'Clothing, Footwear and Other Wear Expenditure', 'Housing and water Expenditure', 'Imputed House Rental Value', 'Medical Care Expenditure', 'Transportation Expenditure', 'Communication Expenditure', 'Education Expenditure', 'Miscellaneous Goods and Services Expenditure', 'Special Occasions Expenditure', 'Crop Farming and Gardening expenses', 'Total Income from Entrepreneurial Acitivites', 'Household Head Sex', 'Household Head Age', 'Household Head Marital Status', 'Household Head Highest Grade Completed', 'Household Head Job or Business Indicator', 'Household Head Occupation', 'Household Head Class of

In [338]:
# Non-food expenditure columns taken from the list printed above
other_exp = ['Clothing, Footwear and Other Wear Expenditure', 'Housing and water Expenditure', 
       'Imputed House Rental Value', 'Medical Care Expenditure', 'Transportation Expenditure', 'Communication Expenditure', 
       'Education Expenditure', 'Miscellaneous Goods and Services Expenditure', 'Special Occasions Expenditure', 'Crop Farming and Gardening expenses']

# Mean yearly spending per category in each region, shown as stacked bars
expense_scatter = dataset.groupby("Region")[other_exp].mean().round(2)
expense_scatter.hvplot.bar(rot=25, width=1400, height=900, yformatter=formatter, stacked=True)

6. What is the distribution of household head sex per region?

In [330]:
head_sex = dataset.groupby(["Region", "Household Head Sex"]).size().to_frame()
head_sex.columns = ["Count"]
head_sex.hvplot.bar(title="Household Head Sex per Region",stacked=True, height=700, width=800, legend='top', rot=25)

7. What is the age distribution of household heads?

In [351]:
head_age = dataset.groupby(["Household Head Age"]).size().to_frame()
head_age.hvplot.bar(width=1300, height=700)
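
One bar per individual age can be noisy. A possible complement, sketched here on hypothetical ages with hypothetical bin edges and labels (none of these values come from the FIES data), is to group ages into brackets with `pd.cut` before counting:

```python
import pandas as pd

# Hypothetical ages, not taken from the FIES dataset.
ages = pd.Series([23, 35, 41, 47, 52, 58, 64, 71], name="Household Head Age")

# Hypothetical bin edges; right=False makes each bin include its left edge,
# so an age of 30 would fall in "30-44" rather than "<30".
bins = [0, 30, 45, 60, 120]
labels = ["<30", "30-44", "45-59", "60+"]
age_group = pd.cut(ages, bins=bins, labels=labels, right=False)

# sort=False avoids reordering the brackets by frequency.
counts = age_group.value_counts(sort=False)
print(counts)
```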

8. Which regions have the highest average income from entrepreneurial activities?

In [369]:
entrep = dataset.groupby("Region")[["Total Income from Entrepreneurial Acitivites"]].mean().sort_values("Total Income from Entrepreneurial Acitivites").round(2)
entrep.hvplot.barh()

9. How are income and the expenditure categories correlated?

In [370]:
dataset.columns

Index(['Total Household Income', 'Region', 'Total Food Expenditure',
       'Main Source of Income', 'Agricultural Household indicator',
       'Bread and Cereals Expenditure', 'Total Rice Expenditure',
       'Meat Expenditure', 'Total Fish and  marine products Expenditure',
       'Fruit Expenditure', 'Vegetables Expenditure',
       'Restaurant and hotels Expenditure', 'Alcoholic Beverages Expenditure',
       'Tobacco Expenditure', 'Clothing, Footwear and Other Wear Expenditure',
       'Housing and water Expenditure', 'Imputed House Rental Value',
       'Medical Care Expenditure', 'Transportation Expenditure',
       'Communication Expenditure', 'Education Expenditure',
       'Miscellaneous Goods and Services Expenditure',
       'Special Occasions Expenditure', 'Crop Farming and Gardening expenses',
       'Total Income from Entrepreneurial Acitivites', 'Household Head Sex',
       'Household Head Age', 'Household Head Marital Status',
       'Household Head Highest Grade Compl

In [372]:
# Only numeric columns can be correlated, so the categorical columns
# 'Region' and 'Main Source of Income' are left out of the selection.
correlation = dataset[['Total Household Income', 'Total Food Expenditure',
       'Bread and Cereals Expenditure', 'Total Rice Expenditure',
       'Meat Expenditure', 'Total Fish and  marine products Expenditure',
       'Fruit Expenditure', 'Vegetables Expenditure',
       'Restaurant and hotels Expenditure', 'Alcoholic Beverages Expenditure',
       'Tobacco Expenditure', 'Clothing, Footwear and Other Wear Expenditure',
       'Housing and water Expenditure', 'Imputed House Rental Value',
       'Medical Care Expenditure', 'Transportation Expenditure',
       'Communication Expenditure', 'Education Expenditure',
       'Miscellaneous Goods and Services Expenditure',
       'Special Occasions Expenditure', 'Crop Farming and Gardening expenses',
       'Total Income from Entrepreneurial Acitivites']].corr()
correlation.hvplot.heatmap(width=1500, height=1000, rot=25)
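
`DataFrame.corr` computes pairwise correlations between numeric columns only. As a minimal sketch on a hypothetical table (the values are made up, not from the FIES data), `select_dtypes` is one way to keep just the numeric columns without listing every name by hand:

```python
import pandas as pd

# Hypothetical toy data, not taken from the FIES dataset.
toy = pd.DataFrame({
    "Region": ["A", "A", "B", "B"],
    "Total Household Income": [100_000, 200_000, 300_000, 400_000],
    "Total Food Expenditure": [50_000, 80_000, 90_000, 120_000],
})

# Keep only numeric columns; the text column "Region" is dropped.
numeric = toy.select_dtypes(include="number")

# Pearson correlation matrix of the remaining columns.
corr = numeric.corr()
print(corr.round(2))
```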


In [377]:
dataset.hvplot.hexbin('Total Food Expenditure', 'Total Household Income', logz=True, height=500, xformatter=formatter, yformatter=formatter)
