# Dataset Connections - Group 4

This notebook documents the cleaning and merging of the datasets used in our project on **Flood and Landslide Disaster Risk in the Philippines**, along with an initial exploratory data analysis.

As mentioned in the **Dataset Description** section, the datasets are divided into four source categories:
1. Philippine Pre-disaster Indicators
1. ICA Philippines: Flood Risk
1. ICA Philippines: Landslide Risk
1. ICA Philippines: Tropical Storm (above Category 3) Occurrence

In [None]:
import pandas as pd
import numpy as np

## Philippine Pre-disaster Indicators

This consists of 13 datasets in .xlsx format gathered from [data.world](https://data.world/ochaphilippines/f26a0a04-0549-4139-af91-81dfa6e56082) and the [Philippine Statistics Authority (PSA)](https://psa.gov.ph/content/age-and-sex-distribution-philippine-population-2020-census-population-and-housing), which provide the following information:

1. Philippine Standard Geographic Code (PSGC)
1. Construction Materials of Outer Walls and Roof by City and Municipality
1. Health Personnel by City and Municipality
1. Kind of Toilet Facility by City and Municipality
1. No. of Barangay by City and Municipality
1. No. of Evacuation Center by City and Municipality
1. No. of Household by City and Municipality
1. Philippine Population (as of 2020)
1. Source of Water for Drinking by City and Municipality
1. Vulnerable Groups by City and Municipality
1. Pantawid Pamilya Beneficiary Households and Household Members by Sex and Age Group
1. DepEd No. of Schools and Enrolment with Sex Disaggregation by Municipality
1. Health Facility Type by Municipality

An initial preview of each dataset is displayed as it is loaded into a dataframe.

In [None]:
# mounting dataset drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


#### Philippine Standard Geographic Code (PSGC)

In [None]:
df_psgc = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/psgc-code-as-march-2018.xlsx')
df_psgc

Unnamed: 0,Region,Region Code,Province,Province Code,City/Mun,City/Mun Code,Bgy,Bgy Code
0,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Adams,PH012801000,Adams (Pob.),PH012801001
1,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bacarra,PH012802000,Bani,PH012802001
2,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bacarra,PH012802000,Buyon,PH012802002
3,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bacarra,PH012802000,Cabaruan,PH012802003
4,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bacarra,PH012802000,Cabulalaan,PH012802004
...,...,...,...,...,...,...,...,...
42039,Region XIII (Caraga),PH160000000,Dinagat Islands,PH168500000,Tubajon,PH168507000,Diaz (Romualdez),PH168507005
42040,Region XIII (Caraga),PH160000000,Dinagat Islands,PH168500000,Tubajon,PH168507000,Roxas,PH168507006
42041,Region XIII (Caraga),PH160000000,Dinagat Islands,PH168500000,Tubajon,PH168507000,San Roque (Pob.),PH168507007
42042,Region XIII (Caraga),PH160000000,Dinagat Islands,PH168500000,Tubajon,PH168507000,San Vicente (Pob.),PH168507008


#### Construction Materials of Outer Walls and Roof by City and Municipality

The original Excel file for this information contains two sheets: the first holds data on wall and roof materials, and the second holds data on construction material strength categories. These are loaded into the dataframes `df_wall_roof_materials` and `df_wall_roof_categories`, respectively.

In [None]:
df_wall_roof_materials = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/construction-materials-of-the-outer-walls-and-roof-by-city-municipality.xlsx',
                          sheet_name=0, skiprows=1)
df_wall_roof_materials

Unnamed: 0,Region,Region Code,Province,Province Code,Municipality_City,Municipality_City Code,Construction Materials of the Outer Walls,Total Occupied Housing Units,Galvanized iron/aluminum,Tile/concrete/clay tile,Half galvanized iron and half concrete,Bamboo/cogon/nipa/anahaw,Asbestos,Makeshift/ salvaged/ improvised materials,Trapal,Others,Not Reported
0,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Akbar,PH150708000,Makeshift/salvaged/improvised materials,0,0,0,0,0,0,0,0,0,0
1,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Akbar,PH150708000,Trapal,0,0,0,0,0,0,0,0,0,0
2,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Akbar,PH150708000,Others,0,0,0,0,0,0,0,0,0,0
3,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Akbar,PH150708000,No walls,0,0,0,0,0,0,0,0,0,0
4,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Akbar,PH150708000,Asbestos,1,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21224,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Tago,PH166818000,Bamboo/sawali/cogon/nipa,271,55,0,0,216,0,0,0,0,0
21225,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Tago,PH166818000,Half concrete/brick/stone and half wood,381,313,0,3,65,0,0,0,0,0
21226,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Tago,PH166818000,Concrete/brick/stone,1039,979,6,2,51,0,1,0,0,0
21227,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Tago,PH166818000,Wood,6105,1934,0,0,4169,0,0,2,0,0


In [None]:
df_wall_roof_categories = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/construction-materials-of-the-outer-walls-and-roof-by-city-municipality.xlsx',
                          sheet_name=1)
df_wall_roof_categories

Unnamed: 0,Region,Region Code,Province,Province Code,Municipality_City,Municipality_City Code,Housing Units,Strong Roof/Strong Wall,Strong Roof/Light Wall,Strong Roof/Salvage Wall,Light Roof/Strong Wall,Light Roof/Light Wall,Light Roof/Salvage Wall,Salvaged Roof/Strong Wall,Salvaged Roof/Light Wall,Salvaged Roof/Salvage Wall
0,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Akbar,PH150708000,2560,441,1230,0,127,757,0,0,0,0
1,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Al-Barka,PH150709000,3647,503,1853,1,24,1253,2,0,3,0
2,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,City of Lamitan (Capital),PH150702000,14751,4972,6284,24,103,3244,10,6,33,23
3,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Hadji Mohammad Ajul,PH150710000,2747,510,1218,7,10,991,5,0,1,1
4,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Hadji Muhtamad,PH150712000,3921,347,1309,5,30,2215,3,0,5,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1628,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Marihatag,PH166814000,3668,1054,1005,0,62,1545,0,0,0,1
1629,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,San Agustin,PH166815000,4548,1328,1308,1,62,1840,4,1,1,3
1630,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,San Miguel,PH166816000,7918,952,3753,0,8,3205,0,0,0,0
1631,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Tagbina,PH166817000,8318,2082,3349,6,44,2790,3,0,6,1


#### Health Personnel by City and Municipality

In [None]:
df_health_personnel = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/health-personnel-by-city-municipality.xlsx')
df_health_personnel

Unnamed: 0,Region,Region Code,Province,Province Code,Municipality_City,Municipality_City Code,DOCTOR,NURSE,MIDWIFE,DENTIST,NUTRITIONIST/DIETICIAN,PHARMACIST,OCCUPATIONAL THERAPIST,MEDICAL TECHNOLOGIST,PHYSICAL THERAPIST,RADIOLOGY TECHNOLOGIST,X-RAY TECHNOLOGIST
0,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Adams,PH012801000,1,4,3,0,0,0,0,0,0,0,0
1,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bacarra,PH012802000,8,23,16,6,0,8,0,2,0,0,1
2,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Badoc,PH012803000,1,6,8,1,0,8,0,1,0,0,0
3,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bangui,PH012804000,10,20,11,4,1,5,0,4,1,0,0
4,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,City of Batac,PH012805000,109,233,33,3,5,43,0,28,8,10,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1642,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,City of Tanjay,PH074621000,2,15,32,0,0,1,0,5,0,0,0
1643,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Tayasan,PH074622000,1,1,3,0,0,0,0,0,0,0,0
1644,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Valencia (Luzurriaga),PH074623000,1,2,10,1,0,0,0,1,0,0,0
1645,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Vallehermoso,PH074624000,0,0,0,0,0,0,0,0,0,0,0


#### Kind of Toilet Facility by City and Municipality

In [None]:
df_toilet_facility = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/kind-of-toilet-facility-by-city-municipality.xlsx')
df_toilet_facility

Unnamed: 0,Region,Region Code,Province,Province Code,Municipality_City,Municipality_City Code,Water Sealed,Closed pit,Open Pit,none
0,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Adams,PH012801000,290,88,0,2
1,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bacarra,PH012802000,7461,69,9,36
2,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Badoc,PH012803000,6755,83,15,69
3,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bangui,PH012804000,3528,10,11,5
4,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,City of Batac,PH012805000,11801,398,70,39
...,...,...,...,...,...,...,...,...,...,...
1642,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,City of Tanjay,PH074621000,13630,705,551,2786
1643,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Tayasan,PH074622000,4020,1231,893,2017
1644,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Valencia (Luzurriaga),PH074623000,6172,271,55,298
1645,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Vallehermoso,PH074624000,2305,287,542,4480


#### No. of Barangay by City and Municipality

In [None]:
df_no_of_brgy = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/no-of-barangay-by-city-municipality.xlsx')
df_no_of_brgy

Unnamed: 0,Region,Region Code,Province,Province Code,City/Mun,City/Mun Code,Number of Barangay
0,Autonomous Region In Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Akbar,PH150708000,9
1,Autonomous Region In Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Al-Barka,PH150709000,16
2,Autonomous Region In Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,City of Lamitan (Capital),PH150702000,45
3,Autonomous Region In Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Hadji Mohammad Ajul,PH150710000,11
4,Autonomous Region In Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Hadji Muhtamad,PH150712000,10
...,...,...,...,...,...,...,...
1642,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Marihatag,PH166814000,12
1643,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,San Agustin,PH166815000,13
1644,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,San Miguel,PH166816000,18
1645,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Tagbina,PH166817000,25


#### No. of Evacuation Center by City and Municipality

In [None]:
df_no_of_evac_center = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/no-of-evacuation-center-by-city-municipality.xlsx')
df_no_of_evac_center

Unnamed: 0,Region,Region Code,Province,Province Code,Municipality_City,Municipality_City Code,Number of Evacuation Center
0,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Adams,PH012801000,3
1,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bacarra,PH012802000,3
2,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Badoc,PH012803000,1
3,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bangui,PH012804000,0
4,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,City of Batac,PH012805000,78
...,...,...,...,...,...,...,...
1642,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,City of Tanjay,PH074621000,3
1643,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Tayasan,PH074622000,10
1644,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Valencia (Luzurriaga),PH074623000,28
1645,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Vallehermoso,PH074624000,20


#### No. of Household by City and Municipality

In [None]:
df_no_of_household = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/no-of-household-by-city-municipality.xlsx')
df_no_of_household

Unnamed: 0,Region,Region Code,Province,Province Code,Municipality_City,Municipality_City Code,Number of Household
0,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Adams,PH012801000,411
1,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bacarra,PH012802000,8269
2,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Badoc,PH012803000,7375
3,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bangui,PH012804000,3568
4,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,City of Batac,PH012805000,12797
...,...,...,...,...,...,...,...
1642,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,City of Tanjay,PH074621000,19043
1643,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Tayasan,PH074622000,8623
1644,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Valencia (Luzurriaga),PH074623000,7990
1645,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Vallehermoso,PH074624000,8571


#### Philippine Population (as of 2020)

The original Excel file for the Philippine population has a title in its first row, while the column headers are on the fifth row, so the first four rows are skipped when loading the dataframe.

In [None]:
df_ph_population = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/Total and Household Population by Single-Year Age, Sex, and Region_2020 CPH.xlsx',
                                 skiprows=4) #skipping the first four rows
df_ph_population

Unnamed: 0.1,Unnamed: 0,Both Sexes,Male,Female,Both Sexes.1,Male.1,Female.1
0,,,,,,,
1,,,,,,,
2,PHILIPPINES,,,,,,
3,Total,109033245.0,55306793.0,53726452.0,108667043.0,55017643.0,53649400.0
4,Under 1,2136032.0,1109532.0,1026500.0,2135411.0,1109050.0,1026361.0
...,...,...,...,...,...,...,...
1515,Note: 1 Per PSA Board Resolution No. 13 Serie...,,,,,,
1516,Philippine Standard Geographic Code Updates ...,,,,,,
1517,in Muslim Mindanao (BARMM) and Correct the...,,,,,,
1518,,,,,,,
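
The effect of `skiprows` can be checked on a small stand-in. The sketch below reads an in-memory CSV string with `pd.read_csv` rather than the actual Excel file, on the assumption that an integer `skiprows` behaves the same way in both readers. The rows, the `Age` column name, and the variable names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical layout: four title/blank rows, then the header row, then data.
raw = "\n".join([
    "Table title,,",
    ",,",
    "Subtitle,,",
    ",,",
    "Age,Both Sexes,Male",
    ",,",
    "Total,100,55",
    "Under 1,10,6",
    "Note: footnote text,,",
])

# skiprows=4 discards the first four rows, so the fifth row becomes the header.
preview = pd.read_csv(io.StringIO(raw), skiprows=4)

# Optionally drop rows that are empty in every column (the footnote row stays).
preview_no_blank = preview.dropna(how="all")
```

Rows that carry only a label or a footnote, like the last one here, are kept by `dropna(how="all")` and would need separate handling.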


#### Source of Water for Drinking by City and Municipality

The original Excel file for this information also contains two sheets: the first holds data on water sources, and the second holds data on water source categories. These are loaded into the dataframes `df_source_of_water` and `df_source_of_water_categories`, respectively.

In [None]:
df_source_of_water = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/source-of-water-supply-for-drinking-by-city-municipality.xlsx')
df_source_of_water

Unnamed: 0,Region,Region Code,Province,Province Code,Municipality_City,Municipality_City Code,Number of Households*,Own use faucet community water system,Shared faucet community water system,Own use tubed/piped deep well,Shared tubed/piped deep well,Tubed/piped shallow well,Dug well,Protected spring,Unprotected spring,"Lake, river, rain and others",Peddler,Bottled water,Others,Not Reported
0,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Akbar,PH150708000,2820,27,10,14,399,4,2351,2,0,10,2,1,0,0
1,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Al-Barka,PH150709000,3662,266,527,102,635,47,1064,114,244,644,3,8,8,0
2,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,City of Lamitan (Capital),PH150702000,15065,3636,3302,549,2927,565,2722,557,40,81,122,535,29,0
3,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Hadji Mohammad Ajul,PH150710000,2948,21,16,126,308,12,2440,1,3,2,7,12,0,0
4,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Hadji Muhtamad,PH150712000,3956,72,41,3,18,23,1696,1,1,4,2081,16,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1628,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Marihatag,PH166814000,3699,28,2295,147,453,134,0,116,44,19,0,463,0,0
1629,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,San Agustin,PH166815000,4584,57,584,113,2248,0,68,143,452,76,1,842,0,0
1630,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,San Miguel,PH166816000,7980,1665,2219,220,1374,42,243,758,381,2,14,1062,0,0
1631,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Tagbina,PH166817000,8368,2474,2484,89,920,69,248,500,40,665,50,829,0,0


In [None]:
df_source_of_water_categories = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/source-of-water-supply-for-drinking-by-city-municipality.xlsx',
                                   sheet_name=1)
df_source_of_water_categories

Unnamed: 0,Region,Region Code,Province,Province Code,Municipality_City,Municipality_City Code,Number of Households,Faucet/Community System,Tubed/Piped,Dug well,Bottled Water,Natural Sources,Peddler/Others/Not Reported
0,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Akbar,PH150708000,2820,37,417,2351,1,12,2
1,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Al-Barka,PH150709000,3662,793,784,1064,8,1002,11
2,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,City of Lamitan (Capital),PH150702000,15065,6938,4041,2722,535,678,151
3,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Hadji Mohammad Ajul,PH150710000,2948,37,446,2440,12,6,7
4,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Basilan,PH150700000,Hadji Muhtamad,PH150712000,3956,113,44,1696,16,6,2081
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1628,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Marihatag,PH166814000,3699,2323,734,0,463,179,0
1629,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,San Agustin,PH166815000,4584,641,2361,68,842,671,1
1630,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,San Miguel,PH166816000,7980,3884,1636,243,1062,1141,14
1631,Region XIII (Caraga),PH160000000,Surigao Del Sur,PH166800000,Tagbina,PH166817000,8368,4958,1078,248,829,1205,50


#### Vulnerable Groups by City and Municipality

In [None]:
df_vulnerable_groups = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/vulnerable-groups-by-city-municipality.xlsx')
df_vulnerable_groups

Unnamed: 0,Region,Region Code,Province,Province Code,Municipality_City,Municipality_City Code,Child Headed_Male,Child Headed_Female,Single Headed_Male,Single Headed_Female,Disability_Male,Disability_Female,Solo Parent_Male,Solo Parent_Female,Older_Male,Older_Female
0,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Adams,PH012801000,0,0,3,0,1,1,3,10,12,14
1,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bacarra,PH012802000,0,0,4,0,16,8,20,28,49,48
2,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Badoc,PH012803000,0,0,9,7,50,38,26,71,102,158
3,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bangui,PH012804000,0,0,6,2,23,28,9,26,33,37
4,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,City of Batac,PH012805000,0,0,6,8,30,31,21,55,78,105
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1642,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,City of Tanjay,PH074621000,5,3,137,81,171,149,139,455,596,653
1643,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Tayasan,PH074622000,13,1,261,89,148,144,160,468,468,503
1644,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Valencia (Luzurriaga),PH074623000,2,0,65,48,65,71,55,207,184,259
1645,Region VII (Central Visayas),PH070000000,Negros Oriental,PH074600000,Vallehermoso,PH074624000,10,3,273,66,207,182,386,617,526,623


#### Pantawid Pamilya Beneficiary Households and Household Members by Sex and Age Group

In [None]:
df_pantawid_pamilya = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/pantawid-pamilya-beneficiary-households-and-household-members-by-sex-and-age-group.xlsx')
df_pantawid_pamilya

Unnamed: 0,Region,PCODE_REG,Province,PCODE_PROV,City_Municipality,PCODE_MUN_CTY,Total # of Active HHs,HH MEM_M_0 to 4,HH MEM_M_5 to 9,HH MEM_M_10 to 14,...,HH MEM_F_40 to 44,HH MEM_F_45 to 49,HH MEM_F_50 to 54,HH MEM_F_55 to 59,HH MEM_F_60 to 64,HH MEM_F_65 to 69,HH MEM_F_70 to 74,HH MEM_F_75 to 79,HH MEM_F_80 and above,Total HH MEM_F
0,National Capital Region (NCR),PH130000000,"NCR, City of Manila, First District (Not a Pro...",PH133900000,BINONDO,PH133902000,101,16,26,55,...,15,11,5,6,1,1,2,0,0,271
1,National Capital Region (NCR),PH130000000,"NCR, City of Manila, First District (Not a Pro...",PH133900000,ERMITA,PH133908000,201,23,71,76,...,39,25,13,5,9,2,3,0,1,452
2,National Capital Region (NCR),PH130000000,"NCR, City of Manila, First District (Not a Pro...",PH133900000,INTRAMUROS,PH133909000,252,9,56,112,...,37,41,27,18,9,10,1,1,1,661
3,National Capital Region (NCR),PH130000000,"NCR, City of Manila, First District (Not a Pro...",PH133900000,MALATE,PH133910000,989,33,310,534,...,202,154,123,73,40,45,28,16,23,2871
4,National Capital Region (NCR),PH130000000,"NCR, City of Manila, First District (Not a Pro...",PH133900000,PACO,PH133911000,1038,14,253,513,...,200,181,122,83,61,41,31,16,12,2929
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1635,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,TAWI-TAWI,PH157000000,SIMUNUL,PH157004000,2407,95,755,1144,...,312,361,256,181,106,84,40,40,32,6158
1636,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,TAWI-TAWI,PH157000000,SITANGKAI,PH157005000,2512,84,578,1298,...,322,400,240,172,87,57,26,26,26,6914
1637,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,TAWI-TAWI,PH157000000,SOUTH UBIAN,PH157006000,2261,99,537,961,...,399,314,190,199,120,102,50,58,49,6171
1638,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,TAWI-TAWI,PH157000000,TANDUBAS,PH157007000,2789,181,738,1324,...,413,362,233,216,98,95,51,25,29,7860


#### DepEd No. of Schools and Enrolment with Sex Disaggregation by Municipality

In [None]:
df_no_of_schools_and_enrollment = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/deped-number-of-schools-and-enrolment-with-sex-disaggregation-by-municipality.xlsx')
df_no_of_schools_and_enrollment

Unnamed: 0,Region,Region code,Province,Province code,City_Municipality,City_Mun Code,Elementary_school,Secondary_school,Enrollment_Elementary_Male,Enrollment_Elementary_Female,Enrollment_Secondary_Male,Enrollment_Secondary_Female
0,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Adams,PH012801000,2,1,154,129,85,71
1,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bacarra,PH012802000,24,2,1971,1842,1309,1272
2,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Badoc,PH012803000,21,3,2133,1814,420,326
3,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Bangui,PH012804000,13,3,985,976,672,635
4,Region I (Ilocos Region),PH010000000,Ilocos Norte,PH012800000,Banna (Espiritu),PH012811000,17,4,1375,1228,517,473
...,...,...,...,...,...,...,...,...,...,...,...,...
1643,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Tawi-Tawi,PH157000000,Sitangkai,PH157005000,13,4,2423,2442,643,661
1644,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Tawi-Tawi,PH157000000,South Ubian,PH157006000,12,1,1482,1485,110,109
1645,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Tawi-Tawi,PH157000000,Tandubas,PH157007000,15,2,1918,1961,330,367
1646,Autonomous Region in Muslim Mindanao (ARMM),PH150000000,Tawi-Tawi,PH157000000,Turtle Islands,PH157008000,3,1,411,416,107,122


#### Health Facility Type by Municipality

While the Philippine Pre-disaster Indicators collection includes two versions of the health facility type dataset, we use the more recent one, which was updated as of January 27, 2021.

In [None]:
df_health_facility_type = pd.read_excel('/content/drive/Shareddrives/[DATA101] - Data Visualization/project/data/pre-disaster/excel datasets/health-facility-type-my-municipality-2021.xlsx',
                                        sheet_name=2)
df_health_facility_type

Unnamed: 0,Region Name,Region PSGC,Province Name,Province PSGC,City/Municipality Name,City/Municipality PSGC,Ownership Major Classification,Ambulatory Surgical Clinic,Animal Bite Treatment Center,Barangay Health Station,...,Drug Testing Laboratory,General Clinic Laboratory,Hospital,Infirmary,Municipal Health Office,Provincial Health Office,Psychiatric Care Facility,Rural Health Unit,Social hygiene Clinic,Total
0,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,AKBAR,PH150708000,Government,0,0,9,...,0,0,0,0,0,0,0,1,0,11
1,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,AL-BARKA,PH150709000,Government,0,0,4,...,0,0,0,0,0,0,0,1,0,5
2,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,CITY OF LAMITAN (Capital),PH150702000,Government,0,0,15,...,0,0,1,0,0,0,0,3,0,23
3,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,CITY OF LAMITAN (Capital),PH150702000,Private,0,0,0,...,0,0,1,0,0,0,0,0,0,2
4,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,HADJI MOHAMMAD AJUL,PH150710000,Government,0,0,5,...,0,0,0,0,0,0,0,1,0,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2314,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,MARIHATAG,PH166814000,Government,0,0,11,...,0,0,0,1,0,0,0,1,0,13
2315,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,SAN AGUSTIN,PH166815000,Government,0,0,10,...,0,0,0,0,0,0,0,1,0,12
2316,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,SAN MIGUEL,PH166816000,Government,0,0,14,...,0,0,0,1,0,0,0,1,0,17
2317,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,TAGBINA,PH166817000,Government,0,0,24,...,0,0,0,0,0,0,0,1,0,26


### Data Cleaning and Aggregating by Region and Province

Since we plan to link the pre-disaster indicator datasets to our flood and landslide risk datasets and map them by **region and province**, the datasets reported by city and municipality must first be cleaned and aggregated. The datasets that need cleaning and aggregating are the following:

1. Construction Materials of Outer Walls and Roof by City and Municipality (by category)
1. Health Personnel by City and Municipality
1. Kind of Toilet Facility by City and Municipality
1. No. of Barangay by City and Municipality
1. No. of Evacuation Center by City and Municipality
1. No. of Household by City and Municipality
1. Source of Water for Drinking by City and Municipality
1. Vulnerable Groups by City and Municipality
1. Pantawid Pamilya Beneficiary Households and Household Members by Sex and Age Group
1. DepEd No. of Schools and Enrolment with Sex Disaggregation by Municipality
1. Health Facility Type by Municipality

With the exception of the `Health Facility Type by Municipality` and `DepEd No. of Schools and Enrolment with Sex Disaggregation by Municipality` datasets, each dataset can be grouped by `Region`, `Region Code`, and `Province` using the pandas `groupby()` function, then aggregated with `agg()` and a Python `lambda` that sums each numeric column.

For consistency, data for `Region` and `Province` will be converted to uppercase and sorted alphabetically when applicable.
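
As a minimal sketch of this step, the helper below uppercases the name columns and sums the numeric columns per province. It uses `sum(numeric_only=True)` rather than `agg()` with a `lambda`, since on newer pandas versions the lambda approach may raise an error on text columns instead of dropping them. The names `aggregate_by_province`, `KEYS`, and `Count`, as well as the counts themselves, are hypothetical and not part of the project code.

```python
import pandas as pd

KEYS = ["Region", "Region Code", "Province"]

def aggregate_by_province(df):
    """Uppercase the name columns, then sum numeric columns per province."""
    out = df.copy()
    for col in ("Region", "Province"):
        out[col] = out[col].str.upper()
    # groupby sorts the group keys by default, giving alphabetical order.
    return out.groupby(KEYS).sum(numeric_only=True)

# Toy rows with made-up counts; note the "in"/"In" spelling difference.
toy = pd.DataFrame({
    "Region": ["Autonomous Region in Muslim Mindanao (ARMM)",
               "Autonomous Region In Muslim Mindanao (ARMM)",
               "Region I (Ilocos Region)"],
    "Region Code": ["PH150000000", "PH150000000", "PH010000000"],
    "Province": ["Basilan", "Basilan", "Ilocos Norte"],
    "Municipality_City": ["Akbar", "Al-Barka", "Adams"],
    "Count": [10, 20, 5],
})

by_province = aggregate_by_province(toy)
```

Uppercasing matters here because the source files spell the same region as "Autonomous Region in Muslim Mindanao (ARMM)" and "Autonomous Region In Muslim Mindanao (ARMM)"; without it, the two spellings would form separate groups if the datasets were combined.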

#### 1. Construction Materials of Outer Walls and Roof by City and Municipality (by category)

In [None]:
# changing data to uppercase
df_wall_roof_categories['Region'] = df_wall_roof_categories['Region'].str.upper()
df_wall_roof_categories['Province'] = df_wall_roof_categories['Province'].str.upper()

df_wall_roof_categories = df_wall_roof_categories.groupby(['Region', 'Region Code', 'Province']).agg(lambda x: sum(x))

df_wall_roof_categories


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Housing Units,Strong Roof/Strong Wall,Strong Roof/Light Wall,Strong Roof/Salvage Wall,Light Roof/Strong Wall,Light Roof/Light Wall,Light Roof/Salvage Wall,Salvaged Roof/Strong Wall,Salvaged Roof/Light Wall,Salvaged Roof/Salvage Wall
Region,Region Code,Province,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,57704,10991,24572,58,530,21274,45,7,54,53
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,121372,48766,66336,75,266,4676,30,17,120,114
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,190029,52730,65505,256,3465,65200,567,45,279,785
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,135083,14958,60749,16,875,56878,51,28,218,42
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,57638,6929,37403,21,170,12771,47,4,27,15
...,...,...,...,...,...,...,...,...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,151197,57472,52851,216,2104,37189,248,74,351,190
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,151934,35664,67337,311,694,46165,262,15,168,87
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,27902,11620,10567,77,235,5076,107,8,50,73
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,106660,54034,24315,182,2537,24816,239,35,136,100



#### 2. Health Personnel by City and Municipality

In [None]:
# changing data to uppercase
df_health_personnel['Region'] = df_health_personnel['Region'].str.upper()
df_health_personnel['Province'] = df_health_personnel['Province'].str.upper()

df_health_personnel_by_region_and_province = df_health_personnel.groupby(['Region', 'Region Code', 'Province']).agg(lambda x: sum(x))
df_health_personnel_by_region_and_province


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,DOCTOR,NURSE,MIDWIFE,DENTIST,NUTRITIONIST/DIETICIAN,PHARMACIST,OCCUPATIONAL THERAPIST,MEDICAL TECHNOLOGIST,PHYSICAL THERAPIST,RADIOLOGY TECHNOLOGIST
Region,Region Code,Province,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,9,26,5,2,1,2,0,3,0,2
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,136,185,98,9,4,16,0,20,0,4
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,40,131,63,6,4,7,0,15,2,3
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,40,69,45,6,1,6,0,13,0,1
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,24,71,92,5,4,5,0,15,0,2
...,...,...,...,...,...,...,...,...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,158,440,237,20,10,30,0,65,5,20
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,61,196,165,14,5,14,0,34,0,9
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,16,23,50,0,1,1,0,4,0,0
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,91,216,149,11,7,18,0,32,0,4
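
The province-level aggregation used in this and the following cells can be sketched on a few made-up rows. The region, province, and municipality names and the counts below are hypothetical, and `sum(numeric_only=True)` is used here as an assumed way to total only the numeric columns while leaving out text columns such as municipality names.

```python
import pandas as pd

# Hypothetical municipality-level rows (not taken from the dataset)
toy = pd.DataFrame({
    "Region": ["REGION A", "REGION A", "REGION A"],
    "Region Code": ["PH990000000"] * 3,
    "Province": ["PROVINCE X", "PROVINCE X", "PROVINCE Y"],
    "City/Municipality": ["Town 1", "Town 2", "Town 3"],
    "DOCTOR": [2, 3, 4],
    "NURSE": [5, 6, 7],
})

# One row per (Region, Region Code, Province); only numeric columns are summed
toy_by_province = toy.groupby(["Region", "Region Code", "Province"]).sum(numeric_only=True)
print(toy_by_province)
```

The grouping keys become a three-level index, which is why the aggregated tables in this section show region, region code, and province on the left and the summed counts on the right.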


#### 3. Kind of Toilet Facility by City and Municipality

In [None]:
# changing data to uppercase
df_toilet_facility['Region'] = df_toilet_facility['Region'].str.upper()
df_toilet_facility['Province'] = df_toilet_facility['Province'].str.upper()

# sum the numeric columns per province; text columns are skipped
df_toilet_facility_by_region_and_province = df_toilet_facility.groupby(['Region', 'Region Code', 'Province']).sum(numeric_only=True)
df_toilet_facility_by_region_and_province



Region,Region Code,Province,Water Sealed,Closed pit,Open Pit,none
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,16633,10267,12201,8821
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,56769,48687,29034,4436
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,54492,48056,36851,12531
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,28669,32873,46104,6921
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,9556,20000,29194,1847
...,...,...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,113955,6786,1990,9344
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,104811,14820,4728,8410
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,22324,2006,326,2588
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,79553,3770,2468,7694


#### 4. No. of Barangay by City and Municipality

In [None]:
# changing data to uppercase
df_no_of_brgy['Region'] = df_no_of_brgy['Region'].str.upper()
df_no_of_brgy['Province'] = df_no_of_brgy['Province'].str.upper()

# sum the numeric columns per province; text columns are skipped
df_no_of_brgy_by_region_and_province = df_no_of_brgy.groupby(['Region', 'Region Code', 'Province', 'Province Code']).sum(numeric_only=True)
df_no_of_brgy_by_region_and_province



Region,Region Code,Province,Province Code,Number of Barangay
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,210
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,PH153600000,1159
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,PH153800000,508
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,PH156600000,410
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,PH157000000,203
...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,PH160200000,253
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,PH160300000,314
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,100
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,PH166700000,335


#### 5. No. of Evacuation Center by City and Municipality

In [None]:
# changing data to uppercase
df_no_of_evac_center['Region'] = df_no_of_evac_center['Region'].str.upper()
df_no_of_evac_center['Province'] = df_no_of_evac_center['Province'].str.upper()

# sum the numeric columns per province; text columns are skipped
df_no_of_evac_center_by_region_and_province = df_no_of_evac_center.groupby(['Region', 'Region Code', 'Province', 'Province Code']).sum(numeric_only=True)
df_no_of_evac_center_by_region_and_province



Region,Region Code,Province,Province Code,Number of Evacuation Center
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,PH153600000,0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,PH153800000,0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,PH156600000,0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,PH157000000,0
...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,PH160200000,215
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,PH160300000,127
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,234
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,PH166700000,224


#### 6. No. of Household by City and Municipality

In [None]:
# changing data to uppercase
df_no_of_household['Region'] = df_no_of_household['Region'].str.upper()
df_no_of_household['Province'] = df_no_of_household['Province'].str.upper()

# sum the numeric columns per province; text columns are skipped
df_no_of_household_by_region_and_province = df_no_of_household.groupby(['Region', 'Region Code', 'Province', 'Province Code']).sum(numeric_only=True)
df_no_of_household_by_region_and_province



Region,Region Code,Province,Province Code,Number of Household
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,59860
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,PH153600000,160132
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,PH153800000,194507
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,PH156600000,138357
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,PH157000000,67529
...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,PH160200000,153857
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,PH160300000,153653
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,28557
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,PH166700000,108814


#### 7.1. Source of Water for Drinking by City and Municipality (by source)

In [None]:
# changing data to uppercase
df_source_of_water['Region'] = df_source_of_water['Region'].str.upper()
df_source_of_water['Province'] = df_source_of_water['Province'].str.upper()

# sum the numeric columns per province; text columns are skipped
df_source_of_water_by_region_and_province = df_source_of_water.groupby(['Region', 'Region Code', 'Province', 'Province Code']).sum(numeric_only=True)
df_source_of_water_by_region_and_province



Region,Region Code,Province,Province Code,Number of Households*,Own use faucet community water system,Shared faucet community water system,Own use tubed/piped deep well,Shared tubed/piped deep well,Tubed/piped shallow well,Dug well,Protected spring,Unprotected spring,"Lake, river, rain and others",Peddler,Bottled water,Others,Not Reported
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,59860,9081,12206,2321,10207,1059,16782,1578,969,1827,2945,784,101,0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,PH153600000,160132,23392,24144,9430,18445,5552,10342,18292,12009,33519,1614,3106,287,0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,PH153800000,194507,14046,17392,25735,54427,7385,37104,14668,5467,6451,6868,4098,866,0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,PH156600000,138357,18149,24143,4386,20779,2610,37578,2737,405,11604,14302,1451,213,0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,PH157000000,67529,5113,5315,942,4784,892,22147,1486,31,21817,1257,2839,906,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,PH160200000,153815,21041,25342,3979,35403,1317,1500,6739,1557,237,9305,47339,56,0
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,PH160300000,153653,15195,29868,3657,25754,2040,9103,16814,7437,9814,9474,24330,167,0
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,28557,9501,11485,776,2217,342,671,1654,204,451,332,880,44,0
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,PH166700000,108814,37351,24745,1525,6422,638,1336,3866,460,1864,2101,28389,117,0


#### 7.2. Source of Water for Drinking by City and Municipality (by categories)

In [None]:
# changing data to uppercase
df_source_of_water_categories['Region'] = df_source_of_water_categories['Region'].str.upper()
df_source_of_water_categories['Province'] = df_source_of_water_categories['Province'].str.upper()

# sum the numeric columns per province; text columns are skipped
df_source_of_water_categories_by_region_and_province = df_source_of_water_categories.groupby(['Region', 'Region Code', 'Province', 'Province Code']).sum(numeric_only=True)
df_source_of_water_categories_by_region_and_province



Region,Region Code,Province,Province Code,Number of Households,Faucet/Community System,Tubed/Piped,Dug well,Bottled Water,Natural Sources,Peddler/Others/Not Reported
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,59860,21287,13587,16782,784,4374,3046
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,PH153600000,160132,47536,33427,10342,3106,63820,1901
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,PH153800000,194507,31438,87547,37104,4098,26586,7734
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,PH156600000,138357,42292,27775,37578,1451,14746,14515
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,PH157000000,67529,10428,6618,22147,2839,23334,2163
...,...,...,...,...,...,...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,PH160200000,153815,46383,40699,1500,47339,8533,9361
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,PH160300000,153653,45063,31451,9103,24330,34065,9641
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,28557,20986,3335,671,880,2309,376
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,PH166700000,108814,62096,8585,1336,28389,6190,2218
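
As a consistency check between the two water-source tables, the by-source columns can be rolled up into the by-category columns. The sketch below uses the Basilan row shown above; the grouping in `category_members` is an assumption inferred from the displayed figures, not something documented with the dataset.

```python
import pandas as pd

# Basilan row from the by-source table (section 7.1)
basilan = pd.Series({
    "Own use faucet community water system": 9081,
    "Shared faucet community water system": 12206,
    "Own use tubed/piped deep well": 2321,
    "Shared tubed/piped deep well": 10207,
    "Tubed/piped shallow well": 1059,
    "Dug well": 16782,
    "Protected spring": 1578,
    "Unprotected spring": 969,
    "Lake, river, rain and others": 1827,
    "Peddler": 2945,
    "Bottled water": 784,
    "Others": 101,
    "Not Reported": 0,
})

# Assumed mapping of category -> source columns, inferred from the figures
category_members = {
    "Faucet/Community System": ["Own use faucet community water system",
                                "Shared faucet community water system"],
    "Tubed/Piped": ["Own use tubed/piped deep well",
                    "Shared tubed/piped deep well",
                    "Tubed/piped shallow well"],
    "Dug well": ["Dug well"],
    "Bottled Water": ["Bottled water"],
    "Natural Sources": ["Protected spring", "Unprotected spring",
                        "Lake, river, rain and others"],
    "Peddler/Others/Not Reported": ["Peddler", "Others", "Not Reported"],
}

# Sum the member columns of each category
categories = pd.Series(
    {name: int(basilan[cols].sum()) for name, cols in category_members.items()}
)
print(categories)
print("Total households:", categories.sum())
```

For Basilan, the rolled-up values match the by-category row (21287, 13587, 16782, 784, 4374, 3046) and add up to the 59860 households reported for the province.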


#### 8. Vulnerable Groups by City and Municipality

In [None]:
# changing data to uppercase
df_vulnerable_groups['Region'] = df_vulnerable_groups['Region'].str.upper()
df_vulnerable_groups['Province'] = df_vulnerable_groups['Province'].str.upper()

# sum the numeric columns per province; text columns are skipped
df_vulnerable_groups_by_region_and_province = df_vulnerable_groups.groupby(['Region', 'Region Code', 'Province', 'Province Code']).sum(numeric_only=True)
df_vulnerable_groups_by_region_and_province



Region,Region Code,Province,Province Code,Child Headed_Male,Child Headed_Female,Single Headed_Male,Single Headed_Female,Disability_Male,Disability_Female,Solo Parent_Male,Solo Parent_Female,Older_Male,Older_Female
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,152,31,580,118,779,688,1707,4395,2003,1944
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,PH153600000,489,104,1395,623,1705,1270,8300,20914,6060,6186
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,PH153800000,567,65,1185,446,1557,1004,4838,11047,7963,6804
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,PH156600000,437,66,1006,703,1664,1464,5107,12997,5687,6334
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,PH157000000,82,14,287,106,777,653,1050,2394,1994,1875
...,...,...,...,...,...,...,...,...,...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,PH160200000,59,18,1047,398,2354,1775,2206,5390,4402,5038
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,PH160300000,97,25,1060,337,3559,2604,2182,4432,6136,6463
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,8,2,170,151,646,578,344,1082,1952,2145
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,PH166700000,47,5,936,332,2309,1981,1273,3291,4364,5758


For the `DepEd No. of Schools and Enrolment with Sex Disaggregation by Municipality`, `Pantawid Pamilya Beneficiary Households and Household Members by Sex and Age Group`, and `Health Facility Type by Municipality` datasets, some columns must be renamed before the data can be aggregated.

Additionally, the rows of the `DepEd No. of Schools and Enrolment with Sex Disaggregation by Municipality` and `Pantawid Pamilya Beneficiary Households and Household Members by Sex and Age Group` datasets are not sorted by region, so they are sorted alphabetically by region and province for consistency.

#### 9. Pantawid Pamilya Beneficiary Households and Household Members by Sex and Age Group

In [None]:
# changing data to uppercase
df_pantawid_pamilya['Region'] = df_pantawid_pamilya['Region'].str.upper()
df_pantawid_pamilya['Province'] = df_pantawid_pamilya['Province'].str.upper()

# rename 'PCODE_REG', 'PCODE_PROV' columns
df_pantawid_pamilya.rename(columns = {'PCODE_REG':'Region Code', 'PCODE_PROV':'Province Code'}, inplace = True)

# sort region alphabetically
df_pantawid_pamilya = df_pantawid_pamilya.sort_values(['Region', 'Province'], ignore_index=True)

# aggregate values
df_pantawid_pamilya = df_pantawid_pamilya.groupby(['Region', 'Region Code', 'Province', 'Province Code']).sum(numeric_only=True)


df_pantawid_pamilya



Region,Region Code,Province,Province Code,Total # of Active HHs,HH MEM_M_5 to 9,HH MEM_M_10 to 14,HH MEM_M_15 to 19,HH MEM_M_20 to 24,HH MEM_M_25 to 29,HH MEM_M_30 to 34,HH MEM_M_35 to 39,HH MEM_M_40 to 44,HH MEM_M_45 to 49,...,HH MEM_F_40 to 44,HH MEM_F_45 to 49,HH MEM_F_50 to 54,HH MEM_F_55 to 59,HH MEM_F_60 to 64,HH MEM_F_65 to 69,HH MEM_F_70 to 74,HH MEM_F_75 to 79,HH MEM_F_80 and above,Total HH MEM_F
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,26878,5058,15505,14648,8804,5868,4247,5364,4242,4558,...,4677,4341,2330,1478,685,497,253,226,185,75980
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,PH153600000,85110,13425,58574,57209,32002,19109,13362,16903,14068,14735,...,15466,15072,8620,5595,2755,2153,1239,1001,859,275263
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,PH153800000,145538,23656,79096,75613,37974,26606,24078,31214,24539,24666,...,24247,22189,11725,7943,3486,2700,1227,827,541,375607
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,PH156600000,70761,10626,40924,40661,21813,14747,11250,15462,12519,13829,...,13227,12838,6094,4284,1720,1496,738,680,655,207485
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,PH157000000,34795,9421,17289,15335,8541,7073,5859,6300,5097,5172,...,5331,4822,2928,2118,1055,795,406,319,305,93898
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,PH160200000,38947,7643,21140,18430,12790,10139,7632,7369,6632,6437,...,6650,6275,4799,3317,1888,1289,855,630,722,108753
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,PH160300000,54986,9813,28430,24787,16688,13889,10883,10328,9598,9074,...,9233,8293,6208,4124,2145,1522,943,637,833,146378
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,8993,1040,4572,4465,3032,2240,1565,1645,1655,1569,...,1579,1640,1229,781,542,420,368,276,317,25236
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,PH166700000,32541,4094,16463,16481,11394,9008,6476,6375,6280,5770,...,6171,5607,4491,2929,1785,1300,940,875,1078,92091


#### 10. DepEd No. of Schools and Enrolment with Sex Disaggregation by Municipality

The `DepEd No. of Schools and Enrolment with Sex Disaggregation by Municipality` dataset uses a modified code for regions. Its `Region Code` column is therefore first overwritten with the values of the `Region` column, and those values are then mapped to the corresponding PSGC codes using a lookup built from another dataset (`Vulnerable Groups`).

In [None]:
# rename 'Region code' column ('Province code' is left as is)
df_no_of_schools_and_enrollment.rename(columns = {'Region code':'Region Code'}, inplace = True)

# changing data to uppercase
df_no_of_schools_and_enrollment['Region'] = df_no_of_schools_and_enrollment['Region'].str.upper()
df_no_of_schools_and_enrollment['Province'] = df_no_of_schools_and_enrollment['Province'].str.upper()

# copying region values to region code column
df_no_of_schools_and_enrollment['Region Code'] = df_no_of_schools_and_enrollment['Region']

# creating a lookup for replacement
lookup_region_region_code = dict(zip(df_vulnerable_groups['Region'], df_vulnerable_groups['Region Code']))

# mapped replacement of region code data
df_no_of_schools_and_enrollment['Region Code'] = df_no_of_schools_and_enrollment['Region Code'].map(lookup_region_region_code)

# sort region alphabetically
df_no_of_schools_and_enrollment = df_no_of_schools_and_enrollment.sort_values(['Region', 'Province'], ignore_index=True)

df_no_of_schools_and_enrollment

Unnamed: 0,Region,Region Code,Province,Province code,City_Municipality,City_Mun Code,Elementary_school,Secondary_school,Enrollment_Elementary_Male,Enrollment_Elementary_Female,Enrollment_Secondary_Male,Enrollment_Secondary_Female
0,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,Akbar,PH150708000,6,1,820,781,119,111
1,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,Al-Barka,PH150709000,11,0,1078,1072,0,0
2,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,City of Lamitan (Capital),PH150702000,42,7,7922,7588,1798,2154
3,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,Hadji Mohammad Ajul,PH150710000,10,2,1080,1060,195,188
4,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,Hadji Muhtamad,PH150712000,3,2,529,495,250,333
...,...,...,...,...,...,...,...,...,...,...,...,...
1643,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,San Agustin,PH166815000,13,3,1712,1539,637,636
1644,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,San Miguel,PH166816000,42,8,4118,3863,1404,1404
1645,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,Tagbina,PH166817000,39,16,3607,3321,1665,1592
1646,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,Tago,PH166818000,30,5,3393,3070,1447,1484
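
Because the mapping depends on region names matching exactly between the two datasets, it can help to confirm that no region was left unmapped and that each region name corresponds to a single code. The following is a minimal sketch on small stand-in frames: `reference` and `schools` are hypothetical substitutes for `df_vulnerable_groups` and `df_no_of_schools_and_enrollment`, and `UNKNOWN REGION` is a made-up value added to show what an unmatched name looks like.

```python
import pandas as pd

# Stand-in for the reference dataset (region names already uppercase)
reference = pd.DataFrame({
    "Region": ["REGION XIII (CARAGA)", "REGION XIII (CARAGA)",
               "AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM)"],
    "Region Code": ["PH160000000", "PH160000000", "PH150000000"],
})

# Stand-in for the dataset whose region codes need replacing
schools = pd.DataFrame({
    "Region": ["Region XIII (Caraga)",
               "Autonomous Region in Muslim Mindanao (ARMM)",
               "Unknown Region"],
})
schools["Region"] = schools["Region"].str.upper()

# Build the lookup and map region names to codes
lookup = dict(zip(reference["Region"], reference["Region Code"]))
schools["Region Code"] = schools["Region"].map(lookup)

# Names missing from the lookup come back as NaN after .map()
unmapped = schools.loc[schools["Region Code"].isna(), "Region"].unique().tolist()
print("Unmapped regions:", unmapped)

# Each region name in the reference should carry exactly one code
codes_per_region = reference.groupby("Region")["Region Code"].nunique()
print("One code per region:", bool((codes_per_region == 1).all()))
```

An empty `unmapped` list on the real data would indicate that every region name found a PSGC code.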


Now the `DepEd No. of Schools and Enrolment with Sex Disaggregation by Municipality` dataset can be aggregated.

In [None]:
# sum the numeric columns per province; text columns are skipped
df_no_of_schools_and_enrollment_by_region_and_province = df_no_of_schools_and_enrollment.groupby(['Region', 'Region Code', 'Province']).sum(numeric_only=True)
df_no_of_schools_and_enrollment_by_region_and_province



Region,Region Code,Province,Elementary_school,Secondary_school,Enrollment_Elementary_Male,Enrollment_Elementary_Female,Enrollment_Secondary_Male,Enrollment_Secondary_Female
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,201,26,28508,27497,5571,7040
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,742,137,83584,92840,21276,27699
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,552,81,97732,97349,18943,22387
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,416,40,48864,48852,9784,12520
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,238,24,28686,28472,4575,5368
...,...,...,...,...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,300,88,59695,54809,24917,25219
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,454,100,68476,62344,26968,27698
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,108,31,9439,8469,4778,4561
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,347,81,40271,36466,19506,19334


#### 11. Health Facility Type by Municipality

In [None]:
# rename 'Region Name', 'Region PSGC', 'Province Name', 'Province PSGC' columns
df_health_facility_type.rename(columns = {'Region Name':'Region', 'Region PSGC':'Region Code',
                                                  'Province Name':'Province', 'Province PSGC':'Province Code'}, inplace = True)

# Region and Province data are already in uppercase, so proceed to aggregation
df_health_facility_type_by_region_and_province = df_health_facility_type.groupby(['Region', 'Region Code', 'Province']).sum(numeric_only=True)
df_health_facility_type_by_region_and_province



Region,Region Code,Province,Ambulatory Surgical Clinic,Animal Bite Treatment Center,Barangay Health Station,Birthing Home,City Health Office,COVID-19 Testing Laboratory,DepEd Clinic,Dialysis Clinic,Drug Abuse Treatment and Rehabilitation Centers,Drug Testing Laboratory,General Clinic Laboratory,Hospital,Infirmary,Municipal Health Office,Provincial Health Office,Psychiatric Care Facility,Rural Health Unit,Social hygiene Clinic,Total
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,0,0,67,13,0,0,0,0,0,0,0,2,1,0,0,0,15,0,98
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,0,0,140,39,0,2,0,0,0,0,0,10,7,0,0,0,44,0,242
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,1,0,224,70,0,0,0,0,0,0,0,6,7,0,0,0,36,0,344
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,0,0,125,15,0,0,0,0,0,0,0,6,4,0,0,0,21,0,171
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,0,0,73,10,0,0,0,0,0,0,0,6,2,0,0,0,12,0,103
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,0,0,237,29,0,3,0,2,0,0,12,7,11,0,0,0,13,0,314
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,0,0,162,5,0,1,3,0,0,0,0,5,7,0,0,0,14,0,197
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,0,0,26,0,0,0,0,0,0,0,0,0,3,0,0,0,7,0,36
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,0,0,161,1,0,2,0,0,1,0,0,5,5,0,0,0,26,0,201


### Data Cleaning for Philippine Population (as of 2020)

The dataset for the Philippine population as of 2020 reports population counts by region in a different layout from the other datasets, so a new dataframe is constructed to extract the needed data.

Upon examination, the first column combines single-year age and region values. It will be split into separate columns so that the layout is similar to the other datasets.

To begin, rows that are entirely `NaN` and the last four rows, which contain footnotes, are dropped, and the header labels are renamed for easier processing.

In [None]:
# drop rows that are entirely NaN; .copy() gives an independent dataframe
df_ph_population_dropped = df_ph_population.dropna(how='all').copy()

# dropping the last four rows, which contain footnotes
df_ph_population_dropped = df_ph_population_dropped.drop(df_ph_population_dropped.tail(4).index)

# rename headers
df_ph_population_dropped = df_ph_population_dropped.rename(columns = {'Unnamed: 0':'Single-Year Age and Region', 'Both Sexes': 'Both Sexes Total',
                                           'Male': 'Male Total', 'Female': 'Female Total',
                                           'Both Sexes.1': 'Both Sexes Household Total',
                                           'Male.1': 'Male Household Total', 'Female.1': 'Female Household Total'})

df_ph_population_dropped



Unnamed: 0,Single-Year Age and Region,Both Sexes Total,Male Total,Female Total,Both Sexes Household Total,Male Household Total,Female Household Total
2,PHILIPPINES,,,,,,
3,Total,109033245.0,55306793.0,53726452.0,108667043.0,55017643.0,53649400.0
4,Under 1,2136032.0,1109532.0,1026500.0,2135411.0,1109050.0,1026361.0
5,1,2199834.0,1133593.0,1066241.0,2199452.0,1133412.0,1066040.0
6,2,2218716.0,1145369.0,1073347.0,2218136.0,1145070.0,1073066.0
...,...,...,...,...,...,...,...
1508,76,2385.0,1187.0,1198.0,2382.0,1186.0,1196.0
1509,77,2547.0,1246.0,1301.0,2543.0,1244.0,1299.0
1510,78,2267.0,1145.0,1122.0,2267.0,1145.0,1122.0
1511,79,2442.0,1169.0,1273.0,2440.0,1169.0,1271.0


This dropped and relabeled dataset can now have its `Single-Year Age and Region` column separated, starting with adding a new column for region data.

In [None]:
df_ph_population_new = df_ph_population_dropped.copy()
region = None # most recent region header seen so far

for i in df_ph_population_new.index:
  if pd.isnull(df_ph_population_new.at[i, 'Both Sexes Total']): # region header rows have no value in the 'Both Sexes Total' column
    region = df_ph_population_new.at[i, 'Single-Year Age and Region'] # if so, update region value

  else:
    df_ph_population_new.at[i, 'Region']  = region # data row: fill in the new region column

df_ph_population_new

Unnamed: 0,Single-Year Age and Region,Both Sexes Total,Male Total,Female Total,Both Sexes Household Total,Male Household Total,Female Household Total,Region
2,PHILIPPINES,,,,,,,
3,Total,109033245.0,55306793.0,53726452.0,108667043.0,55017643.0,53649400.0,PHILIPPINES
4,Under 1,2136032.0,1109532.0,1026500.0,2135411.0,1109050.0,1026361.0,PHILIPPINES
5,1,2199834.0,1133593.0,1066241.0,2199452.0,1133412.0,1066040.0,PHILIPPINES
6,2,2218716.0,1145369.0,1073347.0,2218136.0,1145070.0,1073066.0,PHILIPPINES
...,...,...,...,...,...,...,...,...
1508,76,2385.0,1187.0,1198.0,2382.0,1186.0,1196.0,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM)
1509,77,2547.0,1246.0,1301.0,2543.0,1244.0,1299.0,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM)
1510,78,2267.0,1145.0,1122.0,2267.0,1145.0,1122.0,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM)
1511,79,2442.0,1169.0,1273.0,2440.0,1169.0,1271.0,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM)
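
The same region column can also be derived without an explicit loop. The sketch below is an alternative under the assumption, taken from the layout above, that region header rows are exactly the rows with no `Both Sexes Total` value; the frame `toy` and the name `REGION A` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the layout: a header row per region, then age rows
toy = pd.DataFrame({
    "Single-Year Age and Region": ["PHILIPPINES", "Total", "Under 1",
                                   "REGION A", "Total", "Under 1"],
    "Both Sexes Total": [np.nan, 100.0, 10.0, np.nan, 40.0, 4.0],
})

# Header rows are the ones without a population count
is_header = toy["Both Sexes Total"].isna()

# Keep the label only on header rows, then carry it down to the rows below
toy["Region"] = toy["Single-Year Age and Region"].where(is_header).ffill()

# Header rows themselves get no region, matching the loop's output
toy.loc[is_header, "Region"] = np.nan

print(toy)
```

Both approaches leave the header rows with an empty `Region`, so they can be removed afterwards by dropping rows with null values.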


The new `Region` column can now be placed as the first column, the `Single-Year Age and Region` column is renamed to `Single-Year Age`, while all rows with `NaN` values are dropped.

Sorting the regions alphabetically was skipped in order to preserve the order of the `Single-Year Age` column, which is already intuitive: it begins with the total count, followed by those under one year old, then one year old, two years old, and so on up to 80 years and older.

In [None]:
# make 'Region' column the first column
col = df_ph_population_new.pop("Region")
df_ph_population_new.insert(0, col.name, col)

# rename 'Single-Year Age and Region' column
df_ph_population_new.rename(columns = {'Single-Year Age and Region':'Single-Year Age'}, inplace=True)

# drop all rows with null values; .copy() gives an independent dataframe to modify
df_ph_population_new = df_ph_population_new.dropna().copy()

# convert region data to uppercase
df_ph_population_new['Region'] = df_ph_population_new['Region'].str.upper()

df_ph_population_new



Unnamed: 0,Region,Single-Year Age,Both Sexes Total,Male Total,Female Total,Both Sexes Household Total,Male Household Total,Female Household Total
3,PHILIPPINES,Total,109033245.0,55306793.0,53726452.0,108667043.0,55017643.0,53649400.0
4,PHILIPPINES,Under 1,2136032.0,1109532.0,1026500.0,2135411.0,1109050.0,1026361.0
5,PHILIPPINES,1,2199834.0,1133593.0,1066241.0,2199452.0,1133412.0,1066040.0
6,PHILIPPINES,2,2218716.0,1145369.0,1073347.0,2218136.0,1145070.0,1073066.0
7,PHILIPPINES,3,2236525.0,1154752.0,1081773.0,2235907.0,1154441.0,1081466.0
...,...,...,...,...,...,...,...,...
1508,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),76,2385.0,1187.0,1198.0,2382.0,1186.0,1196.0
1509,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),77,2547.0,1246.0,1301.0,2543.0,1244.0,1299.0
1510,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),78,2267.0,1145.0,1122.0,2267.0,1145.0,1122.0
1511,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),79,2442.0,1169.0,1273.0,2440.0,1169.0,1271.0
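
Should alphabetical region order be needed later, a stable sort is one option that would keep the age rows in their existing order within each region. This is only a sketch on a shortened, hypothetical frame and is not applied in this notebook; `kind="mergesort"` is chosen because it is a stable sorting algorithm.

```python
import pandas as pd

# Shortened stand-in for the cleaned population frame
toy = pd.DataFrame({
    "Region": ["PHILIPPINES"] * 3
              + ["AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM)"] * 3,
    "Single-Year Age": ["Total", "Under 1", "1", "Total", "Under 1", "1"],
})

# A stable sort reorders the regions but keeps the row order inside each one
sorted_toy = toy.sort_values("Region", kind="mergesort", ignore_index=True)
print(sorted_toy)
```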


### Data Cleaning for Remaining Datasets

The `Philippine Standard Geographic Code (PSGC)` dataset serves as the main lookup dataframe for regions, provinces, and their respective codes, while the `Construction Materials of Outer Walls and Roof by City and Municipality (by materials)` dataset has a different table structure from the other datasets categorized by city and municipality.

Because their structures differ from those of the other datasets, these two are kept separate and simply have their region and province data converted to uppercase for consistency.

#### Philippine Standard Geographic Code (PSGC)

In [None]:
# changing data to uppercase
df_psgc['Region'] = df_psgc['Region'].str.upper()
df_psgc['Province'] = df_psgc['Province'].str.upper()

df_psgc

Unnamed: 0,Region,Region Code,Province,Province Code,City/Mun,City/Mun Code,Bgy,Bgy Code
0,REGION I (ILOCOS REGION),PH010000000,ILOCOS NORTE,PH012800000,Adams,PH012801000,Adams (Pob.),PH012801001
1,REGION I (ILOCOS REGION),PH010000000,ILOCOS NORTE,PH012800000,Bacarra,PH012802000,Bani,PH012802001
2,REGION I (ILOCOS REGION),PH010000000,ILOCOS NORTE,PH012800000,Bacarra,PH012802000,Buyon,PH012802002
3,REGION I (ILOCOS REGION),PH010000000,ILOCOS NORTE,PH012800000,Bacarra,PH012802000,Cabaruan,PH012802003
4,REGION I (ILOCOS REGION),PH010000000,ILOCOS NORTE,PH012800000,Bacarra,PH012802000,Cabulalaan,PH012802004
...,...,...,...,...,...,...,...,...
42039,REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,Tubajon,PH168507000,Diaz (Romualdez),PH168507005
42040,REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,Tubajon,PH168507000,Roxas,PH168507006
42041,REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,Tubajon,PH168507000,San Roque (Pob.),PH168507007
42042,REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,Tubajon,PH168507000,San Vicente (Pob.),PH168507008


#### Construction Materials of Outer Walls and Roof by City and Municipality (by materials)

In [None]:
# changing data to uppercase
df_wall_roof_materials['Region'] = df_wall_roof_materials['Region'].str.upper()
df_wall_roof_materials['Province'] = df_wall_roof_materials['Province'].str.upper()

df_wall_roof_materials

Unnamed: 0,Region,Region Code,Province,Province Code,Municipality_City,Municipality_City Code,Construction Materials of the Outer Walls,Total Occupied Housing Units,Galvanized iron/aluminum,Tile/concrete/clay tile,Half galvanized iron and half concrete,Bamboo/cogon/nipa/anahaw,Asbestos,Makeshift/ salvaged/ improvised materials,Trapal,Others,Not Reported
0,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,Akbar,PH150708000,Makeshift/salvaged/improvised materials,0,0,0,0,0,0,0,0,0,0
1,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,Akbar,PH150708000,Trapal,0,0,0,0,0,0,0,0,0,0
2,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,Akbar,PH150708000,Others,0,0,0,0,0,0,0,0,0,0
3,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,Akbar,PH150708000,No walls,0,0,0,0,0,0,0,0,0,0
4,AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,Akbar,PH150708000,Asbestos,1,1,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21224,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,Tago,PH166818000,Bamboo/sawali/cogon/nipa,271,55,0,0,216,0,0,0,0,0
21225,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,Tago,PH166818000,Half concrete/brick/stone and half wood,381,313,0,3,65,0,0,0,0,0
21226,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,Tago,PH166818000,Concrete/brick/stone,1039,979,6,2,51,0,1,0,0,0
21227,REGION XIII (CARAGA),PH160000000,SURIGAO DEL SUR,PH166800000,Tago,PH166818000,Wood,6105,1934,0,0,4169,0,0,2,0,0


### Merging Datasets

Some of the cleaned datasets will be merged, given that they have the same keys (Region, Region Code, Province, and Province Code) and the same number of rows. This is done to reduce the total number of datasets and simplify visualization development.

The following datasets will be merged into one because, once aggregated, they share a similar structure and each contributes only a single numeric column:
1. No. of Barangay by City and Municipality
1. No. of Evacuation Center by City and Municipality
1. No. of Household by City and Municipality

The other set of datasets to be merged, also because of their similarity, consists of the two datasets derived from `Source of Water for Drinking` (by source and by category).
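
The merge conditions described above (shared keys and equal row counts) can be checked before and during the merge. The sketch below assumes the key fields are ordinary columns and uses two hypothetical miniature tables, `brgy` and `evac`. In the notebook the aggregated frames appear to carry these keys as index levels, so this is an illustration rather than the exact code used.

```python
import pandas as pd

keys = ["Region", "Region Code", "Province", "Province Code"]

# Hypothetical miniature versions of two aggregated tables.
brgy = pd.DataFrame({
    "Region": ["REGION XIII (CARAGA)"] * 2,
    "Region Code": ["PH160000000"] * 2,
    "Province": ["AGUSAN DEL NORTE", "DINAGAT ISLANDS"],
    "Province Code": ["PH160200000", "PH168500000"],
    "Number of Barangay": [253, 100],
})
evac = pd.DataFrame({
    "Region": ["REGION XIII (CARAGA)"] * 2,
    "Region Code": ["PH160000000"] * 2,
    "Province": ["AGUSAN DEL NORTE", "DINAGAT ISLANDS"],
    "Province Code": ["PH160200000", "PH168500000"],
    "Number of Evacuation Center": [215, 234],
})

# Check the "same number of rows" assumption up front.
assert len(brgy) == len(evac)

# validate="one_to_one" raises MergeError if a key combination is duplicated
# on either side; indicator=True adds a "_merge" column showing match status.
merged = pd.merge(brgy, evac, on=keys, how="left",
                  validate="one_to_one", indicator=True)

# Rows present only in the left table would show up here with missing values.
unmatched = merged[merged["_merge"] != "both"]
merged = merged.drop(columns="_merge")
```

If `unmatched` is not empty, the corresponding provinces have no counterpart in the right-hand table and will carry missing values after the left merge.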

#### Merging `No. of Barangay by City and Municipality`, `No. of Evacuation Center by City and Municipality`, and `No. of Household by City and Municipality`

In [None]:
# merge 'No of Brgy' and 'No of Evac Center' datasets
df_no_of_brgy_evac_by_region_and_province = pd.merge(df_no_of_brgy_by_region_and_province, df_no_of_evac_center_by_region_and_province,
                                                     on=['Region', 'Region Code', 'Province', 'Province Code'], how='left')

# merge with 'No of Household'
df_no_of_brgy_evac_household_by_region_and_province = pd.merge(df_no_of_brgy_evac_by_region_and_province, df_no_of_household_by_region_and_province,
                                                     on=['Region', 'Region Code', 'Province', 'Province Code'], how='left')

df_no_of_brgy_evac_household_by_region_and_province

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Number of Barangay,Number of Evacuation Center,Number of Household
Region,Region Code,Province,Province Code,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,BASILAN,PH150700000,210,0.0,59860.0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,LANAO DEL SUR,PH153600000,1159,0.0,160132.0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,MAGUINDANAO,PH153800000,508,0.0,194507.0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,SULU,PH156600000,410,0.0,138357.0
AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM),PH150000000,TAWI-TAWI,PH157000000,203,0.0,67529.0
...,...,...,...,...,...,...
REGION XIII (CARAGA),PH160000000,AGUSAN DEL NORTE,PH160200000,253,215.0,153857.0
REGION XIII (CARAGA),PH160000000,AGUSAN DEL SUR,PH160300000,314,127.0,153653.0
REGION XIII (CARAGA),PH160000000,DINAGAT ISLANDS,PH168500000,100,234.0,28557.0
REGION XIII (CARAGA),PH160000000,SURIGAO DEL NORTE,PH166700000,335,224.0,108814.0
