# Cleaning Covid data for Mapbox: provinces

**Background**: We combine Covid-19 case counts for the Philippines, taken from the health department, with a provincial shapefile processed through geopandas to create an interactive map.

**Tools**: pandas, geopandas, Mapbox

As of August 06, 2022

# Do your imports

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, LineString
pd.set_option('display.max_columns', None)

# Read your CSV

In [2]:
df = pd.read_csv('provinces-cases.csv')
df

Unnamed: 0,province,cases
0,Lanao Del Sur,7918
1,Maguindanao,7046
2,Cotabato City (Not A Province),6928
3,Basilan,1786
4,Sulu,1623
...,...,...
101,,123
102,1,153036
103,"NCR, Second District",455451
104,"NCR, Third District",187489


# Cleaning the data

## Lowercase column headers

In [3]:
df.columns = df.columns.str.lower()
df.head()

Unnamed: 0,province,cases
0,Lanao Del Sur,7918
1,Maguindanao,7046
2,Cotabato City (Not A Province),6928
3,Basilan,1786
4,Sulu,1623


## Clean provincial names

The names must match those in the shapefile so that the two tables can be merged later.

In [4]:
df.province = df.province.str.replace("City Of Isabela (Not A Province)", "City of Isabela", regex=False)
df.province = df.province.str.replace("Cotabato (North Cotabato)", "Cotabato", regex=False)
df.province = df.province.str.replace("Samar (Western Samar)", "Samar", regex=False)
df.province = df.province.str.replace("Cotabato City (Not A Province)", "Cotabato City", regex=False)
df.province = df.province.str.replace("Del", "del", regex=False)
df.province = df.province.str.replace("De", "de", regex=False)
# "1" is matched as a whole value, so other names containing a 1 are left alone
df.province = df.province.replace("1", "NCR, City of Manila, First District")
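
`Series.str.replace` rewrites every matching substring, whereas `Series.replace` only changes cells that equal the key exactly. The difference matters for short patterns such as "1". A minimal sketch of the two behaviors, using made-up names that are not taken from provinces-cases.csv:

```python
import pandas as pd

# Toy names for illustration only; "Region 11 Hospital" is hypothetical.
names = pd.Series(["1", "Region 11 Hospital", "Davao Del Norte"])

# Substring replacement touches every "1" it finds, including both digits in "11".
substring = names.str.replace("1", "NCR, City of Manila, First District", regex=False)

# Whole-value replacement only changes cells that are exactly "1".
exact = names.replace({"1": "NCR, City of Manila, First District"})

print(substring.tolist())
print(exact.tolist())
```

For a name list as short as this one, printing `df.province.unique()` after cleaning is a quick way to confirm that nothing was rewritten by accident.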

## Rename columns

Again, this is to match the shapefile, where the column containing the provinces' names is 'adm2_en'.

In [5]:
df = df.rename(columns={"province": "adm2_en"})
df

Unnamed: 0,adm2_en,cases
0,Lanao del Sur,7918
1,Maguindanao,7046
2,Cotabato City,6928
3,Basilan,1786
4,Sulu,1623
...,...,...
101,,123
102,"NCR, City of Manila, First District",153036
103,"NCR, Second District",455451
104,"NCR, Third District",187489


# Geopandas

## Read through file

In [6]:
provinces = gpd.read_file('ph-provinces.zip')
provinces

Unnamed: 0,Shape_Leng,Shape_Area,ADM2_EN,ADM2_PCODE,ADM2_REF,ADM2ALT1EN,ADM2ALT2EN,ADM1_EN,ADM1_PCODE,ADM0_EN,ADM0_PCODE,date,validOn,validTo,geometry
0,2.640967,0.334223,Abra,PH140100000,,,,Cordillera Administrative Region,PH140000000,Philippines (the),PH,2016-06-30,2020-05-29,,"POLYGON ((120.96109 17.95348, 120.97201 17.946..."
1,3.674955,0.220065,Agusan del Norte,PH160200000,,,,Region XIII,PH160000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((125.58886 9.45793, 125.59687 9..."
2,5.222636,0.693968,Agusan del Sur,PH160300000,,,,Region XIII,PH160000000,Philippines (the),PH,2016-06-30,2020-05-29,,"POLYGON ((125.88961 8.98195, 125.88896 8.96446..."
3,4.626091,0.139664,Aklan,PH060400000,,,,Region VI,PH060000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((122.43980 11.59717, 122.43979 ..."
4,6.507665,0.205939,Albay,PH050500000,,,,Region V,PH050000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((124.20992 13.16871, 124.20993 ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
82,15.971439,0.094186,Tawi-Tawi,PH157000000,,,,Autonomous Region in Muslim Mindanao,PH150000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((119.46876 4.59360, 119.46881 4..."
83,5.329770,0.313705,Zambales,PH037100000,,,,Region III,PH030000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((120.11687 14.76309, 120.11689 ..."
84,8.170921,0.515482,Zamboanga del Norte,PH097200000,,,,Region IX,PH090000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((122.09474 7.53104, 122.09482 7..."
85,11.811347,0.439807,Zamboanga del Sur,PH097300000,,,,Region IX,PH090000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((122.05710 6.87274, 122.05724 6..."


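## Merge data

Before merging, lowercase the shapefile's column headers so that 'ADM2_EN' becomes 'adm2_en', the name used in the cases table.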

In [7]:
provinces.columns = provinces.columns.str.lower()
provinces

Unnamed: 0,shape_leng,shape_area,adm2_en,adm2_pcode,adm2_ref,adm2alt1en,adm2alt2en,adm1_en,adm1_pcode,adm0_en,adm0_pcode,date,validon,validto,geometry
0,2.640967,0.334223,Abra,PH140100000,,,,Cordillera Administrative Region,PH140000000,Philippines (the),PH,2016-06-30,2020-05-29,,"POLYGON ((120.96109 17.95348, 120.97201 17.946..."
1,3.674955,0.220065,Agusan del Norte,PH160200000,,,,Region XIII,PH160000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((125.58886 9.45793, 125.59687 9..."
2,5.222636,0.693968,Agusan del Sur,PH160300000,,,,Region XIII,PH160000000,Philippines (the),PH,2016-06-30,2020-05-29,,"POLYGON ((125.88961 8.98195, 125.88896 8.96446..."
3,4.626091,0.139664,Aklan,PH060400000,,,,Region VI,PH060000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((122.43980 11.59717, 122.43979 ..."
4,6.507665,0.205939,Albay,PH050500000,,,,Region V,PH050000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((124.20992 13.16871, 124.20993 ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
82,15.971439,0.094186,Tawi-Tawi,PH157000000,,,,Autonomous Region in Muslim Mindanao,PH150000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((119.46876 4.59360, 119.46881 4..."
83,5.329770,0.313705,Zambales,PH037100000,,,,Region III,PH030000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((120.11687 14.76309, 120.11689 ..."
84,8.170921,0.515482,Zamboanga del Norte,PH097200000,,,,Region IX,PH090000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((122.09474 7.53104, 122.09482 7..."
85,11.811347,0.439807,Zamboanga del Sur,PH097300000,,,,Region IX,PH090000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((122.05710 6.87274, 122.05724 6..."


## Rename 'Compostela Valley' to 'Davao de Oro'

The province was renamed by law, so we update the older name in the shapefile before merging.

In [8]:
provinces.adm2_en = provinces.adm2_en.str.replace("Compostela Valley", "Davao de Oro", regex=False)

In [9]:
provinces_cases = provinces.merge(df, on='adm2_en')
provinces_cases

Unnamed: 0,shape_leng,shape_area,adm2_en,adm2_pcode,adm2_ref,adm2alt1en,adm2alt2en,adm1_en,adm1_pcode,adm0_en,adm0_pcode,date,validon,validto,geometry,cases
0,2.640967,0.334223,Abra,PH140100000,,,,Cordillera Administrative Region,PH140000000,Philippines (the),PH,2016-06-30,2020-05-29,,"POLYGON ((120.96109 17.95348, 120.97201 17.946...",6083
1,3.674955,0.220065,Agusan del Norte,PH160200000,,,,Region XIII,PH160000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((125.58886 9.45793, 125.59687 9...",22614
2,5.222636,0.693968,Agusan del Sur,PH160300000,,,,Region XIII,PH160000000,Philippines (the),PH,2016-06-30,2020-05-29,,"POLYGON ((125.88961 8.98195, 125.88896 8.96446...",16675
3,4.626091,0.139664,Aklan,PH060400000,,,,Region VI,PH060000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((122.43980 11.59717, 122.43979 ...",16771
4,6.507665,0.205939,Albay,PH050500000,,,,Region V,PH050000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((124.20992 13.16871, 124.20993 ...",14350
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
82,15.971439,0.094186,Tawi-Tawi,PH157000000,,,,Autonomous Region in Muslim Mindanao,PH150000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((119.46876 4.59360, 119.46881 4...",752
83,5.329770,0.313705,Zambales,PH037100000,,,,Region III,PH030000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((120.11687 14.76309, 120.11689 ...",21518
84,8.170921,0.515482,Zamboanga del Norte,PH097200000,,,,Region IX,PH090000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((122.09474 7.53104, 122.09482 7...",12770
85,11.811347,0.439807,Zamboanga del Sur,PH097300000,,,,Region IX,PH090000000,Philippines (the),PH,2016-06-30,2020-05-29,,"MULTIPOLYGON (((122.05710 6.87274, 122.05724 6...",46766
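
`merge` defaults to an inner join, so rows whose names do not match are dropped without warning: the cases table above has 105 rows, while the merged table has 86. One way to list the rows that fail to match is an outer join with `indicator=True`. The sketch below uses toy tables, and the case count for the unmatched row is made up:

```python
import pandas as pd

# Toy stand-ins for the shapefile attribute table and the cases table.
shapes = pd.DataFrame({"adm2_en": ["Abra", "Aklan", "Compostela Valley"]})
cases = pd.DataFrame({
    "adm2_en": ["Abra", "Aklan", "Davao de Oro"],
    "cases": [6083, 16771, 100],  # the last value is invented for the example
})

# The default inner join silently keeps only the names found in both tables.
inner = shapes.merge(cases, on="adm2_en")

# An outer join with indicator=True adds a "_merge" column naming the source of each row.
check = shapes.merge(cases, on="adm2_en", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

print(len(inner))
print(unmatched[["adm2_en", "_merge"]])
```

Rows marked `left_only` exist only in the shapefile and rows marked `right_only` exist only in the cases data, which shows which side needs a name fix.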


## Drop unnecessary columns

In [10]:
provinces_cases = provinces_cases.drop(['adm2_ref', 'adm2_pcode','adm2alt1en', 'adm2alt2en', 'adm0_en', 'shape_leng', 'shape_area', 'adm0_pcode', 'adm1_pcode', 'date', 'validon', 'validto'], axis=1)
provinces_cases

Unnamed: 0,adm2_en,adm1_en,geometry,cases
0,Abra,Cordillera Administrative Region,"POLYGON ((120.96109 17.95348, 120.97201 17.946...",6083
1,Agusan del Norte,Region XIII,"MULTIPOLYGON (((125.58886 9.45793, 125.59687 9...",22614
2,Agusan del Sur,Region XIII,"POLYGON ((125.88961 8.98195, 125.88896 8.96446...",16675
3,Aklan,Region VI,"MULTIPOLYGON (((122.43980 11.59717, 122.43979 ...",16771
4,Albay,Region V,"MULTIPOLYGON (((124.20992 13.16871, 124.20993 ...",14350
...,...,...,...,...
82,Tawi-Tawi,Autonomous Region in Muslim Mindanao,"MULTIPOLYGON (((119.46876 4.59360, 119.46881 4...",752
83,Zambales,Region III,"MULTIPOLYGON (((120.11687 14.76309, 120.11689 ...",21518
84,Zamboanga del Norte,Region IX,"MULTIPOLYGON (((122.09474 7.53104, 122.09482 7...",12770
85,Zamboanga del Sur,Region IX,"MULTIPOLYGON (((122.05710 6.87274, 122.05724 6...",46766


In [11]:
provinces_cases = provinces_cases.rename(columns={"adm2_en": "province"})

## Read and merge with population data

In [12]:
df2 = pd.read_excel('population.xlsx', sheet_name="province")
df2.head()

Unnamed: 0,province,population
0,"NCR, City of Manila, First District",1846513
1,"NCR, Second District",4771371
2,"NCR, Third District",3004627
3,"NCR, Fourth District",3861951
4,Abra,250985


In [13]:
provinces_final = provinces_cases.merge(df2, on='province')
provinces_final.head()

Unnamed: 0,province,adm1_en,geometry,cases,population
0,Abra,Cordillera Administrative Region,"POLYGON ((120.96109 17.95348, 120.97201 17.946...",6083,250985
1,Agusan del Norte,Region XIII,"MULTIPOLYGON (((125.58886 9.45793, 125.59687 9...",22614,760413
2,Agusan del Sur,Region XIII,"POLYGON ((125.88961 8.98195, 125.88896 8.96446...",16675,739367
3,Aklan,Region VI,"MULTIPOLYGON (((122.43980 11.59717, 122.43979 ...",16771,615475
4,Albay,Region V,"MULTIPOLYGON (((124.20992 13.16871, 124.20993 ...",14350,1374768


## Compute the cases-to-population ratio

We do this by dividing the number of Covid-19 cases by the total population of each province and then multiplying by 100,000. That gives us the number of cases per 100,000 people in each province.

In [14]:
provinces_final['case_per_pop'] = provinces_final.cases / provinces_final.population * 100000
provinces_final = provinces_final.round(1)
provinces_final.head()

Unnamed: 0,province,adm1_en,geometry,cases,population,case_per_pop
0,Abra,Cordillera Administrative Region,"POLYGON ((120.96109 17.95348, 120.97201 17.946...",6083,250985,2423.7
1,Agusan del Norte,Region XIII,"MULTIPOLYGON (((125.58886 9.45793, 125.59687 9...",22614,760413,2973.9
2,Agusan del Sur,Region XIII,"POLYGON ((125.88961 8.98195, 125.88896 8.96446...",16675,739367,2255.3
3,Aklan,Region VI,"MULTIPOLYGON (((122.43980 11.59717, 122.43979 ...",16771,615475,2724.9
4,Albay,Region V,"MULTIPOLYGON (((124.20992 13.16871, 124.20993 ...",14350,1374768,1043.8
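
As a quick check of the arithmetic, the same calculation can be done by hand for two rows. The figures for Abra and Albay are copied from the table above; the variable names are only for this sketch:

```python
# (cases, population) pairs copied from the output table above.
rows = {"Abra": (6083, 250985), "Albay": (14350, 1374768)}

# Cases per 100,000 people, rounded to one decimal place like the notebook does.
rates = {
    province: round(cases / population * 100000, 1)
    for province, (cases, population) in rows.items()
}

print(rates)  # {'Abra': 2423.7, 'Albay': 1043.8}
```

Both values agree with the `case_per_pop` column.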


## Create bins for cases

The bins let us group provinces by their cases per 100,000 people, which we need for mapping later.

In [15]:
provinces_final['percentiles'] = pd.cut(
    np.array(provinces_final['case_per_pop']),
    [0, 1001, 2001, 3001, 4001, 5001, 6001, 7001, 8001, 9001, 10001, 11000],
    labels=["0-1000", "1001-2000", "2001-3000", "3001-4000", "4001-5000", "5001-6000",
            "6001-7000", "7001-8000", "8001-9000", "9001-10000", "10001-11000"],
)
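
By default `pd.cut` builds intervals that are open on the left and closed on the right, so with edges at 0, 1001, 2001 and so on, a rate of exactly 1001.0 is labeled "0-1000" and a rate of exactly 0 gets no label at all. The sketch below uses made-up rates and a shortened edge list to show this, with `right=False` as one possible alternative. It is an illustration, not a change the notebook makes:

```python
import pandas as pd

# Made-up rates chosen to sit on or near the bin edges.
rates = pd.Series([0.0, 170.8, 1000.5, 1001.0, 1001.1, 2423.7])
edges = [0, 1001, 2001, 3001]
labels = ["0-1000", "1001-2000", "2001-3000"]

# Default: intervals are (0, 1001], (1001, 2001], (2001, 3001].
default = pd.cut(rates, bins=edges, labels=labels)

# right=False: intervals are [0, 1001), [1001, 2001), [2001, 3001).
left_closed = pd.cut(rates, bins=edges, labels=labels, right=False)

print(pd.DataFrame({"rate": rates, "default": default, "right=False": left_closed}))
```

With the default, 0.0 comes out as NaN and 1001.0 lands in "0-1000". With `right=False`, 0.0 lands in "0-1000" and 1001.0 in "1001-2000", which matches the label text more closely.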


In [16]:
provinces_final

Unnamed: 0,province,adm1_en,geometry,cases,population,case_per_pop,percentiles
0,Abra,Cordillera Administrative Region,"POLYGON ((120.96109 17.95348, 120.97201 17.946...",6083,250985,2423.7,2001-3000
1,Agusan del Norte,Region XIII,"MULTIPOLYGON (((125.58886 9.45793, 125.59687 9...",22614,760413,2973.9,2001-3000
2,Agusan del Sur,Region XIII,"POLYGON ((125.88961 8.98195, 125.88896 8.96446...",16675,739367,2255.3,2001-3000
3,Aklan,Region VI,"MULTIPOLYGON (((122.43980 11.59717, 122.43979 ...",16771,615475,2724.9,2001-3000
4,Albay,Region V,"MULTIPOLYGON (((124.20992 13.16871, 124.20993 ...",14350,1374768,1043.8,1001-2000
...,...,...,...,...,...,...,...
82,Tawi-Tawi,Autonomous Region in Muslim Mindanao,"MULTIPOLYGON (((119.46876 4.59360, 119.46881 4...",752,440276,170.8,0-1000
83,Zambales,Region III,"MULTIPOLYGON (((120.11687 14.76309, 120.11689 ...",21518,909832,2365.1,2001-3000
84,Zamboanga del Norte,Region IX,"MULTIPOLYGON (((122.09474 7.53104, 122.09482 7...",12770,1047455,1219.1,1001-2000
85,Zamboanga del Sur,Region IX,"MULTIPOLYGON (((122.05710 6.87274, 122.05724 6...",46766,2027902,2306.1,2001-3000


In [17]:
provinces_final.dtypes

province          object
adm1_en           object
geometry        geometry
cases              int64
population         int64
case_per_pop     float64
percentiles     category
dtype: object

**Additional step**: Convert the contents of the percentiles column to strings. Otherwise, the column will not be read into the GeoJSON file.

In [18]:
provinces_final.percentiles = provinces_final.percentiles.astype(str)
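
The effect of the conversion can be seen on a small series. This is a minimal sketch with a hand-built categorical column, not the notebook's data:

```python
import pandas as pd

# A small categorical column similar in shape to 'percentiles'.
bins = pd.Series(pd.Categorical(["2001-3000", "0-1000", "2001-3000"]))

# astype(str) keeps the label text but drops the categorical dtype.
as_text = bins.astype(str)

print(bins.dtype, "->", as_text.dtype)
print(as_text.tolist())
```

The labels are unchanged; only the column type differs, as a later `dtypes` check would show.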

In [19]:
provinces_final

Unnamed: 0,province,adm1_en,geometry,cases,population,case_per_pop,percentiles
0,Abra,Cordillera Administrative Region,"POLYGON ((120.96109 17.95348, 120.97201 17.946...",6083,250985,2423.7,2001-3000
1,Agusan del Norte,Region XIII,"MULTIPOLYGON (((125.58886 9.45793, 125.59687 9...",22614,760413,2973.9,2001-3000
2,Agusan del Sur,Region XIII,"POLYGON ((125.88961 8.98195, 125.88896 8.96446...",16675,739367,2255.3,2001-3000
3,Aklan,Region VI,"MULTIPOLYGON (((122.43980 11.59717, 122.43979 ...",16771,615475,2724.9,2001-3000
4,Albay,Region V,"MULTIPOLYGON (((124.20992 13.16871, 124.20993 ...",14350,1374768,1043.8,1001-2000
...,...,...,...,...,...,...,...
82,Tawi-Tawi,Autonomous Region in Muslim Mindanao,"MULTIPOLYGON (((119.46876 4.59360, 119.46881 4...",752,440276,170.8,0-1000
83,Zambales,Region III,"MULTIPOLYGON (((120.11687 14.76309, 120.11689 ...",21518,909832,2365.1,2001-3000
84,Zamboanga del Norte,Region IX,"MULTIPOLYGON (((122.09474 7.53104, 122.09482 7...",12770,1047455,1219.1,1001-2000
85,Zamboanga del Sur,Region IX,"MULTIPOLYGON (((122.05710 6.87274, 122.05724 6...",46766,2027902,2306.1,2001-3000


# Save as GeoJSON file

In [24]:
#provinces_final.to_file('provinces_cases.geojson', driver='GeoJSON')



# Simplified file

We have successfully combined the geometry with our dataset, but the resulting file is too big. We therefore use [mapshaper](https://mapshaper.org/) to reduce the level of detail in the map so that we get a smaller file.

Below is the simplified JSON file.

In [25]:
simplified = gpd.read_file('provinces_cases.json')
simplified

Unnamed: 0,province,adm1_en,cases,population,case_per_pop,percentiles,geometry
0,Abra,Cordillera Administrative Region,6083,250985,2423.7,2001-3000,"POLYGON ((121.11067 17.68489, 121.10883 17.739..."
1,Agusan del Norte,Region XIII,22614,760413,2973.9,2001-3000,"POLYGON ((125.74862 9.32604, 125.74844 9.33054..."
2,Agusan del Sur,Region XIII,16675,739367,2255.3,2001-3000,"POLYGON ((126.22779 8.00019, 126.22936 8.00703..."
3,Aklan,Region VI,16771,615475,2724.9,2001-3000,"MULTIPOLYGON (((122.43831 11.61646, 122.43540 ..."
4,Albay,Region V,14350,1374768,1043.8,1001-2000,"MULTIPOLYGON (((124.12996 13.23606, 124.12532 ..."
...,...,...,...,...,...,...,...
82,Tawi-Tawi,Autonomous Region in Muslim Mindanao,752,440276,170.8,0-1000,"MULTIPOLYGON (((119.46876 4.59360, 119.46798 4..."
83,Zambales,Region III,21518,909832,2365.1,2001-3000,"MULTIPOLYGON (((120.10269 14.76886, 120.10243 ..."
84,Zamboanga del Norte,Region IX,12770,1047455,1219.1,1001-2000,"MULTIPOLYGON (((123.50025 8.65729, 123.49780 8..."
85,Zamboanga del Sur,Region IX,46766,2027902,2306.1,2001-3000,"MULTIPOLYGON (((122.05710 6.87274, 122.05384 6..."
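
To give a rough sense of why less detail means a smaller file, the sketch below trims the coordinate precision of a single made-up GeoJSON feature and compares the serialized sizes. mapshaper shrinks files mainly by removing vertices, so this is only a related illustration and not what was run for this map. The feature, its coordinates and the `round_coords` helper are all hypothetical:

```python
import json

# A made-up GeoJSON feature; the coordinates are not from the shapefile.
feature = {
    "type": "Feature",
    "properties": {"province": "Example"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [120.961094712, 17.953481234],
            [120.972013456, 17.946123987],
            [120.965432109, 17.940987654],
            [120.961094712, 17.953481234],
        ]],
    },
}

def round_coords(value, digits):
    """Recursively round every number in a nested coordinate list."""
    if isinstance(value, list):
        return [round_coords(item, digits) for item in value]
    return round(value, digits)

# Deep-copy the feature through a JSON round trip, then trim its coordinates.
trimmed = json.loads(json.dumps(feature))
trimmed["geometry"]["coordinates"] = round_coords(
    feature["geometry"]["coordinates"], 4
)

# Compare the length of the serialized text before and after trimming.
print(len(json.dumps(feature)), "->", len(json.dumps(trimmed)))
```

The saving on one small polygon is a few dozen characters; across many provinces with thousands of vertices each, the same idea adds up.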


## Convert to GeoJSON

In [26]:
#simplified.to_file('simplified_provinces.geojson', driver='GeoJSON')



In [27]:
simplified

Unnamed: 0,province,adm1_en,cases,population,case_per_pop,percentiles,geometry
0,Abra,Cordillera Administrative Region,6083,250985,2423.7,2001-3000,"POLYGON ((121.11067 17.68489, 121.10883 17.739..."
1,Agusan del Norte,Region XIII,22614,760413,2973.9,2001-3000,"POLYGON ((125.74862 9.32604, 125.74844 9.33054..."
2,Agusan del Sur,Region XIII,16675,739367,2255.3,2001-3000,"POLYGON ((126.22779 8.00019, 126.22936 8.00703..."
3,Aklan,Region VI,16771,615475,2724.9,2001-3000,"MULTIPOLYGON (((122.43831 11.61646, 122.43540 ..."
4,Albay,Region V,14350,1374768,1043.8,1001-2000,"MULTIPOLYGON (((124.12996 13.23606, 124.12532 ..."
...,...,...,...,...,...,...,...
82,Tawi-Tawi,Autonomous Region in Muslim Mindanao,752,440276,170.8,0-1000,"MULTIPOLYGON (((119.46876 4.59360, 119.46798 4..."
83,Zambales,Region III,21518,909832,2365.1,2001-3000,"MULTIPOLYGON (((120.10269 14.76886, 120.10243 ..."
84,Zamboanga del Norte,Region IX,12770,1047455,1219.1,1001-2000,"MULTIPOLYGON (((123.50025 8.65729, 123.49780 8..."
85,Zamboanga del Sur,Region IX,46766,2027902,2306.1,2001-3000,"MULTIPOLYGON (((122.05710 6.87274, 122.05384 6..."
