# Test Data: Covid-19 in Malaysia
## How many cases has Malaysia recorded since 2020?
### Metadata

1) `date`: yyyy-mm-dd format; data correct as of 1200hrs on that date
2) `state`: name of state (present in state file, but not country file)
3) `cases_new`: cases reported in the 24h since the last report
4) `cases_import`: imported cases reported in the 24h since the last report
5) `cases_active`: Covid+ individuals who have not recovered or died
6) `cases_recovered`: recovered cases reported in the 24h since the last report
7) `cases_cluster`: number of cases attributable to clusters; the difference between `cases_new` and the sum of cases attributable to clusters is the number of sporadic cases
8) `cluster_x`: cases attributable to clusters under category `x`; possible values for `x` are import, religious, community, highRisk, education, detentionCentre, and workplace
9) `cases_agecat`: cases falling into one of 4 age categories, where `agecat` is one of child (0-11), adolescent (12-17), adult (18-59), or elderly (60+); note that the sum of cases by age may not equal the total cases for that day, as some cases are registered without ages or with unverifiable age data
10) `cases_unvax`: number of unvaccinated individuals who tested positive for Covid (perfect subset of `cases_new`), where "unvaccinated" is defined as not having received any dose of a COVID-19 vaccine prior to testing positive.
11) `cases_pvax`: number of partially-vaccinated individuals who tested positive for Covid (perfect subset of `cases_new`), where "partially vaccinated" is defined as receiving at least 1 dose of a 2-dose vaccine at least 1 day prior to testing positive, or receiving the Cansino vaccine between 1 and 27 days before testing positive
12) `cases_fvax`: number of fully-vaccinated individuals who tested positive for Covid (perfect subset of `cases_new`), where "fully vaccinated" is defined as receiving the 2nd dose of a 2-dose vaccine at least 14 days prior to testing positive, or receiving the Cansino vaccine at least 28 days before testing positive
13) `cases_boost`: number of individuals who tested positive for Covid and had received a booster dose (perfect subset of `cases_new`), where "booster" is defined as receiving an additional dose of a COVID-19 vaccine beyond the original protocol at least 1 day prior to testing positive.
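
Two of the definitions above imply simple derived quantities: sporadic cases (`cases_new` minus `cases_cluster`) and cases without usable age data (`cases_new` minus the sum of the four age categories). The following is a minimal sketch of both calculations. The rows are hypothetical values made up for illustration, and the column names `cases_sporadic` and `cases_age_unknown` are assumptions, not part of the dataset.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from cases_state.csv.
sample = pd.DataFrame({
    "date": ["2021-08-01", "2021-08-01"],
    "state": ["Johor", "Kedah"],
    "cases_new": [100, 40],
    "cases_cluster": [30, 0],
    "cases_child": [10, 5],
    "cases_adolescent": [5, 5],
    "cases_adult": [70, 25],
    "cases_elderly": [12, 5],
})

# Sporadic cases: new cases not attributed to any cluster.
sample["cases_sporadic"] = sample["cases_new"] - sample["cases_cluster"]

# Cases with missing or unverifiable age: total minus the four age categories.
age_cols = ["cases_child", "cases_adolescent", "cases_adult", "cases_elderly"]
sample["cases_age_unknown"] = sample["cases_new"] - sample[age_cols].sum(axis=1)

print(sample[["state", "cases_sporadic", "cases_age_unknown"]])
```

The same two lines could be applied to the full `cases_state` DataFrame once it is loaded, assuming its columns match the names listed in the metadata.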

In [5]:
# Import libraries
import pandas as pd  
from IPython.display import display

### Covid-19 data by state in Malaysia, from 2020

In [9]:
cases_state = pd.read_csv('cases_state.csv')

display(cases_state.head(20))

# Get the total number of rows
total_rows = cases_state.shape[0]
print("Total number of rows:", total_rows)

# Get the total number of columns
total_columns = cases_state.shape[1]
print("Total number of columns:", total_columns)

,date,state,cases_new,cases_import,cases_recovered,cases_active,cases_cluster,cases_unvax,cases_pvax,cases_fvax,...,cases_0_4,cases_5_11,cases_12_17,cases_18_29,cases_30_39,cases_40_49,cases_50_59,cases_60_69,cases_70_79,cases_80
0,2020-01-25,Johor,4,4,0,4,0,4,0,0,...,0,0,0,0,0,1,0,0,0,0
1,2020-01-25,Kedah,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2020-01-25,Kelantan,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2020-01-25,Melaka,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2020-01-25,Negeri Sembilan,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,2020-01-25,Pahang,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,2020-01-25,Perak,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,2020-01-25,Perlis,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,2020-01-25,Pulau Pinang,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,2020-01-25,Sabah,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Total number of rows: 27456
Total number of columns: 25


In [10]:
display(cases_state.tail(10))

,date,state,cases_new,cases_import,cases_recovered,cases_active,cases_cluster,cases_unvax,cases_pvax,cases_fvax,...,cases_0_4,cases_5_11,cases_12_17,cases_18_29,cases_30_39,cases_40_49,cases_50_59,cases_60_69,cases_70_79,cases_80
27446,2024-10-05,Perak,2,0,9,373,0,0,0,1,...,0,0,0,1,0,1,0,0,0,0
27447,2024-10-05,Perlis,0,0,2,41,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
27448,2024-10-05,Pulau Pinang,7,0,11,34,0,0,0,2,...,0,0,0,1,3,0,2,0,1,0
27449,2024-10-05,Sabah,2,0,0,3473,0,0,0,0,...,0,0,0,0,1,1,0,0,0,0
27450,2024-10-05,Sarawak,7,0,1,114,0,2,0,0,...,0,0,0,0,2,2,0,0,1,2
27451,2024-10-05,Selangor,32,0,0,184,0,8,0,6,...,1,2,3,7,5,3,7,1,1,2
27452,2024-10-05,Terengganu,3,0,0,316,0,1,0,0,...,0,1,0,0,2,0,0,0,0,0
27453,2024-10-05,W.P. Kuala Lumpur,23,0,17,632,0,4,0,4,...,2,1,0,10,4,3,1,0,0,2
27454,2024-10-05,W.P. Labuan,0,0,0,26,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
27455,2024-10-05,W.P. Putrajaya,0,0,3,239,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
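
The first and last rows shown above give the date range (2020-01-25 to 2024-10-05), and 16 distinct states appear across the two previews. As a rough sanity check, the sketch below computes how many rows the file should contain, assuming exactly one row per state per calendar day with no gaps. That assumption is not stated in the source; the variable names are illustrative.

```python
import pandas as pd

# Date range taken from the first and last rows of the previews.
dates = pd.date_range("2020-01-25", "2024-10-05", freq="D")

# Number of distinct states seen in the head and tail previews.
n_states = 16

# Expected row count if every state has exactly one row per day.
expected_rows = len(dates) * n_states
print("Days:", len(dates))
print("Expected rows:", expected_rows)
```

If the expected count matches the reported total of 27456 rows, the state-by-day panel is likely complete; a mismatch would point to missing or duplicated dates.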


In [16]:
columns_state = cases_state.columns
print(columns_state)

Index(['date', 'state', 'cases_new', 'cases_import', 'cases_recovered',
       'cases_active', 'cases_cluster', 'cases_unvax', 'cases_pvax',
       'cases_fvax', 'cases_boost', 'cases_child', 'cases_adolescent',
       'cases_adult', 'cases_elderly', 'cases_0_4', 'cases_5_11',
       'cases_12_17', 'cases_18_29', 'cases_30_39', 'cases_40_49',
       'cases_50_59', 'cases_60_69', 'cases_70_79', 'cases_80'],
      dtype='object')


In [17]:
# Check for missing values
print("Missing values:\n", cases_state.isnull().sum())

Missing values:
 date                0
state               0
cases_new           0
cases_import        0
cases_recovered     0
cases_active        0
cases_cluster       0
cases_unvax         0
cases_pvax          0
cases_fvax          0
cases_boost         0
cases_child         0
cases_adolescent    0
cases_adult         0
cases_elderly       0
cases_0_4           0
cases_5_11          0
cases_12_17         0
cases_18_29         0
cases_30_39         0
cases_40_49         0
cases_50_59         0
cases_60_69         0
cases_70_79         0
cases_80            0
dtype: int64
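
A zero count from `isnull()` only rules out empty cells. It does not catch duplicated `(date, state)` pairs, negative counts, or dates stored as plain strings. The sketch below shows a few follow-up checks on a small hypothetical frame; the rows are invented for illustration and deliberately include one duplicate and one negative value.

```python
import pandas as pd

# Hypothetical rows; the last two intentionally share the same (date, state) pair.
sample = pd.DataFrame({
    "date": ["2020-01-25", "2020-01-26", "2020-01-26"],
    "state": ["Johor", "Johor", "Johor"],
    "cases_new": [4, 0, -1],
})

# Parse the yyyy-mm-dd strings into proper datetime values.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

# Count repeated (date, state) pairs; only later occurrences are flagged.
n_duplicates = int(sample.duplicated(subset=["date", "state"]).sum())

# Count rows with a negative case count, which should not occur.
n_negative = int((sample["cases_new"] < 0).sum())

print("Duplicate (date, state) rows:", n_duplicates)
print("Negative cases_new rows:", n_negative)
```

The same checks could be run on `cases_state` in place of `sample`, assuming the `date` column follows the yyyy-mm-dd format described in the metadata.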
