### 1. Dataset Representation

- About the Dataset<br>

The data was provided by Our World in Data (OWID). The file contains a range of variables that together give a clearer picture of each country's COVID-19 situation. This project uses the July 15, 2021 release of the dataset. OWID aims to update the data daily, or at least weekly, whenever possible, so the published file reflects the latest available figures.

- Collection Process and its Implications<br>

The data was collected by Our World in Data, a research group that gathers research and data into a single accessible repository to give a clearer picture of, and help solve, problems that affect all of humanity. For this dataset, OWID drew on all publicly available data released by national governments around the world. According to OWID, the data comes from the following sources:
    
    1. COVID-19 Data Repository of Johns Hopkins University
    2. National Government Reports
    3. Oxford COVID-19 Government Response Tracker, Blavatnik School of Government
    4. United Nations Data (for demographics related data)
    5. World Bank Data (for demographics related data)
    6. Arroyo Marioli et al. (2020). https://doi.org/10.2139/ssrn.3581633 
    
The dataset therefore presents the latest data available at the time of release, but its validity ultimately depends on how transparently and accurately each government reports its figures, both publicly and to Johns Hopkins University.
    <br>
- Structure of the Dataset<br>

    The dataset consists of 102,475 observations of 60 variables. Each observation is one country (or an OWID aggregate such as the World) on one date, and each country's series begins on the date it reported either its first COVID-19 case or its first COVID-19 test. OWID distributes the dataset publicly as a single file containing all of these variables. There are, however, other datasets on OWID's GitHub repository that contain specific, specialized versions of this dataset.
    
    <br>
- About the Variables<br>
    
    The dataset has 60 variables. Most are COVID-19 figures such as cases, deaths, tests, hospital and ICU admissions, and vaccinations; the rest are demographic and socioeconomic indicators such as GDP per capita, HDI, median age, population, and population density.
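The long format described above (one row per country per date, with demographic values repeated on every row) can be checked with a few pandas calls. The sketch below uses a small hypothetical frame with made-up values; only the column names follow the OWID file, and the same calls would apply to `raw_df`. Counting distinct values per country is one assumed way to separate per-country demographic columns from time-varying COVID-19 columns.

```python
import pandas as pd

# Hypothetical toy rows in the OWID long format (all values are made up):
# one row per (iso_code, date) pair.
toy_df = pd.DataFrame({
    'iso_code': ['PHL', 'PHL', 'PHL', 'SGP', 'SGP'],
    'date': pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-03',
                            '2020-03-02', '2020-03-03']),
    'new_cases': [10.0, 12.0, 15.0, 3.0, 5.0],
    'gdp_per_capita': [1000.0, 1000.0, 1000.0, 2000.0, 2000.0],
})

# Each (country, date) pair should appear exactly once
assert not toy_df.duplicated(subset=['iso_code', 'date']).any()

# A country's series starts at its earliest reported date
first_dates = toy_df.groupby('iso_code')['date'].min()

# Columns with at most one distinct value per country behave like
# demographic attributes; the rest are time-varying COVID-19 metrics.
# nunique() ignores NaN by default, so blank entries do not count.
value_cols = ['new_cases', 'gdp_per_capita']
per_country = toy_df.groupby('iso_code')[value_cols].nunique()
static_cols = [c for c in value_cols if per_country[c].max() <= 1]

print(first_dates)
print(static_cols)  # ['gdp_per_capita']
```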

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.stats import ttest_ind

#Code for data preparation
raw_df = pd.read_csv('COVID_7_15.csv')
print("Raw Dataframe Shape:", raw_df.shape)

Raw Dataframe Shape: (102475, 60)


### 2. Data Cleaning

Given the large number of nations and variables to consider, the scope has been reduced to the ASEAN member nations, with the World as a baseline. ASEAN nations were chosen for the following reasons:

1. Near proximity
2. Economic integration
3. Similar levels of economic development and population

This allows us to compare the COVID-19 status of the Philippines with that of its neighbors and, where applicable, with the World.

In [6]:
#Code for data cleaning

#COLUMN FLAGS
# Identifier columns kept on every row
identifiers = ['iso_code','continent','location','date']
# Columns outside the scope of this analysis (45 columns)
toDrop = ['new_cases_smoothed','new_deaths_smoothed','total_cases_per_million','new_cases_per_million','new_cases_smoothed_per_million'
          ,'total_deaths_per_million','new_deaths_per_million','new_deaths_smoothed_per_million','reproduction_rate','icu_patients'
          ,'icu_patients_per_million','hosp_patients','hosp_patients_per_million','weekly_icu_admissions','weekly_icu_admissions_per_million'
          ,'weekly_hosp_admissions_per_million','female_smokers','male_smokers','excess_mortality','median_age','aged_65_older','aged_70_older'
          ,'handwashing_facilities','hospital_beds_per_thousand','population_density','total_vaccinations_per_hundred'
          ,'people_vaccinated_per_hundred','people_fully_vaccinated_per_hundred','new_vaccinations_smoothed_per_million'
          ,'new_vaccinations_smoothed','extreme_poverty','cardiovasc_death_rate','diabetes_prevalence', 'weekly_hosp_admissions'
          ,'new_tests','total_tests','total_tests_per_thousand','new_tests_per_thousand','new_tests_smoothed'
          ,'new_tests_smoothed_per_thousand','tests_per_case','tests_units','positive_rate','life_expectancy','human_development_index']
# Columns kept for the analysis (identifiers + toRetain = the remaining 15 columns)
toRetain = ['total_cases','new_cases','total_deaths','new_deaths','total_vaccinations',
            'people_vaccinated','people_fully_vaccinated','new_vaccinations','stringency_index',
            'population','gdp_per_capita']
# ISO codes of the ten ASEAN member states plus OWID's World aggregate
countryList = ['PHL','BRN','KHM','IDN','SGP','LAO','THA','MYS','MMR','VNM','OWID_WRL']

#REMOVAL OF UNNECESSARY COLUMNS (raw_df comes from the data preparation cell)
covid_df = raw_df.drop(columns=toDrop)

#COUNTRY REMOVAL
# isin() matches whole ISO codes exactly, unlike a substring regex
covid_df = covid_df[covid_df['iso_code'].isin(countryList)]
# sort_values returns a new frame, so the result must be reassigned
covid_df = covid_df.sort_values(by=['iso_code','date']).reset_index(drop=True)
covid_df['iso_code'].unique()

#Ending dataframe called: covid_df

### 3. Exploratory Data Analysis

EDA Questions<br>
1. Do case numbers increase or decrease month over month in each of the listed countries?
2. Is there a correlation between a country's GDP per capita and its number of hospital and ICU patients?
3. Do case numbers correlate negatively with the number of people being vaccinated?

**Numerical Summaries**

In [3]:
#Code for numerical summaries
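EDA questions 1 and 3 can be summarized numerically with a groupby per country. The sketch below runs on a hypothetical toy frame with made-up values in the layout of the cleaned `covid_df`; on the real data, `covid_df['date']` would first need `pd.to_datetime`. Using the monthly mean of daily new cases rather than the sum, and Pearson correlation, are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the cleaned covid_df layout (made-up values)
dates = pd.date_range('2021-01-01', '2021-03-31', freq='D')
n = len(dates)
toy_df = pd.DataFrame({
    'iso_code': np.repeat(['PHL', 'SGP'], n),
    'date': np.tile(dates, 2),
    'new_cases': np.concatenate([np.linspace(100, 300, n), np.linspace(50, 10, n)]),
    'people_vaccinated': np.concatenate([np.linspace(0, 1000, n), np.linspace(0, 2000, n)]),
})

# Q1: average daily new cases per calendar month, one column per country.
# The mean (not the sum) keeps months of different lengths comparable.
monthly = (toy_df
           .groupby(['iso_code', toy_df['date'].dt.to_period('M')])['new_cases']
           .mean()
           .unstack('iso_code'))
# Month-over-month change: positive means cases rose, negative means they fell
monthly_change = monthly.diff()

# Q3: Pearson correlation between daily new cases and people vaccinated,
# computed separately for each country
corr = (toy_df
        .groupby('iso_code')[['new_cases', 'people_vaccinated']]
        .apply(lambda g: g['new_cases'].corr(g['people_vaccinated'])))

print(monthly_change)
print(corr)
```

Since `people_vaccinated` is cumulative and both series trend over time, a correlation here describes co-movement, not a causal effect of vaccination.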

**Visualizations**

In [4]:
#Code for visualizations
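A line chart of daily new cases per country is one way to visualize EDA question 1. Because `new_cases_smoothed` was dropped during cleaning, this sketch recomputes a 7-day rolling mean instead. The frame is hypothetical with made-up values, and the log scale is an assumed choice so that the World aggregate and single countries fit on one chart.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for running as a script; not needed in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data in the cleaned covid_df layout (made-up values)
dates = pd.date_range('2021-01-01', periods=60, freq='D')
n = len(dates)
toy_df = pd.DataFrame({
    'iso_code': np.repeat(['PHL', 'SGP', 'OWID_WRL'], n),
    'date': np.tile(dates, 3),
    'new_cases': np.concatenate([np.linspace(100, 300, n),
                                 np.linspace(50, 10, n),
                                 np.linspace(5000, 4000, n)]),
})

fig, ax = plt.subplots(figsize=(10, 5))
for code, g in toy_df.groupby('iso_code'):
    # 7-day rolling mean of new cases; the first 6 days are NaN and are not drawn
    smoothed = g.sort_values('date').set_index('date')['new_cases'].rolling(7).mean()
    ax.plot(smoothed.index, smoothed.to_numpy(), label=code)
ax.set_yscale('log')  # keeps the World series and single countries on one chart
ax.set_xlabel('Date')
ax.set_ylabel('New cases (7-day rolling mean)')
ax.set_title('Daily new cases, ASEAN vs. World')
ax.legend()
fig.tight_layout()
```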

### 4. Research Question

1. Is there a significant difference between ASEAN member nations in total and new case numbers?<br><br>
    1. Scope in Dataset: Total cases and/or New cases
    2. Significance: This shows how the Philippines fares against COVID-19 compared with its neighboring ASEAN countries and with the World.

2. Is the Philippine government meeting its half-way target for the number of people vaccinated?<br><br>
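Both research questions need figures normalized by population, because the countries differ greatly in size. The per-million and per-hundred columns were dropped during cleaning, but `population` was retained, so they can be recomputed. This is a sketch on a hypothetical latest-date snapshot with made-up numbers; `HALF_WAY_TARGET` is a hypothetical placeholder, not the government's actual goal, and if the goal is defined against a narrower target population, that figure should replace `population`.

```python
import pandas as pd

# Hypothetical latest-date snapshot (made-up numbers) using columns kept in cleaning
latest = pd.DataFrame({
    'iso_code': ['PHL', 'SGP'],
    'total_cases': [1000.0, 300.0],
    'people_fully_vaccinated': [200.0, 2500.0],
    'population': [100000.0, 5000.0],
}).set_index('iso_code')

# RQ1: total cases per million people, comparable across countries of different size
latest['total_cases_per_million'] = latest['total_cases'] / latest['population'] * 1e6

# RQ2: share of the population fully vaccinated, in percent
latest['pct_fully_vaccinated'] = latest['people_fully_vaccinated'] / latest['population'] * 100

# HALF_WAY_TARGET is a hypothetical placeholder (percent of population)
HALF_WAY_TARGET = 50.0
latest['meets_target'] = latest['pct_fully_vaccinated'] >= HALF_WAY_TARGET

print(latest)
```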

### 5. Statistical Inference

**Hypothesis**<br><br>

$H_0=$ 
<br>
$H_A=$ 
<br>

In [5]:
#Code for formulating statistical inference and hypothesis testing
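Research question 1 could be tested with `ttest_ind`, which is already imported in the data preparation cell. The sketch assumes, for illustration only, that $H_0$ is "two countries have equal mean daily new cases per million" and that the significance level is 0.05. The two samples are random draws, not real data, and Welch's variant (`equal_var=False`) is an assumed choice because the two countries' variances need not match.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical daily new cases per million for two countries (random, made-up values)
rng = np.random.default_rng(0)
phl = pd.Series(rng.normal(loc=50, scale=10, size=90))
sgp = pd.Series(rng.normal(loc=20, scale=5, size=90))

ALPHA = 0.05  # assumed significance level

# Welch's t-test: compares the two means without assuming equal variances.
# dropna() matters on the real data, where new_cases has missing days.
stat, p_value = ttest_ind(phl.dropna(), sgp.dropna(), equal_var=False)
print(f"t = {stat:.2f}, p = {p_value:.3g}")

if p_value < ALPHA:
    print("Reject H0: mean daily new cases per million differ.")
else:
    print("Fail to reject H0.")
```

Daily case counts are autocorrelated, which weakens the t-test's independence assumption, so a p-value from real daily data should be read with caution.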

### 6. Insights and Conclusions

{CONTENT}