Introduction (What we want to do and what data we are looking at):
A novel coronavirus, the cause of COVID-19, was identified in 2019 in Wuhan, China. It spread rapidly worldwide, and the WHO officially declared it a pandemic. To better understand the data available about it, we perform an exploratory data analysis of the `owid-covid-data.csv` dataset. The goal of the project is to study the impact of COVID-19 across the world using Python, pandas, and Matplotlib, and to present visualizations of our analysis.

In [34]:
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import requests
import json
import time
from scipy.stats import linregress

In [42]:
# File to load
file = "Data/owid-covid-data.csv"

# Read the COVID-19 data file into a pandas DataFrame
data = pd.read_csv(file)

In [43]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181463 entries, 0 to 181462
Data columns (total 67 columns):
 #   Column                                      Non-Null Count   Dtype  
---  ------                                      --------------   -----  
 0   iso_code                                    181463 non-null  object 
 1   continent                                   170857 non-null  object 
 2   location                                    181463 non-null  object 
 3   date                                        181463 non-null  object 
 4   total_cases                                 174693 non-null  float64
 5   new_cases                                   174468 non-null  float64
 6   new_cases_smoothed                          173299 non-null  float64
 7   total_deaths                                156469 non-null  float64
 8   new_deaths                                  156467 non-null  float64
 9   new_deaths_smoothed                         155320 non-null  float64
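
The `info()` output above lists `date` with dtype `object`, which means the dates are stored as plain strings. Time-based analysis is usually easier with a datetime column. The following is a minimal sketch of that conversion on a small made-up frame; `demo`, `date_dtype`, and `day_span` are illustrative names, and in the notebook the same call would be applied to `data`.

```python
import pandas as pd

# Small illustrative frame (assumption); the notebook would use `data` instead.
demo = pd.DataFrame({
    "location": ["Afghanistan", "Afghanistan"],
    "date": ["2020-02-24", "2020-02-25"],
})

# Convert the string column to datetimes so it can be sorted, resampled, and plotted.
demo["date"] = pd.to_datetime(demo["date"], format="%Y-%m-%d")

date_dtype = str(demo["date"].dtype)
day_span = (demo["date"].max() - demo["date"].min()).days
print(date_dtype, day_span)
```

Passing an explicit `format` avoids ambiguous day/month parsing and makes malformed values raise an error instead of being silently misread.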
 

In [44]:
data.columns

Index(['iso_code', 'continent', 'location', 'date', 'total_cases', 'new_cases',
       'new_cases_smoothed', 'total_deaths', 'new_deaths',
       'new_deaths_smoothed', 'total_cases_per_million',
       'new_cases_per_million', 'new_cases_smoothed_per_million',
       'total_deaths_per_million', 'new_deaths_per_million',
       'new_deaths_smoothed_per_million', 'reproduction_rate', 'icu_patients',
       'icu_patients_per_million', 'hosp_patients',
       'hosp_patients_per_million', 'weekly_icu_admissions',
       'weekly_icu_admissions_per_million', 'weekly_hosp_admissions',
       'weekly_hosp_admissions_per_million', 'total_tests', 'new_tests',
       'total_tests_per_thousand', 'new_tests_per_thousand',
       'new_tests_smoothed', 'new_tests_smoothed_per_thousand',
       'positive_rate', 'tests_per_case', 'tests_units', 'total_vaccinations',
       'people_vaccinated', 'people_fully_vaccinated', 'total_boosters',
       'new_vaccinations', 'new_vaccinations_smoothed',
       't

In [27]:
data.head()

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-02-24,5.0,5.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
1,AFG,Asia,Afghanistan,2020-02-25,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
2,AFG,Asia,Afghanistan,2020-02-26,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
3,AFG,Asia,Afghanistan,2020-02-27,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
4,AFG,Asia,Afghanistan,2020-02-28,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,


In [28]:
# Columns not needed for this analysis (smoothed series and derived indices)
to_drop = [
    'iso_code', 'new_cases_smoothed', 'new_deaths_smoothed',
    'new_cases_smoothed_per_million', 'new_deaths_smoothed_per_million',
    'reproduction_rate', 'new_tests_smoothed', 'new_tests_smoothed_per_thousand',
    'new_people_vaccinated_smoothed', 'new_people_vaccinated_smoothed_per_hundred',
    'stringency_index',
]

data = data.drop(columns=to_drop)

data.head()

Unnamed: 0,continent,location,date,total_cases,new_cases,total_deaths,new_deaths,total_cases_per_million,new_cases_per_million,total_deaths_per_million,...,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,Asia,Afghanistan,2020-02-24,5.0,5.0,,,0.126,0.126,,...,,,37.746,0.5,64.83,0.511,,,,
1,Asia,Afghanistan,2020-02-25,5.0,0.0,,,0.126,0.0,,...,,,37.746,0.5,64.83,0.511,,,,
2,Asia,Afghanistan,2020-02-26,5.0,0.0,,,0.126,0.0,,...,,,37.746,0.5,64.83,0.511,,,,
3,Asia,Afghanistan,2020-02-27,5.0,0.0,,,0.126,0.0,,...,,,37.746,0.5,64.83,0.511,,,,
4,Asia,Afghanistan,2020-02-28,5.0,0.0,,,0.126,0.0,,...,,,37.746,0.5,64.83,0.511,,,,


In [29]:
# number of countries
count_countries = data['location'].nunique()

# number of continents
count_continents = data['continent'].nunique()

print(f"This research includes data from {count_countries} countries from {count_continents} continents.")

This research includes data from 243 countries from 6 continents.
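
Note that `continent` has fewer non-null values than `location` in the `info()` output. In this dataset the rows without a continent appear to be aggregates such as "World" or whole continents, so the count above may include entries that are not countries. The following is a hedged sketch of counting only locations that have a continent, using a made-up frame; `demo`, `all_locations`, and `countries_only` are illustrative names.

```python
import pandas as pd

# Made-up rows for illustration; aggregates are assumed to have no continent.
demo = pd.DataFrame({
    "continent": ["Asia", "Asia", None, "Europe", None],
    "location": ["Afghanistan", "Qatar", "World", "France", "Europe"],
})

# Count every distinct location, aggregates included.
all_locations = demo["location"].nunique()

# Count only locations whose continent is filled in.
countries_only = demo.loc[demo["continent"].notna(), "location"].nunique()

print(all_locations, countries_only)
```

Applied to `data`, the second count would be the safer figure to report as the number of countries.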


In [30]:
# Random sample of 50,000 of the 181,463 rows
sample_data = data.sample(n=50000)
sample_data.head()

Unnamed: 0,continent,location,date,total_cases,new_cases,total_deaths,new_deaths,total_cases_per_million,new_cases_per_million,total_deaths_per_million,...,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
133704,Asia,Qatar,2021-06-23,221273.0,154.0,586.0,2.0,75506.292,52.55,199.964,...,0.8,26.9,,1.2,80.23,0.848,,,,
47773,Africa,Egypt,2021-07-10,282985.0,121.0,16383.0,15.0,2714.268,1.161,157.139,...,0.2,50.1,89.827,1.6,71.99,0.707,,,,
170676,Europe,United Kingdom,2021-07-26,5730989.0,24503.0,129258.0,14.0,84023.332,359.244,1895.081,...,20.0,24.7,,2.54,81.32,0.932,,,,
163519,Oceania,Tokelau,2021-07-28,,,,,,,,...,,,,,81.86,,,,,
117869,North America,Nicaragua,2021-02-24,6445.0,0.0,173.0,0.0,961.599,0.0,25.812,...,,,,0.9,74.48,0.66,,,,
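
Because `sample()` draws rows at random, the rows shown above will differ on every run. If a repeatable sample is wanted, one option is to pass a fixed seed through `random_state`. The following is a minimal sketch on a made-up frame; the seed value 42 and the names `demo`, `first`, `second`, and `same_rows` are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up frame standing in for the full dataset.
demo = pd.DataFrame({"value": range(100)})

# The same seed yields the same rows on every call.
first = demo.sample(n=10, random_state=42)
second = demo.sample(n=10, random_state=42)

same_rows = first.index.equals(second.index)
print(same_rows)
```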


# Research questions to answer:


Correlation between cases and number of people vaccinated

GDP versus vaccinations and cases

Relation between rate of vaccination and mortality rate

Age and mortality rate

Age and vaccination rate 

How effective the vaccination is in each country

New cases over time by country

Hospitalization rate by country

Whether people with preexisting conditions are more likely to be hospitalized or die
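
As a starting point for the first question, one possible approach is to take the most recent row per location and compute a Pearson correlation between two columns. The following is a sketch only: the values are made up and are not real COVID-19 figures, the choice of `people_vaccinated` and `new_cases_per_million` is an assumption, and `demo`, `latest`, and `r` are illustrative names.

```python
import pandas as pd

# Made-up values for illustration only; not real COVID-19 figures.
demo = pd.DataFrame({
    "location": ["A", "A", "B", "B", "C", "C"],
    "date": ["2021-01-01", "2021-01-02"] * 3,
    "people_vaccinated": [10.0, 12.0, 40.0, 45.0, 70.0, 75.0],
    "new_cases_per_million": [300.0, 280.0, 150.0, 140.0, 60.0, 50.0],
})

# Keep the most recent row per location.
latest = demo.sort_values("date").groupby("location").tail(1)

# Pearson correlation between the two columns across locations.
r = latest["people_vaccinated"].corr(latest["new_cases_per_million"])
print(round(r, 3))
```

A correlation computed this way describes an association across locations, not the effect of vaccination, and raw vaccination counts would normally be scaled by population before comparing countries.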