# COVID-19 Cases, Deaths, and Vaccination Progress


In [None]:
import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O

# List every input file available in the read-only Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Read the Worldometer summary and daily data, plus the vaccination progress data
df = pd.read_csv('/kaggle/input/covid19-global-dataset/worldometer_coronavirus_summary_data.csv')
daily_data = pd.read_csv('/kaggle/input/covid19-global-dataset/worldometer_coronavirus_daily_data.csv')
vaccination_data = pd.read_csv('/kaggle/input/covid-world-vaccination-progress/country_vaccinations.csv')

# Worldometer spells the United States 'USA' (as in the daily file); use the vaccination data's spelling before merging
df['country'] = df['country'].replace('USA', 'United States')
merged = vaccination_data.merge(df, how='left', on='country')
merged.head()

In [None]:
# Prepare the daily case data and the vaccination data so they can be merged on date and country
vaccination_progress = vaccination_data.copy()  # same CSV as vaccination_data, so reuse it instead of re-reading
vaccination_progress['date'] = pd.to_datetime(vaccination_progress['date'], format='%Y-%m-%d')
daily_data['date'] = pd.to_datetime(daily_data['date'], format='%Y-%m-%d')

# Worldometer uses 'USA'; rename it in the country column only so it matches the vaccination data
daily_data['country'] = daily_data['country'].replace('USA', 'United States')
print(daily_data[daily_data['country'] == 'United States'])

# Keep only the dates and countries present in both data sets
daily_merged = vaccination_progress.merge(daily_data, how='inner', on=['date', 'country'])
print(daily_merged)
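Country-name mismatches such as 'USA' versus 'United States' silently remove rows from an inner merge. A minimal sketch of one way to catch them, using the `indicator=True` option of `DataFrame.merge` on made-up frames; `unmatched_countries` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Made-up stand-ins for the vaccination data and the Worldometer daily data.
vaccinations = pd.DataFrame({
    "country": ["United States", "Italy", "Seychelles"],
    "date": pd.to_datetime(["2021-05-01"] * 3),
})
daily = pd.DataFrame({
    "country": ["USA", "Italy", "Seychelles"],
    "date": pd.to_datetime(["2021-05-01"] * 3),
    "daily_new_cases": [1, 2, 3],
})


def unmatched_countries(left, right, keys):
    """Return the left-side country names that found no partner in `right`."""
    check = left.merge(right, how="left", on=keys, indicator=True)
    return sorted(check.loc[check["_merge"] == "left_only", "country"].unique())


print(unmatched_countries(vaccinations, daily, ["date", "country"]))  # ['United States']

# Normalise the name in the 'country' column only, then re-check.
daily["country"] = daily["country"].replace("USA", "United States")
print(unmatched_countries(vaccinations, daily, ["date", "country"]))  # []
```

Running such a check before the inner merge shows which countries would otherwise disappear from `daily_merged`.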

In [None]:
# dropna() returns a new DataFrame: this previews the rows with no missing values without changing `merged`
merged.dropna()
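Because `dropna()` returns a new DataFrame rather than modifying the original, calling it without assigning the result has no lasting effect. The sketch below, on a made-up frame, shows the difference, plus the `subset=` option, which restricts the check to the columns a plot actually uses so rows are not discarded over unrelated missing fields.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "country": ["Italy", "Italy", "France"],
    "total_vaccinations": [1.0, np.nan, 3.0],
    "people_fully_vaccinated": [np.nan, np.nan, 2.0],
})

frame.dropna()               # returns a new DataFrame; `frame` is unchanged
print(len(frame))            # 3

cleaned = frame.dropna()     # keep the result to actually drop rows
print(len(cleaned))          # 1, because any missing column drops the row

# Only require the column being plotted to be present.
subset_cleaned = frame.dropna(subset=["total_vaccinations"])
print(len(subset_cleaned))   # 2
```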

Selecting the six countries examined in detail below: Italy, the United States, Seychelles, India, Australia, and France.

In [None]:
# Boolean masks for the six focus countries (combined again in the final comparison plot)
Italy = daily_merged['country'] == 'Italy'
United_States = daily_merged['country'] == 'United States'
Seychelles = daily_merged['country'] == 'Seychelles'
India = daily_merged['country'] == 'India'
Australia = daily_merged['country'] == 'Australia'
France = daily_merged['country'] == 'France'
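The six masks above are later OR-ed together with `|`. A minimal sketch of an equivalent single mask built with `Series.isin`, on made-up rows standing in for `daily_merged` (the case counts are placeholders, not real data).

```python
import pandas as pd

# The six countries examined in detail in this notebook.
focus_countries = ["Italy", "United States", "Seychelles", "India", "Australia", "France"]

# Made-up rows standing in for daily_merged.
toy = pd.DataFrame({
    "country": ["Italy", "Brazil", "India", "Japan", "France"],
    "daily_new_cases": [1, 2, 3, 4, 5],
})

# One mask replaces six equality masks OR-ed together with |.
in_focus = toy["country"].isin(focus_countries)
focused = toy[in_focus]
print(focused["country"].tolist())  # ['Italy', 'India', 'France']
```

Keeping the country list in one place makes it easier to add or drop a country without editing several lines.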

In [None]:
# Last row per country (its total may be NaN if that day's figure was not reported)
vaccinated_per_country = merged.groupby('country').tail(1)[['country', 'total_vaccinations']]
print(vaccinated_per_country)
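`groupby('country').tail(1)` picks each country's final row, which may hold a missing total if that day was not reported. A sketch on made-up data comparing it with dropping missing values first, which keeps each country's latest *reported* figure instead; it assumes rows are already in date order within each country, and whether this suits the analysis is a judgment call.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "country": ["Italy", "Italy", "Italy", "France", "France"],
    "date": pd.to_datetime(["2021-05-01", "2021-05-02", "2021-05-03",
                            "2021-05-01", "2021-05-02"]),
    "total_vaccinations": [100.0, 120.0, np.nan, 50.0, 70.0],
})

# Last row per country, whatever it contains (Italy's is NaN here).
tail_values = (toy.groupby("country").tail(1)
                  .set_index("country")["total_vaccinations"]
                  .sort_index())

# Drop missing totals first to keep each country's latest reported value.
latest_reported = (toy.dropna(subset=["total_vaccinations"])
                      .groupby("country").tail(1)
                      .set_index("country")["total_vaccinations"]
                      .sort_index())

print(tail_values.to_dict())      # {'France': 70.0, 'Italy': nan}
print(latest_reported.to_dict())  # {'France': 70.0, 'Italy': 120.0}
```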

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# plot the total vaccinations per country
plt.figure(figsize=(16, 6))
plt.xticks(rotation=90)
sns.barplot(data = vaccinated_per_country[:len(vaccinated_per_country)//2], x="country", y="total_vaccinations")

In [None]:
# plot total vaccinations per country (second half); splitting into two graphs keeps the labels readable
plt.figure(figsize=(16, 6))
plt.xticks(rotation=90)
sns.barplot(data = vaccinated_per_country[len(vaccinated_per_country)//2:], x="country", y="total_vaccinations")
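The two cells above split the countries into halves by row position. A minimal sketch of a hypothetical helper, `split_for_plotting`, that generalises this to any number of parts and sorts by value first so each chart groups countries of similar size; the sorting is a swapped-in choice, not what the notebook does.

```python
import numpy as np
import pandas as pd


def split_for_plotting(frame, value_col, n_parts=2):
    """Sort by `value_col` (largest first) and split into `n_parts` roughly equal frames."""
    ordered = frame.sort_values(value_col, ascending=False).reset_index(drop=True)
    # Splitting positional indices handles lengths that are not a multiple of n_parts.
    return [ordered.iloc[idx] for idx in np.array_split(np.arange(len(ordered)), n_parts)]


# Made-up countries and totals for illustration.
toy = pd.DataFrame({"country": list("ABCDE"), "total_vaccinations": [5, 50, 20, 1, 10]})
parts = split_for_plotting(toy, "total_vaccinations", n_parts=2)
print([p["country"].tolist() for p in parts])  # [['B', 'C', 'E'], ['A', 'D']]
```

Each returned frame can then be passed to `sns.barplot(data=part, x="country", y="total_vaccinations")` in its own figure.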

In [None]:
# plot people vaccinated per hundred for each country (first half)
people_per_hundred = merged.groupby('country').tail(1)[['country', 'people_vaccinated_per_hundred']]
people_per_hundred = people_per_hundred.dropna()  # assign the result; dropna() does not modify in place
print(people_per_hundred)

plt.figure(figsize=(16, 6))
plt.xticks(rotation=90)
sns.barplot(data = people_per_hundred[:len(people_per_hundred)//2], x="country", y="people_vaccinated_per_hundred")

In [None]:
# plot people vaccinated per hundred for each country (second half)
plt.figure(figsize=(16, 6))
plt.xticks(rotation=90)
sns.barplot(data = people_per_hundred[len(people_per_hundred)//2:], x="country", y="people_vaccinated_per_hundred")

## USA: Daily Cases, Vaccination Progress, and Deaths

This plot shows the daily new cases reported in the United States. Cases were far higher at the end of 2020 and the start of 2021. They dropped rapidly in February and March 2021 and have since stayed consistently below 100,000 per day.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='United States'], x="date", y="daily_new_cases", hue = "country", kind = "line")
plt.xticks(rotation=-45)

This plot shows total vaccinations in the United States over time. The count has risen roughly linearly, and if that trend continues it will keep increasing into the summer.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='United States'], x="date", y="total_vaccinations", hue = "country", kind = "line")
plt.xticks(rotation=-45)

This graph shows the number of people vaccinated per hundred, i.e. the share of the population that has received at least one dose. The United States is approaching 50 per hundred, about half of its population.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=="United States"], x="date", y="people_vaccinated_per_hundred", kind="line")
plt.ylim(0, 100)
plt.xticks(rotation=-45)
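The per-hundred columns normalise a count by population. A minimal sketch of that arithmetic, assuming `people_vaccinated_per_hundred` is computed as people vaccinated divided by total population times 100 (the dataset's own documentation is the authority on the exact definition); `per_hundred` is a hypothetical helper, not a dataset function.

```python
def per_hundred(count, population):
    """Express `count` as a number per 100 people, i.e. a percentage of `population`."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * count / population


# Round, made-up numbers for illustration only.
share = per_hundred(165_000_000, 330_000_000)
print(share)  # 50.0
```

A value of 50 therefore means half the population, which is why this column, rather than `total_vaccinations`, is the better basis for comparing countries of different sizes.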

This plot shows the ratio of daily new deaths to active cases (the `deaths_per_cases` column) over time. The ratio has been decreasing since April/May 2020 and is now stable at around 1 death per 1,000 active cases per day. One possible reason for this decrease is that, when COVID-19 first hit, doctors had little experience treating the virus, and hospitals filled up quickly as cases surged. Over time, better treatments and clinical practice may have helped lower the number of deaths relative to cases.

In [None]:
# Ratio of daily new deaths to active cases; zero active cases become NaN to avoid division by zero
daily_data["deaths_per_cases"] = daily_data["daily_new_deaths"] / daily_data["active_cases"].replace(0, np.nan)
sns.relplot(data=daily_data[(daily_data['country'] == 'United States') & (daily_data['date'] > "2020-04-01")], x="date", y="deaths_per_cases")
plt.xticks(rotation=-45)
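The `deaths_per_cases` ratio divides each day's new deaths by that day's active cases, so it is not a case fatality ratio (cumulative deaths over cumulative cases). When `active_cases` is 0, plain float division gives `inf`, which distorts the plot. A minimal sketch on made-up rows showing the zero guard and the conversion to "deaths per 1,000 active cases", the unit used in the commentary.

```python
import numpy as np
import pandas as pd

# Made-up rows with the two columns the ratio uses.
toy = pd.DataFrame({
    "daily_new_deaths": [10.0, 0.0, 3.0, np.nan],
    "active_cases": [5000.0, 2000.0, 0.0, 1000.0],
})

# Daily deaths relative to currently active cases; 0 active cases -> NaN instead of inf.
toy["deaths_per_cases"] = toy["daily_new_deaths"] / toy["active_cases"].replace(0, np.nan)

# Rescale to deaths per 1,000 active cases.
per_thousand = 1000 * toy["deaths_per_cases"]
print(per_thousand.round(2).tolist())  # [2.0, 0.0, nan, nan]
```

Seaborn skips NaN points, so the guarded days simply drop out of the scatter plot.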

## Seychelles: Cases, Vaccination Progress, and Deaths

This plot shows daily new cases in Seychelles over time. It clearly shows a recent spike in daily new cases (May 2021).

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='Seychelles'], x="date", y="daily_new_cases", hue = "country", kind = "line")
plt.xticks(rotation=-45)

This plot shows total vaccinations in Seychelles over time. The total has been rising since the start of 2021.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='Seychelles'], x="date", y="total_vaccinations", hue = "country", kind = "line")
plt.xticks(rotation=-45)

This plot shows the number of people vaccinated per hundred. Around 70 per hundred people in Seychelles have received at least one dose. Even so, daily cases have recently spiked, a disconnect that may reflect Seychelles reopening to tourists or the spread of new COVID-19 variants. The spike also raises questions about the vaccine's effectiveness.

In [None]:
sns.relplot(data=daily_merged[daily_merged['country']=="Seychelles"], x="date", y="people_vaccinated_per_hundred", kind="line")
plt.ylim(0, 100)
plt.xticks(rotation=-45)

This plot shows deaths per active case over time in Seychelles. The ratio has consistently been zero, apart from occasional days at around 2 to 6 deaths per 1,000 active cases.

In [None]:
sns.relplot(data = daily_data[(daily_data['country']=='Seychelles') & (daily_data['date']>"2020-04-01")], x="date", y="deaths_per_cases")
plt.xticks(rotation=-45)

## India: Cases, Vaccination Progress, and Deaths

This first plot shows daily new cases in India over time. Unlike in most of the countries examined here, India's daily new cases spiked in spring 2021 after rising steeply (possibly exponentially) since the start of the year, reaching around 400,000 new cases per day.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='India'], x="date", y="daily_new_cases", hue = "country", kind = "line")
plt.xticks(rotation=-45)
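Whether growth is exponential can be checked rather than judged by eye: exponential growth is a straight line in log space. A minimal sketch fitting log(counts) against day number with `np.polyfit`; `exponential_growth_rate` is a hypothetical helper, and the synthetic series below is not India's data. Applying it to the notebook would mean passing a window of strictly positive `daily_new_cases` values for one country, which is an assumption about the data.

```python
import numpy as np


def exponential_growth_rate(counts):
    """Fit log(counts) = a + r*t by least squares; return (r, doubling time in days).

    Assumes strictly positive counts sampled once per day.
    """
    counts = np.asarray(counts, dtype=float)
    if np.any(counts <= 0):
        raise ValueError("counts must be positive to take logarithms")
    t = np.arange(len(counts))
    r, _ = np.polyfit(t, np.log(counts), 1)  # slope first, then intercept
    doubling = np.log(2) / r if r > 0 else np.inf
    return r, doubling


# Synthetic series that doubles every 10 days.
days = np.arange(30)
synthetic = 1000 * 2 ** (days / 10)
rate, doubling_days = exponential_growth_rate(synthetic)
print(round(doubling_days, 2))  # 10.0
```

The visual counterpart is `plt.yscale("log")` on the plot above: a roughly straight line suggests exponential growth, while visible curvature suggests it is not.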

This graph shows total vaccinations in India over time. Although the total looks large, India is one of the two most populous countries in the world, so the total alone says little about how much of the population is covered.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='India'], x="date", y="total_vaccinations", hue = "country", kind = "line")
plt.xticks(rotation=-45)

This graph shows people vaccinated per hundred in India. Only about 10 per hundred have received at least one dose, which could be one reason India's case numbers have not improved the way other countries' have.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='India'], x="date", y="people_vaccinated_per_hundred", kind = "line")
plt.ylim(0, 100)
plt.xticks(rotation=-45)

This graph shows that deaths per active case in India have recently been increasing. The ratio fell to an average of about 1 per 1,000 in fall and winter 2020, but in May 2021 a tail of points rises above the rest, suggesting a trend toward more than 2 deaths per 1,000 active cases. This is consistent with news reports that cases in India have been spiking while vaccination progresses slowly.


In [None]:
sns.relplot(data = daily_data[(daily_data['country']=='India') & (daily_data['date']>"2020-04-01")], x="date", y="deaths_per_cases")
plt.xticks(rotation=-45)

## Italy: Cases, Deaths, and Vaccination Progress

This plot shows daily new cases in Italy, with the orange line showing a centered 7-day average to make the trend clearer. Daily new cases spiked at the beginning of 2021, decreased, and spiked again around March 2021. Recently they have been decreasing again.


In [None]:
import matplotlib.dates as mdates

# Centered 7-day average computed within each country, so windows never mix countries
# (assumes rows are in date order within each country)
daily_merged['7_day_avg'] = (daily_merged.groupby('country')['daily_new_cases']
                             .transform(lambda s: s.rolling(7, center=True).mean()))

italy_data = daily_merged[daily_merged['country'] == 'Italy']
fig, ax = plt.subplots(figsize=(15, 9))
sns.lineplot(data=italy_data, x="date", y="daily_new_cases", label="daily", ax=ax)
sns.lineplot(data=italy_data, x="date", y="7_day_avg", label="7_day_avg", ax=ax)

# assign locator and formatter for the x-axis ticks, then rotate the labels
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y.%m.%d'))
ax.tick_params(axis='x', labelrotation=-45)
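`daily_merged` stacks every country in one frame, so a rolling window applied to the whole `daily_new_cases` column averages across the boundary between one country and the next. A minimal sketch on two made-up countries contrasting an ungrouped window with one computed per country via `groupby(...).transform`; it also checks that, for an odd window, `rolling(7).mean().shift(-3)` matches `rolling(7, center=True).mean()`.

```python
import numpy as np
import pandas as pd

# Two made-up countries stacked in one frame, as in daily_merged.
toy = pd.DataFrame({
    "country": ["A"] * 8 + ["B"] * 8,
    "daily_new_cases": [10.0] * 8 + [1000.0] * 8,
})

# Ungrouped centered window: rows near the A/B boundary average both countries.
toy["ungrouped"] = toy["daily_new_cases"].rolling(7, center=True).mean()

# Grouped centered window: each country's average only uses its own rows;
# rows without 3 neighbours on each side within the country stay NaN.
toy["grouped"] = (toy.groupby("country")["daily_new_cases"]
                     .transform(lambda s: s.rolling(7, center=True).mean()))

# For an odd window, a trailing mean shifted back by 3 rows is the same centered mean.
shifted = toy["daily_new_cases"].rolling(7).mean().shift(-3)

print(toy.loc[[4, 5], ["ungrouped", "grouped"]])
```

Row 5 belongs to country A, yet its ungrouped average (about 151) is pulled up by country B's first row, while the grouped version leaves it NaN.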

This plot shows total vaccinations in Italy, which have risen sharply since the start of 2021.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='Italy'], x="date", y="total_vaccinations", hue = "country", kind = "line")
plt.xticks(rotation=-45)

This plot shows the number of people vaccinated per hundred over time. Around 30 per hundred people in Italy have received at least one dose.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='Italy'], x="date", y="people_vaccinated_per_hundred", kind = "line")
plt.ylim(0, 100)
plt.xticks(rotation=-45)

This graph shows deaths per active case in Italy. The ratio was highest in spring and summer 2020 and has been decreasing since; it is now consistently around 1 per 1,000.


In [None]:
sns.relplot(data = daily_data[(daily_data['country']=='Italy') & (daily_data['date']>"2020-04-01")], x="date", y="deaths_per_cases")
plt.xticks(rotation=-45)

## Australia: Cases, Deaths, and Vaccination Progress

This first plot shows daily new cases in Australia over time. The y-axis scale shows that Australia has very few daily new cases, although the number has risen slightly recently.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='Australia'], x="date", y="daily_new_cases", hue = "country", kind = "line")
plt.xticks(rotation=-45)

This graph shows that total vaccinations in Australia have been increasing linearly since March 2021.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='Australia'], x="date", y="total_vaccinations", hue = "country", kind = "line")
plt.xticks(rotation=-45)

Although total vaccinations in Australia have been increasing, this graph of people vaccinated per hundred shows that fewer than 5 percent of the population has received at least one dose.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='Australia'], x="date", y="people_vaccinated_per_hundred", kind = "line")
plt.ylim(0, 100)
plt.xticks(rotation=-45)

This graph shows that deaths per active case in Australia were high in the fall of 2020 but have been consistently zero since the start of 2021.

In [None]:
sns.relplot(data = daily_data[(daily_data['country']=='Australia') & (daily_data['date']>"2020-04-01")], x="date", y="deaths_per_cases")
plt.xticks(rotation=-45)

## France: Cases, Deaths, and Vaccination Progress

This graph shows that daily new cases in France spiked in April 2021, peaking at around 60,000, and have been decreasing since.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='France'], x="date", y="daily_new_cases", hue = "country", kind = "line")
plt.xticks(rotation=-45)

This graph shows that total vaccinations in France have increased steadily over time.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='France'], x="date", y="total_vaccinations", hue = "country", kind = "line")
plt.xticks(rotation=-45)

This graph shows people vaccinated per hundred over time. Around 30 per hundred people in France have received at least one dose.

In [None]:
sns.relplot(data = daily_merged[daily_merged['country']=='France'], x="date", y="people_vaccinated_per_hundred", kind = "line")
plt.ylim(0, 100)
plt.xticks(rotation=-45)

This graph shows that deaths per active case in France peaked at around 3 per 100 in May 2020. The ratio has fallen steeply since then and is now consistently around 1 per 1,000.

In [None]:
sns.relplot(data = daily_data[(daily_data['country']=='France') & (daily_data['date']>"2020-04-01")], x="date", y="deaths_per_cases")
plt.xticks(rotation=-45)

This final plot compares the 7-day average of daily new cases across the six countries examined above. It shows that cases in India have been spiking while most of the other countries have seen daily cases fall. It also reflects differences in population size: the United States and India have recorded many cases, in large part because of their large populations; France, Italy, and Australia have mid-sized populations; and Seychelles, with a very small population, has the lowest number of daily cases.

In [None]:
subset=daily_merged[Italy|United_States|Seychelles|India|Australia|France]
sns.relplot(data = subset, x="date", y="7_day_avg", hue = "country", kind="line")
plt.xticks(rotation=-45)
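Because raw counts largely track population size, a per-capita view separates size from severity. A minimal sketch of normalising to cases per million people; `add_per_million` is a hypothetical helper, and the countries, populations, and averages below are made up. In the notebook, a `population` column might be merged in from the summary frame `df` if that file provides one (an assumption; check `df.columns` first).

```python
import pandas as pd


def add_per_million(frame, count_col, population_col="population"):
    """Return a copy of `frame` with a `<count_col>_per_million` column added."""
    out = frame.copy()
    out[f"{count_col}_per_million"] = out[count_col] / out[population_col] * 1_000_000
    return out


# Hypothetical countries, populations, and 7-day averages (not real data).
toy = pd.DataFrame({
    "country": ["Large", "Small"],
    "7_day_avg": [30_000.0, 30.0],
    "population": [300_000_000, 100_000],
})
normalized = add_per_million(toy, "7_day_avg")
print(normalized[["country", "7_day_avg", "7_day_avg_per_million"]])
```

Here "Small" has a thousand times fewer cases than "Large" in absolute terms but three times as many per million people, so a small country can rank lowest on the raw plot while having the heaviest per-capita burden.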