In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## ****1. Overview****
This notebook contains Covid data from the **World Health Organization** and vaccine data from **Our World in Data**. 

**Questions I hope to answer:**
* Is there a connection between a country's GDP and the number of Covid-related deaths it has?
* Is there a connection between the number of vaccines administered and the number of Covid-related deaths per country?
* Which country is most likely to see its cases plummet?
* In the US (by state), when can we expect cases to plummet?

## ****2. Data Profile****

**Datasets:**
* Dataset One: Global Covid Stats: https://github.com/owid/covid-19-data/tree/master/public/data/
* Dataset Two: US Covid & Vaccine Stats by State: https://github.com/CSSEGISandData/COVID-19
        
**Prior Work**
* The CDC has used Covid data to visualize risk profiles and forecasts. I also hope to visualize a country's risk by comparing the number of vaccines administered to the number of Covid cases, and to connect it to the country's GDP and number of deaths. Like the CDC, I want to forecast which countries are on the brink of getting better and which are likely to get worse. https://covid.cdc.gov/covid-data-tracker/#datatracker-home
* CNN recently posted an article predicting "cases and deaths to plummet". It was backed by data and predictions on the percentage of vaccinations the country needs to reach. The article states that once first-dose vaccinations reach 60%, we can expect positive change. I would also like to estimate when this may occur based on the trend of vaccines being administered. https://www.cnn.com/2021/05/10/health/us-coronavirus-monday/index.html

**Entities/terms/features needing to be extracted:**
I aim to pull data about new Covid cases, dates, new deaths, number of people fully vaccinated (and first doses), each country's GDP, and more.

**Are there restrictions or limitations to using the data?**
No 

**What would you need or not need from it to explore your question?**
As noted above, I would pull the rows and columns on new Covid cases, dates, new deaths, number of people fully vaccinated (and first doses), each country's GDP, and more.

**Questions I hope to answer:**
* Is there a connection between a country's GDP and the number of Covid-related deaths it has?
* Is there a connection between the number of vaccines administered and the number of Covid-related deaths per country?
* Which country is most likely to see its cases plummet?
* In the US (by state), when can we expect cases to plummet?

**How will this data be analyzed?**
I plan to use Plotly. I'll use aggregations and correlations to compare the data and look for patterns. 
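
A minimal sketch of what "aggregations and correlations" could look like in pandas. The rows below are made up for illustration (they are not OWID values); only the column names mirror the OWID file.

```python
import pandas as pd

# Made-up illustrative rows (NOT real OWID values); column names mirror the OWID CSV.
sample = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia", "Asia"],
    "new_cases": [100.0, 200.0, 300.0, 500.0],
    "new_deaths": [1.0, 2.0, 3.0, 5.0],
})

# Aggregation: total cases and deaths per continent.
totals = sample.groupby("continent")[["new_cases", "new_deaths"]].sum()

# Correlation: Pearson r between daily new cases and daily new deaths.
r = sample["new_cases"].corr(sample["new_deaths"])

print(totals)
print(f"Pearson r = {r:.2f}")
```

In this toy data deaths are exactly 1% of cases, so the correlation is perfect; real data would be far noisier, and a correlation on its own does not show cause.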

**How will I use the entities/terms/features?**
I hope to use what I collect to predict a country's future in relation to Covid, and what factors can affect this prediction.

## ****3. Analysis****

Using Plotly to answer the questions:
* Is there a connection between a country's GDP and the number of Covid-related deaths it has?
* Is there a connection between the number of vaccines administered and the number of Covid-related deaths per country?
* Which country is most likely to see its cases plummet?
* In the US (by state), when can we expect cases to plummet?

**First I need to read the CSVs, and then clean them up a bit to drop data I don't need for my analysis.**

In [None]:
#Opening the global covid CSV file and printing the head

global_data = pd.read_csv("../input/globalcovidcsv/owid-covid-data (2).csv")
global_data.head()

In [None]:
#Cleaning up the data, and seeing the most recent cases to date

global_data = global_data[["continent", "location", "date", "total_cases", "new_cases", "new_deaths", "gdp_per_capita", "people_vaccinated", "people_fully_vaccinated"]]
global_data.tail(10)

In [None]:
#Opening the state level covid and vaccine stats by state CSV file and printing the head

state_data = pd.read_csv("../input/covidusbystatecsv/01-01-2021.csv")
state_data.head()

In [None]:
#Cleaning up the data, and seeing the most recent cases to date

state_data = state_data[["Province_State", "Confirmed", "Deaths", "Case_Fatality_Ratio", "Recovered"]]
state_data.tail(10)

****Is there a connection between a country's GDP and the number of Covid-related deaths it has?****

It looks like the higher a country's GDP per capita is, the more Covid cases it reports. Note that the chart below plots new cases, not deaths. 

In [None]:
import plotly.express as px

#Remove the rows that have no continent value
global_data = global_data[global_data.continent.notnull()]

#Create line chart
fig = px.line(global_data, x= "gdp_per_capita", y="new_cases", color="continent", line_group="continent", hover_name="continent",
        line_shape="spline", render_mode="svg", labels={'new_cases':'# of Covid Cases', 'continent': 'Continent', 'gdp_per_capita': 'GDP Per Capita'})
fig.show()
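
Because `gdp_per_capita` stays constant for a country across dates, plotting it against daily rows puts many points at the same x value. One possible alternative, sketched below, is to collapse the data to one row per country and draw a scatter plot. `country_summary` and `gdp_vs_deaths_figure` are hypothetical helper names, and the demo rows are invented rather than taken from OWID.

```python
import pandas as pd

def country_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse daily rows to one row per country (hypothetical helper)."""
    summary = (
        df.groupby("location", as_index=False)
          .agg(gdp_per_capita=("gdp_per_capita", "first"),
               total_deaths=("new_deaths", "sum"))
    )
    # Countries without a GDP figure cannot be placed on the x axis.
    return summary.dropna(subset=["gdp_per_capita"])

def gdp_vs_deaths_figure(summary: pd.DataFrame):
    """Scatter plot with one point per country (hypothetical helper)."""
    import plotly.express as px  # imported here so country_summary works without Plotly
    return px.scatter(summary, x="gdp_per_capita", y="total_deaths",
                      hover_name="location", log_x=True,
                      labels={"gdp_per_capita": "GDP Per Capita",
                              "total_deaths": "Total Covid Deaths"})

# Invented rows for illustration only (not real OWID values).
demo = pd.DataFrame({
    "location": ["A", "A", "B", "B", "C"],
    "new_deaths": [2.0, 3.0, 10.0, None, 7.0],
    "gdp_per_capita": [1500.0, 1500.0, 40000.0, 40000.0, None],
})
summary = country_summary(demo)
print(summary)
```

In this notebook the call would be `gdp_vs_deaths_figure(country_summary(global_data)).show()`. Total deaths are not adjusted for population here, so larger countries will tend to sit higher on the chart regardless of GDP.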

****Is there a connection between the number of vaccines administered and the number of Covid-related deaths per country?****

Based on the chart, vaccination does not yet appear to make much of a difference, as the number of new deaths does not clearly fall as more people are fully vaccinated. However, this could be because vaccinations rolled out not too long ago, and it may be another few months or years before the virus stops spreading and/or everyone gets vaccinated. 

In [None]:
import plotly.express as px

#Create line chart
fig = px.line(global_data, x="people_fully_vaccinated", y="new_deaths", color="continent", line_group="continent", hover_name="continent",
        line_shape="spline", render_mode="svg", labels={'new_deaths':'New Deaths', 'continent': 'Continent', 'people_fully_vaccinated': '# of People Fully Vaccinated'})
fig.show()


****Which country is most likely to see its cases plummet?****

According to the pie chart, Africa has the lowest number of cases recorded. Since the chart groups cases by continent rather than by country, this could possibly indicate that Africa would beat the virus before any other continent.

In [None]:
#Create pie chart
fig = px.pie(global_data, values='new_cases', names='continent', title='Total Cases of Covid Per Continent')
fig.show()

import plotly.express as px

#Create line chart
fig = px.line(global_data, x="date", y="new_cases", color="continent", line_group="continent", hover_name="continent",
        line_shape="spline", render_mode="svg", labels={'new_cases':'# of Covid Cases', 'continent': 'Continent', 'date': 'Timeline by Month'})
fig.show()

#Isolate the United States
united_states = global_data[global_data.location == 'United States']

#Create Bar Chart
fig = px.bar(united_states, x='date', y='new_cases',
             hover_data=['date', 'new_cases'], color='new_deaths',
             labels={'new_cases':'# of US Covid Cases', 'new_deaths': '# of Deaths', 'date': 'Timeline by Month'}, height=400)
fig.show()
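
A low case total does not by itself say where cases are falling. One rough way to compare countries on direction instead is to smooth `new_cases` with a rolling average and measure the percent change over the most recent days. The sketch below assumes a 7-day window and a 14-day lookback (both arbitrary choices), `recent_trend` is a hypothetical helper, and the demo data is invented. A falling recent trend is a description of the past, not a forecast.

```python
import pandas as pd

def recent_trend(df: pd.DataFrame, window: int = 7, lookback: int = 14) -> pd.Series:
    """Percent change in the rolling mean of new_cases over the last `lookback` days.

    Returns one value per location, sorted so the steepest declines come first.
    """
    # ISO-formatted date strings (YYYY-MM-DD) sort correctly as plain text.
    df = df.sort_values(["location", "date"])
    changes = {}
    for location, group in df.groupby("location"):
        smoothed = group["new_cases"].rolling(window).mean().dropna()
        # Skip locations with too little history or a zero baseline.
        if len(smoothed) > lookback and smoothed.iloc[-1 - lookback] != 0:
            baseline = smoothed.iloc[-1 - lookback]
            changes[location] = (smoothed.iloc[-1] / baseline - 1.0) * 100.0
    return pd.Series(changes, dtype=float).sort_values()

# Invented 21-day series for illustration only.
days = pd.date_range("2021-01-01", periods=21).strftime("%Y-%m-%d")
rising = [10.0 * (i + 1) for i in range(21)]
demo = pd.DataFrame({
    "location": ["Rising"] * 21 + ["Falling"] * 21,
    "date": list(days) * 2,
    "new_cases": rising + rising[::-1],
})
trend = recent_trend(demo)
print(trend)
```

In the notebook, `recent_trend(global_data).head(10)` would list the ten locations with the steepest recent decline under these assumptions.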

****In the US, when can we expect our cases to plummet? Also, which state is most likely to end its Covid cases?****

California and Texas would likely take the longest, since they have the most confirmed cases. Florida comes in third.

In [None]:
state_data = state_data[state_data.Province_State.notnull()]

#Create line chart
fig = px.line(state_data, x="Province_State", y="Confirmed",
        line_shape="spline", render_mode="svg", labels={'Province_State': 'State', 'Confirmed': '# of Confirmed Cases'})
fig.show()
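
Since states are categories rather than points on a continuous axis, a ranked horizontal bar chart may be easier to read than a line. The sketch below is one way to build it with Plotly Express; `top_states` and `states_bar_figure` are hypothetical helper names and the demo counts are invented.

```python
import pandas as pd

def top_states(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n states with the most confirmed cases, smallest first.

    Plotly draws horizontal bars from the bottom up, so ascending order
    puts the largest bar at the top of the chart.
    """
    return df.nlargest(n, "Confirmed").sort_values("Confirmed")

def states_bar_figure(df: pd.DataFrame):
    """Horizontal bar chart of confirmed cases per state (hypothetical helper)."""
    import plotly.express as px  # imported here so top_states works without Plotly
    return px.bar(df, x="Confirmed", y="Province_State", orientation="h",
                  labels={"Confirmed": "# of Confirmed Cases",
                          "Province_State": "State"})

# Invented counts for illustration only.
demo = pd.DataFrame({
    "Province_State": ["State A", "State B", "State C", "State D"],
    "Confirmed": [500, 2000, 1200, 50],
})
ranked = top_states(demo, n=3)
print(ranked)
```

In the notebook this would be `states_bar_figure(top_states(state_data)).show()`.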

# Conclusions/Directions for future work

Summary:

****Is there a connection between a country's GDP and the number of Covid-related deaths it has?****
It looks like the higher a country's GDP per capita is, the more Covid cases it reports. 

****Is there a connection between the number of vaccines administered and the number of Covid-related deaths per country?****
Based on the chart, vaccination does not yet appear to make much of a difference, as the number of new deaths does not clearly fall as more people are fully vaccinated. However, this could be because vaccinations rolled out not too long ago, and it may be another few months or years before the virus stops spreading and/or everyone gets vaccinated. 

****Which country is most likely to see its cases plummet?****
According to the pie chart, Africa has the lowest number of cases recorded. Since the chart groups cases by continent rather than by country, this could possibly indicate that Africa would beat the virus before any other continent.

****In the US, when can we expect our cases to plummet? Also, which state is most likely to end its Covid cases?****
California and Texas would likely take the longest, since they have the most confirmed cases. Florida comes in third.

**Final Thoughts**

I initially wanted to predict when Covid cases would get better, but it was hard to do this using only Plotly. If I could do it again, I would try different graphs and machine learning tools. This was a good exercise in combing through datasets and providing visualizations.

The hardest part was plotting the GDP on a graph. Since it barely changes over time, the X axis was just a long line. I needed to do a bit more research on which graph would have visualized this better. I also had trouble building horizontal bar charts, and I feel that adding one could have helped visualize the number of cases, number of deaths, and number of vaccines per country.
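
As a direction for future work, here is a minimal sketch of the kind of estimate I originally had in mind: fit a straight line to a daily vaccination percentage and extrapolate to the 60% first-dose threshold mentioned in the CNN article. The linear trend is a strong simplifying assumption, `days_until_target` is a hypothetical helper, and the demo series is invented. The input is assumed to be a percentage of the population, which would need the population column that I dropped during cleaning.

```python
import numpy as np
import pandas as pd

def days_until_target(share_vaccinated: pd.Series, target: float = 60.0) -> float:
    """Extrapolate a straight-line fit of a daily percentage series to `target`.

    Returns the number of days after the last observation, or NaN when there
    are fewer than two points or the fitted trend is flat or falling.
    """
    y = share_vaccinated.dropna().to_numpy(dtype=float)
    if len(y) < 2:
        return float("nan")
    x = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    if slope <= 0:
        return float("nan")
    day_reached = (target - intercept) / slope
    # Zero means the fitted line is already at or above the target.
    return max(day_reached - x[-1], 0.0)

# Invented series: 40% rising by one percentage point per day for 10 days.
demo = pd.Series([40.0 + i for i in range(10)])
remaining = days_until_target(demo)
print(f"Estimated days until 60%: {remaining:.1f}")
```

Real vaccination curves usually flatten as uptake slows, so a straight line will likely underestimate the time needed; a curve that levels off would be a natural next step.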