<h1>Covid-19 Data Analysis and Predictions</h1>


<h2>Overview</h2>

- In 2019, a new virus called coronavirus was first discovered in the city of Wuhan, China.
- Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. Most infected people experience mild to moderate respiratory illness and recover without requiring special treatment.
- Older people, however, particularly those with a medical history of cardiovascular disease, diabetes, chronic respiratory disease or cancer, are more likely to develop serious illness.
- The virus is transmitted from person to person through droplets of saliva or discharge from the nose when an infected person coughs or sneezes.

<h2>Acknowledgements and data source</h2>

- Center for Systems Science and Engineering (CSSE) at Johns Hopkins University - [JHU_Covid-19_Dataset](https://github.com/CSSEGISandData/COVID-19)
- Worldometers - [Worldometers](https://www.worldometers.info)

<h3>Install and import packages into the environment</h3>

In [None]:
#install the packages required for the environment.
!pip install chart_studio
!pip install autoviz
!pip install xlrd
#the PyPI package is named scikit-learn; it is still imported as sklearn
!pip install -q scikit-learn

#import packages into the environment
import pandas as pd
import numpy as np
import chart_studio.plotly as py
import cufflinks as cf
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from autoviz.AutoViz_Class import AutoViz_Class
import h2o
from h2o.automl import H2OAutoML
#start (or connect to) a local H2O cluster
h2o.init()


import tensorflow as tf

from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split


%matplotlib inline

# Make Plotly work in your Jupyter Notebook
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
# Use Plotly locally
cf.go_offline()

print("Packages imported into the environment successfully")

<h3>Load and inspect the dataset</h3>


    
<h5>Now that we have installed and imported the packages we need, let's load the dataset for inspection and analysis.</h5>

In [None]:
#load the dataset
df = pd.read_csv("../input/corona-virus-report/country_wise_latest.csv")

print("'Dataset information'")
print()
df.info()

In [None]:
#columns in the dataset
df.columns

In [None]:
#describe the dataset
df.describe()


<h4>Let's examine the countries with the highest confirmed, death, recovered and active numbers.</h4>

In [None]:
#max number of confirmed, recovered, deaths and active.
maxConfirmed = df.loc[df["Confirmed"].idxmax()]
maxRecovered = df.loc[df["Recovered"].idxmax()]
maxDeaths = df.loc[df["Deaths"].idxmax()]
maxActive = df.loc[df["Active"].idxmax()]

print("-----------Max Confirmed number--------------")
print(maxConfirmed)
print()
print("-----------Max Deaths number-----------------")
print(maxDeaths)
print()
print("-----------Max Recovered number--------------")
print(maxRecovered)
print()
print("-----------Max Active number-----------------")
print(maxActive)
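The cell above reports only the single highest row per column. To list the top five countries instead, one option is pandas' `nlargest`. The sketch below uses a small made-up frame (the country labels and counts are hypothetical, not taken from the dataset); in the notebook the same call would be applied to the loaded `df`.

```python
import pandas as pd

# Hypothetical toy data; in the notebook, use the loaded `df` instead.
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D", "E", "F", "G"],
    "Confirmed": [500, 120, 900, 40, 310, 75, 660],
})

# nlargest returns the n rows with the highest values in the given column,
# already sorted in descending order.
top5 = toy.nlargest(5, "Confirmed")[["Country/Region", "Confirmed"]]
print(top5)
```

The same pattern should work for `"Deaths"`, `"Recovered"` and `"Active"` by changing the column name.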

**Let's visualise the top 10 countries by confirmed cases in descending order**

In [None]:

#display the 10 countries with the most confirmed cases
df = df.sort_values("Confirmed", ascending = False)
fig = px.bar(df.head(10), y = "Confirmed", x = "Country/Region",
            text = "Confirmed", color = "Country/Region")
#format bar labels with 2 significant digits and an SI suffix, placed outside the bars
fig.update_traces(texttemplate = "%{text:.2s}", textposition = "outside")
#set the minimum font size for the bar labels
fig.update_layout(uniformtext_minsize = 8)
#rotate the x-axis labels by 45 degrees
fig.update_layout(xaxis_tickangle=-45)
fig.update_layout(legend_title_text='Confirmed (Covid-19)')
fig.update_layout(title_text='Top 10 Confirmed cases in the world')
fig
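The `texttemplate` string `%{text:.2s}` uses a d3-style format: two significant digits followed by an SI suffix such as `k` or `M`. As a rough illustration of what that does to a label, here is a plain-Python approximation. The helper name `si_format` is hypothetical, it assumes counts of zero or at least 1, and it does not reproduce every edge case of the real formatter (for example, values that round up across a prefix boundary).

```python
import math

def si_format(value, sig=2):
    """Approximate a d3-style '.2s' label for non-negative counts (sketch only)."""
    if value == 0:
        return "0"  # zero is special-cased in this sketch
    prefixes = ["", "k", "M", "G", "T"]
    # Pick the SI prefix from the number of thousands-groups in the value.
    group = min(int(math.floor(math.log10(value) / 3)), len(prefixes) - 1)
    group = max(group, 0)
    scaled = value / 1000 ** group
    # Decimal places needed to keep `sig` significant digits.
    decimals = sig - 1 - int(math.floor(math.log10(scaled)))
    rounded = round(scaled, decimals)
    return f"{rounded:.{max(decimals, 0)}f}{prefixes[group]}"

# Illustrative values only, not figures from the dataset.
print(si_format(4290259))  # 4.3M
print(si_format(148011))   # 150k
print(si_format(42))       # 42
```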

**Top 10 countries by reported deaths**

In [None]:
#display the 10 countries with the most deaths
df = df.sort_values("Deaths", ascending = False)
fig = px.bar(df.head(10), y = "Deaths", x = "Country/Region",
            text = "Deaths", color = "Country/Region")
#format bar labels with 2 significant digits and an SI suffix, placed outside the bars
fig.update_traces(texttemplate = "%{text:.2s}", textposition = "outside")
#set the minimum font size for the bar labels
fig.update_layout(uniformtext_minsize = 8)
#rotate the x-axis labels by 45 degrees
fig.update_layout(xaxis_tickangle=-45)
fig.update_layout(legend_title_text='Deaths (Covid-19)')
fig.update_layout(title_text='Top 10 Death cases in the world')
fig

**Top 10 countries by recovered cases**

In [None]:
#display the 10 countries with the most recovered cases
df = df.sort_values("Recovered", ascending = False)
fig = px.bar(df.head(10), y = "Recovered", x = "Country/Region",
            text = "Recovered", color = "Country/Region")
#format bar labels with 2 significant digits and an SI suffix, placed outside the bars
fig.update_traces(texttemplate = "%{text:.2s}", textposition = "outside")
#set the minimum font size for the bar labels
fig.update_layout(uniformtext_minsize = 8)
#rotate the x-axis labels by 45 degrees
fig.update_layout(xaxis_tickangle=-45)
fig.update_layout(legend_title_text='Recovered (Covid-19)')
fig.update_layout(title_text='Top 10 Recovered cases in the world')
fig

Scatter matrix plot between the confirmed, deaths, recovered and active cases

In [None]:

#scatter matrix plot between the confirmed, deaths, recovered and active cases.
fig = px.scatter_matrix(df.head(10), dimensions=["Confirmed","Deaths","Recovered","Active"], color = "Country/Region")
fig

Scatter matrix of new cases, new deaths and new recovery.

In [None]:
#scatter matrix of new cases, new deaths and new recovery.
fig = px.scatter_matrix(df.head(10), dimensions = ["New cases","New deaths","New recovered"], color = "Country/Region")
fig

Automatic visualisation of the entire data to find hidden patterns or insight.

In [None]:
av = AutoViz_Class()
#AutoViz reads the CSV itself, so pass the file path (kept separate from the DataFrame name df)
filename = "../input/corona-virus-report/country_wise_latest.csv"
sep = ","
draftAutoViz = av.AutoViz(
            filename,
            sep = sep,
            depVar="",
            dfte = None,
            header = 0,
            verbose = 0,
            lowess = False,
            chart_format = "svg",
            max_cols_analyzed=30,
            max_rows_analyzed=1500000,
            )


Create a prediction

In [None]:
df = pd.read_csv("../input/corona-virus-report/country_wise_latest.csv")

df = df.sort_values("Confirmed", ascending = False)
df

In [None]:
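#split the dataset into features (x) and target (y), then into train and test sets
#"Confirmed" is the target; only numeric columns are kept as features
features = df.drop(columns=["Confirmed"]).select_dtypes(include="number")
target = df["Confirmed"]
#a single call returns matching train/test pieces for both features and target
x_train, x_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=1)
print(len(x_train), "x_train Dataset")
print(len(y_train), "y_train Dataset")
print(len(x_test), "x_test Dataset")
print(len(y_test), "y_test Dataset")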
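As a small aside on how `train_test_split` behaves: given one array it returns two pieces (train rows and test rows), and given a feature array plus a target array it returns four pieces split on the same rows. The sketch below shows this on made-up arrays; `X`, `y` and the `random_state` value are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, 1 target value per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# One array in -> two outputs (train rows, test rows).
rows_train, rows_test = train_test_split(X, test_size=0.2, random_state=1)

# Two arrays in -> four outputs, with X and y split on the same rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Because both arrays are shuffled together, each row of `X_train` still lines up with its own value in `y_train`.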


In [None]:
x_train = np.array(x_train)
y_train = np.array(y_train)
x_test = np.array(x_test)
y_test = np.array(y_test)

x_train


In [None]:
df = h2o.import_file("../input/corona-virus-report/country_wise_latest.csv")

#sort by column index 1 ("Confirmed") in descending order
df2 = df.sort(1, ascending=False)
df2


In [None]:


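#split into roughly 70% train, 15% test and 15% validation frames
train,test,valid = df2.split_frame(ratios=[.7, .15])

#use every column except the target as a predictor
x = train.columns
y = "Confirmed"
x.remove(y)

#"Confirmed" is a numeric count, so it stays numeric and AutoML treats this as regression
#(asfactor() would turn every distinct count into its own class)

aml = H2OAutoML(max_models=20, seed = 1)
aml.train(x=x,y=y,training_frame=train)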


In [None]:
lb = aml.leaderboard
lb.head(rows=lb.nrows)

In [None]:
#drop the target column, then predict on the test frame with the best model
test = test.drop("Confirmed")
predictions = aml.leader.predict(test)

In [None]:
predictions.describe()