# %% [markdown]
# ### What is COVID-19?

# %% [markdown]
# COVID-19 is a respiratory illness caused by a new virus. Symptoms include fever, coughing, sore throat and shortness of breath. The virus can spread from person to person, but good hygiene can prevent infection.

# %% [markdown]
# ### Related Information about COVID-19

# %% [markdown]
# COVID-19 may not be fatal, but it spreads faster than other diseases such as the common cold. Every virus has a Basic Reproduction Number (R0), which indicates how many people, on average, will catch the disease from one infected person. According to initial research, the R0 of COVID-19 is about 2.7.
# 
# Currently the goal of scientists around the world is to "Flatten the Curve". COVID-19 currently shows an exponential growth rate around the world, which we will see in the notebook ahead. Flattening the Curve means that even if the number of Confirmed Cases keeps increasing, those cases should be spread over a longer period of time. Put simply, if COVID-19 is going to infect 100K people, those infections should occur over a year rather than within a month.
# 
# The main reason to Flatten the Curve is to reduce the load on medical systems, which also leaves more room to focus on research into a treatment for the disease.
# 
# Every Pandemic has four stages:
# 
# Stage 1: Confirmed Cases come from other countries
# 
# Stage 2: Local Transmission Begins
# 
# Stage 3: Communities impacted with local transmission
# 
# Stage 4: Significant Transmission with no end in sight
# 
# Italy, USA, UK and France are the four countries which are currently in Stage 4,
# while India is on the edge of Stage 3.
# 
# Besides travel bans, cross-border shutdowns and bans on immigration, other ways to tackle a disease like COVID-19 are Testing, Contact Tracing and Quarantine.
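
# %% [markdown]
# To make the R0 and "Flatten the Curve" ideas above more concrete, here is a minimal sketch. It assumes a deliberately simplified model in which every infected person infects exactly R0 others in each generation, and it reuses the 100K example from the text. The function names are hypothetical and not part of the original analysis.

```python
# Simplified illustration only: real epidemics do not follow a fixed R0 per generation.
def new_infections_per_generation(r0, generations):
    """Expected new infections in each generation, starting from a single case."""
    return [r0 ** g for g in range(generations + 1)]


def average_daily_cases(total_cases, days):
    """Average number of new cases per day if total_cases are spread evenly over days."""
    return total_cases / days


r0 = 2.7  # value quoted in the text above
generations = new_infections_per_generation(r0, 4)
for g, n in enumerate(generations):
    print(f"Generation {g}: about {n:.1f} new infections")

# The same 100K infections place very different loads on hospitals
# depending on how long the outbreak takes.
one_month = average_daily_cases(100_000, 30)
one_year = average_daily_cases(100_000, 365)
print(f"Spread over a month: about {one_month:.0f} cases per day")
print(f"Spread over a year:  about {one_year:.0f} cases per day")
```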

# %% [markdown]
# #### Interesting YouTube Videos related to COVID-19
# Gravitas: Why is the WHO Director General toeing China's line? | Coronavirus: 
# https://www.youtube.com/watch?v=O1NGzmDVWxA
# 
# Gravitas: Demand grows for W.H.O Boss' resignation:
# https://www.youtube.com/watch?v=J8TfKok9Rns&t=157s
# 
# Gravitas: Wuhan Coronavirus, countries that are setting an example for the world:
# https://www.youtube.com/watch?v=peBNIHRtUY8&t=298s
# 
# Gravitas: Wuhan Coronavirus, Is China downplaying the number of victims?:
# https://www.youtube.com/watch?v=0VEcyEhrtgI
# 
# Gravitas: UNSC fails to pin responsibility on China | Coronavirus:
# https://www.youtube.com/watch?v=op43xZ1XMQY
# 
# Gravitas: Will China change its eating habits? | Wuhan Coronavirus:
# https://www.youtube.com/watch?v=wjGw36K0RCU
# 
# Gravitas: Taiwan unearths China's 'Fake apology' plot | Coronavirus outbreak
# https://www.youtube.com/watch?v=3kLYAiv4TkA
# 
# Leaders around the world determined to flatten the COVID-19 curve | Coronavirus | World News:
# https://www.youtube.com/watch?v=licXTAyMx3c&list=WL&index=2&t=164s

# %% [markdown]
# ### Objective of the Notebook

# %% [markdown]
# The objective of this notebook is to study the COVID-19 outbreak with the help of some basic visualization techniques, to compare China, where COVID-19 originated, with the Rest of the World, and to perform predictions and Time Series forecasting in order to study the impact and spread of COVID-19 in the coming days.

# %% [markdown]
# ## Let's get Started

# %% [markdown]
# ## Importing required Python Packages and Libraries

# %% [code]
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import datetime as dt
from datetime import timedelta
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression,Ridge,Lasso
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error,r2_score
import statsmodels.api as sm
from statsmodels.tsa.api import Holt,SimpleExpSmoothing,ExponentialSmoothing
from fbprophet import Prophet
from sklearn.preprocessing import PolynomialFeatures
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.ar_model import AR
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf
#pd.set_option('display.float_format', lambda x: '%.6f' % x)

# %% [code]
COUNTRY = "Brazil"
df = pd.read_csv("../input/novel-corona-virus-2019-dataset/covid_19_data.csv")
covid=pd.read_csv("../input/novel-corona-virus-2019-dataset/covid_19_data.csv")

df['Country/Region'] = df['Country/Region'].astype('category')
poland = df[df.loc[:, 'Country/Region'] == COUNTRY]
covid.head()

# %% [code]
poland.head()

# %% [code]
#covid = poland
covid

# %% [code]
print("Size/Shape of the dataset: ",covid.shape)
print("Checking for null values:\n",covid.isnull().sum())
print("Checking Data-type of each column:\n",covid.dtypes)

# %% [code]
#Dropping the "SNo" column as it is of no use for the analysis
covid.drop(columns=["SNo"],inplace=True)

# %% [code]
#Converting "Observation Date" into Datetime format
covid["ObservationDate"]=pd.to_datetime(covid["ObservationDate"])

# %% [markdown]
# ## Datewise analysis

# %% [code]
#Grouping different types of cases as per the date
datewise=covid.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
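
# %% [markdown]
# The grouping step above collapses all country and province rows for a given day into one worldwide total. The sketch below shows the same pattern on a tiny made-up frame; the values and the `toy` names are purely illustrative and do not come from the dataset.

```python
import pandas as pd

# Hypothetical data: two regions reporting on two days.
toy = pd.DataFrame({
    "ObservationDate": pd.to_datetime(
        ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"]
    ),
    "Country/Region": ["A", "B", "A", "B"],
    "Confirmed": [10, 5, 15, 9],
    "Recovered": [1, 0, 2, 1],
    "Deaths": [0, 0, 1, 0],
})

# One row per date, summing over every region that reported on that date.
toy_datewise = toy.groupby("ObservationDate").agg(
    {"Confirmed": "sum", "Recovered": "sum", "Deaths": "sum"}
)
print(toy_datewise)
```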

# %% [code]
print("****************************************************************************")
print("Basic Information about the COVID-19 Spread across the world")
print("****************************************************************************")
print(" ")


print("Total number of countries with Disease Spread: ",len(covid["Country/Region"].unique()))
print("Total number of Confirmed Cases around the World: {:.0f} ".format(datewise["Confirmed"].iloc[-1]))
print("Total number of Recovered Cases around the World: {:.0f}".format(datewise["Recovered"].iloc[-1]))
print("Total number of Deaths Cases around the World: {:.0f}".format(datewise["Deaths"].iloc[-1]))
print("Total number of Active Cases around the World: ",int((datewise["Confirmed"].iloc[-1]-datewise["Recovered"].iloc[-1]-datewise["Deaths"].iloc[-1])))
print("Total number of Closed Cases around the World: ",int(datewise["Recovered"].iloc[-1]+datewise["Deaths"].iloc[-1]))
print("Approximate number of Confirmed Cases per Day around the World: ",int(np.round(datewise["Confirmed"].iloc[-1]/datewise.shape[0])))
print("Approximate number of Recovered Cases per Day around the World: ",int(np.round(datewise["Recovered"].iloc[-1]/datewise.shape[0])))
print("Approximate number of Death Cases per Day around the World: ",int(np.round(datewise["Deaths"].iloc[-1]/datewise.shape[0])))
print("Approximate number of Confirmed Cases per hour around the World: ",int(np.round(datewise["Confirmed"].iloc[-1]/((datewise.shape[0])*24))))
print("Approximate number of Recovered Cases per hour around the World: ",int(np.round(datewise["Recovered"].iloc[-1]/((datewise.shape[0])*24))))
print("Approximate number of Death Cases per hour around the World: ",int(np.round(datewise["Deaths"].iloc[-1]/((datewise.shape[0])*24))))
print("****************************************************************************")
print(" ")
print("Acknowledgements:")
print("Thanks to the WHO and Johns Hopkins University for making the ")
print("data available for educational and academic research purposes - Jair Ribeiro")

# %% [markdown]
# ## Poland Datewise analysis

# %% [code]
plt.figure(figsize=(15,5))
sns.barplot(x=datewise.index.date, y=datewise["Confirmed"]-datewise["Recovered"]-datewise["Deaths"])
plt.title("Distribution Plot for Active Cases over Date")
plt.xticks(rotation=90)

# %% [markdown]
# #### Active Cases = Number of Confirmed Cases - Number of Recovered Cases - Number of Death Cases
# #### An increase in the number of Active Cases probably indicates that the number of Recovered or Death cases is dropping drastically in comparison to the number of Confirmed Cases. We will look for conclusive evidence for this in the notebook ahead.

# %% [code]
plt.figure(figsize=(15,5))
sns.barplot(x=datewise.index.date, y=datewise["Recovered"]+datewise["Deaths"])
plt.title("Distribution Plot for Closed Cases over Date")
plt.xticks(rotation=90)

# %% [markdown]
# #### Closed Cases = Number of Recovered Cases + Number of Death Cases 
# #### An increase in the number of Closed Cases implies that either more patients are recovering from the disease or more people are dying because of COVID-19.
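
# %% [markdown]
# As a quick check of the Active and Closed Cases definitions, the sketch below applies both formulas to a small made-up table. The numbers are hypothetical; the point is only that Active plus Closed always adds back up to Confirmed.

```python
import pandas as pd

# Hypothetical cumulative counts for three consecutive days.
counts = pd.DataFrame(
    {"Confirmed": [100, 150, 210], "Recovered": [10, 30, 60], "Deaths": [2, 5, 9]},
    index=pd.date_range("2020-03-01", periods=3),
)

# Active = Confirmed - Recovered - Deaths
counts["Active"] = counts["Confirmed"] - counts["Recovered"] - counts["Deaths"]
# Closed = Recovered + Deaths
counts["Closed"] = counts["Recovered"] + counts["Deaths"]

print(counts)
```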

# %% [code]
#ISO week number of each date (DatetimeIndex.weekofyear was removed in pandas 2.0)
datewise["WeekOfYear"]=datewise.index.isocalendar().week

week_num=[]
weekwise_confirmed=[]
weekwise_recovered=[]
weekwise_deaths=[]
w=1
for i in list(datewise["WeekOfYear"].unique()):
    weekwise_confirmed.append(datewise[datewise["WeekOfYear"]==i]["Confirmed"].iloc[-1])
    weekwise_recovered.append(datewise[datewise["WeekOfYear"]==i]["Recovered"].iloc[-1])
    weekwise_deaths.append(datewise[datewise["WeekOfYear"]==i]["Deaths"].iloc[-1])
    week_num.append(w)
    w=w+1

plt.figure(figsize=(8,5))
plt.plot(week_num,weekwise_confirmed,linewidth=3,label="Confirmed Cases")
plt.plot(week_num,weekwise_recovered,linewidth=3,label="Recovered Cases")
plt.plot(week_num,weekwise_deaths,linewidth=3,label="Death Cases")
plt.ylabel("Number of Cases")
plt.xlabel("Week Number")
plt.title("Weekly progress of Different Types of Cases")
plt.legend()
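
# %% [markdown]
# The loop above keeps the last cumulative value of every week. A possible alternative, sketched here on made-up data, is to group by the ISO week number and take the last entry of each group. This assumes pandas 1.1 or newer (for `isocalendar()`); the series and its values are hypothetical.

```python
import pandas as pd

# Hypothetical cumulative series covering two full ISO weeks
# (2 March 2020 is a Monday).
dates = pd.date_range("2020-03-02", periods=14, freq="D")
cumulative = pd.Series(range(10, 150, 10), index=dates, name="Confirmed")

# ISO week number for every date, aligned on the same index.
iso_week = cumulative.index.isocalendar().week

# Last cumulative value observed in each week.
weekly_last = cumulative.groupby(iso_week).last()
print(weekly_last)
```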

# %% [code]
fig, (ax1,ax2) = plt.subplots(1, 2,figsize=(15,5))
sns.barplot(x=week_num,y=pd.Series(weekwise_confirmed).diff().fillna(0),ax=ax1)
sns.barplot(x=week_num,y=pd.Series(weekwise_deaths).diff().fillna(0),ax=ax2)
ax1.set_xlabel("Week Number")
ax2.set_xlabel("Week Number")
ax1.set_ylabel("Number of Confirmed Cases")
ax2.set_ylabel("Number of Death Cases")
ax1.set_title("Weekly increase in Number of Confirmed Cases")
ax2.set_title("Weekly increase in Number of Death Cases")

# %% [markdown]
# ##### Please note the 12th week has just started

# %% [markdown]
# #### Growth rate of Confirmed, Recovered and Death Cases

# %% [code]
plt.figure(figsize=(12,6))
plt.plot(datewise["Confirmed"],marker="o",label="Confirmed Cases")
plt.plot(datewise["Recovered"],marker="*",label="Recovered Cases")
plt.plot(datewise["Deaths"],marker="^",label="Death Cases")
plt.ylabel("Number of Patients")
plt.xlabel("Timestamp")
plt.xticks(rotation=90)
plt.title("Growth of different Types of Cases over Time")
plt.legend()

# %% [markdown]
# #### Mortality and Recovery Rate analysis around the World

# %% [code]
#Calculating the Mortality Rate and Recovery Rate
datewise["Mortality Rate"]=(datewise["Deaths"]/datewise["Confirmed"])*100
datewise["Recovery Rate"]=(datewise["Recovered"]/datewise["Confirmed"])*100
datewise["Active Cases"]=datewise["Confirmed"]-datewise["Recovered"]-datewise["Deaths"]
datewise["Closed Cases"]=datewise["Recovered"]+datewise["Deaths"]

#Plotting Mortality and Recovery Rate 
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(20,6))
ax1.plot(datewise["Mortality Rate"],label='Mortality Rate',linewidth=3)
ax1.axhline(datewise["Mortality Rate"].mean(),linestyle='--',color='black',label="Mean Mortality Rate")
ax1.set_ylabel("Mortality Rate")
ax1.set_xlabel("Timestamp")
ax1.set_title("Overall Datewise Mortality Rate")
ax1.legend()
for tick in ax1.get_xticklabels():
    tick.set_rotation(90)
ax2.plot(datewise["Recovery Rate"],label="Recovery Rate",linewidth=3)
ax2.axhline(datewise["Recovery Rate"].mean(),linestyle='--',color='black',label="Mean Recovery Rate")
ax2.set_ylabel("Recovery Rate")
ax2.set_xlabel("Timestamp")
ax2.set_title("Overall Datewise Recovery Rate")
ax2.legend()
for tick in ax2.get_xticklabels():
    tick.set_rotation(90)
    
print("Average Mortality Rate",datewise["Mortality Rate"].mean())
print("Median Mortality Rate",datewise["Mortality Rate"].median())
print("Average Recovery Rate",datewise["Recovery Rate"].mean())
print("Median Recovery Rate",datewise["Recovery Rate"].median())

# %% [markdown]
# #### Mortality rate = (Number of Death Cases / Number of Confirmed Cases) x 100
# #### Recovery Rate = (Number of Recovered Cases / Number of Confirmed Cases) x 100
# #### The increase in the Mortality Rate is pretty significant, along with a drastic drop in the Recovery Rate, which has fallen even below the average Recovery Rate around the World. That is conclusive evidence of why the number of Active Cases is rising. The number of Closed Cases is also increasing, and the rising Mortality Rate clearly indicates that this is driven by an increasing number of Death Cases.
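
# %% [markdown]
# A small worked example of the two rate formulas, using made-up counts. The helper `rate_percent` is a hypothetical name introduced here for illustration; it returns NaN when there are no confirmed cases, so that a division by zero does not stop the notebook.

```python
def rate_percent(part, confirmed):
    """Share of confirmed cases, in percent; NaN when confirmed is zero."""
    if confirmed == 0:
        return float("nan")
    return part / confirmed * 100


# Hypothetical counts: 1000 confirmed, 50 deaths, 200 recovered.
mortality = rate_percent(50, 1000)
recovery = rate_percent(200, 1000)
still_active = 100 - mortality - recovery

print("Mortality Rate:", mortality)
print("Recovery Rate:", recovery)
print("Share still active:", still_active)
```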

# %% [code]
plt.figure(figsize=(15,6))
plt.plot(datewise["Confirmed"].diff().fillna(0),label="Daily increase in Confirmed Cases",linewidth=3)
plt.plot(datewise["Recovered"].diff().fillna(0),label="Daily increase in Recovered Cases",linewidth=3)
plt.plot(datewise["Deaths"].diff().fillna(0),label="Daily increase in Death Cases",linewidth=3)
plt.xlabel("Timestamp")
plt.ylabel("Daily Increment")
plt.title("Daily increase in different Types of Cases Worldwide")
plt.xticks(rotation=90)
plt.legend()

print("Average increase in number of Confirmed Cases every day: ",np.round(datewise["Confirmed"].diff().fillna(0).mean()))
print("Average increase in number of Recovered Cases every day: ",np.round(datewise["Recovered"].diff().fillna(0).mean()))
print("Average increase in number of Deaths Cases every day: ",np.round(datewise["Deaths"].diff().fillna(0).mean()))
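
# %% [markdown]
# The daily increase used above is simply the difference between consecutive cumulative totals. The sketch below shows this on a hypothetical series. Note that `fillna(0)` turns the undefined first difference into 0, and that this 0 is included in the mean.

```python
import pandas as pd

# Hypothetical cumulative confirmed cases on four consecutive days.
cumulative = pd.Series(
    [10, 15, 25, 40], index=pd.date_range("2020-03-01", periods=4)
)

# diff() gives today's total minus yesterday's; the first day has no predecessor.
daily_increase = cumulative.diff().fillna(0)

print(daily_increase)
print("Average daily increase:", daily_increase.mean())
```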

# %% [markdown]
# ### Growth Factor
# Growth factor is the factor by which a quantity multiplies itself over time. The formula used is:
# 
# **Formula: Every day's new (Confirmed,Recovered,Deaths) / new (Confirmed,Recovered,Deaths) on the previous day.**
# 
# A growth factor **above 1 indicates an increase in the corresponding cases**.
# 
# A growth factor **above 1 but trending downward** is a positive sign, whereas a **growth factor constantly above 1 is the sign of exponential growth**.
# 
# A growth factor **constant at 1 indicates there is no change in any kind of cases**.
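
# %% [markdown]
# The formula above is stated in terms of *new* cases per day. The sketch below, on a made-up cumulative series, computes that growth factor and, for comparison, the ratio of consecutive cumulative totals. The two are not the same quantity: as long as cumulative totals never decrease, their ratio cannot fall below 1, whereas the growth factor of new cases can. All names and numbers here are hypothetical.

```python
import pandas as pd

# Hypothetical cumulative confirmed cases on five consecutive days.
cumulative = pd.Series(
    [100, 150, 225, 300, 360], index=pd.date_range("2020-03-01", periods=5)
)

# New cases per day: NaN, 50, 75, 75, 60
new_cases = cumulative.diff()

# Growth factor as defined in the text: new cases today / new cases yesterday.
growth_factor = new_cases / new_cases.shift(1)

# Ratio of consecutive cumulative totals, shown only for comparison.
cumulative_ratio = cumulative / cumulative.shift(1)

print(pd.DataFrame({
    "New cases": new_cases,
    "Growth factor": growth_factor,
    "Cumulative ratio": cumulative_ratio,
}))
```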

# %% [code]
daily_increase_confirm=[]
daily_increase_recovered=[]
daily_increase_deaths=[]
#Ratio of each day's cumulative total to the previous day's cumulative total
for i in range(datewise.shape[0]-1):
    daily_increase_confirm.append(((datewise["Confirmed"].iloc[i+1]/datewise["Confirmed"].iloc[i])))
    daily_increase_recovered.append(((datewise["Recovered"].iloc[i+1]/datewise["Recovered"].iloc[i])))
    daily_increase_deaths.append(((datewise["Deaths"].iloc[i+1]/datewise["Deaths"].iloc[i])))
daily_increase_confirm.insert(0,1)
daily_increase_recovered.insert(0,1)
daily_increase_deaths.insert(0,1)

plt.figure(figsize=(15,7))
plt.plot(datewise.index,daily_increase_confirm,label="Growth Factor Confirmed Cases",linewidth=3)
plt.plot(datewise.index,daily_increase_recovered,label="Growth Factor Recovered Cases",linewidth=3)
plt.plot(datewise.index,daily_increase_deaths,label="Growth Factor Death Cases",linewidth=3)
plt.xlabel("Timestamp")
plt.ylabel("Growth Factor")
plt.title("Growth Factor of different Types of Cases Worldwide")
plt.axhline(1,linestyle='--',color='black',label="Baseline")
plt.xticks(rotation=90)
plt.legend()

# %% [markdown]
# #### A Growth Factor constantly above 1 is a clear indication of exponential increase in all forms of cases.

# %% [markdown]
# ## Countrywise Analysis

# %% [code]
#Calculating countrywise Mortality and Recovery Rate
countrywise=covid[covid["ObservationDate"]==covid["ObservationDate"].max()].groupby(["Country/Region"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'}).sort_values(["Confirmed"],ascending=False)
countrywise["Mortality"]=(countrywise["Deaths"]/countrywise["Confirmed"])*100
countrywise["Recovery"]=(countrywise["Recovered"]/countrywise["Confirmed"])*100

# %% [code]
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(27,10))
top_15_confirmed=countrywise.sort_values(["Confirmed"],ascending=False).head(15)
top_15_deaths=countrywise.sort_values(["Deaths"],ascending=False).head(15)
sns.barplot(x=top_15_confirmed["Confirmed"],y=top_15_confirmed.index,ax=ax1)
ax1.set_title("Top 15 countries as per Number of Confirmed Cases")
sns.barplot(x=top_15_deaths["Deaths"],y=top_15_deaths.index,ax=ax2)
ax2.set_title("Top 15 countries as per Number of Death Cases")

# %% [markdown]
# Tourist Data: https://worldpopulationreview.com/countries/most-visited-countries/
# 
# International Students Data: https://www.easyuni.com/advice/top-countries-with-most-international-students-1184/
# #### If we check the list of countries by number of tourist visitors from the link mentioned above, the top countries are France, Spain, USA, China, Italy, Mexico, UK, Turkey, Germany and Thailand. Another thing to take into account is that most of the countries mentioned above also have the highest numbers of International Students. Most of them are among the countries most affected by COVID-19.

# %% [markdown]
# #### Another interesting thing to see is the median age of worst affected countries.
# We can check that here
# *Countrywise Median Age*: https://ourworldindata.org/age-structure

# %% [markdown]
# #### Top 15 Countries as per Mortality Rate and Recovery Rate with more than 500 Confirmed Cases

# %% [code]
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(27,10))
countrywise_plot_mortal=countrywise[countrywise["Confirmed"]>500].sort_values(["Mortality"],ascending=False).head(15)
sns.barplot(x=countrywise_plot_mortal["Mortality"],y=countrywise_plot_mortal.index,ax=ax1)
ax1.set_title("Top 15 Countries according to High Mortality Rate")
ax1.set_xlabel("Mortality (in Percentage)")
countrywise_plot_recover=countrywise[countrywise["Confirmed"]>500].sort_values(["Recovery"],ascending=False).head(15)
sns.barplot(x=countrywise_plot_recover["Recovery"],y=countrywise_plot_recover.index, ax=ax2)
ax2.set_title("Top 15 Countries according to High Recovery Rate")
ax2.set_xlabel("Recovery (in Percentage)")

# %% [code]
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(27,10))
countrywise_plot_mortal=countrywise[countrywise["Confirmed"]>500].sort_values(["Mortality"],ascending=False).tail(15)
sns.barplot(x=countrywise_plot_mortal["Mortality"],y=countrywise_plot_mortal.index,ax=ax1)
ax1.set_title("Top 15 Countries according to Low Mortality Rate")
ax1.set_xlabel("Mortality (in Percentage)")
countrywise_plot_recover=countrywise[countrywise["Confirmed"]>500].sort_values(["Recovery"],ascending=False).tail(15)
sns.barplot(x=countrywise_plot_recover["Recovery"],y=countrywise_plot_recover.index, ax=ax2)
ax2.set_title("Top 15 Countries according to Low Recovery Rate")
ax2.set_xlabel("Recovery (in Percentage)")

# %% [markdown]
# #### Countries with more than 50 Confirmed Cases and No Recovered Patients with considerable Mortality Rate

# %% [code]
no_recovered_countries=countrywise[(countrywise["Confirmed"]>50)&(countrywise["Recovered"]==0)][["Confirmed","Deaths"]]
no_recovered_countries["Mortality Rate"]=(no_recovered_countries["Deaths"]/no_recovered_countries["Confirmed"])*100
no_recovered_countries[no_recovered_countries["Mortality Rate"]>0].sort_values(["Mortality Rate"],ascending=False)

# %% [markdown]
# #### Serbia is the country we need to look out for, as the number of Positive cases is well above 1000, with a considerable number of death cases and no sign of Recovered Patients.

# %% [markdown]
# #### Countries with more than 100 Confirmed Cases and No Deaths with considerably high Recovery Rate

# %% [code]
no_deaths=countrywise[(countrywise["Confirmed"]>100)&(countrywise["Deaths"]==0)]
no_deaths[no_deaths["Recovery"]>0].sort_values(["Recovery"],ascending=False).drop(columns=["Mortality"])

# %% [markdown]
# #### Vietnam has been able to contain COVID-19 pretty well, with no Deaths recorded so far and a pretty healthy Recovery Rate. Just for information, Vietnam was the first country to inform the World Health Organization about Human to Human Transmission of COVID-19.
# 
# Gravitas: Wuhan Coronavirus: Taiwan's big claim against WHO:
# https://www.youtube.com/watch?v=USTJUqe_fdk
# 
# WHO releases statement after senior staff’s awkward interview
# https://www.youtube.com/watch?v=wFRHB-wP9SU&feature=youtu.be&fbclid=IwAR1_wXFXq_qG17VZhA4nivmlm8ZWjHD1W0ozYS70YjgBsmfXwRGE_l26ZVU

# %% [code]
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(27,10))
countrywise["Active Cases"]=(countrywise["Confirmed"]-countrywise["Recovered"]-countrywise["Deaths"])
countrywise["Outcome Cases"]=(countrywise["Recovered"]+countrywise["Deaths"])
top_15_active=countrywise.sort_values(["Active Cases"],ascending=False).head(15)
top_15_outcome=countrywise.sort_values(["Outcome Cases"],ascending=False).head(15)
sns.barplot(x=top_15_active["Active Cases"],y=top_15_active.index,ax=ax1)
sns.barplot(x=top_15_outcome["Outcome Cases"],y=top_15_outcome.index,ax=ax2)
ax1.set_title("Top 15 Countries with Most Number of Active Cases")
ax2.set_title("Top 15 Countries with Most Number of Closed Cases")

# %% [code]
country_date=covid.groupby(["Country/Region","ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
confirm_rate=[]
for country in countrywise.index:
    #.ix was removed from pandas; .loc does the same label-based lookup
    days=country_date.loc[country].shape[0]
    confirm_rate.append((countrywise.loc[country,"Confirmed"])/days)
countrywise["Confirm Cases/Day"]=confirm_rate
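
# %% [markdown]
# The loop above divides each country's latest Confirmed total by the number of days on which that country appears in the data. A vectorized way to get the same kind of figure is sketched below on made-up data; the `toy` frame and its values are hypothetical, and the sketch assumes every country reports on the latest date.

```python
import pandas as pd

# Hypothetical data: country A reports on three days, country B on two.
toy = pd.DataFrame({
    "Country/Region": ["A", "A", "A", "B", "B"],
    "ObservationDate": pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-02", "2020-03-03"]
    ),
    "Confirmed": [10, 20, 30, 4, 8],
})

# Number of distinct reporting days per country.
days_reported = toy.groupby("Country/Region")["ObservationDate"].nunique()

# Cumulative Confirmed total per country on the latest date.
latest_rows = toy[toy["ObservationDate"] == toy["ObservationDate"].max()]
latest_confirmed = latest_rows.groupby("Country/Region")["Confirmed"].sum()

# Division aligns on the country label.
cases_per_day = latest_confirmed / days_reported
print(cases_per_day)
```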

# %% [code]
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(27,10))
top_15_ccpd=countrywise.sort_values(["Confirm Cases/Day"],ascending=False).head(15)
sns.barplot(y=top_15_ccpd.index,x=top_15_ccpd["Confirm Cases/Day"],ax=ax1)
ax1.set_title("Top 15 countries as per highest number of Confirmed Cases per Day")
bottom_15_ccpd=countrywise[countrywise["Confirmed"]>1000].sort_values(["Confirm Cases/Day"],ascending=False).tail(15)
sns.barplot(y=bottom_15_ccpd.index,x=bottom_15_ccpd["Confirm Cases/Day"],ax=ax2)
ax2.set_title("Bottom 15 countries as per Confirmed Cases per Day having more than 1000 Confirmed Cases")

# %% [markdown]
# #### Mainland China has recorded the highest number of Closed Cases, as their Recovery Rate is a staggering 85%+. The reason Italy is ranked second among countries with the highest number of Closed Cases is the number of Deaths in Italy: the Mortality Rate in Italy is a whopping 10%+, whereas COVID-19 overall has a mortality rate of 2-3%.
# #### Confirmed Cases/Day is a clear indication of why the US currently has the highest number of Active Cases. The rate is 5000+ cases per day, and that value is increasing every day.

# %% [code]
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(27,8))
countrywise["Survival Probability"]=(1-(countrywise["Deaths"]/countrywise["Confirmed"]))*100
top_25_survival=countrywise[countrywise["Confirmed"]>1000].sort_values(["Survival Probability"],ascending=False).head(15)
sns.barplot(x=top_25_survival["Survival Probability"],y=top_25_survival.index,ax=ax1)
ax1.set_title("Top 15 Countries with Maximum Survival Probability having more than 1000 Confirmed Cases")
print('Mean Survival Probability across all countries',countrywise["Survival Probability"].mean())
print('Median Survival Probability across all countries',countrywise["Survival Probability"].median())
print('Mean Death Probability across all countries',100-countrywise["Survival Probability"].mean())
print('Median Death Probability across all countries',100-countrywise["Survival Probability"].median())

Bottom_5_countries=countrywise[countrywise["Confirmed"]>100].sort_values(["Survival Probability"],ascending=True).head(15)
sns.barplot(x=Bottom_5_countries["Survival Probability"],y=Bottom_5_countries.index,ax=ax2)
ax2.set_title("Bottom 15 Countries as per Survival Probability")

# %% [markdown]
# #### Survival Probability is the graph that looks the most promising, with an average survival probability of 95%+ across all countries, but it is dropping by a slight margin every day. The difference between the Mean and Median Death Probability is a clear indication that there are a few countries with a really high mortality rate, e.g. Italy, Algeria, UK etc.
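
# %% [markdown]
# Why a gap between the mean and the median points to a few outliers can be seen on a tiny made-up example. Below, four hypothetical countries have a survival probability near 98% and one sits at 88%. The single low value pulls the mean down but leaves the median untouched.

```python
import pandas as pd

# Hypothetical survival probabilities (in percent) for five countries.
survival = pd.Series([99.0, 98.5, 98.0, 97.5, 88.0], index=["A", "B", "C", "D", "E"])

mean_death = 100 - survival.mean()      # sensitive to the outlier E
median_death = 100 - survival.median()  # unaffected by the outlier E

print("Mean Death Probability:", mean_death)
print("Median Death Probability:", median_death)
```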

# %% [markdown]
# ### Comparison of China, Italy, US, Spain and Rest of the World

# %% [code]
china_data=covid[covid["Country/Region"]=="Mainland China"]
Italy_data=covid[covid["Country/Region"]=="Italy"]
US_data=covid[covid["Country/Region"]=="US"]
spain_data=covid[covid["Country/Region"]=="Spain"]
rest_of_world=covid[(covid["Country/Region"]!="Mainland China")&(covid["Country/Region"]!="Italy")&(covid["Country/Region"]!="US")&(covid["Country/Region"]!="Spain")]

datewise_china=china_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
datewise_Italy=Italy_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
datewise_US=US_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
datewise_Spain=spain_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
datewise_restofworld=rest_of_world.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
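
# %% [markdown]
# The "Rest of the World" filter above chains four `!=` comparisons. A shorter way to express the same selection, sketched here on a made-up frame, is to negate `isin` with a list of the countries being compared. The `focus` list name and the toy values are hypothetical.

```python
import pandas as pd

# Hypothetical one-row-per-country frame.
toy = pd.DataFrame({
    "Country/Region": ["Mainland China", "Italy", "US", "Spain", "Poland", "Brazil"],
    "Confirmed": [6, 5, 4, 3, 2, 1],
})

focus = ["Mainland China", "Italy", "US", "Spain"]

# Keep every row whose country is NOT in the focus list.
rest = toy[~toy["Country/Region"].isin(focus)]
print(rest)
```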

# %% [code]
fig, (ax1, ax2, ax3) = plt.subplots(1, 3,figsize=(28,10))
ax1.plot(datewise_china["Confirmed"],label="Confirmed Cases of Mainland China",linewidth=3)
ax1.plot(datewise_Italy["Confirmed"],label="Confirmed Cases of Italy",linewidth=3)
ax1.plot(datewise_US["Confirmed"],label="Confirmed Cases of USA",linewidth=3)
ax1.plot(datewise_Spain["Confirmed"],label="Confirmed Cases of Spain",linewidth=3)
ax1.plot(datewise_restofworld["Confirmed"],label="Confirmed Cases of Rest of the World",linewidth=3)
ax1.set_title("Confirmed Cases Plot")
ax1.set_ylabel("Number of Patients")
ax1.set_xlabel("Timestamp")
ax1.legend()
for tick in ax1.get_xticklabels():
    tick.set_rotation(90)
ax2.plot(datewise_china["Recovered"],label="Recovered Cases of Mainland China",linewidth=3)
ax2.plot(datewise_Italy["Recovered"],label="Recovered Cases of Italy",linewidth=3)
ax2.plot(datewise_US["Recovered"],label="Recovered Cases of US",linewidth=3)
ax2.plot(datewise_Spain["Recovered"],label="Recovered Cases Spain",linewidth=3)
ax2.plot(datewise_restofworld["Recovered"],label="Recovered Cases of Rest of the World",linewidth=3)
ax2.set_title("Recovered Cases Plot")
ax2.set_ylabel("Number of Patients")
ax2.set_xlabel("Timestamp")
ax2.legend()
for tick in ax2.get_xticklabels():
    tick.set_rotation(90)
ax3.plot(datewise_china["Deaths"],label='Death Cases of Mainland China',linewidth=3)
ax3.plot(datewise_Italy["Deaths"],label='Death Cases of Italy',linewidth=3)
ax3.plot(datewise_US["Deaths"],label='Death Cases of US',linewidth=3)
ax3.plot(datewise_Spain["Deaths"],label='Death Cases Spain',linewidth=3)
ax3.plot(datewise_restofworld["Deaths"],label="Deaths Cases of Rest of the World",linewidth=3)
ax3.set_title("Death Cases Plot")
ax3.set_ylabel("Number of Patients")
ax3.set_xlabel("Timestamp")
ax3.legend()
for tick in ax3.get_xticklabels():
    tick.set_rotation(90)

# %% [markdown]
# #### Looking at their graphs of Confirmed and Death Cases, China has been able to "flatten the curve", with a staggering Recovery Rate.
# #### The US seems to have good control over Deaths, but the number of people getting infected is getting way out of hand.

# %% [code]
datewise_china["Mortality"]=(datewise_china["Deaths"]/datewise_china["Confirmed"])*100
datewise_Italy["Mortality"]=(datewise_Italy["Deaths"]/datewise_Italy["Confirmed"])*100
datewise_US["Mortality"]=(datewise_US["Deaths"]/datewise_US["Confirmed"])*100
datewise_Spain["Mortality"]=(datewise_Spain["Deaths"]/datewise_Spain["Confirmed"])*100
datewise_restofworld["Mortality"]=(datewise_restofworld["Deaths"]/datewise_restofworld["Confirmed"])*100

datewise_china["Recovery"]=(datewise_china["Recovered"]/datewise_china["Confirmed"])*100
datewise_Italy["Recovery"]=(datewise_Italy["Recovered"]/datewise_Italy["Confirmed"])*100
datewise_US["Recovery"]=(datewise_US["Recovered"]/datewise_US["Confirmed"])*100
datewise_Spain["Recovery"]=(datewise_Spain["Recovered"]/datewise_Spain["Confirmed"])*100
datewise_restofworld["Recovery"]=(datewise_restofworld["Recovered"]/datewise_restofworld["Confirmed"])*100
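
# %% [markdown]
# The ten assignments above repeat the same two formulas for each region. One way to avoid the repetition is a small helper that adds both rate columns to any datewise frame. This is only a sketch: `add_rates` is a hypothetical name, and the toy data is made up.

```python
import pandas as pd


def add_rates(frame):
    """Return a copy of frame with Mortality and Recovery columns in percent."""
    out = frame.copy()
    out["Mortality"] = out["Deaths"] / out["Confirmed"] * 100
    out["Recovery"] = out["Recovered"] / out["Confirmed"] * 100
    return out


# Hypothetical cumulative counts for two days.
toy = pd.DataFrame({"Confirmed": [100, 200], "Recovered": [10, 50], "Deaths": [5, 8]})
with_rates = add_rates(toy)
print(with_rates)
```

# %% [markdown]
# With such a helper, each regional frame could be processed in a single call, for example `datewise_china = add_rates(datewise_china)`.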

# %% [code]
fig, (ax1,ax2) = plt.subplots(1, 2,figsize=(28,10))
ax1.plot(datewise_china["Mortality"],label="Mortality Rate of Mainland China",linewidth=3)
ax1.plot(datewise_Italy["Mortality"],label="Mortality Rate of Italy",linewidth=3)
ax1.plot(datewise_US["Mortality"],label="Mortality Rate of USA",linewidth=3)
ax1.plot(datewise_Spain["Mortality"],label="Mortality Rate of Spain",linewidth=3)
ax1.plot(datewise_restofworld["Mortality"],label="Mortality Rate of Rest of the World",linewidth=3)
ax1.set_ylabel("Mortality Rate")
ax1.set_xlabel("Timestamp")
ax1.set_title("Mortality Rate comparison of Mainland China, Italy, US, Spain and Rest of the World")
ax1.legend()
for tick in ax1.get_xticklabels():
    tick.set_rotation(90)
ax2.plot(datewise_china["Recovery"],label="Recovery Rate of Mainland China",linewidth=3)
ax2.plot(datewise_Italy["Recovery"],label="Recovery Rate Italy",linewidth=3)
ax2.plot(datewise_US["Recovery"],label="Recovery Rate of USA",linewidth=3)
ax2.plot(datewise_Spain["Recovery"],label="Recovery Rate of Spain",linewidth=3)
ax2.plot(datewise_restofworld["Recovery"],label="Recovery Rate Rest of the World",linewidth=3)
ax2.set_ylabel("Recovery Rate")
ax2.set_xlabel("Timestamp")
ax2.set_title("Recovery Rate comparison of Mainland China, Italy, US, Spain and Rest of the World")
ax2.legend()
for tick in ax2.get_xticklabels():
    tick.set_rotation(90)

# %% [markdown]
# #### The rising Recovery Rate of Spain is a good sign, but it is nowhere near enough in comparison to the Mortality Rate. It is an alarming sign for the USA, as the Recovery Rate is dropping while the Mortality Rate is taking off.

# %% [code]
fig, (ax1,ax2) = plt.subplots(1, 2,figsize=(30,12))
ax1.plot(datewise_china["Confirmed"].diff().fillna(0),label='Daily increase in Number of Confirmed Cases (China)',linewidth=3)
ax1.plot(datewise_Italy["Confirmed"].diff().fillna(0),label='Daily increase in Number of Confirmed Cases (Italy)',linewidth=3)
ax1.plot(datewise_US["Confirmed"].diff().fillna(0),label='Daily increase in Number of Confirmed Cases (USA)',linewidth=3)
ax1.plot(datewise_Spain["Confirmed"].diff().fillna(0),label='Daily increase in Number of Confirmed Cases (Spain)',linewidth=3)
ax1.plot(datewise_restofworld["Confirmed"].diff().fillna(0),label='Daily increase in Number of Confirmed Cases (Rest of the World)',linewidth=3)
ax1.set_xlabel('Date')
ax1.set_ylabel("Increase in Number of Confirmed Cases")
ax1.set_title("Daily increase in Confirmed Cases")
ax1.legend()
for tick in ax1.get_xticklabels():
    tick.set_rotation(90)
ax2.plot(datewise_china["Deaths"].diff().fillna(0),label='Daily increase in Number of Death Cases (China)',linewidth=3)
ax2.plot(datewise_Italy["Deaths"].diff().fillna(0),label='Daily increase in Number of Death Cases (Italy)',linewidth=3)
ax2.plot(datewise_US["Deaths"].diff().fillna(0),label='Daily increase in Number of Death Cases (USA)',linewidth=3)
ax2.plot(datewise_Spain["Deaths"].diff().fillna(0),label='Daily increase in Number of Death Cases (Spain)',linewidth=3)
ax2.plot(datewise_restofworld["Deaths"].diff().fillna(0),label='Daily increase in Number of Death Cases (Rest of the World)',linewidth=3)
ax2.set_xlabel('Date')
ax2.set_ylabel("Increase in Number of Death Cases")
ax2.set_title("Daily increase in Death Cases")
ax2.legend()
for tick in ax2.get_xticklabels():
    tick.set_rotation(90)

# %% [markdown]
# #### We can clearly notice the decreasing trend in the number of Daily Confirmed and Death Cases of Spain and Italy. That's a really positive sign for both the countries.

# %% [markdown]
# ## Data Analysis for Poland

# %% [markdown]
# The notebook consists of a detailed data analysis specific to Poland, a comparison of Poland's situation with other countries, a comparison with the worst affected countries in this pandemic, and an attempt to build Machine Learning Prediction and Time Series Forecasting models to understand how the numbers are going to develop in the near future.

# %% [code]
country = "Poland"
poland_data=covid[covid["Country/Region"]==country]
datewise_poland=poland_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
print(datewise_poland.iloc[-1])
print("Total Active Cases: ",datewise_poland["Confirmed"].iloc[-1]-datewise_poland["Recovered"].iloc[-1]-datewise_poland["Deaths"].iloc[-1])
print("Total Closed Cases: ",datewise_poland["Recovered"].iloc[-1]+datewise_poland["Deaths"].iloc[-1])

# %% [code]
print("****************************************************************************")
print("Basic Information about the COVID-19 Spread in " +country +".")
print("****************************************************************************")
print(" ")

print("Total number of Confirmed Cases: {:.0f} ".format(datewise_poland["Confirmed"].iloc[-1]))
print("Total number of Recovered Cases: {:.0f}".format(datewise_poland["Recovered"].iloc[-1]))
print("Total number of Death Cases: {:.0f}".format(datewise_poland["Deaths"].iloc[-1]))
print("Total number of Active Cases: ",int((datewise_poland["Confirmed"].iloc[-1]-datewise_poland["Recovered"].iloc[-1]-datewise_poland["Deaths"].iloc[-1])))
print("Total number of Closed Cases: ",int(datewise_poland["Recovered"].iloc[-1]+datewise_poland["Deaths"].iloc[-1]))
print("Approximate number of Confirmed Cases per Day: ",int(np.round(datewise_poland["Confirmed"].iloc[-1]/datewise_poland.shape[0])))
print("Approximate number of Recovered Cases per Day: ",int(np.round(datewise_poland["Recovered"].iloc[-1]/datewise_poland.shape[0])))
print("Approximate number of Death Cases per Day: ",int(np.round(datewise_poland["Deaths"].iloc[-1]/datewise_poland.shape[0])))
print("Approximate number of Confirmed Cases per hour: ",int(np.round(datewise_poland["Confirmed"].iloc[-1]/((datewise_poland.shape[0])*24))))
print("Approximate number of Recovered Cases per hour: ",int(np.round(datewise_poland["Recovered"].iloc[-1]/((datewise_poland.shape[0])*24))))
print("Approximate number of Death Cases per hour: ",int(np.round(datewise_poland["Deaths"].iloc[-1]/((datewise_poland.shape[0])*24))))
print("****************************************************************************")
print(" ")
print("Acknowledgements:")
print("Thanks to the WHO and Johns Hopkins University for making the ")
print("data available for educational and academic research purposes - Jair Ribeiro")

# %% [code]
fig, (ax1,ax2) = plt.subplots(2, 1,figsize=(25,20))
ax1.plot(datewise_poland["Confirmed"],marker='o',label="Confirmed Cases")
ax1.plot(datewise_poland["Recovered"],marker='*',label="Recovered Cases")
ax1.plot(datewise_poland["Deaths"],marker='^',label="Death Cases")
ax1.set_ylabel("Number of Patients")
ax1.set_xlabel("Date")
ax1.legend()
ax1.set_title("Growth Rate Plot for different Types of cases in Poland")
for tick in ax1.get_xticklabels():
    tick.set_rotation(90)
sns.barplot(x=datewise_poland.index.date,y=datewise_poland["Confirmed"]-datewise_poland["Recovered"]-datewise_poland["Deaths"],ax=ax2)
ax2.set_xlabel("Date")
ax2.set_ylabel("Number of Active Cases")
ax2.set_title("Distribution of Number of Active Cases over Date")
for tick in ax2.get_xticklabels():
    tick.set_rotation(90)

# %% [code]
poland_increase_confirm=[]
poland_increase_recover=[]
poland_increase_deaths=[]
for i in range(datewise_poland.shape[0]-1):
    poland_increase_confirm.append(((datewise_poland["Confirmed"].iloc[i+1])/datewise_poland["Confirmed"].iloc[i]))
    poland_increase_recover.append(((datewise_poland["Recovered"].iloc[i+1])/datewise_poland["Recovered"].iloc[i]))
    poland_increase_deaths.append(((datewise_poland["Deaths"].iloc[i+1])/datewise_poland["Deaths"].iloc[i]))
poland_increase_confirm.insert(0,1)
poland_increase_recover.insert(0,1)
poland_increase_deaths.insert(0,1)

plt.figure(figsize=(15,7))
plt.plot(datewise_poland.index,poland_increase_confirm,label="Growth Factor of Confirmed Cases",linewidth=3)
plt.plot(datewise_poland.index,poland_increase_recover,label="Growth Factor of Recovered Cases",linewidth=3)
plt.plot(datewise_poland.index,poland_increase_deaths,label="Growth Factor of Death Cases",linewidth=3)
plt.axhline(1,linestyle='--',color="black",label="Baseline")
plt.xticks(rotation=90)
plt.title("Datewise Growth Factor of different Types of Cases")
plt.ylabel("Growth Factor")
plt.xlabel("Date")
plt.legend()
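
# %% [markdown]
# The growth factor is the ratio of one day's cumulative count to the previous day's count. While Recovered or Deaths are still zero, that ratio is undefined. The cell below is a minimal sketch, assuming undefined ratios should be reported as NaN; `growth_factors` is a hypothetical helper that is not part of the original analysis. It expects a plain list, for example `datewise_poland["Deaths"].tolist()`.

# %% [code]
import math

def growth_factors(counts):
    """Ratio of each cumulative count to the previous one; NaN where the previous count is zero."""
    factors = [1.0]  # the first observation has no previous day, so use the baseline value
    for previous, current in zip(counts, counts[1:]):
        factors.append(current / previous if previous else math.nan)
    return factors

# Made-up cumulative counts, for illustration only.
print(growth_factors([0, 0, 2, 4, 6, 9]))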

# %% [code]
plt.figure(figsize=(25,6))
plt.plot(datewise_poland["Confirmed"].diff().fillna(0),label="Daily increase in Confirmed Cases",linewidth=3)
plt.plot(datewise_poland["Recovered"].diff().fillna(0),label="Daily increase in Recovered Cases",linewidth=3)
plt.plot(datewise_poland["Deaths"].diff().fillna(0),label="Daily increase in Death Cases",linewidth=3)
plt.ylabel("Number of Cases")
plt.xlabel("Date")
plt.title("Daily increase in Number of Cases in Poland")
plt.legend()
plt.xticks(rotation=90)

# %% [code]
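# DatetimeIndex.weekofyear was removed in pandas 2.0; isocalendar().week gives the same ISO week number.
datewise_poland["WeekOfYear"]=datewise_poland.index.isocalendar().week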

week_num_poland=[]
poland_weekwise_confirmed=[]
poland_weekwise_recovered=[]
poland_weekwise_deaths=[]
w=1
for i in list(datewise_poland["WeekOfYear"].unique()):
    poland_weekwise_confirmed.append(datewise_poland[datewise_poland["WeekOfYear"]==i]["Confirmed"].iloc[-1])
    poland_weekwise_recovered.append(datewise_poland[datewise_poland["WeekOfYear"]==i]["Recovered"].iloc[-1])
    poland_weekwise_deaths.append(datewise_poland[datewise_poland["WeekOfYear"]==i]["Deaths"].iloc[-1])
    week_num_poland.append(w)
    w=w+1
    
plt.figure(figsize=(10,5))
plt.plot(week_num_poland,poland_weekwise_confirmed,linewidth=3,label="Weekly Growth of Confirmed Cases")
plt.plot(week_num_poland,poland_weekwise_recovered,linewidth=3,label="Weekly Growth of Recovered Cases")
plt.plot(week_num_poland,poland_weekwise_deaths,linewidth=3,label="Weekly Growth of Death Cases")
plt.xlabel('Week Number')
plt.ylabel("Number of Cases")
plt.title("Weekly Growth of different types of Cases in Poland")
plt.legend()

# %% [code]
fig, (ax1,ax2) = plt.subplots(1, 2,figsize=(20,5))
sns.barplot(x=week_num_poland,y=pd.Series(poland_weekwise_confirmed).diff().fillna(0),ax=ax1)
sns.barplot(x=week_num_poland,y=pd.Series(poland_weekwise_deaths).diff().fillna(0),ax=ax2)
ax1.set_xlabel("Week Number")
ax2.set_xlabel("Week Number")
ax1.set_ylabel("Number of Confirmed Cases")
ax2.set_ylabel("Number of Death Cases")
ax1.set_title("Poland's Weekwise increase in Number of Confirmed Cases")
ax2.set_title("Poland's Weekwise increase in Number of Death Cases")

# %% [code]
max_ind=datewise_poland["Confirmed"].max()
plt.figure(figsize=(12,6))
plt.plot(datewise_Italy[(datewise_Italy["Confirmed"]>0)&(datewise_Italy["Confirmed"]<=max_ind)]["Confirmed"],label="Confirmed Cases Italy",linewidth=3)
plt.plot(datewise_US[(datewise_US["Confirmed"]>0)&(datewise_US["Confirmed"]<=max_ind)]["Confirmed"],label="Confirmed Cases USA",linewidth=3)
plt.plot(datewise_Spain[(datewise_Spain["Confirmed"]>0)&(datewise_Spain["Confirmed"]<=max_ind)]["Confirmed"],label="Confirmed Cases Spain",linewidth=3)
plt.plot(datewise_poland[datewise_poland["Confirmed"]>0]["Confirmed"],label="Confirmed Cases Poland",linewidth=3)
plt.xlabel("Date")
plt.ylabel("Number of Confirmed Cases")
plt.title("Growth of Confirmed Cases")
plt.legend()
plt.xticks(rotation=90)

print("It took",datewise_Italy[(datewise_Italy["Confirmed"]>0)&(datewise_Italy["Confirmed"]<=max_ind)].shape[0],"days in Italy to reach number of Confirmed Cases equivalent to Poland")
print("It took",datewise_US[(datewise_US["Confirmed"]>0)&(datewise_US["Confirmed"]<=max_ind)].shape[0],"days in USA to reach number of Confirmed Cases equivalent to Poland")
print("It took",datewise_Spain[(datewise_Spain["Confirmed"]>0)&(datewise_Spain["Confirmed"]<=max_ind)].shape[0],"days in Spain to reach number of Confirmed Cases equivalent to Poland")
print("It took",datewise_poland[datewise_poland["Confirmed"]>0].shape[0],"days in Poland to reach",max_ind,"Confirmed Cases")

# %% [markdown]
# #### Comparison of the daily increase in cases for Italy, Spain, USA and Poland, restricted to the period before each country exceeded Poland's current number of Confirmed Cases

# %% [code]
plt.figure(figsize=(20,6))
plt.plot(datewise_Italy[(datewise_Italy["Confirmed"]>0)&(datewise_Italy["Confirmed"]<=max_ind)]["Confirmed"].diff().fillna(0),label="Daily increase number of Cases in Italy",linewidth=3)
plt.plot(datewise_Spain[(datewise_Spain["Confirmed"]>0)&(datewise_Spain["Confirmed"]<=max_ind)]["Confirmed"].diff().fillna(0),label="Daily increase number of Cases in Spain",linewidth=3)
plt.plot(datewise_US[(datewise_US["Confirmed"]>0)&(datewise_US["Confirmed"]<=max_ind)]["Confirmed"].diff().fillna(0),label="Daily increase number of Cases in US",linewidth=3)
plt.plot(datewise_poland[datewise_poland["Confirmed"]>0]["Confirmed"].diff().fillna(0),label="Daily increase number of Cases in Poland",linewidth=3)
plt.ylabel("Number of Cases")
plt.xlabel("Date")
plt.title("Daily increase in Confirmed Cases")
plt.legend()
plt.xticks(rotation=90)

# %% [markdown]
# #### Videos related to COVID-19 Pandemic in India
# Wuhan Coronavirus: WION breaks down the growing numbers | Gravitas: 
# https://www.youtube.com/watch?v=xqAPDD8sw-g

# %% [markdown]
# ## Prediction using Machine Learning Models

# %% [markdown]
# #### Linear Regression Model for Confirmed Cases Prediction

# %% [code]
# Use the country selected above (Poland) rather than a hard-coded "Brazil".
poland_data=covid[covid["Country/Region"]==country]
datewise_poland=poland_data.groupby(["ObservationDate"]).agg({"Confirmed":'sum',"Recovered":'sum',"Deaths":'sum'})
print(datewise_poland.iloc[-1])
print("Total Active Cases: ",datewise_poland["Confirmed"].iloc[-1]-datewise_poland["Recovered"].iloc[-1]-datewise_poland["Deaths"].iloc[-1])
print("Total Closed Cases: ",datewise_poland["Recovered"].iloc[-1]+datewise_poland["Deaths"].iloc[-1])

# %% [code]
datewise = datewise_poland
datewise["Days Since"]=datewise.index-datewise.index[0]
datewise["Days Since"]=datewise["Days Since"].dt.days

# %% [code]
train_ml=datewise.iloc[:int(datewise.shape[0]*0.90)]
valid_ml=datewise.iloc[int(datewise.shape[0]*0.90):]
model_scores=[]

# %% [code]
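# The normalize argument was removed in scikit-learn 1.2; for ordinary least squares it does not change the predictions.
lin_reg=LinearRegression()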

# %% [code]
lin_reg.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]).reshape(-1,1))

# %% [code]
prediction_valid_linreg=lin_reg.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))

# %% [code]
model_scores.append(np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_linreg)))
print("Root Mean Square Error for Linear Regression: ",np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_linreg)))

# %% [code]
plt.figure(figsize=(11,6))
prediction_linreg=lin_reg.predict(np.array(datewise["Days Since"]).reshape(-1,1))
plt.plot(datewise["Confirmed"],label="Actual Confirmed Cases")
plt.plot(datewise.index,prediction_linreg, linestyle='--',label="Predicted Confirmed Cases using Linear Regression",color='black')
plt.xlabel('Time')
plt.ylabel('Confirmed Cases')
plt.title("Confirmed Cases Linear Regression Prediction")
plt.xticks(rotation=90)
plt.legend()

# %% [markdown]
# #### The Linear Regression Model is clearly falling apart, because the trend of Confirmed Cases is not at all linear.
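
# %% [markdown]
# One common workaround for the non-linear trend noted above is to fit the straight line to the logarithm of the counts, which assumes the growth is roughly exponential. The cell below is only a sketch under that assumption; `fit_log_linear` and the synthetic series are hypothetical and are not part of the original models. All counts must be greater than zero for the logarithm to be defined.

# %% [code]
import math

def fit_log_linear(days, counts):
    """Ordinary least squares of log(count) on day; returns (intercept, slope)."""
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic series that doubles every 3 days (illustration only, not real data).
days = list(range(10))
counts = [5 * 2 ** (d / 3) for d in days]
intercept, slope = fit_log_linear(days, counts)
print("Estimated doubling time in days:", round(math.log(2) / slope, 2))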

# %% [markdown]
# #### Support Vector Machine Regressor for Prediction of Confirmed Cases

# %% [code]
#Initializing the SVR model with fixed hyperparameters (no grid search is run here)
svm=SVR(C=1,degree=6,kernel='poly',epsilon=0.01)

# %% [code]
#Fitting the SVR model; the target is flattened to 1-D, which is the shape SVR expects
svm.fit(np.array(train_ml["Days Since"]).reshape(-1,1),np.array(train_ml["Confirmed"]).ravel())

# %% [code]
prediction_valid_svm=svm.predict(np.array(valid_ml["Days Since"]).reshape(-1,1))

# %% [code]
model_scores.append(np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_svm)))
print("Root Mean Square Error for Support Vector Machine: ",np.sqrt(mean_squared_error(valid_ml["Confirmed"],prediction_valid_svm)))

# %% [code]
plt.figure(figsize=(11,6))
prediction_svm=svm.predict(np.array(datewise["Days Since"]).reshape(-1,1))
plt.plot(datewise["Confirmed"],label="Actual Confirmed Cases",linewidth=3)
plt.plot(datewise.index,prediction_svm, linestyle='--',label="Best Fit for SVR",color='black')
plt.xlabel('Time')
plt.ylabel('Confirmed Cases')
plt.title("Confirmed Cases Support Vector Machine Regressor Prediction")
plt.xticks(rotation=90)
plt.legend()

# %% [code]
new_date=[]
new_prediction_lr=[]
new_prediction_svm=[]
for i in range(1,18):
    new_date.append(datewise.index[-1]+timedelta(days=i))
    new_prediction_lr.append(lin_reg.predict(np.array(datewise["Days Since"].max()+i).reshape(-1,1))[0][0])
    new_prediction_svm.append(svm.predict(np.array(datewise["Days Since"].max()+i).reshape(-1,1))[0])

# %% [code]
# Display floats as whole numbers with thousands separators.
pd.set_option('display.precision', 0)
pd.options.display.float_format = '{:,.0f}'.format
model_predictions=pd.DataFrame(zip(new_date,new_prediction_lr,new_prediction_svm),columns=["Dates","Linear Regression Prediction","SVM Prediction"])
#model_predictions.head()

# %% [code]
df = pd.DataFrame(zip(new_date,new_prediction_lr,new_prediction_svm),columns=["Dates","Linear Regression Prediction","SVM Prediction"])

df['Dates'] = pd.to_datetime(df['Dates'])
pd.options.display.float_format = '{:,.0f}'.format
pd.set_option('display.precision', 0)

#df = df[(df['yhat']>0)]
df.rename(columns={'Linear Regression Prediction': 'LRP', 'SVM Prediction': 'SVM'}, inplace=True)
df

# %% [markdown]
# #### Predictions of Linear Regression are nowhere close to actual numbers

# %% [markdown]
# ## Time Series Forecasting

# %% [markdown]
# #### Holt's Linear Model

# %% [code]
model_train=datewise.iloc[:int(datewise.shape[0]*0.90)]
valid=datewise.iloc[int(datewise.shape[0]*0.90):]

# %% [code]
holt=Holt(np.asarray(model_train["Confirmed"])).fit(smoothing_level=1.3, smoothing_slope=0.9)
y_pred=valid.copy()

# %% [code]
y_pred["Holt"]=holt.forecast(len(valid))
model_scores.append(np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["Holt"])))
print("Root Mean Square Error Holt's Linear Model: ",np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["Holt"])))

# %% [code]
plt.figure(figsize=(10,5))
plt.plot(model_train.Confirmed,label="Train Set",marker='o')
valid.Confirmed.plot(label="Validation Set",marker='*')
y_pred.Holt.plot(label="Holt's Linear Model Predicted Set",marker='^')
plt.ylabel("Confirmed Cases")
plt.xlabel("Date Time")
plt.title("Confirmed Holt's Linear Model Prediction")
plt.xticks(rotation=90)
plt.legend()

# %% [code]
holt_new_date=[]
holt_new_prediction=[]
for i in range(1,18):
    holt_new_date.append(datewise.index[-1]+timedelta(days=i))
    holt_new_prediction.append(holt.forecast((len(valid)+i))[-1])

model_predictions["Holts Linear Model Prediction"]=holt_new_prediction
model_predictions.head()

# %% [code]
df = pd.DataFrame(model_predictions,columns=["Dates","Linear Regression Prediction","SVM Prediction","Holts Linear Model Prediction"])

df['Dates'] = pd.to_datetime(df['Dates'])
pd.options.display.float_format = '{:,.0f}'.format
pd.set_option('display.precision', 0)

#df = df[(df['yhat']>0)]
df.rename(columns={'Linear Regression Prediction': 'LRP', 'SVM Prediction': 'SVM','Holts Linear Model Prediction': 'Holts'}, inplace=True)
df

# %% [markdown]
# #### Holt-Winters Model for Daily Time Series

# %% [code]
model_train=datewise.iloc[:int(datewise.shape[0]*0.90)]
valid=datewise.iloc[int(datewise.shape[0]*0.90):]
y_pred=valid.copy()

# %% [code]
es=ExponentialSmoothing(np.asarray(model_train['Confirmed']),seasonal_periods=5,trend='add', seasonal='add').fit()

# %% [code]
y_pred["Holt's Winter Model"]=es.forecast(len(valid))

# %% [code]
model_scores.append(np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["Holt's Winter Model"])))
print("Root Mean Square Error for Holt's Winter Model: ",np.sqrt(mean_squared_error(y_pred["Confirmed"],y_pred["Holt's Winter Model"])))

# %% [code]
plt.figure(figsize=(10,5))
plt.plot(model_train.Confirmed,label="Train Set",marker='o')
valid.Confirmed.plot(label="Validation Set",marker='*')
y_pred["Holt\'s Winter Model"].plot(label="Holt's Winter Model Predicted Set",marker='^')
plt.ylabel("Confirmed Cases")
plt.xlabel("Date Time")
plt.title("Confirmed Cases Holt's Winter Model Prediction")
plt.xticks(rotation=90)
plt.legend()

# %% [code]
holt_winter_new_prediction=[]
for i in range(1,18):
    holt_winter_new_prediction.append(es.forecast((len(valid)+i))[-1])
model_predictions["Holts Winter Model Prediction"]=holt_winter_new_prediction
model_predictions.head()
model_predictions

# %% [code]
df = pd.DataFrame(model_predictions,columns=["Dates","Linear Regression Prediction","SVM Prediction"\
                                             ,"Holts Linear Model Prediction","Holts Winter Model Prediction"])

df['Dates'] = pd.to_datetime(df['Dates'])
pd.options.display.float_format = '{:,.0f}'.format
pd.set_option('display.precision', 0)

#df = df[(df['yhat']>0)]
df.rename(columns={'Linear Regression Prediction': 'LRP', 'SVM Prediction': 'SVM','Holts Linear Model Prediction': 'HLM'\
,'Holts Winter Model Prediction': 'HWM'}, inplace=True)
df

# %% [code]
y_pred["Holt\'s Winter Model"].head()

# %% [code]
model_train=datewise.iloc[:int(datewise.shape[0]*0.90)]
valid=datewise.iloc[int(datewise.shape[0]*0.90):]
y_pred=valid.copy()

# %% [code]
from pandas.plotting import autocorrelation_plot
plt.figure(figsize=(10, 5))
autocorrelation_plot(datewise["Confirmed"])

# %% [code]
#fig, (ax1,ax2,ax3) = plt.subplots(3, 1,figsize=(11,7))
#import statsmodels.api as sm
#results=sm.tsa.seasonal_decompose(model_train["Confirmed"])
#ax1.plot(results.trend)
#ax2.plot(results.seasonal)
#ax3.plot(results.resid)

# %% [code]
print("Results of Dickey-Fuller test for Original Time Series")
dftest = adfuller(model_train["Confirmed"], autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dfoutput)

# %% [code]
log_series=np.log(model_train["Confirmed"])

# %% [markdown]
# ### AR Model

# %% [markdown]
# ### MA Model

# %% [markdown]
# ### ARIMA Model

# %% [markdown]
# ### Facebook's Prophet Model for forecasting

# %% [code]
prophet_c=Prophet(interval_width=0.95,weekly_seasonality=True,)
prophet_confirmed=pd.DataFrame(zip(list(datewise.index),list(datewise["Confirmed"])),columns=['ds','y'])

# %% [code]
prophet_c.fit(prophet_confirmed)

# %% [code]
forecast_c=prophet_c.make_future_dataframe(periods=17)
forecast_confirmed=forecast_c.copy()

# %% [code]
confirmed_forecast=prophet_c.predict(forecast_c)
#print(confirmed_forecast[['ds','yhat', 'yhat_lower', 'yhat_upper']])

# %% [code]
# Note: this RMSE is in-sample (Prophet was fitted on the full series), so it is not directly comparable with the validation-set scores above.
rmse_prophet=np.sqrt(mean_squared_error(datewise["Confirmed"],confirmed_forecast['yhat'].head(datewise.shape[0])))
model_scores.append(rmse_prophet)
print("Root Mean Squared Error for Prophet Model: ",rmse_prophet)

# %% [code]
print(prophet_c.plot(confirmed_forecast))

# %% [code]
print(prophet_c.plot_components(confirmed_forecast))

# %% [markdown]
# #### Summarization of Forecasts using different Models

# %% [code]
# Only models that were actually fitted above contributed to model_scores (AR, MA and ARIMA were not).
model_names=["Linear Regression","Support Vector Machine Regressor","Holt's Linear","Holt's Winter Model",
            "Facebook's Prophet Model"]
model_summary=pd.DataFrame(zip(model_names,model_scores),columns=["Model Name","Root Mean Squared Error"]).sort_values(["Root Mean Squared Error"])
print(model_summary)
print(datewise_poland.iloc[-1])

# %% [code]
model_predictions["Prophet's Prediction"]=list(confirmed_forecast["yhat"].tail(17))
model_predictions["Prophet's Upper Bound"]=list(confirmed_forecast["yhat_upper"].tail(17))
#model_predictions.head()

# %% [code]
# AR and MA models are not fitted in this notebook, so only columns that exist in model_predictions are selected.
df = pd.DataFrame(model_predictions,columns=["Dates","Linear Regression Prediction","SVM Prediction"\
                                             ,"Holts Linear Model Prediction","Holts Winter Model Prediction",\
                                             "Prophet's Prediction","Prophet's Upper Bound"])

df['Dates'] = pd.to_datetime(df['Dates'])
pd.options.display.float_format = '{:,.0f}'.format
pd.set_option('display.precision', 0)

#df = df[(df['yhat']>0)]
df.rename(columns={'Linear Regression Prediction': 'LRP', 'SVM Prediction': 'SVM','Holts Linear Model Prediction': 'HLM'\
,'Holts Winter Model Prediction': 'HWM','Prophet\'s Prediction': 'Prophet','Prophet\'s Upper Bound': 'PUB'}, inplace=True)
forecast_table = df

df

# %% [code]
start_date = "2020-04-26"
end_date = "2020-04-26"
df = forecast_table
after_start_date = df["Dates"] >= start_date
before_end_date = df["Dates"] <= end_date
between_two_dates = after_start_date & before_end_date
filtered_dates = df.loc[between_two_dates]

filtered_dates

# %% [code]
preview = df.loc[between_two_dates].iloc[-1]
preview

# %% [code]
raw_data = {'Dates': [filtered_dates.Dates],'LRP': [filtered_dates.LRP],'SVM': [filtered_dates.SVM]} 

df_forecast = pd.DataFrame(raw_data, columns = ['Dates', 'LRP' , 'SVM'])
#df_forecast 

# %% [code]
confirmed = int(datewise_poland["Confirmed"].iloc[-1])

# bisect needs a sorted sequence of numbers, so drop the date and any missing values, then sort.
myList = sorted(preview.drop("Dates").dropna().astype(float))

myNumber = confirmed

from bisect import bisect_left

def take_closest(myList, myNumber):
    """
    Assumes myList is sorted. Returns closest value to myNumber.

    If two numbers are equally close, return the smallest number.
    """
    pos = bisect_left(myList, myNumber)
    if pos == 0:
        return myList[0]
    if pos == len(myList):
        return myList[-1]
    before = myList[pos - 1]
    after = myList[pos]
    if after - myNumber < myNumber - before:
        return after
    else:
        return before


print("Real Number of cases: " + str(confirmed))

print("Closest Prediction:   " + str(int(take_closest(myList, myNumber))))
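
# %% [markdown]
# `take_closest` relies on binary search, which only gives a correct answer on sorted input. For a handful of forecasts a linear scan is simpler and needs no sorting. The cell below is a minimal sketch of that alternative; `closest_value` and the sample numbers are hypothetical and are not model output.

# %% [code]
def closest_value(candidates, target):
    """Return the candidate nearest to target; ties resolve to the smaller value."""
    return min(candidates, key=lambda value: (abs(value - target), value))

# Made-up forecasts in arbitrary order, for illustration only.
sample_forecasts = [11200.0, 9800.0, 10450.0]
print(closest_value(sample_forecasts, 10000))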


# %% [markdown]
# ### Time Series Forecasting for Death Cases

# %% [code]
plt.figure(figsize=(20,5))
plt.plot(datewise["Deaths"],marker='o')
plt.ylabel("Death Cases")
plt.xlabel("Datetime")
plt.title("Plot for Death Cases")
plt.xticks(rotation=90)

# %% [code]
model_train=datewise.iloc[:int(datewise.shape[0]*0.85)]
valid=datewise.iloc[int(datewise.shape[0]*0.85):]
y_pred=valid.copy()

# %% [code]
print("Results of Dickey-Fuller test for Original Time Series")
dftest = adfuller(model_train["Deaths"], autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dfoutput)

# %% [code]
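# Keep only days with at least one death, since log(0) is undefined.
log_deaths=np.log(model_train.loc[model_train["Deaths"]>0,"Deaths"])
print("Results of Dickey-Fuller test for the log-transformed, three times differenced Time Series")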
dftest = adfuller((log_deaths.diff().diff().diff()).dropna(), autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
print(dfoutput)

# %% [code]
series=(log_deaths.diff().diff().diff()).dropna()
plt.figure(figsize=(8,4))
plt.plot(series)
plt.title("Stationary Series plot")
plt.xticks(rotation=90)

# %% [code]
plot_acf(series)
plt.show()
plot_pacf(series)
plt.show()

# %% [code]
#model_arima_deaths=ARIMA(log_deaths,(5,2,1))     
#model_arima_deaths_fit=model_arima_deaths.fit()

# %% [code]
#predictions_deaths=np.exp(model_arima_deaths_fit.forecast(len(valid))[0])
#y_pred["ARIMA Death Prediction"]=predictions_deaths

# %% [code]
#print("Root Mean Square Error: ",np.sqrt(mean_squared_error(valid["Deaths"],predictions_deaths)))

# %% [code]
#plt.figure(figsize=(10,5))
#plt.plot(model_train["Deaths"],label="Train Set",marker="o")
#plt.plot(valid["Deaths"],label="Validation Set",marker="*")
#plt.plot(y_pred["ARIMA Death Prediction"],label="ARIMA Prediction Set",marker="^")
#plt.legend()
#plt.xlabel("Date")
#plt.ylabel("Number of Death Cases")
#plt.title("Death Cases Forecasting using ARIMA Model")
#plt.xticks(rotation=90)

# %% [code]
#ARIMA_model_death_forecast=[]
#for i in range(1,18):
#    ARIMA_model_death_forecast.append(np.exp(model_arima_deaths_fit.forecast(len(valid)+i)[0][-1]))

# %% [code]
#pd.DataFrame(zip(new_date,ARIMA_model_death_forecast),columns=["Deaths","ARIMA Model Death Forecast"]).head()

# %% [markdown]
# Poland Datewise Analysis

# %% [code]


# %% [markdown]
# ### Facebook's Prophet Model for Deaths forecasting

# %% [code]
prophet_c=Prophet(interval_width=0.95,weekly_seasonality=True,)
prophet_deaths=pd.DataFrame(zip(list(datewise.index),list(datewise["Deaths"])),columns=['ds','y'])

# %% [code]
prophet_c.fit(prophet_deaths)

# %% [code]
forecast_c=prophet_c.make_future_dataframe(periods=3)
forecast_deaths=forecast_c.copy()

# %% [code]
deaths_forecast=prophet_c.predict(forecast_c)
forecast_preview = deaths_forecast[['ds','yhat']]
#print(deaths_forecast[['ds','yhat']])

# %% [code]

#pd.DataFrame(forecast_preview, columns = ['ds','yhat'] ).style.background_gradient(cmap='Blues')

# %% [markdown]
# **Prepare dataframe for display**

# %% [code]
pd.set_option('display.precision', 0)
df = pd.DataFrame(forecast_preview)

df['ds'] = pd.to_datetime(df['ds'])
pd.options.display.float_format = '{:,.0f}'.format

df = df[(df['yhat']>0)]

df

# %% [code]
plt.figure(figsize=(15,5))
sns.lineplot(x=forecast_preview.ds, y=forecast_preview.yhat)
plt.title("Prophet Forecast of Death Cases over Date")
plt.xticks(rotation=90)

# %% [code]
# Kept in its own variable: model_scores holds the scores of the Confirmed-case models summarised earlier.
rmse_prophet_deaths=np.sqrt(mean_squared_error(datewise["Deaths"],deaths_forecast['yhat'].head(datewise.shape[0])))
print("Root Mean Squared Error for Prophet Model (Deaths): ",rmse_prophet_deaths)

# %% [code]
print(prophet_c.plot(deaths_forecast))

# %% [code]
print(prophet_c.plot_components(deaths_forecast))