**Coronaviruses**

The 2019–20 coronavirus pandemic is an ongoing pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus was first reported in Wuhan, Hubei, China on 17 November 2019,[1] and on 11 March 2020, the World Health Organization (WHO) declared the outbreak a pandemic.[4] As of 13 March 2020, over 144,000 cases have been confirmed in more than 130 countries and territories, with major outbreaks in mainland China, Italy, South Korea, and Iran.[2] As of 13 March, at least 5,300 people have died from the disease and more than 70,900 have recovered.[2]

A novel coronavirus (nCoV) is a new strain that has not been previously identified in humans.

*Source: Wikipedia*

Latest status - https://en.wikipedia.org/wiki/2019%E2%80%9320_coronavirus_pandemic#/media/File:COVID-19-outbreak-timeline.gif


**This notebook mainly focuses on time series forecasting of confirmed, death, and recovered cases.**

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from fbprophet.plot import plot_plotly, add_changepoints_to_plot
import plotly.offline as py
from datetime import date, timedelta
from statsmodels.tsa.arima_model import ARIMA
from sklearn.cluster import KMeans
from fbprophet import Prophet

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
covid19_df=pd.read_csv("/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv")
covid19_df = covid19_df.rename(columns={"ObservationDate": "date","Country/Region": "country", "Province/State": "state", "Confirmed":"confirm", "Deaths": "death","Recovered":"recover"})
covid19_df.head()


In [None]:
covid19_df.shape

**Exploratory Data Analysis**

In [None]:
covid19_df.isnull().sum()

Get the latest snapshot of the data. Because the counts are cumulative, the rows for the most recent date already hold the running totals.

In [None]:
daily_df = covid19_df.sort_values(['date', 'country', 'state'])
latest_data = covid19_df[covid19_df.date == daily_df.date.max()]
latest_data.sample(10)
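The idea of taking the latest snapshot from cumulative data can be sketched on a toy frame. The rows and values below are invented for illustration, and the `MM/DD/YYYY` date format is an assumption about the CSV. Parsing the dates first means `max()` compares chronologically rather than as strings, which matters once the data spans more than one year.

```python
import pandas as pd

# Toy cumulative data; values are invented for illustration only.
toy_df = pd.DataFrame({
    "date": ["03/11/2020", "03/12/2020", "03/11/2020", "03/12/2020"],
    "country": ["Germany", "Germany", "Italy", "Italy"],
    "confirm": [10, 15, 100, 130],
})

# Parse the date strings (assumed MM/DD/YYYY) so comparisons are chronological.
toy_df["date"] = pd.to_datetime(toy_df["date"], format="%m/%d/%Y")

# Keep only the rows of the most recent date: these hold the running totals.
toy_latest = toy_df[toy_df["date"] == toy_df["date"].max()]

# Summing the snapshot per country gives current totals without double counting.
toy_totals = toy_latest.groupby("country")["confirm"].sum()
print(toy_totals)
```

Summing over every date instead of the snapshot would add the same cases once per day, so the snapshot filter has to come before any `groupby(...).sum()`.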

Select the desired columns from the latest data set

In [None]:
columns_list = ["state", "country", "date", "confirm", "death", "recover"]
# Subset the latest snapshot (not the full cumulative history) so country totals are not double counted
latest_data = latest_data[columns_list]
latest_data.sample(10)

In [None]:
latest_data_groupby_country = latest_data.groupby("country")[["confirm", "death", "recover"]].sum().reset_index()
latest_data_groupby_country.sample(5)

Bar plot of confirmed cases at the country level

In [None]:
fig = px.bar(latest_data_groupby_country, 
             y="confirm", x="country", color='country', 
             hover_data = ['confirm', 'death', 'recover'],
             log_y=True, template='ggplot2')
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.05,
                              xanchor='left', yanchor='bottom',
                              text='Confirmed bar plot on Country',
                              font=dict(family='Arial',
                                        size=30,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig.update_layout(annotations=annotations)
fig.show()

Bar plot of cumulative confirmed cases per day for Mainland China

In [None]:
covid19_df['country'].unique()

In [None]:
fig = px.bar(covid19_df.loc[covid19_df['country'] == 'Mainland China'], x='date', y='confirm', 
             hover_data=['state', 'confirm', 'recover'], color='state', template='ggplot2')
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.05,
                              xanchor='left', yanchor='bottom',
                              text='Confirmed bar plot for Mainland China over time',
                              font=dict(family='Arial',
                                        size=30,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig.update_layout(annotations=annotations)
fig.show()
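The bars above show cumulative counts for each day. If daily new cases are wanted instead, one possible approach is to difference the cumulative series within each state. The sketch below uses invented values and a hypothetical column name `new_confirm`; it is not part of the original analysis.

```python
import pandas as pd

# Toy cumulative counts for two states; values are invented for illustration.
toy_cum = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-10", "2020-03-11", "2020-03-12"] * 2),
    "state": ["A", "A", "A", "B", "B", "B"],
    "confirm": [5, 8, 12, 1, 1, 4],
})

# Sort so that differences are taken between consecutive days within a state.
toy_cum = toy_cum.sort_values(["state", "date"])

# diff() yields NaN for the first day of each state; fall back to the
# cumulative value there, since no earlier day is available to subtract.
toy_cum["new_confirm"] = (
    toy_cum.groupby("state")["confirm"].diff()
    .fillna(toy_cum["confirm"])
    .astype(int)
)
print(toy_cum)
```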

Bar plot of recovered cases at the country level

In [None]:
fig = px.bar(latest_data_groupby_country, 
             y="recover", x="country", color='country', 
             hover_data = ['confirm', 'death', 'recover'],
             log_y=True, template='ggplot2')
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.05,
                              xanchor='left', yanchor='bottom',
                              text='Recovered bar plot on Country',
                              font=dict(family='Arial',
                                        size=30,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig.update_layout(annotations=annotations)
fig.show()

Bar plot of deaths at the country level

In [None]:
fig = px.bar(latest_data_groupby_country, 
             y="death", x="country", color='country', 
             hover_data = ['confirm', 'death', 'recover'],
             log_y=True, template='ggplot2')
annotations = []
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.05,
                              xanchor='left', yanchor='bottom',
                              text='Death bar plot on Country',
                              font=dict(family='Arial',
                                        size=30,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
fig.update_layout(annotations=annotations)
fig.show()

**EDA for Germany**

In [None]:
covid19_df.head()

In [None]:
de_data = covid19_df[covid19_df['country'] == 'Germany']
de_data.tail(5)

Time series analysis of confirmed cases for Germany using Facebook Prophet

Convert the pandas DataFrame into the layout Prophet expects. As the documentation says, the input to Prophet is always a dataframe with two columns: ds and y. The ds (datestamp) column should be of a format expected by pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The y column must be numeric and represents the measurement we wish to forecast. The Germany data frame is therefore reduced to these two columns and renamed accordingly.

In [None]:
# Select columns by name rather than by position, so the observation date becomes ds
prophet_de_confirmed = de_data[['date', 'confirm']].copy()
prophet_de_confirmed.columns = ['ds', 'y']
prophet_de_confirmed.head()
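As a minimal sketch of the two-column layout described above, the toy frame below is renamed to `ds` and `y` and its dates are parsed explicitly. The values and the `MM/DD/YYYY` format are assumptions for illustration. Prophet itself is not imported here; only the input frame is prepared.

```python
import pandas as pd

# Toy stand-in for the Germany subset; values are invented for illustration.
toy_de = pd.DataFrame({
    "date": ["03/11/2020", "03/12/2020", "03/13/2020"],
    "confirm": [10.0, 15.0, 21.0],
})

# Rename to the ds / y layout and parse ds into real datetimes
# (the MM/DD/YYYY format is an assumption about the source file).
toy_prophet = (
    toy_de[["date", "confirm"]]
    .rename(columns={"date": "ds", "confirm": "y"})
    .assign(ds=lambda d: pd.to_datetime(d["ds"], format="%m/%d/%Y"))
)
print(toy_prophet.dtypes)
```

Parsing the dates up front removes any ambiguity about day and month order before the frame is handed to a model.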

We fit the model by instantiating a new Prophet object. Any settings to the forecasting procedure are passed into the constructor. Then we call its fit method and pass in the historical dataframe.
Predictions are then made on a dataframe with a column ds containing the dates for which a prediction is to be made. We can get a suitable dataframe that extends into the future a specified number of days using the helper method Prophet.make_future_dataframe. By default it will also include the dates from the history, so we will see the model fit as well.

In [None]:
model_de_confirmed = Prophet()
model_de_confirmed.fit(prophet_de_confirmed)
future_de_confirmed = model_de_confirmed.make_future_dataframe(periods=365)
future_de_confirmed.sample(10)
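To make the shape of the future dataframe concrete, here is a rough pandas-only approximation of what such a helper produces: the historical dates followed by `periods` additional daily dates. The function `sketch_future_frame` is hypothetical and written only for illustration; it is not Prophet's implementation.

```python
import pandas as pd

def sketch_future_frame(history_ds, periods, freq="D", include_history=True):
    """Hypothetical stand-in that mimics the shape of a Prophet future frame."""
    history_ds = pd.to_datetime(pd.Series(history_ds)).sort_values()
    last_date = history_ds.max()
    # Generate periods + 1 points starting at the last observed date,
    # then drop the first one so the future starts on the following day.
    future = pd.Series(
        pd.date_range(start=last_date, periods=periods + 1, freq=freq)[1:]
    )
    dates = pd.concat([history_ds, future]) if include_history else future
    return pd.DataFrame({"ds": dates.reset_index(drop=True)})

toy_future = sketch_future_frame(
    ["2020-03-11", "2020-03-12", "2020-03-13"], periods=7
)
print(toy_future)
```

With three historical days and `periods=7`, the result has ten rows. With `periods=365`, as in the cell above, the horizon is far longer than the available history, so the later part of that forecast should be read with caution.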

The predict method assigns each row in future a predicted value, which it names yhat. If we pass in historical dates, it will provide an in-sample fit. The forecast object here is a new dataframe that includes a column yhat with the forecast, as well as columns for components and uncertainty intervals.

In [None]:
forecast_de_confirmed=model_de_confirmed.predict(future_de_confirmed)
forecast_de_confirmed.sample(5)

We can now plot the forecast by calling the Prophet.plot method and passing in the forecast dataframe.

In [None]:
figure_de_confirmed = model_de_confirmed.plot(forecast_de_confirmed)

To see the forecast components, we can use the Prophet.plot_components method. By default we’ll see the trend, yearly seasonality, and weekly seasonality of the time series.

In [None]:
figure_de_confirmed_2 = model_de_confirmed.plot_components(forecast_de_confirmed)

An interactive figure of the forecast can be created with plotly.

In [None]:
py.init_notebook_mode()

# Use a new name so the components figure above is not overwritten
figure_de_confirmed_3 = plot_plotly(model_de_confirmed, forecast_de_confirmed)  # This returns a plotly Figure
py.iplot(figure_de_confirmed_3)

**Forecast for recovered cases**

In [None]:
# Use the Germany subset (not the full data set) and select columns by name
prophet_de_recover = de_data[['date', 'recover']].copy()
prophet_de_recover.columns = ['ds', 'y']
prophet_de_recover.tail()

In [None]:
model_de_recover=Prophet()
model_de_recover.fit(prophet_de_recover)
future_de_recover=model_de_recover.make_future_dataframe(periods=365)
# Predict on the future frame (history plus 365 days), not on the training frame
forecast_de_recover=model_de_recover.predict(future_de_recover)
forecast_de_recover.sample(5)

In [None]:
figure_de_recover_1 = plot_plotly(model_de_recover, forecast_de_recover)
py.iplot(figure_de_recover_1) 

figure_de_recover_2 = model_de_recover.plot(forecast_de_recover,xlabel='Date',ylabel='Recovery Count')

In [None]:
figure_de_recover_3=model_de_recover.plot_components(forecast_de_recover)

**Forecast for deaths**

In [None]:
# Use the Germany subset (not the full data set) and select columns by name
prophet_de_death = de_data[['date', 'death']].copy()
prophet_de_death.columns=['ds', 'y']
prophet_de_death.tail(5)

In [None]:
model_de_death=Prophet()
model_de_death.fit(prophet_de_death)
future_de_death=model_de_death.make_future_dataframe(periods=365)
forecast_de_death=model_de_death.predict(future_de_death)
forecast_de_death.sample(5)

In [None]:
# forecast_dth was undefined; the forecast frame is forecast_de_death
figure_de_death_1 = plot_plotly(model_de_death, forecast_de_death)
py.iplot(figure_de_death_1)

figure_de_death_2 = model_de_death.plot(forecast_de_death,xlabel='Date',ylabel='Death Count')

In [None]:
figure_de_death_3 = model_de_death.plot_components(forecast_de_death)