What is Coronavirus?

Researchers first isolated a coronavirus in 1937. They found a coronavirus responsible for an infectious bronchitis virus in birds that had the ability to devastate poultry stocks. Scientists first found evidence of human coronaviruses (HCoV) in the 1960s in the noses of people with the common cold. Two human coronaviruses are responsible for a large proportion of common colds: OC43 and 229E. The name “coronavirus” comes from the crown-like projections on their surfaces. “Corona” in Latin means “halo” or “crown.” Among humans, coronavirus infections most often occur during the winter months and early spring. People regularly become ill with a cold due to a coronavirus and may catch the same one about 4 months later. This is because coronavirus antibodies do not last for a long time. Also, the antibodies for one strain of coronavirus may be ineffective against another one.

References: https://www.medicalnewstoday.com/articles/256521#definition


The **coronavirus pandemic** is an ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The outbreak was first identified in Wuhan, Hubei, China, in December 2019, and was recognised as a pandemic by the World Health Organization (WHO) on 11 March 2020. As of 18 April 2020, more than 2,225,156 cases of COVID-19 have been reported in more than 200 countries and territories, resulting in more than 152,392 deaths; more than 567,279 people have recovered.

https://www.worldometers.info/coronavirus/

**Symptoms**

Common symptoms:

fever, tiredness, dry cough


Some people may experience:

aches and pains, nasal congestion, runny nose, sore throat, diarrhoea.


On average, symptoms appear 5–6 days after a person is infected with the virus, but they can take up to 14 days to show.

# Time Series Models

Two important time series models are ARIMA and SARIMA. Both build on the ARMA model.

**ARMA**: 
Suppose we want to predict tomorrow's confirmed cases in a country. An ARMA model does this from the records of the preceding period (say, the previous month) together with the forecast errors made over that period.

AR = Autoregression

MA = Moving Average

An **ARIMA** model is a class of statistical models for analyzing and forecasting time series data.

ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. It is a generalization of the simpler AutoRegressive Moving Average and adds the notion of integration.

AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.

I: Integrated. The use of differencing of raw observations (e.g. subtracting an observation from an observation at the previous time step) in order to make the time series stationary.

MA: Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.
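
As a minimal sketch of the "Integrated" step, the snippet below differences a short series with NumPy. The numbers are made up for illustration and are not taken from the datasets used later in this notebook.

```python
import numpy as np

# Hypothetical cumulative case counts (illustrative values only)
cumulative = np.array([10, 15, 23, 36, 55, 80])

# First-order differencing (d=1): each value minus the previous one
first_diff = np.diff(cumulative, n=1)

# Second-order differencing (d=2): difference of the differences
second_diff = np.diff(cumulative, n=2)

# Differencing is reversible: cumulative sums plus the first value restore the series
restored = np.concatenate(([cumulative[0]], cumulative[0] + np.cumsum(first_diff)))

print(first_diff)
print(second_diff)
print(restored)
```

A series with a strong upward trend, such as cumulative case counts, often needs one or two rounds of differencing before it looks roughly stationary.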

The parameters of the ARIMA model are:

**p:** The number of lag observations included in the model, also called the lag order.

**d:** The number of times that the raw observations are differenced, also called the degree of differencing.

**q:** The size of the moving average window, also called the order of moving average.
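
To make the roles of p and q more concrete, here is a hand-rolled one-step ARMA(p, q) forecast. The function `arma_one_step` and all coefficient values are hypothetical and chosen only for illustration; in practice a library such as statsmodels estimates the coefficients from the data.

```python
import numpy as np

def arma_one_step(history, errors, phi, theta, c=0.0):
    """One-step forecast: c + sum(phi_i * y[t-i]) + sum(theta_j * e[t-j])."""
    p, q = len(phi), len(theta)
    # Take the last p observations and last q errors, most recent first
    lagged_obs = np.asarray(history[len(history) - p:], dtype=float)[::-1]
    lagged_err = np.asarray(errors[len(errors) - q:], dtype=float)[::-1]
    return c + float(np.dot(phi, lagged_obs)) + float(np.dot(theta, lagged_err))

history = [1.0, 2.0, 4.0]   # past observations (already differenced if d > 0)
errors = [0.0, 0.2]         # past one-step forecast errors
phi = [0.5, 0.25]           # p = 2 autoregressive coefficients (assumed values)
theta = [0.5]               # q = 1 moving-average coefficient (assumed value)

forecast = arma_one_step(history, errors, phi, theta)
print(forecast)
```

Here p is simply the number of lagged observations that enter the forecast, and q is the number of lagged errors.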

**SARIMA**

Seasonal Autoregressive Integrated Moving Average (SARIMA, or Seasonal ARIMA) is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component.

It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.

There are three trend elements that require configuration.

**p:** Trend autoregression order.

**d:** Trend difference order.

**q:** Trend moving average order.


Four additional elements describe the seasonal component:

**P:** Seasonal autoregressive order.

**D:** Seasonal difference order.

**Q:** Seasonal moving average order.

**m:** The number of time steps for a single seasonal period.

The notation for a SARIMA model is:

SARIMA(p,d,q)(P,D,Q)m
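
The seasonal difference order D and the period m can be illustrated with a small sketch. The series below is synthetic (a weekly pattern plus a step of 10 per week) and is only an assumption for demonstration; it is not drawn from the COVID-19 data.

```python
import numpy as np

m = 7       # seasonal period: 7 time steps (one week of daily data)
weeks = 3

# Synthetic series: a repeating weekly pattern that rises by 10 every week
weekly_pattern = np.array([5, 3, 2, 2, 3, 6, 9])
series = np.tile(weekly_pattern, weeks) + np.repeat(np.arange(weeks) * 10, m)

# Seasonal differencing (D=1): subtract the value from one season earlier
seasonal_diff = series[m:] - series[:-m]

print(series)
print(seasonal_diff)
```

After one seasonal difference the weekly pattern cancels out and only the week-over-week change remains.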


**Video Tutorial**

https://www.youtube.com/watch?v=ZNi_3bcutkY

In [None]:
pip install pmdarima

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import folium
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.arima.model import ARIMA  # statsmodels.tsa.arima_model was removed in statsmodels 0.13
import plotly.graph_objects as go
from pmdarima import auto_arima    
import warnings
warnings.filterwarnings("ignore")

%matplotlib inline

# Kaggle Competition

In [None]:
train = pd.read_csv('../input/covid19-global-forecasting-week-4/train.csv')
test = pd.read_csv('../input/covid19-global-forecasting-week-4/test.csv')

In [None]:
train.head()

In [None]:
test.head()

**Preprocessing**

In [None]:
#Changing Date column to datetime
train['Date']= pd.to_datetime(train['Date']) 
test['Date']= pd.to_datetime(test['Date']) 
#set index to date column
new_train = train.set_index(['Date'])
new_test = test.set_index(['Date'])

In [None]:
new_train.head()

In [None]:
new_train.isnull().sum()

In [None]:
new_train[['Province_State']] = new_train[['Province_State']].fillna('')
new_train.isnull().sum()

In [None]:
#dropping the ForecastId and Id columns
new_test = new_test.drop(["ForecastId"], axis=1)
new_train = new_train.drop(["Id"], axis=1)

In [None]:
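# Creating a dataframe with the total number of confirmed cases for every country.
# ConfirmedCases is cumulative, so use the latest date instead of summing over all dates.
latest = train[train['Date'] == train['Date'].max()]
confirmiedcases = pd.DataFrame(latest.groupby('Country_Region')['ConfirmedCases'].sum())
confirmiedcases['Country_Region'] = confirmiedcases.index
# Number the rows from 1 without hard-coding the number of countries
confirmiedcases.index = np.arange(1, len(confirmiedcases) + 1)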
global_confirmiedcases = confirmiedcases[['Country_Region','ConfirmedCases']]
fig = px.bar(global_confirmiedcases.sort_values('ConfirmedCases',ascending=False)[:40][::-1],
             x='ConfirmedCases',y='Country_Region',title='Worldwide Confirmed Cases',text='ConfirmedCases', height=900, orientation='h')
fig.show()

In [None]:
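# Creating a dataframe with the total number of fatalities for every country.
# Fatalities is cumulative, so use the latest date instead of summing over all dates.
latest = new_train[new_train.index == new_train.index.max()]
confirmiedcases = pd.DataFrame(latest.groupby('Country_Region')['Fatalities'].sum())
confirmiedcases['Country_Region'] = confirmiedcases.index
# Number the rows from 1 without hard-coding the number of countries
confirmiedcases.index = np.arange(1, len(confirmiedcases) + 1)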
global_confirmiedcases = confirmiedcases[['Country_Region','Fatalities']]
fig = px.bar(global_confirmiedcases.sort_values('Fatalities',ascending=False)[:40][::-1],
             x='Fatalities',y='Country_Region',title='Worldwide Deaths',text='Fatalities', height=900, orientation='h')
fig.show()

In [None]:
formated_gdf = train.groupby(['Date', 'Country_Region'])['ConfirmedCases'].sum()
formated_gdf = formated_gdf.reset_index()
formated_gdf['Date'] = pd.to_datetime(formated_gdf['Date'])
formated_gdf['Date'] = formated_gdf['Date'].dt.strftime('%m/%d/%Y')
formated_gdf['size'] = formated_gdf['ConfirmedCases'].pow(0.3)

fig = px.scatter_geo(formated_gdf, locations="Country_Region", locationmode='country names', 
                     color="ConfirmedCases", size='size', hover_name="Country_Region", 
                     range_color= [0, 1500], 
                     projection="natural earth", animation_frame="Date", 
                     title='CORONA: Spread Over Time From Jan 2020 to Apr 2020', color_continuous_scale="portland")
fig.show()

In [None]:
formated_gdf = train.groupby(['Date', 'Country_Region'])['Fatalities'].sum()
formated_gdf = formated_gdf.reset_index()
formated_gdf['Date'] = pd.to_datetime(formated_gdf['Date'])
formated_gdf['Date'] = formated_gdf['Date'].dt.strftime('%m/%d/%Y')
formated_gdf['size'] = formated_gdf['Fatalities'].pow(0.3)

fig = px.scatter_geo(formated_gdf, locations="Country_Region", locationmode='country names', 
                     color="Fatalities", size='size', hover_name="Country_Region", 
                     range_color= [0, 1500], 
                     projection="natural earth", animation_frame="Date", 
                     title='CORONA: Fatalities Over Time From Jan 2020 to Apr 2020', color_continuous_scale="portland")
fig.show()

In [None]:
new_train.columns

In [None]:
countries = new_train['Country_Region'].unique()
for country in countries:
    if country == 'Turkey':
        train_df = new_train[new_train['Country_Region'] == country]
        test_df = new_test[new_test['Country_Region'] == country]

        #********* Forecasting ConfirmedCases ********

        X_train_conf = train_df['ConfirmedCases'].values
        p,d,q = auto_arima(X_train_conf).order
        
        #For trying out ARIMA
        #ARIMA(X_train_conf,order=(p,d,q))

        model_conf = SARIMAX(X_train_conf,order=(p,d,q),seasonal_order=(0,0,0,0))
        result_conf = model_conf.fit()
        fcast_conf = result_conf.predict(len(X_train_conf)-13,len(X_train_conf)+len(test_df)-14,typ='levels')
        test.loc[test['Country_Region']==country,'ConfirmedCases'] = np.rint(fcast_conf)
       
        
        #********* Forecasting Fatalities ********
        

        X_train_fat = train_df['Fatalities'].values
        p,d,q = auto_arima(X_train_fat).order
        model_fat = SARIMAX(X_train_fat,order=(p,d,q),seasonal_order=(0,0,0,0))
        result_fat = model_fat.fit()
        fcast_fat = result_fat.predict(len(X_train_fat)-13,len(X_train_fat)+len(test_df)-14,typ='levels')

        test.loc[test['Country_Region']==country,'Fatalities'] = np.rint(fcast_fat)
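
The offsets `-13` and `-14` in the `predict` calls above appear to account for the days that occur in both the training and the test data. The sketch below derives the same window from the dates instead of hard-coding it. The two date ranges are assumptions chosen for illustration; with the real data they would come from the indexes of `train_df` and `test_df`.

```python
import pandas as pd

# Hypothetical date ranges standing in for the train_df and test_df indexes
train_dates = pd.date_range("2020-01-22", "2020-04-14", freq="D")
test_dates = pd.date_range("2020-04-02", "2020-05-14", freq="D")

n_train = len(train_dates)

# Number of test dates that are already covered by the training data
overlap = len(train_dates.intersection(test_dates))

# First and last positions to predict, counted from the start of the training series
start = n_train - overlap
end = start + len(test_dates) - 1

print(overlap, start, end)
```

With these assumed ranges the overlap is 13 days, so `start` equals `len(X_train_conf)-13` and `end` equals `len(X_train_conf)+len(test_df)-14`, and the forecast has exactly one value per test date.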
        

In [None]:
#test.loc[test['Country_Region']=='Pakistan']


In [None]:
turkey_data = test.loc[test['Country_Region']=='Turkey']
turkey_data.columns

In [None]:
plot_turkey_data = turkey_data.filter(["Date","ConfirmedCases", "Fatalities"])
plot_turkey_data.head()

In [None]:

fig = go.Figure(go.Scatter(x=plot_turkey_data['Date'],y=plot_turkey_data['ConfirmedCases'],
                      text='Total Confirmed Cases'))
fig.update_layout(title_text='Total Number of Coronavirus Cases by Date')
fig.update_yaxes(showticklabels=False)

fig.show()


In [None]:

fig = go.Figure(go.Scatter(x=plot_turkey_data['Date'],y=plot_turkey_data['Fatalities'],
                      text='Total Fatalities'))
fig.update_layout(title_text='Total Number of Coronavirus Fatalities by Date')
fig.update_yaxes(showticklabels=False)

fig.show()


# Global Covid 19 Data of Deaths, Recovered, Confirmed Cases

In [None]:
# download the latest data sets
global_confirmed_cases = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
global_deaths = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
global_recovered = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')

In [None]:
global_confirmed_cases.head()

In [None]:
global_deaths.head()

In [None]:
global_recovered.head()

In [None]:
dates = global_confirmed_cases.columns[4:]

In [None]:
cc_df = global_confirmed_cases.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
                            value_vars=dates, var_name='Date', value_name='Confirmed')
print(cc_df.head())
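
For readers new to `melt`, here is a minimal sketch of the wide-to-long reshaping on a tiny made-up frame. The country names and counts are placeholders, not real data.

```python
import pandas as pd

# Wide format: one column per date, as in the CSSE time series files
wide = pd.DataFrame({
    "Country/Region": ["A", "B"],
    "1/22/20": [1, 0],
    "1/23/20": [3, 2],
})

# Long format: one row per (country, date) pair
long = wide.melt(id_vars=["Country/Region"], var_name="Date", value_name="Confirmed")

print(long)
```

Each date column becomes a set of rows, which makes it straightforward to merge the confirmed, deaths and recovered tables on the same keys.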

In [None]:
# create complete data

cc_df = global_confirmed_cases.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
                            value_vars=dates, var_name='Date', value_name='Confirmed')


deaths_df = global_deaths.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
                            value_vars=dates, var_name='Date', value_name='Deaths')

recv_df = global_recovered.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], 
                            value_vars=dates, var_name='Date', value_name='Recovered')

print(cc_df.shape)
print(deaths_df.shape)
print(recv_df.shape)

complete_data = pd.merge(left=cc_df, right=deaths_df, how='left',
                      on=['Province/State', 'Country/Region', 'Date', 'Lat', 'Long'])
complete_data = pd.merge(left=complete_data, right=recv_df, how='left',
                      on=['Province/State', 'Country/Region', 'Date', 'Lat', 'Long'])

complete_data.head()

In [None]:
# Active cases (treat missing Recovered values as 0 so Active is not lost to NaN)
complete_data['Active'] = complete_data['Confirmed'] - complete_data['Recovered'].fillna(0) - complete_data['Deaths']


In [None]:
#check for null/nan values

complete_data.isna().sum()


In [None]:

complete_data['Recovered'] = complete_data['Recovered'].fillna(0)
complete_data['Recovered'] = complete_data['Recovered'].astype('int')
complete_data['Active'] = complete_data['Active'].fillna(0)
complete_data['Active'] = complete_data['Active'].astype('int')
complete_data.isna().sum()

In [None]:
complete_data = complete_data.rename(columns={"Province/State":"State","Country/Region": "Country"})

In [None]:
complete_data.loc[complete_data['Country'] == "US", "Country"] = "USA"

complete_data.loc[complete_data['Country'] == 'Korea, South', "Country"] = 'South Korea'

complete_data.loc[complete_data['Country'] == 'Taiwan*', "Country"] = 'Taiwan'

complete_data.loc[complete_data['Country'] == 'Congo (Kinshasa)', "Country"] = 'Democratic Republic of the Congo'

complete_data.loc[complete_data['Country'] == "Cote d'Ivoire", "Country"] = "Côte d'Ivoire"

complete_data.loc[complete_data['Country'] == "Reunion", "Country"] = "Réunion"

complete_data.loc[complete_data['Country'] == 'Congo (Brazzaville)', "Country"] = 'Republic of the Congo'

complete_data.loc[complete_data['Country'] == 'Bahamas, The', "Country"] = 'Bahamas'

complete_data.loc[complete_data['Country'] == 'Gambia, The', "Country"] = 'Gambia'


In [None]:
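df_date = complete_data.filter(["Date",  "Confirmed", "Deaths", "Recovered"])
# Parse the date strings so that groupby sorts chronologically, not alphabetically
df_date["Date"] = pd.to_datetime(df_date["Date"])
df_date = df_date.groupby("Date").sum()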
df_date.head()

In [None]:
plt.figure(figsize=(15,6))
plt.plot(df_date, marker='o')
plt.title('Total Number of Coronavirus Cases by Date')
plt.legend(df_date.columns)
plt.xticks(rotation=75)
plt.show()

In [None]:
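# Counts are cumulative, so use the most recent date instead of summing over all dates
latest = complete_data[complete_data['Date'] == dates[-1]]
# Select several columns with a list; tuple-style selection fails in recent pandas
countries_grouped = latest.groupby('Country')[['Confirmed', 'Deaths', 'Recovered']].sum().reset_index()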
temp = countries_grouped[['Country', 'Deaths']]
temp = temp.sort_values(by='Deaths', ascending=False)
temp = temp.reset_index(drop=True)
temp = temp[temp['Deaths']>0]
temp.style.background_gradient(cmap='Pastel1_r')

In [None]:
countries = complete_data['Country'].unique()
for country in countries:
    if(country == 'Turkey'):

        train_df = complete_data[complete_data['Country'] == country]
        data = train_df.Recovered.astype('int32').tolist()
        
        # fit model
        p,d,q = auto_arima(data).order
        model = SARIMAX(data, order=(p,d,q), seasonal_order=(0,0,0,0),measurement_error=True)#seasonal_order=(1, 1, 1, 1))
        model_fit = model.fit(disp=False)
        
        # make prediction
        predicted = model_fit.predict(len(data), len(data)+13)
       
        print(predicted)
       

#  *THANK YOU*

Let's connect on other platforms as well:

Github: https://github.com/uzairaj

YouTube Channel: https://www.youtube.com/channel/UCCxSpt0KMn17sMn8bQxWZXA

Blog: http://uzairadamjee.com/blog/