# COVID ANALYSIS AND PREDICTION
                                                            
Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered virus that emerged in Wuhan, China, in December 2019. Most people infected with the COVID-19 virus experience mild to moderate respiratory illness and recover without requiring special treatment. Older people and those with underlying medical problems such as cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness.
The COVID-19 virus spreads primarily through droplets of saliva or discharge from the nose when an infected person coughs or sneezes, which is why people are advised to practice respiratory etiquette (for example, by coughing into a flexed elbow).
# Libraries

We import a few important libraries that we shall use in the model. Pandas is a fast and flexible data analysis library that lets you store and manipulate tabular data. We also import visualisation libraries such as matplotlib, seaborn and plotly.

# Prediction



For prediction we use the Prophet library, produced by Facebook, which is designed for time series forecasting. Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data.

Prophet is available in both Python and R; in this project we use Python. The Prophet() function is used to define a Prophet forecasting model in Python. The input to Prophet is a dataframe with at least two columns: ds and y. ds is the datestamp column and should follow a format pandas can parse, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp, and y is the numeric column we want to forecast. The helper method Prophet.make_future_dataframe returns a suitable dataframe that extends into the future by a specified number of days. By default it also includes the dates from the history.
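The input shape described above can be sketched with pandas alone. This is only an illustration: the toy values, the column names `Date` and `Confirmed`, and the helper `extend_dates` are assumptions made for the example, and `extend_dates` merely imitates the idea behind `make_future_dataframe` rather than reproducing Prophet's own code.

```python
import pandas as pd

# Hypothetical toy history; the values are placeholders, not real case counts.
history = pd.DataFrame({
    "Date": ["2020-07-01", "2020-07-02", "2020-07-03"],
    "Confirmed": [100, 120, 150],
})

# Prophet expects the columns to be named ds (datestamp) and y (value to forecast).
prophet_input = history.rename(columns={"Date": "ds", "Confirmed": "y"})
prophet_input["ds"] = pd.to_datetime(prophet_input["ds"])


def extend_dates(df, periods):
    """Return the historical dates followed by `periods` additional daily dates."""
    last_day = df["ds"].max()
    future_days = pd.date_range(start=last_day + pd.Timedelta(days=1),
                                periods=periods, freq="D")
    all_days = pd.concat([df["ds"], pd.Series(future_days)], ignore_index=True)
    return pd.DataFrame({"ds": all_days})


future_demo = extend_dates(prophet_input, periods=2)
print(future_demo)
```

The resulting frame holds the three historical dates plus two future ones, which is the shape a fitted model is then asked to predict on.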

Prophet time series = Trend + Seasonality + Holiday + Error.

- Trend models non-periodic changes in the value of the time series.
- Seasonality captures periodic changes such as daily, weekly, or yearly patterns.
- Holiday covers effects that occur on irregular schedules over a day or a period of days.
- Error is whatever the model does not explain.
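The additive structure can be illustrated with a small pure-Python sketch. Everything here is a made-up assumption for illustration (the linear trend, the sine-shaped weekly pattern, the holiday on day 10, and the observed values); Prophet fits far more flexible components than these.

```python
import math


def trend(t):
    """Non-periodic growth: a straight line (illustrative choice)."""
    return 100.0 + 5.0 * t


def weekly_seasonality(t):
    """Periodic change that repeats every 7 days."""
    return 10.0 * math.sin(2 * math.pi * t / 7)


def holiday_effect(t, holidays=(10,)):
    """Irregular bump on specific (hypothetical) days."""
    return 20.0 if t in holidays else 0.0


def additive_model(t):
    """Sum of the explained components; the error term is what remains."""
    return trend(t) + weekly_seasonality(t) + holiday_effect(t)


# Hypothetical observations keyed by day index.
observed = {0: 101.0, 7: 133.5, 10: 168.0}

# Error = observed value minus everything the model explains.
residuals = {t: y - additive_model(t) for t, y in observed.items()}
print(residuals)
```

Each observed value is therefore recovered exactly as trend + seasonality + holiday + error, which is the decomposition stated above.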




# IMPORTING VARIOUS LIBRARIES FOR COVID PREDICTION AND ANALYSIS

In [None]:

import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.offline import iplot, init_notebook_mode
import plotly.figure_factory as ff
import datetime

# Predictions
from fbprophet import Prophet
from fbprophet.plot import plot_plotly, add_changepoints_to_plot
print("Modules are Imported ")


# LOADING THE DATA

In [None]:
age_data = pd.read_csv('../input/covid19-in-india/AgeGroupDetails.csv')
icmrTestLabs = pd.read_csv('../input/covid19-in-india/ICMRTestingLabs.csv') #Test labs data
covid_data = pd.read_csv('../input/covid19-in-india/covid_19_india.csv')
world_data = pd.read_csv('../input/corona-virus-report/covid_19_clean_complete.csv')
hospital_data = pd.read_csv('../input/covid19-in-india/HospitalBedsIndia.csv')

# DATA OF INDIA (COVID)

In [None]:
covid_data.head() 

In [None]:
covid_data.tail(5) #Recent Cases 

In [None]:
covid_data.isnull().sum() # checking for null values


In [None]:
# Select several columns with a list; tuple-style selection after groupby fails on recent pandas versions
state_cases = covid_data.groupby('State/UnionTerritory')[['Confirmed', 'Deaths', 'Cured']].max().reset_index()

# Creating the new column 'Active' = Confirmed - (Cured + Deaths)
state_cases['Active'] = state_cases['Confirmed'] - (state_cases['Cured'] + state_cases['Deaths'])
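The notebook only derives `Active`, but the same arithmetic extends to the death and cure rates per 100 confirmed cases. The sketch below uses a hypothetical two-row frame with placeholder numbers (not values from the dataset), and the rate column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-in for state_cases; the numbers are placeholders.
state_cases_demo = pd.DataFrame({
    "State/UnionTerritory": ["State A", "State B"],
    "Confirmed": [1000, 200],
    "Deaths": [20, 5],
    "Cured": [800, 100],
})

# Active cases are whatever is neither cured nor dead.
state_cases_demo["Active"] = state_cases_demo["Confirmed"] - (
    state_cases_demo["Cured"] + state_cases_demo["Deaths"]
)

# Rates per 100 confirmed cases, rounded to two decimals.
state_cases_demo["Death rate per 100"] = (
    100 * state_cases_demo["Deaths"] / state_cases_demo["Confirmed"]
).round(2)
state_cases_demo["Cure rate per 100"] = (
    100 * state_cases_demo["Cured"] / state_cases_demo["Confirmed"]
).round(2)

print(state_cases_demo)
```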


In [None]:
#Sort by maximum Confirmed Cases
state_cases = state_cases.sort_values(by='Confirmed', ascending=False) 
state_cases.style.background_gradient(cmap='Blues',subset=["Confirmed"])\
                        .background_gradient(cmap='Reds',subset=["Deaths"])\
                        .background_gradient(cmap='Greens',subset=["Cured"])\
                        .background_gradient(cmap='Blues', subset=["Active"])

In [None]:
# Plot of total deaths in India (state-wise analysis)
import datetime
today = datetime.date.today() # today's date (not used below)
yesterday = '02/09/20' # hard-coded date string used as the filter
df1 = covid_data[covid_data['Date'] == yesterday] # rows for that date; the dataframe is shown below the plot
fig = px.bar(df1, x='State/UnionTerritory', y='Deaths', height=600)
fig.update_layout(
    title='Till {} Total Deaths in India'.format(yesterday))
fig.show()
df1.head()



In [None]:
# Plot of recovered cases in India (state-wise analysis)

import datetime
today = datetime.date.today() # today's date (not used below)
yesterday = '02/09/20' # hard-coded date string used as the filter
df1 = covid_data[covid_data['Date'] == yesterday] # selecting the rows for that date
fig = px.bar(df1, x='State/UnionTerritory', y='Cured', height=600)
fig.update_layout(
    title='Till {} Total Recovered Cases in India'.format(yesterday))
fig.show() # plot 
df1.head() # Dataframe df1

# AGE DATA

In [None]:
age_data.head(15) # It shows which age group is more affected by Covid 

In [None]:
age_data.info()

In [None]:
percent = age_data['Percentage'] #percentage of people affected
percent


In [None]:
# Plot: the 20-29 age group is the most affected by Covid
plt.figure(figsize=(14,8))
sns.barplot(data=age_data, x='AgeGroup', y='TotalCases')
plt.title('Age Group Distribution')
plt.xlabel('Age Group')
plt.ylabel('Total Number of Cases')


# ICMR TEST LAB DATA


In [None]:
icmrTestLabs.head() # Shows the details of each lab, i.e. name, address, state, pincode etc.

In [None]:
icmrTestLabs['state'].value_counts() #Just to Show which state has more Labs



In [None]:
icmrTestLabs['type'].unique() # Just to check which type of labs i.e Private , Government or Collection Site

In [None]:
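# Count labs per (type, state); .sum() on a frame that holds only the grouping keys leaves nothing to aggregate
Labs_type_by_State = icmrTestLabs.groupby(['type', 'state']).size().to_frame('count')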
display(Labs_type_by_State)

Statewise Analysis of Test Labs

In [None]:
state=list(icmrTestLabs['state'].value_counts().index) #states
count=list(icmrTestLabs['state'].value_counts()) #number of labs
plt.figure(figsize=(14,8))
sns.barplot(x=count,y=state)
plt.xlabel('Counts')
plt.ylabel('States')
plt.title('ICMR Test Labs per State')
plt.tight_layout()

# HOSPITAL DATA

In [None]:
hospital_data.head(10)

In [None]:
hospital_data.describe() # count, mean, std, min, percentiles and max for the numeric columns

In [None]:
hospital_data.isnull().sum() # checking for null values
# Only one column in our data has missing values: NumSubDistrictHospitals_HMIS


In [None]:
hospital_data['NumSubDistrictHospitals_HMIS'] = hospital_data['NumSubDistrictHospitals_HMIS'].fillna(0)
#fill all the missing values with zero (0)

In [None]:
# Public Beds in Hospital  
fig = px.bar(hospital_data,                                  
             x=hospital_data['State/UT'].iloc[:36], 
             y=hospital_data['NumPublicBeds_HMIS'].iloc[:36], 
             color=hospital_data['NumPublicBeds_HMIS'].iloc[:36])

fig.update_layout(title={
        'text': "Num of Public Beds in Each State",
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
        xaxis_title="State/UT",
        yaxis_title="Number of Public Beds",
        plot_bgcolor='white')

<div style="color:white;
           padding:8px 10px 0 10px;
           display:inline-block;
           border-radius:5px;
           background-color:#5642C5;
           font-size:110%;
           font-family:Verdana">
    <h1 style='color:white;'>5. World Data</h1>
</div>

In [None]:
world_data.head() #World Data

In [None]:
world_data['Date'] = world_data['Date'].apply(pd.to_datetime, dayfirst=True)

In [None]:
world_data['Date'] = world_data['Date'].dt.strftime('%Y/%m/%d') 
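The `dayfirst=True` flag used above matters for ambiguous date strings. As a minimal sketch using only the standard library, the string below (chosen purely as an example of an ambiguous value) is parsed under two assumed layouts; which layout the dataset actually uses is not established here and is worth checking before relying on the parsed dates.

```python
from datetime import datetime

ambiguous = "02/09/20"

# Day-first reading: 2 September 2020.
as_day_first = datetime.strptime(ambiguous, "%d/%m/%y")

# Month-first reading: 9 February 2020.
as_month_first = datetime.strptime(ambiguous, "%m/%d/%y")

# Same output format as the strftime call in the notebook.
print(as_day_first.strftime("%Y/%m/%d"))
print(as_month_first.strftime("%Y/%m/%d"))
```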

<div style="color:white;
           padding:8px 10px 0 10px;
           display:inline-block;
           border-radius:5px;
           background-color:#5642C5;
           font-size:110%;
           font-family:Verdana">
    <h1 style='color:white;'>6. Making Predictions (India)</h1>
</div>

Forecasting is done with the Prophet library, an open source library published by Facebook that is well suited to time series forecasting. We will predict the coronavirus cases till **mid-August 2020**.

In [None]:
india_data = world_data[world_data["Country/Region"]=="India"] #Selecting India from World Data
india_data.head()

In [None]:
pred_data = india_data.groupby('Date')[['Confirmed', 'Recovered', 'Deaths']].sum().reset_index()
pred_data['Date'] = pred_data['Date'].apply(pd.to_datetime, dayfirst=True)
# Grouping the data by date and summing the confirmed cases, recovered cases and deaths

<div style="color:white;
           padding:8px 10px 0 10px;
           display:inline-block;
           border-radius:5px;
           background-color:#EC2566;
           font-size:90%;
           font-family:Verdana">
    <h1 style='color:white;'>6.1. Confirmed Cases in India</h1>
</div>

In [None]:
pred_confirm = pred_data.loc[:, ['Date', 'Confirmed']] # the Prophet model requires at least 2 columns:
# the date and the quantity we are predicting
pred_confirm.tail()

<div style="color:white;
           padding:8px 10px 0 10px;
           display:inline-block;
           border-radius:5px;
           background-color:#5E7B81;
           font-size:90%;
           font-family:Verdana">
    <h1 style='color:#ffffff;'>6.1.1. Creating Model</h1>
</div>

Defining our Prophet() model.

In [None]:
model = Prophet()

In [None]:
# Dataframe must have columns "ds" and "y" with the dates and values  for prophet prediction
pred_confirm.columns = ['ds', 'y']
model.fit(pred_confirm)

In [None]:
future = model.make_future_dataframe(periods=45) # helper function to extend the dataframe for specified days
future.tail()

<div style="color:white;
           padding:8px 10px 0 10px;
           display:inline-block;
           border-radius:5px;
           background-color:#5E7B81;
           font-size:90%;
           font-family:Verdana">
    <h1 style='color:#ffffff;'>6.1.2. Making Predictions</h1>
</div>

In [None]:
# yhat represents the prediction, while yhat_lower and yhat_upper represent the lower and upper bound
forecast_india_conf = model.predict(future)
forecast_india_conf

In [None]:
fig = plot_plotly(model, forecast_india_conf) 

fig.update_layout(template='plotly_white')

iplot(fig) 

The model predicts that by **16 August 2020** there will be a total of about **1.2M (12 lakh)** confirmed cases in India, assuming confirmed cases keep increasing along the current trend.
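A figure like the one quoted above can be read directly from the forecast frame by filtering on `ds`. The sketch below uses a hypothetical stand-in frame with toy numbers (not actual model output), and the helper name `forecast_on` is an assumption for illustration; with the real notebook objects one would pass `forecast_india_conf` instead.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by model.predict(future);
# the numbers are toy placeholders, not model output.
forecast_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2020-08-15", "2020-08-16", "2020-08-17"]),
    "yhat": [100.0, 110.0, 120.0],
    "yhat_lower": [90.0, 98.0, 105.0],
    "yhat_upper": [110.0, 122.0, 135.0],
})


def forecast_on(forecast, date):
    """Return the point forecast and its bounds for a single date."""
    columns = ["ds", "yhat", "yhat_lower", "yhat_upper"]
    rows = forecast.loc[forecast["ds"] == pd.Timestamp(date), columns]
    if rows.empty:
        raise KeyError(f"No forecast row for {date}")
    return rows.iloc[0]


point = forecast_on(forecast_demo, "2020-08-16")
print(point)
```

Reporting `yhat_lower` and `yhat_upper` alongside `yhat` makes the uncertainty of the estimate visible instead of quoting a single number.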

In [None]:
fig = model.plot(forecast_india_conf)

In [None]:
# Plot the forecast components, i.e. the trend and the seasonal patterns
fig = model.plot_components(forecast_india_conf)

In [None]:
cnfrm = forecast_india_conf.loc[:,['ds','trend']]
cnfrm = cnfrm[cnfrm['trend']>0] # keep only the rows where the trend value is positive
cnfrm.columns = ['Date','Confirm'] # rename the columns
cnfrm.tail(15)

# Recovered Cases


Just as we predicted the number of confirmed cases, we now predict the recovered cases and deaths.

In [None]:
pred_recovered_cases = pred_data.loc[:, ['Date', 'Recovered']]

pred_recovered_cases.head(10)


In [None]:
model = Prophet()

pred_recovered_cases.columns = ['ds', 'y']
model.fit(pred_recovered_cases)

In [None]:
future = model.make_future_dataframe(periods=45)
future.tail()

In [None]:
forecast_india_recover = model.predict(future)

forecast_india_recover

In [None]:
fig = plot_plotly(model, forecast_india_recover)
fig.update_layout(template='plotly_white')
iplot(fig) 

The model predicts that by 16 August 2020 there will be a total of about 0.7M (7 lakh) recovered cases in India, assuming recovered cases keep increasing along the current trend.

In [None]:
pred_deaths = pred_data.loc[:, ['Date', 'Deaths']]
pred_deaths.tail(3)

In [None]:
model = Prophet()
pred_deaths.columns = ['ds', 'y']
model.fit(pred_deaths)

In [None]:
future = model.make_future_dataframe(periods=45)
future.tail()

In [None]:
forecast_india_death = model.predict(future)
forecast_india_death

In [None]:
fig = plot_plotly(model, forecast_india_death)
fig.update_layout(template='plotly_white')
iplot(fig) 

The model predicts that by 16 August 2020 there will be a total of about 36k deaths in India, assuming deaths keep increasing along the current trend.