
# <font color=red> COVID 19 </font>


Coronavirus disease 2019 (COVID-19) is an infectious disease caused by the virus strain "severe acute respiratory syndrome coronavirus 2" (SARS-CoV-2). The disease was first identified in 2019 in Wuhan, China, and has since spread globally, resulting in the 2019–20 coronavirus pandemic.

The infection is typically spread from one person to another via respiratory droplets produced during coughing and sneezing. Time from exposure to onset of symptoms is generally between 2 and 14 days, with an average of 5 days.

The World Health Organization (WHO) declared the 2019–20 coronavirus outbreak a Public Health Emergency of International Concern (PHEIC) on 30 January 2020 and a pandemic on 11 March 2020.

##  Importing Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

## Loading data

Johns Hopkins University maintains a dashboard built from the reported case data. The dataset used here is extracted from the Google Sheets associated with that dashboard.
The data is available from 22 January 2020 onward.
### Column Description
 - SNo - Serial number
 - ObservationDate - Date of the observation in MM/DD/YYYY
 - Province/State - Province or state of the observation (Could be empty when missing)
 - Country/Region - Country of observation
 - Last Update - Time in UTC at which the row was updated for the given province or country (not standardised, so clean it before use)
 - Confirmed - Cumulative number of confirmed cases till that date
 - Deaths - Cumulative number of deaths till that date
 - Recovered - Cumulative number of recovered cases till that date
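
Since `Last Update` is described as not standardised, one possible way to clean it is to parse each value individually. The snippet below is a minimal sketch of that idea; the sample strings are illustrative assumptions, not values copied from the dataset.

```python
import pandas as pd

# Illustrative timestamp strings in mixed formats (assumed, not copied from the dataset)
raw = pd.Series(['1/22/2020 17:00', '2020-02-01T19:43:03', 'not a date'])

# Parse each value on its own so that one format does not dictate the others;
# values that cannot be parsed become NaT
cleaned = raw.apply(lambda value: pd.to_datetime(value, errors='coerce'))
print(cleaned)
```

Rows that end up as `NaT` can then be inspected or dropped before any time-based analysis.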

In [None]:
covid19=pd.read_csv('../input/novel-corona-virus-2019-dataset/covid_19_data.csv')
# Converting dates to datetime
covid19['ObservationDate']=pd.to_datetime(covid19['ObservationDate'])

## Data Visualization

Visualization helps in understanding how the coronavirus has spread from one country to another and how many people it has affected.

In [None]:
# Worldwide totals per observation date
df=covid19.groupby(['ObservationDate'])[['Confirmed','Recovered','Deaths']].sum()
# Set the style before creating the figure; a style only affects figures created afterwards
plt.style.use('ggplot')
plt.figure(figsize=(20,10))
plt.title('Cases of Novel covid_19 since 22 Jan 2020',fontsize=30)
plt.xlabel('Date',fontsize=20)
plt.ylabel('Number of cases',fontsize=20)
plt.plot(df.index,df['Confirmed'],label='Infected',linewidth=3)
plt.plot(df.index,df['Recovered'],label='Recovered',linewidth=3,color='green')
plt.plot(df.index,df['Deaths'],label='Deaths',linewidth=3,color='red')
plt.bar(df.index,df['Confirmed'],alpha=0.2,color='c')
plt.xticks(fontsize=15,rotation=90)
plt.yticks(fontsize=15)
plt.legend()
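
The `Confirmed`, `Recovered` and `Deaths` columns are cumulative, so the curves above can only rise or stay flat. If daily new cases are of interest, they can be derived by differencing consecutive totals. The following is a minimal sketch on made-up numbers; the series values are hypothetical.

```python
import pandas as pd

# Hypothetical cumulative totals for four consecutive days (not real data)
cumulative = pd.Series(
    [10, 15, 27, 40],
    index=pd.to_datetime(['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04']),
    name='Confirmed',
)

# diff() gives the day-over-day change; the first day has no predecessor,
# so its cumulative value is used as its new-case count
daily_new = cumulative.diff().fillna(cumulative.iloc[0]).astype(int)
print(daily_new.tolist())
```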

In [None]:
df1=covid19.groupby(['Country/Region'])[['ObservationDate','Confirmed','Recovered','Deaths']]
india_cases=df1.get_group('India')
plt.figure(figsize=(20,8))
plt.title('Cases of Novel covid_19 in India',fontsize=20)

plt.ylabel('Number of cases',fontsize=20)
plt.xlabel('Dates',fontsize=20)
plt.plot(india_cases['ObservationDate'],india_cases['Confirmed'],'-o',linewidth=2,markersize=10,mfc='red',mew=2.9,mec='black')
plt.xticks(rotation=90)
plt.grid()

The first case of the 2019–20 coronavirus pandemic in India was reported on 30 January 2020, originating from China. After 3 March 2020, the number of infected cases in India increased steadily. As of 20 March 2020, the Ministry of Health and Family Welfare had confirmed a total of 223 cases and 4 deaths in the country.

In [None]:
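# The counts are cumulative, so use each country's totals on the latest date
# instead of summing the same cases again for every date
latest=covid19[covid19['ObservationDate']==covid19['ObservationDate'].max()]
country=latest.groupby(['Country/Region'])[['Confirmed','Recovered','Deaths']].sum()
top_5=country.nlargest(5,['Confirmed'])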
plt.figure(figsize=(20,16))
plt.subplot(311)
plt.title('Top 5 Countries with confirmed, recovered and death cases',fontsize=20)
plt.barh(top_5.index,top_5['Confirmed'],color='blue')
plt.yticks(fontsize=20)
plt.xlabel('Confirmed',fontsize=20)
plt.subplot(312)
plt.barh(top_5.index,top_5['Deaths'],color='red')
plt.yticks(fontsize=20)
plt.xlabel('Deaths',fontsize=20)
plt.subplot(313)
plt.barh(top_5.index,top_5['Recovered'],color='green')
plt.yticks(fontsize=20)
plt.xlabel('Recovered',fontsize=20)

In [None]:
import matplotlib.ticker as ticker

covid19['day']=covid19['ObservationDate'].dt.day
# Restrict to March 2020 so that a day number identifies exactly one date
cv1=covid19[(covid19['ObservationDate']>='2020-03-01')&(covid19['ObservationDate']<'2020-04-01')]
fig, ax=plt.subplots(figsize=(15,8))
def draw_barchart(date):
    # Sum province-level rows into one total per country, then keep the 10 largest
    df=cv1[cv1['day'].eq(date)].groupby('Country/Region',as_index=False)['Confirmed'].sum()
    df=df.sort_values(by='Confirmed',ascending=True).tail(10)
    ax.clear()
    ax.text(0,1.12,f'The most infected countries on {date} March 2020',size=24,weight=600,transform=ax.transAxes,ha='left')
    ax.barh(df['Country/Region'],df['Confirmed'],color='orange')
    # Label each bar with the country name and its confirmed count
    for i, (country,value) in enumerate(zip(df['Country/Region'],df['Confirmed'])):
        ax.text(value,i, country, size=14, ha='right',va='bottom')
        ax.text(value,i,f'{value:.0f}', size=14, ha='left',va='center')
    ax.xaxis.set_major_formatter(ticker.StrMethodFormatter('{x:,.0f}'))
    ax.set_yticks([])
    ax.set_axisbelow(True)
    ax.margins(0,0.1)
    ax.tick_params(axis='x',labelsize=15,colors='blue')
    ax.grid(which='major',axis='x',linestyle='--')
    plt.box(False)
draw_barchart(18)


In [None]:
# Ten countries with the most confirmed cases
rank=country.nlargest(10,['Confirmed'])
confirmed=[]
recovered=[]
death=[]
for i in rank.index:
    df1=covid19[covid19['Country/Region']==i]
    confirmed.append(df1['Confirmed'].mean())
    recovered.append(df1['Recovered'].mean())
    death.append(df1['Deaths'].mean())
plt.figure(figsize=(20,20))

plt.subplot(311)
plt.title('Top 10 countries with mean confirmed, recovered and death cases',fontsize=20,color='green')
plt.plot(rank.index,confirmed,'-o',mfc='black')
plt.ylabel('Confirmed',fontsize=20)
plt.grid()
plt.subplot(312)
plt.plot(rank.index,recovered,'-o',color='green',mfc='black')
plt.ylabel('Recovered',fontsize=20)
plt.grid()
plt.subplot(313)
plt.plot(rank.index,death,'-o',color='red',mfc='black')
plt.ylabel('Death',fontsize=20)
plt.grid()    

In [None]:
rank1=country.nlargest(20,['Confirmed'])
# Each country's share of the top-20 total
confirmed_perc=rank1['Confirmed']/rank1['Confirmed'].sum()
plt.figure(figsize=(20,20))    
plt.title('Distribution of confirmed cases',fontsize=20)    
plt.pie(confirmed_perc,autopct='%1.1f%%')
plt.legend(rank1.index,loc='best')
plt.show()    

## Symptoms

Although those infected with the virus may be asymptomatic, many develop flu-like symptoms including fever, cough, and shortness of breath.

In [None]:
symptoms={'symptom':['Fever',
        'Dry cough',
        'Fatigue',
        'Sputum production',
        'Shortness of breath',
        'Muscle pain',
        'Sore throat',
        'Headache',
        'Chills',
        'Nausea or vomiting',
        'Nasal congestion',
        'Diarrhoea',
        'Haemoptysis',
        'Conjunctival congestion'],'percentage':[87.9,67.7,38.1,33.4,18.6,14.8,13.9,13.6,11.4,5.0,4.8,3.7,0.9,0.8]}

symptoms=pd.DataFrame(data=symptoms,index=range(14))
symptoms

In [None]:
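plt.figure(figsize=(20,15))
plt.title('Distribution of Symptoms',fontsize=20)
# Symptoms can co-occur, so the percentages sum to more than 100;
# plt.pie rescales them, and each slice shows a share of that total, not of patients
plt.pie(symptoms['percentage'],autopct='%1.1f%%')
plt.legend(symptoms['symptom'],loc='best')
plt.show()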

## Prevention
You can protect yourself and help prevent spreading the virus to others if you:

Do
 - Wash your hands regularly for 20 seconds, with soap and water or alcohol-based hand rub
 - Cover your nose and mouth with a disposable tissue or flexed elbow when you cough or sneeze
 - Avoid close contact (1 meter or 3 feet) with people who are unwell
 - Stay home and self-isolate from others in the household if you feel unwell
  
Don't
 - Touch your eyes, nose, or mouth if your hands are not clean

## Building Model

A model that predicts how the virus could spread across different countries and regions may help mitigation efforts. The goal of this task is to build a model that predicts the progression of the virus throughout March 2020.

In [None]:
covid19['Country/Region']=covid19['Country/Region'].astype('str')
covid19['Province/State']=covid19['Province/State'].astype('str')
covid19['day']=covid19['ObservationDate'].dt.day
covid19['month']=covid19['ObservationDate'].dt.month
# Encode the categorical columns as integer labels
lbl=preprocessing.LabelEncoder()
for c in ['Province/State','Country/Region']:
    covid19[c]=lbl.fit_transform(covid19[c])
# Features: everything except the target and the identifier/date columns
x=covid19.drop(['Confirmed','SNo','Last Update','ObservationDate'],axis=1)
y=covid19['Confirmed']
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=0)
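
`train_test_split` shuffles rows at random, so the test set contains dates that also appear in the training set. When the goal is to predict future progression, an alternative worth considering is to hold out the most recent dates instead. The sketch below illustrates that idea on toy data; the frame `toy` and the cutoff date are hypothetical, and this is not the split used in this notebook.

```python
import pandas as pd

# Hypothetical daily records (not real data)
toy = pd.DataFrame({
    'ObservationDate': pd.date_range('2020-03-01', periods=10, freq='D'),
    'Confirmed': [1, 2, 4, 7, 11, 16, 22, 29, 37, 46],
})

# Assumed cutoff: train on everything before it, test on the rest
cutoff = pd.Timestamp('2020-03-08')
train = toy[toy['ObservationDate'] < cutoff]
test = toy[toy['ObservationDate'] >= cutoff]
print(len(train), len(test))
```

With this kind of split, the reported score reflects how well the model extrapolates to unseen dates rather than how well it interpolates between known ones.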

In [None]:
model=LinearRegression()
model.fit(x_train,y_train)
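
A linear model treats label-encoded integers as ordered quantities, although the codes assigned to countries carry no numeric meaning. One common alternative is one-hot encoding, sketched below with `pd.get_dummies` on a toy frame. The frame `toy` is hypothetical, and this is an illustration rather than the encoding used in this notebook.

```python
import pandas as pd

# Hypothetical rows (not real data)
toy = pd.DataFrame({
    'Country/Region': ['India', 'Italy', 'India'],
    'day': [1, 1, 2],
})

# Replace the categorical column with one 0/1 indicator column per country
encoded = pd.get_dummies(toy, columns=['Country/Region'], dtype=int)
print(encoded)
```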

In [None]:
y_pred=model.predict(x_test)
# r2_score was already imported with the other libraries
print('R2 Score:',r2_score(y_test,y_pred))
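
For reference, the R2 score is the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared prediction errors and $SS_{tot}$ is the sum of squared deviations of the observed values from their mean. The worked example below computes it by hand on made-up numbers; the values are hypothetical and unrelated to the model's actual output.

```python
# Hypothetical observed and predicted values (not model output)
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 7.0, 8.0]

mean_true = sum(y_true) / len(y_true)
# Sum of squared prediction errors
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
# Sum of squared deviations from the mean of the observed values
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))
```

A score of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean.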