# Introduction

The novel coronavirus (provisionally named 2019-nCoV) is a contagious virus that causes respiratory infection. It has been identified as the causative agent of the ongoing 2019–20 Wuhan coronavirus outbreak.

As many early cases were linked to a large seafood and animal market, the virus is thought to have a zoonotic origin, but this has not been confirmed. Comparisons of the genetic sequences of this virus and other virus samples have shown similarities to SARS-CoV (79.5%) and bat coronaviruses (96%), which makes an ultimate origin in bats likely.

The first known human infection occurred on 8 December 2019. An outbreak of 2019-nCoV was first detected in Wuhan, China, in mid-December 2019. The virus subsequently spread to all other provinces of China and to more than twenty other countries in Asia, Europe, North America, and Oceania. Human-to-human spread of the virus has been confirmed in China, Germany, Thailand, Taiwan, Japan, and the United States.

As of 1 February 2020, there were 12,024 confirmed cases of infection, of which 11,860 were within mainland China. To date, cases outside China have been people who either travelled from Wuhan or were in direct contact with someone who travelled from the area. The number of deaths was 259 as of 1 February 2020.
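The figures above allow a quick back-of-the-envelope check. The sketch below derives the number of confirmed cases outside mainland China and a crude case fatality ratio (deaths divided by confirmed cases). The crude ratio is only a rough indicator: on 1 February many confirmed patients had not yet recovered or died, and mild cases may have gone unreported, so it may differ considerably from the eventual fatality rate.

```python
# Figures quoted above for 1 February 2020
confirmed_total = 12_024
confirmed_mainland = 11_860
deaths = 259

# Cases reported outside mainland China
confirmed_outside = confirmed_total - confirmed_mainland

# Crude case fatality ratio: deaths / confirmed cases.
# Many cases were still open, so treat this as a rough indicator only.
crude_cfr = deaths / confirmed_total

print(f"Cases outside mainland China: {confirmed_outside}")  # 164
print(f"Crude case fatality ratio: {crude_cfr:.1%}")          # 2.2%
```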

Source: https://en.wikipedia.org/wiki/Novel_coronavirus_(2019-nCoV)

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly as py
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True) 

import warnings
warnings.filterwarnings('ignore')

In [None]:
df_data = pd.read_csv("/kaggle/input/2019-coronavirus-dataset-01212020-01262020/2019_nCoV_20200121_20200131.csv", parse_dates=["Last Update"])
df_data["UpdateDate"] = df_data["Last Update"].dt.date.astype(str)
df_data_6Feb = pd.read_csv("/kaggle/input/2019-coronavirus-dataset-01212020-01262020/2019_nCoV_20200121_20200206.csv", parse_dates=["Last Update"])
df_data_6Feb["UpdateDate"] = df_data_6Feb["Last Update"].dt.date.astype(str)
df_data2 = pd.read_csv("/kaggle/input/novel-corona-virus-2019-dataset/2019_nCoV_data.csv", parse_dates=["Last Update"])
df_data2["UpdateDate"] = df_data2["Last Update"].dt.date.astype(str)
df_data.head()

Hubei is a province in mainland China whose capital is Wuhan. This dataset currently contains data up to January 31, and the first row above shows Hubei's (Wuhan's) figures for the last day of January.

In [None]:
df_data.describe()

The maximum of each column corresponds to Hubei's (Wuhan's) status on January 31, except for "Suspected", whose maximum belongs to Hong Kong, as shown below.
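Rather than matching the maxima from `describe()` to rows by eye, pandas can report which row holds each column's maximum via `DataFrame.idxmax`. The helper name `where_is_max` and the toy rows below (made-up numbers) are hypothetical; the sketch assumes a unique row index, as `df_data` has after `read_csv`. Rows without a `Province/State` value would come back as missing labels.

```python
import pandas as pd

def where_is_max(df: pd.DataFrame, label_col: str = "Province/State") -> pd.Series:
    """For each numeric column, return the label of the row holding its maximum."""
    numeric = df.select_dtypes("number")
    # idxmax gives the index label of the first maximum in each column
    row_labels = numeric.idxmax()
    labels = df.loc[row_labels.values, label_col].to_numpy()
    return pd.Series(labels, index=row_labels.index)

# Toy rows with made-up numbers, for illustration only
toy = pd.DataFrame({
    "Province/State": ["Hubei", "Hong Kong", "Zhejiang"],
    "Confirmed": [100, 5, 20],
    "Suspected": [0, 30, 0],
})
print(where_is_max(toy))
```

On the real data, `where_is_max(df_data)` would list the province behind each column's maximum.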

In [None]:
df_data[df_data["Suspected"]>=1].sort_values("Suspected", ascending=False).head()

In [None]:
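# Sum the numeric columns over provinces for each country and update time (text columns are excluded)
df_countries = df_data.groupby(['Country/Region', 'Last Update']).sum(numeric_only=True).reset_index().sort_values('Last Update', ascending=False)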
df_countries = df_countries.drop_duplicates(subset = ['Country/Region'])
df_countries = df_countries[df_countries["Confirmed"]>0]
df_countries
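The cell above keeps one row per country: after summing provinces for each update time, it sorts by `Last Update` in descending order so that `drop_duplicates` retains the most recent snapshot. The same pattern, isolated on toy data (made-up numbers), is sketched below; the helper name `latest_per_group` is hypothetical.

```python
import pandas as pd

def latest_per_group(df: pd.DataFrame, group_col: str, time_col: str) -> pd.DataFrame:
    """Keep the most recent row for each value of group_col."""
    ordered = df.sort_values(time_col, ascending=False)
    # drop_duplicates keeps the first occurrence, i.e. the latest timestamp per group
    latest = ordered.drop_duplicates(subset=[group_col])
    return latest.sort_values(group_col).reset_index(drop=True)

# Toy rows with made-up numbers, for illustration only
toy = pd.DataFrame({
    "Country/Region": ["Japan", "Japan", "Thailand", "Thailand"],
    "Last Update": pd.to_datetime(["2020-01-30", "2020-01-31", "2020-01-29", "2020-01-31"]),
    "Confirmed": [11, 15, 14, 19],
})
print(latest_per_group(toy, "Country/Region", "Last Update"))
```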

In [None]:
data = [ dict(
        type = 'choropleth',
        locations = df_countries['Country/Region'],
        locationmode = 'country names',
        z = df_countries['Confirmed'],
        colorscale=
            [[0.0, "rgb(250, 237, 235)"],
            [0.09, "rgb(245, 211, 206)"],
            [0.12, "rgb(239, 179, 171)"],
            [0.15, "rgb(236, 148, 136)"],
            [0.22, "rgb(239, 117, 100)"],
            [0.35, "rgb(235, 90, 70)"],
            [0.45, "rgb(207, 81, 61)"],
            [0.65, "rgb(176, 70, 50)"],
            [0.85, "rgb(147, 59, 39)"],
            [1.00, "rgb(110, 47, 26)"]],
        autocolorscale = False,
        reversescale = False,
        marker = dict(
            line = dict (
                color = 'rgb(180,180,180)',
                width = 0.5
            ) 
        ),
        # The deprecated 'autotick' key is dropped; the colorbar uses automatic ticks
        colorbar = dict(
            tickprefix = '',
            title = 'Confirmed'),
      ) ]

layout = dict(
    title = "Latest Confirmed Cases (up to January 31, 2020)",
    # width/height are figure-level layout properties, not geo properties
    width = 500, height = 400,
    geo = dict(
        showframe = False,
        showcoastlines = True,
        # Plotly projection names are lowercase
        projection = dict(type = 'mercator'))
)

w_map = dict( data=data, layout=layout)
iplot( w_map, validate=False)

By the end of January 2020, there were more than 11,000 confirmed coronavirus cases in mainland China. In the rest of the world, the virus had reached every inhabited continent except Africa and South America.

In [None]:
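# Sum the numeric columns over provinces for each country, update time and date (text columns are excluded)
df_countrybydate = df_data.groupby(['Country/Region', 'Last Update', 'UpdateDate']).sum(numeric_only=True).reset_index().sort_values('Last Update', ascending=False)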
df_countrybydate = df_countrybydate.groupby(['Country/Region', 'UpdateDate']).max().reset_index().sort_values('Last Update')
df_countrybydate["Size"] = np.where(df_countrybydate['Country/Region']=='Mainland China', df_countrybydate['Confirmed'], df_countrybydate['Confirmed']*200)

For every country other than mainland China, I set the bubble size to 200 times the confirmed count; without this boost, China's far larger numbers would make all other bubbles too small to see.
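The factor of 200 is a hand-tuned linear fix. A common alternative, not used in this notebook, is to compress the range with a logarithm so that one dominant country cannot swamp the rest. The sketch compares both on toy counts (made-up numbers); the function names are hypothetical, `linear_boost` mirrors the cell above, and `log_size` is the swapped-in alternative.

```python
import numpy as np
import pandas as pd

def linear_boost(confirmed: pd.Series, is_china: pd.Series, factor: float = 200) -> pd.Series:
    """Mirror of the notebook's scaling: multiply non-China counts by a fixed factor."""
    return pd.Series(np.where(is_china, confirmed, confirmed * factor), index=confirmed.index)

def log_size(confirmed: pd.Series) -> pd.Series:
    """Alternative scaling: log1p compresses large counts and keeps zero counts finite."""
    return np.log1p(confirmed)

# Toy rows with made-up numbers, for illustration only
toy = pd.DataFrame({
    "Country/Region": ["Mainland China", "Japan", "Germany"],
    "Confirmed": [9000, 15, 5],
})
is_china = toy["Country/Region"] == "Mainland China"
toy["Linear"] = linear_boost(toy["Confirmed"], is_china)
toy["Log"] = log_size(toy["Confirmed"])
print(toy)
```

The trade-off: linear boosting distorts sizes between China and everyone else (a country with 50 cases would get a larger bubble than China with 9,000), while a log scale preserves the ordering at the cost of making differences harder to read.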

In [None]:
fig = px.scatter_geo(df_countrybydate, locations="Country/Region", locationmode = "country names",
                     hover_name="Country/Region", size="Size", color="Confirmed",
                     animation_frame="UpdateDate", 
                     projection="natural earth",
                     title="Progression of Coronavirus in Confirmed Cases",template="none")
fig.show()

The interactive map above shows the spread of the virus day by day; click the play button to start the animation.

In [None]:
df_provincebydate = df_data.groupby(['Province/State', 'Last Update', 'UpdateDate']).max().reset_index().sort_values('Last Update', ascending=False)
df_CHProvinces = df_provincebydate[df_provincebydate['Country/Region']=="Mainland China"]
df_chprovincelastcases = df_CHProvinces.drop_duplicates(subset = ['Province/State']).sort_values("Province/State")

df_CHRecDead = df_chprovincelastcases.loc[:,["Province/State", "Recovered", "Death"]]
df_CHRecDeadHb = df_CHRecDead[df_CHRecDead["Province/State"]=="Hubei"]

In [None]:
plt.figure(figsize=(20,8))
sns.barplot(x=df_chprovincelastcases['Province/State'], y = df_chprovincelastcases['Confirmed'])

Most cases occurred in Hubei, whose capital is Wuhan. This is expected, since the outbreak began in Wuhan. The other Chinese provinces saw far fewer cases than Hubei; nevertheless, their case counts are still very high compared with other countries.
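To put a number on "most cases", one can compute each province's share of the confirmed total. A minimal sketch, assuming a frame shaped like `df_chprovincelastcases` (one row per province with `Province/State` and `Confirmed` columns); the helper name and the toy values are hypothetical.

```python
import pandas as pd

def confirmed_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of all confirmed cases contributed by each province, largest first."""
    totals = df.set_index("Province/State")["Confirmed"]
    return (100 * totals / totals.sum()).sort_values(ascending=False)

# Toy rows with made-up numbers, for illustration only
toy = pd.DataFrame({
    "Province/State": ["Hubei", "Zhejiang", "Guangdong"],
    "Confirmed": [80, 12, 8],
})
print(confirmed_share(toy).round(1))
```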

In [None]:
death = df_CHRecDeadHb['Death'].sum()
recovered = df_CHRecDeadHb['Recovered'].sum()
labels = 'Death', 'Recovered'
sizes = [death, recovered]
colors = 'red', 'green'
explode = [0.1,0]
plt.pie(sizes, explode=explode, labels=labels, colors=colors,autopct='%1.1f%%', shadow=True, startangle=100)
plt.title('Deaths vs. recoveries in Hubei (Wuhan) up to January 31, 2020')
plt.axis('equal')
plt.show()

In Hubei (Wuhan), deaths outnumber recoveries.
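The pie's percentages are computed only over cases with a known outcome: deaths / (deaths + recoveries). Patients still under treatment are left out, so this split says nothing about the fatality rate among all confirmed cases. A small sketch of the arithmetic; the function name is hypothetical and the counts are placeholders, not the Hubei figures.

```python
from typing import Tuple

def closed_case_shares(deaths: int, recovered: int) -> Tuple[float, float]:
    """Return (death %, recovered %) among closed cases, as the pie's autopct shows."""
    closed = deaths + recovered
    if closed == 0:
        raise ValueError("no closed cases to compare")
    return 100 * deaths / closed, 100 * recovered / closed

# Placeholder counts for illustration only
death_pct, recovered_pct = closed_case_shares(deaths=3, recovered=1)
print(f"Death: {death_pct:.1f}%, Recovered: {recovered_pct:.1f}%")  # Death: 75.0%, Recovered: 25.0%
```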

In [None]:
df_CHRecDeadNotHb = df_CHRecDead[((df_CHRecDead["Province/State"]!="Hubei") & ((df_CHRecDead["Recovered"]>=1) | (df_CHRecDead["Death"]>=1)))].sort_values("Recovered", ascending=False)

In [None]:
df_CHRecDeadNotHb.head()

In [None]:
plt.figure(figsize=(18,8))
sns.barplot(x=df_CHRecDeadNotHb['Province/State'],y=df_CHRecDeadNotHb['Recovered'], color='blue',label='RECOVERED')
sns.barplot(x=df_CHRecDeadNotHb['Province/State'], y=df_CHRecDeadNotHb['Death'], color='red', label='DEATH')
plt.title('Deaths and recoveries in other Chinese provinces')
plt.legend()

In [None]:
df_CHProvincesByDate = df_CHProvinces.groupby(['Province/State', 'UpdateDate']).max().reset_index().sort_values('UpdateDate') 
df_CHProvincesByDateHB = df_CHProvincesByDate[df_CHProvincesByDate["Province/State"]=="Hubei"]

In [None]:
plt.figure(figsize=(15,5))
sns.lineplot(x=df_CHProvincesByDateHB["UpdateDate"],y=df_CHProvincesByDateHB['Confirmed'])
plt.title('Confirmed cases in Hubei')

In [None]:
df_CHProvincesByDateHB.set_index('UpdateDate')

In [None]:
# Select the two outcome columns by name rather than by position
Data = df_CHProvincesByDateHB[['Recovered', 'Death']]

In [None]:
plt.figure(figsize=(10,5))
sns.lineplot(data=Data, markers=True,)
plt.title('Recoveries and deaths in Hubei')

Confirmed cases, deaths and recoveries all increase over time. I focused on Hubei (Wuhan) because its data are the most consistent; the datasets after January 31 contain some missing data, which interrupts the time series.
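One way to locate such breaks is to compare the dates present in the data with a complete daily calendar between the first and last date. A minimal sketch, assuming a frame with a string `UpdateDate` column like the ones built above; the helper name `missing_days` and the toy rows (made-up numbers) are hypothetical.

```python
import pandas as pd

def missing_days(df: pd.DataFrame, date_col: str = "UpdateDate") -> list:
    """List calendar days between the first and last date that have no rows."""
    dates = pd.DatetimeIndex(pd.to_datetime(df[date_col]).dt.normalize().drop_duplicates())
    full_range = pd.date_range(dates.min(), dates.max(), freq="D")
    # difference() returns the days of the full calendar that never appear in the data
    gaps = full_range.difference(dates)
    return [d.strftime("%Y-%m-%d") for d in gaps]

# Toy rows with made-up numbers, for illustration only
toy = pd.DataFrame({
    "UpdateDate": ["2020-01-22", "2020-01-23", "2020-01-26", "2020-01-27"],
    "Confirmed": [100, 120, 300, 400],
})
print(missing_days(toy))  # ['2020-01-24', '2020-01-25']
```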

In the other Chinese provinces, recoveries generally outnumber deaths.
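Because the bar chart of the other provinces draws the `Death` bars on top of the `Recovered` bars, a province where deaths exceeded recoveries would hide its blue bar entirely. Side-by-side bars avoid this. A sketch, assuming a frame shaped like `df_CHRecDeadNotHb`: reshape it to long form with `pandas.melt`, then let seaborn's `hue` place the two outcomes next to each other. The helper name `to_long`, the column names `Outcome` and `Count`, and the toy values are hypothetical.

```python
import pandas as pd

def to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape the Recovered/Death columns into one row per (province, outcome)."""
    return df.melt(
        id_vars="Province/State",
        value_vars=["Recovered", "Death"],
        var_name="Outcome",
        value_name="Count",
    )

# Toy rows with made-up numbers, for illustration only
toy = pd.DataFrame({
    "Province/State": ["Zhejiang", "Henan"],
    "Recovered": [6, 2],
    "Death": [0, 1],
})
long_df = to_long(toy)
print(long_df)
```

The long frame can then be plotted with `sns.barplot(data=long_df, x="Province/State", y="Count", hue="Outcome")`.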

In [None]:
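# Sum the numeric columns over provinces for each country and update time (text columns are excluded)
df_6feb = df_data_6Feb.groupby(['Country/Region', 'Last Update']).sum(numeric_only=True).reset_index().sort_values('Last Update', ascending=False)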
df_6feb = df_6feb.drop_duplicates(subset = ['Country/Region'])
df_6feb = df_6feb[df_6feb["Confirmed"]>0]
df_6febNotCh = df_6feb[df_6feb['Country/Region']!='Mainland China']
df_6febNotCh = df_6febNotCh.sort_values("Confirmed")

In [None]:
plt.figure(figsize=(10,10))
sns.barplot(x=df_6febNotCh['Confirmed'], y = df_6febNotCh['Country/Region'])
plt.title('Confirmed cases outside mainland China up to February 6, 2020')

Most cases have been seen in the Far East. Outside the Far East, Germany and the United Arab Emirates have the most cases.

### To be continued...