# Event Description
On 12/31/2019, 59 cases of pneumonia of unknown cause were reported in Wuhan, China. About a week later, Chinese authorities identified the pathogen behind the illness as a new coronavirus. 41 of the initial 59 patients were confirmed to be infected with this new virus. Reference: https://www.sciencedirect.com/science/article/pii/S0140673620301835

On 1/11/2020, the first death was reported.

On 1/23/2020, Wuhan was closed, and it remains closed at the time of writing. Residents are confined to their apartments and residential communities. Many other major cities were subsequently closed as well.

As of 2/15/2020, there were over 66,581 confirmed cases, 1,524 deaths and 8,494 recoveries. Reference: https://ncov.dxy.cn/ncovh5/view/pneumonia

In this project, I visualize the outbreak and use a Monte Carlo model to simulate its spread.

In [None]:
import numpy as np               # numerical computing and random sampling
import pandas as pd              # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt  # plotting

# **Import dataset**

In [None]:
total = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/2019_nCoV_data.csv')
confirm = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/time_series_2019_ncov_confirmed.csv')
death = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/time_series_2019_ncov_deaths.csv')
recovered = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/time_series_2019_ncov_recovered.csv')

In [None]:
total.head()

In [None]:
confirm.head()

# **Clean Data**

In [None]:
confirm = confirm.fillna(0)
death = death.fillna(0)
recovered = recovered.fillna(0)

In [None]:
np.any(confirm.isnull())

In [None]:
np.any(death.isnull())

In [None]:
np.any(recovered.isnull())

Prepare the dataset for data visualization

In [None]:
confirm_total = confirm.sum()[2:]
death_total = death.sum()[2:]
recovered_total = recovered.sum()[2:]
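Summing a wide time-series table column by column collapses all regions into a single total per date. The cell above selects the date entries by position with `[2:]`. Another approach, sketched below on a small hypothetical frame, keeps only the columns whose names parse as dates, so it does not depend on how many descriptive columns come first. The column names and the `%m/%d/%Y %I:%M %p` timestamp format are assumptions about this dataset's layout.

```python
import pandas as pd

# Hypothetical miniature of a wide time-series table: one row per region,
# descriptive columns first, then one column per report date.
toy_wide = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing"],
    "Country/Region": ["Mainland China", "Mainland China"],
    "1/22/2020 12:00 PM": [444, 14],
    "1/23/2020 12:00 PM": [444, 22],
})

# Keep only the columns whose names parse as timestamps (others become NaT).
parsed = pd.to_datetime(pd.Series(toy_wide.columns), errors="coerce",
                        format="%m/%d/%Y %I:%M %p")
date_cols = toy_wide.columns[parsed.notna().to_numpy()]

# The column-wise sum gives the total across all regions for each date.
daily_total = toy_wide[date_cols].sum()
print(daily_total)
```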

In [None]:
df_confirm_total = pd.DataFrame(confirm_total).reset_index()
df_confirm_total = df_confirm_total.rename(columns = {'index': 'Time', 0: 'Confirmed'})

df_death_total = pd.DataFrame(death_total).reset_index()
df_death_total = df_death_total.rename(columns = {'index': 'Time', 0: 'Death'})

df_recovered_total = pd.DataFrame(recovered_total).reset_index()
df_recovered_total = df_recovered_total.rename(columns = {'index': 'Time', 0: 'Recovered'})

In [None]:
df_total = pd.concat([df_confirm_total['Time'], df_confirm_total['Confirmed'], df_death_total['Death'], df_recovered_total['Recovered']], axis=1)
# Parse the timestamps and drop the time-of-day component ('Time' stays a column for plotting)
df_total['Time'] = pd.to_datetime(df_total['Time']).dt.normalize()
# Deaths and recoveries as a percentage of confirmed cases
df_total['Death_ratio'] = df_total['Death']/df_total['Confirmed'] * 100
df_total['Recovery_ratio'] = df_total['Recovered']/df_total['Confirmed'] * 100
# Day-over-day relative change in confirmed cases
df_total['change_in_confirmed'] = df_total['Confirmed'].pct_change()
df_total.head()
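The derived columns are element-wise ratios. Death_ratio is the number of deaths per 100 confirmed cases. `pct_change` gives the relative day-over-day growth (C_t - C_{t-1}) / C_{t-1}, which leaves NaN in the first row. Below is a quick check on made-up numbers (not the real data):

```python
import pandas as pd

# Made-up cumulative counts for three days (illustrative only).
toy_counts = pd.DataFrame({"Confirmed": [100, 150, 300], "Death": [2, 3, 6]})

# Deaths per 100 confirmed cases, as in df_total['Death_ratio'].
toy_counts["Death_ratio"] = toy_counts["Death"] / toy_counts["Confirmed"] * 100
# Relative day-over-day change (C_t - C_{t-1}) / C_{t-1}; the first row is NaN.
toy_counts["change_in_confirmed"] = toy_counts["Confirmed"].pct_change()
print(toy_counts)
```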

# **Data Visualization**

In [None]:
fig = plt.figure(figsize = (20, 10))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

# Left panel: cumulative counts
df_total.plot(kind = 'line', x = 'Time', y = 'Confirmed', color = 'blue', ax = ax1)
df_total.plot(kind = 'line', x = 'Time', y = 'Death', color = 'red', ax = ax1)
df_total.plot(kind = 'line', x = 'Time', y = 'Recovered', color = 'green', ax = ax1)

# Right panel: deaths and recoveries as a percentage of confirmed cases
df_total.plot(kind = 'line', x = 'Time', y = 'Death_ratio', color = 'red', ax = ax2)
df_total.plot(kind = 'line', x = 'Time', y = 'Recovery_ratio', color = 'green', ax = ax2)

# Set titles and labels after plotting, because DataFrame.plot sets the x label to the x column name
ax1.set_title('Confirmed, death and recovered cases', fontsize = 18)
ax1.set_xlabel('Date', fontsize = 14)
ax1.set_ylabel('Number of people', fontsize = 14)

ax2.set_title('Death rate vs. recovery rate', fontsize = 18)
ax2.set_xlabel('Date', fontsize = 14)
ax2.set_ylabel('Percentage (%)', fontsize = 14)

fig.text(0.4, 0.05, 'Data Source: Johns Hopkins University', fontsize = 10)

The two line charts above show that confirmed cases grow rapidly, which indicates a high infection rate. The death rate stays at around 2%, while the recovery rate is rising. Hopefully the situation is improving.
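As a rough cross-check of the ~2% death rate, the same ratio definition can be applied to the 2/15/2020 totals quoted in the event description. The crude death ratio comes out at roughly 2.3%. Those totals come from a different source (DXY) than the charts (Johns Hopkins University), so this is only a sanity check, not a like-for-like comparison.

```python
# Totals as of 2/15/2020, quoted in the event description (DXY source).
confirmed_0215, deaths_0215, recovered_0215 = 66581, 1524, 8494

# Same definition as Death_ratio / Recovery_ratio: count / confirmed * 100.
death_pct = deaths_0215 / confirmed_0215 * 100
recovery_pct = recovered_0215 / confirmed_0215 * 100
print(f"death {death_pct:.2f}%, recovery {recovery_pct:.2f}%")
```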

# **Simulation** 

Question:

When will the number of confirmed patients drop to half of its maximum?

Assumptions:
1. Patients cannot be cured without special treatment
2. Patients cannot infect others once dead
3. The daily death rate and recovery rate are normally distributed
4. The simulation is based on data from the WHO
5. People of all ages have the same activity level

In this simulation, I simplify the outbreak into the three-phase model below:

**Initial Phase:** the outbreak starts on 12/31/2019.

There were 41 initial patients on 12/31/2019, according to WHO. Reference: https://www.sciencedirect.com/science/article/pii/S0140673620301835

current patients = (patients yesterday) x (number of people met daily) x (1 + infection_rate) x (1 - death_rate)

**Second Phase:** special treatment becomes available and patients can be cured.

The first cases were confirmed on 12/31/2019, and 1/9/2020 was the first day the pathogen was identified as a new virus (day 10, counting 12/31/2019 as day 1).

current patients = (patients yesterday) x (number of people met daily x 50%) x (1 + infection_rate) x (1 - death_rate) x (1 - recovery_rate)

**Third Phase:** many cities are closed.

On 1/23/2020, the Wuhan government announced that the city would be closed. This is day 24, counting 12/31/2019 as day 1.

current patients = (patients yesterday) x (number of people met daily x 10%) x (1 + infection_rate) x (1 - death_rate) x (1 - recovery_rate)

Note: once special treatment is available, people are aware of the virus, so the number of people they meet daily drops to 50% of normal. Once the city is closed and apartment communities restrict people's activity, it drops further to 10%.
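The phase boundaries can be derived from the dates above by counting 12/31/2019 as day 1. This reproduces the day-10 and day-24 values used later for the treatment and closure days. `outbreak_day` and `contact_multiplier` are hypothetical helper names used for illustration. The second one returns the share of normal daily contacts in each phase and uses the same `i < t` / `i < c` branching as `random_walk`.

```python
from datetime import date

OUTBREAK_START = date(2019, 12, 31)  # counted as day 1


def outbreak_day(d: date) -> int:
    """Day number of date d, counting 12/31/2019 as day 1."""
    return (d - OUTBREAK_START).days + 1


treatment_day = outbreak_day(date(2020, 1, 9))   # virus identified as new
closed_day = outbreak_day(date(2020, 1, 23))     # Wuhan closed


def contact_multiplier(i: int, t: int = 10, c: int = 24) -> float:
    """Hypothetical helper: share of normal daily contacts on simulation step i."""
    if i < t:
        return 1.0  # initial phase: normal activity
    if i < c:
        return 0.5  # second phase: people are aware of the virus
    return 0.1      # third phase: city closed


print(treatment_day, closed_day)
print([contact_multiplier(i) for i in (0, 9, 10, 23, 24)])
```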


where: 

The number of people met daily without protection: normally distributed with mean 6.20 and standard deviation 6.43

Reference: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0220443

Infection_rate: uniformly distributed between 1.4% and 2.5%. Reference: https://www.who.int/news-room/detail/23-01-2020-statement-on-the-meeting-of-the-international-health-regulations-(2005)-emergency-committee-regarding-the-outbreak-of-novel-coronavirus-(2019-ncov)

n: the total number of days of the outbreak. This simulation uses the duration of the 2002 SARS outbreak (274 days in total, from 11/15/2002 to 8/16/2003).

Reference: https://www.ncbi.nlm.nih.gov/books/NBK92479/
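The 274-day value for n is the number of days between the two SARS endpoints. It is easy to confirm with the standard library:

```python
from datetime import date

# 2002 SARS outbreak window used for n (dates from the text above).
n_days = (date(2003, 8, 16) - date(2002, 11, 15)).days
print(n_days)  # 274
```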

death_rate ~ N(μ1, σ1), with μ1 and σ1 derived from the current dataset

recovery_rate ~ N(μ2, σ2), with μ2 and σ2 derived from the current dataset
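The factors in each phase are drawn independently, so the expected one-day multiplier is the product of their means. This shows why the simulated curve peaks. In the initial phase the patient count grows by a factor of roughly 6 per day on average. In the third phase the 10% contact share caps the multiplier at about 0.63, below 1 for any non-negative μ1 and μ2. The `expected_multiplier` helper is hypothetical, and the μ1 = 0.02 and μ2 = 0.05 values below are placeholders, not the ones derived from the dataset.

```python
def expected_multiplier(contact_share, mu_death, mu_recovery=0.0,
                        mean_contacts=6.2, infection_range=(0.014, 0.025)):
    """Expected one-day growth factor of the patient count (hypothetical helper).

    Uses E[XY] = E[X]E[Y] for independent draws; the mean of U(a, b) is (a + b) / 2.
    """
    mean_infection = sum(infection_range) / 2
    return (mean_contacts * contact_share * (1 + mean_infection)
            * (1 - mu_death) * (1 - mu_recovery))


# Placeholder means, NOT derived from the dataset.
mu1, mu2 = 0.02, 0.05
phase1 = expected_multiplier(1.0, mu1)        # no recovery term in the initial phase
phase2 = expected_multiplier(0.5, mu1, mu2)   # contacts at 50%
phase3 = expected_multiplier(0.1, mu1, mu2)   # contacts at 10%
print(round(phase1, 3), round(phase2, 3), round(phase3, 3))
```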

In [None]:
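ini_patient = 41                     # initial patients on 12/31/2019
num_of_people_met_daily = 7          # not used by random_walk, which draws from N(6.2, 6.43)
special_treatment_involved_day = 10  # 1/9/2020, counting 12/31/2019 as day 1
city_closed_day = 24                 # 1/23/2020, counting 12/31/2019 as day 1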

Calculate the mean and standard deviation of the death rate and recovery rate (converted from percentages to fractions)

In [None]:
death_std = np.std(df_total['Death_ratio'])/100
death_mean = np.mean(df_total['Death_ratio'])/100
recovery_std = np.std(df_total['Recovery_ratio'])/100
recovery_mean = np.mean(df_total['Recovery_ratio'])/100
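Note that `np.std` defaults to the population standard deviation (ddof=0, dividing by n), whereas pandas' `Series.std` defaults to the sample standard deviation (ddof=1, dividing by n - 1). With many days of data the difference is small. The toy series below shows the two conventions side by side:

```python
import numpy as np
import pandas as pd

toy_series = pd.Series([1.0, 2.0, 3.0, 4.0])

pop_std = np.std(toy_series.to_numpy())  # ddof=0: divide by n
sample_std = toy_series.std()            # pandas default ddof=1: divide by n - 1
print(pop_std, sample_std)
```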

In [None]:
def random_walk(n, t = 10, c = 24):
    """Return how many days after the peak the number of confirmed patients first
    drops below half of its maximum (the peak day counts as day 1).
    n is the number of simulated days after the outbreak start on 12/31/2019,
    t is the day special treatment became available, c is the day the city was closed."""
    patient = 41  # initial patients on 12/31/2019
    con_patient = []
    for i in range(n):
        # Factors shared by all phases: people met daily ~ N(6.2, 6.43) (a draw can be
        # negative; it is not truncated), infection rate ~ U(1.4%, 2.5%),
        # death rate ~ N(death_mean, death_std)
        daily_factor = np.random.normal(6.2, 6.43) * (1 + np.random.uniform(0.014, 0.025)) * (1 - np.random.normal(death_mean, death_std))
        if i >= t:
            # From day t patients can recover and contacts drop to 50%; from day c to 10%
            contact_share = 0.5 if i < c else 0.1
            daily_factor *= contact_share * (1 - np.random.normal(recovery_mean, recovery_std))
        patient = patient * daily_factor
        con_patient.append(patient)
    max_patient = max(con_patient)
    max_patient_loc = con_patient.index(max_patient)
    # First day after the peak with fewer than half of the peak patients;
    # if there is none, fall back to the last simulated day
    day = len(con_patient) - 1
    for i in range(max_patient_loc + 1, len(con_patient)):
        if con_patient[i] < max_patient / 2:
            day = i
            break
    return day + 1 - max_patient_loc
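The question reduces to a search over each simulated series: find the peak, then find the first later day on which the count falls below half of the peak. The standalone sketch below isolates that step so it can be checked on hand-made series. `days_to_half_of_peak` is a hypothetical name. Like random_walk's return value, it counts the peak day as day 1. Unlike random_walk, which falls back to the last simulated day, it returns None when the series never drops below half.

```python
from typing import Optional, Sequence


def days_to_half_of_peak(series: Sequence[float]) -> Optional[int]:
    """Days from the peak until the series first drops below half the peak.

    The peak day counts as day 1. Returns None if it never drops below half.
    """
    peak = max(series)
    peak_loc = list(series).index(peak)
    for i in range(peak_loc + 1, len(series)):
        if series[i] < peak / 2:
            return i + 1 - peak_loc
    return None


print(days_to_half_of_peak([10, 40, 100, 80, 45, 30]))  # 3
print(days_to_half_of_peak([10, 100, 60]))              # None
```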

In [None]:
random_walk(274)

In [None]:
number_of_simulation = 10000
days_after_max_number = []
for i in range(number_of_simulation):
    days_after_max_number.append(random_walk(274))
days_after_max_number = pd.Series(days_after_max_number)
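The loop above is a plain Monte Carlo estimate: repeat a random trial many times and use the relative frequencies as probabilities. The sketch below shows a variant of the tallying step and differs from the notebook in two ways. It uses a seeded `np.random.default_rng` generator so that runs are reproducible, and it lets `value_counts(normalize=True)` compute the probabilities directly. `monte_carlo_distribution` and `toy_trial` are hypothetical names, and `toy_trial` is only a stand-in for random_walk.

```python
import numpy as np
import pandas as pd


def monte_carlo_distribution(trial, n_trials, seed=0):
    """Run trial(rng) n_trials times; return the empirical distribution of outcomes."""
    rng = np.random.default_rng(seed)
    outcomes = pd.Series([trial(rng) for _ in range(n_trials)])
    # Relative frequency of each outcome, ordered by outcome value.
    return outcomes.value_counts(normalize=True).sort_index()


def toy_trial(rng):
    """Hypothetical stand-in for random_walk: a random whole number of days in 1..3."""
    return int(rng.integers(1, 4))


probs = monte_carlo_distribution(toy_trial, 10_000, seed=42)
print(probs)
```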

In [None]:
# Frequency of each outcome, ordered by the number of days
count_days = days_after_max_number.value_counts().sort_index()
# rename_axis + reset_index(name=...) yields the same column names across pandas versions
df_count_days = count_days.rename_axis('days after the date with maximum number of confirmed patients').reset_index(name = 'Frq')

In [None]:
df_count_days['Probability'] = df_count_days['Frq']/number_of_simulation
df_count_days

In [None]:
p = df_count_days.plot.bar(x = 'days after the date with maximum number of confirmed patients', y = 'Probability', figsize = (15,8), legend = False)
p.set_title('Days until the number of confirmed patients drops to 50% of its maximum', fontsize = 18)
p.set_xlabel('days after the date with maximum number of confirmed patients', fontsize = 14)
p.set_ylabel('Probability', fontsize = 14)
p.text(7,0.02, 'Data Source: Johns Hopkins University', fontsize = 10)

# **Conclusion**

There is an over 50% chance that the number of confirmed patients will drop to 50% of its maximum in 2 days once it reaches the peak.
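A claim like this is read off the probability table. Either look up the probability at exactly 2 days, or take the cumulative probability of all outcomes up to 2 days. The sketch below uses a made-up table with the same column names as df_count_days, so its numbers are illustrative only. `prob_exactly` and `prob_within` are hypothetical helpers.

```python
import pandas as pd

DAYS = "days after the date with maximum number of confirmed patients"

# Made-up probability table (illustrative only, not simulation output).
table = pd.DataFrame({DAYS: [2, 1, 3, 5], "Probability": [0.40, 0.15, 0.30, 0.15]})


def prob_exactly(table: pd.DataFrame, days: int) -> float:
    """Probability that the count halves exactly `days` days after the peak."""
    return float(table.loc[table[DAYS] == days, "Probability"].sum())


def prob_within(table: pd.DataFrame, max_days: int) -> float:
    """Cumulative probability that the count halves within `max_days` of the peak."""
    return float(table.loc[table[DAYS] <= max_days, "Probability"].sum())


print(prob_exactly(table, 2), prob_within(table, 2))
```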

However, this model rests on simplifying assumptions and ignores many other variables, so it should be used for discussion only.

I hope the outbreak ends soon.

加油武汉！(Stay strong, Wuhan!)