---
title: "Corona"
format:
  html:
    code-fold: true
    code-summary: "Show the code"
---

## Preparation and overview

### Imports and options

In [2]:
import os
import pandas as pd
import numpy as np
import textwrap
import matplotlib.pyplot as plt
from module1_s4_functions import *
import numpy.typing as npt
import matplotlib.ticker as mtick
import matplotlib.patches as mpatches
from matplotlib.lines import Line2D
from scipy.stats import gmean
import matplotlib.dates as mdates
import plotly.express as px
import plotly.graph_objects as go
import plotly as py
import plotly.io as pio
from plotly.subplots import make_subplots
from datetime import datetime
pio.renderers.default = "plotly_mimetype+notebook_connected"

### Loading and cleaning

In [3]:
pd.set_option("display.float_format", "{:.3g}".format)
base_fig_wid = 9  # base width of a figure
base_font_size = 12  # base size of a font
dpi = 96
plt.rcParams["font.size"] = base_font_size

Loading the separate .csv files into a dictionary of pandas DataFrames, keyed by file name without the extension:

In [4]:
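# Collect every CSV file in the data directory
data_files = csv_files = [f for f in os.listdir("data") if f.endswith(".csv")]
data = {}
for file in csv_files:
    file_path = os.path.join("data", file)
    # Key each DataFrame by the file name without its ".csv" extension
    data[os.path.splitext(file)[0]] = pd.read_csv(file_path)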

List of data tables:

1. Case Data
    * Case: Data of COVID-19 infection cases in South Korea
2. Patient Data
    * PatientInfo: Epidemiological data of COVID-19 patients in South Korea
    * PatientRoute: Route data of COVID-19 patients in South Korea (currently unavailable)
3. Time Series Data
    * Time: Time series data of COVID-19 status in South Korea
    * TimeAge: Time series data of COVID-19 status in terms of age in South Korea
    * TimeGender: Time series data of COVID-19 status in terms of gender in South Korea
    * TimeProvince: Time series data of COVID-19 status in terms of province in South Korea
4. Additional Data
    * Region: Location and statistical data of the regions in South Korea
    * Weather: Data of the weather in the regions of South Korea
    * SearchTrend: Trend data of the keywords searched in NAVER, which is one of the largest portals in South Korea
    * SeoulFloating: Data of floating population in Seoul, South Korea (from SK Telecom Big Data Hub)
    * Policy: Data of the government policy for COVID-19 in South Korea

Formatting the date data:

In [5]:
date_format = '%Y-%m-%d'
data['Time']['date'] = pd.to_datetime(data['Time']['date'])
data['Time'].set_index('date',inplace=True)


Adding a new cases column:

In [6]:
data['Time']['new_cases'] = data['Time']['confirmed'] - data['Time']['confirmed'].shift(1)
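
Subtracting the shifted column is the same operation as `Series.diff()`. The sketch below checks this on made-up cumulative counts (the numbers are illustrative and not taken from the Time table). In both versions the first day has no predecessor and stays `NaN`.

```python
import pandas as pd

# Illustrative cumulative counts; not values from the dataset
confirmed = pd.Series(
    [1, 1, 2, 4, 7],
    index=pd.date_range("2020-01-20", periods=5, freq="D"),
    name="confirmed",
)

# Same construction as in the cell above
new_cases_shift = confirmed - confirmed.shift(1)

# Equivalent built-in
new_cases_diff = confirmed.diff()

# The first row is NaN in both columns because there is no previous day
print(pd.DataFrame({"shift": new_cases_shift, "diff": new_cases_diff}))
```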

### Preparation of weekly data

Weekly data:

In [7]:
weekly_time=data['Time'].resample('W').last()

New cases per week:

In [8]:
weekly_time["new_cases"] = weekly_time["confirmed"] - weekly_time["confirmed"].shift(
    1
).fillna(0)

Weekly tests done:

In [9]:
weekly_time["weekly_tested"] = weekly_time["test"] - weekly_time["test"].shift(
    1
).fillna(0)

Percentage of positive tests:

In [10]:
weekly_time['positivity'] = weekly_time['new_cases']/weekly_time['weekly_tested']*100
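
The weekly steps above can be traced on a small example. This is a minimal sketch with made-up daily totals for two full weeks, not data from the Time table. Because `confirmed` and `test` are cumulative, `.last()` picks the running total at the end of each week, and the week-over-week difference gives the weekly counts.

```python
import pandas as pd

# Illustrative cumulative totals, Monday 2020-02-03 to Sunday 2020-02-16
days = pd.date_range("2020-02-03", periods=14, freq="D")
daily = pd.DataFrame(
    {
        "confirmed": [2, 3, 5, 5, 8, 9, 10, 14, 20, 26, 30, 41, 45, 50],
        "test": [100 * (i + 1) for i in range(14)],
    },
    index=days,
)

# "W" bins end on Sunday; the last row of each bin is the total at week end
weekly = daily.resample("W").last()

# Week-over-week differences; the first week is compared against zero
weekly["new_cases"] = weekly["confirmed"] - weekly["confirmed"].shift(1).fillna(0)
weekly["weekly_tested"] = weekly["test"] - weekly["test"].shift(1).fillna(0)

# Share of that week's tests that were positive, in percent
weekly["positivity"] = weekly["new_cases"] / weekly["weekly_tested"] * 100

print(weekly)
```

Note that `fillna(0)` attributes everything counted before the first week to that first week, so the first weekly value is only meaningful if the series starts near zero.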

### Time series of total infections

Creating a two-day doubling line:

In [11]:
# Create an array of dates two days apart, starting from the start date
dates = pd.date_range(start="2020-01-20", end="2020-06-30", freq="2D")

# Confirmed cases double at every step, i.e. every two days
# (floats avoid integer overflow for the later dates)
confirmed_cases = np.power(2.0, np.arange(len(dates)))

# Create a DataFrame with the date, confirmed, and new cases columns
doubling_line = pd.DataFrame({"date": dates, "confirmed": confirmed_cases})
doubling_line["new_cases"] = (
    doubling_line["confirmed"] - doubling_line["confirmed"].shift(1).fillna(0)
)
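
A doubling sequence is a useful reference on a plot of new cases against total cases. If the total doubles at every step, the new cases of a step are always half of the running total, so on log-log axes the points fall on a straight line with slope 1. The sketch below checks this numerically on a generic doubling sequence. It is an illustration, not part of the analysis, and the vertical position of such a line depends on the period over which new cases are counted.

```python
import numpy as np

# Cumulative cases that double at every step (one step = one doubling period)
steps = np.arange(10)
confirmed = 2.0 ** steps

# New cases per step: difference to the previous total (first step vs. zero)
new_cases = np.diff(confirmed, prepend=0.0)

# From the second step on, new cases are half of the running total
ratio = new_cases[1:] / confirmed[1:]

# Slope of the reference line in log-log coordinates
slope = np.polyfit(np.log10(confirmed[1:]), np.log10(new_cases[1:]), 1)[0]

print(ratio)
print(slope)
```

A curve that bends below this line indicates growth slower than the reference doubling rate.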

In [12]:
# | label: fig-time-trends
# | fig-cap: "Total confirmed cases and weekly new cases"
fig_trends = go.Figure()
# traces
for index, row in weekly_time.iterrows():
    fig_trends.add_trace(
        go.Scatter(
            name = 'South Korea',
            visible=False,
            mode="markers+lines",
            line=dict(color="blue", width=1.5),
            marker=dict(color="red", size=3),
            x=weekly_time["confirmed"][:index],
            y=weekly_time["new_cases"][:index],
            hovertemplate="<b>Confirmed:</b> %{x}<br>"
            + "<b>New Cases:</b> %{y}<br>"
            + "<b>Date:</b> %{customdata}<extra></extra>",
            customdata=weekly_time[:index].index.astype(str),
        )
    )
#doubling line
fig_trends.add_trace(
    go.Scatter(
        mode="lines",
        line=dict(color="black", width=2, dash="dash"),
        x=doubling_line["confirmed"],
        y=doubling_line["new_cases"],
        hovertemplate="<b>Confirmed:</b> %{x}<br>" + "<b>New Cases:</b> %{y}",
        name="2 Day doubling line",
    )
)

fig_trends.data[-2].visible = True

steps = []
for i in range(len(fig_trends.data) - 1):
    visible = [False] * len(
        fig_trends.data
    )  # initialize visible list as a list of booleans
    visible[i] = True  # set the i-th element to True
    visible[-1] = True  # doubling line always visible
    step = dict(
        method="update",
        args=[{"visible": visible}],
        label=weekly_time.index[i].strftime("%Y-%m-%d"),
    )
    steps.append(step)

sliders = [
    dict(
        active=len(steps) - 1,  # start on the last week, matching the initially visible trace
        currentvalue={"prefix": "Date: "},
        pad={"t": 50},
        steps=steps,
    )
]

fig_trends.update_layout(sliders=sliders)
fig_trends.update_layout(
    xaxis=dict(
        range=[0.3, 5],
        autorange=False,
        zeroline=False,
        type="log",
        title="Total cases",
    ),
    yaxis=dict(
        range=[0, 4],
        autorange=False,
        zeroline=False,
        type="log",
        title="Weekly New Cases",
    ),
    hovermode="closest",
    width=base_fig_wid * dpi,
    height=base_fig_wid * dpi / 2,
)

fig_trends.show()
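
The slider works by switching trace visibility: each step carries one boolean mask with an entry per trace. The sketch below isolates that logic in plain Python, with no plotly import needed. `build_slider_steps` is a hypothetical helper written for illustration and is not part of plotly; the resulting dictionaries have the same shape as the `steps` and `sliders` entries built above.

```python
def build_slider_steps(labels, n_always_visible=1):
    """Build slider steps that show exactly one labelled trace per step.

    labels: one label per slider position (one trace each).
    n_always_visible: trailing traces that stay visible in every step,
        such as a reference line added after the per-week traces.
    """
    n_traces = len(labels) + n_always_visible
    steps = []
    for i, label in enumerate(labels):
        visible = [False] * n_traces
        visible[i] = True  # the trace belonging to this slider position
        for j in range(1, n_always_visible + 1):
            visible[-j] = True  # reference traces at the end of the list
        steps.append(
            {"method": "update", "args": [{"visible": visible}], "label": label}
        )
    return steps


steps = build_slider_steps(["2020-01-26", "2020-02-02", "2020-02-09"])

# "active" is the zero-based index of the step selected initially
slider = {
    "active": len(steps) - 1,
    "currentvalue": {"prefix": "Date: "},
    "steps": steps,
}
print(steps[0]["args"][0]["visible"])
```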

### February policies

Using this chart we can pinpoint the dates of the major inflection points in the spread of COVID-19. The rate of spread decreased for two weeks starting on 2020-02-02, with a significant drop between 2020-02-09 and 2020-02-23. We can check which policies were implemented during this period:

In [13]:
data["Policy"].loc[
    data["Policy"]["start_date"].between("2020-01-28", "2020-02-23")
].sort_values("start_date")

|  | policy_id | country | type | gov_policy | detail | start_date | end_date |
|---|---|---|---|---|---|---|---|
| 2 | 3 | Korea | Alert | Infectious Disease Alert Level | Level 3 (Orange) | 2020-01-28 | 2020-02-22 |
| 4 | 5 | Korea | Immigration | Special Immigration Procedure | from China | 2020-02-04 |  |
| 19 | 20 | Korea | Health | Emergency Use Authorization of Diagnostic Kit | 1st EUA | 2020-02-04 |  |
| 5 | 6 | Korea | Immigration | Special Immigration Procedure | from Hong Kong | 2020-02-12 |  |
| 6 | 7 | Korea | Immigration | Special Immigration Procedure | from Macau | 2020-02-12 |  |
| 20 | 21 | Korea | Health | Emergency Use Authorization of Diagnostic Kit | 2nd EUA | 2020-02-12 |  |
| 50 | 51 | Korea | Technology | Self-Diagnosis App |  | 2020-02-12 |  |
| 3 | 4 | Korea | Alert | Infectious Disease Alert Level | Level 4 (Red) | 2020-02-23 |  |
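
The date filter above compares strings, because `read_csv` was called without date parsing and `start_date` is therefore a text column. This gives the right result only because every date is written as zero-padded `YYYY-MM-DD`, which sorts the same way alphabetically and chronologically. A minimal sketch of the equivalent datetime-based filter, using three rows copied from the output above:

```python
import pandas as pd

# Three rows copied from the Policy output
policy = pd.DataFrame(
    {
        "gov_policy": [
            "Infectious Disease Alert Level",
            "Special Immigration Procedure",
            "Infectious Disease Alert Level",
        ],
        "detail": ["Level 3 (Orange)", "from China", "Level 4 (Red)"],
        "start_date": ["2020-01-28", "2020-02-04", "2020-02-23"],
    }
)

# String comparison: valid only for zero-padded ISO 8601 dates
by_string = policy["start_date"].between("2020-02-01", "2020-02-23")

# Datetime comparison: independent of how the dates are written
start = pd.to_datetime(policy["start_date"])
by_datetime = start.between(pd.Timestamp("2020-02-01"), pd.Timestamp("2020-02-23"))

# Both bounds are inclusive by default, so 2020-02-23 is kept
print(policy[by_datetime])
```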


During this period several important policies were implemented:

 * Special immigration procedures were introduced for visitors from China, Macau and Hong Kong. These included suspension of visa-free entry, quarantine and testing procedures.
 * Diagnostic kits from two manufacturers were authorized, alongside the release of a self-diagnosis app.
 * The Infectious Disease Alert level was raised twice: to orange (2020-01-28) and to red (2020-02-23). These levels allowed the government to take special measures. At the orange level more rigorous testing began, alongside the issuing of masks and recommendations for schools. At the red level the size of social gatherings was limited, contacts were traced and exposed individuals were isolated.

Most of these measures increased the rate of testing and public awareness. The number of new cases decreased the following week and then started to increase sharply. The decrease is more likely noise in the data than an effect of the policies, as the number of cases was very low at the time. What the policies did achieve was an increase in testing and awareness, as can be seen from the timeline of weekly tests in @fig-tests-time. The surge in tests coincided with the surge in confirmed cases, which suggests that a large number of cases may have gone undiagnosed before the package of policies was implemented.

In [33]:
# | label: fig-tests-time
# | fig-cap: "Weekly tests (blue) as a time series; for comparison, the number of new cases each week is shown in red. The chart background colors correspond to Infectious Disease Alert levels."
line_colors = ["blue", "red"]
fig_tests_time_yrange = [0, 3500]
fig_tests_time = two_yaxis_plotly(
    x_values=weekly_time.index,
    y1_values=weekly_time["new_cases"],
    y2_values=weekly_time["weekly_tested"],
    y1_title="New Weekly Cases",
    y2_title="Tested Weekly",
    x_title="Date",
    colors=line_colors,
    size=[base_fig_wid * dpi, base_fig_wid / 2.5 * dpi],
    yrange=fig_tests_time_yrange,
)

annotate_plotly_by_val(
    fig_tests_time,
    datetime.strptime("2020-02-12", date_format),
    text="Diagnostics <br>Immigration",
    ax=-25,
    ay=-50,
)
annotate_plotly_by_val(
    fig_tests_time,
    datetime.strptime("2020-03-01", "%Y-%m-%d"),
    "School<br>Closure",
    ax=-50,
    ay=10,
)
add_alert_background(fig_tests_time)
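
The helper `add_alert_background` comes from `module1_s4_functions` and is not shown in this notebook, so its implementation is unknown here. One way such a background could be built is sketched below, as an assumption rather than the actual helper: each alert level from the Policy table becomes a full-height rectangle in plotly's layout-shape format. The function name `alert_level_shapes`, the colors and the opacity are all hypothetical choices.

```python
import pandas as pd

# Alert rows as they appear in the Policy table
alerts = pd.DataFrame(
    {
        "detail": [
            "Level 1 (Blue)",
            "Level 2 (Yellow)",
            "Level 3 (Orange)",
            "Level 4 (Red)",
        ],
        "start_date": ["2020-01-03", "2020-01-20", "2020-01-28", "2020-02-23"],
        "end_date": ["2020-01-19", "2020-01-27", "2020-02-22", None],
    }
)

# Assumed colour per level; the real helper may use different ones
level_colors = {
    "Level 1 (Blue)": "blue",
    "Level 2 (Yellow)": "yellow",
    "Level 3 (Orange)": "orange",
    "Level 4 (Red)": "red",
}


def alert_level_shapes(alerts, last_date, opacity=0.15):
    """Return one layout-shape dict per alert level (hypothetical helper)."""
    shapes = []
    for row in alerts.itertuples(index=False):
        # An open-ended level runs until the end of the plotted period
        end = row.end_date if pd.notna(row.end_date) else last_date
        shapes.append(
            dict(
                type="rect",
                xref="x",
                yref="paper",  # span the full plot height
                x0=row.start_date,
                x1=end,
                y0=0,
                y1=1,
                fillcolor=level_colors[row.detail],
                opacity=opacity,
                layer="below",
                line=dict(width=0),
            )
        )
    return shapes


shapes = alert_level_shapes(alerts, last_date="2020-06-30")
print(len(shapes))
```

The list could then be attached to a figure with `fig.update_layout(shapes=shapes)`.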


In [36]:
fig_search_trends_time = px.line(
    data["SearchTrend"],
    x=data["SearchTrend"]["date"],
    y=data["SearchTrend"]["coronavirus"],
    labels={'coronavirus':"Coronavirus Search Trend",'date':'Date'},
    range_x=[datetime.strptime("2020-01-01", date_format), weekly_time.index[-1]],
    range_y=[0,100],
    width=base_fig_wid * dpi,
    height=base_fig_wid / 4 * dpi,
    
)

fig_search_trends_time.update_layout(margin=dict(l=90,r=110,t=10,b=0))
add_alert_background(fig_search_trends_time)
fig_search_trends_time.show()

### March policies

The more important inflection point began on 2020-03-01. It resulted in a steep decline in the spread and marked the beginning of the end of the first wave of COVID-19 in South Korea.
Here is the list of policies during those weeks:

In [None]:
data["Policy"].loc[
    data["Policy"]["start_date"].between("2020-02-29", "2020-03-12")
].sort_values("start_date")

|  | policy_id | country | type | gov_policy | detail | start_date | end_date |
|---|---|---|---|---|---|---|---|
| 28 | 29 | Korea | Social | Social Distancing Campaign | Strong | 2020-02-29 | 2020-03-21 |
| 33 | 34 | Korea | Education | School Closure | Daycare Center for Children | 2020-03-02 |  |
| 34 | 35 | Korea | Education | School Opening Delay | Kindergarten | 2020-03-02 | 2020-04-06 |
| 35 | 36 | Korea | Education | School Opening Delay | High School | 2020-03-02 | 2020-04-06 |
| 36 | 37 | Korea | Education | School Opening Delay | Middle School | 2020-03-02 | 2020-04-06 |
| 37 | 38 | Korea | Education | School Opening Delay | Elementary School | 2020-03-02 | 2020-04-06 |
| 25 | 26 | Korea | Health | Drive-Through Screening Center | Standard Operating Procedures | 2020-03-04 |  |
| 51 | 52 | Korea | Technology | Self-Quarantine Safety Protection App |  | 2020-03-07 |  |
| 49 | 50 | Korea | Technology | Open API | Public Mask Sales Information | 2020-03-08 |  |
| 7 | 8 | Korea | Immigration | Special Immigration Procedure | from Japan | 2020-03-09 |  |


During this period several important policies were implemented:

* A national social distancing campaign was launched on 2020-02-29. Its guidelines included working from home, wearing masks and staying at home as much as possible.
* Daycare centers were closed and the opening of schools was delayed from 2020-03-02.
* Special immigration procedures for visitors from countries with large outbreaks were extended to Japan (2020-03-09).
* Drive-through screening centers were opened.

These measures, especially the school closures, severely restricted social contacts.
After they were implemented, a sharp decrease in new cases was observed, as seen in @fig-tests-time.

In [37]:
data['Policy'].sort_values('start_date')

|  | policy_id | country | type | gov_policy | detail | start_date | end_date |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Korea | Alert | Infectious Disease Alert Level | Level 1 (Blue) | 2020-01-03 | 2020-01-19 |
| 48 | 49 | Korea | Technology | Open Data | Patients Information | 2020-01-20 |  |
| 1 | 2 | Korea | Alert | Infectious Disease Alert Level | Level 2 (Yellow) | 2020-01-20 | 2020-01-27 |
| 2 | 3 | Korea | Alert | Infectious Disease Alert Level | Level 3 (Orange) | 2020-01-28 | 2020-02-22 |
| 4 | 5 | Korea | Immigration | Special Immigration Procedure | from China | 2020-02-04 |  |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 58 | 59 | Korea | Transformation | Wearing of masks | Drivers such as buses and taxis can refuse to ... | 2020-05-26 |  |
| 57 | 58 | Korea | Transformation | Wearing of masks | Mandatory wearing of passenger mask domestic, ... | 2020-05-27 |  |
| 60 | 61 | Korea | Health | Extends Tightened Quarantine Measures | Gov't Extends Tightened Quarantine Measures in... | 2020-05-28 | 2020-06-14 |
| 56 | 57 | Korea | Transformation | Logistics center | On-site inspection of major logistics faciliti... | 2020-05-29 | 2020-06-11 |


In [None]:
# Create a figure with two stacked subplots that share the x-axis.
# Traces can only be addressed by row and column on a make_subplots figure.
fig = make_subplots(
    rows=2,
    cols=1,
    shared_xaxes=True,
    row_heights=[0.7, 0.2],
    vertical_spacing=0.1,
)

# Main plot in the top row
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[4, 5, 6]), row=1, col=1)

# Subplot in the bottom row
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[7, 8, 9]), row=2, col=1)
fig.update_yaxes(title_text="Subplot Y-Axis", row=2, col=1)
fig.update_xaxes(tickmode="linear")
fig.update_layout(height=600, width=800, title_text="Main Plot Title")

fig.show()

In [None]:
fig_new_time = px.line(weekly_time, x=weekly_time.index, y='positivity', width=base_fig_wid*dpi, height=base_fig_wid*dpi/2)
fig_new_time.show()