# TABLE OF CONTENTS

* [1. INTRODUCTION](#section-one)
* [2. SETUP](#section-two)
    - [2.1 Download/Draw Packages](#subsection-two-one)
    - [2.2 Import and Wrangle Data](#subsection-two-two)
* [3. STORY](#section-three)
    - [3.1 Question 1: When are the plateau and inflection points for your chosen country? How do these dates affect your portfolio allocation and selection?](#subsection-three-one)
    - [3.2 Question 2: What do you learn from the general negative runs in your chosen country? Which specific negative run is the closest (in nature and context) to Covid19? What do you learn from this specific run? What are the caveats?](#subsection-three-two)
      - [3.2.1 How do we calculate the duration (peak-to-peak, peak-to-trough, and trough-to-peak) and maximum drawdown (loss) for each negative run?](#subsection-three-two-one)
      - [3.2.2 How do the different types of market downturns (by severity) behave?](#subsection-three-two-two)
* [4. CONCLUSION](#section-four)

<a id="section-one"></a>
# 1. INTRODUCTION

The first part of this notebook estimates the inflection and plateau points of the Covid-19 pandemic in Singapore over the next two years, and then applies the results to portfolio allocation and selection. The second part focuses on the negative runs of the Singapore stock market from 1995 to 2020, which we believe are an important indicator for portfolio allocation.

<a id="section-two"></a>
# 2. SETUP

<a id="subsection-two-one"></a>
## 2.1 Download/Draw Packages

In [None]:
# Data wrangling
import numpy as np
import pandas as pd
from datetime import datetime, date, timedelta

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Offline interactive visualization
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected=True)

# Regression
import statsmodels.api as sm
from statsmodels.formula.api import ols
import statsmodels.graphics.api as smg

import warnings
warnings.filterwarnings("ignore")

# Color palette
# Hexadecimal codes RRGGBB (true black #000000, true white #ffffff)
cnf, dth, rec, act = '#393e46', '#ff2e63', '#21bf73', '#fe9801'

<a id="subsection-two-two"></a>
## 2.2 Import and Wrangle Data

In [None]:
# Import and wrangle with stock_ret dataset
stock_ret = pd.read_csv('../input/stimonthly/STI-Month.csv')
stock_ret['Date'] = pd.to_datetime(stock_ret['Date'],dayfirst=True)
# stock_ret.tail()

# Calculate monthly price returns for the STI (excluding dividends, so these are not total returns)
stock_ret['mth_return'] = stock_ret['Close']/stock_ret['Close'].shift(1) - 1

# Cumulative return from the start of the sample (1995 Nov onwards);
# the first value is NaN because the first month has no previous close
stock_ret['cum_return'] = np.cumprod(stock_ret['mth_return']+1)
# stock_ret.info()
# stock_ret.tail()
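As a sanity check on the two formulas above, compounding the monthly price returns should reproduce each close relative to the first close. A minimal sketch on made-up closing prices (not STI data):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly closing prices (illustrative only, not STI data)
close = pd.Series([100.0, 110.0, 99.0, 104.5, 121.0])

# Monthly price return: Close_t / Close_{t-1} - 1 (the first value is NaN)
toy_mth_return = close / close.shift(1) - 1

# Compound (1 + r) month by month; pandas skips the leading NaN
toy_cum_return = (1 + toy_mth_return).cumprod()

# From the second month on, the cumulative return equals Close_t / Close_0
print(toy_cum_return.round(4).tolist())
```

Because the first return is NaN, the cumulative series effectively starts at the second month, which is why later cells treat the first row specially.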

In [None]:
# Worldometer data
worldometer_data = pd.read_csv('../input/corona-virus-report/worldometer_data.csv')
# Replace empty strings with NaN, then fill all missing values with 0
worldometer_data = worldometer_data.replace('', np.nan).fillna(0)
# Harmonize country names with those used in full_grouped (needed for the merge below)
worldometer_data['Country/Region'] = worldometer_data['Country/Region'].replace(
    {'USA': 'US', 'UAE': 'United Arab Emirates', 'S. Korea': 'South Korea', 'UK': 'United Kingdom'})

# Grouped by day, country
full_grouped = pd.read_csv('../input/corona-virus-report/full_grouped.csv')
# Merge in population data
full_grouped = full_grouped.merge(worldometer_data[['Country/Region', 'Population']], how='left', on='Country/Region')
full_grouped['Date'] = pd.to_datetime(full_grouped['Date'], format = '%Y-%m-%d')
#full_grouped.tail()
# Filter to Singapore only and store the result in a separate dataframe, full_grouped_SG_Only
full_grouped_SG_Only = full_grouped.loc[full_grouped["Country/Region"].isin(['Singapore'])]


# Global data: sum over all countries for each date
global_data=full_grouped.groupby('Date')[['Confirmed','Deaths','Recovered','Active','Population']].sum()
global_data.reset_index(inplace=True)
# global_data.tail()




<a id="section-three"></a>
# 3. STORY

<a id="subsection-three-one"></a>
## 3.1 Question 1: When are the plateau and inflection points for your chosen country? How do these dates affect your portfolio allocation and selection?

In [None]:
# Collapse (Country, Date) observations into Date observations and reset the index
# (the world chart below repeats this logic; it is written out in full rather than wrapped in a function, for readability)
active_singapore_trend = full_grouped_SG_Only.groupby('Date')[['Recovered', 'Deaths', 'Active']].sum().reset_index()

# Melt to long format: keep Date, put the case status in one column ('Case') and the counts in another ('Count')
# (value_vars order matches color_discrete_sequence below: Recovered, Deaths, Active)
active_singapore_trend = active_singapore_trend.melt(id_vars="Date", value_vars=['Recovered', 'Deaths', 'Active'],
                 var_name='Case', value_name='Count')

# Plot how the number of cases in each status evolves over time
fig = px.area(active_singapore_trend, x="Date", y="Count", color='Case', height=600, width=700,
             title='Cases over time (Singapore)', color_discrete_sequence = [rec, dth, act])
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

In [None]:
# date = date of the most recent subwave of covid19 to project into the future
# date format yyyy-mm-dd, e.g., '2020-07-04'

def plot_country(country, date): 
    temp = full_grouped[full_grouped['Country/Region']==country].copy()
    temp['recent_wave'] = np.where(temp['Date'] >= date,1,0)

    fig = px.line(temp, x='Date', y='Confirmed', color='recent_wave', \
                  title = 'Infections for ' + str(country), height=600)      
    fig.show()
    
    fig = px.line(temp, x='Date', y='Recovered', color='recent_wave', \
                  title = 'Recovered Patients for ' + str(country), height=600)
    fig.show()
    
    return country, date

In [None]:
country, date = plot_country('Singapore', '2020-04-01')

In [None]:
# Generate the global trend chart of active, recovered and death cases
# Collapse (Country, Date) observations into Date observations and reset the index
active_total_trend = full_grouped.groupby('Date')[['Recovered', 'Deaths', 'Active']].sum().reset_index()

# Melt to long format: keep Date, put the case status in one column ('Case') and the counts in another ('Count')
active_total_trend = active_total_trend.melt(id_vars="Date", value_vars=['Recovered', 'Deaths', 'Active'],
                 var_name='Case', value_name='Count')

# Plot how the number of cases in each status evolves over time
fig = px.area(active_total_trend, x="Date", y="Count", color='Case', height=600, width=700,
             title='Cases over time (World)', color_discrete_sequence = [rec, dth, act])
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

In [None]:
def plot_world(date): 
    temp = global_data.copy()
    temp['recent_wave'] = np.where(temp['Date'] >= date,1,0)

    fig = px.line(temp, x='Date', y='Confirmed', color='recent_wave', \
                  title = 'Infections (World)', height=600)      
    fig.show()
    
    fig = px.line(temp, x='Date', y='Recovered', color='recent_wave', \
              title = 'Recovered Patients (World) ', height=600)      
    fig.show()
    
    return date
date = plot_world('2020-03-20')

#### Note:

These charts say little about Singapore's situation relative to the world's unless we look closely at the numbers on the y-axes. We therefore use the model below to quantify the comparison.

Model:  
\begin{align*}
\mathrm{S} \overset{\beta I}{\longrightarrow} \mathrm{I} \overset{\gamma}{\longrightarrow} \mathrm{R}  \\
\end{align*}

$\beta$: Effective contact rate or transmission rate [per day basis]  
$\gamma$: Recovery (and mortality) rate [per day basis]  

Ordinary Differential Equation (ODE):  
\begin{align*}
& \frac{\mathrm{d}S}{\mathrm{d}T}= - \beta S I / N \\
& \frac{\mathrm{d}I}{\mathrm{d}T}= \beta S I / N - \gamma I  \\
& \frac{\mathrm{d}R}{\mathrm{d}T}= \gamma I  \\
\end{align*}

where $N=S+I+R$ is the total population and $T$ is the elapsed time (in days) from the start date.
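The calibration cells below fit these equations by ordinary least squares on daily data. Assuming a one-day forward difference ($\Delta T = 1$ day), the ODEs reduce to two regressions without an intercept, with $t$ indexing days and $R$ counting recoveries plus deaths:

\begin{align*}
& S_{t+1} - S_t \approx - \beta \, S_t I_t / N \\
& R_{t+1} - R_t \approx \gamma \, I_t \\
& R_0 = \beta / \gamma
\end{align*}

So the slope from regressing $\Delta S_t$ on $S_t I_t / N$ estimates $-\beta$ (hence the sign flip on `beta` in the code), and the slope from regressing $\Delta R_t$ on $I_t$ estimates $\gamma$.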

In [None]:
# Calibrate model

def estimate_sir_param(country, date):
    
    # Assume everyone is at risk
    # Identify the maximum population and the latest date in the time series for the country
    population  = full_grouped[full_grouped['Country/Region']==country]["Population"].max()
    latest_date = full_grouped[full_grouped['Country/Region']==country]["Date"].max()
    
    time_series_length = (latest_date - datetime.strptime(date,'%Y-%m-%d')).days + 1

    temp = full_grouped[full_grouped['Country/Region']==country].copy()
    temp['recent_wave'] = np.where(temp['Date'] >= date,1,0)
    
    # Initialize Numpy arrays for total population (the maximum population), 
    # susceptible population (empty), and change in time (i.e., 1 day)
    N  = np.array([population] * time_series_length)
    S  = np.array([])
    dt = np.array([1] * (time_series_length-1))

    # Apply the condition N = S+I+(R+D)
    # Filter time-series to those of the recent wave
    I = np.array(temp[temp['recent_wave']==1]['Active'])
    R = np.array(temp[temp['recent_wave']==1]['Recovered'])
    D = np.array(temp[temp['recent_wave']==1]['Deaths'])

    # R includes both Recovered and Death for brevity
    S = N - I - (R + D)

    ## 1. Estimate beta
    
    x = (S * I) / N
    
    # Copy all elements except the last
    x = x[:-1].copy()
    
    # Take the first difference
    dS = np.diff(S)
    y = dS/dt

    # Fit into a linear regression
    results = sm.OLS(y, x, missing='drop').fit()
    beta = results.params
    print(results.summary())
    print('\n')
    print('*'*80)
    print(f"Transmission rate or Beta is: {beta}")
    print('*'*80)
    
    ## 2. Estimate gamma
    
    x = I[:-1].copy()
    dR = np.diff(R+D)
    y = dR/dt

    results = sm.OLS(endog=y, exog=x, missing='drop').fit()
    gamma = results.params
    print (results.summary())
    print('\n')
    print('*'*80)
    print(f"Recovery (and Mortality) rate or Gamma is: {gamma}")
    print('*'*80)
    
    #3. Calculate R

    print('\n')
    print('*'*80)
    print(f"Reproduction number or R is: {-beta/gamma}")
    print('*'*80)
    
    return -beta.astype('float'), gamma.astype('float'), datetime.strptime(date,'%Y-%m-%d').date()


In [None]:
beta, gamma, date = estimate_sir_param("Singapore", date)

Although the Durbin-Watson (DW) statistic is not satisfactory, the remaining diagnostics look acceptable for Singapore.
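The DW statistic in the OLS summary is $\sum_t (e_t - e_{t-1})^2 / \sum_t e_t^2$ over the residuals $e_t$: values near 2 suggest little first-order autocorrelation, while values near 0 suggest strong positive autocorrelation, which is common when regressing smooth epidemic time series. A numpy-only sketch on simulated residuals (the helper name `durbin_watson_stat` is hypothetical; statsmodels also provides `statsmodels.stats.stattools.durbin_watson`):

```python
import numpy as np

def durbin_watson_stat(resid):
    """Sum of squared successive differences of the residuals divided by their sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(0)
white_noise = rng.normal(size=500)      # independent residuals
random_walk = np.cumsum(white_noise)    # strongly autocorrelated residuals

print(round(durbin_watson_stat(white_noise), 2))  # should be close to 2
print(round(durbin_watson_stat(random_walk), 2))  # should be close to 0
```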

In [None]:
# Calibrate model

def estimate_global_sir_param(date):
    
    # Assume everyone is at risk
    # Identify the maximum population and the latest date in the time series for the country
    population  = global_data["Population"].max()
    latest_date = global_data["Date"].max()
    
    time_series_length = (latest_date - datetime.strptime(date,'%Y-%m-%d')).days + 1

    temp = global_data.copy()
    temp['recent_wave'] = np.where(temp['Date'] >= date,1,0)
    
    # Initialize Numpy arrays for total population (the maximum population), 
    # susceptible population (empty), and change in time (i.e., 1 day)
    N  = np.array([population] * time_series_length)
    S  = np.array([])
    dt = np.array([1] * (time_series_length-1))

    # Apply the condition N = S+I+(R+D)
    # Filter time-series to those of the recent wave
    I = np.array(temp[temp['recent_wave']==1]['Active'])
    R = np.array(temp[temp['recent_wave']==1]['Recovered'])
    D = np.array(temp[temp['recent_wave']==1]['Deaths'])

    # R includes both Recovered and Death for brevity
    S = N - I - (R + D)

    ## 1. Estimate beta
    
    x = (S * I) / N
    
    # Copy all elements except the last
    x = x[:-1].copy()
    
    # Take the first difference
    dS = np.diff(S)
    y = dS/dt

    # Fit into a linear regression
    results = sm.OLS(y, x, missing='drop').fit()
    beta = results.params
    print(results.summary())
    print('\n')
    print('*'*80)
    print(f"Transmission rate or Beta is: {beta}")
    print('*'*80)
    
    ## 2. Estimate gamma
    
    x = I[:-1].copy()
    dR = np.diff(R+D)
    y = dR/dt

    results = sm.OLS(endog=y, exog=x, missing='drop').fit()
    gamma = results.params
    print (results.summary())
    print('\n')
    print('*'*80)
    print(f"Recovery (and Mortality) rate or Gamma is: {gamma}")
    print('*'*80)
    
    #3. Calculate R

    print('\n')
    print('*'*80)
    print(f"Reproduction number or R is: {-beta/gamma}")
    print('*'*80)
    
    return -beta.astype('float'), gamma.astype('float'), datetime.strptime(date,'%Y-%m-%d').date()

In [None]:
beta, gamma, date = estimate_global_sir_param('2020-03-20')

Although the Durbin-Watson (DW) test for beta is not satisfactory, the remaining diagnostics suggest the estimates are still usable.
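`estimate_sir_param` and `estimate_global_sir_param` run the same two regressions. A minimal sketch of a shared helper, using `np.linalg.lstsq` in place of `sm.OLS` (so it returns point estimates only, without the diagnostic summary), checked against a simulated SIR path whose true rates are known. The names `fit_sir_rates` and `simulate_sir` and the population of 5.7 million are hypothetical, chosen for illustration:

```python
import numpy as np

def fit_sir_rates(S, I, R, N):
    """Estimate (beta, gamma) by no-intercept least squares on one-day differences.

    dS_t = -beta * S_t * I_t / N  and  dR_t = gamma * I_t.
    R should already include deaths, as in the notebook.
    """
    S, I, R = (np.asarray(a, dtype=float) for a in (S, I, R))
    x_beta = (S * I / N)[:-1]
    x_gamma = I[:-1]
    # A single-column least-squares fit is an OLS slope through the origin
    slope_s = np.linalg.lstsq(x_beta[:, None], np.diff(S), rcond=None)[0][0]
    gamma = np.linalg.lstsq(x_gamma[:, None], np.diff(R), rcond=None)[0][0]
    return -slope_s, gamma

def simulate_sir(N, I0, beta, gamma, days):
    """Discrete one-day SIR simulation returning S, I, R arrays."""
    S, I, R = [N - I0], [I0], [0.0]
    for _ in range(days - 1):
        new_inf = beta * S[-1] * I[-1] / N
        new_rec = gamma * I[-1]
        S.append(S[-1] - new_inf)
        I.append(I[-1] + new_inf - new_rec)
        R.append(R[-1] + new_rec)
    return np.array(S), np.array(I), np.array(R)

S, I, R = simulate_sir(N=5_700_000, I0=1_000, beta=0.2, gamma=0.1, days=120)
beta_hat, gamma_hat = fit_sir_rates(S, I, R, N=5_700_000)
print(f"beta ~ {beta_hat:.4f}, gamma ~ {gamma_hat:.4f}, R0 ~ {beta_hat / gamma_hat:.2f}")
```

On simulated data that follows the discrete model exactly, the fit recovers the true rates; on real data, reporting lags and changing policies make the estimates noisier.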

In [None]:
def sir_model(I0=0.01, beta=0.6, gamma=0.1, days=365, date=None):
    """
    Project a discrete-time SIR model forward, one day at a time.

    Takes the initial infectious fraction of the population (I0),
    the transmission rate (beta), the recovery rate (gamma), the number of days
    to project, and the start date of the projection (defaults to today).

    Prints the maximum percentage of the population infectious at one time,
    the number of days to reach that maximum (inflection point),
    the total percentage of the population infected,
    and the number of days to reach 80% of the projected total cases (plateau point).
    """
    # Default to today's date at call time (a default of date.today() would be
    # evaluated only once, when the function is defined)
    if date is None:
        date = datetime.today().date()

    ## Initialize model parameters
    N = 1          # Total population as a fraction, i.e., 1 = 100%
    I = I0         # Initial infectious fraction (default 1% of the population, i.e., I0 = 0.01)
    S = N - I      # Initial susceptible fraction
    R = 0          # Initial recovered (and dead) fraction
    C = I          # Initial cumulative cases

    ## Initialize empty lists
    inf  = []       # Infectious population for each day
    day  = []       # Time period in days
    suc  = []       # Susceptible population for each day
    rec  = []       # Recovered population for each day
    conf = []       # Cumulative cases for each day
    
    ## Project into the future
    for i in range(days):
        day.append(i)
        inf.append(I)
        suc.append(S)
        rec.append(R)
        conf.append(C)

        new_inf = I*S*beta/N           # New infections
        new_rec = I*gamma              # New recoveries (and deaths)
        
        I = I + new_inf - new_rec          # Infectious population for the next day
        S = max(min(S - new_inf, N), 0)    # Susceptible population for the next day
        R = min(R + new_rec, N)            # Recovered population for the next day
        C = C + new_inf                    # Cumulative cases for the next day

    ## Pinpoint important milestones
    max_inf = round(np.array(inf).max()*100, 2)        # Peak infectious population (%)
    inflection_day = inf.index(np.array(inf).max())    # Days to reach the peak infectious population
    max_conf = round(np.array(conf).max()*100, 2)      # Overall infected population (%)
    plateau_day = np.array(np.where(np.array(conf) >= 0.8*np.array(conf).max())).min()   # Days to reach 80% of total cases
        
    print(f"Maximum infectious population at a time: {max_inf}%")
    print(f"Number of days to reach maximum infectious population (inflection point): {inflection_day} days or {date + timedelta(days=inflection_day)}")
    print(f"Total infected population: {max_conf}%")
    print(f"Number of days to reach 80% of the projected confirmed cases (plateau point): {plateau_day} days or {date + timedelta(days=plateau_day.item())}")

    ## Visualize the model outputs
    sns.set(style="darkgrid")
    plt.figure(figsize=(10,6))
    plt.title(f"SIR Model: R = {round(beta/gamma,2)}", fontsize=18)
    sns.lineplot(x=day, y=inf, label="Infectious")
    sns.lineplot(x=day, y=suc, label="Susceptible")
    sns.lineplot(x=day, y=rec, label="Recovered")
    
    plt.legend()
    plt.xlabel("Time (in days)")
    plt.ylabel("Fraction of Population")
    plt.show()
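`sir_model` prints its milestones rather than returning them, which makes them awkward to reuse (for example, to compare countries side by side). A minimal sketch of a return-based variant of the same daily update and the same 80% plateau rule, without plotting; the name `sir_milestones` is hypothetical:

```python
import numpy as np

def sir_milestones(I0=0.01, beta=0.6, gamma=0.1, days=365):
    """Return (peak infectious %, inflection day, total infected %, plateau day).

    Same discrete one-day SIR update as sir_model, with N = 1 (fractions).
    Plateau day = first day cumulative cases reach 80% of their projected maximum.
    """
    S, I, C = 1.0 - I0, I0, I0
    inf, conf = [], []
    for _ in range(days):
        inf.append(I)
        conf.append(C)
        new_inf = beta * I * S          # N = 1, so no division by N
        new_rec = gamma * I
        I = I + new_inf - new_rec
        S = max(min(S - new_inf, 1.0), 0.0)
        C = C + new_inf
    inf, conf = np.array(inf), np.array(conf)
    inflection_day = int(np.argmax(inf))                      # first day at the peak
    plateau_day = int(np.argmax(conf >= 0.8 * conf.max()))    # first day at 80% of total cases
    return round(float(inf.max()) * 100, 2), inflection_day, round(float(conf.max()) * 100, 2), plateau_day

# Illustrative call with the Singapore parameters used in the next cell
print(sir_milestones(I0=0.000874, beta=0.03839909, gamma=0.0380892, days=730))
```

When beta is below gamma, the infectious share shrinks from day 0, so the inflection day is 0.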

In [None]:
# Compute the SIR Model for Singapore
sir_model(I0=0.000874, beta = 0.03839909, gamma = 0.0380892, days=730, date = date)

In [None]:
# Compute the SIR model for the world
sir_model(I0=0.001042, beta = 0.03996787, gamma = 0.02675253, days=1000, date = date)

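In conclusion, Singapore's situation is much better than the world's: with the parameters above, R is about 1.01 for Singapore versus about 1.49 for the world. However, Singapore's economy is affected by the world economy.

Therefore, for portfolio allocation, we need to look more closely at the negative runs of the stock market.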

<a id="subsection-three-two"></a>
## 3.2 Question 2: What do you learn from the general negative runs in your chosen country? Which specific negative run is the closest (in nature and context) to Covid19? What do you learn from this specific run? What are the caveats?

In [None]:
# Restrict the sample to 2010 onwards (.copy() so later column assignments do not touch stock_ret)
stock_ret_2010 = stock_ret[stock_ret['Date'] >= "2010-01-01"].copy()

# Calculate the negative runs in the STI (i.e., from one peak to the next)
def neg_run_func(stock_ret):
    # Initialize an empty list for the cumulative return (loss tally) from one peak to the next
    neg_run = []

    # Store the previous maximum cumulative return
    max_cum_ret_now = stock_ret['cum_return'].iloc[0]   

    # enumerate() method adds counter (t) to an iterable (stock_ret['mth_return']) and 
    # returns a tuple (t, stock_ret['mth_return'])
    for t, val in enumerate(stock_ret['mth_return']):

        # First return in the monthly return series
        if t == 0:

            # If monthly return is negative
            if val < 0:

                # Append the negative return to neg_run list
                neg_run.append(val)

            else:

                # Append a zero to neg_run list
                neg_run.append(0)

        # Not the first return in the monthly return series
        else:

            # If the cumulative return at time t is less than the previous maximum cumulative return
            # i.e., the previous all time high
            if stock_ret['cum_return'].iloc[t] < max_cum_ret_now:

                # cumulate/compound the return at time t with the return at time t-1
                # i.e., tally the loss
                neg_run.append((1 + neg_run[t-1])*(1 + val) - 1) 

            # If the cumulative return at time t is more than the previous maximum cumulative return
            else:

                # stop the loss tally and append a zero to the negative run list
                neg_run.append(0)                                

                # replace the previous all time high with the new high
                max_cum_ret_now = stock_ret['cum_return'].iloc[t]

    # Add the variable to the dataframe stock_ret
    stock_ret['neg_run'] = neg_run
    
    return stock_ret

stock_ret = neg_run_func(stock_ret)
stock_ret_2010 = neg_run_func(stock_ret_2010)
# stock_ret.tail(10)
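The loop in `neg_run_func` compounds monthly losses until the cumulative return sets a new all-time high, which amounts to measuring the drawdown from the running maximum: 1 + `neg_run` equals the cumulative return divided by its highest value so far. A vectorized sketch of that identity on made-up prices (the function name `drawdown_from_peak` is hypothetical); on real data it should agree with `neg_run` except possibly at the very start of the sample, where the loop initializes the previous high differently:

```python
import numpy as np
import pandas as pd

def drawdown_from_peak(close):
    """Peak-to-peak negative run expressed as the drawdown from the running all-time high."""
    cum = close / close.iloc[0]            # cumulative growth of 1 unit invested
    return cum / cum.cummax() - 1          # 0 at new highs, negative within a run

# Hypothetical monthly closes (illustrative only)
close = pd.Series([100.0, 110.0, 99.0, 104.5, 121.0, 108.9])
print(drawdown_from_peak(close).round(4).tolist())
```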

In [None]:
# Plot the STI time series
sns.lineplot(x='Date', y='Close', data=stock_ret, color='red')

In [None]:
# Plot the STI time series after 2010
sns.lineplot(x='Date', y='Close', data=stock_ret_2010, color='red')

In [None]:
# Plot the peak-to-peak negative run
sns.lineplot(x='Date', y='neg_run', data=stock_ret, color='red')
print("Number of negative monthly returns:", (stock_ret["mth_return"] < 0).sum())

In [None]:
# Plot the peak-to-peak negative run after 2010
sns.lineplot(x='Date', y='neg_run', data=stock_ret_2010, color='red')
print("Number of negative monthly returns since 2010:", (stock_ret_2010["mth_return"] < 0).sum())

This chart suggests that the negative run starting in 2008 is too long to be suitable for our calculation, and the current downturn is only about half the size of the 2008 one. We therefore use data from 2010 onwards to approximate the current (2019-2020) negative run. Our reasoning is that the negative run during the 2002-03 SARS outbreak resembles the current situation: it did not last nearly as long as the 2008 negative run, which hurt the Singapore economy badly and was slow to recover.

<a id="subsection-three-two-one"></a>
### 3.2.1 How do we calculate the duration (peak-to-peak, peak-to-trough, and trough-to-peak) and maximum drawdown (loss) for each negative run?

In [None]:
# From here on, use the 2010-onwards sample, which we believe better reflects the likely path of the current downturn
stock_ret = stock_ret_2010

# Recap that a neg_run is the peak-to-peak run 
# Identify and label each neg_run sequentially (e.g., the 10th neg_run is tagged as 10)
# The label serves as the groupby variable to examine the characteristics of each run

# Initialize label value
label = 1

# Initialize the indicator value of whether stock_ret['neg_run'] (or loss tally) is within a peak-to-peak run
within_negative_run = False

# Initialize an empty list for negative run number
neg_run_num = []

# Identify and label each cycle of negative run, which ends with a zero
# The cumulative return (or loss tally) during the cycle is negative
for i in stock_ret['neg_run']:
    
    # Loss tally is negative
    if i < 0:
        
        # Append the label to neg_run_num list
        neg_run_num.append(label)
        
        # Switch the state for within_negative_run
        within_negative_run = True
        
    # Loss tally is zero - negative run ends
    else:
        
        # Append a zero to neg_run_num list
        neg_run_num.append(0)
        
        # Increment label value by 1 if within_negative_run is True
        # This happens only for a 'new' cycle of negative run
        # The label doesn't increment by 1 in market run-up after the exit from a negative run
        # i.e., reaching new all-time highs after exiting from a cycle of negative run
        if within_negative_run:
            label += 1
            within_negative_run = False
            
stock_ret['neg_run_num'] = neg_run_num
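The labelling loop above gives each consecutive stretch of negative `neg_run` values its own number. A hedged pandas sketch of the same rule without an explicit state flag: mark a run's start wherever the value turns negative, then take a running count of starts (the name `label_negative_runs` is hypothetical, and the input below is made up):

```python
import pandas as pd

def label_negative_runs(neg_run):
    """Number each consecutive stretch of negative values 1, 2, 3, ...; zero elsewhere."""
    is_neg = neg_run < 0
    prev_neg = is_neg.shift(1, fill_value=False).astype(bool)
    # A run starts where the value is negative but the previous value was not
    starts = is_neg & ~prev_neg
    return starts.cumsum().where(is_neg, 0).astype(int)

toy_neg_run = pd.Series([0.0, 0.0, -0.10, -0.05, 0.0, -0.10, -0.20, -0.15])
print(label_negative_runs(toy_neg_run).tolist())  # [0, 0, 1, 1, 0, 2, 2, 2]
```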

In [None]:
# Identify and label each peak (previous all time high) to trough (the lowest point) within each peak-to-peak run
# This is also known as the maximum drawdown
# The integer label runs sequentially (e.g., the 10th peak-to-trough is tagged as 10)

# Initialize the label value
label = 1

# Initialize the search status of whether the lowest point within a negative run has been discovered
is_neg_run_min = False

# Initialize an empty list for peak-to-trough run number
peak_trough_num = []

for t, val in enumerate(stock_ret['neg_run_num']):
    
    # Identify the lowest point (i.e., cumulated returns) within a negative run
    trough = min(stock_ret[stock_ret['neg_run_num']==val]['neg_run'])
    
    # Recap that if the cumulative return at time t is more than the previous maximum cumulative return
    # The loss tally will stop with a zero appended to the negative run list (i.e., the negative run has ended)
    # neg_run_num will also be appended with a zero when neg_run is zero

    # While still within a peak-to-peak negative run
    if val > 0:
        
        # Append zero to peak_trough_num if the lowest point has been discovered
        if is_neg_run_min:
            peak_trough_num.append(0)
            
        # Lowest point within a negative run has not been discovered
        else:
            if stock_ret.iloc[t]['neg_run'] == trough:
                is_neg_run_min = True
                peak_trough_num.append(val)
            else:
                peak_trough_num.append(val)
                
    # Out of the peak-to-peak negative run
    else:
        is_neg_run_min = False
        peak_trough_num.append(val)
            
stock_ret['peak_trough_num'] = peak_trough_num
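The loop above keeps a run's label from its start up to and including the first month at the run's lowest point, and sets later months to zero; it also recomputes each run's minimum for every row. A minimal vectorized sketch of the same rule with `groupby` (the function name `label_peak_to_trough` is hypothetical; the inputs are made up):

```python
import numpy as np
import pandas as pd

def label_peak_to_trough(neg_run, neg_run_num):
    """Keep each run's label from its start up to (and including) its first lowest point."""
    df = pd.DataFrame({"neg_run": neg_run, "neg_run_num": neg_run_num})
    groups = df.groupby("neg_run_num")["neg_run"]
    pos_in_run = groups.cumcount()  # 0, 1, 2, ... within each run
    # Position of the first minimum within each run, broadcast to every row of the run
    trough_pos = groups.transform(lambda s: int(np.argmin(s.to_numpy())))
    return df["neg_run_num"].where(pos_in_run <= trough_pos, 0).astype(int)

toy_neg_run = pd.Series([0.0, 0.0, -0.10, -0.05, 0.0, -0.10, -0.20, -0.15])
toy_run_num = pd.Series([0, 0, 1, 1, 0, 2, 2, 2])
print(label_peak_to_trough(toy_neg_run, toy_run_num).tolist())  # [0, 0, 1, 0, 0, 2, 2, 0]
```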

In [None]:
# Group by run number to examine the duration and maximum loss (drawdown) of each market decline identified
# There are 82 peak-to-peak negative runs

# By peak-to-peak run number, count the number of months 
run_len = stock_ret[stock_ret['neg_run_num']>0].groupby('neg_run_num').count()['neg_run']

# By peak-to-peak run number, find the lowest cumulative return (i.e., the maximum drawdown)
maximum_drawdown = stock_ret[stock_ret['neg_run_num']>0].groupby('neg_run_num').min()['neg_run']

# By peak-to-trough run number, count the number of months
peak_trough_dur = stock_ret[stock_ret['peak_trough_num']>0].groupby('peak_trough_num').count()['neg_run']

fig, ax = plt.subplots(3)
ax[0].plot(run_len.sort_values(ascending=False).reset_index(drop=True))
ax[0].set_title("Time between Two Peaks (Months)")
ax[1].plot(peak_trough_dur.sort_values(ascending=False).reset_index(drop=True))
ax[1].set_title("Time to Maximum Drawdown (Months)")
ax[2].plot(maximum_drawdown.sort_values(ascending=False).reset_index(drop=True))
ax[2].set_title("Maximum Drawdown (fraction, e.g. -0.2 = -20%)")
fig.tight_layout()

In [None]:
# Store groupby results in a new dataframe with the 82 runs
declines_df = pd.DataFrame()

declines_df['run_len'] = run_len
declines_df['maximum_drawdown'] = maximum_drawdown
declines_df['peak_trough_dur'] = peak_trough_dur

# declines_df.tail(10)

<a id="subsection-three-two-two"></a>
### 3.2.2 How do the different types of market downturns (by severity) behave? 

In [None]:
# Create 6 buckets by the magnitude of drawdown
drawdown_bin = []
for i in maximum_drawdown:
    if i >= 0.00:
        drawdown_bin.append(0)
    elif i >= -0.05:
        drawdown_bin.append(1)
    elif i >= -0.10:
        drawdown_bin.append(2)
    elif i >= -0.20:
        drawdown_bin.append(3)
    elif i >= -0.30:
        drawdown_bin.append(4)
    else:
        drawdown_bin.append(5)

declines_df['drawdown_bin'] = drawdown_bin
# declines_df.tail(10)
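The if/elif chain assigns each maximum drawdown to the bucket whose lower edge it meets or exceeds, so each bucket includes its lower edge (for example, bucket 1 is -5% <= drawdown < 0%). A sketch of the same mapping with `np.digitize`, assuming the bucket edges above stay fixed (`drawdown_bucket` and `EDGES` are hypothetical names):

```python
import numpy as np

# Bucket edges taken from the if/elif chain above, in increasing order
EDGES = [-0.30, -0.20, -0.10, -0.05, 0.0]

def drawdown_bucket(drawdowns):
    """Map drawdowns to buckets 0 (no loss) ... 5 (worse than -30%), matching the loop."""
    # np.digitize(x, EDGES) returns i with EDGES[i-1] <= x < EDGES[i]
    return 5 - np.digitize(drawdowns, EDGES, right=False)

sample = np.array([0.0, -0.03, -0.05, -0.07, -0.15, -0.30, -0.40])
print(drawdown_bucket(sample).tolist())  # [0, 1, 1, 2, 3, 4, 5]
```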

In [None]:
# Overall means for drawdown metrics
declines_df.mean()

In [None]:
# Count the number of drawdowns in each drawdown bucket
declines_df.groupby('drawdown_bin').count()['run_len']

In [None]:
# Plot the share of declines in each magnitude bucket (as probabilities)

# Calculate the probability of being in a drawdown bin relative to all drawdown bins
prob_bucket = declines_df.groupby('drawdown_bin').count()['run_len']/sum(declines_df.groupby('drawdown_bin').count()['run_len'])

# Plot the probabilities for each drawdown bin
fig, ax = plt.subplots(figsize=(10,6))
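bin_names = {0: 'No Drawdown', 1: '-5% or Better', 2: '-5% to -10%', 3: '-10% to -20%', 4: '-20% to -30%', 5: '-30% or Worse'}
# Look up labels by bucket number so that empty buckets cannot misalign labels and bars
sns.barplot(x=prob_bucket.values, y=[bin_names[b] for b in prob_bucket.index])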
ax.set_xlabel("Probability",fontsize=14)
ax.set_ylabel("Drawdown Bin",fontsize=14)

# Probability is between 0 and 1 - limit the range of possible value for x-axis
ax.set_xlim(0, 1)

plt.tight_layout()

In [None]:
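# What happens after the market has already dropped by more than 5%?

# Conditional probability of each bucket given a drawdown already worse than -5%:
# drop the first bucket ('-5% or better') and renormalize the rest
worst_probs = prob_bucket.iloc[1:]/sum(prob_bucket.iloc[1:])

# Probability that the drawdown ends up worse than -10%, given that it is already worse than -5%
print("The probability of a decline of more than 10%, given a decline of more than 5%, is", sum(worst_probs.iloc[1:]))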
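The cell above computes a conditional probability. Because every run worse than -10% is also worse than -5%, P(worse than -10% | worse than -5%) = P(worse than -10%) / P(worse than -5%). A worked sketch with hypothetical bucket counts (not the notebook's results):

```python
import pandas as pd

# Hypothetical number of runs per drawdown bucket (1 = -5% or better ... 5 = -30% or worse)
counts = pd.Series({1: 20, 2: 6, 3: 3, 4: 1, 5: 2})
toy_prob_bucket = counts / counts.sum()

# Condition on runs already worse than -5% (buckets 2-5), then take buckets 3-5
p_worse_than_5 = toy_prob_bucket.loc[2:].sum()
p_worse_than_10_given_5 = toy_prob_bucket.loc[3:].sum() / p_worse_than_5
print(round(p_worse_than_10_given_5, 4))  # 0.5, i.e. (3 + 1 + 2) / (6 + 3 + 1 + 2)
```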


In [None]:
# Calculate the mean maximum drawdown for each drawdown bucket of negative runs 
declines_df.groupby('drawdown_bin').mean()['maximum_drawdown']

In [None]:
# Calculate the metrics of each drawdown bucket and store in a dataframe for plots

# Average peak-to-trough and peak-to-peak durations for each drawdown bucket
duration_df = declines_df.groupby('drawdown_bin').mean()[['peak_trough_dur','run_len']]
duration_df.reset_index(inplace=True)

# Time to recover (in months)
duration_df['recover_dur'] = duration_df['run_len'] - duration_df['peak_trough_dur']

# Time to recover relative to time to the trough
duration_df['recover_to_peak_trough_ratio'] = duration_df['recover_dur'] / duration_df['peak_trough_dur']

In [None]:
# Plot the metrics
fig, ax = plt.subplots(figsize=(10,6))
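bin_names = {0: 'No Drawdown', 1: '-5% or Better', 2: '-5% to -10%', 3: '-10% to -20%', 4: '-20% to -30%', 5: '-30% or Worse'}
# Look up labels by bucket number so that empty buckets cannot misalign labels and bars
sns.barplot(x=[bin_names[b] for b in duration_df['drawdown_bin']], y=duration_df['recover_dur'].values)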
ax.set_xlabel("Market Decline Bin",fontsize=14)
ax.set_ylabel("Recovery Time in Months",fontsize=14)

plt.tight_layout()

In [None]:
# Display duration_df to see the average recovery time for each drawdown bucket
duration_df

Market declines of more than 30% (bucket 5) take disproportionately longer to recover than smaller declines. For example, for market downturns of -30% or worse, recovery takes on average 2.1977 times as long as the fall from peak to trough.
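The ratio in `duration_df` is simple arithmetic on the average durations: recovery time is the peak-to-peak length minus the peak-to-trough length, and the ratio divides that by the peak-to-trough length. A worked sketch with hypothetical durations (the function name `recovery_metrics` and the numbers are illustrative only):

```python
def recovery_metrics(peak_trough_dur, run_len):
    """Recovery time (trough back to the old peak) and its ratio to the peak-to-trough time."""
    recover_dur = run_len - peak_trough_dur
    return recover_dur, recover_dur / peak_trough_dur

# Hypothetical run: 16 months from peak to trough, 156 months from peak back to peak
print(recovery_metrics(peak_trough_dur=16, run_len=156))  # (140, 8.75)
```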

In [None]:
# Calculate the number and fraction of negative monthly returns
print("The number of negative monthly returns: ", len([i for i in stock_ret['mth_return'] if i<0]))
print("The number of monthly returns: ", stock_ret.shape[0])
print("The fraction of negative monthly returns: ", len([i for i in stock_ret['mth_return'] if i<0])/stock_ret.shape[0])

In [None]:
# Calculate the mean peak-to-trough length of market downturns
print("The average length of peak-to-trough market downturn: ", np.mean(declines_df['peak_trough_dur']), "months")

<a id="section-four"></a>
# 4. CONCLUSION

- Based on the projected Covid-19 trajectory over the next two years, we believe the pandemic in Singapore is largely under control: the estimated ratio beta/gamma (the reproduction number) is only about 1.01, and the projected maximum infectious population is only about 0.09%.

- We therefore see this as a positive sign for Singapore-based industries, and for the Singapore companies in our portfolio. However, since its peak before the 2008 financial crisis, the Singapore stock market has spent more than 140 months recovering and is still below that previous peak.

- In addition, there were 6 negative runs from 1995 to 2020, and 3 of the 6 had drawdowns averaging worse than -57%.

- The ratio of recovery time to peak-to-trough time for the last negative run is 8.75, i.e., recovery takes 8.75 times as long as the fall from peak to trough.

- Therefore, from the negative-run perspective, we believe the Singapore stock market is not an optimal place to invest, and we suggest holding more cash and seeking opportunities overseas.

- We suspect that one reason for this is that the Singapore stock market is much less liquid than the world's major stock markets, so it reached its trough within 16 months during the 2008 financial crisis but has still not recovered.