## Causal Impact on Samsung Electronics Share Price

Samsung Electronics' former chairman, Gunhee Lee, passed away on 25 October 2020. Experts forecast that his death would cause a significant change in the company's stock price. To find out whether the event had any impact, we use the Bayesian Structural Time Series (BSTS) method below.

### Contents
1. Introduction
2. Causal Impact Analysis
 - Step 1. Local Trend
 - Step 2. Seasonality
 - Step 3. Regression
3. Conclusion

### 1. Introduction

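#### What is Bayesian Structural Time Series?
BSTS fits a structural time series model to the data observed before an event and uses it to forecast the counterfactual, that is, how the series would have evolved had the event not happened. The gap between the observed series and this counterfactual is the estimated impact of the event.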

The method differs from A/B testing in that the data is gathered mainly by natural observation rather than by a controlled experiment over a fixed timeframe (e.g. how a shipping fee increase changed the total payment amount, or how a competitor's change in marketing strategy affected our sales). Because it can measure the effect of events and campaigns retrospectively, it is widely used in marketing analytics.


#### What should I provide?

The model measures the pure impact of an event, separated from changes in the local linear trend, seasonal effects and regression effects. We therefore need to supply the local trend, seasonality and covariates as model inputs.

We also need to define the training period, the past data that the model uses as a baseline to predict the future, and the treatment period, the data after the event over which the observed values are compared with the model's prediction.
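As a minimal sketch of how the two periods partition a series, the snippet below slices a synthetic business-day series at the same boundary dates used later in this notebook. The series `prices` and its values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic business-day series standing in for a real price history (illustrative only)
idx = pd.bdate_range("2020-10-01", "2020-11-13")
prices = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="price")

# Same boundary dates as the analysis below
training_end = "2020-10-23"
treatment_start = "2020-10-26"

# Label-based slicing on a DatetimeIndex includes both endpoints
pre = prices.loc[:training_end]       # baseline the model learns from
post = prices.loc[treatment_start:]   # window in which the effect is measured

print(len(pre), len(post))
```

Because 24 and 25 October 2020 fall on a weekend, the two slices cover every trading day without overlapping.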

#### How does the method work?

Below are the equations of the basic BSTS model:

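$$
\begin{aligned}
Y_t &= \mu_t + x_t \beta + S_t + e_t, & e_t &\sim N(0, \sigma_e^2) \\
\mu_{t+1} &= \mu_t + \nu_t, & \nu_t &\sim N(0, \sigma_\nu^2)
\end{aligned}
$$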

Here ```xt``` denotes a set of regressors, ```St``` represents seasonality, and ```μt``` is the local level term. The local level term defines how the latent state evolves over time and is often referred to as the unobserved trend. This could, for example, represent underlying growth in a company's brand value or external factors that are hard to pinpoint, but it can also soak up short-term fluctuations that should be controlled for with explicit terms. Note that the regressor coefficients, seasonality and trend are estimated simultaneously, which helps avoid strange coefficient estimates due to spurious relationships. The term ```xtβ```, a linear regression on the covariates, helps explain the data further. Our dataset does not have any regressors yet, so we will add covariates later in order to fit the full Bayesian structural model.
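To make the two equations concrete, the sketch below simulates a series from them with NumPy. Every number here (noise scales, the coefficient `beta`, the 5-step seasonal pattern) is an arbitrary assumption chosen for illustration, not something estimated from the Samsung data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# State equation: mu_{t+1} = mu_t + nu_t, nu_t ~ N(0, sigma_nu^2), starting from mu_0 = 0
sigma_nu = 0.1
nu = rng.normal(0.0, sigma_nu, size=n - 1)
mu = np.concatenate(([0.0], np.cumsum(nu)))

# A single regressor x_t with a made-up coefficient beta
x = rng.normal(size=n)
beta = 2.0

# Seasonal term S_t: a fixed pattern repeating every 5 steps and summing to zero
pattern = np.array([1.0, 0.5, 0.0, -0.5, -1.0])
S = np.tile(pattern, n // len(pattern))

# Observation noise e_t ~ N(0, sigma_e^2)
sigma_e = 0.3
e = rng.normal(0.0, sigma_e, size=n)

# Observation equation: Y_t = mu_t + x_t * beta + S_t + e_t
Y = mu + x * beta + S + e
```

Fitting BSTS goes in the opposite direction: given only `Y` and `x`, the model infers the level, the coefficient and the seasonal pattern jointly.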

In [1]:
import yfinance as yf
import matplotlib.pyplot as plt
import pandas as pd
from causalimpact import CausalImpact
from sklearn.preprocessing import MinMaxScaler
from statsmodels.tsa.seasonal import seasonal_decompose
import warnings
warnings.filterwarnings("ignore")

In [9]:
training_start = '2016-1-04'
training_end = '2020-10-23'
treatment_start = '2020-10-26'
treatment_end = '2021-1-22'
end_date = "2021-1-23"

y = ['005930.KS']
df_samsung = yf.download(y, start = training_start, threads = False, end = end_date, interval = '1d')

df_samsung = df_samsung['Adj Close'].rename('Samsung')
df_samsung

In [None]:
df_samsung.plot(lw=2., figsize=(14,6), label='Adj. Close Price')
df_samsung.rolling(30).mean().plot(lw=1.5, label='Roll_30d', c='orange')
df_samsung.rolling(90).mean().plot(lw=1.5, label='Roll_90d', c='salmon')
plt.title('Samsung Electronics Stock Price (2016-2021)')
plt.ylabel('Adj. Close Price')
# Mark the event date (25 October 2020) explicitly instead of using a hard-coded day number
plt.axvline(x=pd.Timestamp('2020-10-25'), color='r')
plt.grid()
plt.legend();

Observations:
- Although the price trended downward during 2019, it rose overall over the past five years.
- The rise became steeper from the beginning of 2021, having been slower until then.

### 2. Causal Impact Analysis
Now that we roughly know that the stock price has been rising sharply since 2021, we would like to understand whether Mr. Lee's death is responsible for this change. Let's find out how much of the stock price increase the event accounts for.
We start the analysis by running the model with only the local trend considered. We then add seasonality and, lastly, exogenous variables that influence Samsung Electronics' business conditions.

#### Step 1. Adding Local Level Only
We start by running the model with only the local trend considered in the computation.

In [None]:
pre_period = [training_start, training_end] 
post_period = [treatment_start, treatment_end] 
impact = CausalImpact(df_samsung, pre_period = pre_period, post_period = post_period)
impact.plot()

The plot has three panels:
- The observed post-event time series together with the fitted model's counterfactual forecast.
- The pointwise causal effect estimated by the model, i.e. the difference between the observed outcome and the predicted outcome.
- The cumulative effect, i.e. the total difference accumulated up to a given point in time.
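The pointwise and cumulative effects are simple arithmetic on the observed and predicted series. The sketch below uses four hypothetical post-period values (not Samsung data) to show how the two quantities relate.

```python
import numpy as np

# Hypothetical post-period values, for illustration only
observed = np.array([10.0, 12.0, 13.0, 15.0])
predicted = np.array([10.0, 11.0, 11.0, 12.0])   # counterfactual forecast

pointwise = observed - predicted     # pointwise effect at each time step
cumulative = np.cumsum(pointwise)    # running total of the pointwise effect

print(pointwise)     # [0. 1. 2. 3.]
print(cumulative)    # [0. 1. 3. 6.]
```

The average effect reported in a summary corresponds to the mean of the pointwise values (1.5 here), while the cumulative effect is the last entry of the running total (6.0 here).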

In [None]:
print(impact.summary())

- When reading the summary, focus on the Average column rather than Cumulative: we are more interested in the average change in the stock price than in the accumulated difference.
- The model estimates that Mr. Lee's death increased the stock price by KRW 12,110 (20.68%) on average, with a posterior tail-area probability of 0.
- Note that this result is not yet reliable, since seasonal and covariate effects are not reflected.

#### Step 2. Adding Seasonal Impact
We have learned that the event appears to be responsible for the price change, but the model is still crude. There are more components we can add to improve it, and seasonality is one of them. Once we identify any seasonality in the series, we add a seasonality parameter and see how the absolute and relative effects in the summary change.

In [None]:
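samsung = seasonal_decompose(df_samsung, period=250)
# Slice the seasonal component by calendar year (the index is a DatetimeIndex),
# which avoids gaps and overlaps from hard-coded row positions
seasonal_2016 = samsung.seasonal.loc['2016']
seasonal_2017 = samsung.seasonal.loc['2017']
seasonal_2018 = samsung.seasonal.loc['2018']
seasonal_2019 = samsung.seasonal.loc['2019']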

fig, ax = plt.subplots(4, figsize=(14,10))
ax[0].plot(seasonal_2016, lw=2., color='royalblue')
ax[0].set_title('Seasonal Signal, 2016')
ax[1].plot(seasonal_2017, lw=2., color='royalblue')
ax[1].set_title('Seasonal Signal, 2017')
ax[2].plot(seasonal_2018, lw=2., color='royalblue')
ax[2].set_title('Seasonal Signal, 2018')
ax[3].plot(seasonal_2019, lw=2., color='royalblue')
ax[3].set_title('Seasonal Signal, 2019');

- Interestingly, the pattern is almost the same every year: the price peaks at the beginning of the year and falls to its lowest point around March. After the tumble it slowly climbs back toward the top by the end of the year, with several ups and downs along the way.
- Given that a very similar shape recurs each year, we are going to add an annual seasonality parameter to the function.
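To show roughly what a seasonal component is, the sketch below performs a simplified additive decomposition by hand on a synthetic series: estimate the trend with a centered moving average, subtract it, and average what remains by position within the period. This is an illustration under simplifying assumptions (odd period, noise-free data) and is not claimed to reproduce `seasonal_decompose` exactly.

```python
import numpy as np
import pandas as pd

period = 5
n = 100
t = np.arange(n)

# Synthetic series: linear trend plus a pattern that repeats every `period` steps
pattern = np.array([2.0, 1.0, 0.0, -1.0, -2.0])
series = pd.Series(0.5 * t + np.tile(pattern, n // period))

# 1. Estimate the trend with a centered moving average over one full period
trend = series.rolling(window=period, center=True).mean()

# 2. Remove the trend, then average the remainder by position within the period
detrended = series - trend
seasonal = detrended.groupby(t % period).mean()

# 3. Center the seasonal component so that it sums to zero over one period
seasonal = seasonal - seasonal.mean()

print(seasonal.round(6).tolist())
```

On this noise-free example the recovered seasonal values match the pattern that was built into the series.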

In [None]:
pre_period = [training_start, training_end] 
post_period = [treatment_start, treatment_end] 
# Note: only niter is passed here, so no seasonal component is specified in this call
impact = CausalImpact(df_samsung, pre_period, post_period, model_args={'niter': 1000})

impact.plot()

In [None]:
print(impact.summary())

#### Step 3. Adding External Impact

Seasonal components did little to improve the counterfactual forecast.
We therefore move on to another set of variables for the model: exogenous variables that influence Samsung Electronics' business conditions.

From domain knowledge, we know that the company's business consists of three divisions: semiconductors, IT/mobile communications and consumer electronics. For semiconductors, we use the Philadelphia Semiconductor Index, which reflects worldwide semiconductor supply and demand. For the divisions that have no general index, we use the stock prices of the company's competitors, which influence Samsung Electronics' stock through sales and marketing strategies, among other things.
The regression coefficients (beta) of the added features will help us understand our data better.

In [None]:
# Seasonal component of Samsung's own series
samsung = seasonal_decompose(df_samsung, period=250)
result_samsung = samsung.seasonal[:]

# Apple (mobile competitor): select the adjusted close by name rather than by column position
df_apple = yf.download(['AAPL'], start = training_start, end = end_date, interval = '1d')['Adj Close']
df_apple.rename('Apple', inplace = True)
apple = seasonal_decompose(df_apple, period=250)
result_apple = apple.seasonal[:]

# Philadelphia Semiconductor Index (semiconductor supply and demand)
df_philly = yf.download(['^SOX'], start = training_start, end = end_date, interval = '1d')['Adj Close']
df_philly.rename('Phil.Semiconductor', inplace = True)
philly = seasonal_decompose(df_philly, period=250)
result_philly = philly.seasonal[:]

# Sony (consumer electronics competitor)
df_sony = yf.download(['SONY'], start = training_start, end = end_date, interval = '1d')['Adj Close']
df_sony.rename('Sony', inplace = True)
sony = seasonal_decompose(df_sony, period=250)
result_sony = sony.seasonal[:]

fig, ax = plt.subplots(4, figsize=(15,12))
ax[0].plot(result_samsung, lw=2.)
ax[0].set_title('Samsung Seasonal Signal')
ax[1].plot(result_apple, lw=2., color='salmon')
ax[1].set_title('Apple Seasonal Signal', pad = 1)
ax[2].plot(result_philly, lw=2., color='orange')
ax[2].set_title('Philadelphia Semiconductor Index Seasonal Signal',pad = 1)
ax[3].plot(result_sony, lw=2., color = 'navy')
ax[3].set_title('Sony Seasonal Signal', pad = 1);

- Samsung Electronics' seasonal pattern resembles the others in that they all fall in the 1st quarter.
- The Philadelphia Semiconductor Index in particular follows a very similar pattern throughout the year: after a large drop in the 1st quarter, the price moves to its peak at the end of the year.
- Sony has the 2nd most similar pattern. The small ups and downs between the 2nd and 4th quarters are shared with the Philadelphia index. This is because Sony, like Samsung Electronics, also has a semiconductor business.
- Apple has the seasonality least similar to Samsung Electronics'. All series fall in the 1st quarter and then bounce back, but Apple recovers faster than Samsung and falls again in the 4th quarter.

In [None]:
# 2. Adding additional stocks in the semiconductor, mobile industry for more precise forecast
# stocks = ['TSM', 'MU', 'INTC']
# df_stocks = yf.download(stocks, start = training_start, end = end_date, interval = '1d')
# df_stocks = df_stocks.iloc[:,:3]
# df_stocks.columns = df_stocks.columns.droplevel()

df_total = pd.concat([df_samsung, df_apple, df_philly, df_sony], axis = 1)
# df_total.rename(columns ={'INTC': 'Intel', 'TSM':'TSMC', 'MU': 'Micron Tech.'}, inplace = True)
df_total.dropna(inplace = True)
df_total

In [None]:
# checking for correlation

df_total_corr = df_total[df_total.index <= training_end]
df_total_corr.dropna().corr()

- To improve the counterfactual forecast, we would like to use only highly correlated stocks.
- All series have a Pearson correlation coefficient above 0.8 with Samsung Electronics, so their prices are strongly correlated with its share price. A high correlation coefficient indicates a strong linear association, which is what allows us to use these series as covariates in the model.
- However, we need to rescale the variables, since each series has a different scale and is quoted in a different currency. We use MinMaxScaler for this.
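As a small illustration of the two ideas above, the sketch below builds two hypothetical price series on very different scales, applies min-max scaling by hand with the formula `(x - min) / (max - min)`, and checks that the Pearson correlation is unchanged by the rescaling. The column names and numbers are invented for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
base = np.cumsum(rng.normal(size=300))   # shared underlying movement

# Two hypothetical series on very different scales (e.g. KRW vs. USD prices)
df = pd.DataFrame({
    "stock_krw": 60000 + 1000 * base + rng.normal(scale=200, size=300),
    "stock_usd": 100 + 2 * base + rng.normal(scale=0.5, size=300),
})

# Min-max scaling maps each column to the range [0, 1]
scaled = (df - df.min()) / (df.max() - df.min())

corr_raw = df.corr().loc["stock_krw", "stock_usd"]
corr_scaled = scaled.corr().loc["stock_krw", "stock_usd"]

print(round(corr_raw, 4), round(corr_scaled, 4))
```

Scaling changes the units of each column but not the strength of the linear association, so correlations can be inspected either before or after scaling.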

In [None]:
scaler = MinMaxScaler()
scaler.fit(df_total)
df_total_scaled = scaler.transform(df_total)
df_total_scaled_pd = pd.DataFrame(df_total_scaled, columns = df_total.columns, index = df_total.index)
df_total_scaled_pd

In [25]:
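pre_period = [training_start, training_end] 
post_period = [treatment_start, treatment_end] 
# model_args must be a Python dict; list(niter = ...) is R syntax and raises a TypeError in Python.
# Which keys are accepted depends on the installed causalimpact package.
impact = CausalImpact(df_total_scaled_pd, pre_period = pre_period, post_period = post_period, model_args={'niter': 1000, 'nseasons': 52})
impact.plot()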

In [515]:
print(impact.summary())

Posterior Inference {Causal Impact}
                          Average            Cumulative
Actual                    0.72               43.05
Prediction (s.d.)         0.65 (0.03)        39.07 (1.76)
95% CI                    [0.6, 0.71]        [35.7, 42.59]

Absolute effect (s.d.)    0.07 (0.03)        3.98 (1.76)
95% CI                    [0.01, 0.12]       [0.46, 7.34]

Relative effect (s.d.)    10.18% (4.49%)     10.18% (4.49%)
95% CI                    [1.17%, 18.79%]    [1.17%, 18.79%]

Posterior tail-area probability p: 0.01
Posterior prob. of a causal effect: 99.0%

For more details run the command: print(impact.summary('report'))


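1. Because the values are scaled, the absolute amounts are hard to interpret.
2. However, the event clearly had a positive effect: the posterior tail-area probability is 0.01, which is statistically significant. The null hypothesis that the event had no effect is rejected, and the 95% interval of the relative effect (1.17% to 18.79%) is entirely positive.
3. The surge is attributed to experts' predictions that Chairman Lee's successor would increase dividends after his death.
4. The stock price appears to have risen by 10.18% on average.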
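As a rough check on how the rows of the summary table relate to each other, the sketch below recomputes the absolute and relative effects from the cumulative Actual and Prediction figures printed above. It assumes the relative effect is the absolute effect divided by the prediction. Because the table values are rounded, the result only approximately matches the reported 10.18%.

```python
# Cumulative figures copied from the summary table above
actual = 43.05
predicted = 39.07

absolute_effect = actual - predicted            # about 3.98
relative_effect = absolute_effect / predicted   # roughly 10%

print(f"absolute effect: {absolute_effect:.2f}")
print(f"relative effect: {relative_effect:.2%}")
```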

### 3. Conclusion

The price has been increasing at an outstanding pace.

### Reference
https://multithreaded.stitchfix.com/blog/2016/04/21/forget-arima/