Firstly, using the **yfinance** library in **Python**, we gather the daily historical data, i.e. the *close, high, low, open* and *volume*, for **Samsung Electronics**, **Apple Inc.**, **Lenovo Group Ltd** and **Dell Inc.** from 15 June 2020 to 15 June 2025.

In [None]:
# Importing and installing the necessary libraries
!pip install yfinance
import yfinance as yf
import pandas as pd


start_date = '2020-06-15' # Setting the starting date
end_date = '2025-06-15' # Setting the ending date


# Storing the historical stock data for Samsung in a Pandas dataframe
ticker_symbol = '005930.KS'
samsung_data = yf.download(ticker_symbol, start=start_date, end=end_date)
samsung_df = pd.DataFrame(samsung_data)


# Doing the same for Apple
ticker_symbol = 'AAPL'
apple_data = yf.download(ticker_symbol, start=start_date, end=end_date)
apple_df = pd.DataFrame(apple_data)


# The Same for Lenovo
ticker_symbol = '0992.HK'
lenovo_data = yf.download(ticker_symbol, start=start_date, end=end_date)
lenovo_df = pd.DataFrame(lenovo_data)


# The same for Dell
ticker_symbol = 'DELL'
dell_data = yf.download(ticker_symbol, start=start_date, end=end_date)
dell_df = pd.DataFrame(dell_data)





Now we display the gathered data. (Only the first five entries are being shown in each case.)

In [None]:
samsung_data.head()

Price,Close,High,Low,Open,Volume
Ticker,005930.KS,005930.KS,005930.KS,005930.KS,005930.KS
Date,,,,,
2020-06-15,43901.445312,45749.001127,43901.445312,45221.128037,28772921
2020-06-16,45836.980469,45836.980469,44517.29773,45045.170825,21808375
2020-06-17,45924.957031,46540.808945,45133.147427,45836.978186,26672595
2020-06-18,46012.941406,46012.941406,45397.089418,45924.962551,15982926
2020-06-19,46540.808594,46540.808594,45397.083619,46276.872061,18157985


In [None]:
apple_data.head()

Price,Close,High,Low,Open,Volume
Ticker,AAPL,AAPL,AAPL,AAPL,AAPL
Date,,,,,
2020-06-15,83.352501,84.006218,80.82269,80.985515,138808800
2020-06-16,85.561508,85.833694,83.772905,85.410839,165428800
2020-06-17,85.442429,86.368324,85.32092,86.307569,114406400
2020-06-18,85.476479,85.894469,84.866503,85.398711,96820400
2020-06-19,84.987999,86.650236,83.877408,86.183647,264476000


In [None]:
lenovo_data.head()

Price,Close,High,Low,Open,Volume
Ticker,0992.HK,0992.HK,0992.HK,0992.HK,0992.HK
Date,,,,,
2020-06-15,3.235703,3.338549,3.21988,3.314815,40967948
2020-06-16,3.306904,3.322726,3.267348,3.28317,35385603
2020-06-17,3.362283,3.378105,3.306904,3.322726,24512233
2020-06-18,3.378105,3.401839,3.330638,3.338549,20244291
2020-06-19,3.386016,3.40975,3.354371,3.362282,30707019


In [None]:
dell_data.head()

Price,Close,High,Low,Open,Volume
Ticker,DELL,DELL,DELL,DELL,DELL
Date,,,,,
2020-06-15,22.033237,22.197106,21.359038,21.443313,4336259
2020-06-16,22.323519,22.801077,22.066012,22.716802,3159957
2020-06-17,22.248611,22.763625,22.225201,22.36566,4104235
2020-06-18,22.201794,22.257976,21.878739,22.117518,4487391
2020-06-19,22.904081,23.184997,22.501434,22.632527,6059478


For our analysis, we shall use the opening prices of the four stocks. The common dates in each of the four dataframes are identified, and a new dataframe is created consisting of the common dates and the respective opening prices of the four stocks.
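As a minimal sketch of the alignment step described above, the toy example below builds three opening-price series on slightly different trading calendars and keeps only the dates they share. The series names and values are made up for illustration. It also shows an inner join via `pd.concat`, which is one possible alternative to chaining `Index.intersection` and is not the approach used in this notebook.

```python
import pandas as pd

# Toy opening-price series on different trading calendars (illustrative values only)
a = pd.Series([10.0, 11.0, 12.0],
              index=pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04']),
              name='A_Open')
b = pd.Series([20.0, 22.0],
              index=pd.to_datetime(['2024-01-03', '2024-01-04']),
              name='B_Open')
c = pd.Series([5.0, 6.0, 7.0],
              index=pd.to_datetime(['2024-01-02', '2024-01-04', '2024-01-05']),
              name='C_Open')

# Route 1: explicit intersection of the date indices
common = a.index.intersection(b.index).intersection(c.index)
by_intersection = pd.DataFrame({
    'A_Open': a.loc[common],
    'B_Open': b.loc[common],
    'C_Open': c.loc[common],
})

# Route 2: an inner join keeps only the dates present in every series
by_join = pd.concat([a, b, c], axis=1, join='inner')

print(by_intersection)
print(by_join)
```

Both routes keep only 2024-01-04, the single date on which all three toy series have an observation.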

In [None]:
# Finding common dates using the index (which is the date)
common_dates = samsung_df.index.intersection(apple_df.index).intersection(lenovo_df.index).intersection(dell_df.index)


# Filtering the dataframes to keep only the common dates and the 'Open' prices (xs drops the 'Price' level and leaves the 'Ticker' level of the column index)
samsung_common = samsung_df.loc[common_dates].xs('Open', level='Price', axis=1)
apple_common = apple_df.loc[common_dates].xs('Open', level='Price', axis=1)
lenovo_common = lenovo_df.loc[common_dates].xs('Open', level='Price', axis=1)
dell_common = dell_df.loc[common_dates].xs('Open', level='Price', axis=1)


# Creating the new dataframe with common dates and 'Open' prices
common_opens_df = pd.DataFrame({
    'Samsung_Open': samsung_common['005930.KS'],
    'Apple_Open': apple_common['AAPL'],
    'Lenovo_Open': lenovo_common['0992.HK'],
    'Dell_Open': dell_common['DELL']
})


# Display the new dataframe
common_opens_df

Date,Samsung_Open,Apple_Open,Lenovo_Open,Dell_Open
2020-06-15,45221.124014,80.985507,3.314815,21.443309
2020-06-16,45045.170825,85.410839,3.283170,22.716804
2020-06-17,45836.974288,86.307562,3.322726,22.365657
2020-06-18,45924.954753,85.398704,3.338549,22.117516
2020-06-19,46276.875945,86.183655,3.362282,22.632525
...,...,...,...,...
2025-06-09,60400.000000,204.389999,9.150000,114.510002
2025-06-10,60000.000000,200.600006,9.240000,114.800003
2025-06-11,59500.000000,203.500000,9.240000,114.400002
2025-06-12,59700.000000,199.080002,9.320000,112.019997


Now that we have the open prices of each of the four stocks at $T$ distinct times, we may calculate the rates of return for any particular stock as:
$$r_t=\frac{P_t-P_{t-1}}{P_{t-1}},\ \ \ t=1,2,\dots,T$$
where $P_t$ is the observed open price on the $t$-th point of observation and $r_t$ is the corresponding rate of return. (Note that in this case, $T=1143$.)
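To make the formula concrete, here is a small sketch on a made-up three-day price series (the values are illustrative, not market data). It computes $r_t$ directly from the definition, compares it with the pandas `pct_change()` method, and also shows the forward-looking variant built with `shift(-1)`, which yields the same numbers but labels each return with the earlier of its two dates.

```python
import pandas as pd

# Hypothetical open prices on three consecutive trading days
prices = pd.Series(
    [100.0, 102.0, 99.96],
    index=pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04']),
)

# r_t = (P_t - P_{t-1}) / P_{t-1}, labelled with date t (first entry is NaN)
manual = (prices - prices.shift(1)) / prices.shift(1)

# pandas shortcut for the same quantity
builtin = prices.pct_change()

# Forward version: (P_{t+1} - P_t) / P_t, labelled with date t (last entry is NaN)
forward = (prices.shift(-1) - prices) / prices

print(pd.DataFrame({'manual': manual, 'pct_change': builtin, 'forward': forward}))
```

The `manual` and `forward` columns contain the same two returns (about 0.02 and -0.02), offset by one row, so either labelling gives the same sample of returns once the `NaN` row is dropped.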


In [None]:
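# Compute the rate of return for each stock; shift(-1) pairs each open with the next one, so each return is labelled with the earlier of its two dates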
samsung_returns = (common_opens_df['Samsung_Open'].shift(-1) - common_opens_df['Samsung_Open']) / common_opens_df['Samsung_Open']
apple_returns = (common_opens_df['Apple_Open'].shift(-1) - common_opens_df['Apple_Open']) / common_opens_df['Apple_Open']
lenovo_returns = (common_opens_df['Lenovo_Open'].shift(-1) - common_opens_df['Lenovo_Open']) / common_opens_df['Lenovo_Open']
dell_returns = (common_opens_df['Dell_Open'].shift(-1) - common_opens_df['Dell_Open']) / common_opens_df['Dell_Open']


# Create a new dataframe for the rates of return
returns_df = pd.DataFrame({
    'Samsung_Return': samsung_returns,
    'Apple_Return': apple_returns,
    'Lenovo_Return': lenovo_returns,
    'Dell_Return': dell_returns
})


# Display the returns dataframe
returns_df

Date,Samsung_Return,Apple_Return,Lenovo_Return,Dell_Return
2020-06-15,-0.003891,0.054643,-9.546583e-03,0.059389
2020-06-16,0.017578,0.010499,1.204805e-02,-0.015458
2020-06-17,0.001919,-0.010530,4.761948e-03,-0.011095
2020-06-18,0.007663,0.009192,7.108987e-03,0.023285
2020-06-19,-0.011407,-0.009305,4.396147e-08,0.005586
...,...,...,...,...
2025-06-09,-0.006623,-0.018543,9.836083e-03,0.002533
2025-06-10,-0.008333,0.014457,0.000000e+00,-0.003484
2025-06-11,0.003361,-0.021720,8.658001e-03,-0.020804
2025-06-12,0.008375,0.003265,-7.510697e-03,-0.005981


Having ascertained the historical rates of return for each of the four stocks at the same timestamps, we now proceed to compute the covariance matrix
$$\Sigma=\begin{pmatrix}\mathrm{cov}(r_{sam},r_{sam})&\mathrm{cov}(r_{sam},r_{app})&\mathrm{cov}(r_{sam},r_{len})&\mathrm{cov}(r_{sam},r_{dell})\\\mathrm{cov}(r_{app},r_{sam})&\mathrm{cov}(r_{app},r_{app})&\mathrm{cov}(r_{app},r_{len})&\mathrm{cov}(r_{app},r_{dell})\\\mathrm{cov}(r_{len},r_{sam})&\mathrm{cov}(r_{len},r_{app})&\mathrm{cov}(r_{len},r_{len})&\mathrm{cov}(r_{len},r_{dell})\\\mathrm{cov}(r_{dell},r_{sam})&\mathrm{cov}(r_{dell},r_{app})&\mathrm{cov}(r_{dell},r_{len})&\mathrm{cov}(r_{dell},r_{dell})\end{pmatrix}$$
But how does one compute these covariances? For random variables $r_i$ and $r_j$ whose historical values are known to us at $T$ distinct timestamps, we can use the sample estimate
$$\mathrm{cov}(r_i,r_j)\approx\frac{1}{T-1}\sum_{t=1}^T\left(r_{i_t}-\bar{r_i}\right)\left(r_{j_t}-\bar{r_j}\right)$$
where $r_{i_t}$ and $r_{j_t}$ are the observed values of $r_i$ and $r_j$ at timestamp $t$ respectively. Moreover, $\bar{r_i}$ and $\bar{r_j}$ are the means of the historical values of $r_i$ and $r_j$ respectively, i.e.
$$\bar{r_i}=\frac{1}{T}\sum_{t=1}^Tr_{i_t},\ \ \ \bar{r_j}=\frac{1}{T}\sum_{t=1}^Tr_{j_t}$$

The Pandas library has a **.cov()** method that does this.
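As a quick check of the estimator above, the sketch below applies the $\frac{1}{T-1}$ formula by hand to a tiny made-up returns table and compares the result with `DataFrame.cov()`, which uses the same $T-1$ denominator by default. The column names `x` and `y`, the helper `sample_cov` and the numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up returns for two assets at T = 4 timestamps
returns = pd.DataFrame({
    'x': [0.01, -0.02, 0.03, 0.00],
    'y': [0.02, -0.01, 0.01, -0.02],
})

def sample_cov(ri, rj):
    """Sample covariance with the 1/(T-1) normalisation."""
    T = len(ri)
    return ((ri - ri.mean()) * (rj - rj.mean())).sum() / (T - 1)

# Build the full matrix entry by entry from the formula
cols = returns.columns
manual = np.array([[sample_cov(returns[a], returns[b]) for b in cols] for a in cols])

print(manual)
print(returns.cov().values)
```

The two printed matrices agree up to floating-point rounding, and both are symmetric, with the variances of `x` and `y` on the diagonal.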

In [None]:
# Removing the last row as it contains NaN values due to the shift operation
returns_df_cleaned = returns_df.dropna()


# Calculating the covariance matrix using the .cov() function
covariance_matrix_df = returns_df_cleaned.cov()
covariance_matrix = covariance_matrix_df.values


#Displaying the covariance matrix
covariance_matrix

array([[3.14641659e-04, 7.88388792e-05, 1.24460292e-04, 1.56860365e-04],
       [7.88388792e-05, 4.21207175e-04, 1.07061042e-04, 2.31480755e-04],
       [1.24460292e-04, 1.07061042e-04, 8.84668944e-04, 2.20631415e-04],
       [1.56860365e-04, 2.31480755e-04, 2.20631415e-04, 9.79158523e-04]])

Therefore, we have:
$$\Sigma=\begin{pmatrix}0.000315&0.000079&0.000124&0.000157\\0.000079&0.000421&0.000107&0.000231\\0.000124&0.000107&0.000885&0.000221\\0.000157&0.000231&0.000221&0.000979\end{pmatrix}$$

Our next task is to compute $m=\begin{pmatrix}\mathbb{E}[r_{sam}]\\\mathbb{E}[r_{app}]\\\mathbb{E}[r_{len}]\\\mathbb{E}[r_{dell}]\end{pmatrix}=\begin{pmatrix}\mu_{sam}\\\mu_{app}\\\mu_{len}\\\mu_{dell}\end{pmatrix}$. For that we will be using the CAPM. Firstly, for each of the four stocks we need an appropriate risk-free rate.

1.   For Samsung (005930.KS), we use the South Korea 10-Year Government Bond Yield, which is around 2.82%.
2.   For Apple (AAPL) and Dell (DELL), we use the US 10-Year Treasury rate, which is around 4.30%.
3.   For Lenovo (0992.HK), we use the Hong Kong 10-Year Government Bond Yield, which is around 2.98%.

Therefore, $r_{f_{sam}}=0.0282,r_{f_{app}}=r_{f_{dell}}=0.043$ and $r_{f_{len}}=0.0298$.

In [None]:
# Initialising the risk-free rates

r_f_sam = 0.0282
r_f_app = 0.0430
r_f_len = 0.0298
r_f_dell = 0.0430
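The yields above are annual figures, whereas the stock returns computed earlier are daily. If the two are to be combined in the CAPM, they need to be on the same time scale. The sketch below shows one possible conversion, assuming 252 trading days per year (the same figure this notebook uses when annualising market returns). The helper `annual_to_daily` and its `compound` flag are hypothetical names introduced here for illustration, not part of the original analysis.

```python
def annual_to_daily(annual_rate, periods_per_year=252, compound=False):
    """Convert an annual rate to a per-period rate.

    compound=False: simple division, the inverse of multiplying by 252.
    compound=True:  geometric conversion, so that (1 + daily) ** 252 == 1 + annual.
    """
    if compound:
        return (1.0 + annual_rate) ** (1.0 / periods_per_year) - 1.0
    return annual_rate / periods_per_year

# Annual risk-free rates as initialised above
annual_rates = {
    'r_f_sam': 0.0282,
    'r_f_app': 0.0430,
    'r_f_len': 0.0298,
    'r_f_dell': 0.0430,
}

# Daily equivalents under the simple (non-compounded) convention
daily_rates = {name: annual_to_daily(rate) for name, rate in annual_rates.items()}

for name, rate in daily_rates.items():
    print(f"{name}: {rate:.6f}")
```

Simple division is consistent with annualising a daily mean by multiplying by 252; the compounded variant gives slightly smaller daily rates, and the difference is negligible at these yield levels.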

Now, to compute $\mu_M$ for each stock, we use the **historical average return** method. The core principle of this method is as follows:


> This is the most common and straightforward approach. You calculate the average historical returns of a broad market index (like the S&P 500 in the USA, FTSE 100 in the UK, Nifty 50 in India or a global index for a worldwide market) over a long period (e.g. 20, 50 or even 100+ years). If you have historical daily, monthly, or annual returns, you would simply average them. For example, using arithmetic mean: $$\mathbb{E}[r_M]=N_T\cdot\frac{1}{T}\sum_{t=1}^Tr_{M,t}$$ where $r_{M,t}$ is the market return in period $t$ and $T$ is the total number of periods. ($N_T$ is the total number of tr

Now we need to identify the appropriate market index for the primary exchange of each of the stocks:



1.   For Samsung, the primary exchange is KRX, and hence the relevant market index is the KOSPI Composite Index.
2.   For Apple, the primary exchange is NASDAQ, and hence the relevant market index is the S&P 500.
3.   For Lenovo, the primary exchange is HKEX, and hence the relevant market index is the Hang Seng Index.
4.   For Dell, the primary exchange is NYSE, and hence the relevant market index is the S&P 500.


We can now calculate the $\mu_M$ for each of the stocks.

In [None]:
import yfinance as yf
import pandas as pd

# Define time period
start_date = '2020-06-15'
end_date = '2025-06-15'

# Define the tickers for market indices
market_indices = {
    'Samsung': '^KS11',  # KOSPI
    'Apple': '^GSPC',    # S&P 500
    'Lenovo': '^HSI',    # Hang Seng Index
    'Dell': '^GSPC'      # S&P 500 again
}

# Download market index data
kospi_df = yf.download(market_indices['Samsung'], start=start_date, end=end_date)
sp500_df = yf.download(market_indices['Apple'], start=start_date, end=end_date)
hsi_df = yf.download(market_indices['Lenovo'], start=start_date, end=end_date)

# Compute daily returns for each index
kospi_returns = kospi_df['Close'].pct_change().dropna()
sp500_returns = sp500_df['Close'].pct_change().dropna()
hsi_returns = hsi_df['Close'].pct_change().dropna()

# Compute arithmetic mean (i.e., historical average return)
mu_M_samsung = kospi_returns.mean()
mu_M_apple = sp500_returns.mean()
mu_M_lenovo = hsi_returns.mean()
mu_M_dell = sp500_returns.mean()  # same as Apple

# Optional: Annualize the returns (assuming 252 trading days/year)
mu_M_samsung_annual = mu_M_samsung * 252
mu_M_apple_annual = mu_M_apple * 252
mu_M_lenovo_annual = mu_M_lenovo * 252
mu_M_dell_annual = mu_M_dell * 252

# Display the results
print("Daily average market returns (mu_M):")
print(mu_M_samsung)
print(mu_M_apple)
print(mu_M_lenovo)
print(mu_M_dell)

print("\nAnnualized average market returns (mu_M * 252):")
print(mu_M_samsung_annual)
print(mu_M_apple_annual)
print(mu_M_lenovo_annual)
print(mu_M_dell_annual)


Daily average market returns (mu_M):
Ticker
^KS11    0.000355
dtype: float64
Ticker
^GSPC    0.000592
dtype: float64
Ticker
^HSI    0.000131
dtype: float64
Ticker
^GSPC    0.000592
dtype: float64

Annualized average market returns (mu_M * 252):
Ticker
^KS11    0.089418
dtype: float64
Ticker
^GSPC    0.149276
dtype: float64
Ticker
^HSI    0.033108
dtype: float64
Ticker
^GSPC    0.149276
dtype: float64





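Therefore, the daily average market returns are $\begin{pmatrix}\mu_{M,sam}\\\mu_{M,app}\\\mu_{M,len}\\\mu_{M,dell}\end{pmatrix}=\begin{pmatrix}0.000355\\0.000592\\0.000131\\0.000592\end{pmatrix}$. These are the market inputs to the CAPM, not yet the vector $m$ of expected stock returns.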

Now, using the calculated market returns, we can proceed to compute the beta ($\beta$) for each stock using the following formula:
$$\beta_i = \frac{\mathrm{cov}(r_i, r_M)}{\mathrm{Var}(r_M)}$$
where $r_i$ is the return of the individual stock, $r_M$ is the return of the market index, $\mathrm{cov}(r_i, r_M)$ is the covariance between the stock and market returns, and $\mathrm{Var}(r_M)$ is the variance of the market returns.
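The beta formula can be sketched on toy data as follows. The helper names `estimate_beta` and `capm_expected_return` are hypothetical, and the numbers are made up so that the true slope is known in advance. The second helper applies the standard CAPM relation $\mathbb{E}[r_i]=r_f+\beta_i\left(\mathbb{E}[r_M]-r_f\right)$, under the assumption that $r_f$ and $\mathbb{E}[r_M]$ are expressed over the same period (for example, both daily).

```python
import pandas as pd

def estimate_beta(stock_returns, market_returns):
    """beta = cov(r_i, r_M) / Var(r_M), using only dates present in both series."""
    aligned = pd.concat([stock_returns, market_returns], axis=1, join='inner').dropna()
    cov = aligned.cov()  # 2x2 sample covariance matrix (1/(T-1) normalisation)
    return cov.iloc[0, 1] / cov.iloc[1, 1]

def capm_expected_return(r_f, beta, mu_M):
    """Standard CAPM relation: r_f + beta * (mu_M - r_f)."""
    return r_f + beta * (mu_M - r_f)

# Toy market returns, and a stock built to have slope 2 against the market
market = pd.Series([0.010, -0.020, 0.015, 0.005, -0.010], name='market')
stock = (2.0 * market + 0.001).rename('stock')

beta = estimate_beta(stock, market)
mu_i = capm_expected_return(r_f=0.0001, beta=beta, mu_M=0.0005)

print(beta)
print(mu_i)
```

Because the toy stock is an exact linear function of the market, the estimated beta recovers the slope of 2 up to rounding. With real data, the inner join matters: stock and index series may have different trading calendars, so only shared dates should enter the covariance.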