# **Project Idea:**
The main project idea is to carry out an **Analysis of Stock Prices** for some of the largest banks in **Mainland China** listed on the **Hong Kong Stock Exchange (HKEX)**. Using the available historical price data, we aim to explore trading strategies, fit statistical models, and tell a story around the data. We also want to learn more about market volatility and, consequently, how accurate predictive models built on this historical data can be.

# **Historical Data Collection:**
Yahoo! Finance, owned by Verizon Media at the time of writing, is part of the Yahoo! network. It provides financial news and data such as stock price listings, reports, and press releases.
<br/> However, Yahoo! Finance has long since decommissioned its historical data API, so we use the Python package ```yfinance``` to download historical data for the relevant stocks listed on the HKEX.
On Google Colaboratory, we install the package with ```!pip install yfinance```:




In [1]:
!pip install yfinance

Collecting yfinance
  Downloading https://files.pythonhosted.org/packages/a7/ee/315752b9ef281ba83c62aa7ec2e2074f85223da6e7e74efb4d3e11c0f510/yfinance-0.1.59.tar.gz
Collecting lxml>=4.5.1
  Downloading https://files.pythonhosted.org/packages/30/c0/d0526314971fc661b083ab135747dc68446a3022686da8c16d25fcf6ef07/lxml-4.6.3-cp37-cp37m-manylinux2014_x86_64.whl (6.3MB)
Building wheels for collected packages: yfinance
  Building wheel for yfinance (setup.py) ... done
  Created wheel for yfinance: filename=yfinance-0.1.59-py2.py3-none-any.whl size=23455 sha256=8907ccfe6c4d8842c7583e8b9f34894f4ef3dc604affbedae21f27910d927d9e
  Stored in directory: /root/.cache/pip/wheels/f8/2a/0f/4b5a86e1d52e451757eb6bc17fd899629f0925c777741b6d04
Successfully built yfinance
Installing collected packages: lxml, yfinance
  Found existing installation: lxml 4.2.6
    Uninstalling lxml-4.2.6:
      Successfully uninstalled lxml-4.2.6
Successfu

## **Importing the Package:**
Now we import the package and create ```Ticker``` objects for the stocks we want to analyse in depth.


In [2]:
import yfinance as yf

## **Creating the Ticker Objects:**
We now create a ```yf.Ticker``` object for each bank below to access its data on the HKEX. We look in depth at the following institutions:
<br/>
<br/>1. **Industrial and Commercial Bank of China (ICBC)**  [1398.HK]
<br/>2. **China Construction Bank (CCB)** [0939.HK]
<br/>3. **Agricultural Bank of China (ABC)** [1288.HK]
<br/>4. **Bank of China (BOC)** [3988.HK]
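<br/>5. **China Development Bank (CDB)** [1062.HK] (note: China Development Bank itself is not publicly listed; this ticker is China Development Bank International Investment Ltd)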
<br/>6. **Postal Savings Bank of China (PSBC)** [1658.HK]
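<br/>7. **Bank of Communications (BOCom)** [3329.HK] (note: this ticker is BOCOM International Holdings; Bank of Communications' own H-shares trade as 3328.HK)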
<br/>8. **China Merchants Bank (CMB)** [3968.HK]
<br/>

In [3]:
icbc = yf.Ticker('1398.HK')
ccb = yf.Ticker('0939.HK')
abc = yf.Ticker('1288.HK')
boc = yf.Ticker('3988.HK')
cdb = yf.Ticker('1062.HK')
psbc = yf.Ticker('1658.HK')
bocom = yf.Ticker('3329.HK')
cmb = yf.Ticker('3968.HK')

## **Getting the Historical Data:**
Using our 8 ```Ticker``` objects, we download the past 10 years of daily historical data for the 8 banks above with ```history(period = '10y')```.



In [4]:
icbc_data = icbc.history(period = '10y')
ccb_data = ccb.history(period = '10y')
abc_data = abc.history(period = '10y')
boc_data = boc.history(period = '10y')
cdb_data = cdb.history(period = '10y')
psbc_data = psbc.history(period = '10y')
bocom_data = bocom.history(period = '10y')
cmb_data = cmb.history(period = '10y')
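The eight near-identical lines above can also be written as one loop over a dictionary of symbols. This is a minimal sketch, not the notebook's method: `SYMBOLS` and `download_histories` are hypothetical names, and the download function is passed in as a parameter so the loop itself can run without network access.

```python
from typing import Callable, Dict

import pandas as pd

# Ticker symbols used in this notebook, keyed by the short names used above.
SYMBOLS: Dict[str, str] = {
    "icbc": "1398.HK",
    "ccb": "0939.HK",
    "abc": "1288.HK",
    "boc": "3988.HK",
    "cdb": "1062.HK",
    "psbc": "1658.HK",
    "bocom": "3329.HK",
    "cmb": "3968.HK",
}


def download_histories(
    symbols: Dict[str, str],
    fetch: Callable[[str], pd.DataFrame],
) -> Dict[str, pd.DataFrame]:
    """Call `fetch` once per symbol and return the results keyed by short name.

    `fetch` is injected (hypothetical design choice) so the loop can be tested
    offline; in the notebook it would wrap yfinance, as in the usage note.
    """
    return {name: fetch(symbol) for name, symbol in symbols.items()}
```

In the notebook this would be called as `histories = download_histories(SYMBOLS, lambda s: yf.Ticker(s).history(period='10y'))`, after which `histories['icbc']` plays the role of `icbc_data`.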

## **Gauging the Ticker Data:**
We inspect each downloaded DataFrame with ```info()``` to check its date range, number of rows, and column types.

In [5]:
icbc_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-13 to 2021-06-11
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   float64
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 154.1 KB


In [6]:
ccb_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-13 to 2021-06-11
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   float64
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 154.1 KB


In [7]:
abc_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-13 to 2021-06-11
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   float64
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 154.1 KB


In [8]:
boc_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-13 to 2021-06-11
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   float64
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 154.1 KB


In [9]:
cdb_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-13 to 2021-06-11
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   int64  
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(4), int64(3)
memory usage: 154.1 KB


In [10]:
psbc_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1161 entries, 2016-09-28 to 2021-06-11
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          1161 non-null   float64
 1   High          1161 non-null   float64
 2   Low           1161 non-null   float64
 3   Close         1161 non-null   float64
 4   Volume        1161 non-null   int64  
 5   Dividends     1161 non-null   float64
 6   Stock Splits  1161 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 72.6 KB


In [11]:
bocom_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1005 entries, 2017-05-19 to 2021-06-11
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          1005 non-null   float64
 1   High          1005 non-null   float64
 2   Low           1005 non-null   float64
 3   Close         1005 non-null   float64
 4   Volume        1005 non-null   int64  
 5   Dividends     1005 non-null   float64
 6   Stock Splits  1005 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 62.8 KB


In [12]:
cmb_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-13 to 2021-06-11
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   float64
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 154.1 KB


## **Takeaway from Ticker info() Data and further Data Cleaning:**
We can conclude from the ```info()``` output and its ```DatetimeIndex``` that a complete 10-year history (2011-06-13 to 2021-06-11) contains **2465 trading days**.
<br/>However, not all of the banks have 10 years of history on ```yfinance```: PSBC has only 1161 rows (from 2016-09-28) and BOCom only 1005 rows (from 2017-05-19). So we keep for further analysis only the banks that provide the full 10 years, i.e. 2465 trading days.
<br/>Hence, after discarding the banks with insufficient data, the list is:
<br/>
<br/>1. **Industrial and Commercial Bank of China (ICBC)**  [1398.HK]
<br/>2. **China Construction Bank (CCB)** [0939.HK]
<br/>3. **Agricultural Bank of China (ABC)** [1288.HK]
<br/>4. **Bank of China (BOC)** [3988.HK]
<br/>5. **China Development Bank (CDB)** [1062.HK]
<br/>6. **China Merchants Bank (CMB)** [3968.HK]
<br/>
<br/>
So, our analysis is now reduced to 6 banks.
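The selection rule above can be sketched in code, assuming the histories are held in a dict keyed by short name. `coverage_summary` and `full_history_names` are hypothetical helpers, not part of yfinance or pandas. The second one keeps the tickers whose row count equals the longest history, which is the 2465-day criterion used here.

```python
from typing import Dict, List

import pandas as pd


def coverage_summary(frames: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """One row per ticker: number of rows and first/last date in its index."""
    return pd.DataFrame(
        {
            "rows": {name: len(df) for name, df in frames.items()},
            "first": {name: df.index.min() for name, df in frames.items()},
            "last": {name: df.index.max() for name, df in frames.items()},
        }
    )


def full_history_names(frames: Dict[str, pd.DataFrame]) -> List[str]:
    """Names whose row count equals the longest history in `frames`."""
    longest = max(len(df) for df in frames.values())
    return [name for name, df in frames.items() if len(df) == longest]
```

Equal row counts do not strictly guarantee identical dates; if that matters, comparing each index against a reference with `Index.equals` is a stricter check.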


## Naive Expectation of Prices

We consider the following simple model for the expected price $X$ on a given day, namely the equal-weight average of that day's four prices, where:
<br/>
$Open = O$ <br/>
$Close = C$ <br/>
$High = H$ <br/>
$Low = L$ <br/>
$\mathbb{E}(X) = \frac{1}{4}O + \frac{1}{4}C + \frac{1}{4}H + \frac{1}{4}L$
<br/>
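Finally, after computing this average for each bank, we collect the results into a single DataFrame and move its ```Date``` index into a regular column with ```reset_index```.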
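A minimal sketch of the formula with a worked value: for $O = 4$, $H = 5$, $L = 3$, $C = 4$ it gives $(4 + 5 + 3 + 4)/4 = 4$. The helper `add_expectation` adds the same `Expectation` column to any OHLC DataFrame in one call; both function names are illustrative, and the column names assume the yfinance layout shown in the `info()` output.

```python
import pandas as pd


def ohlc_average(open_: float, high: float, low: float, close: float) -> float:
    """Equal-weight average of the four daily prices: (O + H + L + C) / 4."""
    return (open_ + high + low + close) / 4


def add_expectation(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with an 'Expectation' column computed row by row.

    Note: `mean` skips NaN by default, whereas summing with `+` propagates it;
    with no missing prices (as in the info() output) both give the same result.
    """
    out = df.copy()
    out["Expectation"] = out[["Open", "High", "Low", "Close"]].mean(axis=1)
    return out
```

For example, `icbc_data = add_expectation(icbc_data)` would replace the corresponding assignment in the cell below.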



In [38]:
import pandas as pd

In [39]:
naive_pdata = pd.DataFrame()

In [40]:
icbc_data['Expectation'] = (icbc_data['Open'] + icbc_data['Close'] + icbc_data['High'] + icbc_data['Low'])/4
ccb_data['Expectation'] = (ccb_data['Open'] + ccb_data['Close'] + ccb_data['High'] + ccb_data['Low'])/4
abc_data['Expectation'] = (abc_data['Open'] + abc_data['Close'] + abc_data['High'] + abc_data['Low'])/4
boc_data['Expectation'] = (boc_data['Open'] + boc_data['Close'] + boc_data['High'] + boc_data['Low'])/4
cdb_data['Expectation'] = (cdb_data['Open'] + cdb_data['Close'] + cdb_data['High'] + cdb_data['Low'])/4
cmb_data['Expectation'] = (cmb_data['Open'] + cmb_data['Close'] + cmb_data['High'] + cmb_data['Low'])/4

In [41]:
naive_pdata['icbc'] = icbc_data['Expectation']
naive_pdata['ccb'] = ccb_data['Expectation']
naive_pdata['abc'] = abc_data['Expectation']
naive_pdata['boc'] = boc_data['Expectation']
naive_pdata['cdb'] = cdb_data['Expectation']
naive_pdata['cmb'] = cmb_data['Expectation']
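In the cell above, the first assignment fixes the index of `naive_pdata` to ICBC's dates, and every later column is aligned to that index. Since all six banks share the same 2465 dates here, nothing is lost. As a sketch of an alternative, assuming a dict of per-bank DataFrames, `pd.concat` with `axis=1` takes the union of the dates (an outer join), so a date missing from one bank shows up as `NaN` instead of being dropped. `combine_columns` is a hypothetical helper name.

```python
from typing import Dict

import pandas as pd


def combine_columns(frames: Dict[str, pd.DataFrame], column: str) -> pd.DataFrame:
    """Put `column` from each frame side by side, aligned on the date index.

    pd.concat(axis=1) takes the union of all indexes (an outer join); the dict
    keys become the column names of the result.
    """
    return pd.concat({name: df[column] for name, df in frames.items()}, axis=1)
```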

In [42]:
naive_pdata

Date,icbc,ccb,abc,boc,cdb,cmb
2011-06-13,3.595991,4.133405,2.230216,2.119722,1.6050,12.809892
2011-06-14,3.642311,4.187752,2.261608,2.137376,1.6125,13.048477
2011-06-15,3.643855,4.204358,2.269798,2.130586,1.5700,13.237940
2011-06-16,3.574375,4.132966,2.207013,2.081700,1.5325,12.974795
2011-06-17,3.594447,4.148538,2.196094,2.077626,1.5675,12.809892
...,...,...,...,...,...,...
2021-06-07,5.015000,6.210000,2.896456,2.852500,0.1585,69.600000
2021-06-08,5.030000,6.235000,2.873266,2.845000,0.1550,69.237501
2021-06-09,5.037500,6.257500,2.850000,2.850000,0.1590,68.962500
2021-06-10,5.020000,6.272500,2.815000,2.830000,0.1525,69.150000


In [43]:
naive_pdata.reset_index(level=0, inplace=True)

In [44]:
naive_pdata

,Date,icbc,ccb,abc,boc,cdb,cmb
0,2011-06-13,3.595991,4.133405,2.230216,2.119722,1.6050,12.809892
1,2011-06-14,3.642311,4.187752,2.261608,2.137376,1.6125,13.048477
2,2011-06-15,3.643855,4.204358,2.269798,2.130586,1.5700,13.237940
3,2011-06-16,3.574375,4.132966,2.207013,2.081700,1.5325,12.974795
4,2011-06-17,3.594447,4.148538,2.196094,2.077626,1.5675,12.809892
...,...,...,...,...,...,...,...
2460,2021-06-07,5.015000,6.210000,2.896456,2.852500,0.1585,69.600000
2461,2021-06-08,5.030000,6.235000,2.873266,2.845000,0.1550,69.237501
2462,2021-06-09,5.037500,6.257500,2.850000,2.850000,0.1590,68.962500
2463,2021-06-10,5.020000,6.272500,2.815000,2.830000,0.1525,69.150000


In [45]:
import plotly.express as px
import plotly.graph_objects as go

In [49]:
fig = go.Figure()

In [50]:
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['icbc'],
                    mode='lines',
                    name='ICBC'))
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['ccb'],
                    mode='lines',
                    name='CCB'))
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['abc'],
                    mode='lines',
                    name='ABC'))
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['boc'],
                    mode='lines',
                    name='BOC'))
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['cdb'],
                    mode='lines',
                    name='CDB'))
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['cmb'],
                    mode='lines',
                    name='CMB'))
fig.show()

## Rejecting CMB

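CMB's share price (roughly HK$13 to HK$70 over the period) is on a much larger scale than the prices of the remaining 5 banks, which makes them hard to compare on the same axis. So we cut our analysis down to 5 banks: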
<br/>
<br/>1. **Industrial and Commercial Bank of China (ICBC)**  [1398.HK]
<br/>2. **China Construction Bank (CCB)** [0939.HK]
<br/>3. **Agricultural Bank of China (ABC)** [1288.HK]
<br/>4. **Bank of China (BOC)** [3988.HK]
<br/>5. **China Development Bank (CDB)** [1062.HK]
<br/>
Now, we revisualise our data.

In [51]:
fig = go.Figure()

In [52]:
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['icbc'],
                    mode='lines',
                    name='ICBC'))
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['ccb'],
                    mode='lines',
                    name='CCB'))
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['abc'],
                    mode='lines',
                    name='ABC'))
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['boc'],
                    mode='lines',
                    name='BOC'))
fig.add_trace(go.Scatter(x=naive_pdata['Date'], y=naive_pdata['cdb'],
                    mode='lines',
                    name='CDB'))
fig.show()
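Dropping CMB is one way to handle the difference in price scale. A common alternative, sketched here as an option rather than what this notebook does, is to rebase each series to 100 on its first available day, so all banks share one axis and the chart shows relative moves rather than price levels. `rebase` is a hypothetical helper.

```python
import pandas as pd


def rebase(prices: pd.DataFrame, base: float = 100.0) -> pd.DataFrame:
    """Scale every column so its first non-missing value equals `base`.

    bfill().iloc[0] picks each column's first non-missing value; dividing a
    DataFrame by that Series aligns on column names.
    """
    first = prices.bfill().iloc[0]
    return prices.div(first) * base
```

Assuming `naive_pdata` still has the `Date` column created by `reset_index`, usage would be `rebased = rebase(naive_pdata.set_index('Date'))`, and the traces can then be plotted from `rebased` with CMB included.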