# **Project Idea:**
The goal of this project is to **analyse the stock prices** of some of the largest banks in **Mainland China** listed on the **Hong Kong Stock Exchange (HKEX)**. Using the available historical price data, we explore possible strategies, fit statistical models, and try to tell the story behind the data. Along the way, we aim to learn more about the volatility of the market and, in turn, how accurate predictive models built on this historical data can be.

# **Historical Data Collection:**
Yahoo! Finance, part of the Yahoo! network owned by Verizon Media, provides financial news and data such as stock price listings, reports, and press releases.
<br/> However, Yahoo! Finance decommissioned its historical data API long ago, so we use the Python package ```yfinance``` to retrieve historical data for the relevant stocks listed on the HKEX.
On Google Colaboratory, the package can be installed with ```!pip install yfinance```, which we run below.




In [1]:
!pip install yfinance

Collecting yfinance
  Downloading https://files.pythonhosted.org/packages/a7/ee/315752b9ef281ba83c62aa7ec2e2074f85223da6e7e74efb4d3e11c0f510/yfinance-0.1.59.tar.gz
Collecting lxml>=4.5.1
  Downloading https://files.pythonhosted.org/packages/30/c0/d0526314971fc661b083ab135747dc68446a3022686da8c16d25fcf6ef07/lxml-4.6.3-cp37-cp37m-manylinux2014_x86_64.whl (6.3MB)
Building wheels for collected packages: yfinance
  Building wheel for yfinance (setup.py) ... done
  Created wheel for yfinance: filename=yfinance-0.1.59-py2.py3-none-any.whl size=23455 sha256=ac102b07039f1b1f00bde3dbdc4ff06b6cc697fd4923873700b96f55283dc3c9
  Stored in directory: /root/.cache/pip/wheels/f8/2a/0f/4b5a86e1d52e451757eb6bc17fd899629f0925c777741b6d04
Successfully built yfinance
Installing collected packages: lxml, yfinance
  Found existing installation: lxml 4.2.6
    Uninstalling lxml-4.2.6:
      Successfully uninstalled lxml-4.2.6
Successf

## **Importing the Package:**
We now import the package so that we can use its ```Ticker``` class for the stocks we want to analyse in depth.


In [2]:
import yfinance as yf

## **Using the Ticker Class:**
We now create a ```Ticker``` object for each stock of interest listed on the HKEX. We look at the following institutions in depth:
<br/>
<br/>1. **Industrial and Commercial Bank of China (ICBC)**  [1398.HK]
<br/>2. **China Construction Bank (CCB)** [0939.HK]
<br/>3. **Agricultural Bank of China (ABC)** [1288.HK]
<br/>4. **Bank of China (BOC)** [3988.HK]
<br/>5. **China Development Bank (CDB)** [1062.HK]
<br/>6. **Postal Savings Bank of China (PSBC)** [1658.HK]
<br/>7. **Bank of Communications (BOCom)** [3329.HK]
<br/>8. **China Merchants Bank (CMB)** [3968.HK]
<br/>
<br/>Note that ```1062.HK``` is the listing of China Development Bank International Investment, an affiliate of CDB rather than the bank itself, and that ```3329.HK``` is BOCOM International Holdings; the H-shares of Bank of Communications itself trade under ```3328.HK```. The results below are for the tickers exactly as listed.

In [3]:
icbc = yf.Ticker('1398.HK')
ccb = yf.Ticker('0939.HK')
abc = yf.Ticker('1288.HK')
boc = yf.Ticker('3988.HK')
cdb = yf.Ticker('1062.HK')
psbc = yf.Ticker('1658.HK')
bocom = yf.Ticker('3329.HK')
cmb = yf.Ticker('3968.HK')
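
The eight assignments above could also be kept in a single dictionary, which makes later loops over all banks easier. The following is a minimal sketch, not part of the original notebook: the names `SYMBOLS` and `make_tickers` are hypothetical, and the `factory` argument is assumed to be `yf.Ticker` when run in this notebook.

```python
# Short names mapped to the HKEX symbols used in this notebook.
SYMBOLS = {
    "icbc": "1398.HK",
    "ccb": "0939.HK",
    "abc": "1288.HK",
    "boc": "3988.HK",
    "cdb": "1062.HK",
    "psbc": "1658.HK",
    "bocom": "3329.HK",
    "cmb": "3968.HK",
}


def make_tickers(symbols, factory):
    """Build one object per symbol by calling `factory(symbol)`.

    `factory` is any callable taking a symbol string; in this notebook
    it would be `yf.Ticker` (an assumption of this sketch).
    """
    return {name: factory(symbol) for name, symbol in symbols.items()}


# In the notebook (requires yfinance and network access):
# tickers = make_tickers(SYMBOLS, yf.Ticker)
# icbc = tickers["icbc"]
```

Passing the factory as an argument keeps the helper independent of ```yfinance``` itself, so it can be tried out without a network connection.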

## **Getting the Historical Data:**
Using our 8 ```Ticker``` objects, we download the past 10 years of historical data for the 8 institutions listed above.



In [4]:
icbc_data = icbc.history(period = '10y')
ccb_data = ccb.history(period = '10y')
abc_data = abc.history(period = '10y')
boc_data = boc.history(period = '10y')
cdb_data = cdb.history(period = '10y')
psbc_data = psbc.history(period = '10y')
bocom_data = bocom.history(period = '10y')
cmb_data = cmb.history(period = '10y')
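
The same download step can be written as one loop over a dictionary of tickers. This is a sketch under the assumption that every object exposes a `history(period=...)` method, as the ```Ticker``` objects above do; the helper name `fetch_histories` is hypothetical.

```python
def fetch_histories(tickers, period="10y"):
    """Call .history(period=period) on every ticker and collect results by name."""
    return {name: ticker.history(period=period) for name, ticker in tickers.items()}


# In this notebook the call would be, for example (requires network access):
# data = fetch_histories({"icbc": icbc, "ccb": ccb, "abc": abc, "boc": boc,
#                         "cdb": cdb, "psbc": psbc, "bocom": bocom, "cmb": cmb})
# icbc_data = data["icbc"]
```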

## **Gauging the Ticker Data:**
We call ```info()``` on each downloaded DataFrame to check how much data each ticker returned.

In [6]:
icbc_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-07 to 2021-06-07
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   float64
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 154.1 KB


In [7]:
ccb_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-07 to 2021-06-07
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   float64
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 154.1 KB


In [8]:
abc_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-07 to 2021-06-07
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   float64
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 154.1 KB


In [9]:
boc_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-07 to 2021-06-07
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   float64
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 154.1 KB


In [10]:
cdb_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-07 to 2021-06-07
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   int64  
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(4), int64(3)
memory usage: 154.1 KB


In [11]:
psbc_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1157 entries, 2016-09-28 to 2021-06-07
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          1157 non-null   float64
 1   High          1157 non-null   float64
 2   Low           1157 non-null   float64
 3   Close         1157 non-null   float64
 4   Volume        1157 non-null   int64  
 5   Dividends     1157 non-null   float64
 6   Stock Splits  1157 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 72.3 KB


In [12]:
bocom_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1001 entries, 2017-05-19 to 2021-06-07
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          1001 non-null   float64
 1   High          1001 non-null   float64
 2   Low           1001 non-null   float64
 3   Close         1001 non-null   float64
 4   Volume        1001 non-null   int64  
 5   Dividends     1001 non-null   float64
 6   Stock Splits  1001 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 62.6 KB


In [13]:
cmb_data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2465 entries, 2011-06-07 to 2021-06-07
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          2465 non-null   float64
 1   High          2465 non-null   float64
 2   Low           2465 non-null   float64
 3   Close         2465 non-null   float64
 4   Volume        2465 non-null   int64  
 5   Dividends     2465 non-null   float64
 6   Stock Splits  2465 non-null   int64  
dtypes: float64(5), int64(2)
memory usage: 154.1 KB
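
Instead of reading eight separate ```info()``` outputs, the row count and date range of each DataFrame can be collected into one small table. The sketch below uses synthetic frames in place of the downloaded data; the helper name `coverage_summary` and the column names `rows`, `start` and `end` are assumptions made for illustration.

```python
import pandas as pd


def coverage_summary(frames):
    """Return one row per name with the row count and first/last index date."""
    rows = {
        name: {"rows": len(df), "start": df.index.min(), "end": df.index.max()}
        for name, df in frames.items()
    }
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic frames standing in for the downloaded histories.
demo = {
    "long": pd.DataFrame(
        {"Close": range(5)}, index=pd.date_range("2021-06-01", periods=5)
    ),
    "short": pd.DataFrame(
        {"Close": range(2)}, index=pd.date_range("2021-06-04", periods=2)
    ),
}
summary = coverage_summary(demo)
print(summary)
```

With the real data, the dictionary passed in would map each short name to its DataFrame, for example ```{"icbc": icbc_data, "ccb": ccb_data}```.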


## **Takeaway from Ticker info() Data and further Data Cleaning:**
From the ```info()``` output and the reported ```DatetimeIndex```, a full 10-year history corresponds to **2465 trading days of data**.
<br/>However, not all of the tickers have 10 years of historical data on ```yfinance```: PSBC has only 1157 rows (starting 2016-09-28) and BOCom only 1001 rows (starting 2017-05-19). We therefore keep for further analysis only those that provide the full 10 years, i.e. 2465 market days.
<br/>Hence, the banks remaining after discarding those with insufficient data are:
<br/>
<br/>1. **Industrial and Commercial Bank of China (ICBC)**  [1398.HK]
<br/>2. **China Construction Bank (CCB)** [0939.HK]
<br/>3. **Agricultural Bank of China (ABC)** [1288.HK]
<br/>4. **Bank of China (BOC)** [3988.HK]
<br/>5. **China Development Bank (CDB)** [1062.HK]
<br/>6. **China Merchants Bank (CMB)** [3968.HK]
<br/>
<br/>
So, our analysis is now reduced to 6 banks.
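
The selection described above can be sketched as a filter that keeps only the series with the longest available history. This is an illustrative sketch: the helper name `keep_full_history` is hypothetical, and plain lists with the row counts reported by ```info()``` stand in for the actual DataFrames.

```python
def keep_full_history(frames, required_rows=None):
    """Keep only the entries that have at least `required_rows` rows.

    If `required_rows` is not given, the longest entry sets the bar.
    """
    if required_rows is None:
        required_rows = max(len(df) for df in frames.values())
    return {name: df for name, df in frames.items() if len(df) >= required_rows}


# Row counts taken from the info() output; lists stand in for DataFrames.
counts = {
    "icbc": 2465, "ccb": 2465, "abc": 2465, "boc": 2465,
    "cdb": 2465, "psbc": 1157, "bocom": 1001, "cmb": 2465,
}
demo_frames = {name: [None] * n for name, n in counts.items()}

kept = keep_full_history(demo_frames)
print(sorted(kept))  # ['abc', 'boc', 'ccb', 'cdb', 'cmb', 'icbc']
```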
