## Correlation Analysis
This notebook examines how the daily movement of the BSE Sensex index correlates with that of a few selected stocks.

### 1. Datasets
We will use the following real-world datasets for the correlation analysis.

*   Daily movement of the following stocks, each belonging to a different sector
    * HDFC Bank Ltd.
    * Britannia Industries Ltd.
    * Tata Motors Corporation Ltd.
    * HCL Technologies Ltd.
    * REC Ltd.
*   Period covered: 1-Sep-2022 to 23-Mar-2023
*   The daily stock price data was downloaded from the BSE India site: https://www.bseindia.com/indices/IndexArchiveData.html

### 2. Daily Data for Sensex

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn

In [None]:
DATA_PATH = '/content/drive/MyDrive/learning/ML/ML Course Mar-2023/Lecture-2/'

In [None]:
sensex_df = pd.read_csv(DATA_PATH+"Sensex.csv",
                        index_col=False, 
                        parse_dates=['Date'])

In [None]:
sensex_df.info()

In [None]:
sensex_df.head()

In [None]:
sensex_df = sensex_df.set_index('Date', drop=True)

#### 2.a. Calculate the daily gains for Sensex data

In [None]:
sensex_df['sensex_gain'] = (sensex_df.Close - sensex_df.Open)*100/sensex_df.Open
sensex_df[0:5]
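
The gain computed above is the percentage move from the day's open to the same day's close, so it measures the intraday change rather than the change from the previous close. A minimal sketch of the arithmetic follows; the prices and the name `toy_df` are hypothetical, chosen only so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical prices, not taken from the Sensex file
toy_df = pd.DataFrame(
    {"Open": [100.0, 200.0, 50.0], "Close": [102.0, 190.0, 50.0]},
    index=pd.to_datetime(["2022-09-01", "2022-09-02", "2022-09-05"]),
)

# Same formula as the cell above: percentage change from open to close
toy_df["gain"] = (toy_df["Close"] - toy_df["Open"]) * 100 / toy_df["Open"]
print(toy_df)
```

A day that closes above its open gets a positive gain, a day that closes below gets a negative one, and an unchanged day gets zero.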

### 3. Sectoral stocks

For this analysis, we compare the daily movement of the following sectoral stocks with the Sensex.

* HDFC Bank Ltd.
* Britannia Industries Ltd.
* Tata Motors Corporation Ltd.
* HCL Technologies Ltd.
* REC Ltd.

#### 3.a. Defining a function to load data and calculate gains
*   Pass the file name; the function returns a dataframe with the daily open price, close price, and gain percentage

In [None]:
def get_stock_gain(filename):
    """Load a stock's daily price CSV and return open, close, and gain (%)."""

    # Read the csv file, parsing the Date column as datetimes
    df = pd.read_csv(filename, index_col=False, parse_dates=['Date'])

    # Use the date as the index and sort the records chronologically
    df = df.set_index('Date', drop=True).sort_index(ascending=True)

    # Daily gain: percentage change from the open price to the close price
    df['gain'] = ((df['Close Price'] - df['Open Price']) * 100 /
                  df['Open Price'])

    # Keep only the Close Price, Open Price, and gain columns for further analysis
    return df[['Close Price', 'Open Price', 'gain']]
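
To try the function without the downloaded files, it can be fed an in-memory CSV, since `pd.read_csv` accepts file-like objects as well as paths. The sketch below repeats the same steps in a self-contained form; the CSV contents, the ISO date format, and the `High Price` column are hypothetical and only mimic the column names the function relies on.

```python
import io
import pandas as pd

def get_stock_gain(source):
    # Same steps as the function above; `source` may be a path or a file-like object
    df = pd.read_csv(source, index_col=False, parse_dates=["Date"])
    df = df.set_index("Date").sort_index()
    df["gain"] = (df["Close Price"] - df["Open Price"]) * 100 / df["Open Price"]
    return df[["Close Price", "Open Price", "gain"]]

# Hypothetical CSV listed newest-first, to show that the rows get sorted by date
csv_text = """Date,Open Price,High Price,Close Price
2022-09-02,80.0,81.0,76.0
2022-09-01,400.0,405.0,404.0
"""

demo_df = get_stock_gain(io.StringIO(csv_text))
print(demo_df)
```

The returned frame is ordered oldest-first and carries only the three columns used later in the analysis.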

#### 3.b. Loading the data for various stocks

In [None]:
hdfcbank_df = get_stock_gain(DATA_PATH+"hdfcbank.csv")
britannia_df = get_stock_gain(DATA_PATH+"britannia.csv")
tatamotors_df = get_stock_gain(DATA_PATH+"tatamotors.csv")
hcltech_df = get_stock_gain(DATA_PATH+"hcltech.csv")
recltd_df = get_stock_gain(DATA_PATH+"recltd.csv")

#### 3.c. Add gains for various stocks to the sensex data frame

In [None]:
sensex_df['hdfcbank_gain'] = hdfcbank_df['gain']
sensex_df['britannia_gain'] = britannia_df['gain']
sensex_df['tatamotors_gain'] = tatamotors_df['gain']
sensex_df['hcltech_gain'] = hcltech_df['gain']
sensex_df['recltd_gain'] = recltd_df['gain']
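
Assigning a Series to a DataFrame column matches rows by index label, not by position. Because every frame here is indexed by date, each stock's gain lands on the matching Sensex date, and any Sensex date missing from a stock file becomes `NaN`. The sketch below illustrates this with hypothetical values; `index_df` and `stock_gain` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Hypothetical index data covering three trading days
index_df = pd.DataFrame(
    {"sensex_gain": [0.5, -0.2, 0.1]},
    index=pd.to_datetime(["2022-09-01", "2022-09-02", "2022-09-05"]),
)

# Hypothetical stock gains: different row order, and 2022-09-02 is missing
stock_gain = pd.Series(
    [1.0, 0.3],
    index=pd.to_datetime(["2022-09-05", "2022-09-01"]),
    name="gain",
)

# Assignment aligns on the date labels, not on row position
index_df["stock_gain"] = stock_gain
print(index_df)

# Count the dates that had no matching stock row
missing = index_df["stock_gain"].isna().sum()
print("dates without stock data:", missing)
```

Running `sensex_df.isna().sum()` after the assignments is a quick way to see whether any dates failed to line up.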

### 4. Plotting the correlation

#### 4.a. Scatterplots between Sensex and Stocks

In [None]:
plt.figure(figsize=(12, 6))
sn.scatterplot(data = sensex_df, x = 'sensex_gain', y = 'hdfcbank_gain');

In [None]:
plt.figure(figsize=(12, 6))
sn.scatterplot(data = sensex_df, x = 'sensex_gain', y = 'britannia_gain');

In [None]:
plt.figure(figsize=(12, 6))
sn.scatterplot(data = sensex_df, x = 'sensex_gain', y = 'tatamotors_gain');

In [None]:
plt.figure(figsize=(12, 6))
sn.scatterplot(data = sensex_df, x = 'sensex_gain', y = 'hcltech_gain');

In [None]:
plt.figure(figsize=(12, 6))
sn.scatterplot(data = sensex_df, x = 'sensex_gain', y = 'recltd_gain');

### 5. Strength of the Correlation

In [None]:
sensex_df[['sensex_gain', 'hdfcbank_gain']].corr()

In [None]:
sensex_df[['sensex_gain', 'britannia_gain']].corr()

In [None]:
sensex_df[['sensex_gain', 'tatamotors_gain']].corr()

In [None]:
sensex_df[['sensex_gain', 'hcltech_gain']].corr()

In [None]:
sensex_df[['sensex_gain', 'recltd_gain']].corr()
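
By default, `DataFrame.corr()` computes the Pearson correlation coefficient and skips missing values pairwise. The sketch below, on hypothetical gains, checks the pandas result against the textbook formula and shows `corrwith` as one way to get every stock's correlation with the index in a single call. The frame `demo_df` and its columns are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical gains, not taken from the BSE files
demo_df = pd.DataFrame({
    "sensex_gain": [0.5, -0.2, 0.1, 0.8, -0.6],
    "stock_a_gain": [1.0, -0.4, 0.2, 1.6, -1.2],   # exactly 2x the index
    "stock_b_gain": [-0.5, 0.2, -0.1, -0.8, 0.6],  # exactly -1x the index
})

# Pearson r from its definition: co-deviation over the product of spreads
x = demo_df["sensex_gain"]
y = demo_df["stock_a_gain"]
r_manual = ((x - x.mean()) * (y - y.mean())).sum() / np.sqrt(
    ((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum()
)

# The same number from pandas (method="pearson" is the default)
r_pandas = demo_df["sensex_gain"].corr(demo_df["stock_a_gain"])

# Correlation of every stock column with the index in one call
stock_corr = demo_df.drop(columns="sensex_gain").corrwith(demo_df["sensex_gain"])
print(stock_corr)
```

A stock that moves in exact proportion to the index gives a coefficient of 1, one that moves in exact opposition gives -1, and real data falls somewhere in between.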

### 6. Heatmap highlighting the strength of the correlation between the Sensex and the selected stocks

In [None]:
sector_corr = sensex_df[['sensex_gain', 
                         'hdfcbank_gain', 
                         'britannia_gain',
                         'tatamotors_gain',
                         'hcltech_gain',
                         'recltd_gain']].corr()
sector_corr

In [None]:
plt.figure(figsize=(8, 6))
sn.heatmap(sector_corr,
           annot = True,
           fmt = "0.2f",
           cmap = sn.diverging_palette(240, 10),
           vmin = -1.0, 
           vmax = 1.0);