# Correlation
- Correlation is a statistic that measures the degree to which two variables move in relation to each other.
- It measures association only: it does not show whether $X$ causes $Y$ or vice versa.
- In finance, correlation can measure how a stock moves relative to a benchmark index, such as the S&P 500.


### Formula
- $r = \frac{\sum(X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum(X - \bar{X})^2 \sum(Y - \bar{Y})^2}}$
- $r$: the correlation coefficient
- $\bar{X}$: the mean of the observations of $X$
- $\bar{Y}$: the mean of the observations of $Y$
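
As a minimal sketch of the formula above, the function below computes $r$ term by term with NumPy and compares it with `np.corrcoef`. The helper name `pearson_r` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient, computed directly from the formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of X from its mean
    dy = y - y.mean()  # deviations of Y from its mean
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Made-up sample data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 matrix
print(r, r_numpy)
```

For this sample the numerator is 6 and the denominator is $\sqrt{10 \cdot 6}$, so $r = 6/\sqrt{60} \approx 0.775$.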

What does it mean?
- $r$ ranges between -1 and 1 (both inclusive)
- $r = 1$: Perfect positive correlation
- $r = -1$: Perfect negative correlation
- $r = 0$: No linear correlation (the variables may still be related in a non-linear way)
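
The three cases above can be illustrated with a small sketch on made-up data (the variable names are hypothetical). The last case shows that $r = 0$ only rules out a linear relationship: $y = x^2$ depends entirely on $x$, yet its correlation with $x$ is zero on a symmetric range.

```python
import numpy as np

x = np.arange(-5, 6, dtype=float)  # -5, -4, ..., 5 (symmetric around 0)

r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]  # increasing linear relation
r_neg = np.corrcoef(x, -3 * x)[0, 1]     # decreasing linear relation
r_sq = np.corrcoef(x, x ** 2)[0, 1]      # purely non-linear relation

print(r_pos, r_neg, r_sq)
```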

### Resources
- Correlation: https://www.investopedia.com/terms/c/correlation.asp
- S&P 500 by Market Cap: https://www.slickcharts.com/sp500

In [None]:
import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np

In [None]:
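# Tickers to compare and the first date of the sample
tickers = ['AAPL', 'TWTR', 'IBM', 'MSFT']
start = dt.datetime(2020, 1, 1)

# Download daily price data for all tickers from Yahoo Finance
data = pdr.get_data_yahoo(tickers, start)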

In [None]:
data.head()

In [None]:
# Keep only the adjusted close prices (one column per ticker)
data = data['Adj Close']

In [None]:
data.head()

In [None]:
# Daily log returns: ln(P_t / P_{t-1}); the first row is NaN
log_returns = np.log(data/data.shift())

In [None]:
log_returns
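
To make the log-return step concrete without downloading data, here is a small sketch on a made-up price series (the values are assumptions for illustration). It shows that the first log return is `NaN`, because there is no previous price, and that log returns add up over time to the log of the total price ratio.

```python
import numpy as np
import pandas as pd

# Made-up prices: +10%, -10%, +10%
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

log_ret = np.log(prices / prices.shift())  # same expression as in the notebook
simple_ret = prices.pct_change()           # ordinary percentage returns, for comparison

total_log_ret = log_ret.sum()              # NaN in the first row is skipped
print(pd.DataFrame({"simple": simple_ret, "log": log_ret}))
print(total_log_ret, np.log(prices.iloc[-1] / prices.iloc[0]))
```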

In [None]:
log_returns.corr()

In [None]:
sp500 = pdr.get_data_yahoo("^GSPC", start)

In [None]:
# Add the S&P 500 log returns as an extra column, aligned on date
log_returns['SP500'] = np.log(sp500['Adj Close']/sp500['Adj Close'].shift())

In [None]:
log_returns.corr()
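
As a rough sketch of how to read the matrix returned by `DataFrame.corr()`, the example below uses synthetic returns instead of market data. The column names, the seed, and the factor 1.2 are arbitrary assumptions. The matrix is symmetric with ones on the diagonal, and each off-diagonal entry is the pairwise correlation of two columns.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns (assumed values, not market data)
rng = np.random.default_rng(42)
market = rng.normal(0.0, 0.01, size=500)

returns = pd.DataFrame({
    "MARKET": market,
    # Moves with the market plus some noise of its own
    "STOCK": 1.2 * market + rng.normal(0.0, 0.005, size=500),
    # Generated independently of the market
    "UNRELATED": rng.normal(0.0, 0.01, size=500),
})

corr = returns.corr()  # pairwise Pearson correlation of the columns
print(corr.round(2))
```

Here `STOCK` should show a high correlation with `MARKET`, while `UNRELATED` should be close to zero.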

In [None]:
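def test_correlation(ticker):
    """Return the correlation matrix of log_returns with `ticker` added."""
    df = pdr.get_data_yahoo(ticker, start)
    # Work on a copy so the global log_returns is left unchanged
    lr = log_returns.copy()
    lr[ticker] = np.log(df['Adj Close']/df['Adj Close'].shift())
    return lr.corr()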

In [None]:
test_correlation("LQD")

In [None]:
test_correlation("TLT")

In [None]:
import matplotlib.pyplot as plt
%matplotlib notebook

In [None]:
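def visualize_correlation(ticker1, ticker2):
    """Plot both adjusted close series, each normalized to 1.0 at the first date."""
    df = pdr.get_data_yahoo([ticker1, ticker2], start)
    df = df['Adj Close']
    # Divide every row by the first row so both series start at 1.0
    df = df/df.iloc[0]
    fig, ax = plt.subplots()
    df.plot(ax=ax)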
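
The normalization step `df/df.iloc[0]` can be checked on a tiny made-up frame (the column names and prices are assumptions). Dividing by the first row rescales every column to start at 1.0, so series with very different price levels can share one axis.

```python
import pandas as pd

# Made-up prices at very different levels
prices = pd.DataFrame({
    "A": [50.0, 55.0, 60.0],
    "B": [200.0, 190.0, 220.0],
})

# The first row (a Series indexed by column name) is aligned with the columns
normalized = prices / prices.iloc[0]
print(normalized)
```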

In [None]:
visualize_correlation("AAPL", "TLT")

In [None]:
visualize_correlation("^GSPC", "TLT")