# Covariance
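1. Covariance measures how two variables vary together.
2. If covariance > 0, the two variables tend to move in the same direction.
3. If covariance < 0, the two variables tend to move in opposite directions.
4. If covariance = 0, there is no linear relationship between the two variables. This does not by itself make them independent: independent variables have zero covariance, but zero covariance does not imply independence.
5. The sample covariance of $n$ paired observations is:

   $$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

   where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$.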
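The formula above can be checked with a short NumPy sketch. `sample_covariance` is a hypothetical helper written only for illustration; `np.cov` is NumPy's own function, which by default also divides by $n - 1$.

```python
import numpy as np

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / (x.size - 1))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(sample_covariance(x, y))  # ≈ 3.333, positive: x and y move together
print(np.cov(x, y)[0, 1])       # NumPy's sample covariance gives the same value
```

Swapping `y` for a series that falls as `x` rises (for example `[3, 2, 1]` against `[1, 2, 3]`) makes the result negative.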

# Correlation 

1. **Perfect positive correlation**: the correlation coefficient is 1, so the entire variability of the second variable is explained by the first. Example: a perfect positive correlation could, in principle, exist between house size and price, so that a larger house would always cost more. In that case size would be the only variable determining a house's price, and once we knew a house's price it would be easy to work out its size. In reality a perfect positive correlation is very rare; we are much more likely to observe an imperfect positive correlation (a coefficient between 0 and 1). Size is one of the most important variables determining the price of a house, but it is not the only one: location and year of construction, among other factors, also play an important role. So it is not correct to say that a house is valued purely on its size, and the same holds for most real-world relationships.

2. **Zero correlation**: variables with a correlation of 0 have no linear relationship with each other (which, as with covariance, does not by itself prove independence). Example: there is no correlation between the price of coffee in Italy and the price of houses in London.

3. **Negative correlation**: the two variables move in opposite directions. A perfect negative correlation has a coefficient of -1, and an imperfect negative correlation has a coefficient between -1 and 0; imperfect negative correlation is far more common in reality. Example: in the rainy season there is a negative correlation between ice cream producers and umbrella producers, because when it rains people are less likely to buy ice cream but more likely to buy umbrellas. The share prices of the two companies are influenced by the same variable (the weather), but it affects their businesses in opposite ways.

4. We rarely need to compute a correlation coefficient by hand nowadays; software packages do it for us (in pandas, for example, `DataFrame.corr()`).

5. The more similar the context in which two companies operate, the higher the correlation between their share prices tends to be, since they are influenced by the same or similar factors.
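The correlation coefficient described above is the covariance rescaled by the two sample standard deviations, $\rho = \operatorname{cov}(x, y) / (s_x s_y)$, which keeps it between -1 and 1. The sketch below (the helper `pearson_correlation` and the toy data are hypothetical) reproduces the perfect positive and perfect negative cases, and also shows why zero correlation is not the same as independence: $y = x^2$ is entirely determined by $x$, yet its correlation with a symmetric $x$ is 0.

```python
import numpy as np

def pearson_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = ((x - x.mean()) * (y - y.mean())).sum() / (x.size - 1)
    return float(cov / (x.std(ddof=1) * y.std(ddof=1)))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(pearson_correlation(x, 3 * x + 1))   # perfect positive: 1 (up to rounding)
print(pearson_correlation(x, -2 * x + 5))  # perfect negative: -1 (up to rounding)
print(pearson_correlation(x, x ** 2))      # 0, although y depends completely on x
```

In practice `np.corrcoef(x, y)[0, 1]` or pandas' `DataFrame.corr()` returns the same number.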

In [1]:
# pandas.DataFrame.var() - computes the (sample) variance of each column

In [2]:
import numpy as np 
import pandas as pd
from pandas_datareader import data as wb
import matplotlib.pyplot as plt

In [3]:
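tickers = ['PG', 'BEI.DE']  # Procter & Gamble and Beiersdorf
sec_data = pd.DataFrame()
for t in tickers:
    # Download adjusted closing prices from Yahoo Finance, starting in 2007.
    # Note: Yahoo has changed its API over time, so the 'yahoo' source of
    # pandas_datareader may fail with newer versions.
    sec_data[t] = wb.DataReader(t, data_source='yahoo', start='2007-01-01')['Adj Close']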

In [4]:
sec_data.tail()

| Date | PG | BEI.DE |
|---|---|---|
| 2020-03-25 | 100.919998 | 95.660004 |
| 2020-03-26 | 107.379997 | 95.540001 |
| 2020-03-27 | 110.169998 | 91.559998 |
| 2020-03-30 | 115.0 | 93.620003 |
| 2020-03-31 | 111.330002 | 90.540001 |


In [5]:
sec_data.head()

| Date | PG | BEI.DE |
|---|---|---|
| 2007-01-03 | 43.43528 | 39.869827 |
| 2007-01-04 | 43.105507 | 40.131615 |
| 2007-01-05 | 42.735352 | 39.306618 |
| 2007-01-08 | 42.829563 | 39.314541 |
| 2007-01-09 | 42.721889 | 38.426079 |


In [6]:
# Daily logarithmic returns: ln(P_t / P_{t-1}).
# Log returns add up across days, which makes them convenient when examining each company on its own over a given period.
sec_returns = np.log(sec_data / sec_data.shift(1))
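To illustrate why log returns suit a single stock examined over time, here is a sketch on a hypothetical four-day price series (not real market data): summing the daily log returns gives the log return of the whole period, whereas simple returns have to be compounded.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: +5%, -5%, +5% (not real data)
prices = pd.Series([100.0, 105.0, 99.75, 104.7375])

log_returns = np.log(prices / prices.shift(1))   # ln(P_t / P_{t-1}); first value is NaN
simple_returns = prices / prices.shift(1) - 1    # P_t / P_{t-1} - 1; first value is NaN

# Log returns are additive: their sum equals the log return over the whole period
# (Series.sum skips the leading NaN by default)
print(np.isclose(log_returns.sum(), np.log(prices.iloc[-1] / prices.iloc[0])))   # True

# Simple returns compound multiplicatively instead of adding up
total_simple = (1 + simple_returns.dropna()).prod() - 1
print(np.isclose(total_simple, prices.iloc[-1] / prices.iloc[0] - 1))           # True
```

Adding the simple returns here would give 5%, while the true gain over the period is about 4.74%.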

In [7]:
sec_returns

| Date | PG | BEI.DE |
|---|---|---|
| 2007-01-03 | NaN | NaN |
| 2007-01-04 | -0.007621 | 0.006545 |
| 2007-01-05 | -0.008624 | -0.020772 |
| 2007-01-08 | 0.002202 | 0.000202 |
| 2007-01-09 | -0.002517 | -0.022858 |
| ... | ... | ... |
| 2020-03-25 | -0.023019 | 0.009242 |
| 2020-03-26 | 0.062046 | -0.001255 |
| 2020-03-27 | 0.025651 | -0.042551 |
| 2020-03-30 | 0.042908 | 0.022250 |


# PG 

In [8]:
sec_returns['PG'].mean() # Gives us the daily average return

0.00028239627051410523

In [9]:
sec_returns['PG'].mean() * 250  # annualised average return: there are roughly 250 trading days in a year

0.0705990676285263

In [10]:
sec_returns['PG'].std()  # daily volatility: the standard deviation of daily returns

0.011858014272654763

In [27]:
sec_returns['PG'].var() * 250  # annualised variance

0.03515312562262102

In [24]:
sec_returns['PG'].std() * 250 ** 0.5  # annualised volatility: variance scales with the number of days, so the standard deviation (its square root) scales with the square root of 250

0.18749166814186977
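The annualisation rules used above can be checked on simulated data. The sketch assumes daily returns are independent and identically distributed, which is the assumption behind scaling the variance by 250 and the standard deviation by $\sqrt{250}$; the simulated series only stands in for `sec_returns['PG']` and is not real data. The PG figures above are consistent in the same way: $0.18749^2 \approx 0.03515$, the annualised variance from In [27].

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 250  # convention used in this notebook; 252 is also common

# Simulated daily log returns (hypothetical stand-in for sec_returns['PG'])
rng = np.random.default_rng(seed=0)
daily = pd.Series(rng.normal(loc=0.0003, scale=0.012, size=1000))

annual_mean = daily.mean() * TRADING_DAYS        # mean scales with the number of days
annual_var = daily.var() * TRADING_DAYS          # variance scales with the number of days
annual_std = daily.std() * TRADING_DAYS ** 0.5   # std scales with the square root

# The annualised std is exactly the square root of the annualised variance
print(np.isclose(annual_std ** 2, annual_var))  # True
print(annual_mean, annual_std)
```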

In [25]:
#standard_deviation = v ** 0.5
#print (v)

# Beiersdorf

In [13]:
sec_returns['BEI.DE'].mean() # For computing the daily average return

0.00023097821973417737

In [14]:
sec_returns['BEI.DE'].mean() * 250  # annualised average return: there are roughly 250 trading days in a year

0.05774455493354434

In [15]:
sec_returns['BEI.DE'].std()  # daily volatility: the standard deviation of daily returns

0.013778186574993205

In [16]:
sec_returns['BEI.DE'].std() * 250 ** 0.5  # annualised volatility: variance scales with the number of days, so the standard deviation (its square root) scales with the square root of 250

0.21785225801866445

In [28]:
sec_returns['BEI.DE'].var() * 250  # annualised variance

0.04745960632383074

In [17]:
# BEI.DE has the higher annualised standard deviation (about 0.218 vs 0.187 for PG), so by this measure it is the more volatile, and hence riskier, of the two stocks

In [19]:
# pandas.DataFrame.cov() - computes pairwise covariance of columns 
# pandas.DataFrame.corr() - computes pairwise correlation of columns 

In [20]:
cov_matrix = sec_returns.cov()

In [21]:
cov_matrix_a = sec_returns.cov() * 250
cov_matrix_a

| | PG | BEI.DE |
|---|---|---|
| PG | 0.035153 | 0.011352 |
| BEI.DE | 0.011352 | 0.04746 |
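The diagonal of the annualised covariance matrix above (0.035153 and 0.04746) matches the annualised variances computed in In [27] and In [28], and the off-diagonal entry is the annualised covariance between the two stocks. The sketch below checks both properties on simulated returns for two hypothetical assets `A` and `B` that share a common factor; it is not real market data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
# Hypothetical daily returns: both assets load on a shared factor plus their own noise
common = rng.normal(0, 0.01, size=500)
returns = pd.DataFrame({
    "A": common + rng.normal(0, 0.008, size=500),
    "B": 0.5 * common + rng.normal(0, 0.010, size=500),
})

cov_a = returns.cov() * 250  # annualised covariance matrix

# Diagonal entries are the annualised variances of each column
print(np.allclose(np.diag(cov_a.to_numpy()), (returns.var() * 250).to_numpy()))  # True
# The matrix is symmetric: cov(A, B) == cov(B, A)
print(np.isclose(cov_a.loc["A", "B"], cov_a.loc["B", "A"]))  # True
```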


In [22]:
corr_matrix = sec_returns.corr()
corr_matrix

| | PG | BEI.DE |
|---|---|---|
| PG | 1.0 | 0.277172 |
| BEI.DE | 0.277172 | 1.0 |
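The correlation matrix is the covariance matrix normalised by the standard deviations: $\rho_{ij} = \operatorname{cov}_{ij} / (s_i s_j)$, so the annualisation factor cancels out. With the rounded figures above, $0.011352 / \sqrt{0.035153 \times 0.04746} \approx 0.278$, close to the reported 0.277; the small gap may be because pandas computes each pairwise entry only over rows where both columns have data, and the US and German exchanges do not share all trading days. The sketch below, on hypothetical simulated returns with no missing values, shows the identity holding exactly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2)
# Hypothetical daily returns for two assets (no missing values)
common = rng.normal(0, 0.01, size=500)
returns = pd.DataFrame({
    "A": common + rng.normal(0, 0.008, size=500),
    "B": 0.5 * common + rng.normal(0, 0.010, size=500),
})

cov = returns.cov()
std = np.sqrt(np.diag(cov.to_numpy()))  # per-column sample standard deviations

# Normalise each entry: corr_ij = cov_ij / (std_i * std_j)
corr_from_cov = cov / np.outer(std, std)

print(np.allclose(corr_from_cov.to_numpy(), returns.corr().to_numpy()))  # True
```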
