# Covariance

- In probability theory and statistics, covariance is a measure of the joint variability of two random variables.
- If greater values of one variable mainly correspond to greater values of the other, and lesser values to lesser values (i.e., the variables tend to show similar behavior), the covariance is positive. In the opposite case, when greater values of one variable mainly correspond to lesser values of the other (i.e., the variables tend to show opposite behavior), the covariance is negative.
- The sign of the covariance therefore shows the tendency in the linear relationship between the variables.
- The magnitude of the covariance is hard to interpret because it is not normalized and therefore depends on the scales of the variables. The correlation coefficient, which is the normalized version of the covariance, does indicate the strength of the linear relationship through its magnitude.
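
To make the scale dependence concrete, here is a minimal sketch that rescales one variable and compares the covariance with the correlation coefficient. The small data set and the helper names `sample_cov` and `pearson_r` are illustrative assumptions, not part of the original material.

```python
import numpy as np

# Illustrative data (an assumption made for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def sample_cov(a, b):
    """Sample covariance with the n - 1 denominator."""
    return np.sum((a - a.mean()) * (b - b.mean())) / (len(a) - 1)

def pearson_r(a, b):
    """Correlation coefficient: covariance divided by both standard deviations."""
    return sample_cov(a, b) / (a.std(ddof=1) * b.std(ddof=1))

# Multiplying x by 100 (for example, metres to centimetres) scales the covariance...
cov_original = sample_cov(x, y)
cov_rescaled = sample_cov(100 * x, y)

# ...but leaves the correlation coefficient unchanged.
r_original = pearson_r(x, y)
r_rescaled = pearson_r(100 * x, y)

print(cov_original, cov_rescaled)
print(r_original, r_rescaled)
```

The covariance grows by the same factor as the rescaled variable, while the correlation coefficient stays the same, which is why only the latter is comparable across data sets.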

<img src="images/covariance.png">

<img src="images/covarianceGraph.png">

- If the correlation is positive, then the covariance is positive as well. For fixed variances of the two variables, a stronger relationship corresponds to a higher value of the covariance.
- If the correlation is negative, then the covariance is negative as well. For fixed variances, a stronger relationship corresponds to a lower (that is, larger in absolute value) covariance.
- If the correlation is weak, then the covariance is close to zero relative to the product of the two standard deviations.
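
The sign relationship in the list above can be checked numerically. The sketch below reuses the `x` and `y` data from the code cell in this section and relies on the identity cov(x, y) = r · s_x · s_y, where s_x and s_y are the sample standard deviations. The helper name `cov_and_corr` and the negated series `-y` are assumptions introduced only for illustration.

```python
import numpy as np

x = np.arange(-10, 11)
y = np.array([0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14])

def cov_and_corr(a, b):
    """Return the sample covariance and the correlation coefficient of a and b."""
    cov = np.cov(a, b)[0, 1]
    r = np.corrcoef(a, b)[0, 1]
    return cov, r

# Positively related pair: both values are positive.
cov_pos, r_pos = cov_and_corr(x, y)

# Negating y flips the relationship, so both values become negative.
cov_neg, r_neg = cov_and_corr(x, -y)

# The covariance equals the correlation times both sample standard deviations,
# so the two quantities always share the same sign.
s_x, s_y = x.std(ddof=1), y.std(ddof=1)
print(np.isclose(cov_pos, r_pos * s_x * s_y))
```

Because the standard deviations are never negative, the sign of the covariance is determined entirely by the sign of the correlation.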

In [5]:
import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd

x = list(range(-10, 11))
y = [0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14]
x_, y_ = np.array(x), np.array(y)
x__, y__ = pd.Series(x_), pd.Series(y_)

# Covariance with pure Python (sample covariance, n - 1 denominator)
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
cov_xy = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / (n - 1))
cov_xy

# Covariance with NumPy
cov_matrix = np.cov(x_, y_)
cov_matrix
# Output layout:
# [[var(x),    cov(x, y)],
#  [cov(y, x), var(y)   ]]

# Note that np.cov() has the optional parameters:
#    bias, which defaults to False,
#    ddof, which defaults to None.
# Their default values are suitable for getting the sample covariance matrix.


### pandas Series objects have the method .cov(), which you can use to calculate the sample covariance:

cov_xy = x__.cov(y__)
cov_xy

cov_xy = y__.cov(x__)
cov_xy

array([[38.5       , 19.95      ],
       [19.95      , 13.91428571]])

In [6]:
x_.var(ddof=1)


38.5
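
As a rough check of the `bias` and `ddof` parameters mentioned in the cell above, the following sketch compares the default sample covariance matrix with the population version. It reuses the same `x` and `y` data. The variable names `sample`, `population`, and `population_bias` are assumptions made for this example.

```python
import numpy as np

x = np.arange(-10, 11)
y = np.array([0, 2, 2, 2, 2, 3, 3, 6, 7, 4, 7, 6, 6, 9, 4, 5, 5, 10, 11, 12, 14])
n = len(x)

# Default settings: sample covariance matrix (divides by n - 1).
sample = np.cov(x, y)

# ddof=0 or bias=True: population covariance matrix (divides by n).
population = np.cov(x, y, ddof=0)
population_bias = np.cov(x, y, bias=True)

# The diagonal holds the variances, the off-diagonal entries the covariance.
print(sample[0, 0], x.var(ddof=1))      # both are the sample variance of x
print(sample[0, 1], population[0, 1])   # about 19.95 and about 19.0

# The two versions differ only by the factor (n - 1) / n.
print(np.allclose(population, sample * (n - 1) / n))
```

For this data the off-diagonal entry drops from about 19.95 to about 19.0 when the denominator changes from n - 1 to n.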