# 7 Statistics with NumPy
## 7_5 Covariance and Correlation in NumPy

#### numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, *, dtype=None)
- Estimate a covariance matrix, given data and weights.
- Covariance indicates the level to which two variables vary together.
- https://www.mathsisfun.com/data/covariance.html
- Wiki: Covariance in probability theory and statistics is a measure of the joint variability of two random variables. The sign of the covariance shows the tendency in the linear relationship between the variables: positive when they tend to increase together, negative when one tends to decrease as the other increases.
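
As a minimal sketch of the definition above, the sample covariance of two variables can be computed by hand and compared with `np.cov`. The arrays `x` and `y` are illustrative data made up for this example; they do not come from the notebook.

```python
import numpy as np

# Illustrative data (hypothetical, chosen only for this example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Sample covariance: mean-center both variables, multiply pairwise,
# and divide by N - 1 (the default normalization used by np.cov)
n = x.size
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov(x, y) returns a 2x2 matrix; the off-diagonal entry is Cov(x, y)
cov_numpy = np.cov(x, y)[0, 1]

# bias=True divides by N instead of N - 1
cov_biased = np.cov(x, y, bias=True)[0, 1]

print(cov_manual, cov_numpy, cov_biased)
```

The covariance is positive here because `y` tends to grow as `x` grows.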

#### numpy.corrcoef(x, y=None, rowvar=True, bias=<no value>, ddof=<no value>, *, dtype=None)
- Return Pearson product-moment correlation coefficients.
- The values of R are between -1 and 1, inclusive.
- https://www.mathsisfun.com/data/correlation.html
- Wiki: In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are linearly related.
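
The point that Pearson correlation measures only the *linear* part of a relationship can be illustrated with a small sketch. The data below is synthetic and chosen for this example: a perfect line gives a coefficient of 1 or -1, while a parabola over a symmetric range gives a coefficient close to 0 even though `y` depends completely on `x`.

```python
import numpy as np

# Synthetic inputs, symmetric around 0 (an assumption of this example)
x = np.linspace(-1.0, 1.0, 21)

r_up = np.corrcoef(x, 2 * x + 1)[0, 1]      # perfectly linear, increasing
r_down = np.corrcoef(x, -3 * x)[0, 1]       # perfectly linear, decreasing
r_parabola = np.corrcoef(x, x ** 2)[0, 1]   # fully dependent, but not linear

print(r_up, r_down, r_parabola)
```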

#### numpy.

In [11]:
import numpy as np
np.__version__

'2.1.1'

In [12]:
# Functions

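def show_attr(arrnm: str) -> str:
    """Return a one-line summary of the array bound to the global name `arrnm`."""
    strout = f' {arrnm}: '

    # eval() looks the name up in the notebook's globals, so the array
    # must be passed by name (a string), not as an object.
    for attr in ('shape', 'ndim', 'size', 'dtype'):     #, 'itemsize'):
        arrnm_attr = arrnm + '.' + attr
        strout += f'| {attr}: {eval(arrnm_attr)} '

    return strout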

In [13]:
A = np.array([[1,0,0,3,1],
              [3,6,6,2,9],
              [4,5,3,8,0]])

print(show_attr('A'))
A

 A: | shape: (3, 5) | ndim: 2 | size: 15 | dtype: int64 


array([[1, 0, 0, 3, 1],
       [3, 6, 6, 2, 9],
       [4, 5, 3, 8, 0]])

In [14]:
# np.cov() - The output is a 3x3 matrix that is symmetric about the
# main diagonal (1.5, 7.7, 8.5). The value in the 2nd column of the
# 1st row (-2) equals the value in the 1st column of the 2nd row.
# jm: Each element is the covariance of one row with itself or with
# one of the other two rows. First row: 1.5 is the cov of row_1 and
# row_1, -2 is the cov of row_1 and row_2, and 2 is the cov of
# row_1 and row_3.

np.cov(A)
# All these values represent the covariance between different rows of A.
# The covariance of a variable with itself is just its variance:
# Cov(X,X) = Var(X), so the elements on the main diagonal are the
# (sample, ddof=1) variances of the corresponding rows.
# Cov(A,B) = Cov(B,A) - (jm: symmetric)

array([[ 1.5, -2. ,  2. ],
       [-2. ,  7.7, -7. ],
       [ 2. , -7. ,  8.5]])
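
The properties noted in the comments above (variances on the diagonal, symmetry) can be checked directly. This sketch reuses the matrix `A` from the notebook and also shows `rowvar=False`, which treats columns rather than rows as the variables.

```python
import numpy as np

A = np.array([[1, 0, 0, 3, 1],
              [3, 6, 6, 2, 9],
              [4, 5, 3, 8, 0]])

C = np.cov(A)  # rowvar=True by default: each row is one variable

# Diagonal entries equal the sample variance of each row
# (ddof=1 matches the N - 1 normalization that np.cov uses by default)
diag_is_variance = np.allclose(np.diag(C), np.var(A, axis=1, ddof=1))

# Cov(X, Y) == Cov(Y, X), so the matrix equals its transpose
is_symmetric = np.allclose(C, C.T)

# With rowvar=False the 5 columns become the variables: a 5x5 result
C_cols = np.cov(A, rowvar=False)

print(diag_is_variance, is_symmetric, C_cols.shape)
```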

In [15]:
# np.corrcoef() - Pearson correlation coefficients between the rows
# of the matrix: Corr(X,Y) = Cov(X,Y) / (Std(X) * Std(Y))

np.corrcoef(A)
# Measures the linear relationship between every pair of rows of the array.
# Corr(X,X) = 1. The correlation of a variable with itself is always 1.

array([[ 1.        , -0.58848989,  0.56011203],
       [-0.58848989,  1.        , -0.8652532 ],
       [ 0.56011203, -0.8652532 ,  1.        ]])
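
The formula Corr(X,Y) = Cov(X,Y) / (Std(X) * Std(Y)) can be applied to the whole covariance matrix at once. The following is a sketch of that calculation using the same `A`, not a description of how NumPy implements `corrcoef` internally.

```python
import numpy as np

A = np.array([[1, 0, 0, 3, 1],
              [3, 6, 6, 2, 9],
              [4, 5, 3, 8, 0]])

C = np.cov(A)

# Standard deviation of each row = square root of the diagonal of C
std = np.sqrt(np.diag(C))

# Divide every Cov(i, j) by Std(i) * Std(j); np.outer builds that grid
R_manual = C / np.outer(std, std)

print(np.allclose(R_manual, np.corrcoef(A)))
```

The normalization cancels the choice of `ddof`, which is why the `bias` and `ddof` arguments of `np.corrcoef` are deprecated and have no effect.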

In [23]:
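# np.correlate() - Cross-correlation (sliding dot product) of two 1-D
# sequences. Unlike np.corrcoef(), the result is not normalized.
# mode: 'valid' (default), 'same', or 'full'
display(np.correlate(A[0,:], A[1,:]))
display(np.correlate(A[0,:], A[1,:], 'same'))
display(np.correlate(A[0,:], A[1,:], 'full'))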


array([18])

array([ 6, 33, 18, 20, 24])

array([ 9,  2,  6, 33, 18, 20, 24, 15,  3])
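
To see where these numbers come from, the `'full'` result can be rebuilt with an explicit loop over lags. This is an illustrative sketch (the helper name `full_manual` is made up for this example); it slides the second row across the first and sums the overlapping products at each lag.

```python
import numpy as np

a = np.array([1, 0, 0, 3, 1])   # A[0, :]
v = np.array([3, 6, 6, 2, 9])   # A[1, :]

# For each lag k, sum a[n + k] * v[n] over the indices where both exist.
# Lags run from -(len(v) - 1) to len(a) - 1, giving len(a) + len(v) - 1 values.
full_manual = []
for k in range(-(len(v) - 1), len(a)):
    total = 0
    for n in range(len(v)):
        if 0 <= n + k < len(a):
            total += a[n + k] * v[n]
    full_manual.append(total)
full_manual = np.array(full_manual)

print(full_manual)
print(np.array_equal(full_manual, np.correlate(a, v, 'full')))
```

For these two equal-length inputs, `'same'` is the middle five entries of the `'full'` result and `'valid'` is only the lag-0 entry, which is the plain dot product of the two rows.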

In [28]:
# Create a new random integer matrix (unseeded, so values differ on each run)
from numpy.random import Generator as gen
from numpy.random import PCG64 as pcg

array_RG = gen(pcg())
B = array_RG.integers(20, size=(3,5))
display(B)

display(np.cov(B))
display(np.corrcoef(B))

array([[ 8,  8, 17, 19, 14],
       [17, 13,  7,  8, 18],
       [ 1, 13,  5, 14,  8]])

array([[ 25.7 , -17.15,   8.45],
       [-17.15,  25.3 ,  -9.9 ],
       [  8.45,  -9.9 ,  29.7 ]])

array([[ 1.        , -0.67256971,  0.30585242],
       [-0.67256971,  1.        , -0.36115756],
       [ 0.30585242, -0.36115756,  1.        ]])
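
Because the generator above is created without a seed, rerunning the cell produces a different `B` each time. A possible variation, if reproducible output is wanted, is to pass a seed to `np.random.default_rng`. The seed value `42` is an arbitrary choice for this sketch, and the arrays it produces will not match the output shown above.

```python
import numpy as np

# The seed value is arbitrary; any fixed integer gives repeatable results
rng = np.random.default_rng(seed=42)
B = rng.integers(20, size=(3, 5))   # integers in [0, 20)

# A second generator with the same seed reproduces the same matrix
B_again = np.random.default_rng(seed=42).integers(20, size=(3, 5))

print(B)
print(np.array_equal(B, B_again))
print(np.cov(B))
print(np.corrcoef(B))
```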

#### Correlating

- corrcoef(x[, y, rowvar, bias, ddof, dtype]): Return Pearson product-moment correlation coefficients.
- correlate(a, v[, mode]): Cross-correlation of two 1-dimensional sequences.
- cov(m[, y, rowvar, bias, ddof, fweights, ...]): Estimate a covariance matrix, given data and weights.