<a href="https://colab.research.google.com/github/sundarjhu/Astrostatistics2021/blob/main/Astrostatistics_Lecture02.5_20210304.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
seed = 10001 #set the random seed for subsequent calls to random-number generators

##Demonstrating the properties of the mean and variance
###The expectation value of (a random variable plus a constant) is the population mean of the random variable plus the constant.
>Why? Because (a) the expectation value is a linear operator, and (b) the expectation value of a constant is that constant.
###The variance of (a random variable plus a constant) is the variance of the random variable.
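>Why? Because the variance measures the spread about the mean, and adding a constant shifts the variable and its mean by the same amount, so the deviations $X - E[X]$ are unchanged. (Note that the variance is not a linear operator: $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.)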
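
Both statements can be written out in two lines. The derivation below is a sketch that introduces $\mu = E[X]$ and a constant $c$ as notation (neither symbol appears elsewhere in this notebook):

$$
E[X + c] = E[X] + E[c] = \mu + c
$$

$$
\mathrm{Var}(X + c) = E\left[\left((X + c) - (\mu + c)\right)^2\right] = E\left[(X - \mu)^2\right] = \mathrm{Var}(X)
$$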

In [None]:
import numpy as np
np.random.seed(seed)             #for reproducibility
x = np.random.uniform(size = 20) #sample of 20 uniform random numbers between 0 and 1 (Expectation value = 0.5)
print("The sample mean is {}".format(np.round(x.mean(), decimals = 3)))
print("The sample variance is {}".format(np.round(x.var(), decimals = 3)))
print("The sample mean of x + 3 is {}".format(np.round((x + 3).mean(), decimals = 3))) #should be 3 + x.mean()
print("The sample variance of x + 3 is {}".format(np.round((x + 3).var(), decimals = 3))) # should be x.var()

The sample mean is 0.497
The sample variance is 0.068
The sample mean of x + 3 is 3.497
The sample variance of x + 3 is 0.068


##Evaluating the covariance matrix for three correlated random variables

####$X$ is a random variable, and $Y$ and $Z$ are functions of $X$:
>$Y = X^2; Z = 1 / X$.
####Since $X$ is drawn from the interval (0, 1), $Y$ increases with $X$ and $Z$ decreases with it, so $Y$ is positively correlated with $X$ and $Z$ is negatively correlated with it.


In [None]:
import numpy as np
np.random.seed(seed)             #for reproducibility
x = np.random.uniform(size = 20) #sample of 20 uniform random numbers between 0 and 1
y = x**2; z = 1/x                #positively- and anti-correlated with x
X = np.array([x, y, z])          #20 samples of a 3-element random vector (rows = variables)
print("The covariance matrix for the three variables is:")
print(np.round(np.cov(X), decimals = 2))

The covariance matrix for the three variables is:
[[ 0.07  0.07 -0.72]
 [ 0.07  0.07 -0.6 ]
 [-0.72 -0.6  10.91]]
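
To make explicit what `np.cov` is estimating, the matrix can also be built by hand. The following is a minimal sketch, assuming the same seed value (10001) and sample size as the cell above; the names `centered` and `cov_manual` are introduced here for illustration only. It subtracts each variable's sample mean and divides the sum of products of deviations by $N - 1$, which is the default normalization of `np.cov`.

```python
import numpy as np

np.random.seed(10001)                  # same seed value as the notebook (assumed)
x = np.random.uniform(size=20)
X = np.array([x, x**2, 1 / x])         # rows are variables, columns are samples

# Subtract each variable's sample mean (one mean per row).
centered = X - X.mean(axis=1, keepdims=True)

# Sum of products of deviations, divided by N - 1 (the np.cov default).
n_samples = X.shape[1]
cov_manual = centered @ centered.T / (n_samples - 1)

print(np.round(cov_manual, decimals=2))
print("Matches np.cov:", np.allclose(cov_manual, np.cov(X)))
```

One consequence of the $N - 1$ normalization: the diagonal entries equal `x.var(ddof=1)`, not the default `x.var()`, which divides by $N$. For 20 samples the two differ by a factor of 20/19.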


##Evaluating the covariance matrix for two independent random variables
###$x$ and $w$ are built from independent calls to `np.random.uniform`.
###Expectation: the population covariance matrix is diagonal, so the off-diagonal elements of the sample covariance matrix should approach zero as the sample size grows.

In [None]:
import numpy as np
samples = [10, 1000, 10000, 1000000] #sample sizes to try
for N in samples:
  np.random.seed(seed)             #for reproducibility
  x = np.random.uniform(size = N)  #sample of N uniform random numbers between 0 and 1
  np.random.seed(seed + 1)         #for reproducibility and independence
  w = np.random.uniform(size = N)  #no correlation expected with x
  print("*****Covariance matrix from {} samples".format(N))
  print(np.cov(np.array([x, w])))
  print("-----")

*****Covariance matrix from 10 samples
[[0.08343402 0.00593563]
 [0.00593563 0.08128469]]
-----
*****Covariance matrix from 1000 samples
[[ 0.08345954 -0.00303727]
 [-0.00303727  0.0817306 ]]
-----
*****Covariance matrix from 10000 samples
[[ 8.28494090e-02 -9.95351161e-05]
 [-9.95351161e-05  8.45424787e-02]]
-----
*****Covariance matrix from 1000000 samples
[[8.33598815e-02 1.33373657e-05]
 [1.33373657e-05 8.33893600e-02]]
-----
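
How close to zero should the off-diagonal element be? For two independent variables, the sample covariance scatters around zero with a standard deviation of roughly $\sigma_x \sigma_w / \sqrt{N}$, and each Uniform(0, 1) variable has $\sigma^2 = 1/12$. The sketch below compares the off-diagonal element with that scale, assuming the same seed values as the cell above; `results` and `expected_scale` are names introduced here for illustration.

```python
import numpy as np

seed = 10001                                   # same seed value as the notebook (assumed)
results = []
for N in [10, 1000, 10000, 1000000]:
    np.random.seed(seed)
    x = np.random.uniform(size=N)
    np.random.seed(seed + 1)
    w = np.random.uniform(size=N)

    off_diag = np.cov(np.array([x, w]))[0, 1]  # sample covariance of x and w
    expected_scale = (1 / 12) / np.sqrt(N)     # sigma_x * sigma_w / sqrt(N) for Uniform(0, 1)
    results.append((N, off_diag, expected_scale))
    print("N = {:>7d}: off-diagonal = {: .2e}, expected scale ~ {:.2e}".format(
        N, off_diag, expected_scale))
```

The off-diagonal element is a single random draw for each `N`, so it need not shrink monotonically, but its magnitude should stay within a few times the expected scale.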


##Evaluating the Pearson Correlation Coefficient
###We define $Y = X^2$ and $Z = 1/X$ as before, and add $W$, a uniform sample drawn independently of $X$.
###Let's evaluate Pearson's Correlation Coefficient between $X$ and each of the variables.
>We do this in two ways: from the definition in terms of the covariance, and then using the `np.corrcoef` function.

In [None]:
import numpy as np
np.random.seed(seed)
N = 1000000
x = np.random.uniform(size = N); y = x**2; z = 1/x
np.random.seed(seed + 1)
w = np.random.uniform(size = N)
Sigma = np.cov(np.array([x, y, z, w]))
X = np.array([x, y, z, w])
vnames = ['X', 'Y', 'Z', 'W']
#The two estimates below should agree for each pair of variables
i = 0 #compare other variables to X
for j in [0, 1, 2, 3]:
  cc1 = Sigma[i, j] / np.sqrt(Sigma[i,i] * Sigma[j,j])
  print("Correlation coefficient ({}, {}) from definition of covariance: {}".format(vnames[i], vnames[j], np.round(cc1, decimals = 3)))
  cc2 = np.corrcoef(X[i], y = X[j])[0, 1]
  print("Correlation coefficient ({}, {}) from np.corrcoef: {}".format(vnames[i], vnames[j], np.round(cc2, decimals = 3)))

Correlation coefficient (X, X) from definition of covariance: 1.0
Correlation coefficient (X, X) from np.corrcoef: 1.0
Correlation coefficient (X, Y) from definition of covariance: 0.968
Correlation coefficient (X, Y) from np.corrcoef: 0.968
Correlation coefficient (X, Z) from definition of covariance: -0.021
Correlation coefficient (X, Z) from np.corrcoef: -0.021
Correlation coefficient (X, W) from definition of covariance: 0.0
Correlation coefficient (X, W) from np.corrcoef: 0.0
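
The same comparison can be made for every pair at once. The sketch below, which assumes the same seed values and sample size as the cell above, divides each element of the covariance matrix by the product of the corresponding standard deviations and checks the result against `np.corrcoef` applied to the full array. The names `sd` and `R` are introduced here for illustration.

```python
import numpy as np

np.random.seed(10001)                  # same seed values as the notebook (assumed)
N = 1000000
x = np.random.uniform(size=N)
np.random.seed(10002)
w = np.random.uniform(size=N)
X = np.array([x, x**2, 1 / x, w])      # rows: X, Y, Z, W

Sigma = np.cov(X)
sd = np.sqrt(np.diag(Sigma))           # standard deviation of each variable
R = Sigma / np.outer(sd, sd)           # R[i, j] = Sigma[i, j] / (sd[i] * sd[j])

print(np.round(R, decimals=3))
print("Matches np.corrcoef:", np.allclose(R, np.corrcoef(X)))
```

The (X, Z) entry is small even though $Z$ is a deterministic function of $X$. Pearson's coefficient only measures linear association, and $1/X$ is dominated by a few very large values where $X$ is close to 0.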
