Take the following dataset...

$X = [(x_0^{(1)}, x_1^{(1)}), (x_0^{(2)}, x_1^{(2)}), \dots , (x_0^{(n)}, x_1^{(n)})]$ 

$\textbf{x}^{(i)} = (x_0^{(i)}, x_1^{(i)})$

In [77]:
import numpy as np
size = 50
x = np.array((np.random.randint(1,10,size), np.random.randint(1,10,size))).T
print(x)

[[9 2]
 [3 9]
 [4 1]
 [2 8]
 [9 1]
 [1 8]
 [1 5]
 [9 7]
 [3 6]
 [3 1]
 [6 5]
 [2 6]
 [7 7]
 [9 1]
 [4 2]
 [2 5]
 [3 3]
 [9 6]
 [7 4]
 [8 6]
 [3 8]
 [2 6]
 [2 9]
 [7 2]
 [8 6]
 [7 9]
 [3 2]
 [7 8]
 [4 7]
 [8 7]
 [8 2]
 [1 3]
 [4 6]
 [7 6]
 [7 9]
 [9 2]
 [7 5]
 [2 1]
 [9 1]
 [1 2]
 [6 6]
 [1 7]
 [2 6]
 [5 8]
 [5 7]
 [6 5]
 [6 2]
 [7 2]
 [3 3]
 [7 4]]


#### Unweighted Mean

Often, when referring to the mean of a multidimensional dataset, we mean the mean calculated over each dimension separately. If the mean is intended to be taken over the entire dataset, this is usually stated explicitly.

$\mu = \frac{1}{n} \sum\limits_{i=1}^n \textbf{x}^{(i)} = (\bar{x}_0, \bar{x}_1)$

In [78]:
mu = np.average(x, 0) # axis 0: average over the samples, giving one mean per dimension
mu

array([ 5.1 ,  4.88])

To verify...

In [79]:
x0_bar = 0
x1_bar = 0
for xi in x:
    x0_bar += xi[0]
    x1_bar += xi[1]
x0_bar /= float(size)
x1_bar /= float(size)

np.array((x0_bar, x1_bar))

array([ 5.1 ,  4.88])
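
The distinction between the per-dimension mean and the mean over the entire dataset can be sketched as follows. This is only an illustration: the fixed seed and the names `per_dimension` and `overall` are assumptions made here so the example is reproducible, and the data is not the same as the array printed above.

```python
import numpy as np

# Assumption: a fixed seed so the example is reproducible (not the data above).
rng = np.random.RandomState(0)
x = rng.randint(1, 10, size=(50, 2))

# Mean over each dimension: one value per column.
per_dimension = np.mean(x, axis=0)

# Mean over the entire dataset: a single scalar over all 100 values.
overall = np.mean(x)

print(per_dimension.shape)  # one entry per dimension
print(np.isclose(overall, per_dimension.mean()))  # columns have equal length, so these agree
```

Because both columns contain the same number of samples, the mean of the two per-dimension means equals the overall mean; this would not hold in general for groups of unequal size.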

#### Standard Deviation

This is often understood as the tendency of the data to "deviate" from the mean. More precisely, it is the root-mean-square distance between the elements of the dataset and the mean, computed here for each dimension separately.

$\sigma = \sqrt{ \frac{1}{n} \sum\limits_{i=1}^n ( \textbf{x}^{(i)} - \mu )^2 }$

In [80]:
sigma = np.std(x, 0) # axis 0: standard deviation over the samples, one value per dimension
sigma

array([ 2.70739727,  2.58178233])

This time we will verify by accumulating the squared deviations sample by sample, using vectorized operations on each row...

In [81]:
sigma2 = np.array([0.0,0.0])
for xi in x:
    sigma2 += np.power(xi - mu, 2)
np.sqrt(sigma2/size)

array([ 2.70739727,  2.58178233])
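
The loop above can also be written without any explicit iteration. The following is a minimal sketch, assuming a freshly generated array with a fixed seed (so the numbers will not match the output above); `sigma_manual` is a name introduced here for illustration.

```python
import numpy as np

# Assumption: a fixed seed so the example is reproducible (not the data above).
rng = np.random.RandomState(0)
x = rng.randint(1, 10, size=(50, 2))

# Broadcasting subtracts the per-dimension mean from every row at once.
mu = x.mean(axis=0)
squared_deviations = (x - mu) ** 2

# Average the squared deviations over the samples, then take the square root.
sigma_manual = np.sqrt(squared_deviations.mean(axis=0))

print(np.allclose(sigma_manual, np.std(x, axis=0)))
```

This relies on NumPy broadcasting `mu` (shape `(2,)`) against `x` (shape `(50, 2)`), which replaces the `for` loop with a single array expression.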

#### Variance

Variance is the square of the standard deviation. Therefore we often simply refer to this quantity as $\sigma ^2$. Why not just use standard deviation? This answer has more theoretical origins than I am comfortable explaining right now. I need to look more into this.

$\sigma ^2 = \frac{1}{n} \sum\limits_{i=1}^n ( \textbf{x}^{(i)} - \mu )^2$

In [82]:
np.var(x, 0)

array([ 7.33  ,  6.6656])

To verify, we simply square the standard deviation found in the last section...

In [83]:
np.power(sigma, 2)

array([ 7.33  ,  6.6656])

#### Covariance

This is how we begin to understand the relationship between two variables. If the standard deviation and variance are measures of how the elements of a dataset vary from the mean, then covariance is a measure of how two variables vary together.

$\text{COV}(\vec{x}_0, \vec{x}_1) = \frac{1}{n} \sum\limits_{i=1}^n ( x_0^{(i)} - \mu_0 )( x_1^{(i)} - \mu_1)$

In [84]:
np.cov(x.T) # Note: np.cov divides by n-1 by default, not by n as in the formula above

array([[ 7.47959184, -1.11020408],
       [-1.11020408,  6.80163265]])

In [85]:
np.dot((x-mu).T, x-mu)/size

array([[ 7.33  , -1.088 ],
       [-1.088 ,  6.6656]])
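
The two results above differ by a factor of $\frac{n}{n-1} = \frac{50}{49}$ (for example, $7.33 \times \frac{50}{49} \approx 7.4796$). This is because `np.cov` normalizes by $n - 1$ by default, while the manual calculation divides by $n$. A minimal sketch of how the two can be reconciled, again assuming freshly generated data with a fixed seed rather than the array printed above:

```python
import numpy as np

# Assumption: a fixed seed so the example is reproducible (not the data above).
rng = np.random.RandomState(0)
x = rng.randint(1, 10, size=(50, 2))
n = x.shape[0]

# Center the data, then form the matrix of summed products of deviations.
centered = x - x.mean(axis=0)
scatter = np.dot(centered.T, centered)

cov_population = scatter / n        # divides by n, as in the formula above
cov_sample = scatter / (n - 1.0)    # divides by n-1, the np.cov default

print(np.allclose(cov_sample, np.cov(x.T)))                  # default normalization
print(np.allclose(cov_population, np.cov(x.T, bias=True)))   # bias=True divides by n
```

The diagonal of `cov_population` is the per-dimension variance from the previous section, so `np.var(x, 0)` should agree with it.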

In [86]:
import matplotlib.pyplot as plt
plt.scatter(x[:, 0], x[:, 1]) # x_0 on the horizontal axis, x_1 on the vertical axis
plt.show()