### Requirements

In [1]:
import math
import numpy as np

### Central tendency indexes

#### Mean
Given a series of values $x_1, x_2, \dots, x_n$, the mean is defined as
\begin{equation}
\mu = \frac{1}{n}\sum_{i=1}^n x_i = \frac{x_1 + x_2 + \dots + x_n}{n}
\end{equation}
To compute the mean of a series in NumPy, you can use the array method .mean()

In [2]:
values = np.random.rand(100)
values.mean()

0.5096409773999343

The mean is the point with the smallest total squared distance from all the points of the series. That is, it is the value $x$ that minimizes the function:
\begin{equation}
f(x) = \sum_{i=1}^n(x - x_i)^2
\end{equation}
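
The minimization property above can be checked numerically. Setting the derivative $f'(x) = 2\sum_{i=1}^n(x - x_i)$ to zero gives $x = \mu$; the sketch below confirms this by brute force on a grid. The seed, the grid resolution, and the helper name sum_squared_distances are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) so the run is reproducible
sample = rng.random(100)        # 100 values in [0, 1), like the series above


def sum_squared_distances(x, data):
    """Evaluate f(x) = sum_i (x - x_i)^2 for a candidate point x."""
    return np.sum((x - data) ** 2)


# Evaluate f on a fine grid of candidate points and keep the minimizer
candidates = np.linspace(0.0, 1.0, 10001)
costs = np.array([sum_squared_distances(c, sample) for c in candidates])
best = candidates[np.argmin(costs)]

print(f"grid minimizer: {best:.4f}")
print(f"sample mean:    {sample.mean():.4f}")
```

The two printed values should agree up to the grid spacing (1e-4 here).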

#### Median
It is denoted as $\hat{x}$ and is the value that is preceded by as many elements of the series as follow it. If you sort the series in ascending order $x_{(1)}, x_{(2)}, \dots, x_{(n)}$, the median is the element in position $(n+1)/2$ if $n$ is odd, or the arithmetic mean of the elements in positions $n/2$ and $n/2 + 1$ if $n$ is even.

In [3]:
# Manually implemented
values.sort()
n = len(values)
if n % 2 == 0:
    print(np.mean([values[n//2-1], values[n//2]]))
else:
    print(values[(n-1)//2])

# Numpy function
np.median(values)

0.5063803844527567


0.5063803844527567

### Variability indexes

#### Variance
It indicates how homogeneous or inhomogeneous the data are. In other words, it tells how much the values are spread around the mean.
\begin{equation}
\sigma^2 = \frac{1}{n}\sum_{i=1}^n(x_i - \mu)^2
\end{equation}

In [4]:
values.var()

0.09381087913770067
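
As a cross-check, the variance formula can be applied term by term and compared with .var(). This is a minimal sketch on a freshly generated series; the seed and variable names are assumptions. Note that .var() divides by $n$ by default (ddof=0), which matches the formula above, while ddof=1 divides by $n-1$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility
sample = rng.random(100)
n = len(sample)

# Direct application of the formula: mean of squared deviations from the mean
mu = sample.mean()
manual_var = np.sum((sample - mu) ** 2) / n

print(np.isclose(manual_var, sample.var()))                       # default ddof=0 divides by n
print(np.isclose(sample.var(ddof=1), manual_var * n / (n - 1)))   # ddof=1 divides by n-1
```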

Since the variance is expressed in the square of the data's unit of measurement, it is common practice to use another variability index, the standard deviation, which is the square root of the variance.
\begin{equation}
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^n(x_i - \mu)^2}
\end{equation}

In [5]:
print(f"Numpy std: {values.std()}")
print(f"square root of variance: {math.sqrt(values.var())}")

Numpy std: 0.30628561692920003
square root of variance: 0.30628561692920003


### Coefficient of Variation

In probability theory and statistics, the coefficient of variation (CV), also known as relative standard deviation (RSD), is a standardized measure of dispersion of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as the ratio of the standard deviation $\sigma$ to the mean $\mu$ (or its absolute value, $|\mu|$).
\begin{equation}
c.v. = \frac{\sigma}{\mu}
\end{equation}

The coefficient of variation makes it possible to assess the dispersion of values around the mean regardless of the unit of measurement. For example, the standard deviation of a sample of incomes expressed in liras differs from the standard deviation of the same incomes expressed in euros by the conversion factor, while the coefficient of variation is the same in both cases, because the factor cancels in the ratio.

In [14]:
sigma = values.std()
mu = values.mean()
print(f"Coefficient of variation: {sigma/mu}")

Coefficient of variation: 0.6009831047962343
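
The unit-independence described above can be illustrated with a small sketch. The incomes are randomly generated placeholders, and the conversion factor and the helper coefficient_of_variation are assumptions chosen only for illustration; any positive factor would behave the same way.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility

# Hypothetical incomes in euros; the values carry no real-world meaning
incomes_eur = rng.uniform(15_000, 60_000, size=50)

rate = 1936.27  # illustrative euro-to-lira conversion factor (assumption)
incomes_lira = incomes_eur * rate


def coefficient_of_variation(data):
    """Ratio of the standard deviation to the mean."""
    return data.std() / data.mean()


cv_eur = coefficient_of_variation(incomes_eur)
cv_lira = coefficient_of_variation(incomes_lira)

print(f"std in euros: {incomes_eur.std():.2f}")
print(f"std in liras: {incomes_lira.std():.2f}")
print(f"CV in euros:  {cv_eur:.6f}")
print(f"CV in liras:  {cv_lira:.6f}")
```

The standard deviations differ by the factor rate, while the two coefficients of variation coincide.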


### Covariance

In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values (that is, the variables tend to show similar behavior), the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other (that is, the variables tend to show opposite behavior), the covariance is negative.

The sign of the covariance therefore shows the tendency in the linear relationship between the variables.

Given two series of values $X$ and $Y$, the covariance is:

\begin{equation}
c_{xy} = \frac{1}{n}\sum_{i=1}^n(x_i - \bar{x})(y_i - \bar{y})
\end{equation}

where $\bar{x}$ and $\bar{y}$ are the means of the respective series.

np.cov(v1, v2) returns the covariance matrix. Selecting the element [0][1] gives the covariance computed by the formula above.

Setting the bias parameter to True is necessary because it specifies how the covariance is normalized.
If bias=True the normalization is by "N", otherwise it is by "N-1". This is because, by default, the cov() function computes the sample covariance and not the "population" covariance.

In [17]:
v1 = np.random.rand(10)
v2 = np.random.rand(10)

print(np.cov(v1, v2, bias=True))
print(np.cov(v1, v2, bias=True)[0][1])

[[ 0.08171151 -0.03611054]
 [-0.03611054  0.09860654]]
-0.036110536874274755
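
To make the effect of the bias parameter concrete, the following sketch computes the covariance directly from the formula and compares it with both normalizations of np.cov. The seed and the variable names a, b, manual_cov are assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility
a = rng.random(10)
b = rng.random(10)
n = len(a)

# Direct application of the formula, normalized by n
manual_cov = np.sum((a - a.mean()) * (b - b.mean())) / n

population_cov = np.cov(a, b, bias=True)[0, 1]  # normalized by n
sample_cov = np.cov(a, b)[0, 1]                 # default: normalized by n - 1

print(np.isclose(manual_cov, population_cov))
print(np.isclose(sample_cov, manual_cov * n / (n - 1)))

# The diagonal of the matrix holds the variances of the two series
print(np.isclose(np.cov(a, b, bias=True)[0, 0], a.var()))
```

The two normalizations differ only by the factor $n/(n-1)$, so the gap shrinks as the series gets longer.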
