# Statistics

## Mean

### Simple Intuition
In simple terms, the mean is the average of a set of observed values. We can use it to describe a set of observations with a single number.

In the context of Mathematics, we are in the business of finding relationships that we observe in the real world and turning initially qualitative observations into something we can describe universally (numbers). We define the mean as a measure of central tendency. Given a process that produces values which follow a particular distribution, we can use the mean as an expectation of the values that are produced.

### Mathematical Intuition and Notation
$$ Mean(\mu) = \frac{\sum^N_{i=1}X_i}{N}$$
The above means: "The mean is the sum of all observations $X_i$ divided by $N$ (the number of observations)."



In [4]:
observations = [1,1,1,2,2,4,5,7,10,11,11,14,14,14]
# In the formulation below we calculate the mean in the above set of observations
N = len(observations)
mean = sum(observations)/N
print("The mean is:")
print(mean)

The mean is:
6.928571428571429


In the context of probability, we intend to use the mean for prediction. We can also express the mean in a different form that gives the same result.

$$Mean(\mu) = E[X] = \sum^K_{i=1} P(X = x_i) * x_i$$

Here the sum runs over the $K$ distinct values $x_i$ that were observed. The above means: "The mean is the sum of the products of each x with its respective probability." We also say that the mean is the expectation of the observations. That is, if we draw a random sample, the mean is the value we expect it to take on average.

In [5]:
# Below we calculate the mean with the probability method from above
counts = [observations.count(i) for i in set(observations)]
counts
# Below we have the number of times each distinct element appears in the observations list.
# We observe 1 three times, 2 two times, and so on.

[3, 2, 1, 1, 1, 1, 2, 3]

In [6]:
# Next we get the probability of each observation by dividing each count by the total number of observations
probabilities = [c/N for c in counts]
probabilities

[0.21428571428571427,
 0.14285714285714285,
 0.07142857142857142,
 0.07142857142857142,
 0.07142857142857142,
 0.07142857142857142,
 0.14285714285714285,
 0.21428571428571427]

In [7]:
# Then we get the mean by summing the products of the probabilities with their respective observed values, as explained above.
# We observe that both methods give the same mean (up to floating-point rounding)
mean = sum([p * x for p, x in zip(probabilities, set(observations))])
print('The mean is: ')
print(mean)

The mean is: 
6.928571428571428
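
The cells above pair `probabilities` with `set(observations)` and rely on the set being iterated in the same order both times. A minimal sketch that keeps each value next to its count, assuming `collections.Counter` is acceptable here (the helper name `expected_value` is hypothetical and not part of the original notebook):

```python
from collections import Counter

observations = [1, 1, 1, 2, 2, 4, 5, 7, 10, 11, 11, 14, 14, 14]

def expected_value(values):
    """Probability-weighted mean: sum of P(X = x) * x over the distinct values."""
    n = len(values)
    counts = Counter(values)  # maps each distinct value to how often it appears
    return sum((count / n) * value for value, count in counts.items())

weighted_mean = expected_value(observations)
simple_mean = sum(observations) / len(observations)
print(weighted_mean)
print(simple_mean)
```

Both printed values should agree up to floating-point rounding, since the two formulas describe the same quantity.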


In [8]:
def get_mean(observations):
    N = len(observations)
    return sum(observations)/N
get_mean(observations)

6.928571428571429

## Variance

Now we move on to slightly more complicated statistical measures, the first of which is variance. I believe many feel that they have a strong grasp of simple statistical measures related to variance (myself included), but it was not until I had to use them for more complex computations in option pricing that I realized I only had a very surface-level understanding.
Variance is the foundation of standard deviation, covariance, and correlation, all of which are very important measures in Quantitative Finance.

### Simple Intuition
Simply put, variance is a measure of how spread out a set of observations is.

### Mathematical Intuition

Once again, in the context of Mathematics, we say variance is a measure of the dispersion of a group of observations. When we apply variance to real-life observations, we find that the larger the variance of a distribution, the harder it is to say with confidence that a particular sample from it will be close to the mean. Recall from above that the mean is the expectation of a distribution.

Just as an example, we can say that we expect a random guy in Singapore to have a height of 171 cm, but if the variance is very high, we are unable to say with a high level of confidence that the height of a random guy we find in Singapore would be around 171 cm.

### Mathematical Notation

$$ Variance (\sigma^2) = \frac{\sum^N_{i=1} (x_i -\tilde{x})^2}{N}$$

Here, $\tilde{x}$ is the mean of the population (it could be known or an estimate). In simple terms this means "The variance is the sum of the squared deviations of each observed sample from the mean, divided by the number of samples." The purpose of the square is to make each deviation non-negative (otherwise positive and negative deviations would very likely cancel each other out and give a poor representation of dispersion). Note how it is increasingly difficult to describe in simple English what each of these statistical parameters is.

In [9]:
print("The mean is:")
print(mean)
# From the above we have that the mean is approximately 6.93

The mean is:
6.928571428571428


In [10]:
N = len(observations)                                    # Find the denominator N
squared_deviations = [(o-mean)**2 for o in observations] # Find the numerator Sum[(x - mu)^2]
variance = sum(squared_deviations)/N            
print("The variance is: ")
print(variance)

The variance is: 
25.637755102040817


In [11]:

def get_variance(observations):
    mu = get_mean(observations)
    N = len(observations)
    squared_deviations = [(o-mu)**2 for o in observations]
    return sum(squared_deviations)/N
get_variance(observations)

25.637755102040813
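
As a sanity check on the hand-written functions, the same numbers can be reproduced with Python's standard `statistics` module. This is only a sketch of a cross-check, not part of the original notebook. Note that `statistics.pvariance` divides by $N$ (matching the formula above), whereas `statistics.variance` divides by $N-1$ and would give a different number.

```python
import statistics

observations = [1, 1, 1, 2, 2, 4, 5, 7, 10, 11, 11, 14, 14, 14]

# Mean of the observations
mu = statistics.mean(observations)

# Population variance: sum of squared deviations divided by N
population_variance = statistics.pvariance(observations)

print(mu)
print(population_variance)
```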

### Mathematical Notation in the Form of Expectations

Note that we can also express the variance in other notations. Recall that we have defined the mean above as the expectation of a distribution $E[X]$. The variance can also be expressed in a similar notation.

$$ Variance (\sigma ^2) =E[X^2] - E[X]^2$$

This can be described in simple terms as "Variance is the mean of the squared observations minus the square of the mean." Yes, it is becoming increasingly difficult for me to explain the above in simple terms.

Now, if we bring back the initial mathematical formulation, we can see why the above notation holds.
$$ Variance (\sigma^2) = \frac{\sum^N_{i=1} (x_i -\tilde{x})^2}{N}$$
$$ = \frac{\sum^N_{i=1} (x_i^2 - 2\tilde{x}x_i + \tilde{x}^2)}{N}$$
$$ = E[ x_i^2 - 2\tilde{x}x_i + \tilde{x}^2]$$
Note that $\tilde{x} = E[X]$ and $E[x_i] = E[X]$.
$$ = E\bigg[ X^2 - 2E[X]X + E[X]^2 \bigg]$$
Here, $E\bigg[ E[X] X\bigg]$ is a nested expectation. $E[X]$ is a constant, and $E[E[X]]$ (the expectation of the expectation of the observations, i.e. the expectation of the mean) is still the mean. Thus:
$E[E[X]X] = E[X]*E[X] = E[X]^2$

Then we can say that
$$ = E[ X^2] - 2E[X]E[X] + E[X]^2$$
$$ = E[ X^2] - 2E[X]^2 + E[X]^2$$
$$ = E[ X^2] - E[X]^2$$


If you did not understand the above, the key to converting the first computation of variance into our second form, expressed as expectations, is the cross term. Take this away:
$$\frac{\sum^N_{i=1} 2x_i\tilde{x}}{N} = E[2x_i\tilde{x}] = E\big[2XE[X] \big] = 2E[X]E[X] = 2E[X]^2$$


In [15]:
# In the same notation,
variance = sum([o**2 for o in observations])/N - (get_mean(observations))**2
print("The variance is: ")
print(variance)

The variance is: 
25.637755102040806


### Properties of Variance
1. $Var(aX) = a^2Var(X)$
2. $Var(X+Y) = Var(X)+Var(Y)+2Cov(X,Y)$
3. $Var(X-Y) = Var(X)+Var(Y)-2Cov(X,Y)$
    If $X$ and $Y$ are independent, $Cov(X,Y) = 0$, so the covariance term in properties 2 and 3 vanishes.
4. $Var(a) = 0$

Of these properties, the first and second are the least intuitive. For the first, multiplying every observation by a constant $a$ gives $aX$, whose mean is $a\tilde{x} = aE[X]$. Variance squares the deviations from the mean, so $Var(aX) = E[a^2X^2] - (aE[X])^2 = a^2E[X^2] - a^2E[X]^2$. Thus we can factor out $a^2$ and obtain $a^2Var(X)$.
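
The four properties can be checked numerically. The sketch below uses small made-up lists `xs` and `ys` and a constant `a = 3`. These values, and the helper names `mean`, `variance` and `covariance`, are illustrative assumptions rather than part of the original notebook. Each printed pair should agree up to floating-point rounding.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    # Population variance: mean of squared deviations from the mean
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def covariance(xs, ys):
    # Population covariance: mean of the products of paired deviations
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 4, 7, 11]   # hypothetical data
ys = [3, 1, 6, 5, 12]   # hypothetical data
a = 3                   # hypothetical constant

scaled = [a * x for x in xs]
sums = [x + y for x, y in zip(xs, ys)]
diffs = [x - y for x, y in zip(xs, ys)]

# Property 1: Var(aX) = a^2 Var(X)
print(variance(scaled), a ** 2 * variance(xs))
# Property 2: Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y)
print(variance(sums), variance(xs) + variance(ys) + 2 * covariance(xs, ys))
# Property 3: Var(X-Y) = Var(X) + Var(Y) - 2Cov(X,Y)
print(variance(diffs), variance(xs) + variance(ys) - 2 * covariance(xs, ys))
# Property 4: the variance of a constant is zero
print(variance([a] * len(xs)))
```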

## Standard Deviation

In the variance above, notice that we have squared our deviations from the mean. This gives a value in squared units, which is not on the same scale as the observations and is harder to interpret. In our observations, [1,1,1,2,2,4,5,7,10,11,11,14,14,14], the mean is about 6.93, while the variance is about 25.64. At a glance, the value 25.64 says little about how far a typical observation sits from the mean.

Hence we take the square root of the variance to bring it back to the scale of the observations. Thus we have:

$$StandardDeviation (\sigma) = \sqrt{\frac{\sum^N_{i=1} (x_i - \tilde{x})^2}{N}}$$

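In simple terms, we describe the standard deviation as *the typical deviation of an observation from the mean*, expressed in the same units as the observations. It is not exactly the average absolute deviation, because the deviations are squared before averaging.
From the above formulation, it is "the square root of the variance of the distribution."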


In [18]:
stddev = (get_variance(observations))**0.5
print("The standard deviation is:")
print(stddev)
# Now we have a value of about 5.06, which gives us a more intuitive understanding of the dispersion of the distribution

The standard deviation is:
5.063373885270652


In [24]:
def get_std_dev(observations):
    return get_variance(observations)**0.5
get_std_dev(observations)

5.063373885270652
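
The standard deviation is easy to confuse with the mean absolute deviation. The sketch below computes both on the same observations to show that they are related but not equal. The variable names `std_dev` and `mean_abs_dev` are illustrative, and the mean absolute deviation is brought in only for comparison.

```python
observations = [1, 1, 1, 2, 2, 4, 5, 7, 10, 11, 11, 14, 14, 14]

n = len(observations)
mu = sum(observations) / n

# Standard deviation: square root of the mean squared deviation
std_dev = (sum((o - mu) ** 2 for o in observations) / n) ** 0.5

# Mean absolute deviation: mean of the absolute deviations
mean_abs_dev = sum(abs(o - mu) for o in observations) / n

print(std_dev)
print(mean_abs_dev)
```

Squaring weights large deviations more heavily, so the standard deviation comes out larger than the mean absolute deviation for this data.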

## Covariance

In simple terms, covariance provides a numerical measure of the extent to which two variables move together. In the context of Quantitative Finance, we are often interested in how two variables move together in time, i.e. time is the independent variable that is always increasing, and we want to measure, as time passes, how the two variables move together.

### Mathematical Notation and Intuition
$$ Covariance(Cov(X,Y)) = \frac{\sum^N_{i=1}(x_i - \tilde{x})(y_i-\tilde{y})}{N}$$
$$ = E\bigg[(X-E[X])(Y-E[Y])\bigg]$$

### Simple Intuition
If we think about the above formulation intuitively, a positive covariance tells us that the further one value lies above its mean, the further the other value also tends to lie above its mean, on average. Take note of the term "on average": a single large outlier can distort the measured relationship between the two variables considerably (example below).


It is, however, difficult to give an intuitive explanation of covariance, so I would describe it with an example. Suppose X, our independent variable, has a variance of 1 and a covariance of 100 with Y, our dependent variable. Under a linear relationship, the expected deviation of y from its mean is $Cov(X,Y)/Var(X)$ times the deviation of x from its mean. So if an observation of x at time t had deviated by a certain amount from its mean, we would expect y to deviate from its own mean by 100 times that amount.

Numerically, if the mean of X = 10 and one observation of X was 21, i.e. deviation from mean = 11, we would expect Y to deviate from its mean by 11 * 100 = 1100. Given mean of Y = 20, that would mean y's value would be 1120.

In [135]:
X = [1,1,1,2,-2,-2,-4,7,10,2,-5,1,4,-4]
Y = [1,1,1,2,-2,-2,-4,7,10,2,-5,1,4,400]
# Take note that they are identical except for the last term, where they are opposite in direction and Y has a large magnitude

In [136]:
mu_x = get_mean(X) # We find that the mean of X is very small
mu_y = get_mean(Y) # While the mean of Y is very large
print(f"Mean of X and Y are {mu_x} and {mu_y} respectively")

Mean of X and Y are 0.8571428571428571 and 29.714285714285715 respectively


In [137]:
N = len(X)
dev_x = [x - mu_x for x in X]
dev_y = [y-mu_y for y in Y]
# Let's observe how X and Y deviate from their means, using a pandas DataFrame for ease
import pandas as pd
deviations = pd.DataFrame(zip(dev_x,dev_y),columns = ['dev_x','dev_y'])
deviations['dev_xy'] = deviations['dev_x']*deviations['dev_y']
deviations 

|   | dev_x | dev_y | dev_xy |
|---|---|---|---|
| 0 | 0.142857 | -28.714286 | -4.102041 |
| 1 | 0.142857 | -28.714286 | -4.102041 |
| 2 | 0.142857 | -28.714286 | -4.102041 |
| 3 | 1.142857 | -27.714286 | -31.673469 |
| 4 | -2.857143 | -31.714286 | 90.612245 |
| 5 | -2.857143 | -31.714286 | 90.612245 |
| 6 | -4.857143 | -33.714286 | 163.755102 |
| 7 | 6.142857 | -22.714286 | -139.530612 |
| 8 | 9.142857 | -19.714286 | -180.244898 |
| 9 | 1.142857 | -27.714286 | -31.673469 |


#### Observations
We can see that larger deviations from the mean produce a larger absolute value of dev_xy (rows 6 and 7).
However, a single product does not describe how the magnitudes of the two deviations relate. In rows 6 and 7, dev_x is small relative to dev_y, so dev_xy on its own does not tell you that the x and y values move together in terms of magnitude.

Covariance, however, tells us something about how frequently (frequency) and to what extent (magnitude) our two distributions move together ON AVERAGE. Note that I have made X and Y identical except for the very last value.

In [138]:
covar = sum([x*y for x,y in zip(dev_x,dev_y)])/N
print('The covariance is: ')
print(covar)



The covariance is: 
-123.61224489795917
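
To see how much the single outlier drives this result, the covariance can be recomputed with the last pair removed. The helper `get_covariance` below is a hypothetical addition that follows the naming of `get_mean` and `get_variance`. It is a sketch of the same population formula, not code from the original notebook.

```python
def get_covariance(xs, ys):
    # Population covariance: mean of the products of paired deviations
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

X = [1, 1, 1, 2, -2, -2, -4, 7, 10, 2, -5, 1, 4, -4]
Y = [1, 1, 1, 2, -2, -2, -4, 7, 10, 2, -5, 1, 4, 400]

cov_all = get_covariance(X, Y)                    # includes the outlier pair (-4, 400)
cov_without_last = get_covariance(X[:-1], Y[:-1])  # outlier pair removed

print(cov_all)
print(cov_without_last)
```

With the last pair removed, X and Y are identical, so the covariance equals the variance of the remaining values and is positive. One outlier is enough to flip the sign.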


## Correlation

### Simple Intuition
Now we have correlation, which is the covariance scaled by the product of the standard deviations of both distributions. The correlation tells us, on a deviation-weighted (from the mean) basis, how often two variables move together. Because dividing by the product of the standard deviations removes the idea of scale, the correlation does not tell us to what extent (in terms of magnitude) one variable varies with another.

In [139]:
#Correlation
std_x = get_std_dev(X)
std_y = get_std_dev(Y)
corr = covar/(std_x*std_y)
print("The correlation is: ")
print(corr)

The correlation is: 
-0.2956518768813528
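
The computation above corresponds to $Corr(X,Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$. The sketch below wraps it in a hypothetical helper, `get_correlation`, and rescales X by an arbitrary factor of 1000 (an illustrative assumption) to show that the covariance changes with scale while the correlation does not.

```python
def get_covariance(xs, ys):
    # Population covariance: mean of the products of paired deviations
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

def get_correlation(xs, ys):
    # Covariance divided by the product of the two standard deviations
    std_x = get_covariance(xs, xs) ** 0.5  # Cov(X, X) is Var(X)
    std_y = get_covariance(ys, ys) ** 0.5
    return get_covariance(xs, ys) / (std_x * std_y)

X = [1, 1, 1, 2, -2, -2, -4, 7, 10, 2, -5, 1, 4, -4]
Y = [1, 1, 1, 2, -2, -2, -4, 7, 10, 2, -5, 1, 4, 400]
X_scaled = [1000 * x for x in X]  # same data on a different scale

print(get_covariance(X, Y), get_covariance(X_scaled, Y))
print(get_correlation(X, Y), get_correlation(X_scaled, Y))
```

The first line prints two covariances that differ by a factor of 1000. The second prints two correlations that agree up to floating-point rounding.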
