In [72]:
import numpy as np
from scipy import stats

In [73]:
x = np.array([12,1,0,2,3,0])

## Location measures

### Mean

- Arithmetic average
- Most common location measure

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$$

In [121]:
def sample_mean(x):
    return np.sum(x) / len(x)

print(np.mean(x))
print(sample_mean(x))

3.0
3.0


### Median 

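The formula uses the order statistics $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, i.e. the values of x sorted from smallest to largest.

$$median(x) = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even} \end{cases}$$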

- Robust location measure, in contrast to the mean.
- Depends only on the middle one or two ordered values, so making the extreme values more extreme (e.g. replacing 12 by 120) leaves the median unchanged.

In [124]:
def median(x):
    return np.median(x)

median(x)

1.5
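`np.median` hides the order-statistic formula. The sketch below implements it directly by sorting; `median_from_order_stats` is a hypothetical helper name. The only subtlety is translating the formula's 1-based indices into Python's 0-based indexing.

```python
import numpy as np

def median_from_order_stats(x):
    """Median via the order-statistic formula (hypothetical helper)."""
    xs = np.sort(np.asarray(x))  # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(xs)
    if n % 2 == 1:
        # 1-based x_((n+1)/2) is 0-based index (n - 1) // 2
        return xs[(n - 1) // 2]
    # 1-based x_(n/2) and x_(n/2+1) are 0-based indices n//2 - 1 and n//2
    return 0.5 * (xs[n // 2 - 1] + xs[n // 2])

x = np.array([12, 1, 0, 2, 3, 0])
print(median_from_order_stats(x))  # 1.5: average of the sorted values 1 and 2
print(np.median(x))                # 1.5
```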

### Trimmed average 

- More robust than the mean
- Average of the data after leaving out the $k$ smallest and $k$ largest values, where $0 \le k < n/2$

$$\frac{1}{n-2k}\sum_{i=k+1}^{n-k}{x_{(i)}}$$
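A minimal sketch of the trimmed average as written in the formula, with a hypothetical helper `trimmed_mean(x, k)`. It is compared against `scipy.stats.trim_mean`, which takes a *proportion* to cut from each end rather than a count; SciPy cuts `int(proportion * n)` values, so `0.2` with `n = 6` drops one value at each end (this proportion is an assumption chosen to match `k = 1`).

```python
import numpy as np
from scipy import stats

def trimmed_mean(x, k):
    """Mean after dropping the k smallest and k largest values (hypothetical helper)."""
    xs = np.sort(np.asarray(x))
    n = len(xs)
    if not 0 <= k < n / 2:
        raise ValueError("k must satisfy 0 <= k < n/2")
    # Keep x_(k+1), ..., x_(n-k): the 0-based slice [k, n - k)
    return np.sum(xs[k:n - k]) / (n - 2 * k)

x = np.array([12, 1, 0, 2, 3, 0])
print(trimmed_mean(x, 1))       # 1.5: drops one 0 and the 12
print(stats.trim_mean(x, 0.2))  # 1.5: int(0.2 * 6) = 1 value cut from each end
```

With `k = 0` nothing is dropped and the result is the ordinary mean (3.0 here), while the outlier 12 pulls the untrimmed mean well above the trimmed one.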

### Mode

- The value that occurs most frequently
- Not necessarily unique: several values can share the highest frequency, giving more than one mode

In [125]:
def mode(x):
    return stats.mode(x)[0]

mode(x)

array([0])
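`scipy.stats.mode` reports a single modal value even when several values tie, so it cannot show that the mode is not unique. A small sketch that returns every mode, built on `np.unique(..., return_counts=True)`; the helper name `all_modes` is hypothetical.

```python
import numpy as np

def all_modes(x):
    """Return every value attaining the maximum frequency, sorted (hypothetical helper)."""
    values, counts = np.unique(np.asarray(x), return_counts=True)
    return values[counts == counts.max()]

x = np.array([12, 1, 0, 2, 3, 0])
print(all_modes(x))                # [0]
print(all_modes([1, 2, 2, 3, 3]))  # [2 3]: two modes
```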

## Scale measures

### Sample variance 

- Measures the spread of the data around the mean
- Mean squared difference from the sample mean, i.e. the second central moment of the data
- With the $1/n$ factor below this matches `np.var` (default `ddof=0`); dividing by $n-1$ instead gives the unbiased estimator

$$var(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

In [127]:
def variance(x):
    return np.sum((x - np.mean(x))**2) / len(x)

print(variance(x))
print(np.var(x))

17.333333333333332
17.333333333333332
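To make the two normalizations concrete, the sketch below computes the sum of squared deviations once and divides it by $n$ and by $n-1$ (Bessel's correction), checking each against `np.var` with the matching `ddof` argument.

```python
import numpy as np

x = np.array([12, 1, 0, 2, 3, 0])
n = len(x)
ss = np.sum((x - np.mean(x)) ** 2)  # sum of squared deviations: 104.0

var_biased = ss / n          # 1/n version used above (np.var default, ddof=0)
var_unbiased = ss / (n - 1)  # 1/(n-1) version (Bessel's correction)

print(var_biased, np.var(x))             # both about 17.33
print(var_unbiased, np.var(x, ddof=1))   # both about 20.8
```

The difference shrinks as $n$ grows, but for a sample of six values it is noticeable.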


### Sample standard deviation

- Has the same units as the data. Because deviations are squared, it is affected by outliers even more than the sample mean.

$$std(x) = \sqrt{var(x)}$$

In [128]:
def std(x):
    return np.sqrt(variance(x))

print(std(x))
print(np.std(x))

4.163331998932265
4.163331998932265
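One way to check the outlier claim: push the largest value from 12 to 120 (an arbitrary choice for illustration) and compare how much the mean and the standard deviation grow relative to their original values.

```python
import numpy as np

x = np.array([12, 1, 0, 2, 3, 0])
x_out = x.copy()
x_out[0] = 120  # make the largest value a much more extreme outlier

mean_ratio = np.mean(x_out) / np.mean(x)  # 21 / 3 = 7.0
std_ratio = np.std(x_out) / np.std(x)     # about 44.3 / 4.16, i.e. about 10.6

print(mean_ratio, std_ratio)
```

The mean grows sevenfold, the standard deviation more than tenfold.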


### Median absolute deviation from the median

- Robust measure of the scale of data 

$$MAD(x) = median(|x_i - median(x)|)$$

In [129]:
def mad(x):
    xs = np.abs(x - median(x))
    return median(xs)

print(mad(x))
print(stats.median_abs_deviation(x))

1.5
1.5
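The sketch below applies the same 12 → 120 perturbation to the MAD, which does not move at all because both medians involved depend only on the middle values. It also shows SciPy's `scale="normal"` option, which divides the raw MAD by $\Phi^{-1}(0.75) \approx 0.6745$ so that, for normally distributed data, it estimates the standard deviation.

```python
import numpy as np
from scipy import stats

x = np.array([12, 1, 0, 2, 3, 0])
x_out = x.copy()
x_out[0] = 120  # push the outlier much further out

# Raw MAD (default scale=1.0) is unchanged by the more extreme outlier
print(stats.median_abs_deviation(x), stats.median_abs_deviation(x_out))  # 1.5 1.5

# scale="normal" divides by the standard normal quantile at 0.75
mad_normal = stats.median_abs_deviation(x, scale="normal")
print(mad_normal, 1.5 / stats.norm.ppf(0.75))  # both about 2.22
```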


### Interquartile range 

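- Robust measure of scale
- Difference between the third quartile $Q_3$ (75th percentile) and the first quartile $Q_1$ (25th percentile), i.e. the width of the interval containing the central 50% of the data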

$$IQR = Q_3 - Q_1$$

In [130]:
def iqr(x):
    q3 = np.quantile(x, 0.75)
    q1 = np.quantile(x, 0.25)
    return q3 - q1

print(iqr(x))
print(stats.iqr(x))

2.5
2.5
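Quartiles of a small sample are not uniquely defined, so the IQR depends on the interpolation rule. The default used above is linear interpolation; the sketch below contrasts it with the `"midpoint"` rule via NumPy's `method` keyword (available in NumPy 1.22 and later, an assumption about the installed version).

```python
import numpy as np

x = np.array([12, 1, 0, 2, 3, 0])

iqrs = {}
for method in ("linear", "midpoint"):
    q1, q3 = np.quantile(x, [0.25, 0.75], method=method)
    iqrs[method] = q3 - q1
    print(method, iqrs[method])  # linear 2.5, midpoint 2.0
```

When comparing IQRs across tools, it is worth checking which quartile convention each one uses.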


## Shape measures 

### Sample skewness

- Measures the asymmetry of the data.
- Unaffected by shifting or (positively) rescaling the data, because each value is standardized to zero mean and unit standard deviation before cubing.
- Positive skewness: longer right tail, so the mass is concentrated on the left and stretched out to the right.
- Negative skewness: the other way around, with a longer left tail.
- Data symmetric around the mean have zero skewness.
- Zero skewness does not necessarily mean the data are symmetric around their mean!
- Sample skewness is sensitive to outliers because of the third power.

$$skew(x) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{std(x)}\right)^3$$

In [None]:
def skew(x):
    return np.mean(((x - np.mean(x)) / std(x)) ** 3)  # mean cubed z-score, 1/n std

print(skew(x), stats.skew(x))
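The properties listed above can be checked numerically. This self-contained sketch re-derives the skewness from the formula, compares it with `scipy.stats.skew` (whose default `bias=True` uses the same $1/n$ standard deviation), and verifies invariance to shifting and positive rescaling, the sign flip under reflection, and zero skewness for symmetric data. The test inputs (`3 * x + 10`, `[1, 2, 3]`) are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

def skew(x):
    # Standardize with the 1/n standard deviation, then average the cubes
    z = (x - np.mean(x)) / np.std(x)
    return np.mean(z ** 3)

x = np.array([12, 1, 0, 2, 3, 0], dtype=float)

print(skew(x), stats.skew(x))           # agree; positive because of the long right tail
print(skew(3 * x + 10))                 # same value: shift and positive scaling cancel out
print(skew(-x))                         # reflecting the data flips the sign
print(skew(np.array([1.0, 2.0, 3.0])))  # 0.0 for symmetric data
```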