In [96]:
from scipy import stats
import numpy as np

In [97]:
x=np.array([-40, 1, 1.2, 1.4, 1.8,2,3,3.5,3.8, 3.9, 4,10,-3,10,100])

### Standard Deviation

A measure of how spread out the data is. It gives a scale for judging how "unusual" an element is, i.e. how far it is from the mean.

$\sigma= \sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}$

For samples:

$s= \sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$

In [86]:
stats.tstd(x) #No limits given, so nothing is trimmed: this is the sample standard deviation (n-1 in the denominator)

28.266279759256815

In [90]:
stats.tstd(x, (0,10)) #Sample standard deviation (n-1 in the denominator) of only the values within the limits 0 and 10, inclusive

3.0933947224838461

In [98]:
np.std(x, axis=0) #Population standard deviation (divides by n; pass ddof=1 for the sample version)

27.307820613638626
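The gap between the `stats.tstd` and `np.std` results comes only from the denominator (n-1 versus n). A minimal sketch that recomputes each value by hand; the variable names are illustrative, not part of either library:

```python
import numpy as np
from scipy import stats

x = np.array([-40, 1, 1.2, 1.4, 1.8, 2, 3, 3.5, 3.8, 3.9, 4, 10, -3, 10, 100])

n = x.size
sum_sq_dev = ((x - x.mean()) ** 2).sum()  # sum of squared deviations from the mean

population_std = np.sqrt(sum_sq_dev / n)    # divides by n, like np.std(x)
sample_std = np.sqrt(sum_sq_dev / (n - 1))  # divides by n - 1, like stats.tstd(x)

# With limits, tstd first keeps only the values inside [0, 10] (inclusive by default),
# then takes the sample standard deviation of what is left.
inside = x[(x >= 0) & (x <= 10)]
limited_std = np.std(inside, ddof=1)

print(population_std, np.std(x))
print(sample_std, stats.tstd(x))
print(limited_std, stats.tstd(x, (0, 10)))
```

Each printed pair should agree up to floating-point rounding.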

### Z-score

A way to determine how many standard deviations each element is from the mean.

#### $z = \dfrac{x - \mu}{\sigma}$

In [92]:
stats.zscore(x)

array([-1.71525955, -0.21385815, -0.20653424, -0.19921033, -0.18456251,
       -0.1772386 , -0.14061906, -0.12230928, -0.11132342, -0.10766147,
       -0.10399951,  0.11571777, -0.36033634,  0.11571777,  3.41147693])
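As a check on the formula, the same array can be rebuilt by hand. This sketch assumes the default `ddof=0` of `stats.zscore`, meaning it divides by the population standard deviation:

```python
import numpy as np
from scipy import stats

x = np.array([-40, 1, 1.2, 1.4, 1.8, 2, 3, 3.5, 3.8, 3.9, 4, 10, -3, 10, 100])

# z = (x - mean) / population standard deviation
z_manual = (x - x.mean()) / x.std()

# Matches stats.zscore(x) up to floating-point rounding
print(np.allclose(z_manual, stats.zscore(x)))
```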

In [99]:
x[np.where(abs(stats.zscore(x))>3)]  #Find outliers: values more than 3 standard deviations from the mean

array([ 100.])

In [100]:
x[np.where(abs(stats.zscore(x))<3)] #Find all values within 3 standard deviations (about 99.7% of values for normally distributed data)

array([-40. ,   1. ,   1.2,   1.4,   1.8,   2. ,   3. ,   3.5,   3.8,
         3.9,   4. ,  10. ,  -3. ,  10. ])

In [101]:
x[np.where(abs(stats.zscore(x))<1)] #Find all values within 1 standard deviation (about 68% of values for normally distributed data)

array([  1. ,   1.2,   1.4,   1.8,   2. ,   3. ,   3.5,   3.8,   3.9,
         4. ,  10. ,  -3. ,  10. ])
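The 68% and 99.7% figures apply to normally distributed data, and this small, outlier-heavy sample need not follow them. A quick sketch that measures the actual fraction of values within 1, 2 and 3 standard deviations (the `fractions` dictionary is just an illustrative container):

```python
import numpy as np
from scipy import stats

x = np.array([-40, 1, 1.2, 1.4, 1.8, 2, 3, 3.5, 3.8, 3.9, 4, 10, -3, 10, 100])

z = stats.zscore(x)

# The mean of a boolean mask is the fraction of True entries
fractions = {k: float(np.mean(np.abs(z) < k)) for k in (1, 2, 3)}

for k, fraction in fractions.items():
    print(f"within {k} standard deviation(s): {fraction:.1%}")
```

Here 13 of the 15 values fall within 1 standard deviation and 14 of 15 within 3, because the two extreme values (-40 and 100) inflate the standard deviation.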

### Mean

A way to describe the "center" of the data.

$\mu = \dfrac{\sum_{i=1}^{n}{x_i}}{n}$

In [107]:
stats.trim_mean(x,.1) #Mean after cutting the lowest 10% and the highest 10% of the sorted values

3.2769230769230764

In [108]:
np.mean(x)  #Mean of the data set (note this equals stats.trim_mean(x,0))

6.8399999999999999
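The difference between the two means comes from dropping the extremes. A rough sketch of what `stats.trim_mean(x, .1)` does for this array, assuming the number of values cut from each end is the proportion times n, rounded down (here 1 of 15):

```python
import numpy as np
from scipy import stats

x = np.array([-40, 1, 1.2, 1.4, 1.8, 2, 3, 3.5, 3.8, 3.9, 4, 10, -3, 10, 100])

proportion = 0.1
cut = int(proportion * x.size)  # 1.5 is truncated to 1 value per end

# Sort, then drop `cut` values from each end (here -40 and 100)
kept = np.sort(x)[cut : x.size - cut]
manual_trimmed_mean = kept.mean()

print(manual_trimmed_mean, stats.trim_mean(x, proportion))
```

With only -40 and 100 removed, the mean drops from 6.84 to about 3.28, which shows how strongly a single extreme value pulls the untrimmed mean.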