In [2]:
import numpy as np
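import scipy as sp
import scipy.stats  # load the stats submodule explicitly so sp.stats resolves on older SciPy versions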
import pandas as pd

In statistics, a z-score tells us how many standard deviations away a value is from the mean. We use the following formula to calculate a z-score:

$$z = \frac{X - \mu}{\sigma}$$

where:

- X is a single raw data value
- μ is the population mean
- σ is the population standard deviation
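As a quick check of the formula, the sketch below applies it by hand with NumPy. The sample values and the helper name `manual_zscores` are illustrative assumptions, not part of the original notebook; the population standard deviation is taken with `ddof=0`.

```python
import numpy as np

def manual_zscores(values):
    """Apply z = (X - mu) / sigma to every value (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()           # population mean
    sigma = values.std(ddof=0)   # population standard deviation
    return (values - mu) / sigma

# Illustrative data chosen so that the mean is 5 and the standard deviation is 2
sample = [2, 4, 4, 4, 5, 5, 7, 9]
z = manual_zscores(sample)
print(z)  # z-scores: -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2
```

A value of 9 sits two standard deviations above the mean, so its z-score is 2.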

# Calculate Z-Scores in Python

We can calculate z-scores in Python using `scipy.stats.zscore`, which uses the following syntax:

```
scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate')
```

where:

- `a`: an array-like object containing the data
- `axis`: the axis along which to calculate the z-scores. Default is 0.
- `ddof`: degrees of freedom correction used when calculating the standard deviation. Default is 0.
- `nan_policy`: how to handle NaN values in the input. Default is `'propagate'`, which returns NaN. `'raise'` throws an error and `'omit'` performs the calculations ignoring NaN values.
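The effect of `ddof` and `nan_policy` is easiest to see on small inputs. The following is a minimal sketch with made-up arrays (`values` and `with_nan` are illustrative names, not from the notebook).

```python
import numpy as np
from scipy import stats

# Illustrative data: mean 5, population standard deviation 2
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

z_pop = stats.zscore(values)             # ddof=0: divide by the population standard deviation
z_sample = stats.zscore(values, ddof=1)  # ddof=1: divide by the sample standard deviation

# Illustrative data containing a missing value
with_nan = np.array([1.0, 2.0, np.nan, 3.0])

z_propagate = stats.zscore(with_nan)                # default policy: the NaN spreads to every result
z_omit = stats.zscore(with_nan, nan_policy='omit')  # NaN is skipped for the mean and standard deviation

print(z_pop)
print(z_sample)
print(z_propagate)
print(z_omit)
```

With `ddof=1` the standard deviation is slightly larger, so every z-score shrinks toward zero by the same factor. With `nan_policy='omit'` the NaN entry stays NaN in the output, but the remaining values are standardized as if it were absent.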

In [3]:
data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

In [4]:
sp.stats.zscore(data)

array([-1.39443338, -1.19522861, -1.19522861, -0.19920477,  0.        ,
        0.        ,  0.39840954,  0.5976143 ,  1.19522861,  1.79284291])

In [10]:
data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])

In [11]:
sp.stats.zscore(data, axis=1)  # axis=1: z-scores computed within each row

array([[-1.56892908, -0.58834841,  0.39223227,  0.39223227,  1.37281295],
       [-0.81649658, -0.81649658, -0.81649658,  1.22474487,  1.22474487],
       [-1.16666667, -1.16666667,  0.5       ,  0.5       ,  1.33333333]])

In [12]:
sp.stats.zscore(data, axis=0)  # axis=0: z-scores computed within each column

array([[ 0.        ,  0.26726124,  0.39223227,  0.16222142,  0.39223227],
       [ 1.22474487,  1.06904497,  0.98058068,  1.13554995,  0.98058068],
       [-1.22474487, -1.33630621, -1.37281295, -1.29777137, -1.37281295]])
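The two `axis` settings can be cross-checked against a manual calculation. This is a sketch rather than how SciPy is implemented internally: it uses `keepdims=True` so the per-row and per-column statistics broadcast against the original array.

```python
import numpy as np
from scipy import stats

data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])

# axis=1: every row is standardized with its own mean and standard deviation
row_mean = data.mean(axis=1, keepdims=True)   # shape (3, 1)
row_std = data.std(axis=1, keepdims=True)     # shape (3, 1), ddof=0
by_row = (data - row_mean) / row_std

# axis=0: every column is standardized with its own mean and standard deviation
col_mean = data.mean(axis=0, keepdims=True)   # shape (1, 5)
col_std = data.std(axis=0, keepdims=True)     # shape (1, 5), ddof=0
by_col = (data - col_mean) / col_std

print(np.allclose(by_row, stats.zscore(data, axis=1)))
print(np.allclose(by_col, stats.zscore(data, axis=0)))
```

Both comparisons should print `True`, since `np.std` and `scipy.stats.zscore` both default to `ddof=0`.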

In [7]:
# Random integers in [0, 10); no seed is set, so the values differ between runs
data = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])

In [8]:
data

   A  B  C
0  8  8  4
1  5  2  7
2  5  2  1
3  3  8  4
4  3  8  3


In [9]:
data.apply(sp.stats.zscore)  # apply zscore to each column separately

          A         B         C
0  1.745743  0.816497  0.103142
1  0.109109 -1.224745  1.650274
2  0.109109 -1.224745 -1.443990
3 -0.981981  0.816497  0.103142
4 -0.981981  0.816497 -0.412568
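The same column-wise z-scores can also be obtained with pandas arithmetic alone. The sketch below rebuilds the DataFrame from the values shown in the output (the name `df` is an illustrative choice). Note that `DataFrame.std` defaults to `ddof=1`, so `ddof=0` has to be passed explicitly to match `scipy.stats.zscore`.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same values as the randomly generated DataFrame shown in the output
df = pd.DataFrame({'A': [8, 5, 5, 3, 3],
                   'B': [8, 2, 2, 8, 8],
                   'C': [4, 7, 1, 4, 3]})

z_scipy = df.apply(stats.zscore)               # column-wise z-scores via SciPy
z_pandas = (df - df.mean()) / df.std(ddof=0)   # pandas std defaults to ddof=1, so override it

print(z_pandas)
print(np.allclose(z_scipy.to_numpy(), z_pandas.to_numpy()))
```

Leaving out `ddof=0` would divide by the sample standard deviation instead and give slightly smaller z-scores than the SciPy result.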
