In statistics, a z-score tells us how many standard deviations a value lies from the mean. We use the following formula to calculate a z-score:

* z = (X – μ) / σ

where:

* X is a single raw data value
* μ is the population mean
* σ is the population standard deviation
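As a minimal sketch of the formula above, the z-score can be computed by hand with NumPy. The helper name `z_scores` and the sample values are illustrative assumptions, not part of the original notebook; `ddof=0` is used so that σ is the population standard deviation, as in the definition.

```python
import numpy as np

def z_scores(values):
    """Return (X - mu) / sigma for every value (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()            # population mean
    sigma = values.std(ddof=0)    # ddof=0 -> population standard deviation
    return (values - mu) / sigma

# Illustrative data, chosen only for this sketch
example = z_scores([2, 4, 6])
print(example)
```

By construction, the resulting z-scores have a mean of 0 and a population standard deviation of 1.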

#### NumPy One-Dimensional Arrays

In [3]:
# Import modules
import pandas as pd
import numpy as np
import scipy.stats as stats

In [4]:
# Create an array of values
data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

In [5]:
# Calculate z-scores for each value in the array
stats.zscore(data)

array([-1.39443338, -1.19522861, -1.19522861, -0.19920477,  0.        ,
        0.        ,  0.39840954,  0.5976143 ,  1.19522861,  1.79284291])

Each z-score tells us how many standard deviations away an individual value is from the mean. For example:

* The first value of “6” in the array is 1.394 standard deviations *below* the mean (z = -1.394).
* The fifth value of “13” in the array is 0 standard deviations away from the mean, i.e. it is *equal* to the mean.
* The last value of “22” in the array is 1.793 standard deviations *above* the mean.
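The values above can be cross-checked against the formula directly. The sketch below assumes the same `data` array; `stats.zscore` uses `ddof=0` by default, so it matches the population formula, while passing `ddof=1` switches to the sample standard deviation and gives slightly smaller magnitudes. The variable names `manual` and `sample_z` are illustrative.

```python
import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

# Apply z = (X - mu) / sigma by hand, using the population standard deviation
manual = (data - data.mean()) / data.std(ddof=0)

# Should agree with scipy's default behaviour (ddof=0)
print(np.allclose(manual, stats.zscore(data)))

# Sample-based z-scores divide by the sample standard deviation instead
sample_z = stats.zscore(data, ddof=1)
print(sample_z)
```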

#### NumPy Multi-Dimensional Arrays

In [7]:
data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])
stats.zscore(data, axis = 1) # axis=1: z-scores are computed within each row, using that row's mean and standard deviation

array([[-1.56892908, -0.58834841,  0.39223227,  0.39223227,  1.37281295],
       [-0.81649658, -0.81649658, -0.81649658,  1.22474487,  1.22474487],
       [-1.16666667, -1.16666667,  0.5       ,  0.5       ,  1.33333333]])

* The first value of “5” in the first row is 1.569 standard deviations *below* the mean of its row.
* The first value of “8” in the second row is 0.816 standard deviations *below* the mean of its row.
* The first value of “2” in the third row is 1.167 standard deviations *below* the mean of its row.
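The `axis` argument decides which values share a mean and standard deviation. As a rough sketch on the same array, `axis=1` standardizes within each row, `axis=0` (the default) within each column, and `axis=None` treats the whole array as one set of values. The names `by_row`, `by_col`, and `overall` are illustrative.

```python
import numpy as np
import scipy.stats as stats

data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])

by_row = stats.zscore(data, axis=1)      # each row standardized on its own
by_col = stats.zscore(data, axis=0)      # each column standardized on its own
overall = stats.zscore(data, axis=None)  # one mean and std for all 15 values

# Along the chosen axis the z-scores average to zero (up to rounding)
print(np.allclose(by_row.mean(axis=1), 0))
print(np.allclose(by_col.mean(axis=0), 0))
```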

#### Pandas DataFrames

In [11]:
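# Build a 5x3 DataFrame of random integers from 0 to 9 (no seed is set, so the values differ on each run)
data = pd.DataFrame(np.random.randint(0, 10, size = (5, 3)), columns = ["A","B","C"])
data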

|   | A | B | C |
|---|---|---|---|
| 0 | 9 | 8 | 6 |
| 1 | 0 | 2 | 7 |
| 2 | 6 | 3 | 4 |
| 3 | 5 | 3 | 9 |
| 4 | 3 | 9 | 7 |


In [12]:
data.apply(stats.zscore)  # calculates the z-score of the values by column

|   | A | B | C |
|---|---|---|---|
| 0 | 1.463418 | 1.035098 | -0.369274 |
| 1 | -1.529937 | -1.035098 | 0.246183 |
| 2 | 0.465633 | -0.690066 | -1.600189 |
| 3 | 0.133038 | -0.690066 | 1.477098 |
| 4 | -0.532152 | 1.380131 | 0.246183 |
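The column-wise result can also be reproduced with pandas arithmetic alone. The sketch below hard-codes the values from the random DataFrame shown above so that it is repeatable. One caveat: `DataFrame.std` defaults to `ddof=1` (sample standard deviation), so `ddof=0` has to be passed explicitly to match `stats.zscore`. The names `z_scipy` and `z_pandas` are illustrative.

```python
import pandas as pd
import scipy.stats as stats

# Same values as the random DataFrame displayed above, fixed for repeatability
data = pd.DataFrame({"A": [9, 0, 6, 5, 3],
                     "B": [8, 2, 3, 3, 9],
                     "C": [6, 7, 4, 9, 7]})

# Column-wise z-scores via scipy
z_scipy = data.apply(stats.zscore)

# Column-wise z-scores via pandas; ddof=0 selects the population standard deviation
z_pandas = (data - data.mean()) / data.std(ddof=0)

print(z_pandas)
```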
