# **Z-Score**

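A Z-score measures how many standard deviations a value lies from the mean: $z = \frac{x - \mu}{\sigma}$, where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation.

A Z-score of 0 indicates that the value is exactly at the mean, a positive Z-score indicates a value above the mean, and a negative Z-score indicates a value below the mean.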
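As a minimal sketch of the definition, the helper below applies $z = (x - \mu) / \sigma$ to a single value. The function name `z_score` is hypothetical (not from any library), and the exam-style numbers (mean 70, standard deviation 10) are made up for illustration.

```python
def z_score(x: float, mean: float, std: float) -> float:
    """Return how many standard deviations x lies from the mean (hypothetical helper)."""
    if std == 0:
        raise ValueError("standard deviation must be non-zero")
    return (x - mean) / std

# Illustrative numbers: mean 70, standard deviation 10
print(z_score(85, 70, 10))  # 1.5  -> 1.5 standard deviations above the mean
print(z_score(70, 70, 10))  # 0.0  -> exactly at the mean
print(z_score(55, 70, 10))  # -1.5 -> 1.5 standard deviations below the mean
```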


## Examples

```python
import numpy as np
import pandas as pd
from scipy import stats
```

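### NumPy (manual calculation)

```python
# Convert to an array so the subtraction below is element-wise
data = np.array([10, 12, 22, 21, 55, 1, 42, 32])

mean = np.mean(data)
std_dev = np.std(data)  # population standard deviation (ddof=0 by default)

z_score_manual = (data - mean) / std_dev
z_score_manual
```

```
array([-0.86101871, -0.7412248 , -0.14225526, -0.20215222,  1.8343442 ,
       -1.40009129,  1.05568381,  0.45671427])
```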
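A quick sanity check on the manual calculation: any standardized array has mean 0 and (population) standard deviation 1, and the transform can be reversed with $x = \mu + z\sigma$. A minimal sketch on the same data:

```python
import numpy as np

data = np.array([10, 12, 22, 21, 55, 1, 42, 32])

# Standardize with the population standard deviation (np.std default, ddof=0)
z = (data - data.mean()) / data.std()

# Standardized values have mean 0 and standard deviation 1 (up to rounding)
print(np.isclose(z.mean(), 0.0))  # True
print(np.isclose(z.std(), 1.0))   # True

# Reversing the transform recovers the original values: x = mean + z * std
recovered = data.mean() + z * data.std()
print(np.allclose(recovered, data))  # True
```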

### SciPy

```python
z_scores_scipy = stats.zscore(data)  # ddof=0 by default, same as np.std
z_scores_scipy
```

```
array([-0.86101871, -0.7412248 , -0.14225526, -0.20215222,  1.8343442 ,
       -1.40009129,  1.05568381,  0.45671427])
```
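`stats.zscore` matches the manual result because both it and `np.std` default to the population standard deviation (`ddof=0`, dividing by N). pandas' `Series.std()`, which the pandas example in this notebook uses, defaults to the sample standard deviation (`ddof=1`, dividing by N - 1). A sketch comparing the two conventions on the same data:

```python
import numpy as np
import pandas as pd
from scipy import stats

data = np.array([10, 12, 22, 21, 55, 1, 42, 32])

z_pop = stats.zscore(data)             # ddof=0: divide by N
z_sample = stats.zscore(data, ddof=1)  # ddof=1: divide by N - 1

# pandas .std() uses ddof=1 by default, so it agrees with z_sample
s = pd.Series(data)
z_pandas = (s - s.mean()) / s.std()

print(np.allclose(z_sample, z_pandas.to_numpy()))  # True
print(np.allclose(z_pop, z_pandas.to_numpy()))     # False

# The two conventions differ by the constant factor sqrt((N - 1) / N)
n = len(data)
print(np.allclose(z_sample, z_pop * np.sqrt((n - 1) / n)))  # True
```

For small samples the difference is noticeable (here about 6.5%); for the 1,000-value sample below it is negligible.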

### Pandas

```python
np.random.seed(7)

# 1,000 draws from a standard normal distribution (mean 0, std 1)
data2 = np.random.normal(loc=0, scale=1, size=1000)
df = pd.DataFrame(data2, columns=['Values'])
df
```

|     | Values    |
|----:|----------:|
| 0   | 1.690526  |
| 1   | -0.465937 |
| 2   | 0.032820  |
| 3   | 0.407516  |
| 4   | -0.788923 |
| ... | ...       |
| 995 | 0.038660  |
| 996 | 0.557177  |
| 997 | -0.488457 |
| 998 | -0.628522 |


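```python
# Note: pandas .std() uses the sample standard deviation (ddof=1) by default,
# unlike np.std and stats.zscore above (ddof=0)
df['Z-Score'] = (df['Values'] - df['Values'].mean()) / df['Values'].std()

df
```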

|     | Values    | Z-Score   |
|----:|----------:|----------:|
| 0   | 1.690526  | 1.785156  |
| 1   | -0.465937 | -0.454298 |
| 2   | 0.032820  | 0.063654  |
| 3   | 0.407516  | 0.452770  |
| 4   | -0.788923 | -0.789714 |
| ... | ...       | ...       |
| 995 | 0.038660  | 0.069719  |
| 996 | 0.557177  | 0.608190  |
| 997 | -0.488457 | -0.477684 |
| 998 | -0.628522 | -0.623139 |
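The same expression also standardizes every column of a DataFrame at once, because `df.mean()` and `df.std()` return one value per column and pandas aligns them by column label. A sketch with two made-up columns (`height_cm` and `weight_kg` are illustrative names, and the data is simulated):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_multi = pd.DataFrame({
    "height_cm": rng.normal(170, 10, size=200),
    "weight_kg": rng.normal(70, 15, size=200),
})

# Subtract each column's mean and divide by each column's sample std (ddof=1)
z_multi = (df_multi - df_multi.mean()) / df_multi.std()

print(z_multi.mean().round(6))  # approximately 0 for every column
print(z_multi.std().round(6))   # 1.0 for every column
```

Standardizing this way puts columns measured in different units on a common scale.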


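For normally distributed data, the empirical (68-95-99.7) rule says that roughly 68%, 95%, and 99.7% of values fall within one, two, and three standard deviations of the mean. Checking the simulated sample:

```python
# Percentage of rows whose Z-score lies within ±1, ±2 and ±3
within_1_std = len(df[(df['Z-Score'] >= -1) & (df['Z-Score'] <= 1)]) / len(df) * 100
within_2_std = len(df[(df['Z-Score'] >= -2) & (df['Z-Score'] <= 2)]) / len(df) * 100
within_3_std = len(df[(df['Z-Score'] >= -3) & (df['Z-Score'] <= 3)]) / len(df) * 100

summary = pd.DataFrame({
    'STD Dev': ['One', 'Two', 'Three'],
    'Percent': [within_1_std, within_2_std, within_3_std]
})

summary
```

|   | STD Dev | Percent |
|--:|:--------|--------:|
| 0 | One     | 67.6    |
| 1 | Two     | 96.2    |
| 2 | Three   | 99.7    |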
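The theoretical percentages behind the empirical rule come from the standard normal CDF $\Phi$: the share of values within $k$ standard deviations is $\Phi(k) - \Phi(-k)$. A minimal sketch using `scipy.stats.norm`:

```python
from scipy.stats import norm

# P(-k <= Z <= k) for a standard normal Z, as a percentage
theoretical = {k: (norm.cdf(k) - norm.cdf(-k)) * 100 for k in (1, 2, 3)}

for k, pct in theoretical.items():
    print(f"within {k} std: {pct:.2f}%")
# within 1 std: 68.27%
# within 2 std: 95.45%
# within 3 std: 99.73%
```

The simulated sample (67.6%, 96.2%, 99.7%) lands close to these values, as expected for 1,000 draws.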


A common rule of thumb flags values with a Z-score beyond ±3 as outliers:

```python
# Flag values more than 3 standard deviations from the mean
df['Outlier'] = (df['Z-Score'] > 3) | (df['Z-Score'] < -3)

top_5_highest = df.sort_values(by='Z-Score', ascending=False).head(5)
top_5_lowest = df.sort_values(by='Z-Score', ascending=True).head(5)

print(top_5_highest)
print(top_5_lowest)
```

```
       Values   Z-Score  Outlier
316  2.861067  3.000746     True
350  2.638539  2.769654    False
899  2.594645  2.724071    False
564  2.510811  2.637011    False
985  2.351867  2.471950    False
       Values   Z-Score  Outlier
384 -3.082505 -3.171564     True
685 -3.000316 -3.086212     True
457 -2.823831 -2.902935    False
449 -2.794785 -2.872771    False
419 -2.761517 -2.838223    False
```


```python
df_no_outliers = df[~df['Outlier']].copy()
df_no_outliers.shape
```

```
(997, 3)
```

```python
top_5_highest_no_outlier = df_no_outliers.sort_values(by='Z-Score', ascending=False).head(5)
top_5_lowest_no_outlier = df_no_outliers.sort_values(by='Z-Score', ascending=True).head(5)

print(top_5_highest_no_outlier)
print(top_5_lowest_no_outlier)
```

```
       Values   Z-Score  Outlier
350  2.638539  2.769654    False
899  2.594645  2.724071    False
564  2.510811  2.637011    False
985  2.351867  2.471950    False
96   2.259947  2.376492    False
       Values   Z-Score  Outlier
457 -2.823831 -2.902935    False
449 -2.794785 -2.872771    False
419 -2.761517 -2.838223    False
278 -2.305183 -2.364327    False
356 -2.299249 -2.358164    False
```
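The `Z-Score` column in `df_no_outliers` still reflects the mean and standard deviation of the full 1,000-value sample. If standardized values of the filtered data are needed, they have to be recomputed from the filtered rows. A sketch that regenerates the data with the same seed so it runs on its own; the column name `Z-Score (refit)` is hypothetical:

```python
import numpy as np
import pandas as pd

np.random.seed(7)
df = pd.DataFrame(np.random.normal(loc=0, scale=1, size=1000), columns=["Values"])

# First pass: z-scores on the full sample, keep rows with |z| <= 3
df["Z-Score"] = (df["Values"] - df["Values"].mean()) / df["Values"].std()
df_no_outliers = df[df["Z-Score"].abs() <= 3].copy()

# Second pass: re-standardize using the filtered sample's own mean and std
vals = df_no_outliers["Values"]
df_no_outliers["Z-Score (refit)"] = (vals - vals.mean()) / vals.std()

# The original column no longer has std 1 on the filtered rows; the refit does
print(df_no_outliers[["Z-Score", "Z-Score (refit)"]].describe().loc[["mean", "std"]])
```

Because removing extreme values shrinks the standard deviation, refitting can push new values past the ±3 threshold; repeating the filter iteratively is a design choice, not something this example prescribes.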
