# Calculating Z-Scores in Pandas


Recall that a Z-score tells you how many standard deviations an individual data point lies above or below the mean of your sample.

When you know the Z-scores for two things that are measured in different ways, you know their "standard" scores, which lets you compare one against the other!
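As a quick illustration of that definition, here is a minimal sketch that computes Z-scores by hand with Python's standard library. The values in `data` are made up for this example, and using the population standard deviation (`pstdev`) is an assumption, chosen because it matches the default behavior of `scipy.stats.zscore`.

```python
from statistics import mean, pstdev

# Hypothetical measurements, made up for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)        # average of the values
sigma = pstdev(data)   # population standard deviation

# Z-score: how many standard deviations each value sits from the mean
z_scores = [(x - mu) / sigma for x in data]
print(z_scores)
```

A positive Z-score means the value is above the mean, and a negative one means it is below.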


In [44]:
## import our usual
import pandas as pd

## another package with mad math functions
from scipy.stats import zscore 


# Create a sample df
df = pd.DataFrame({'numbers': [1,2,3,4,5,6,7,8,9,3,4,6,5,7,3,2,9]})
df


,numbers
0,1
1,2
2,3
3,4
4,5
5,6
6,7
7,8
8,9
9,3


In [46]:
## get stats


,numbers
count,17.0
mean,4.941176
std,2.486729
min,1.0
25%,3.0
50%,5.0
75%,7.0
max,9.0
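The code for this cell is not shown, but the table has the shape of pandas' summary statistics. A minimal sketch, assuming the cell called `describe()` on the sample DataFrame created earlier (the name `stats` is only for this example):

```python
import pandas as pd

# Same sample data as the DataFrame created earlier
df = pd.DataFrame({'numbers': [1,2,3,4,5,6,7,8,9,3,4,6,5,7,3,2,9]})

# Summary statistics: count, mean, std, min, quartiles and max
stats = df.describe()
print(stats)
```

Note that the `std` reported by `describe()` is the sample standard deviation (`ddof=1`).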


In [47]:
# get rounded stats


,numbers
count,17.0
mean,4.941
std,2.487
min,1.0
25%,3.0
50%,5.0
75%,7.0
max,9.0
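The rounded table could come from chaining `round()` onto the summary. A minimal sketch, assuming three decimal places as the output suggests (the name `rounded` is only for this example):

```python
import pandas as pd

# Same sample data as the DataFrame created earlier
df = pd.DataFrame({'numbers': [1,2,3,4,5,6,7,8,9,3,4,6,5,7,3,2,9]})

# Round every summary statistic to three decimal places
rounded = df.describe().round(3)
print(rounded)
```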


In [48]:
# Calculate the zscores and drop zscores into new column


,numbers,zscore
0,1,-1.63366
1,2,-1.21915
2,3,-0.804639
3,4,-0.390128
4,5,0.024383
5,6,0.438894
6,7,0.853405
7,8,1.267916
8,9,1.682426
9,3,-0.804639
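The cell's code is not shown. A minimal sketch, assuming it passed the `numbers` column to `zscore` from `scipy.stats` and stored the result in a new column, is below. By default `zscore` divides by the population standard deviation (`ddof=0`), while `describe()` reports the sample standard deviation (`ddof=1`), which is why dividing by the 2.487 shown earlier would give slightly different values. The `manual` series is a hypothetical name added only to show the same arithmetic by hand.

```python
import pandas as pd
from scipy.stats import zscore

# Same sample data as the DataFrame created earlier
df = pd.DataFrame({'numbers': [1,2,3,4,5,6,7,8,9,3,4,6,5,7,3,2,9]})

# scipy's zscore uses the population standard deviation (ddof=0) by default
df['zscore'] = zscore(df['numbers'])

# The same calculation by hand, for comparison
manual = (df['numbers'] - df['numbers'].mean()) / df['numbers'].std(ddof=0)

print(df.head(10))
```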


## Climate Comparison

You're a climate reporter comparing temperatures at the North and South poles. Which region had the most unusually warm day?


In [49]:
## import this data
df = pd.read_csv("https://raw.githubusercontent.com/sandeepmj/datasets/main/zscore-climate.csv")
df

,Date,Pole,Temp
0,1/29/23,Antarctica,-55
1,1/29/23,Arctic,-17
2,1/30/23,Antarctica,-68
3,1/30/23,Arctic,-24
4,1/31/23,Antarctica,-82
5,1/31/23,Arctic,-32
6,2/1/23,Antarctica,-76
7,2/1/23,Arctic,-22


## Method 1 - two dataframes

In [50]:
## get Arctic data into its own df


,Date,Pole,Temp
1,1/29/23,Arctic,-17
3,1/30/23,Arctic,-24
5,1/31/23,Arctic,-32
7,2/1/23,Arctic,-22


In [25]:
## get Antarctica data into its own df


,Date,Pole,Temp
0,1/29/23,Antarctica,-55
2,1/30/23,Antarctica,-68
4,1/31/23,Antarctica,-82
6,2/1/23,Antarctica,-76
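The code for the two filtering cells is not shown. One way to produce both subsets is a boolean mask on the `Pole` column. In this sketch the DataFrame is rebuilt by hand from the rows displayed above as a stand-in for the CSV, and the names `arctic` and `antarctica` are assumptions.

```python
import pandas as pd

# Stand-in for the CSV loaded above, rebuilt from the rows shown in its output
df = pd.DataFrame({
    "Date": ["1/29/23", "1/29/23", "1/30/23", "1/30/23",
             "1/31/23", "1/31/23", "2/1/23", "2/1/23"],
    "Pole": ["Antarctica", "Arctic"] * 4,
    "Temp": [-55, -17, -68, -24, -82, -32, -76, -22],
})

# Keep only the rows for one pole; .copy() makes each subset an independent DataFrame
arctic = df[df["Pole"] == "Arctic"].copy()
antarctica = df[df["Pole"] == "Antarctica"].copy()

print(arctic)
print(antarctica)
```

Taking a copy avoids pandas' `SettingWithCopyWarning` when a new column is added to a subset later.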


In [11]:
## get zscore for Antarctica


,Date,Pole,Temp,zscore
0,1/29/23,Antarctica,-55,1.508589
2,1/30/23,Antarctica,-68,0.222579
4,1/31/23,Antarctica,-82,-1.162356
6,2/1/23,Antarctica,-76,-0.568812


In [28]:
## get zscore for Arctic


,Date,Pole,Temp,zscore
1,1/29/23,Arctic,-17,1.249411
3,1/30/23,Arctic,-24,-0.046274
5,1/31/23,Arctic,-32,-1.527058
7,2/1/23,Arctic,-22,0.323921
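The two Z-score cells above likely apply `zscore` to each subset separately, so that every temperature is compared with its own pole's mean and spread. A minimal sketch under that assumption, with the data rebuilt by hand from the output shown earlier and with hypothetical variable names:

```python
import pandas as pd
from scipy.stats import zscore

# Stand-in for the CSV loaded above, rebuilt from the rows shown in its output
df = pd.DataFrame({
    "Date": ["1/29/23", "1/29/23", "1/30/23", "1/30/23",
             "1/31/23", "1/31/23", "2/1/23", "2/1/23"],
    "Pole": ["Antarctica", "Arctic"] * 4,
    "Temp": [-55, -17, -68, -24, -82, -32, -76, -22],
})

# One DataFrame per pole
antarctica = df[df["Pole"] == "Antarctica"].copy()
arctic = df[df["Pole"] == "Arctic"].copy()

# Z-scores are computed within each pole, not across the combined data
antarctica["zscore"] = zscore(antarctica["Temp"])
arctic["zscore"] = zscore(arctic["Temp"])

print(antarctica)
print(arctic)
```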


In [27]:
df.groupby("Pole")[["Temp"]].mean()

Pole,Temp
Antarctica,-70.25
Arctic,-23.75


#### Which day was more unusually warm?
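From the Z-scores shown above, the warmest day relative to normal is 1/29/23 at both poles, and Antarctica's Z-score (about 1.51) is higher than the Arctic's (about 1.25), so Antarctica's -55 was the more unusual reading. The sketch below is one compact way to check this. It is an alternative to the two-DataFrame steps rather than the notebook's own code, and it rebuilds the data by hand from the output shown earlier.

```python
import pandas as pd

# Stand-in for the CSV loaded above, rebuilt from the rows shown in its output
df = pd.DataFrame({
    "Date": ["1/29/23", "1/29/23", "1/30/23", "1/30/23",
             "1/31/23", "1/31/23", "2/1/23", "2/1/23"],
    "Pole": ["Antarctica", "Arctic"] * 4,
    "Temp": [-55, -17, -68, -24, -82, -32, -76, -22],
})

# Z-score within each pole, using the population standard deviation (ddof=0)
df["zscore"] = df.groupby("Pole")["Temp"].transform(
    lambda temps: (temps - temps.mean()) / temps.std(ddof=0)
)

# Row with the highest Z-score (the most unusually warm day) for each pole
warmest = df.loc[df.groupby("Pole")["zscore"].idxmax()]
print(warmest)
```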

## Method 2 - Pivot

In [52]:
## recall our df


,Date,Pole,Temp
0,1/29/23,Antarctica,-55
1,1/29/23,Arctic,-17
2,1/30/23,Antarctica,-68
3,1/30/23,Arctic,-24
4,1/31/23,Antarctica,-82
5,1/31/23,Arctic,-32
6,2/1/23,Antarctica,-76
7,2/1/23,Arctic,-22


```
df.pivot(columns = "columns you want to pivot",
         index = "What your new index should be",
         values = "What values are for your columns")
```

In [37]:
## pivot df


Date,Antarctica,Arctic
1/29/23,-55,-17
1/30/23,-68,-24
1/31/23,-82,-32
2/1/23,-76,-22
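The pivoted table above has one row per date and one column per pole. A minimal sketch that produces this shape, with the data rebuilt by hand from the output shown earlier (the name `pivoted` is an assumption):

```python
import pandas as pd

# Stand-in for the CSV loaded above, rebuilt from the rows shown in its output
df = pd.DataFrame({
    "Date": ["1/29/23", "1/29/23", "1/30/23", "1/30/23",
             "1/31/23", "1/31/23", "2/1/23", "2/1/23"],
    "Pole": ["Antarctica", "Arctic"] * 4,
    "Temp": [-55, -17, -68, -24, -82, -32, -76, -22],
})

# Dates become the index, each pole becomes a column, temperatures fill the cells
pivoted = df.pivot(index="Date", columns="Pole", values="Temp")
print(pivoted)
```

`pivot` raises an error if any Date and Pole pair appears more than once, so it suits data like this where each pair is unique.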


In [38]:
## call the Antarctica series


Date
1/29/23   -55
1/30/23   -68
1/31/23   -82
2/1/23    -76
Name: Antarctica, dtype: int64

In [39]:
## create z-score for Antarctica


Date,Antarctica,Arctic,Antartica_Zscore
1/29/23,-55,-17,1.508589
1/30/23,-68,-24,0.222579
1/31/23,-82,-32,-1.162356
2/1/23,-76,-22,-0.568812


In [40]:
## create z-score for Arctic


Date,Antarctica,Arctic,Antartica_Zscore,Arctic_Zscore
1/29/23,-55,-17,1.508589,1.249411
1/30/23,-68,-24,0.222579,-0.046274
1/31/23,-82,-32,-1.162356,-1.527058
2/1/23,-76,-22,-0.568812,0.323921
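With the pivoted layout, each pole is a single column, so its Z-scores can be added as a new column next to it. A minimal sketch, assuming `zscore` from `scipy.stats` was applied to each column in turn. The data is rebuilt by hand from the output shown earlier, and the column names copy the output above exactly, including its spelling of `Antartica_Zscore`.

```python
import pandas as pd
from scipy.stats import zscore

# Stand-in for the CSV loaded above, rebuilt from the rows shown in its output
df = pd.DataFrame({
    "Date": ["1/29/23", "1/29/23", "1/30/23", "1/30/23",
             "1/31/23", "1/31/23", "2/1/23", "2/1/23"],
    "Pole": ["Antarctica", "Arctic"] * 4,
    "Temp": [-55, -17, -68, -24, -82, -32, -76, -22],
})

pivoted = df.pivot(index="Date", columns="Pole", values="Temp")

# Selecting one column returns a Series of that pole's temperatures
antarctica_temps = pivoted["Antarctica"]
arctic_temps = pivoted["Arctic"]

# Each Z-score column is computed from its own pole's temperatures
pivoted["Antartica_Zscore"] = zscore(antarctica_temps)
pivoted["Arctic_Zscore"] = zscore(arctic_temps)

print(pivoted)
```

Because both Z-score columns share the same date index, the two poles can be compared row by row.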


In [31]:
## recall our means


Pole,Temp
Antarctica,-70.25
Arctic,-23.75
