## Trimmed Mean

A trimmed mean is an average computed after discarding (trimming off) a fixed proportion of the most extreme values. Dropping those values before averaging makes the result more robust to outliers.

SciPy offers several functions for calculating trimmed means.

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

**Create DataFrame**

In [4]:
df = pd.DataFrame({'name':['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Bob', 'Jack', 'Jill', 'Kelly', 'Mark', 'Kao', 'Dillon'],
                   'score':[1,2,3,4,5,6,7,8,9,10,100,100]})
df

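| | name | score |
|---|---|---|
| 0 | Jason | 1 |
| 1 | Molly | 2 |
| 2 | Tina | 3 |
| 3 | Jake | 4 |
| 4 | Amy | 5 |
| 5 | Bob | 6 |
| 6 | Jack | 7 |
| 7 | Jill | 8 |
| 8 | Kelly | 9 |
| 9 | Mark | 10 |
| 10 | Kao | 100 |
| 11 | Dillon | 100 |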


**Calculate Non-trimmed Mean**

In [5]:
df['score'].mean()

21.25

**Calculate Mean After Trimming Off Highest And Lowest**

In [6]:
# Trim off the lowest 20% and the highest 20% of scores, then average the rest
stats.trim_mean(df['score'], proportiontocut=0.2)

6.5
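
To see where the 6.5 comes from, here is a minimal pure-Python sketch of the same idea: sort the scores, drop a fixed number from each end, and average what is left. The helper name `manual_trim_mean` is hypothetical, and rounding the cut count down with `int()` is an assumption chosen to match the result above. With 12 scores and a proportion of 0.2, that drops 2 values from each end.

```python
from statistics import fmean


def manual_trim_mean(values, proportiontocut):
    """Hypothetical helper: mean after dropping the same count from each end."""
    ordered = sorted(values)
    # Number of values to drop from EACH end, rounded down (assumption)
    k = int(proportiontocut * len(ordered))
    if 2 * k >= len(ordered):
        raise ValueError("proportiontocut is too large for this sample")
    # Keep only the middle of the sorted data and average it
    return fmean(ordered[k:len(ordered) - k])


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 100]
print(manual_trim_mean(scores, 0.2))  # 6.5
```

Here the two lowest scores (1 and 2) and the two highest (100 and 100) are dropped, leaving 3 through 10.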

In [9]:
# Trim 20% of the scores from each end and show the values that remain
# (the order of the returned values is not guaranteed to be sorted)
stats.trimboth(df['score'], proportiontocut=0.2)

array([ 3,  5,  4,  6,  7,  8,  9, 10])

**Calculate Mean After Trimming Only Highest Extremes**

In [13]:
# Trim off the highest 20% of scores
stats.trim1(df['score'], proportiontocut=0.2, tail='right')

array([ 1,  3,  2,  4,  5,  6,  7,  9,  8, 10])

In [15]:
stats.trim1(df['score'], proportiontocut=0.2, tail='right').mean()

5.5

**Calculate Mean After Trimming Only Lowest Extremes**

In [14]:
# Trim off the lowest 20% of scores
stats.trim1(df['score'], proportiontocut=0.2, tail='left')

array([  3,   5,   4,   6,   7,   8,   9,  10, 100, 100])

In [16]:
stats.trim1(df['score'], proportiontocut=0.2, tail='left').mean()

25.2
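
One-sided trimming can be reproduced by hand in the same way. The sketch below is a pure-Python illustration rather than SciPy's implementation: the helper name `manual_trim1` is hypothetical, and the cut count is again assumed to round down. Unlike the SciPy output above, it returns the kept values in sorted order.

```python
from statistics import fmean


def manual_trim1(values, proportiontocut, tail="right"):
    """Hypothetical helper: drop a proportion of values from one end only."""
    ordered = sorted(values)
    # Number of values to drop from the chosen end, rounded down (assumption)
    k = int(proportiontocut * len(ordered))
    if tail == "right":
        return ordered[:len(ordered) - k]  # drop the k highest values
    if tail == "left":
        return ordered[k:]  # drop the k lowest values
    raise ValueError("tail must be 'left' or 'right'")


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 100]

right_kept = manual_trim1(scores, 0.2, tail="right")
left_kept = manual_trim1(scores, 0.2, tail="left")

print(right_kept, fmean(right_kept))  # scores 1 to 10, mean 5.5
print(left_kept, fmean(left_kept))    # scores 3 to 10 plus both 100s, mean 25.2
```

Trimming only the right tail removes both outliers, which is why its mean (5.5) is so much lower than the left-trimmed mean (25.2), where the two scores of 100 remain.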