# Removing Outliers

<span>Handling outliers in a dataset takes some intuition and domain expertise to get right, for both descriptive and predictive analytics. Still, there are basic "go-to" methods used in machine learning that can improve the predictive power of your models. Below I go over the basic methods for removing outliers from your data. Which method you use, and whether you use one at all, is up to you.</span>

### Import Preliminaries

In [2]:
# Import modules
import pandas as pd

# Create a sample dataframe
df = pd.DataFrame(columns = ['name','age','score'],
                  data = [['Matt',24,99.0],
                         ['Chris', 26,10.0],
                         ['Ashley',22, 98.0],
                         ['Link', 16,100.0],
                         ['Avi', 96,100.0],
                         ['Steph', 1,98.0]],)

# View the dataframe
df

|   | name | age | score |
|---|------|-----|-------|
| 0 | Matt | 24 | 99.0 |
| 1 | Chris | 26 | 10.0 |
| 2 | Ashley | 22 | 98.0 |
| 3 | Link | 16 | 100.0 |
| 4 | Avi | 96 | 100.0 |
| 5 | Steph | 1 | 98.0 |


### Remove Outliers by Quantile

In [3]:
# Copy the dataframe
df1 = df.copy()

# Replace numerical outliers by quantile
for col in df1.select_dtypes(include='number'):

    # Set quantile limits and replacement mean
    upper_limit = df1[col].quantile(.90)
    lower_limit = df1[col].quantile(.10)
    mean = df1[col].mean()

    # Flag values above the 90th or below the 10th percentile
    is_outlier = (df1[col] > upper_limit) | (df1[col] < lower_limit)

    # Replace outliers with the column mean
    # (cast to float first so an integer column can hold the mean)
    df1[col] = df1[col].astype(float).mask(is_outlier, mean)

# View the dataframe
df1

|   | name | age | score |
|---|------|-----|-------|
| 0 | Matt | 24.0 | 99.0 |
| 1 | Chris | 26.0 | 84.166667 |
| 2 | Ashley | 22.0 | 98.0 |
| 3 | Link | 16.0 | 100.0 |
| 4 | Avi | 30.833333 | 100.0 |
| 5 | Steph | 30.833333 | 98.0 |
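
The cell above replaces outliers with the column mean rather than dropping them. If you would rather remove the affected rows entirely, a minimal sketch might look like the following. The helper name `drop_quantile_outliers` and its `lower`/`upper` parameters are my own illustration, not part of pandas, and the 10%/90% limits are simply carried over from the example above.

```python
import pandas as pd

# Same sample data as in the preliminaries
df = pd.DataFrame(
    columns=['name', 'age', 'score'],
    data=[['Matt', 24, 99.0],
          ['Chris', 26, 10.0],
          ['Ashley', 22, 98.0],
          ['Link', 16, 100.0],
          ['Avi', 96, 100.0],
          ['Steph', 1, 98.0]],
)


def drop_quantile_outliers(frame, lower=0.10, upper=0.90):
    """Hypothetical helper: keep only rows whose numeric values
    all fall within the per-column quantile limits."""
    numeric = frame.select_dtypes(include='number')

    # One limit per numeric column
    lower_limits = numeric.quantile(lower)
    upper_limits = numeric.quantile(upper)

    # True for rows where every numeric column is inside its limits
    in_range = (numeric.ge(lower_limits) & numeric.le(upper_limits)).all(axis=1)
    return frame[in_range]


df_dropped = drop_quantile_outliers(df)
```

On the sample data this keeps the rows for Matt, Ashley, and Link. Dropping rows shrinks the dataset, so on a frame this small it is a fairly aggressive choice; replacing with the mean keeps every row at the cost of inventing values.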


### Removing Outliers by Value

In [4]:
# Replace outliers using pandas' replace function
# (the mean is computed before replacement, so it still includes the outliers)
outliers = [96, 1]
df['age'] = df['age'].replace(outliers, df['age'].mean())

# View dataframe
df

|   | name | age | score |
|---|------|-----|-------|
| 0 | Matt | 24.0 | 99.0 |
| 1 | Chris | 26.0 | 10.0 |
| 2 | Ashley | 22.0 | 98.0 |
| 3 | Link | 16.0 | 100.0 |
| 4 | Avi | 30.833333 | 100.0 |
| 5 | Steph | 30.833333 | 98.0 |
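
Note that the replacement value of 30.833333 is the mean of the full `age` column, so the outliers 96 and 1 still influence it. One possible variant, sketched here under the assumption that you want the mean of the remaining values only, flags the listed outliers with `isin` and computes the mean from everything else. The variable names are illustrative.

```python
import pandas as pd

# The age column from the sample dataframe
ages = pd.Series([24, 26, 22, 16, 96, 1], name='age')

# Values treated as outliers (chosen by hand, as in the cell above)
outliers = [96, 1]
is_outlier = ages.isin(outliers)

# Mean of the values that are NOT flagged as outliers
clean_mean = ages[~is_outlier].mean()

# Replace flagged values with that mean; cast to float so the
# column can hold a non-integer mean in general
ages_clean = ages.astype(float).mask(is_outlier, clean_mean)
```

Here the replacement value is 22.0, the mean of 24, 26, 22, and 16, instead of 30.833333. Whether that is preferable depends on how much you trust the flagged values.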


Author: Kavi Sekhon