# Outlier detection using the IQR method

The interquartile range (IQR) is a measure of statistical dispersion, equal to the difference between the third and first quartiles: IQR = Q3 - Q1. The first quartile, denoted Q1, is the value in the data set that has 25% of the values below it. The third quartile, denoted Q3, is the value in the data set that has 25% of the values above it.

The IQR method identifies outliers by placing limits on the sample values at a factor c of the IQR below the 25th percentile and above the 75th percentile: a lower bound LB = Q1 - c * IQR and an upper bound UB = Q3 + c * IQR. Values that fall outside the interval [LB, UB] are treated as outliers. The default value for the factor c is 1.5, but it can be increased so that only the more extreme outliers are flagged.


                 LB          Q1      Q3          UB      
      outlier                +−−−−−+−+                  outlier
         *       |−−−−−−−−−−−|     | |−−−−−−−−−−−|         *
                             +−−−−−+−+    
                    
 +−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+−−−+
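
The limits LB and UB can be computed directly with NumPy. The following is a minimal sketch of the rule described above, not the implementation in `ptp.outlier`; the helper names `iqr_bounds` and `flag_outliers` and the sample values are hypothetical and exist only for illustration. It assumes NumPy's default (linear interpolation) percentile definition.

```python
import numpy as np

def iqr_bounds(values, c=1.5):
    """Return (LB, UB) = (Q1 - c*IQR, Q3 + c*IQR) for a 1-D sample."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - c * iqr, q3 + c * iqr

def flag_outliers(values, c=1.5):
    """Return a boolean mask that is True where a value lies outside [LB, UB]."""
    values = np.asarray(values, dtype=float)
    lb, ub = iqr_bounds(values, c)
    return (values < lb) | (values > ub)

# Illustrative sample (made-up numbers): eight regular values and one extreme value
sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])
lb, ub = iqr_bounds(sample)      # Q1 = 3, Q3 = 7, IQR = 4, so LB = -3 and UB = 13
mask = flag_outliers(sample)     # only the value 100.0 is flagged
print(lb, ub, sample[mask])
```

Raising c widens the interval [LB, UB], so fewer samples are flagged; in this sketch a sufficiently large c leaves no outliers at all.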

In [1]:
import sys, os, pprint, copy
# Add the parent directory to the import path so the local `ptp` package can be found
sys.path.append(os.path.abspath('../'))
import ptp.reader, ptp.metrics
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import ptp.outlier
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [None]:
filenames = ["../data/serial-20190822-173422.json",
             "../data/serial-20190813-185308.json",
             "../data/serial-20190820-103355.json",
             "../data/serial-20190821-223131.json",
             "../data/serial-20190822-100213.json",
             "../data/serial-20190823-083502.json"]

for filename in filenames:
    # Reader
    reader = ptp.reader.Reader(filename)
    reader.run()
    print("Dataset info:")
    print(f"Filename: {filename}")
    pprint.pprint(reader.metadata)
    
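    # Outlier detection (c=2 widens the limits relative to the default factor of 1.5)
    outlier = ptp.outlier.Outlier(reader.data)
    outlier.process(c=2)
    
    # All samples, plus the subset of records that carry an 'outlier' key
    d_asym     = np.array([r['asym'] for r in reader.data])
    x          = np.array([r['idx'] for r in reader.data])
    d_asym_out = [r['asym'] for r in reader.data if 'outlier' in r]
    x_out      = [r['idx'] for r in reader.data if 'outlier' in r]
    
    # Plot every sample, then overlay the flagged records in a second color
    plt.figure()
    plt.scatter(x, d_asym, s=0.7)
    plt.scatter(x_out, d_asym_out, s=1)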
    plt.show()
    plt.close()
    
    print("---------------------------------------------------------------\n")