## Importing Packages for Outlier Detection Analysis and EDA 

In [6]:
import pandas as pd
import numpy as np
import sklearn
import os
import scipy
import statsmodels.api as sm
import seaborn as sns
from pyod.models.mad import MAD



## PyOD (Python Outlier Detection) is a widely used Python package for outlier detection

A common way to identify outliers in a dataset is to measure how far each point lies from the mean, in units of the standard deviation. However, both statistics are themselves sensitive to outliers: when the outliers differ from the inliers by a large magnitude, they pull the mean toward themselves and inflate the standard deviation, so the resulting scores no longer separate outliers from inliers reliably.
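To make this effect concrete, here is a minimal sketch on made-up numbers. The sample values and the cutoff of 3 are illustrative assumptions, not taken from the listings data.

```python
import statistics

# Hypothetical sample: five similar values plus one extreme value
values = [10, 11, 12, 10, 11, 1000]

mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation

# z-score: distance from the mean in units of standard deviation
z_scores = [(v - mean) / std for v in values]

# A common rule of thumb (an assumption here) flags points with |z| > 3
flagged = [v for v, z in zip(values, z_scores) if abs(z) > 3]

print(f"mean={mean:.2f}, std={std:.2f}")
print(f"z-score of 1000: {z_scores[-1]:.2f}")
print(f"flagged: {flagged}")
```

The single extreme value drags the mean far above the typical values and inflates the standard deviation so much that its own z-score stays below the cutoff, so nothing is flagged.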

Median Absolute Deviation (MAD) is a more robust alternative for describing variability and for scoring how far a data point lies from the bulk of the data. It is the median of the absolute differences between each value and the dataset's median. Because the median is barely affected by extreme values, the result is more stable than one based on the mean and standard deviation.
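The calculation can be sketched in a few lines of plain Python. This is an illustration of the idea, not PyOD's implementation. The helper name `mad_outliers`, the scaling constant 0.6745, and the cutoff of 3.5 are assumptions here; the last two are the conventional choices for the modified z-score.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`."""
    median = statistics.median(values)
    # MAD: median of the absolute deviations from the median
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # Values are too concentrated around the median for the score to be defined
        return []
    # 0.6745 rescales MAD so it is comparable to a standard deviation
    # when the data are normally distributed
    return [v for v in values if abs(0.6745 * (v - median) / mad) > threshold]

# Hypothetical sample: five similar values plus one extreme value
values = [10, 11, 12, 10, 11, 1000]
print(mad_outliers(values))
```

On this sample the median is 11 and the MAD is 1, so only the value 1000 is flagged; the extreme value has no influence on either statistic.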

In [7]:
input_df = pd.read_csv("/Users/petartodorovic/Documents/Programming/Programming-Problems/RandomLearnings/datasets/listings.csv")

# Multivariate Anomalies 
- Have two or more attributes that fall outside the norm
- Isolation Forests
    - iTrees is short for Isolation Trees
    - iTrees are randomized versions of decision trees: each split picks a feature and a split value at random
    - A random split is more likely to fall in the gap between inliers and outliers, so outliers tend to be isolated after fewer splits, which is what makes the method efficient at detecting them

Each tree only needs a small subsample of the data to separate outliers from inliers, which drastically reduces computation time.
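The intuition behind random splitting can be illustrated with a one-dimensional toy simulation. This is a minimal sketch under simplifying assumptions (a single feature, made-up values, hypothetical helper names `isolation_depth` and `average_depth`), not the algorithm as implemented in PyOD or scikit-learn.

```python
import random

def isolation_depth(values, target, rng):
    """Count random splits needed until `target` is alone in its partition."""
    current = list(values)
    depth = 0
    while len(current) > 1:
        lo, hi = min(current), max(current)
        if lo == hi:
            break  # identical values cannot be separated further
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the target
        if target < split:
            current = [v for v in current if v < split]
        else:
            current = [v for v in current if v >= split]
        depth += 1
    return depth

def average_depth(values, target, n_trees=200, seed=0):
    """Average isolation depth of `target` over many random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(values, target, rng) for _ in range(n_trees))
    return total / n_trees

# Hypothetical data: a tight cluster plus one distant point
data = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 50.0]
outlier_depth = average_depth(data, 50.0)
inlier_depth = average_depth(data, 11.5)
print(f"average depth, outlier: {outlier_depth:.2f}")
print(f"average depth, inlier:  {inlier_depth:.2f}")
```

Most of the range between 10 and 50 is empty, so a random split usually lands in that gap and isolates the distant point immediately, while a point inside the cluster needs at least one split on each side. An Isolation Forest turns this difference in average depth into an anomaly score.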

In [None]:
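from pyod.models.iforest import IForest

# IForest needs a numeric matrix without missing values.
# Using every numeric column and dropping incomplete rows is an assumption; adjust to the dataset.
features = input_df.select_dtypes(include="number").dropna(axis=1, how="all").dropna()

iforest = IForest()
iforest.fit(features)
labels = iforest.labels_  # 0 = inlier, 1 = outlier

outliers = input_df.loc[features.index[labels == 1]]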