# What is Anomaly Detection?

**Anomaly detection (also called outlier analysis) is a data-mining step that identifies data points, events, or observations that deviate from a dataset's normal behavior. Anomalous data can indicate critical incidents, such as a technical glitch, or potential opportunities, such as a change in consumer behavior.**
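As a minimal illustration of "deviating from normal behavior" (not part of the original notebook), the sketch below flags values whose z-score exceeds a threshold. The threshold of 3 and the order counts are made-up assumptions for illustration only.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask marking values more than `threshold` standard deviations from the mean."""
    x = np.asarray(values, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

# Hypothetical daily order counts with one spike at the end.
orders = [20, 22, 19, 21, 23, 20, 18, 22, 21, 20, 19, 21, 95]
mask = zscore_outliers(orders, threshold=3.0)
print(np.flatnonzero(mask))  # positions of the flagged values
```

This rule only looks at one variable at a time; the PyCaret models used below score whole rows across several features.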

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [None]:
d=pd.read_csv('/kaggle/input/customer-segmentation-tutorial-in-python/Mall_Customers.csv')

In [None]:
d.info()

In [None]:
d.head()

In [None]:
# Summary statistics (count, mean, std, quartiles) for the numeric columns
d.describe()
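`describe()` reports the 25% and 75% quartiles, which is all Tukey's IQR rule needs to flag univariate outliers. A minimal sketch, assuming the column name `Annual Income (k$)` from this dataset; the sample values are invented, not read from the CSV.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (series < lower) | (series > upper)

# Hypothetical sample shaped like the Mall Customers income column.
sample = pd.DataFrame({"Annual Income (k$)": [15, 33, 48, 54, 61, 70, 78, 87, 99, 160]})
mask = iqr_outliers(sample["Annual Income (k$)"])
print(sample[mask])
```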

In [None]:
!pip install pycaret

# Automated Anomaly Detection Using PyCaret

In [None]:
# Importing anomaly detection module.
from pycaret.anomaly import *

In [None]:
# Initializing the setup function used for pre-processing.
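# CustomerID is a row identifier rather than a customer attribute, so it is excluded;
# session_id fixes the random seed so the results are reproducible.
setup_anomaly_data = setup(data=d, ignore_features=['CustomerID'], session_id=123)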
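`setup()` infers column types and prepares the data before any model is trained; in this dataset `Gender` is a text column that has to be encoded numerically. The sketch below is a rough, hand-written approximation of that kind of preparation with pandas (drop the ID, one-hot encode text columns), not PyCaret's actual pipeline; the helper name `prepare_features` and the toy rows are hypothetical.

```python
import pandas as pd

def prepare_features(df, id_col="CustomerID"):
    """Hypothetical helper: drop the ID column and one-hot encode string columns."""
    features = df.drop(columns=[id_col])
    return pd.get_dummies(features, dtype=float)

# Invented rows using the dataset's column names.
toy = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 21, 20],
    "Annual Income (k$)": [15, 15, 16],
    "Spending Score (1-100)": [39, 81, 6],
})
X = prepare_features(toy)
print(list(X.columns))
```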

# Isolation Forest Implementation

In [None]:
# Training an Isolation Forest model (create_model fits it on the data prepared by setup).
iforest = create_model('iforest')
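Isolation Forest isolates points with recursive random splits; anomalies need fewer splits, so their average path length across the trees is shorter. Below is a minimal analogue using scikit-learn's `IsolationForest` on synthetic data (PyCaret's `'iforest'` wraps PyOD, which builds on this scikit-learn class). `contamination=0.05` mirrors what we take to be PyCaret's default `fraction` of 0.05.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 195 "normal" points around the origin plus 5 far-away points (synthetic data).
normal = rng.normal(0.0, 1.0, size=(195, 2))
outliers = rng.uniform(6.0, 8.0, size=(5, 2))
X = np.vstack([normal, outliers])

model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = model.fit_predict(X)  # scikit-learn convention: -1 = anomaly, 1 = normal
flagged = np.flatnonzero(labels == -1)
print(len(flagged), flagged[-5:])
```

Note the label convention differs: scikit-learn uses -1 for anomalies, while the PyCaret predictions in this notebook use `Label = 1`.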

In [None]:
# Visualizing the Isolation Forest results; for anomaly models plot_model defaults to a 3D t-SNE plot colored by label.
plot_model(iforest)
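The default anomaly plot projects all features into three dimensions with t-SNE and colors points by their predicted label. Roughly the same projection can be computed by hand; this sketch assumes scikit-learn's `TSNE`, uses synthetic features, and leaves the 3D rendering out.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
# Synthetic 4-feature data: 57 typical rows and 3 extreme rows.
X = np.vstack([rng.normal(0, 1, size=(57, 4)), rng.normal(8, 1, size=(3, 4))])
labels = np.r_[np.zeros(57, dtype=int), np.ones(3, dtype=int)]  # 1 = anomaly, PyCaret-style

# perplexity must be smaller than the number of samples; 10 is an arbitrary choice here.
embedding = TSNE(n_components=3, perplexity=10, random_state=0).fit_transform(X)
print(embedding.shape, labels.sum())
# embedding[:, 0], embedding[:, 1], embedding[:, 2] could feed a 3D scatter colored by `labels`.
```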

In [None]:
# Scoring every row of d with the trained Isolation Forest (adds Label and Score columns).
iforest_predictions = predict_model(iforest, data = d)
print(iforest_predictions)
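Each row now carries an anomaly score and a binary label. A minimal sketch of how a label can be derived from scores with a contamination fraction, assuming the PyOD-style rule of thresholding at the (1 − fraction) percentile of the scores; the helper name `labels_from_scores` and the scores are hypothetical.

```python
import numpy as np

def labels_from_scores(scores, fraction=0.05):
    """Hypothetical helper: label the highest-scoring `fraction` of rows as anomalies (1), the rest 0."""
    scores = np.asarray(scores, dtype=float)
    threshold = np.percentile(scores, 100 * (1 - fraction))
    return (scores > threshold).astype(int)

# Twenty invented anomaly scores: higher means more anomalous.
scores = np.linspace(0.0, 1.9, 20)
labels = labels_from_scores(scores, fraction=0.10)
print(labels.sum(), np.flatnonzero(labels))
```

With this rule the number of flagged rows is roughly `fraction` times the number of rows, which is why the anomaly counts below stay small.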

**Checking the anomaly rows: rows with Label = 1 are flagged as anomalies.**

In [None]:
# Checking anomaly rows. Label = 1 is the anomaly data.
iforest_anomaly_rows = iforest_predictions[iforest_predictions['Label'] == 1]
print(iforest_anomaly_rows.head())
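The `Label` column name comes from PyCaret 2.x. As far as we know, PyCaret 3.x names the outputs `Anomaly` and `Anomaly_Score` instead, so indexing `['Label']` raises a `KeyError` there. A small hypothetical helper that accepts either naming, shown on a toy frame:

```python
import pandas as pd

def anomaly_rows(predictions, candidates=("Label", "Anomaly")):
    """Hypothetical helper: return rows flagged as anomalies (value 1), whichever label column exists."""
    for col in candidates:
        if col in predictions.columns:
            return predictions[predictions[col] == 1]
    raise KeyError(f"none of {candidates} found in predictions")

# Invented predictions in the PyCaret 3.x-style layout.
toy = pd.DataFrame({"Age": [19, 64, 35], "Anomaly": [0, 1, 0], "Anomaly_Score": [0.1, 0.9, 0.2]})
print(anomaly_rows(toy))
```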

In [None]:
# Checking the number of anomaly rows returned by Isolation Forest (shape is rows, columns).
print(iforest_anomaly_rows.shape)

# K Nearest Neighbors (KNN) Implementation

In [None]:
# Training a k-nearest neighbors (KNN) anomaly detector.
knn = create_model('knn')
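The KNN detector scores each row by its distance to its k-th nearest neighbor: isolated points have distant neighbors. A minimal sketch of that "largest distance" scoring with scikit-learn's `NearestNeighbors`; k = 5 matches what we understand to be PyOD's default, and the helper name `knn_scores` and the data are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_scores(X, k=5):
    """Hypothetical helper: distance from each point to its k-th nearest neighbor, excluding itself."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1 because each point is its own nearest neighbor
    distances, _ = nn.kneighbors(X)
    return distances[:, -1]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, size=(50, 2)), [[9.0, 9.0]]])  # 50 typical points + 1 outlier
scores = knn_scores(X, k=5)
print(int(np.argmax(scores)))  # index of the most anomalous point
```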

In [None]:
# Visualizing the KNN results (3D t-SNE plot by default).
plot_model(knn)

In [None]:
# Scoring every row of d with the trained KNN model.
knn_predictions = predict_model(knn, data = d)


**Checking KNN anomaly rows. Predictions with Label = 1 are anomalies.**

In [None]:
print(knn_predictions)

In [None]:
# Keeping only the rows KNN flagged as anomalies (Label = 1).
knn_anomaly_rows = knn_predictions[knn_predictions['Label'] == 1]

In [None]:
# Previewing the first anomaly rows returned by the KNN model.
knn_anomaly_rows.head()

In [None]:
# Checking the number of anomaly rows returned by the KNN model (rows, columns).
knn_anomaly_rows.shape
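Both models scored the same DataFrame `d`, so, assuming `predict_model` keeps the original row index, their anomaly sets can be compared directly. A sketch with hypothetical index values standing in for `iforest_anomaly_rows.index` and `knn_anomaly_rows.index`:

```python
import pandas as pd

# Hypothetical anomaly indices; in the notebook these would be
# iforest_anomaly_rows.index and knn_anomaly_rows.index.
iforest_idx = pd.Index([8, 11, 19, 198, 199])
knn_idx = pd.Index([11, 19, 60, 198, 199])

both = iforest_idx.intersection(knn_idx)    # rows flagged by both models
either = iforest_idx.union(knn_idx)         # rows flagged by at least one model
jaccard = len(both) / len(either)           # 1.0 = identical sets, 0.0 = disjoint
print(list(both), round(jaccard, 2))
```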

# Clustering Implementation

In [None]:
# Training a clustering-based local outlier (CBLOF) model.
cluster = create_model('cluster')

# Visualizing the cluster-model results (3D t-SNE plot by default).
plot_model(cluster)
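The `'cluster'` model is, as we understand it, PyOD's Clustering-Based Local Outlier Factor (CBLOF). Its core idea: cluster the data, split the clusters into "large" and "small", and score each point by its distance to the nearest large-cluster centroid. The sketch below is a simplified, unweighted version of that idea using scikit-learn's `KMeans`; the helper name `cblof_scores`, the cluster count, the 90% coverage rule for large clusters (`alpha`), and the data are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cblof_scores(X, n_clusters=3, alpha=0.9, random_state=0):
    """Hypothetical, simplified CBLOF-style scores (higher = more anomalous)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    labels, centers = km.labels_, km.cluster_centers_
    sizes = np.bincount(labels, minlength=n_clusters)
    order = np.argsort(sizes)[::-1]                     # clusters from largest to smallest
    covered = np.cumsum(sizes[order]) / len(X)
    n_large = int(np.searchsorted(covered, alpha) + 1)  # fewest clusters covering >= alpha of rows
    large = order[:n_large]
    # Small-cluster points: distance to the nearest large-cluster centroid.
    dists = np.linalg.norm(X[:, None, :] - centers[large][None, :, :], axis=2)
    scores = dists.min(axis=1)
    # Large-cluster points: distance to their own centroid.
    in_large = np.isin(labels, large)
    own = np.linalg.norm(X - centers[labels], axis=1)
    scores[in_large] = own[in_large]
    return scores

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal((0, 0), 0.5, size=(100, 2)),   # large cluster A
    rng.normal((5, 5), 0.5, size=(100, 2)),   # large cluster B
    rng.normal((12, -6), 0.3, size=(5, 2)),   # small, distant group (synthetic outliers)
])
scores = cblof_scores(X)
print(np.argsort(scores)[-5:])  # indices of the five highest scores
```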

In [None]:
# Scoring every row of d with the trained cluster model.
cluster_predictions = predict_model(cluster, data = d)
print(cluster_predictions)
# Checking cluster anomaly rows. Predictions with Label = 1 are anomalies.
cluster_anomaly_rows = cluster_predictions[cluster_predictions['Label'] == 1]

In [None]:
# Previewing the cluster model's anomaly rows and checking their number (rows, columns).
print(cluster_anomaly_rows.head())
cluster_anomaly_rows.shape
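The three detectors rarely flag exactly the same rows. One simple way to combine them, offered here as an assumption rather than part of the original analysis, is a majority vote over the three label columns. A sketch on a hypothetical frame whose columns stand in for the `Label` values of `iforest_predictions`, `knn_predictions`, and `cluster_predictions`:

```python
import pandas as pd

# Hypothetical 0/1 labels for five rows from the three detectors.
votes = pd.DataFrame({
    "iforest": [0, 1, 1, 0, 1],
    "knn":     [0, 1, 0, 0, 1],
    "cluster": [0, 0, 1, 1, 1],
})
votes["n_votes"] = votes[["iforest", "knn", "cluster"]].sum(axis=1)
# A row counts as a consensus anomaly when at least 2 of the 3 models flag it.
votes["consensus_anomaly"] = (votes["n_votes"] >= 2).astype(int)
print(votes)
```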