## PyCaret Anomaly Detection
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that considerably speeds up the experiment cycle and makes you more productive.


The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise.

In [1]:
!pip install pycaret

Collecting pycaret
  Downloading pycaret-3.3.2-py3-none-any.whl.metadata (17 kB)
Collecting scipy<=1.11.4,>=1.6.1 (from pycaret)
  Downloading scipy-1.11.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting joblib<1.4,>=1.2.0 (from pycaret)
  Downloading joblib-1.3.2-py3-none-any.whl.metadata (5.4 kB)
Collecting scikit-learn>1.4.0 (from pycaret)
  Downloading scikit_learn-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting pyod>=1.1.3 (from pycaret)
  Downloading pyod-2.0.2.tar.gz (165 kB)
  Preparing metadata (setup.py) ... done
Collecting category-encoders>=2.4.0 (from pycaret)
  Downloading category_encoders-2.6.3-py2.py3-none-any.whl.metadata (8

In [2]:
from pycaret.anomaly import *

In [3]:
from google.colab import files
uploaded = files.upload()

Saving tesla_data.csv to tesla_data (1).csv


In [4]:
import pandas as pd
dataset = pd.read_csv('tesla_data.csv')
dataset.head()

|   | Date | Open | High | Low | Close | Adj Close | Volume | Cumulative Open | Price Change |
|---|------|------|------|-----|-------|-----------|--------|-----------------|--------------|
| 0 | 2022-08-29 | 282.829987 | 287.73999 | 280.700012 | 284.820007 | 284.820007 | 41864700 | 282.829987 | NaN |
| 1 | 2022-08-30 | 287.869995 | 288.480011 | 272.649994 | 277.700012 | 277.700012 | 50541800 | 570.699982 | 5.040009 |
| 2 | 2022-08-31 | 280.619995 | 281.25 | 271.809998 | 275.609985 | 275.609985 | 52107300 | 851.319977 | -7.25 |
| 3 | 2022-09-01 | 272.579987 | 277.579987 | 266.149994 | 277.160004 | 277.160004 | 54287000 | 1123.899963 | -8.040009 |
| 4 | 2022-09-02 | 281.070007 | 282.350006 | 269.079987 | 270.209991 | 270.209991 | 50890100 | 1404.969971 | 8.490021 |


In [5]:
dataset.shape

(251, 9)

In [6]:
data = dataset.sample(frac=0.95, random_state=786)
data_unseen = dataset.drop(data.index)

data.reset_index(drop=True, inplace=True)
data_unseen.reset_index(drop=True, inplace=True)

print('Data for the Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))

Data for the Modeling: (238, 9)
Unseen Data For Predictions: (13, 9)
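
The split above holds back 5% of the rows to mimic data that arrives after the model is built. Below is a minimal sketch of the same `sample`/`drop` pattern on a synthetic 251-row frame. The `Close` values are placeholders, not the Tesla data. It shows where the 238/13 sizes come from: pandas turns `frac` into a row count by rounding `frac * len(df)`, and `drop(data.index)` keeps exactly the rows that were not sampled.

```python
import pandas as pd

# Synthetic stand-in for the 251-row dataset (placeholder values only).
dataset = pd.DataFrame({"Close": range(251)})

# Same split as in the notebook: 95% for modeling, the rest held back as "unseen".
data = dataset.sample(frac=0.95, random_state=786)
data_unseen = dataset.drop(data.index)

# The sample size is frac * n_rows rounded: round(0.95 * 251) == round(238.45) == 238.
n_model = round(0.95 * len(dataset))

# The two parts are disjoint and together cover every original row.
assert set(data.index).isdisjoint(data_unseen.index)
assert len(data) + len(data_unseen) == len(dataset)

data = data.reset_index(drop=True)
data_unseen = data_unseen.reset_index(drop=True)
print(data.shape, data_unseen.shape, n_model)  # (238, 1) (13, 1) 238
```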


## Anomaly Detection Module

PyCaret's Anomaly Detection module is an unsupervised machine learning module used to identify rare items, events, or observations that raise suspicion by differing significantly from the majority of the data.

Anomalous items typically point to real-world problems such as bank fraud, structural defects, medical problems, or data errors.
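
To make "differing significantly from the majority" concrete, here is a minimal sketch of a simple statistical baseline: the modified z-score, built from the median and the median absolute deviation (MAD). This is not one of PyCaret's algorithms. The `price_changes` values and the 3.5 cut-off (a common rule of thumb) are assumptions for illustration.

```python
from statistics import median

def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD (assumes MAD > 0)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    return [0.6745 * (v - med) / mad for v in values]

# Hypothetical daily price changes; one value sits far from the rest.
price_changes = [1.2, -0.8, 0.5, -1.1, 0.9, -0.4, 0.3, 15.0, -0.6, 0.7]
scores = modified_z_scores(price_changes)

# Flag indices whose score exceeds the assumed 3.5 cut-off.
outliers = [i for i, z in enumerate(scores) if abs(z) > 3.5]
print(outliers)  # [7]
```

Median-based statistics are used here because a single extreme value inflates the mean and standard deviation and can partly mask itself. PyCaret's detectors apply the same idea across many features at once.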

PyCaret's Anomaly Detection module provides several preprocessing features, configured through the `setup` function, to prepare the data for modeling. It offers 10 ready-to-use algorithms and a few plots for analyzing the performance of trained models.

A typical workflow in PyCaret's unsupervised modules consists of the following six steps, in this order:

Setup ➡️ Create Model ➡️ Assign Labels ➡️ Analyze Model ➡️ Prediction ➡️ Save Model



In [7]:
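# normalize=True z-scores every feature; session_id fixes the random seed.
# Note: 'Date' is detected as a categorical feature and one-hot encoded
# (one column per date); passing ignore_features=['Date'] would exclude it.
exp_ano101 = setup(data, normalize=True, session_id=123)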

|   | Description | Value |
|---|-------------|-------|
| 0 | Session id | 123 |
| 1 | Original data shape | (238, 9) |
| 2 | Transformed data shape | (238, 246) |
| 3 | Numeric features | 8 |
| 4 | Categorical features | 1 |
| 5 | Rows with missing values | 0.4% |
| 6 | Preprocess | True |
| 7 | Imputation type | simple |
| 8 | Numeric imputation | mean |
| 9 | Categorical imputation | mode |
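
The 246 columns reported as "Transformed data shape" deserve a second look. The minimal sketch below reproduces both that figure and the 0.4% missing-row figure under two assumptions: every row has a distinct `Date` (one row per trading day), and exactly one row in the modeling sample has a missing value (presumably a first-day `Price Change`, which has no previous day to diff against). The 246 columns are the 8 numeric features plus one one-hot column per date. A calendar date never repeats for new rows, so excluding `Date` (for example through `setup`'s `ignore_features` parameter) would likely serve this data better.

```python
# Assumptions (not read from PyCaret): every Date is unique, and exactly
# one row in the 238-row modeling sample has a missing value.
n_rows = 238
numeric_features = ["Open", "High", "Low", "Close", "Adj Close",
                    "Volume", "Cumulative Open", "Price Change"]
n_unique_dates = n_rows          # one one-hot column per distinct date
rows_with_missing = 1

transformed_cols = len(numeric_features) + n_unique_dates
pct_missing = 100 * rows_with_missing / n_rows

print((n_rows, transformed_cols), f"{pct_missing:.1f}%")  # (238, 246) 0.4%
```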


In [8]:
iforest = create_model('iforest')

In [9]:
print(iforest)

IForest(behaviour='new', bootstrap=False, contamination=0.05,
    max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=-1,
    random_state=123, verbose=0)
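
Isolation Forest scores a point by how quickly random splits isolate it: anomalies tend to end up alone after only a few splits. Below is a minimal sketch of the scoring formula from the original Isolation Forest paper (Liu, Ting and Zhou). It assumes the average path length `E[h(x)]` over the trees has already been measured. Path lengths are normalized by `c(n)`, the average path length of an unsuccessful binary-search-tree lookup over `n` points. This gives a score near 1 for clear anomalies and about 0.5 or below for ordinary points.

The printed `max_samples='auto'` should mean each tree sees `min(256, n_samples)` rows, i.e. all 238 here. scikit-learn and PyOD report shifted or sign-flipped variants of this score, so the `Anomaly_Score` values in this notebook are not on the paper's 0 to 1 scale.

```python
import math

EULER_GAMMA = 0.5772156649015329

def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points (normalizer)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length: float, n: int) -> float:
    """Paper's score s(x, n) = 2 ** (-E[h(x)] / c(n)); close to 1 means anomalous."""
    return 2.0 ** (-mean_path_length / c(n))

n = 238  # rows per tree in this notebook (max_samples='auto' -> min(256, 238))
print(anomaly_score(c(n), n))        # path length equal to the average: 0.5
print(anomaly_score(2.0, n) > 0.5)   # isolated after ~2 splits: True
```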


In [10]:
svm = create_model('svm', fraction = 0.025)

In [11]:
print(svm)

OCSVM(cache_size=200, coef0=0.0, contamination=0.025, degree=3, gamma='auto',
   kernel='rbf', max_iter=-1, nu=0.5, shrinking=True, tol=0.001,
   verbose=False)
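
The printed models show that `fraction` ends up as the detector's `contamination` (0.05 by default for `iforest`, 0.025 for `svm`). The One-class SVM's own `nu=0.5` is a separate parameter. To our understanding, PyOD (which PyCaret wraps here) treats `contamination` as the expected proportion of outliers. It turns raw scores into 0/1 labels by thresholding at the matching percentile of the training scores. The minimal re-implementation below illustrates that thresholding idea; it is not PyOD's code, and the `scores` are hypothetical, one per training row. For 238 rows, the two settings flag roughly 5% and 2.5% of rows respectively.

```python
def percentile(values, q):
    """Linear-interpolation percentile (same convention as numpy's default)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def label_by_contamination(scores, contamination):
    """1 = anomaly for scores strictly above the (1 - contamination) percentile."""
    threshold = percentile(scores, 100 * (1 - contamination))
    return [1 if s > threshold else 0 for s in scores]

scores = [float(i) for i in range(238)]  # hypothetical distinct scores, higher = more anomalous
for contamination in (0.05, 0.025):      # iforest default vs. fraction=0.025 for svm
    labels = label_by_contamination(scores, contamination)
    print(contamination, sum(labels))    # 0.05 -> 12 rows, 0.025 -> 6 rows
```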


In [12]:
models()

| ID | Name | Reference |
|----|------|-----------|
| abod | Angle-base Outlier Detection | pyod.models.abod.ABOD |
| cluster | Clustering-Based Local Outlier | pycaret.internal.patches.pyod.CBLOFForceToDouble |
| cof | Connectivity-Based Local Outlier | pyod.models.cof.COF |
| iforest | Isolation Forest | pyod.models.iforest.IForest |
| histogram | Histogram-based Outlier Detection | pyod.models.hbos.HBOS |
| knn | K-Nearest Neighbors Detector | pyod.models.knn.KNN |
| lof | Local Outlier Factor | pyod.models.lof.LOF |
| svm | One-class SVM detector | pyod.models.ocsvm.OCSVM |
| pca | Principal Component Analysis | pyod.models.pca.PCA |
| mcd | Minimum Covariance Determinant | pyod.models.mcd.MCD |
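
Several of these detectors rest on simple geometric ideas. The `knn` detector is one example. The minimal sketch below assumes the variant that scores each point by the distance to its k-th nearest neighbour; to our knowledge this is PyOD's `KNN` default (`method='largest'`), with alternatives such as the mean distance. Points far from everyone else get large scores. The points and `k=2` are hypothetical.

```python
import math

def knn_scores(points, k=2):
    """Distance from each point to its k-th nearest neighbour (larger = more anomalous)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D points: a tight cluster plus one distant point.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05), (3.0, 3.0)]
scores = knn_scores(points, k=2)
print(max(range(len(points)), key=scores.__getitem__))  # 4
```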


In [13]:
iforest_results = assign_model(iforest)
iforest_results.head()

|   | Date | Open | High | Low | Close | Adj Close | Volume | Cumulative Open | Price Change | Anomaly | Anomaly_Score |
|---|------|------|------|-----|-------|-----------|--------|-----------------|--------------|---------|---------------|
| 0 | 2023-08-09 | 250.869995 | 251.100006 | 241.899994 | 242.190002 | 242.190002 | 101596300 | 49310.160156 | 3.419998 | 0 | -0.014938 |
| 1 | 2022-10-07 | 233.940002 | 234.570007 | 222.020004 | 223.070007 | 223.070007 | 83916800 | 8120.330078 | -5.5 | 0 | -0.018026 |
| 2 | 2023-05-03 | 160.009995 | 165.0 | 159.910004 | 160.610001 | 160.610001 | 119728000 | 33547.96875 | -1.87001 | 0 | -0.016752 |
| 3 | 2023-07-26 | 263.25 | 268.040009 | 261.75 | 264.350006 | 264.350006 | 95856200 | 46729.898438 | -9.130005 | 0 | -0.005435 |
| 4 | 2022-12-09 | 173.839996 | 182.5 | 173.360001 | 179.050003 | 179.050003 | 104872300 | 16947.220703 | 1.639999 | 0 | -0.017662 |
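
`assign_model` returns the training data with two extra columns: `Anomaly` (1 = flagged) and `Anomaly_Score`. Every label in the five rows shown is 0, so the flagged rows have to be pulled out explicitly. Below is a minimal pandas sketch on a hypothetical miniature of `iforest_results`; the column names match the output above, but the values are made up.

```python
import pandas as pd

# Hypothetical stand-in for iforest_results; only the columns needed here.
results = pd.DataFrame({
    "Date": ["day_1", "day_2", "day_3", "day_4", "day_5"],
    "Anomaly": [0, 1, 0, 1, 0],
    "Anomaly_Score": [-0.015, 0.004, -0.018, 0.011, -0.017],
})

# Keep only flagged rows, most anomalous (highest score) first.
flagged = results[results["Anomaly"] == 1].sort_values("Anomaly_Score", ascending=False)
print(flagged["Date"].tolist())  # ['day_4', 'day_2']
```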


In [14]:
plot_model(iforest)

In [15]:
!pip install "pycaret[analysis]"

Collecting shap~=0.44.0 (from pycaret[analysis])
  Downloading shap-0.44.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (24 kB)
Collecting interpret>=0.2.7 (from pycaret[analysis])
  Downloading interpret-0.6.3-py3-none-any.whl.metadata (1.1 kB)
Collecting umap-learn>=0.5.2 (from pycaret[analysis])
  Downloading umap_learn-0.5.6-py3-none-any.whl.metadata (21 kB)
Collecting ydata-profiling>=4.3.1 (from pycaret[analysis])
  Downloading ydata_profiling-4.10.0-py2.py3-none-any.whl.metadata (20 kB)
Collecting explainerdashboard>=0.3.8 (from pycaret[analysis])
  Downloading explainerdashboard-0.4.7-py3-none-any.whl.metadata (3.8 kB)
Collecting fairlearn==0.7.0 (from pycaret[analysis])
  Downloading fairlearn-0.7.0-py3-none-any.whl.metadata (7.3 kB)
Collecting dash-auth (from explainerdashboard>=0.3.8->pycaret[analysis])
  Downloading dash_auth-2.3.0-py3-none-any.whl.metadata (10 kB)
Collecting dash-bootstrap-components>=1 (fro

In [16]:
!pip install umap-learn



In [17]:
unseen_predictions = predict_model(iforest, data=data_unseen)
unseen_predictions.head()

|   | Date_2023-08-09 | Date_2022-10-07 | Date_2023-05-03 | Date_2023-07-26 | Date_2022-12-09 | Date_2023-04-24 | Date_2022-11-01 | Date_2023-07-24 | Date_2023-05-17 | Date_2023-08-11 | ... | Open | High | Low | Close | Adj Close | Volume | Cumulative Open | Price Change | Anomaly | Anomaly_Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | ... | 1.572984 | 1.569472 | 1.561542 | 1.562062 | 1.562062 | -1.405443 | -1.439635 | 1.516498 | 0 | -0.017663 |
| 1 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | ... | 1.203533 | 1.297592 | 1.2432 | 1.190809 | 1.190809 | -1.27974 | -1.379906 | -2.009763 | 0 | -0.018857 |
| 2 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | ... | 0.259542 | 0.26362 | 0.238431 | 0.167711 | 0.167711 | -1.078521 | -1.26014 | -0.330414 | 0 | -0.020626 |
| 3 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | ... | -0.491473 | -0.553471 | -0.52885 | -0.580881 | -0.580881 | -0.941063 | -0.763425 | 0.656494 | 0 | -0.021045 |
| 4 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | ... | -1.514617 | -1.585991 | -1.724539 | -1.745219 | -1.745219 | 1.804952 | -0.557983 | -0.37477 | 0 | -0.015998 |
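
Two details stand out in this output. First, the `Date_*` columns hold the same value, -0.064957, in every unseen row. Second, the training rows in the next cell carry 15.394804 on their own date's column. Both follow from one-hot encoding the unique dates and then z-scoring each column under `normalize=True`.

The minimal sketch below assumes the z-scoring uses the population standard deviation over the 238 training rows, as scikit-learn's `StandardScaler` does. A column containing a single 1 has mean 1/238, so a 0 maps to -1/sqrt(237) and a 1 maps to sqrt(237). Unseen rows fall on dates that never appeared in training, so all their `Date_*` columns are 0. These columns therefore carry no information that separates one unseen row from another.

```python
import math
from statistics import fmean, pstdev

n_train = 238
# One one-hot Date column: 1 for the single training row on that date, 0 elsewhere.
column = [1.0] + [0.0] * (n_train - 1)

mu = fmean(column)       # 1/238
sigma = pstdev(column)   # population std (ddof=0), as StandardScaler uses

z_zero = (0.0 - mu) / sigma   # equals -1/sqrt(237)
z_one = (1.0 - mu) / sigma    # equals sqrt(237)

print(round(z_zero, 6), round(z_one, 6))  # -0.064957 15.394804
print(round(math.sqrt(237), 6))           # 15.394804
```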


In [18]:
data_predictions = predict_model(iforest, data = data)
data_predictions.head()

|   | Date_2023-08-09 | Date_2022-10-07 | Date_2023-05-03 | Date_2023-07-26 | Date_2022-12-09 | Date_2023-04-24 | Date_2022-11-01 | Date_2023-07-24 | Date_2023-05-17 | Date_2023-08-11 | ... | Open | High | Low | Close | Adj Close | Volume | Cumulative Open | Price Change | Anomaly | Anomaly_Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15.394804 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | ... | 0.884414 | 0.789738 | 0.807807 | 0.706858 | 0.706858 | -0.545862 | 1.66494 | 0.458126 | 0 | -0.014938 |
| 1 | -0.064957 | 15.394804 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | ... | 0.530835 | 0.446671 | 0.387019 | 0.305594 | 0.305594 | -0.928936 | -1.292078 | -0.640903 | 0 | -0.018026 |
| 2 | -0.064957 | -0.064957 | 15.394804 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | ... | -1.013175 | -0.997195 | -0.927625 | -1.005231 | -1.005231 | -0.15299 | 0.533373 | -0.193653 | 0 | -0.016752 |
| 3 | -0.064957 | -0.064957 | -0.064957 | 15.394804 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | ... | 1.142967 | 1.141313 | 1.22796 | 1.171921 | 1.171921 | -0.670237 | 1.479704 | -1.088154 | 0 | -0.005435 |
| 4 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | 15.394804 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | -0.064957 | ... | -0.724338 | -0.633998 | -0.642937 | -0.618238 | -0.618238 | -0.474879 | -0.658395 | 0.238813 | 0 | -0.017662 |


In [19]:
save_model(iforest, 'Final IForest Model')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=Memory(location=None),
          steps=[('numerical_imputer',
                  TransformerWrapper(include=['Open', 'High', 'Low', 'Close',
                                              'Adj Close', 'Volume',
                                              'Cumulative Open',
                                              'Price Change'],
                                     transformer=SimpleImputer())),
                 ('categorical_imputer',
                  TransformerWrapper(include=['Date'],
                                     transformer=SimpleImputer(strategy='most_frequent'))),
                 ('onehot_encoding',
                  TransformerWrapper(include=['Date'],
                                     transformer=OneHotEncoder(cols=['Date'],
                                                               handle_missing='return_nan',
                                                               use_cat_names=True))),
                 ('normalize', Transformer

In [32]:
saved_iforest = load_model('Final IForest Model')
saved_iforest

Transformation Pipeline and Model Successfully Loaded
