In [1]:
import pandas as pd
pd.set_option("display.max_columns", None)

# [Quickstart](https://pycaret.gitbook.io/docs/get-started/quickstart)

## [Anomaly Detection](https://pycaret.gitbook.io/docs/get-started/quickstart#anomaly-detection)

PyCaret’s **Anomaly Detection Module** is an unsupervised machine learning module for identifying **rare items**, **events**, or **observations** that raise suspicion because they differ significantly from the majority of the data. Typically, anomalous items correspond to some kind of problem, such as bank fraud, a structural defect, a medical problem, or an error. The module provides several [pre-processing](https://pycaret.gitbook.io/docs/get-started/preprocessing) features that prepare the data for modeling through the [setup](https://pycaret.gitbook.io/docs/get-started/functions#setting-up-environment) function. It has over 10 ready-to-use algorithms and [several plots](https://pycaret.gitbook.io/docs/get-started/functions#plot-model) for analyzing the performance of trained models.

### Setup

This function initializes the training environment and creates the transformation pipeline. It must be called before any other function is executed. It takes only one mandatory parameter, `data`; all other parameters are optional.

In [2]:
from pycaret.datasets import get_data
data = get_data('anomaly')

|   | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 | Col10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.263995 | 0.764929 | 0.138424 | 0.935242 | 0.605867 | 0.51879 | 0.912225 | 0.608234 | 0.723782 | 0.733591 |
| 1 | 0.546092 | 0.653975 | 0.065575 | 0.227772 | 0.845269 | 0.837066 | 0.272379 | 0.331679 | 0.429297 | 0.367422 |
| 2 | 0.336714 | 0.538842 | 0.192801 | 0.553563 | 0.074515 | 0.332993 | 0.365792 | 0.861309 | 0.899017 | 0.0886 |
| 3 | 0.092108 | 0.995017 | 0.014465 | 0.176371 | 0.24153 | 0.514724 | 0.562208 | 0.158963 | 0.073715 | 0.208463 |
| 4 | 0.325261 | 0.805968 | 0.957033 | 0.331665 | 0.307923 | 0.355315 | 0.501899 | 0.558449 | 0.885169 | 0.182754 |


When `setup` is executed, PyCaret automatically infers the data type of every feature based on certain properties. The inference is usually correct, but not always. To handle this, PyCaret displays a prompt asking you to confirm the data types once you execute `setup`. Press Enter if all data types are correct, or type `quit` to exit the setup.

Getting the data types right matters in PyCaret because it automatically performs several type-specific preprocessing tasks that machine learning models depend on.

Alternatively, you can use the `numeric_features` and `categorical_features` parameters of `setup` to pre-define the data types.
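As a minimal sketch of pre-defining the data types, the helper below derives the two lists from the DataFrame's pandas dtypes and passes them to `setup`. The names `split_feature_types` and `run_setup`, the example columns, and the `session_id` value are assumptions made for illustration. The PyCaret import is deferred into the function so the helper can be tried without PyCaret installed.

```python
import pandas as pd


def split_feature_types(df: pd.DataFrame):
    """Split column names into numeric and non-numeric lists using pandas dtypes."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    categorical = [col for col in df.columns if col not in numeric]
    return numeric, categorical


def run_setup(df: pd.DataFrame):
    """Hypothetical wrapper: call setup with explicitly declared feature types."""
    # Imported lazily so that split_feature_types works without PyCaret installed.
    from pycaret.anomaly import setup

    numeric, categorical = split_feature_types(df)
    return setup(
        df,
        numeric_features=numeric,
        categorical_features=categorical,
        session_id=123,  # arbitrary seed, chosen here only for reproducibility
    )


# Tiny illustrative frame; the column names are made up for this example.
example = pd.DataFrame({"amount": [10.5, 3.2], "channel": ["web", "atm"]})
numeric_cols, categorical_cols = split_feature_types(example)
print(numeric_cols, categorical_cols)
```

Deriving the lists from dtypes is only a starting point. Columns such as integer-coded categories usually need to be moved to the categorical list by hand.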

In [3]:
from pycaret.anomaly import *
s = setup(data)

|   | Description | Value |
|---|---|---|
| 0 | session_id | 2788 |
| 1 | Original Data | (1000, 10) |
| 2 | Missing Values | False |
| 3 | Numeric Features | 10 |
| 4 | Categorical Features | 0 |
| 5 | Ordinal Features | False |
| 6 | High Cardinality Features | False |
| 7 | High Cardinality Method | |
| 8 | Transformed Data | (1000, 10) |
| 9 | CPU Jobs | -1 |


### Create Model

This function trains an unsupervised anomaly detection model. All available models can be listed with the `models` function.

In [4]:
iforest = create_model('iforest')
print(iforest)

IForest(behaviour='new', bootstrap=False, contamination=0.05,
    max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=-1,
    random_state=2788, verbose=0)
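
The `contamination=0.05` value in the printed model is the expected share of outliers in the data. As a rough illustration of what such a fraction implies, the sketch below flags the top 5% of a set of made-up scores by thresholding at the 95th percentile. This is a simplified NumPy-only illustration, not PyCaret's or PyOD's actual code. The score distribution, the seed, and the variable names are assumptions.

```python
import numpy as np

# Made-up anomaly scores for 1000 rows; higher is treated as more anomalous.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)

fraction = 0.05  # assumed share of outliers, mirroring contamination=0.05

# Rows whose score exceeds the (1 - fraction) percentile are labelled 1 (outlier).
threshold = np.percentile(scores, 100 * (1 - fraction))
labels = (scores > threshold).astype(int)

print(labels.sum())  # number of rows flagged as outliers
```

With 1000 distinct scores and a 5% fraction, 50 rows end up flagged, so the fraction directly controls how many rows are labelled as anomalies.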


In [5]:
models()

| ID | Name | Reference |
|---|---|---|
| abod | Angle-base Outlier Detection | pyod.models.abod.ABOD |
| cluster | Clustering-Based Local Outlier | pyod.models.cblof.CBLOF |
| cof | Connectivity-Based Local Outlier | pyod.models.cof.COF |
| iforest | Isolation Forest | pyod.models.iforest.IForest |
| histogram | Histogram-based Outlier Detection | pyod.models.hbos.HBOS |
| knn | K-Nearest Neighbors Detector | pyod.models.knn.KNN |
| lof | Local Outlier Factor | pyod.models.lof.LOF |
| svm | One-class SVM detector | pyod.models.ocsvm.OCSVM |
| pca | Principal Component Analysis | pyod.models.pca.PCA |
| mcd | Minimum Covariance Determinant | pyod.models.mcd.MCD |


### Analyze Model

In [6]:
plot_model(iforest, plot='tsne')

In [7]:
plot_model(iforest, plot='umap')

### Assign Model

This function assigns anomaly labels to the dataset for a given model (1 = outlier, 0 = inlier).

In [8]:
result = assign_model(iforest)
result.head()

|   | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 | Col10 | Anomaly | Anomaly_Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.263995 | 0.764929 | 0.138424 | 0.935242 | 0.605867 | 0.51879 | 0.912225 | 0.608234 | 0.723782 | 0.733591 | 0 | -0.023643 |
| 1 | 0.546092 | 0.653975 | 0.065575 | 0.227772 | 0.845269 | 0.837066 | 0.272379 | 0.331679 | 0.429297 | 0.367422 | 0 | -0.070447 |
| 2 | 0.336714 | 0.538842 | 0.192801 | 0.553563 | 0.074515 | 0.332993 | 0.365792 | 0.861309 | 0.899017 | 0.0886 | 0 | -0.002528 |
| 3 | 0.092108 | 0.995017 | 0.014465 | 0.176371 | 0.24153 | 0.514724 | 0.562208 | 0.158963 | 0.073715 | 0.208463 | 1 | 0.049406 |
| 4 | 0.325261 | 0.805968 | 0.957033 | 0.331665 | 0.307923 | 0.355315 | 0.501899 | 0.558449 | 0.885169 | 0.182754 | 0 | -0.016433 |
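
The returned DataFrame can be filtered and sorted like any other pandas DataFrame. The sketch below rebuilds the five rows shown above by hand, keeping only `Col1` and the two added columns, so it runs without PyCaret. The variable names are assumptions. In these five rows the single flagged row is also the one with the highest `Anomaly_Score`.

```python
import pandas as pd

# First five rows of the assign_model output above (subset of columns, typed in by hand).
result_head = pd.DataFrame(
    {
        "Col1": [0.263995, 0.546092, 0.336714, 0.092108, 0.325261],
        "Anomaly": [0, 0, 0, 1, 0],
        "Anomaly_Score": [-0.023643, -0.070447, -0.002528, 0.049406, -0.016433],
    }
)

# Keep only the rows labelled as outliers.
outliers = result_head[result_head["Anomaly"] == 1]

# Rank all rows from highest to lowest score.
ranked = result_head.sort_values("Anomaly_Score", ascending=False)

print(outliers.index.tolist())
print(ranked.index.tolist())
```

The same two expressions should work on the full `result` DataFrame, assuming the column names match those in the output above.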


### Predictions

This function generates anomaly labels for a new/unseen dataset using a trained model.
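
The cell in this section passes the same `data` that the model was trained on. To score rows the model has not seen, one option is to hold some rows out before calling `setup`. The sketch below shows that split. The helper names `split_seen_unseen` and `train_and_score`, the 90/10 ratio, and the seed are assumptions, and the PyCaret calls are deferred into a function so the split itself can be tried without PyCaret installed.

```python
import pandas as pd


def split_seen_unseen(df: pd.DataFrame, frac: float = 0.9, seed: int = 42):
    """Return (seen, unseen): a random sample for training and the remaining rows."""
    seen = df.sample(frac=frac, random_state=seed)
    unseen = df.drop(seen.index)
    return seen, unseen


def train_and_score(seen: pd.DataFrame, unseen: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper: train on `seen`, then label the held-out `unseen` rows."""
    # Imported lazily so that split_seen_unseen works without PyCaret installed.
    from pycaret.anomaly import setup, create_model, predict_model

    # Some PyCaret versions prompt here to confirm the inferred data types.
    setup(seen, session_id=seed_for_setup)
    model = create_model("iforest")
    return predict_model(model, data=unseen)


seed_for_setup = 42  # arbitrary seed for the setup call above

# Small made-up frame to demonstrate the split.
demo = pd.DataFrame({"Col1": [i / 10 for i in range(10)]})
seen, unseen = split_seen_unseen(demo)
print(len(seen), len(unseen))
```

With the `data` frame loaded earlier, the intended usage would be `seen, unseen = split_seen_unseen(data)` followed by `train_and_score(seen, unseen)`.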

In [9]:
predictions = predict_model(iforest, data=data)
predictions.head()

|   | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 | Col10 | Anomaly | Anomaly_Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.263995 | 0.764929 | 0.138424 | 0.935242 | 0.605867 | 0.51879 | 0.912225 | 0.608234 | 0.723782 | 0.733591 | 0 | -0.023643 |
| 1 | 0.546092 | 0.653975 | 0.065575 | 0.227772 | 0.845269 | 0.837066 | 0.272379 | 0.331679 | 0.429297 | 0.367422 | 0 | -0.070447 |
| 2 | 0.336714 | 0.538842 | 0.192801 | 0.553563 | 0.074515 | 0.332993 | 0.365792 | 0.861309 | 0.899017 | 0.0886 | 0 | -0.002528 |
| 3 | 0.092108 | 0.995017 | 0.014465 | 0.176371 | 0.24153 | 0.514724 | 0.562208 | 0.158963 | 0.073715 | 0.208463 | 1 | 0.049406 |
| 4 | 0.325261 | 0.805968 | 0.957033 | 0.331665 | 0.307923 | 0.355315 | 0.501899 | 0.558449 | 0.885169 | 0.182754 | 0 | -0.016433 |


### Save the model

In [10]:
save_model(iforest, 'iforest_pipeline')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[], ml_usecase='regression',
                                       numerical_features=[],
                                       target='UNSUPERVISED_DUMMY_TARGET',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='most frequent',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None...
                 ('fix_perfect', 'passthrough'),
                 ('clean_names', Clean_Colum_Names()),
                 ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'),
                 ('dfs', 'passthrough'), ('pca', 'passthrough'),
                 ['trained_model',
                  IFo

#### To load the model back into the environment:

In [11]:
loaded_model = load_model('iforest_pipeline')
print(loaded_model)

Transformation Pipeline and Model Successfully Loaded
Pipeline(memory=None,
         steps=[('dtypes',
                 DataTypes_Auto_infer(categorical_features=[],
                                      display_types=True, features_todrop=[],
                                      id_columns=[], ml_usecase='regression',
                                      numerical_features=[],
                                      target='UNSUPERVISED_DUMMY_TARGET',
                                      time_features=[])),
                ('imputer',
                 Simple_Imputer(categorical_strategy='most frequent',
                                fill_value_categorical=None,
                                fill_value_numerical=None...
                ('fix_perfect', 'passthrough'),
                ('clean_names', Clean_Colum_Names()),
                ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'),
                ('dfs', 'passthrough'), ('pca', 'passthrough'),
                ['