# **Smarter anomaly detection** - Training an anomaly detection model
*Part 2 - Model training*

## Initialization
---
This repository is structured as follows:

```sh
. smarter-anomaly-detection
|
├── data/
|   ├── interim                          # Temporary intermediate data are stored here
|   ├── processed                        # Finalized datasets ready to be moved to Amazon S3
|   └── raw                              # Immutable original data are stored here
|
└── notebooks/
    ├── 1_data_preparation.ipynb
    ├── 2_model_training.ipynb           <<< THIS NOTEBOOK <<<
    └── 3_model_evaluation.ipynb
```

### Notebook configuration update

In [None]:
!python -m pip install --upgrade pip
!pip install --quiet --upgrade sagemaker tqdm lookoutequipment

### Imports

In [1]:
import config
import os
import pandas as pd
import sagemaker

from datetime import datetime

# SDK / toolbox for managing Lookout for Equipment API calls:
import lookoutequipment as lookout

In [2]:
PROCESSED_DATA = os.path.join('..', 'data', 'processed')
TRAIN_DATA     = os.path.join(PROCESSED_DATA, 'train-data')

ROLE_ARN        = sagemaker.get_execution_role()
DATASET_NAME    = config.DATASET_NAME
MODEL_NAME      = config.MODEL_NAME
BUCKET          = config.BUCKET
PREFIX_TRAINING = config.PREFIX_TRAINING
PREFIX_LABEL    = config.PREFIX_LABEL

## Data ingestion
---

In [3]:
lookout_dataset = lookout.LookoutEquipmentDataset(
    dataset_name=DATASET_NAME,
    component_root_dir=f's3://{BUCKET}/{PREFIX_TRAINING}',
    access_role_arn=ROLE_ARN
)
lookout_dataset.create()
response = lookout_dataset.ingest_data(BUCKET, PREFIX_TRAINING)

Dataset "water-pump" does not exist, creating it...



The following cell monitors the ingestion process. The `poll_data_ingestion()` method wraps the [**DescribeDataIngestionJob**](https://docs.aws.amazon.com/lookout-for-equipment/latest/ug/API_DescribeDataIngestionJob.html) API and calls it every 60 seconds until the job reaches a final status:

In [4]:
lookout_dataset.poll_data_ingestion(sleep_time=60)

2022-04-25 11:52:30 | Data ingestion: IN_PROGRESS
2022-04-25 11:53:30 | Data ingestion: IN_PROGRESS
2022-04-25 11:54:30 | Data ingestion: IN_PROGRESS
2022-04-25 11:55:30 | Data ingestion: SUCCESS
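
For readers curious about what such a polling helper does, here is a minimal sketch of the pattern. It is not the `lookoutequipment` SDK's actual implementation: `poll_status` and its parameters are hypothetical names, and the set of terminal statuses (`SUCCESS`, `FAILED`) is an assumption based on the statuses reported by the service.

```python
import time
from datetime import datetime

def poll_status(describe, sleep_time=60, terminal=('SUCCESS', 'FAILED'), sleep=time.sleep):
    """Call `describe()` repeatedly until its 'Status' is terminal.

    `describe` is any zero-argument callable returning a dict with a
    'Status' key (hypothetical helper, not part of the SDK).
    """
    while True:
        response = describe()
        status = response['Status']
        print(f"{datetime.now():%Y-%m-%d %H:%M:%S} | Data ingestion: {status}")
        if status in terminal:
            return response
        sleep(sleep_time)

# Possible usage with boto3 (assumes `response` holds the output of the
# ingestion request and therefore contains a 'JobId'):
#
#   import boto3
#   client = boto3.client('lookoutequipment')
#   job_id = response['JobId']
#   final = poll_status(lambda: client.describe_data_ingestion_job(JobId=job_id))
```

Passing the describe call as a callable keeps the loop independent of AWS, so it can be exercised with a stub before pointing it at a real job.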


If any issue arises, you can inspect the API response, which is available as a JSON-like dictionary:

In [5]:
lookout_dataset.ingestion_job_response

{'JobId': '04902d874a5ddb422f0d79a56d875b9f',
 'DatasetArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:dataset/water-pump/811910d0-4897-42b9-b35f-03cd0649a641',
 'IngestionInputConfiguration': {'S3InputConfiguration': {'Bucket': 'lookout-equipment-poc',
   'Prefix': 'smarter-ad/training-data/'}},
 'RoleArn': 'arn:aws:iam::038552646228:role/service-role/AmazonSageMaker-ExecutionRole-20210903T075832',
 'CreatedAt': datetime.datetime(2022, 4, 25, 11, 51, 27, 78000, tzinfo=tzlocal()),
 'Status': 'SUCCESS',
 'ResponseMetadata': {'RequestId': 'a6bb1563-6361-481f-926a-bd3668332417',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'a6bb1563-6361-481f-926a-bd3668332417',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '1063',
   'date': 'Mon, 25 Apr 2022 11:55:30 GMT'},
  'RetryAttempts': 0}}
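
When troubleshooting, only a few fields of this response usually matter. The sketch below pulls them out of the dictionary. `summarize_ingestion` is a hypothetical helper, and the `FailedReason` key is assumed to be present only when a job fails, so it is read defensively.

```python
def summarize_ingestion(job):
    """Extract the fields most useful for troubleshooting (hypothetical helper)."""
    summary = {
        'JobId': job.get('JobId'),
        'Status': job.get('Status'),
        'HTTPStatusCode': job.get('ResponseMetadata', {}).get('HTTPStatusCode'),
    }
    # Assumption: a failed job carries a 'FailedReason' entry.
    if job.get('Status') == 'FAILED':
        summary['FailedReason'] = job.get('FailedReason', 'not provided')
    return summary

# Trimmed copy of the response shown above:
sample = {
    'JobId': '04902d874a5ddb422f0d79a56d875b9f',
    'Status': 'SUCCESS',
    'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0},
}
print(summarize_ingestion(sample))
```

In the notebook, the same helper could be applied to `lookout_dataset.ingestion_job_response`.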

## Model training
---

In [6]:
# Configuring time ranges:
training_start   = pd.to_datetime('2016-07-28 00:04:00')
training_end     = pd.to_datetime('2017-03-31 23:59:00')
evaluation_start = pd.to_datetime('2017-04-01 00:04:00')
evaluation_end   = pd.to_datetime('2018-08-31 23:59:00')

print(f'  Training period | from {training_start} to {training_end}')
print(f'Evaluation period | from {evaluation_start} to {evaluation_end}')

  Training period | from 2016-07-28 00:04:00 to 2017-03-31 23:59:00
Evaluation period | from 2017-04-01 00:04:00 to 2018-08-31 23:59:00
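
Before launching a training job, it can be worth checking that the two periods are well ordered and do not overlap. The following is a small sketch of such a check. `check_periods` is a hypothetical helper, and it uses the standard library `datetime` to stay self-contained (pandas timestamps compare the same way).

```python
from datetime import datetime

def check_periods(training_start, training_end, evaluation_start, evaluation_end):
    """Validate ordering of the periods and return their durations."""
    if not training_start < training_end:
        raise ValueError('Training period must start before it ends')
    if not evaluation_start < evaluation_end:
        raise ValueError('Evaluation period must start before it ends')
    if not training_end < evaluation_start:
        raise ValueError('Evaluation period must begin after training ends')
    return training_end - training_start, evaluation_end - evaluation_start

# Same boundaries as in the cell above:
train_span, eval_span = check_periods(
    datetime.fromisoformat('2016-07-28 00:04:00'),
    datetime.fromisoformat('2017-03-31 23:59:00'),
    datetime.fromisoformat('2017-04-01 00:04:00'),
    datetime.fromisoformat('2018-08-31 23:59:00'),
)
print(f'  Training span | {train_span.days} days')
print(f'Evaluation span | {eval_span.days} days')
```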


In [7]:
lookout_model = lookout.LookoutEquipmentModel(model_name=MODEL_NAME, dataset_name=DATASET_NAME)
lookout_model.set_time_periods(evaluation_start, evaluation_end, training_start, training_end)
lookout_model.set_label_data(bucket=BUCKET, prefix=PREFIX_LABEL, access_role_arn=ROLE_ARN)
lookout_model.train()

{'ModelArn': 'arn:aws:lookoutequipment:eu-west-1:038552646228:model/water-pump-model/ca3e04ec-ee1f-43ab-a5d5-fb6ec9664798',
 'Status': 'IN_PROGRESS',
 'ResponseMetadata': {'RequestId': 'c058c670-1e51-4ca8-8df1-700a12549739',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'c058c670-1e51-4ca8-8df1-700a12549739',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '145',
   'date': 'Mon, 25 Apr 2022 11:55:31 GMT'},
  'RetryAttempts': 0}}
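
The response above has the shape returned by the service's CreateModel API. As a rough illustration of what the training request contains, the sketch below assembles the equivalent arguments for boto3's `create_model` call. `build_create_model_request` is a hypothetical helper, the role ARN, bucket and prefix are placeholders, and the SDK may build its request differently.

```python
import uuid
from datetime import datetime

def build_create_model_request(model_name, dataset_name, role_arn, bucket, label_prefix,
                               training_start, training_end,
                               evaluation_start, evaluation_end):
    """Assemble keyword arguments for a CreateModel request (illustrative only)."""
    return {
        'ModelName': model_name,
        'DatasetName': dataset_name,
        'RoleArn': role_arn,
        'ClientToken': uuid.uuid4().hex,  # idempotency token
        'TrainingDataStartTime': training_start,
        'TrainingDataEndTime': training_end,
        'EvaluationDataStartTime': evaluation_start,
        'EvaluationDataEndTime': evaluation_end,
        'LabelsInputConfiguration': {
            'S3InputConfiguration': {'Bucket': bucket, 'Prefix': label_prefix}
        },
    }

request = build_create_model_request(
    model_name='water-pump-model',
    dataset_name='water-pump',
    role_arn='arn:aws:iam::123456789012:role/example-role',  # placeholder
    bucket='example-bucket',                                  # placeholder
    label_prefix='example/label-data/',                       # placeholder
    training_start=datetime(2016, 7, 28, 0, 4),
    training_end=datetime(2017, 3, 31, 23, 59),
    evaluation_start=datetime(2017, 4, 1, 0, 4),
    evaluation_end=datetime(2018, 8, 31, 23, 59),
)

# To submit it (requires AWS credentials and an ingested dataset):
#   import boto3
#   boto3.client('lookoutequipment').create_model(**request)
```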

Training is now in progress. Use the following cell to monitor it. **This model should take around 30-45 minutes to train:**

In [8]:
lookout_model.poll_model_training(sleep_time=60)

2022-04-25 11:56:31 | Model training: IN_PROGRESS
2022-04-25 11:57:31 | Model training: IN_PROGRESS
2022-04-25 11:58:31 | Model training: IN_PROGRESS
2022-04-25 11:59:31 | Model training: IN_PROGRESS
2022-04-25 12:00:32 | Model training: IN_PROGRESS
2022-04-25 12:01:32 | Model training: IN_PROGRESS
2022-04-25 12:02:32 | Model training: IN_PROGRESS
2022-04-25 12:03:32 | Model training: IN_PROGRESS
2022-04-25 12:04:32 | Model training: IN_PROGRESS
2022-04-25 12:05:32 | Model training: IN_PROGRESS
2022-04-25 12:06:32 | Model training: IN_PROGRESS
2022-04-25 12:07:33 | Model training: IN_PROGRESS
2022-04-25 12:08:33 | Model training: IN_PROGRESS
2022-04-25 12:09:33 | Model training: IN_PROGRESS
2022-04-25 12:10:33 | Model training: IN_PROGRESS
2022-04-25 12:11:33 | Model training: IN_PROGRESS
2022-04-25 12:12:33 | Model training: IN_PROGRESS
2022-04-25 12:13:33 | Model training: IN_PROGRESS
2022-04-25 12:14:33 | Model training: IN_PROGRESS
2022-04-25 12:15:34 | Model training: IN_PROGRESS


## Conclusion
---
In this notebook, you ingested the data you prepared previously and trained an anomaly detection model with Amazon Lookout for Equipment.

In the next notebook of this series, you will dive into the results of this trained model.