# **Smarter anomaly detection** - Training an anomaly detection model
*Part 2 - Model training*

## Initialization
---
This repository is structured as follows:

```sh
. smarter-anomaly-detection
|
├── data/
|   ├── interim                          # Temporary intermediate data are stored here
|   ├── processed                        # Finalized datasets ready to be moved to Amazon S3
|   └── raw                              # Immutable original data are stored here
|
└── notebooks/
    ├── 1_data_preparation.ipynb
    ├── 2_model_training.ipynb           <<< THIS NOTEBOOK <<<
    └── 3_model_evaluation.ipynb
```

### Notebook configuration update

In [None]:
!python -m pip install --upgrade pip
!pip install --quiet --upgrade sagemaker tqdm lookoutequipment

### Imports

In [None]:
import synthetic_config as config
import os
import pandas as pd
import sagemaker

from datetime import datetime

# SDK / toolbox for managing Lookout for Equipment API calls:
import lookoutequipment as lookout

In [None]:
PROCESSED_DATA = os.path.join('..', 'data', 'processed')
TRAIN_DATA     = os.path.join(PROCESSED_DATA, 'train-data')

ROLE_ARN        = sagemaker.get_execution_role()
DATASET_NAME    = config.DATASET_NAME
MODEL_NAME      = config.MODEL_NAME
BUCKET          = config.BUCKET
PREFIX_TRAINING = config.PREFIX_TRAINING
PREFIX_LABEL    = config.PREFIX_LABEL

## Data ingestion
---

In [None]:
lookout_dataset = lookout.LookoutEquipmentDataset(
    dataset_name=DATASET_NAME,
    component_root_dir=f's3://{BUCKET}/{PREFIX_TRAINING}',
    access_role_arn=ROLE_ARN
)
lookout_dataset.create()
response = lookout_dataset.ingest_data(BUCKET, PREFIX_TRAINING)

Use the following cell to monitor the ingestion process. The method it calls wraps the [**DescribeDataIngestionJob**](https://docs.aws.amazon.com/lookout-for-equipment/latest/ug/API_DescribeDataIngestionJob.html) API and invokes it every 15 seconds, as set by `sleep_time`:

In [None]:
lookout_dataset.poll_data_ingestion(sleep_time=15)
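
The polling pattern used here can be sketched as a small loop. This is a minimal illustration, not the toolbox's actual implementation: `poll_until_done`, its `describe` argument and `max_polls` are hypothetical names, and the only assumption about the response is that it carries a `Status` field, as documented for DescribeDataIngestionJob.

```python
import time
from typing import Callable

# Statuses after which polling can stop (assumed from the API documentation).
TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def poll_until_done(describe: Callable[[], dict],
                    sleep_time: float = 15,
                    max_polls: int = 240) -> dict:
    """Call `describe` every `sleep_time` seconds until a terminal status is returned.

    `describe` is any zero-argument callable returning a dict with a 'Status' key.
    """
    for _ in range(max_polls):
        response = describe()
        status = response["Status"]
        print(f"Ingestion status: {status}")
        if status in TERMINAL_STATUSES:
            return response
        time.sleep(sleep_time)
    raise TimeoutError(f"Job still running after {max_polls} polls")
```

With boto3, `describe` could be something like `lambda: client.describe_data_ingestion_job(JobId=job_id)`, where `client = boto3.client('lookoutequipment')` and `job_id` is the identifier of the ingestion job you started. Passing the call in as a function keeps the loop easy to test without AWS credentials.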

If any issue arises, you can inspect the API response, which is available as a JSON document:

In [None]:
lookout_dataset.ingestion_job_response
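
When the response is long, a short helper can pull out the fields that matter for troubleshooting. The sketch below assumes the response follows the DescribeDataIngestionJob shape, with a `Status` field and, on failure, a `FailedReason` field. The `summarize_ingestion` helper and the sample dictionary are hypothetical and only serve as an illustration.

```python
def summarize_ingestion(response: dict) -> str:
    """Return a one-line summary of an ingestion job response."""
    status = response.get("Status", "UNKNOWN")
    if status == "FAILED":
        # FailedReason is only expected when the job has failed.
        reason = response.get("FailedReason", "no reason reported")
        return f"Ingestion failed: {reason}"
    return f"Ingestion status: {status}"


# Hypothetical response, for illustration only (not real API output):
sample_response = {"Status": "FAILED", "FailedReason": "example failure reason"}
print(summarize_ingestion(sample_response))
```

In the notebook, you could call `summarize_ingestion(lookout_dataset.ingestion_job_response)` once polling has finished.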

## Model training
---

In [None]:
# Configuring time ranges:
training_start   = pd.to_datetime('2021-01-01 00:00:00')
training_end     = pd.to_datetime('2021-05-31 23:55:00')
evaluation_start = pd.to_datetime('2021-06-01 00:00:00')
evaluation_end   = pd.to_datetime('2021-12-31 23:55:00')

print(f'  Training period | from {training_start} to {training_end}')
print(f'Evaluation period | from {evaluation_start} to {evaluation_end}')
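
Before launching a training job, it can help to sanity-check these ranges. The sketch below assumes you want each period to be well ordered and the evaluation period to start after the training period ends, as is the case with the values above. `check_time_periods` is a hypothetical helper, not part of the `lookoutequipment` toolbox. It uses standard `datetime` objects, and `pandas` timestamps compare the same way.

```python
from datetime import datetime, timedelta
from typing import Tuple


def check_time_periods(training_start: datetime, training_end: datetime,
                       evaluation_start: datetime, evaluation_end: datetime
                       ) -> Tuple[timedelta, timedelta]:
    """Validate the ordering of both periods and return their durations."""
    if training_start >= training_end:
        raise ValueError("Training period must start before it ends")
    if evaluation_start >= evaluation_end:
        raise ValueError("Evaluation period must start before it ends")
    if evaluation_start <= training_end:
        raise ValueError("Evaluation period must start after the training period ends")
    return training_end - training_start, evaluation_end - evaluation_start


# Same boundaries as in the cell above:
train_span, eval_span = check_time_periods(
    datetime(2021, 1, 1, 0, 0), datetime(2021, 5, 31, 23, 55),
    datetime(2021, 6, 1, 0, 0), datetime(2021, 12, 31, 23, 55),
)
print(f'Training span: {train_span} | Evaluation span: {eval_span}')
```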

In [None]:
lookout_model = lookout.LookoutEquipmentModel(model_name=MODEL_NAME, dataset_name=DATASET_NAME)
lookout_model.set_time_periods(evaluation_start, evaluation_end, training_start, training_end)
lookout_model.set_label_data(bucket=BUCKET, prefix=PREFIX_LABEL, access_role_arn=ROLE_ARN)
lookout_model.train()

A training job is now in progress. Use the following cell to track its progress. **Training this model should take around 30-45 minutes:**

In [None]:
lookout_model.poll_model_training(sleep_time=60)

## Conclusion
---
In this notebook, you ingested the data you prepared previously and trained an anomaly detection model with Amazon Lookout for Equipment.

In the next notebook of this series, you will dive into the results of this trained model.