# Multivariate Anomaly Detection Demo Notebook

## Contents

1. [Introduction](#intro)
2. [Prerequisites](#pre)
3. [Train a Model](#train)
4. [List Models](#list)


## 1. Introduction <a class="anchor" id="intro"></a>
This notebook shows how to use [Multivariate Anomaly Detection](https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/overview-multivariate) in the Anomaly Detector service. Follow the steps below to try it out. If you have any questions, you can [join the Teams group](https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbRxSkyhztUNZCtaivu8nmhd1UQ1VFRDA0V1dUMDJRMFhOTzFHQ1lDTVozWi4u) or email us at AnomalyDetector@microsoft.com.

## 2. Prerequisites <a class="anchor" id="pre"></a>


* [Create an Azure subscription](https://azure.microsoft.com/free/cognitive-services) if you don't have one.
* [Create an Anomaly Detector resource](https://ms.portal.azure.com/#create/Microsoft.CognitiveServicesAnomalyDetector) and get your `endpoint` and `key`. You'll use these later.
* (**optional**) [Install Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli), a helpful tool for managing your Azure resources. You can use Azure CLI to retrieve credentials without pasting them as plain text.
* (**optional**) Log in with Azure CLI: `az login`

### Export the following environment variables

* `storage_connection_string`
* `anomaly_detector_endpoint`
* `anomaly_detector_key`


* **Install** the Anomaly Detector SDK and storage packages using the following commands ⬇️, then **import** the packages.

In [None]:
# Install required packages. Use the following commands to install the anomaly detector SDK and required packages.
# ! pip install --upgrade azure-ai-anomalydetector
# ! pip install azure-storage-blob
# ! pip install azure-mgmt-storage
# ! pip install python-dotenv

In [None]:
# Install optional packages to see interactive visualization in this Jupyter notebook.
# ! pip install plotly==5.5.0
# ! pip install "notebook>=5.3"
# ! pip install "ipywidgets>=7.5"
# ! pip install pandas

In [None]:
# Import related packages:

from azure.ai.anomalydetector import AnomalyDetectorClient
from azure.ai.anomalydetector.models import DetectionRequest, ModelInfo, DetectionStatus
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import HttpResponseError
from azure.storage.blob import BlobClient, BlobServiceClient, generate_blob_sas, BlobSasPermissions
from datetime import datetime, timedelta
from dotenv import load_dotenv
from pathlib import Path
import os
import pandas as pd
import tempfile
import time
import zipfile

In [None]:
# Load environment variables

env_path = Path('..') / '.env'
load_dotenv(dotenv_path=env_path)

storage_connection_string = os.environ.get('storage_connection_string')
anomaly_detector_endpoint = os.environ.get('anomaly_detector_endpoint')
anomaly_detector_key = os.environ.get('anomaly_detector_key')

temp_dir = tempfile.gettempdir()
blob_name = "training_mvad.zip"
model_id = ''

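# Join with the path separator; plain concatenation would produce e.g. "/tmptraining_mvad.zip"
zip_filename = os.path.join(temp_dir, blob_name)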
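
`os.environ.get` returns `None` for a variable that is not set, so a missing value only surfaces later as a harder-to-read client error. A minimal sketch of an up-front check follows. The helper name `require_env` is hypothetical and is not part of this notebook or the SDK; the variable names are the three read in the cell above.

```python
import os

# Names read by the notebook's "Load environment variables" cell.
REQUIRED_VARS = (
    "storage_connection_string",
    "anomaly_detector_endpoint",
    "anomaly_detector_key",
)


def require_env(names, environ=None):
    """Return {name: value} for every name, or raise listing the missing ones."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty values the same way.
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Calling `require_env(REQUIRED_VARS)` right after `load_dotenv(...)` would stop the notebook early with a message that names the missing settings.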

### Dataset

We will use a simulated dataset **([multivariate_sample_data.csv](https://github.com/Azure-Samples/AnomalyDetector/blob/master/ipython-notebook/SDK%20Sample/multivariate_sample_data.csv))** from the GitHub repository. The dataset contains five variables, each representing a different measurement from a piece of equipment.

If you'd like to run this notebook on your own dataset, complete the following steps first (🎬 [video instruction](https://msit.microsoftstream.com/video/afa00840-98dc-ae72-fad1-f1ec0fe830c1)/[video backup](https://github.com/Azure-Samples/AnomalyDetector/blob/master/ipython-notebook/media/How%20to%20generate%20a%20SAS.mp4)):
1. (optional) Split your full CSV file into individual CSV files, so that each file contains the data for one variable.
1. Compress your local CSV files (one metric per file); see the [input data schema](https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/concepts/best-practices-multivariate#input-data-schema).
1. Upload the compressed file to Azure Blob Storage.
1. Generate a `SAS URL` for your compressed file.
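
Steps 1 and 2 above can be sketched with the standard library alone. This is an illustration under assumptions, not the notebook's own method: it assumes a wide CSV with a `timestamp` column plus one column per variable, and it writes the same `timestamp`/`value` layout that the notebook's `zip_data` helper produces. The function name `zip_per_variable` is hypothetical.

```python
import csv
import io
import zipfile


def zip_per_variable(wide_csv_text, timestamp_column="timestamp"):
    """Split a wide CSV into one two-column CSV per variable and zip them in memory."""
    reader = csv.DictReader(io.StringIO(wide_csv_text))
    rows = list(reader)
    # Every column except the timestamp is treated as one variable.
    variables = [name for name in (reader.fieldnames or []) if name != timestamp_column]

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for variable in variables:
            out = io.StringIO()
            writer = csv.writer(out, lineterminator="\n")
            writer.writerow(["timestamp", "value"])
            for row in rows:
                writer.writerow([row[timestamp_column], row[variable]])
            # One archive member per variable, named after the column.
            archive.writestr(variable + ".csv", out.getvalue())
    return buffer.getvalue()
```

The returned bytes can be written to a local `.zip` file and uploaded as in step 3.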

In [None]:
# data visualization
df = pd.read_csv("./training/sensors.csv", index_col="timestamp")
df

Next, let's draw an interactive plot. You can zoom in or out by clicking 'autoscale', select an area, or select a variable for further investigation.

### Sample code to generate SAS (for reference only)


In [None]:
# Load Azure Anomaly Detector helper functions

class MultivariateSample:

    def __init__(self, anomaly_detector_endpoint=None, anomaly_detector_key=None, model_id=None, connection_string=None, container=None, blob_name=None):
        self.blob_name = blob_name
        self.container = container
        self.connection_string = connection_string
        self.model_id = model_id
        self.anomaly_detector_endpoint = anomaly_detector_endpoint
        self.anomaly_detector_key = anomaly_detector_key

        # Create an Anomaly Detector client

        # <client>
        self.ad_client = AnomalyDetectorClient(AzureKeyCredential(self.anomaly_detector_key), self.anomaly_detector_endpoint)
        # </client>        

    def zip_data(self, df, zip_filename):
        # Write one CSV per variable (index plus a "value" column) and zip them
        with zipfile.ZipFile(zip_filename, "w", zipfile.ZIP_DEFLATED) as zip_file:
            for variable in df.columns:
                csv_path = os.path.join(temp_dir, variable + ".csv")
                individual_df = pd.DataFrame(df[variable].values, index=df.index, columns=["value"])
                individual_df.to_csv(csv_path, index=True)
                zip_file.write(csv_path, arcname=variable + ".csv")

    def upload_blob(self, filename):
        blob_client = BlobClient.from_connection_string(self.connection_string, container_name=self.container, blob_name=self.blob_name)
        with open(filename, "rb") as f:
            blob_client.upload_blob(f, overwrite=True)

    def train_model(self, start_time, end_time, sliding_window):
        data_source = self.generate_data_source_sas(self.container, self.blob_name)
        data_feed = ModelInfo(start_time=start_time, end_time=end_time, source=data_source, sliding_window=sliding_window)
        # The `cls` callback returns the raw pipeline arguments; the last one holds the response headers
        response_header = self.ad_client.train_multivariate_model(data_feed, cls=lambda *args: list(args))[-1]
        trained_model_id = response_header['Location'].split("/")[-1]
        self.model_id = trained_model_id  # remember the model so detect() can use it
        print(f"model id: {trained_model_id}")

        model_status = self.ad_client.get_multivariate_model(trained_model_id).model_info.status
        print(f"model status: {model_status}")

        # Poll every 10 seconds until training finishes
        while model_status != "READY" and model_status != "FAILED":
            time.sleep(10)
            model_status = self.ad_client.get_multivariate_model(trained_model_id).model_info.status
            print(f"model status: {model_status}")

    def detect(self, start_time, end_time):
        # Detect anomaly in the same data source (but a different interval)
        try:
            data_source = self.generate_data_source_sas(self.container, self.blob_name)
            detection_req = DetectionRequest(source=data_source, start_time=start_time, end_time=end_time)
            response_header = self.ad_client.detect_anomaly(self.model_id, detection_req,
                                                            cls=lambda *args: [args[i] for i in range(len(args))])[-1]
            result_id = response_header['Location'].split("/")[-1]

            # Get results (may need a few seconds)
            r = self.ad_client.get_detection_result(result_id)
            print("Get detection result...(it may take a few seconds)")

            while r.summary.status != DetectionStatus.READY and r.summary.status != DetectionStatus.FAILED:
                r = self.ad_client.get_detection_result(result_id)
                print("waiting for anomaly detection result...")
                time.sleep(1)

            if r.summary.status == DetectionStatus.FAILED:
                print("Detection failed.")
                if r.summary.errors:
                    for error in r.summary.errors:
                        print("Error code: {}. Message: {}".format(error.code, error.message))
                else:
                    print("None")
                return None

        except HttpResponseError as e:
            print('Error code: {}'.format(e.error.code), 'Error message: {}'.format(e.error.message))
            return None

        return r

    def list_models(self):
        model_list = list(self.ad_client.list_multivariate_model(skip=0, top=100))
        model_summary = pd.DataFrame([{"model_id": m.model_id, "status": m.status} for m in model_list[:50]])
        display(model_summary)

        # Show the full details of the first model, if any models exist
        if model_list:
            display(vars(model_list[0]))

    def generate_data_source_sas(self, container, blob_name):
        BLOB_SAS_TEMPLATE = "{blob_endpoint}{container_name}/{blob_name}?{sas_token}"

        blob_service_client = BlobServiceClient.from_connection_string(conn_str=self.connection_string)
        sas_token = generate_blob_sas(account_name=blob_service_client.account_name,
                                    container_name=container, blob_name=blob_name,
                                    account_key=blob_service_client.credential.account_key,
                                    permission=BlobSasPermissions(read=True),
                                    expiry=datetime.utcnow() + timedelta(days=1))
        blob_sas = BLOB_SAS_TEMPLATE.format(blob_endpoint=blob_service_client.primary_endpoint,
                                            container_name=container, blob_name=blob_name, sas_token=sas_token)
        return blob_sas

In [None]:

sample = MultivariateSample(anomaly_detector_endpoint, anomaly_detector_key, model_id, storage_connection_string, 'data', blob_name)

if not df.empty:
    sample.zip_data(df, zip_filename)
    sample.upload_blob(zip_filename)

## 3. Train a Model <a class="anchor" id="train"></a>

Before you train a model, make sure the `subscription key` and `endpoint` of your Anomaly Detector service are set. The `MultivariateSample` helper above uses them to create the Anomaly Detector client.

- Specify the timespan of training data using `start_time` and `end_time`.

In [None]:
start_time = df.index[0]
end_time = df.index[-1]
sliding_window = 50

sample.train_model(start_time, end_time, sliding_window)

# sliding_window is a model hyperparameter that controls how much data is used as input:
# - If the data has a natural period (e.g. daily or weekly), fit the sliding window to that pattern.
# - If the data changes rapidly, a longer sliding window helps, ideally one that covers a full pattern.
# - If the data doesn't change much, a smaller window is fine and reduces processing time.
# - sliding_window also sets the minimum data length: a sliding window of 28 means you must provide at least 28 data points.
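
To make the minimum-length rule concrete, the sketch below counts how many complete windows fit in a series of a given length. It is a generic sliding-window illustration, not a description of the service's internal logic, and `count_windows` is a hypothetical helper.

```python
def count_windows(n_points, sliding_window):
    """Number of complete windows of length `sliding_window` in `n_points` samples."""
    if sliding_window <= 0:
        raise ValueError("sliding_window must be positive")
    # Each window starts one point after the previous one.
    return max(0, n_points - sliding_window + 1)


print(count_windows(27, 28))   # 0: too short for even one window
print(count_windows(28, 28))   # 1: exactly one window
print(count_windows(100, 28))  # 73
```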


### Get Model Status
☕️ Training might take anywhere from a few minutes to a few hours, depending on the data size (this sample finishes within 3 minutes). Grab a cup of coffee and come back once the status is **READY**.
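
The polling loop in `train_model` has no upper bound, so a job that never reaches a final state would keep the cell running indefinitely. A minimal sketch of a bounded wait follows. `wait_for_status` is a hypothetical helper, and the default timeout of one hour is an assumption chosen for illustration, not a service limit.

```python
import time


def wait_for_status(get_status, terminal=("READY", "FAILED"), interval=10.0,
                    timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal value or the timeout passes."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        sleep(interval)
```

In this notebook, `get_status` could be a lambda that wraps `sample.ad_client.get_multivariate_model(model_id).model_info.status`, using the model ID printed by `train_model`.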

## 4. List Models <a class="anchor" id="list"></a>
List models that have been trained previously.

In [None]:
sample.list_models()