# Real-time anomaly detection using Timeseries Insights API 

### Summary

This notebook demonstrates how to use Google Cloud's [Timeseries Insights API](https://cloud.google.com/timeseries-insights) for real-time anomaly detection in time series data. The tutorial covers creating an API dataset, querying it for anomalies, appending new data, and deleting unwanted API datasets.

### Prerequisites

- A time series dataset with the attributes on which you want to detect anomalies

### Objectives

 - Set up resources
 - Create the dataset JSON file
 - Create and list API datasets using the JSON file in a Cloud Storage bucket
 - Query an API dataset for anomalies
 - Append a new data stream to an existing dataset
 - Consume the results for further analysis
 - Delete unwanted API datasets
 - Clean up

#### Setup resources

In [2]:
# Import dependencies

from oauth2client.client import GoogleCredentials
from google.cloud import bigquery
from google.oauth2 import service_account
from google.cloud import storage
import pandas as pd
import json
import requests
import datetime
from datetime import date
import time

In [3]:
# Setup variables 

project = !gcloud config get-value project
PROJECT_ID = project[0]
REGION = "us-central1"
BUCKET_NAME = "<YOUR-BUCKET-NAME>"
FILE_PATH = "<local-path-to-your-processed-file>/data-file.json"

print(PROJECT_ID, REGION, BUCKET_NAME)

pnishit-mlai us-central1 <YOUR-BUCKET-NAME>


#### Create and upload dataset file to Cloud Storage bucket
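
The upload cell in this section expects `data-file.json` to exist already. As a rough sketch of how such a file might be produced, the snippet below turns flat records into one JSON event per line. The event layout (`groupId`, `eventTime`, and a `dimensions` list using `stringVal` or `doubleVal`) and the newline-delimited file layout are assumptions to verify against the API reference; `to_event` and `to_json_lines` are hypothetical helper names, and the sample values are made up.

```python
import json
from datetime import datetime, timezone


def to_event(group_id, event_time, record):
    """Build one event dict from a flat record (dimension name -> value).

    Assumed layout: numeric values go to "doubleVal", everything else to
    "stringVal". Check the field names against the API reference.
    """
    dimensions = []
    for name, value in record.items():
        # bool is a subclass of int, so exclude it from the numeric branch.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            dimensions.append({"name": name, "doubleVal": float(value)})
        else:
            dimensions.append({"name": name, "stringVal": str(value)})
    return {
        "groupId": str(group_id),
        "eventTime": event_time.isoformat(),  # timezone-aware ISO 8601 timestamp
        "dimensions": dimensions,
    }


def to_json_lines(events):
    """Serialize events as newline-delimited JSON, one event per line."""
    return "\n".join(json.dumps(event) for event in events)


# Made-up sample record using two of the dimension names from this notebook.
sample_event = to_event(
    group_id=1,
    event_time=datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc),
    record={"measure": "LTTH", "temp": 21.5},
)
sample_file_content = to_json_lines([sample_event])
```

Writing `sample_file_content` to the path in `FILE_PATH` would give a file that the upload cell can copy to the bucket.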

In [None]:
# Create bucket and upload the processed data file

! gsutil mb -l $REGION -c standard gs://$BUCKET_NAME

! gcloud alpha storage cp $FILE_PATH gs://$BUCKET_NAME/

#### Allowlist the API

Because the API is currently in public preview, you need to allowlist the Google Cloud project you are working on. Once the API is generally available (GA), this step is no longer needed and you can skip it.

Click [here](https://services.google.com/fb/forms/timeseries-insights-api-preview-registration/) to allowlist your Google Cloud project. You won't be able to access the API until this step is completed.

#### Authenticate your Google Cloud account

**If you are using Vertex AI Workbench Notebooks**, your environment is already authenticated. Skip this step.

In the Cloud Console, go to the [Create service account key](https://console.cloud.google.com/apis/credentials/serviceaccountkey) page.

1. Click **Create service account**.

2. In the **Service account name** field, enter a name, and click **Create**.

3. In the **Grant this service account access to project** section, click the **Role** drop-down list. Select "Other" from the list, then scroll down and select **Timeseries Insights DataSet Owner**.

4. Click **Create**. A JSON file that contains your key downloads to your local environment.

In [None]:
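# Helper functions

def query_ts(method, endpoint, data, auth_token):
    """Send a request to the Timeseries Insights API and return the parsed JSON response."""
    # json.dumps produces valid JSON (double quotes); str() on a dict does not.
    body = data if isinstance(data, str) else json.dumps(data)
    headers = {"Content-type": "application/json", "Authorization": f"Bearer {auth_token}"}

    if method == "GET":
        resp = requests.get(endpoint, headers=headers)
    elif method == "POST":
        resp = requests.post(endpoint, data=body, headers=headers)
    elif method == "DELETE":
        resp = requests.delete(endpoint, headers=headers)
    else:
        raise ValueError(f"Unsupported HTTP method: {method}")

    return resp.json()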

In [None]:
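# Authorize the service account

key_file = "<path-to-your-service-account-key>.json"  # placeholder: path to the JSON key downloaded above
!gcloud auth activate-service-account --key-file {key_file}
token_array = !gcloud auth print-access-token
token_array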
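
The request cells later in this notebook pass a `ts_endpoint` variable that is not defined in this section. A minimal sketch of one way to build it is shown below, assuming the v1 REST datasets collection lives under `https://timeseriesinsights.googleapis.com/v1/projects/<project>/datasets`. Both the URL shape and the helper name `datasets_endpoint` are assumptions, so confirm the URL against the API reference for your project.

```python
def datasets_endpoint(project_id, api_root="https://timeseriesinsights.googleapis.com/v1"):
    """Return the (assumed) REST URL of the datasets collection for a project."""
    return f"{api_root}/projects/{project_id}/datasets"


# "my-project" is a placeholder; in this notebook you would pass PROJECT_ID.
ts_endpoint = datasets_endpoint("my-project")
```

Keeping the URL construction in one small function makes it easy to switch the API root or project without touching the request cells.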

#### Create and list API dataset

The first step in anomaly detection is to create a dataset from the JSON data file in the Cloud Storage bucket. The data file must be in the bucket because the dataset creation payload requires the path to the file. After you call the dataset create method, it can take a while for the dataset to be created, depending on its size.

You can call the list datasets method to check the status of all datasets in the Timeseries Insights API. Datasets that have loaded correctly have the status `LOADED`, and those still being indexed have the status `LOADING`. Note that a dataset can be queried for anomalies only after indexing is done and its status changes to `LOADED`.

In [None]:
# Create dataset using API

anomaly_dataset_payload = {
    "name": "anomaly-data",
    "ttl": "3000000s",  # Set this only if you plan to append data later. Events older than the ttl are discarded.
    "dataNames": [
        "measure",
        "Humidity",
        "Light",
        "h2",
        "temp",
    ],
    "dataSources": [
        {"uri": f"gs://{BUCKET_NAME}/data-file.json"} 
    ], 
} 

The JSON object above is used to create the dataset in the Timeseries Insights API. The `name` attribute is the name of the API dataset; you can choose any descriptive name. The `ttl` attribute stands for time to live (in seconds) and is used to discard events when new data is appended: it tells the API which older records to drop when building the time series. Records with a timestamp older than the ttl value are discarded. Here the ttl is set to 3 million seconds, which is roughly 35 days.
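
To make the ttl arithmetic concrete, here is a small sketch that converts between days and the `"<seconds>s"` string used in the payload. The helper names `ttl_to_days` and `days_to_ttl` are hypothetical and exist only for illustration.

```python
from datetime import timedelta


def ttl_to_days(ttl):
    """Convert a ttl string such as "3000000s" into a number of days."""
    seconds = int(ttl.rstrip("s"))
    return timedelta(seconds=seconds) / timedelta(days=1)


def days_to_ttl(days):
    """Convert a number of days into a ttl string in whole seconds."""
    return f"{int(timedelta(days=days).total_seconds())}s"


ttl_days = ttl_to_days("3000000s")  # about 34.72 days
ttl_for_35_days = days_to_ttl(35)   # "3024000s"
```

So the value used in this notebook is slightly under 35 days; an exact 35-day window would be `"3024000s"`.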

Next, the `dataNames` attribute lists the dimensions from your dataset that you want to index. You can index all or a subset of the dimensions, but you can only query for anomalies using dimensions that are indexed. In this case, all the dimensions from the dataset are used. Lastly, the `dataSources` attribute contains the Cloud Storage URI of the JSON file that contains your data.

**Note: Once the dataset has been created and indexed from the given file, the file is no longer needed for the API to function. Adding new data to the file won't automatically index it. You can only index new data using the append method.**

In [None]:

res = query_ts(method="POST", endpoint=ts_endpoint, data=anomaly_dataset_payload, auth_token=token_array[0])
res

#### List dataset

After running the command above to create the dataset, you can check its status using the following command.

In [None]:
listdata = query_ts(method="GET", endpoint=ts_endpoint, data="", auth_token=token_array[0])
listdata
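
Since a dataset can only be queried once it is `LOADED`, it can help to pull the status out of the list response and wait for it. The sketch below assumes the list response is a dict with a `datasets` list whose entries carry a full resource `name` and a `state` field. That response shape, the helper names, and the sample response are all assumptions for illustration, so adjust them to what `listdata` actually contains.

```python
import time


def dataset_states(list_response):
    """Map each dataset's short name to its state (assumed response shape)."""
    states = {}
    for ds in list_response.get("datasets", []):
        # Keep only the last path segment of "projects/<p>/datasets/<name>".
        short_name = ds.get("name", "").rsplit("/", 1)[-1]
        states[short_name] = ds.get("state", "UNKNOWN")
    return states


def is_loaded(list_response, dataset_name):
    """Return True if the named dataset reports the LOADED state."""
    return dataset_states(list_response).get(dataset_name) == "LOADED"


def wait_until_loaded(fetch_list, dataset_name, interval_s=30, max_attempts=20):
    """Poll fetch_list() until the dataset is LOADED or attempts run out."""
    for _ in range(max_attempts):
        if is_loaded(fetch_list(), dataset_name):
            return True
        time.sleep(interval_s)
    return False


# Hypothetical list response, for illustration only.
sample_response = {
    "datasets": [
        {"name": "projects/my-project/datasets/anomaly-data", "state": "LOADING"},
        {"name": "projects/my-project/datasets/older-data", "state": "LOADED"},
    ]
}
sample_states = dataset_states(sample_response)
```

In this notebook, `fetch_list` could be a small lambda that wraps the `query_ts` GET call used above, so that each poll issues a fresh list request.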

#### Querying for anomaly