# Testing the Timeseries Insights API with Jabil's use case

This notebook contains the code we used to interact with the Timeseries Insights API using publicly available data, and to work out how to perform real-time anomaly detection on that data.

Data was imported into BigQuery. From BigQuery, we extracted the data and transformed it to fit the Event JSON format required by the Timeseries Insights API, which is documented [here.](https://cloud.google.com/timeseries-insights/docs/reference/rest/v1/projects.datasets/appendEvents#Event)

After converting the data into a JSONL file, this notebook shows how we created the Timeseries Insights dataset, streamed data to that dataset, and queried the dataset for anomalies.
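As a rough illustration of the target format, the sketch below turns one flat record into an Event and serializes it as one JSONL line. The record layout is an assumption, since the exported data is not shown here: it assumes a timestamp column named `timestamp`, a string `measure` column, and numeric columns named after the `dataNames` used later (`Humidity`, `Light`, `h2_raw`, `temp`). The function name `row_to_event` is hypothetical.

```python
import json
from datetime import datetime, timezone


def row_to_event(row, time_key="timestamp", string_dims=("measure",),
                 numeric_dims=("Humidity", "Light", "h2_raw", "temp")):
    """Convert one flat record (e.g. a BigQuery row as a dict) into an Event dict.

    Column names are illustrative assumptions, not the notebook's actual schema.
    """
    ts = row[time_key]
    if isinstance(ts, datetime):
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assume naive timestamps are already UTC
        # Same eventTime layout as the JSON file parsed later in this notebook
        ts = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")
    dimensions = []
    for name in string_dims:
        if row.get(name) is not None:  # skip missing or null values
            dimensions.append({"name": name, "stringVal": str(row[name])})
    for name in numeric_dims:
        if row.get(name) is not None:
            dimensions.append({"name": name, "doubleVal": float(row[name])})
    return {"eventTime": ts, "dimensions": dimensions}


row = {"timestamp": datetime(2021, 6, 14, 0, 0, 4), "measure": "LTTH",
       "temp": 22.5, "Humidity": None}
event = row_to_event(row)
jsonl_line = json.dumps(event)  # one line of the JSONL file
```

Writing one `json.dumps` result per line, followed by a newline, yields the JSONL file that the rest of the notebook reads.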


## Libraries, SA token authorization, and helper functions
Importing the necessary libraries, setting the project variable, and adding functions to help with interacting with the Timeseries Insights API.

In [None]:
%load_ext autoreload
%autoreload 2
!pip3 install --upgrade oauth2client pandas_bokeh pyfarmhash

In [3]:
from oauth2client.client import GoogleCredentials
from google.cloud import bigquery
from google.oauth2 import service_account
from google.cloud import storage
import pandas as pd
import json
import requests
import matplotlib.pyplot as plt
import pandas_bokeh
import datetime
from datetime import date
import time

In [45]:
#Variables
PROJECT_ID="<project-id>"

#Key file for API access and TSI API endpoint
key_file = '/home/jupyter/TSI-API/keyfile/key.json' # JSON key file of the service account used for Vertex AI
ts_endpoint = f'https://timeseriesinsights.googleapis.com/v1/projects/{PROJECT_ID}/datasets'

In [46]:
# reads json file and returns request body

def read_json_file(path):
    with open(path) as json_file:
        query = json.load(json_file)
        
    return query

In [47]:
# Function to interact with the Timeseries Insights API

def query_ts(method, endpoint, data, auth_token):
    headers = {'Content-type': 'application/json', "Authorization": f"Bearer {auth_token}"}

    if method == "GET":
        resp = requests.get(endpoint, headers=headers)
    elif method == "POST":
        # json.dumps produces valid JSON; str(data) would send a Python repr with single quotes
        resp = requests.post(endpoint, data=json.dumps(data), headers=headers)
    elif method == "DELETE":
        resp = requests.delete(endpoint, headers=headers)
    else:
        raise ValueError(f"Unsupported HTTP method: {method}")
    return resp.json()
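Serializing a request body with `str(data)` produces a Python repr (single quotes, `True`, `None`) rather than JSON, whereas `json.dumps` produces standard JSON. The quick check below shows the difference on an illustrative payload; `body` is an example, not one of the notebook's requests.

```python
import json

# Example payload mixing strings, a boolean, and a null
body = {"name": "test", "returnTimeseries": True, "note": None}

as_repr = str(body)         # Python repr of the dict
as_json = json.dumps(body)  # standard JSON serialization

print(as_repr)  # {'name': 'test', 'returnTimeseries': True, 'note': None}
print(as_json)  # {"name": "test", "returnTimeseries": true, "note": null}

# A strict JSON parser rejects the repr form
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False
```

An equivalent option is `requests.post(endpoint, json=data, headers=headers)`, which lets `requests` serialize the dict itself.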

In [None]:
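# Authenticate as the service account and capture an access token for the API calls below.
# Access tokens expire (typically after an hour); rerun this cell if requests start returning 401.
!gcloud auth activate-service-account --key-file {key_file}
token_array = !gcloud auth print-access-token
token = token_array[0]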

## Creation of bucket and list for test data
We will now take the full JSONL data and extract all the data from 06/14 up to 06/27. We will change the dates on all of this data so that each event is the same number of days before today's date as it was before 06/28, as if today were 06/28. This is so that streaming can be properly tested: the Timeseries Insights API requires that appended events fall within a certain window of today's date. For instance, the data from 06/26 will be dated two days before today.

The data from 06/14-06/22 (which will look like data from 14 to 6 days ago) will be stored in a Cloud Storage bucket and used to create a dataset in batch form.

The data from 06/23-06/26 (which will look like data from 5 to 2 days ago) will be put into a list so that it can be used to test the appendEvents functionality.
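The date shift can be written as a small pure function, which makes the arithmetic easy to check. This is a sketch: the name `shift_event_time` is hypothetical, and passing `real_today` explicitly (instead of calling `date.today()` inside) is a choice made here so the result is reproducible. It assumes the `%Y-%m-%dT%H:%M:%S+00:00` timestamp layout used in the next cell.

```python
from datetime import date, datetime


def shift_event_time(event_time, reference_today, real_today):
    """Move a timestamp so it sits as many days before real_today as it
    originally sat before reference_today. The time of day is kept."""
    original = datetime.strptime(event_time, "%Y-%m-%dT%H:%M:%S+00:00")
    days_back = reference_today - original.date()  # timedelta of whole days
    new_date = real_today - days_back
    # Keep everything after the date part ("THH:MM:SS+00:00") unchanged
    return new_date.strftime("%Y-%m-%d") + event_time[10:]


# 2021-06-26 is two days before the pretend "today" of 2021-06-28,
# so it lands two days before real_today.
shifted = shift_event_time("2021-06-26T10:15:00+00:00",
                           reference_today=date(2021, 6, 28),
                           real_today=date(2022, 4, 20))
print(shifted)  # 2022-04-18T10:15:00+00:00
```

In the notebook, `real_today` would simply be `date.today()`.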

In [None]:
# Sanity check: print today's date and the date five days earlier

today = date.today()
days = datetime.timedelta(days=5)
new = today - days
print(today)
print(new)

In [15]:
# Initialize variables

batch_events = [] # events uploaded to Cloud Storage to create the dataset in batch
batch_full = []   # sanity check: a second dataset with all events loaded in batch
events = []       # events that will be streamed via appendEvents

# Pretend "today" is 2021-06-28 in the original data
sampletoday = '2021-06-28'
todaysdate = datetime.datetime.strptime(sampletoday, '%Y-%m-%d').date()


def shift_event_date(event):
    """Shift the event's date so it lies as many days before the real today
    as it originally lay before 2021-06-28 (time of day unchanged)."""
    date_current = datetime.datetime.strptime(event['eventTime'], "%Y-%m-%dT%H:%M:%S+00:00")
    shortform = date_current.date()
    diff = todaysdate - shortform
    input_date = date.today() - diff
    event['eventTime'] = event['eventTime'].replace(shortform.strftime('%Y-%m-%d'),
                                                    input_date.strftime('%Y-%m-%d'))
    return event


# Read events from the JSONL file and sort them into the batch and streaming lists
with open('/home/jupyter/TSI-API/jbl-full-ts2.json', 'r') as myfile:
    for line in myfile:
        json_load = json.loads(line)
        event_time = json_load['eventTime']

        if "2021-06-14T00:00:04+00:00" <= event_time < "2021-06-23T00:00:04+00:00":
            shift_event_date(json_load)
            batch_events.append(json_load)
            batch_full.append(json_load)

        elif "2021-06-23T00:00:04+00:00" <= event_time < "2021-06-27T00:00:04+00:00":
            shift_event_date(json_load)
            events.append(json_load)
            batch_full.append(json_load)

In [42]:
#print length of lists as sanity check

print(len(events))
print(len(batch_events))
print(len(batch_full))

14775
25423
40198


## Write batch data to bucket
We will write two lists to Cloud Storage, `batch_events` (the earlier portion of the data) and `batch_full` (all data points), so that we can reference these files when creating the dataset.

In [19]:
# Write batch data to newline-delimited JSON files, then upload them to a Cloud Storage bucket
with open('/home/jupyter/TSI-API/data/batchdata.json', 'w') as file_out:
    for d in batch_events:
        json.dump(d, file_out)
        file_out.write('\n')

# full batch data
with open('/home/jupyter/TSI-API/data/batchdatafull.json', 'w') as file_out:
    for d in batch_full:
        json.dump(d, file_out)
        file_out.write('\n')

client = storage.Client(project=PROJECT_ID)  # pass the variable, not the literal string '{PROJECT_ID}'
bucket = client.get_bucket('tsi-data')

batch_blob = bucket.blob('batch1.json')
batch_blob.upload_from_filename('/home/jupyter/TSI-API/data/batchdata.json')

full_blob = bucket.blob('batch_full1.json')
full_blob.upload_from_filename('/home/jupyter/TSI-API/data/batchdatafull.json')
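The two write loops above can be collapsed into one helper that builds the newline-delimited JSON in memory. This is a sketch; `to_jsonl` is a hypothetical name, and the two sample records are illustrative only.

```python
import json


def to_jsonl(records):
    """Serialize a list of dicts as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(record) + "\n" for record in records)


sample = [{"eventTime": "2022-04-06T00:00:04+00:00",
           "dimensions": [{"name": "measure", "stringVal": "LTTH"}]},
          {"eventTime": "2022-04-06T00:07:34+00:00",
           "dimensions": [{"name": "measure", "stringVal": "LTTH"}]}]
payload = to_jsonl(sample)
```

In the notebook, `bucket.blob('batch1.json').upload_from_string(to_jsonl(batch_events), content_type='application/json')` would replace the write-then-upload steps, at the cost of holding the whole file in memory.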

## Create new dataset from bucket data
Now we will create a Timeseries Insights API dataset from our batch data that we just stored in a bucket in the file 'batch1.json'.
To understand the reasoning behind the JSON payload format we have to use to send data to the API, please see the documentation [here.](https://cloud.google.com/timeseries-insights/docs/reference/rest/v1/projects.datasets#DataSet)

In [None]:
# Create dataset using API

file_data = {
    "name": "test", 
    "ttl": "30000000s",
    "dataNames": [
        "measure",
        "Humidity",
        "Light",
        "h2_raw",
        "temp",
    ],
    "dataSources": [
        {"uri": "gs://tsi-data/batch1.json"} # sample of data in Cloud Storage JSON file
    ], 
} 
res = query_ts(method="POST", endpoint=ts_endpoint, data=file_data, auth_token=token)
res
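The `batch_full` list was uploaded as `batch_full1.json` for a sanity check, but only `batch1.json` is loaded into a dataset above. A small builder keeps both request bodies consistent; this is a sketch, `build_dataset_body` is a hypothetical helper, and the second dataset name `test-full` is made up for illustration. Each body would be sent with `query_ts(method="POST", endpoint=ts_endpoint, data=..., auth_token=token)` as in the cell above.

```python
def build_dataset_body(name, data_names, uris, ttl_seconds=None):
    """Build a DataSet request body for datasets.create from Cloud Storage URIs."""
    body = {
        "name": name,
        "dataNames": list(data_names),
        "dataSources": [{"uri": uri} for uri in uris],
    }
    if ttl_seconds is not None:
        body["ttl"] = f"{ttl_seconds}s"  # Duration fields are strings ending in "s"
    return body


DATA_NAMES = ["measure", "Humidity", "Light", "h2_raw", "temp"]

partial_body = build_dataset_body("test", DATA_NAMES,
                                  ["gs://tsi-data/batch1.json"], 30000000)
full_body = build_dataset_body("test-full", DATA_NAMES,  # hypothetical name
                               ["gs://tsi-data/batch_full1.json"], 30000000)
```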

### List datasets to check when the dataset is created
Now we can call the datasets.list method of the API, which returns all loaded datasets as well as those still being loaded. This lets us know when our dataset is ready for querying.

In [None]:
listdata = query_ts(method="GET", endpoint=ts_endpoint, data="", auth_token=token)
listdata
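Rather than reading the whole list by eye, the state of one dataset can be pulled out of the response. This sketch assumes the response has a `datasets` array whose entries carry a full resource `name` (`projects/<project>/datasets/<id>`) and a `state` string; `dataset_state` is a hypothetical helper and the example response is made up.

```python
def dataset_state(list_response, dataset_id):
    """Return the state of dataset_id from a datasets.list response, or None if absent."""
    for ds in list_response.get("datasets", []):
        # Compare only the last path segment of the resource name
        if ds.get("name", "").split("/")[-1] == dataset_id:
            return ds.get("state")
    return None


# Example response shape, made up for illustration
example = {"datasets": [
    {"name": "projects/my-project/datasets/test", "state": "LOADING"},
    {"name": "projects/my-project/datasets/other", "state": "LOADED"},
]}
```

A simple polling loop could rerun the list call every minute or so until `dataset_state(listdata, "test")` reports a terminal state such as `LOADED` or `FAILED`.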

## Append more recent events to dataset in streaming fashion
This cell loops through a number of the items in our events list (all events after the ones we uploaded in batch form) and appends them to the dataset one at a time to simulate streaming. For this example, we only append the first 100 events.

In [None]:
# appendEvents endpoint for the "test" dataset created above (the same dataset queried below)
url_endpoint = f'https://timeseriesinsights.googleapis.com/v1/projects/{PROJECT_ID}/datasets/test:appendEvents'

# body containing streamed events
request_body = {
   "events": []
}

# Set appends > 0 to stream the first 100 events one at a time, or 0 to append a single event
appends = 1
i = 0

if appends > 0:
    # iterate through the first 100 events and append each one to the dataset
    for event in events[0:100]:
        request_body['events'] = [event]
        res = query_ts('POST', url_endpoint, request_body, token)
        time.sleep(.1) # pause so we don't send too many requests to the API at once
        i = i + 1
        if i % 100 == 0:
            print('another 100') # progress check
else:
    # test a single append; "events" must be a list
    request_body['events'] = [events[0]]
    res = query_ts('POST', url_endpoint, request_body, token)

# print the result of the last API call
print(res)
print('done')
print(events[90]) # print this event so we can use its timestamp to query the dataset later
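Streaming one event per request mirrors real-time arrival, but `appendEvents` takes a list of events, so a backfill could send several per call. The helper below is a sketch; `batch_append_bodies` is a hypothetical name, and any per-request size limit should be checked in the API documentation rather than assumed from this example.

```python
def batch_append_bodies(events, batch_size):
    """Split events into appendEvents request bodies of at most batch_size events each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [{"events": events[start:start + batch_size]}
            for start in range(0, len(events), batch_size)]


# Five illustrative events split into bodies of two
sample_events = [{"eventTime": f"2022-04-19T12:{minute:02d}:00+00:00"} for minute in range(5)]
bodies = batch_append_bodies(sample_events, 2)
```

In the notebook this would look like `for body in batch_append_bodies(events[:100], 10): res = query_ts('POST', url_endpoint, body, token)`.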

## Query an appended event
We will now test the query functionality of the API, which determines whether the point we query is an anomaly, based on several timeseries parameters we pass to the API. The documentation on the query format can be found [here.](https://cloud.google.com/timeseries-insights/docs/reference/rest/v1/projects.datasets/query)
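With `forecastHistory` set to 43000 s and `granularity` to 450 s in the query below, the forecast works from roughly 12 hours of history aggregated into 7.5-minute buckets; the 7.5-minute spacing matches the `history` points in the output. The arithmetic is sketched below; `history_buckets` is a hypothetical helper, and since the API decides the exact bucket alignment, treat the count as approximate.

```python
def history_buckets(forecast_history_s, granularity_s):
    """Approximate number of whole granularity-sized buckets in the forecast history."""
    return forecast_history_s // granularity_s


buckets = history_buckets(43000, 450)  # values used in the query below
hours = 43000 / 3600                   # length of the history window in hours
minutes_per_bucket = 450 / 60          # width of one bucket in minutes
```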

In [50]:
request_body = {
    "detectionTime": "2022-04-20T00:35:20Z", # Timestamp of an event appended to the dataset; here we query the timestamp of events[90]
    "slicingParams": {
        "dimensionNames": ["measure"]
        },
    "timeseriesParams": {
        "forecastHistory": "43000s",
        "granularity": "450s",
        "metric": "temp"
        },
    "forecastParams": {
        "sensitivity": 0.90,
        "noiseThreshold": 12.0,
        "seasonalityHint": "DAILY"
        },
   
    "returnNonAnomalies": "true",
    "returnTimeseries": "true"
}

dataset = "test"

# run the anomaly detection query

query_ds_endpt = f'https://timeseriesinsights.googleapis.com/v1/projects/{PROJECT_ID}/datasets/{dataset}:query'
res = query_ts(method="POST", endpoint=query_ds_endpt, data=request_body, auth_token=token)
res

{'name': 'projects/pnishit-mlai/datasets/test',
 'anomalyDetectionResult': {'nonAnomalies': [{'dimensions': [{'name': 'measure',
      'stringVal': 'LTTH'}],
    'result': {'holdoutErrors': {'mdape': 0.018393232600039378,
      'rmd': 0.3233111289100961},
     'trainingErrors': {'mdape': 0.010853995807362472,
      'rmd': 0.017651694827175206},
     'forecastStats': {'density': '95', 'numAnomaliesInHoldout': 5},
     'history': {'point': [{'time': '2022-04-19T12:30:00Z',
        'value': 690.0871092569982},
       {'time': '2022-04-19T12:37:30Z', 'value': 691.4770979135792},
       {'time': '2022-04-19T12:45:00Z', 'value': 728.5865483626636},
       {'time': '2022-04-19T12:52:30Z', 'value': 692.4363495380096},
       {'time': '2022-04-19T13:00:00Z', 'value': 692.5467742864121},
       {'time': '2022-04-19T13:07:30Z', 'value': 692.5157960164504},
       {'time': '2022-04-19T13:15:00Z', 'value': 691.6372840701262},
       {'time': '2022-04-19T13:22:30Z', 'value': 690.5362924507635},
    
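To plot or inspect the returned series, the `history.point` entries can be turned into `(datetime, value)` pairs. This sketch follows the nesting visible in the output above (`result` → `history` → `point`, each point with `time` and `value`); `history_points` is a hypothetical helper, and the two sample points are copied from that output.

```python
from datetime import datetime


def history_points(slice_result):
    """Extract (datetime, value) pairs from a slice's result['history']['point'] list."""
    points = slice_result.get("history", {}).get("point", [])
    # "Z" is replaced because datetime.fromisoformat only accepts it from Python 3.11
    return [(datetime.fromisoformat(p["time"].replace("Z", "+00:00")), p["value"])
            for p in points]


# Two points copied from the output above
result = {"history": {"point": [
    {"time": "2022-04-19T12:30:00Z", "value": 690.0871092569982},
    {"time": "2022-04-19T12:37:30Z", "value": 691.4770979135792},
]}}
pairs = history_points(result)
```

From there, `pd.DataFrame(pairs, columns=['time', 'value']).set_index('time')` gives a frame that can be plotted with the pandas and matplotlib imports at the top of the notebook.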

## Finally, testing out the last API method, evaluateSlice
This endpoint evaluates a single, specific "slice" of a dataset instead of performing anomaly detection across multiple slices. Our data is one-dimensional in nature, so we only have one "slice" to begin with. More information on the method can be found [here.](https://cloud.google.com/timeseries-insights/docs/reference/rest/v1/projects.datasets/evaluateSlice)

In [36]:
file_data = {
    "pinnedDimensions": [
        {
            "name":"measure",
            "stringVal":"LTTH"
        }
        ],
    "detectionTime": "2022-04-20T00:30:13Z", # set this to whichever timestamp you want to evaluate
    "timeseriesParams": {
        "forecastHistory": "50000s",
        "granularity": "450s",
        "metric": "temp"
        },
    "forecastParams": {
        "sensitivity": 0.90,
        "noiseThreshold": 12.0,
        "seasonalityHint": "DAILY"
        }
}

# Evaluate the pinned slice at the detection time
dataset_name = "test"
fetch_ds_endpt = f'https://timeseriesinsights.googleapis.com/v1/projects/{PROJECT_ID}/datasets/{dataset_name}:evaluateSlice'
res = query_ts(method="POST", endpoint=fetch_ds_endpt, data=file_data, auth_token=token)
res
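This cell uses a 50000 s `forecastHistory`, while the query above used 43000 s, so the two results are not directly comparable. To compare like with like, the evaluateSlice body could be derived from the query body. The sketch below assumes only the four fields sent in this cell are needed; `to_evaluate_slice_body` is a hypothetical helper, and string-valued pinned dimensions are assumed.

```python
import copy


def to_evaluate_slice_body(query_body, pinned):
    """Build an evaluateSlice body that reuses a query body's time and forecast settings.

    pinned maps dimension names to string values, e.g. {"measure": "LTTH"}.
    slicingParams and the return* flags are dropped because this notebook
    sends only the four fields below to evaluateSlice.
    """
    return {
        "pinnedDimensions": [{"name": name, "stringVal": value}
                             for name, value in pinned.items()],
        "detectionTime": query_body["detectionTime"],
        # Deep copies so later edits to one body don't change the other
        "timeseriesParams": copy.deepcopy(query_body["timeseriesParams"]),
        "forecastParams": copy.deepcopy(query_body["forecastParams"]),
    }


query_body = {
    "detectionTime": "2022-04-20T00:35:20Z",
    "slicingParams": {"dimensionNames": ["measure"]},
    "timeseriesParams": {"forecastHistory": "43000s", "granularity": "450s", "metric": "temp"},
    "forecastParams": {"sensitivity": 0.90, "noiseThreshold": 12.0, "seasonalityHint": "DAILY"},
    "returnNonAnomalies": "true",
    "returnTimeseries": "true",
}
slice_body = to_evaluate_slice_body(query_body, {"measure": "LTTH"})
```

In the notebook, `to_evaluate_slice_body(request_body, {"measure": "LTTH"})` would reuse the settings from the query cell.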